Lately I’ve been thinking about data maturity models within large organizations, and how to measure maturity and the ability to use data. Specifically, what it means to be data literate or data fluent or whatever buzzword is used to mean “hip and with it like the cool kids” when it comes to being able to collect, sort, order, and use data to the greatest extent possible. To help think this through, I’m putting down a few thoughts here. My goal is to figure out ways to personally use data better and to help others do the same.

What is Maturity?

I do not have fond memories of CMMI maturity models for software engineering being applied by VCs to companies I worked for. I’m sure at some scale this model is helpful, but for the ~100-person dev shop in the ~$200M/year company, it took a lot of effort to figure out stages of maturity and equate that to how well the company’s engineering was doing. It seemed to measure the wrong things (lines of code, whether code review was performed, whether programmers had formal training, etc.), checking whether those things merely existed rather than what they accomplished. So the joke among a few programmers was that they raised their score by just doing a code review, even when the person doing it wasn’t familiar with the code or the underlying code base.

This isn’t intended to rehash programmer grunt work of yore, but to say that models can be helpful; they just aren’t what I’m trying to understand when I wonder how to improve my org’s capability to use data. I’m trying to find ways to identify curiosity and potential ability, and then improve both. Also, I’m always curious whether training programs work, so I want to think of ways to measure whether training is working.

So I did read through a lot of material on maturity models (CMMI, comically enough; DAMA; the US Data Governance Playbook; Gartner has lots, but this is all I could find publicly).

How About Competence Instead of Maturity?

So if my goal is really about doing more “good things” with data, maybe it’s better to measure competence or ability, and then use that as a proxy for maturity.

Breaking this down further into types of competence:

  • Individual data competence means things that a person or group can do. Maybe this is data literacy. Being able to find a book to read, read it, and do the thing I learned from the book.
  • Enterprise data competence means things the organization needs to be able to do, and actually does. Maybe this is maturity. Building libraries, sponsoring the arts, using the Dewey Decimal Classification.

Big, Imaginary Survey

So if I could survey everyone who matters in an organization, and get complete responses (or at least enough to be statistically significant, although I’m no statistician), then I’d have a proxy baseline of where we are. And maybe identify some areas to start improving.
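Since I’m no statistician, here’s a rough back-of-the-envelope sketch (in Python) of what “enough to be statistically significant” might look like, assuming simple random sampling and the usual normal-approximation formula for estimating a proportion; the 5,000-person org and the ±5% margin are made-up numbers, not anything from a real survey:

    import math

    def sample_size(population, margin=0.05, z=1.96, p=0.5):
        """Responses needed to estimate a proportion within +/- margin at ~95% confidence,
        using the normal approximation plus a finite-population correction."""
        n0 = (z ** 2) * p * (1 - p) / (margin ** 2)         # infinite-population estimate
        return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

    print(sample_size(5000))  # ~357 respondents for a hypothetical 5,000-person org

The point isn’t precision; it’s that “everyone who matters” doesn’t have to mean literally everyone responding before the baseline is useful.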

“Everyone who matters” is hard to define, but philosophically it would mean everyone who has the spark of potential to work with data (e.g., read in raw data and calculate important values, link data together, predict values, etc.) and everyone who needs to be able to accurately interpret and use data (e.g., pick the right metric, use the right chart in their PowerPoint, etc.). So it’s already a really big percentage of most organizations, though I expect not everyone cares that much. But the idea is that it’s not just the machine learning wizards we need to know about.

Organizational Questions

  • Do you know whose job it is to help you with any data-related problems? No idea what my current baseline is, but I think the target should be 100%.
  • Who is that? Unknown baseline, but the target should be 50%, figuring some people won’t be able to name that person on the spot.
  • Are you confident that they will help you? Unknown baseline, but I think the target is maybe 80%, since there are lots of diverse data problems and it’s probably impossible to help with everything.
  • How many times have they helped you? Unknown baseline. Target should be 75% having one or more helps, since we want everyone doing data stuff, even if it’s just finding the report server.
  • How many times have they tried and failed to help you? Unknown baseline. Target should also be 75% having one or more fails, figuring that just as many failures are needed to get good helps. The next question tries to suss out healthy failures vs. malignant failures.
  • How many of those tries do you think were productive? A productive failure is one that creates validated learning, helps with future attempts, or otherwise benefits the workplace by being tried and known. Unknown baseline. Target is 100%, where the answer equals the total of the helps and fails from the two previous questions.
  • Please point to the source code, blog post, knowledge base, or other evidence of the most recent time they helped you. Unknown baseline. Target is 100% of the previous question. Providing an example helps encourage knowledge collaboration and filters out low-effort responses, I think. (There’s a rough sketch of tallying these answers after this list.)
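To make those targets concrete, here’s a minimal sketch of how I might tally yes/no answers against them. The question keys, target values, and the three made-up respondents are purely illustrative, not a real survey instrument:

    # Hypothetical targets for the organizational questions above (illustrative values).
    TARGETS = {
        "knows_whose_job": 1.00,  # knows whose job it is to help
        "can_name_them":   0.50,  # can actually name that person
        "confident_help":  0.80,  # confident they will help
        "helped_once":     0.75,  # one or more helps
        "failed_once":     0.75,  # one or more failed attempts
    }

    def tally(responses):
        """responses: list of dicts mapping question key -> True/False.
        Returns the yes-rate, target, and gap for each question."""
        report = {}
        for question, target in TARGETS.items():
            yes_rate = sum(1 for r in responses if r.get(question)) / len(responses)
            report[question] = {"yes_rate": yes_rate, "target": target, "gap": target - yes_rate}
        return report

    # Three invented respondents, just to show the shape of the data.
    sample = [
        {"knows_whose_job": True,  "can_name_them": True,  "confident_help": False, "helped_once": True,  "failed_once": False},
        {"knows_whose_job": False, "can_name_them": False, "confident_help": False, "helped_once": False, "failed_once": False},
        {"knows_whose_job": True,  "can_name_them": False, "confident_help": True,  "helped_once": True,  "failed_once": True},
    ]
    for question, row in tally(sample).items():
        print(f"{question}: {row['yes_rate']:.0%} vs. target {row['target']:.0%} (gap {row['gap']:+.0%})")

A positive gap means we’re below target on that question, which is where I’d start looking for improvements.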

Individual Questions

  • Are you confident that you, personally, can gather and use enough data to either help make a decision, or to specify the statement of objectives for a consultant or service to do it for you? Unknown baseline, but I’m guessing 1%. Target is 100%.
  • Are there things you need that would increase the likelihood of you answering the previous question positively? Please list them… Unknown baseline, but I’m guessing 100%, and a high number here is bad. The true target is probably 25%, since there are likely false positives from people being unaware of available training, services, tools, etc. I don’t want it to be zero, as there’s always new stuff coming out and we want to perpetually scan for new things using customer input (see the sketch after this list for one way to roll those answers up).
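And because that second question produces free text, here’s a tiny sketch of rolling those “things I need” answers up into a ranked tally, so the perpetual scan has something concrete to chew on. The example answers are invented:

    from collections import Counter

    # Each inner list is one respondent's "things I need" answer (invented examples).
    needs_lists = [
        ["SQL training", "access to the report server"],
        ["SQL training", "a data catalog"],
        ["a data catalog"],
        [],  # someone who needs nothing extra to answer the previous question positively
    ]

    needing_rate = sum(1 for needs in needs_lists if needs) / len(needs_lists)
    print(f"{needing_rate:.0%} listed at least one thing they need (rough target: 25%)")

    # Rank the most common asks so new training, services, and tools surface from customer input.
    for need, count in Counter(item for needs in needs_lists for item in needs).most_common():
        print(f"{count}x {need}")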

Conclusion

This helps me a bit, because the concept of data literacy as something everyone has was weird to me; it’s hard to teach people to read if there are no libraries or printing presses. I don’t think everyone will be a viz SWAT team. There are some things the organization needs to be competent in, and those things enable literacy. We need both to thrive, but I worry that if we focus on getting all the individuals better while lacking the necessary organizational infrastructure, it’s not going to be as useful as doing both.

I also worry that we will create crappy organizational stuff. Crappy is highly subjective, but I think we can balance it by following human-centered design principles in a way that is auditable to customers. It’s weird how anti-HCD it is to say “I do that” and then not let anyone see the inputs, interstitial analyses, and products that exist as part of building. Or, even more Kafkaesque, saying “we do that” when the only way to see the design material is to be lucky enough to know who is planning projects and lobby them to spend a bunch of time telling stories. I think a good measure of maturity is the visibility that comes from seeing skyscrapers going up for new, wonderful libraries and focusing literacy training in those neighborhoods.

Stuff I read while working on this post