The Missing Piece to Democratize Data Is Not A Catalog
Since the nineties, with the advent of Business Intelligence, organizations have tried to establish data-driven decision making and to get insights into the hands of employees.
More recently, with the introduction of self-service BI and data discovery, these organizations started maximizing the number of data consumers and decreasing the dependency on a central data team.
Unfortunately, preparing and modeling the data still requires this same central data team, which at a certain scale cannot be expected to master every business domain, business goal and source application data model. This makes the central team, in turn, dependent on the business and IT teams.
Nowadays, we are surfing a third wave of data democratization, one where organizations also maximize the number of data producers. Because data lives throughout your entire organization, everyone could become both a data consumer and a data producer, feeding back the insights they create for others to reuse or build upon.
By tightening the collaboration between business and data experts in data product teams, organizations remove the bottleneck of a central data team and, at the same time, improve the quality of data by incorporating the business logic. As this introduces many data handovers, we see an emerging need to apply product thinking to data, with the goal of streamlining the collaboration between all these people while keeping costs, security and many other aspects under control.
What is a data product?
Applying product thinking to data leads to the term data products. But this is where it becomes tricky: there is not yet a consensus on what a data product is. As Sanjeev Mohan, a former Research VP, Big Data & Advanced Analytics, put it: “everyone in my circles had a different take on what a data product is”.
Just to provide a few examples:
- Jean-Georges Perrin of Bitol describes a data product as a bag of data contracts
- Jochen Christ of datamesh manager defines a data product as a logical unit that contains all components to process and store domain data for analytical or data-intensive use cases and makes them available to other teams via output ports
- Gartner refers to a data product as a set of data, metadata, semantics and templates
I bluntly summarize it as “the data and everything you need to independently make use of it”. This implies, but glosses over, that you need to be able to create the data product, and hence to process data.
Regardless of the definition, and regardless of the perspective of whoever provides it, the term data product always implies combining technical and business ownership, shifting responsibilities left to earlier in the development chain, and empowering data consumers to consume the data product.
This is happening right now…
Data product thinking is emerging right now. Some organizations go all in on data mesh, others start pragmatically by shifting the responsibility for data ingestion to IT teams. Rather than having the central data team ingest data from all operational systems, the IT teams that build and maintain those systems become responsible for sharing their modeled data with the central data team.
Even without truly maximizing the number of data producers, these source-oriented data products can bring value by stabilizing data pipelines. This might even be the end state for smaller organizations in which central data teams can still grasp the entire business.
… But something is missing
Governing an increased number of data producers, possibly federated throughout the organization, has proven to be a challenge. Firstly, data products created across the entire organization need to be made discoverable to everyone. Current data catalogs, however, are centered around data assets and their fields, not around data products.
Secondly, product thinking shifts the responsibility for capturing metadata to earlier in the development chain. This is where data contracts come in, which allow you to provide metadata up-front. These data contracts do not restrict themselves to schema metadata, but can also include SLA information, versioning and much more. Again, the level of granularity of the information in these data contracts does not match that of data catalogs, so it might need another place to be displayed.
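To make the idea of a data contract a bit more tangible, here is a minimal sketch in Python. It is not the format of any particular data contract standard; the class names, the fields and the example product `customer_orders` are all assumptions made for illustration. It only shows how schema, SLA and version information could sit together in one up-front description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    description: str = ""


@dataclass(frozen=True)
class ServiceLevel:
    freshness_hours: int      # maximum age of the data, in hours
    availability_pct: float   # promised availability, e.g. 99.5


@dataclass(frozen=True)
class DataContract:
    # Hypothetical structure: real contract formats carry many more attributes.
    data_product: str
    version: str              # version of the contract itself
    owner: str
    schema: tuple[Field, ...]
    sla: ServiceLevel

    def missing_fields(self, delivered_columns: set[str]) -> list[str]:
        """Return the contract fields that are absent from a delivered dataset."""
        return [f.name for f in self.schema if f.name not in delivered_columns]


# Illustrative contract for a made-up data product.
contract = DataContract(
    data_product="customer_orders",
    version="1.2.0",
    owner="sales-domain-team",
    schema=(
        Field("order_id", "string", "Unique order identifier"),
        Field("order_total", "decimal", "Order value in EUR"),
    ),
    sla=ServiceLevel(freshness_hours=24, availability_pct=99.5),
)
```

A producer could call `contract.missing_fields(...)` with the columns it is about to publish, and catch a broken promise before any consumer does.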
Thirdly, as a data product is the combination of data, metadata, and business logic, this overview location should allow you to navigate easily between all these entities. Again, this is something current solutions are not designed for.
These three challenges lead to organizations struggling to govern data products. And we have not even discussed processes yet. How do you govern who is able to create a new data product? How can someone request access to a data product? Or to the tools and infrastructure to build one? And how can someone approve and implement such a request?
These processes risk becoming the new bottleneck, because again they heavily rely on central teams to build or govern solutions. Automation is key to avoiding this pitfall and to enabling data product owners to smoothly take control of their responsibilities.
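As a rough illustration of what automating such a process could look like, the sketch below models an access request that only the owner of the data product can approve or deny, without a central team in the loop. The names (`AccessRequest`, `decide`, the `OWNERS` registry) and the example users are hypothetical and do not describe how any specific tool implements this.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    requester: str
    data_product: str
    status: Status = Status.PENDING


# Hypothetical registry mapping each data product to its owner.
OWNERS = {"customer_orders": "alice"}


def decide(request: AccessRequest, approver: str, approve: bool) -> AccessRequest:
    """Let the data product owner, and only the owner, settle a pending request."""
    if OWNERS.get(request.data_product) != approver:
        raise PermissionError(f"{approver} does not own {request.data_product}")
    if request.status is not Status.PENDING:
        raise ValueError("request has already been decided")
    request.status = Status.APPROVED if approve else Status.DENIED
    return request


# Bob asks for access; Alice, as owner, approves it herself.
request = AccessRequest(requester="bob", data_product="customer_orders")
decide(request, approver="alice", approve=True)
```

The point of the design is that the approval decision sits with the data product owner, while the rules around it (who may decide, and when) are enforced by code rather than by a central team.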
And that’s the data product portal!
A new category of tools is bursting onto the scene: data product catalogs. We prefer to call ours a data product portal. This subtle change emphasizes that such a tool not only provides an overview, but also lets you navigate across all the tools in your data landscape.
At its core, our portal provides you with an overview of existing data products and their datasets, and manages the processes around them: requesting and approving access, creating a new data product, … Yet through the power of integrations, you can also
- Navigate to a data catalog to visit more in-depth descriptions
- Be directed to your data product development toolbox, like Conveyor, to build and run data products and even manage infrastructure
- Manage access and get previews, or even full access to the data
This is the missing piece of the puzzle to scale data product development across your organization and, strangely enough, it is the central piece, maybe even your starting point. That is one more reason to offer the data product portal as an open source initiative. Check out the repo and start adopting data product thinking today!