The four principles of Data Mesh

Blog series | From Data to Business Value with Data Mesh | #2

Authors: Nicola Storz & Yves Brise 

"For once, you don't have to license a new product" was the good news in our first blog in the series on Data Mesh. IT managers love to hear that. The benefits it promises are also appealing: making the organization more data-driven and realizing ROI on IT infrastructure investments. But what is Data Mesh, then, if it is not a product?

Data Mesh is an organizational concept for data and for the organization that manages it. The concept comes from Thoughtworks [Data Mesh, ISBN: 978-1492092391] and is similar in principle to the Domain-Driven Design approach that has been established in software development for some time. The basic idea: decentralization of data, maximum technological support from a platform, and minimal central governance to ensure interoperability and achieve scale-out for data.

Four principles make up the Data Mesh

Data Mesh is currently one of the most discussed hype terms in IT, especially in the data and analytics context. Reason enough for us to shed some light on it. Regardless of the technology, a Data Mesh is based on four principles, which we explain here and illustrate with an example (Image 1).

Image 1: Data Mesh Principles

1. Domain ownership - decentralization and distributed responsibility

Responsibility for data shifts to the domain and no longer lies with a central DWH or data engineering team. The operational domain teams themselves take responsibility for their data and make it available. In an enterprise, this allows the sales team to make its sales data available to additional teams beyond its own domain.

The data is made available for two purposes: first, for operational use in operational systems outside the domain, and second, for analytical purposes.

2. Data as a product - product thinking for data

In order to apply product thinking to data, the data product of a domain must fulfill certain quality characteristics. Data is no longer treated as raw material that is obtained via ETL, cleansed, and then provided in a structured manner. The data product must be trustworthy, well-structured, understandable across domain boundaries, and findable. 

For our sales team, this means that they group the sales data by sales contract, set the contract ID as the key, and structure the contract details in an understandable way. They guarantee the quality and timeliness of the data per SLA. The sales data can then, for example, be joined with customer data in an analytical use case.
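To make this concrete, here is a minimal sketch of what such a data product record could look like. The field names, values, and currency are our own illustrative assumptions, not a prescribed Data Mesh schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch: one record of the sales domain's data product,
# keyed by contract ID and structured so other domains can understand it.
@dataclass(frozen=True)
class SalesContract:
    contract_id: str       # product key, stable across versions
    customer_id: str       # reference understood across domain boundaries
    contract_value: float  # in the company's reporting currency
    signed_on: date

contracts = [
    SalesContract("C-1001", "CUST-42", 12_500.0, date(2023, 3, 1)),
    SalesContract("C-1002", "CUST-17", 8_900.0, date(2023, 3, 4)),
]

# Keying the product by contract ID makes it findable and joinable
# for consumers outside the sales domain.
by_id = {c.contract_id: c for c in contracts}
```

The point is not the specific fields but the product thinking: a stable key, self-describing structure, and references (here `customer_id`) that other domains can understand.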

3. Self Service

To ensure that data exchange between a large number of teams scales, both the infrastructure for providing the data and the data products themselves must be available as self-services.

When provisioning sales data, the sales team in our example can order the platform infrastructure directly via self-service, use it to develop their data product, and offer the product for consumption. The Customer Analytics team finds the sales data in the data catalog and can order it via self-service request. They can correlate it with the customer data to calculate the sales potential per customer and in turn offer this value as a data product. It is important to give users the right tools that fit their needs and skill sets.
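The correlation step the Customer Analytics team performs can be sketched in a few lines. The data layout and the "potential" formula (segment budget minus realized sales) are illustrative assumptions on our part:

```python
# Hypothetical sketch: the Customer Analytics team joins two data
# products on the shared customer_id key.
sales = [
    {"contract_id": "C-1001", "customer_id": "CUST-42", "contract_value": 12_500.0},
    {"contract_id": "C-1002", "customer_id": "CUST-42", "contract_value": 8_900.0},
    {"contract_id": "C-1003", "customer_id": "CUST-17", "contract_value": 4_000.0},
]
customers = [
    {"customer_id": "CUST-42", "segment_budget": 50_000.0},
    {"customer_id": "CUST-17", "segment_budget": 10_000.0},
]

# Sum realized sales per customer.
realized: dict[str, float] = {}
for row in sales:
    realized[row["customer_id"]] = (
        realized.get(row["customer_id"], 0.0) + row["contract_value"]
    )

# Remaining sales potential per customer: the derived value that the
# Customer Analytics team can in turn offer back to the mesh as a product.
potential = {
    c["customer_id"]: c["segment_budget"] - realized.get(c["customer_id"], 0.0)
    for c in customers
}
```

Note that the join only works because both domains use the same `customer_id` key, which is exactly what the governance principle below enforces.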

4. Federated governance

To ensure that the first three principles mesh seamlessly, certain standards must be adhered to. These are discussed and enforced by a cross-domain committee and include requirements for the structure and interfaces of data products, naming conventions, versioning, documentation, and data sensitivity. Access to the data is secured via policies.

For our sales team, this means that the data product must be made available via a standard description (e.g. an AsyncAPI specification). In this specification, the functional data owner, the domain "Sales", and the naming must be declared. The customer reference "Customer ID" then carries the same name in the customer domain as in the sales domain. This allows the Customer Analytics team to later correlate this data to calculate sales potential.
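Such governance rules are most valuable when the platform team automates them. Below is a minimal sketch of an automated compliance check; the required fields and the snake_case naming rule are our own assumptions, not a real AsyncAPI schema:

```python
# Hedged sketch of a governance check a platform team might automate.
# Required metadata fields and the naming convention are assumptions.
REQUIRED = {"owner", "domain", "version", "fields"}

def validate_descriptor(desc: dict) -> list[str]:
    """Return a list of governance violations (empty list = compliant)."""
    errors = [f"missing field: {f}" for f in REQUIRED - desc.keys()]
    # Federated naming convention: shared keys use the same snake_case
    # name in every domain, so products can be correlated later.
    for name in desc.get("fields", []):
        if not name.islower() or " " in name:
            errors.append(f"non-compliant field name: {name!r}")
    return errors

descriptor = {
    "owner": "sales-team",
    "domain": "Sales",
    "version": "1.0.0",
    "fields": ["contract_id", "customer_id", "contract_value"],
}
violations = validate_descriptor(descriptor)
```

A check like this can run in the data product's deployment pipeline, so the cross-domain committee defines the standards once and the platform enforces them everywhere.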

The four principles enable domain teams to take ownership of their data and share it between different domains. The usefulness of existing data can be increased by providing data products that are of good quality and can be correlated.

The implications for the organization are considerable

In software development, the term DevOps has become well known: software teams are responsible not only for the development, but also for the operation and support of their software. Data Mesh takes basically the same approach with data. Applying this principle to data has become known as DataOps. Data Mesh is in some ways an evolution of this and allows DataOps to scale.

The organization must provide two things for Data Mesh to be successfully implemented:

  1. A central platform team that provides the tools for the DataOps teams and automates as much of it as possible (principle 3). Furthermore, this team is a good place to anchor federated governance (principle 4).

  2. Capable domain teams with the right skills to define and deliver their own data products (Principle 1 and Principle 2).

Image 2: Data Mesh Domains

Next up: How do I put this into practice?

We explained what a Data Mesh is in this second post of the blog series. The four principles of Data Mesh are catchy and simple. How the value proposition from the first blog can actually be realized and technologically implemented is something we will look at in Blog 3. In Blog 4, we will also report on concrete experiences our customers have had with Data Mesh.

Blog series: 

#1 Blog series Data Mesh: Blog 1

#2 Blog series Data Mesh: Blog 2

#3 Blog series Data Mesh: Blog 3

#4 Blog series Data Mesh: Blog 4


Your ipt experts

We look forward to hearing from you