The ideology of data mesh suggests that the current data estate challenges faced by enterprises cannot be solved by throwing more technology at them.

The solution lies in reorganizing the three major players in the enterprise: the people, the processes, and the tools.

Thoughtworks calls data mesh a “socio-technical” paradigm. How does a large enterprise with complex, interconnected systems stay agile and yet derive maximum value from its data estate?

Data mesh proposes that enterprises start looking at their data estate from a new perspective, encapsulated in four foundational principles. The principles are elaborated in the subsequent sections to provide insight into the data mesh construct. They are interrelated and should be understood in the sequence outlined here.

The four data mesh principles:

Decentralized ownership of data

This principle is primarily focused on the people and the processes. It proposes that ownership of analytical data move out of a central data team and into the business domains that produce the data and understand it best. Each domain owns its analytical data end to end and is responsible for serving it to the rest of the enterprise, which keeps accountability with the teams closest to the data and removes the central bottleneck that slows large data estates down.

Data as a product

This principle is primarily focused on the people and the processes. It calls for a mindset shift in the enterprise: a shift from treating analytical data as an asset to be stored to treating it as a product to be served.

The domains should treat analytical data as a first-class product rather than as a by-product of their business operations, and apply all the aspects of product development to make it valuable, useful, reliable, and customer-focused.

A few broad aspects need to be understood when considering the principle of data as a product:

  • A data product is an autonomous architectural quantum that forms the fundamental building block of the data mesh.
  • A domain can have one or more data products.
  • Data products should interoperate; a data product can consume outputs from other data products and produce its own output. Eventually, multiple data products interacting with one another form a mesh of data products.
  • All the technical plumbing, such as sourcing data, data modeling, and ETL, is abstracted under the data product. The data product team has the authority to design and implement the technical solution.
  • Some data products will be aligned to source systems or operational systems. For example, in the retail banking industry, current account savings accounts (CASA) can become a data product. The CASA data product can produce outputs such as real-time account transactions, account balances, monthly expenditure, and monthly income (see the sketch after this list).
  • Some data products will consume output from source-aligned data products, along with other data products, and generate value-added output. Extending the same example, a data product could categorize spending across transactions.
  • Some data products will be aligned to the extreme right side of the value chain. An example could be a data product producing data for a BI report.
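
To make these aspects concrete, the following minimal sketch in Python shows a hypothetical source-aligned CASA data product exposing an output port, and a consumer-aligned data product that categorizes spending on top of it. All class and field names are illustrative assumptions, not a prescribed design; a real data product would also carry the documentation, SLOs, and platform plumbing discussed later.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical, simplified record served by the source-aligned CASA data product.
    @dataclass
    class AccountTransaction:
        account_id: str
        amount: float        # negative = debit, positive = credit
        merchant: str

    class CasaDataProduct:
        """Source-aligned data product: serves CASA transactions through an output port."""
        def __init__(self, transactions: List[AccountTransaction]):
            self._transactions = transactions

        def transactions(self) -> List[AccountTransaction]:
            # Output port: consumers use this interface, never the operational system directly.
            return list(self._transactions)

    class SpendCategorizationProduct:
        """Consumer-aligned data product: adds value on top of the CASA output."""
        CATEGORIES = {"GROCER": "groceries", "AIRLINE": "travel"}

        def __init__(self, upstream: CasaDataProduct):
            self._upstream = upstream

        def categorized_spend(self) -> List[dict]:
            return [
                {"account_id": t.account_id,
                 "amount": t.amount,
                 "category": self.CATEGORIES.get(t.merchant, "other")}
                for t in self._upstream.transactions()
                if t.amount < 0  # only debits count as spending
            ]

    casa = CasaDataProduct([AccountTransaction("A1", -54.20, "GROCER"),
                            AccountTransaction("A1", 2500.00, "EMPLOYER")])
    print(SpendCategorizationProduct(casa).categorized_spend())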

This is a major change in the way enterprises think about analytical data. To bring about this change, certain new roles must be carved out within the domains. The most important is the data product owner, a role responsible for:

  • Creating the vision and the feature roadmap of the data product
  • Customer satisfaction and ensuring the data product is used within the enterprise
  • Ensuring availability, quality, traceability, and maintaining service levels

What problems does product thinking attempt to solve?

  • Trustworthiness — With the ownership of data realigned to the domain, the data product owner is accountable for the data product and ensures that its quality, traceability, and security are maintained and reported through appropriate metrics, SLOs, and so on.
  • Discoverability — Each data product is cataloged and advertised on the enterprise data marketplace and is self-explanatory. The documentation clearly explains usability topics such as interfaces, schema, business semantics, relationships with other data products, and SLOs (a sample catalog entry follows this list). This gives data consumers complete visibility into the data product so they can make informed decisions about its usage.
  • Agility — A data product is an autonomous unit of architecture with its own feature roadmap and release cycles. Data product teams do not wait for a central platform team to provision the environment and provide data before their work can begin, and no time is wasted establishing authenticity or traceability, or in rework caused by input dataset SLOs that do not align with the use case SLOs.
  • Productivity — Data consumer productivity increases automatically when agility, discoverability, and trustworthiness are taken care of.
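
As an illustration of discoverability, the sketch below shows what a catalog entry for the hypothetical CASA data product might look like when advertised on an enterprise data marketplace. The field names and SLO values are assumptions chosen for illustration, not a prescribed schema.

    # Hypothetical catalog entry advertised on the enterprise data marketplace.
    casa_catalog_entry = {
        "name": "casa-transactions",
        "domain": "retail-banking",
        "owner": "casa-data-product-owner@bank.example",  # accountable data product owner
        "output_ports": [
            {"type": "stream", "format": "json", "schema": "casa_transaction_v1"},
            {"type": "file", "format": "parquet", "schema": "casa_transaction_v1"},
        ],
        "upstream_products": [],           # source-aligned, so no data product dependencies
        "slos": {
            "freshness_minutes": 15,       # how stale served data is allowed to be
            "completeness_pct": 99.5,
            "availability_pct": 99.9,
        },
        "classification": "confidential",  # drives masking and access policies
    }
    print(casa_catalog_entry["slos"])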

Although the concept of a data product brings many benefits, it can increase the overall operating cost due to multiple independent infrastructures and multiple small, highly skilled teams. Suboptimal utilization of those highly skilled teams drives the operating cost up. The third principle of data mesh attempts to address some of these challenges.

A self-serve platform

This principle is primarily focused on the people and the tools. It states that enterprises should invest in a central data infrastructure that facilitates the data product life cycle. The central infrastructure should be self-serve and support tenancy to enable autonomy, while providing multiple tools out of the box. Self-serve tools need to be built thoughtfully: with the objective of reducing the overall cognitive load on data product teams, they should provide enough abstraction over low-level technical components to facilitate faster development and standardization of data products.

The self-serve platform should do the following:

  • Be built and maintained by a smaller, central, highly skilled team — It should be made available to the data product teams as a service through a subscription model.
  • Provide standard purpose-built tools that reduce the cognitive load on the data product teams — Such tools lower the need for maintaining a large, highly skilled team; the team can be composed largely of generalists with a small set of specialists. Examples include simple user interfaces to define and register events and event schemas, to provision a messaging platform, to provision serverless data streaming pipelines, and to design and test transformation scripts. The idea is to bring abstraction to a level where most of the complex functions are hidden and automated.
  • Support multi-tenancy and be able to onboard tenants — Tenants in this case are data products. This is similar to what the cloud providers have been doing for a while now.
  • Provide standard input and output interfaces — Standard consumer integration patterns and standard producer integration patterns. The input and output could be in the form of data files, data APIs, data streams, etc.
  • Automate service provisioning — Enterprise-level cross-cutting concerns such as cost, security, regulatory support, a machine learning feature store, and a data marketplace should be supported by the platform out of the box (see the provisioning sketch after this list).
  • Provide polyglot storage and processing options — Examples could be relational storage, key-value storage, in-memory data grids, SQL query engines, etc.
  • Attempt to converge the technology underlying operational and analytical applications — For example, if operational applications are implemented as microservices and deployed on Kubernetes, streaming applications can also use Kubernetes as the orchestrator, and Spark-based batch processing can run on a Kubernetes-based cluster.
  • Provide an AI/ML toolkit and support MLOps — Provision environments with popular data science tools such as notebooks and libraries such as TensorFlow, XGBoost, and Keras that facilitate model training, along with MLOps capabilities that ease model deployment.
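
The sketch below illustrates the kind of abstraction a self-serve platform could offer: a data product team declares what it needs and the platform hides the underlying provisioning. The SelfServePlatform API is entirely hypothetical; real platforms would expose such capabilities through portals, CLIs, or infrastructure-as-code templates.

    class SelfServePlatform:
        """Hypothetical facade over the central data infrastructure."""

        def onboard_tenant(self, data_product: str) -> str:
            # In reality: create isolated namespaces, budgets, IAM roles, etc.
            print(f"Tenant created for data product '{data_product}'")
            return f"tenant-{data_product}"

        def provision_storage(self, tenant: str, kind: str) -> str:
            # kind could be 'relational', 'key-value', 'object', ... (polyglot storage)
            print(f"{kind} storage provisioned for {tenant}")
            return f"{tenant}/{kind}-store"

        def provision_stream_pipeline(self, tenant: str, source_topic: str, sink: str) -> None:
            # In reality: deploy a streaming job on the shared runtime (for example, Kubernetes).
            print(f"Streaming pipeline {source_topic} -> {sink} deployed for {tenant}")

    # A data product team self-serves its infrastructure instead of raising tickets.
    platform = SelfServePlatform()
    tenant = platform.onboard_tenant("casa-transactions")
    store = platform.provision_storage(tenant, "relational")
    platform.provision_stream_pipeline(tenant, "casa.raw-transactions", store)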

What problems does the self-serve data platform solve?

  • Agility — An autonomous data product team can use the self-serve platform directly and does not have to depend on the central infrastructure team to provide data and infrastructure resources. This leads to faster development cycles for data products.
  • Cost of Ownership — From an infrastructure perspective, the cost of ownership is reduced because infrastructure is still centrally provisioned.
  • Skills — Since the platform abstracts technical complexity, the composition of teams shifts toward more generalists and fewer specialists. This reduces the need for a large, highly skilled team.

These principles, if implemented appropriately, seem to address most of the challenges that enterprises currently face. One area still needs to be considered: most data products need to operate across domains. How do you determine that cust_id in domain A is the same as entity_customer_no in domain B? How should you harmonize data across domains? This brings us to governance modeling, and with it the last principle of data mesh.
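
One common way to answer these questions, anticipating the global governance discussion below, is for domains to agree on shared global attributes and map their local keys onto them. The sketch assumes a hypothetical global_customer_id attribute; the actual harmonization mechanism is a governance decision rather than something data mesh prescribes.

    # Hypothetical mapping rows exposed by each domain's data product.
    domain_a_customers = [{"cust_id": "A-17", "global_customer_id": "GC-001"}]
    domain_b_customers = [{"entity_customer_no": "B-9034", "global_customer_id": "GC-001"}]

    # Because both products expose the agreed global attribute, a consumer can join them.
    def join_on_global_id(a_rows, b_rows):
        b_index = {row["global_customer_id"]: row for row in b_rows}
        return [{**a, **b_index[a["global_customer_id"]]}
                for a in a_rows if a["global_customer_id"] in b_index]

    print(join_on_global_id(domain_a_customers, domain_b_customers))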

Federated computational governance

This principle involves all three major players in the enterprise: the people, the processes, and the tools. Federated computational governance is a major ideological shift from conventional central governance implementations. The shift relates to the following:

  • How governance teams are organized
  • How infrastructure should support governance
  • Systemic approaches that can be leveraged to govern loosely coupled, federated players

Governance should be divided into local governance and global governance:

  • The local governance body is local to a data product and is responsible for defining local governance policies, frameworks, and processes, and is accountable for their implementation and ongoing adherence. This is, in a sense, a move away from central governing bodies that used to create policies and then validate and certify adherence across the data estate. In federated governance, aspects like data quality, data modeling, and local access policies are managed by the data product owner. This is a significant shift from implementing large canonical data models to smaller data models purpose-built to serve the requirements of the data product; logical extension of the model is achieved by another data product that takes input from the first and builds functionality on top of it to generate value.
  • The global governance body is a thin, cross-functional body with subject matter experts from various specializations in the enterprise, such as legal, security, domains, infrastructure, and technology. It formulates overarching policies that are required for the data products to interoperate. Examples include a) the decision on which data product aligns with which domain, b) agreement on global attributes so that data products can join or union data from multiple data products, c) legal and regulatory policies such as GDPR and SOX, d) security policies related to encryption, obfuscation, and masking, and e) data classification policies. The global governance body is responsible for formulating the policies, and the local governance body is accountable for their implementation and ongoing adherence.

The data mesh will be in a constant state of change as new data products are launched and older ones retire. Data mesh advocates decentralization of governance and proposes that systems thinking be applied to strike an equilibrium between data product autonomy and mesh harmony. This is a deeper topic, but simply put, instead of global governance dictating which data products should be launched, indicators such as data product usage statistics and consumer satisfaction can decide the life of a product. An analogy is courses on Udemy: many courses exist on the same topic, but user statistics decide each course's longevity. Similarly, operational metrics such as data product downtime and change failure rate determine the stability of a data product and whether it needs to be redesigned.
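
As a purely illustrative sketch of this systemic approach, the function below derives a verdict for a data product from a few operational indicators. The indicator names and thresholds are assumptions for the example, not recommended values.

    # Hypothetical: operational indicators, rather than the global governance body,
    # suggest whether a data product should live on, be redesigned, or retire.
    def product_verdict(monthly_consumers: int, satisfaction: float, change_failure_rate: float) -> str:
        if monthly_consumers < 2 or satisfaction < 2.0:   # assumed thresholds
            return "candidate for retirement"
        if change_failure_rate > 0.3:
            return "candidate for redesign"
        return "healthy"

    print(product_verdict(monthly_consumers=1, satisfaction=4.2, change_failure_rate=0.1))
    print(product_verdict(monthly_consumers=40, satisfaction=4.5, change_failure_rate=0.45))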

There is a lot of emphasis on the word computational in this principle. As much as possible, the governance policies, local or global, should be automated, coded into the platform, and executed automatically. Platform SMEs are part of the global governance body, and they ensure that standards-as-code, policy-as-code, automated test cases, and automated observability are implemented. Examples include automated documentation frameworks, automated masking and obfuscation, mandatory test cases, automated calculation and reporting of observability metrics, standard authentication frameworks, standard data transfer frameworks, and system audit reports.
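
A minimal policy-as-code sketch, assuming a global masking policy that the platform applies automatically to every record a data product serves. The field names and the choice of hashing as the masking technique are assumptions for illustration.

    import hashlib

    PII_FIELDS = {"customer_name", "email"}   # defined once by the global governance body

    def apply_masking_policy(record: dict) -> dict:
        # Hash any field classified as PII before the record leaves the data product.
        return {key: (hashlib.sha256(str(value).encode()).hexdigest()[:12]
                      if key in PII_FIELDS else value)
                for key, value in record.items()}

    record = {"customer_name": "Jane Doe", "email": "jane@example.com", "balance": 1043.55}
    print(apply_masking_policy(record))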

What We Think

All these principles are essential for a holistic implementation of data mesh in an enterprise. The degree of implementation can vary, but as we have seen, each principle overcomes the shortcomings of the others. The larger the mesh, the greater the value generated from the data. One could argue that a larger mesh means a more complex mesh. We agree; however, we also believe that to make the external interfaces of a system simple and easy to use, the internals of the system must be complex. Look at human beings and their anatomy!
