Data mesh is the solution to the long-standing problem of data platform scalability

Jerrold Stolk
20 Nov, 2023

Data mesh is hot in the world of data platforms and Azure

Data mesh is hot. This is understandable, as data mesh offers the solution to a long-standing problem: scalability of data platforms. Perhaps data mesh can provide a breakthrough within your organization as well. In this blog, I'll tell you what the main benefits and challenges of a data mesh architecture are.

The term data mesh was introduced by ThoughtWorks consultant Zhamak Dehghani. It is an architecture in which distributed data products are developed and managed by data engineers and data product owners in domain teams. A shared infrastructure is used to host, prepare and offer data. As central data teams are acknowledging en masse that they are running into limits, data mesh has become a major trend in the world of data platforms. How did this situation arise?

Some history of the data warehouse

In the 1980s, the data warehouse emerged: a central data environment to report from. It was the answer to the question of how to get an overall picture of the state of an organization. This centralization, in turn, brought new challenges, for example in terms of technology, knowledge and staffing. These challenges were the logical consequence of an ever-growing data platform, an ever-growing server and an ever-growing team.

Parallel processing

A solution to the technical challenge arrived in the 2000s: parallel processing, for example on the Hadoop ecosystem. Before the advent of parallel processing, the most common solution to performance problems was a larger server: scale up.

With the advent of parallel processing, multiple servers could be deployed to host the data platform: scale out.

Scalability problem not yet solved

But having a data warehouse and parallel processing does not solve the scalability problem for knowledge and staffing. A larger data platform still requires a larger central team with centrally collected data engineering knowledge: scale up. This is why vertical splits are often made in IT environments, with data engineers and data analysts working in different teams. The disadvantage of this split is that each data product then requires several teams.

The big advantage of data mesh is that it provides a full-fledged scale-out solution:

Data mesh splits the central data team, and the knowledge around it, into domain teams, each with its own expertise. This enables domain teams to deliver optimal business value within their own areas of expertise. With the right standards, tools and knowledge, domain teams are able to deliver data products themselves and offer them centrally.

• The domain team manages data quality and can monitor and improve it well;
• The domain team knows the right definitions and can apply and share them well;
• The domain team knows the data users, can serve them well and give them peace of mind.

In turn, however, data mesh comes with challenges

Essential questions that every organization must have answered prior to a data mesh implementation are:

  • How (de)centrally is my organization set up?
  • What is the size of my organization?

An implementation of data mesh makes sense only if the benefits of decentralization outweigh the investment in setting up the platform and standards. Data mesh is therefore particularly suited to organizations with multiple divisions and/or an international character.

New role of IT teams

Also, data mesh requires a new role for IT teams, both in support and control. The IT teams must support the domain teams with the platform and appropriate tools. In addition, they must control the domain teams by overseeing the application of uniform standards.

Support

With multiple domain teams each delivering their own data products, good support is needed in the following areas: standards for accessible description of data products, support for modern tooling, and understandable data transformation standards.
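
To make the idea of an accessible, standardized description of a data product more concrete, here is a minimal sketch of what a shared descriptor could look like. The class name `DataProductDescriptor`, its fields and the example values are assumptions chosen for illustration; they are not a format prescribed by data mesh or by any Azure service.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DataProductDescriptor:
    """Hypothetical descriptor that every domain team fills in per data product."""

    name: str          # assumed convention: <domain>.<product>
    domain: str        # the domain team that owns the product
    owner: str         # contact point for data users
    description: str   # plain-language explanation of the contents
    tags: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize in a stable key order so descriptors are easy to compare.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values are invented for this sketch.
example = DataProductDescriptor(
    name="sales.orders_daily",
    domain="sales",
    owner="sales-data-team@example.com",
    description="Daily aggregated orders per region.",
    tags={"classification": "internal"},
)
print(example.to_json())
```

A descriptor like this could be stored next to the data product's code, so that the same description feeds both documentation and a central catalog.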

Control

You have probably already asked yourself the question: how can I maintain control in an environment with multiple independent teams? The answer: standardization and policy. Establishing standards ensures that there is no proliferation of code and descriptions. When managing domain teams, firmly delineated policies are needed: it should not be possible to release code or documentation that does not conform to standards relating to naming, structure and tagging.
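
A policy of this kind can be enforced automatically, for example as a check that runs before a release. The sketch below validates a data product description against naming, structure and tagging standards. The specific rules (`NAME_PATTERN`, `REQUIRED_FIELDS`, `REQUIRED_TAGS`) and the function name `check_data_product` are assumptions for illustration; every organization would define its own standards.

```python
import re
from typing import List

# Assumed standards, invented for this sketch.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")  # <domain>.<product>
REQUIRED_FIELDS = ("name", "owner", "description", "tags")
REQUIRED_TAGS = ("domain", "classification")


def check_data_product(descriptor: dict) -> List[str]:
    """Return all standard violations; an empty list means the release may proceed."""
    violations = []

    # Structure: every required field must be present and non-empty.
    for field_name in REQUIRED_FIELDS:
        if not descriptor.get(field_name):
            violations.append(f"missing field: {field_name}")

    # Naming: the product name must follow the agreed pattern.
    name = descriptor.get("name") or ""
    if name and not NAME_PATTERN.match(name):
        violations.append(f"name does not match <domain>.<product>: {name}")

    # Tagging: every required tag must be set.
    tags = descriptor.get("tags") or {}
    for tag in REQUIRED_TAGS:
        if tag not in tags:
            violations.append(f"missing tag: {tag}")

    return violations


good = {
    "name": "sales.orders_daily",
    "owner": "sales-data-team@example.com",
    "description": "Daily aggregated orders per region.",
    "tags": {"domain": "sales", "classification": "internal"},
}
bad = {"name": "Orders Daily", "owner": "someone", "tags": {"domain": "sales"}}

print(check_data_product(good))
print(check_data_product(bad))
```

Returning all violations at once, rather than stopping at the first one, lets a domain team fix everything in a single pass before trying to release again.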

Data mesh in the Azure cloud

Data mesh is not a cloud service that you just turn on or off. It's a combination of the right approach with the right tools. When applying a data mesh architecture in the Azure cloud, the following services are particularly useful:

  • Microsoft Purview (formerly Azure Purview): a central place for data governance, supporting descriptions of data products and their origins (lineage)
  • Azure Synapse Analytics: a scalable cloud data platform, with an easy-to-use interface for unified data processing
  • Azure Data Lake Storage: a scalable and cost-effective storage option
  • Azure Machine Learning: supporting machine learning and securing the processes around it
  • Azure Policy: policies are available for the above services and enforce a standardized way of working
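
As an illustration of the policy enforcement mentioned in the list above, the sketch below builds a policy definition that denies the creation of resources lacking a given tag. The `if`/`then` rule with `field`, `exists` and the `deny` effect follows the Azure Policy definition structure. The tag name `domain`, the display name and the helper function `require_tag_policy` are assumptions chosen for this example, and Python is used here only to generate the JSON.

```python
import json


def require_tag_policy(tag_name: str) -> dict:
    """Build a policy definition body that denies resources missing `tag_name`."""
    return {
        "properties": {
            "displayName": f"Require tag '{tag_name}' on resources",
            # "Indexed" limits evaluation to resource types that support tags.
            "mode": "Indexed",
            "policyRule": {
                # Condition: the tag does not exist on the resource.
                "if": {"field": f"tags['{tag_name}']", "exists": "false"},
                # Effect: block the create or update request.
                "then": {"effect": "deny"},
            },
        }
    }


# Hypothetical standard: every resource must state its owning domain.
policy = require_tag_policy("domain")
print(json.dumps(policy, indent=2))
```

The generated JSON would still need to be registered and assigned to a scope (such as a subscription or resource group) before it has any effect.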

Data mesh within your organization?

Want to learn more about the key benefits and challenges of data mesh within your organization, or get started on it immediately? Connect with our Data & Analytics experts!

