Data Mesh: A New Architectural Approach
In the era of Big Data, we support companies every day in defining data-driven strategies and in becoming or remaining competitive in their markets. In this context, data are a central and highly critical aspect of storage architectures, which must be able to manage ever-increasing amounts of data with the right level of democratization and efficiency. Unfortunately, despite the best practices suggested for digital evolution, many companies remain anchored to old architectural standards, even though these are becoming increasingly obsolete.
To overcome this problem, many companies have started to implement Data Warehouses and Data Lakes, structures whose characteristics can guarantee data availability as well as processing and transformation capabilities in real time. However, even these architectural choices are often not enough, as they fail to deliver the required levels of data democratization and efficiency.
What Is Data Mesh?
Data Mesh is a relatively new concept that has quickly become one of the fastest-growing trends of recent years. It extends to data architectures the paradigm shift introduced by microservice architectures, enabling agile and scalable analyses and easier access to Machine Learning and Artificial Intelligence. The Data Mesh is essentially the modern alternative to the organizational and architectural model of the data lake: a distributed and decentralized architecture designed to help companies increase their agility and business scalability, reduce time-to-market, and lower maintenance costs, which also allows a fairer and more transparent allocation of internal costs.
The Data Mesh is, therefore, a decentralized approach to the Data Platform, a new architectural idea that allows each business domain to have its own storage and its own methods for managing its data and processes. As an architectural change, the Data Mesh is based on four key concepts:
- Decentralized, domain-oriented data ownership and architecture: dedicated teams of data engineers and other roles are formed around specific domains. Decentralized ownership does not detach each group from the others; it still requires a common strategy so that tools and architecture are defined uniformly;
- Data as a product: Each domain is the producer, owner, and manager of its data, both from a business (functional) point of view and from a technical/technological point of view. In this way, each domain can move at its own speed, with the most suitable technology, and provide valuable results in a reasonable time to any potential data consumer;
- Self-service data infrastructure as a platform at every company level, from business users to software development teams. The Data Mesh favors the discovery and use of data products from other domains through ad-hoc services;
- Federated computational data governance to better manage standardization, monitor developments, and provide high control and flexibility within the different domains. The Data Mesh enriches the federated Data Governance approach, in which users manage data for users.
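The four concepts above can be made a little more concrete with a minimal Python sketch. Everything in it is a hypothetical illustration rather than part of any Data Mesh standard or library: the `DataProduct` fields, the `Catalog` class, the single "owner required" rule, and the example domains are all assumptions chosen only to show how domain ownership, data as a product, self-service discovery, and a shared governance rule might fit together.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str        # owning business domain (decentralized ownership)
    owner: str         # accountable team or contact
    description: str
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """Hypothetical self-service catalog shared by all domains."""

    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        # Assumed federated rule: every product must name its owner.
        if not product.owner:
            raise ValueError(f"{product.name}: an owner is required")
        self._products[(product.domain, product.name)] = product

    def discover(self, tag: str) -> list:
        # Consumers in any domain search by tag instead of asking a central team.
        matches = (p for p in self._products.values() if tag in p.tags)
        return sorted(matches, key=lambda p: (p.domain, p.name))


catalog = Catalog()
catalog.register(DataProduct("orders", "sales", "sales-data-team",
                             "Confirmed customer orders",
                             frozenset({"customer", "revenue"})))
catalog.register(DataProduct("shipments", "logistics", "logistics-data-team",
                             "Parcels handed to carriers",
                             frozenset({"customer"})))

found = [f"{p.domain}.{p.name}" for p in catalog.discover("customer")]
print(found)
```

In this sketch each domain registers its own products, while the catalog enforces only the rule that all domains have agreed on; a real platform would add far more (schemas, versioning, service levels), which is deliberately left out here.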
How Does A Data Mesh Work?
A Data Mesh implies a deep-rooted cultural change, especially in how companies approach and use their data: data are no longer seen as “by-products” of processes but acquire a standing of their own and become the actual product. In traditional architectures, the infrastructure managers own the data, whereas in the Data Mesh the data are treated as a product, so ownership shifts to the producers, who are the subject matter experts. Their expertise also makes it easier to understand who the data users are and how they will use the domain’s operational and analytical data, and therefore to design APIs suited to the context and to the value the data provide. This design implies new responsibilities for data producers, who become responsible for semantic definitions, metadata cataloging, and setting up permissions and usage policies.
While this can be frightening, we must not forget that a cross-domain team manages centralized data governance and guarantees that standards are applied, alongside a centralized, cross-domain data engineering team. The data mesh uses functional domains to manage and organize data, so that data can truly be treated as a “business product” that users throughout the organization can access within the terms to which they are entitled. This is similar to what happens in microservice architectures, where lightweight services are combined to provide complex functionality to business applications. In this way, the data mesh allows more flexible data integration: all users can immediately use the data produced and managed by the different domains for their business activities, be they business analytics or Data Science experiments.
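As a rough illustration of producers setting permissions and usage policies, and of consumers accessing data only within the terms they are entitled to, the following Python sketch may help. The `UsagePolicy` class, the `read_product` function, the role names, and the masking rule are all hypothetical assumptions made for this example; they do not describe any particular product or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical policy a domain team attaches to its data product."""
    product: str
    allowed_roles: frozenset
    masked_columns: frozenset = frozenset()


def read_product(rows, policy, role):
    """Return the rows a role is entitled to see, masking sensitive columns."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role '{role}' may not read '{policy.product}'")
    return [
        {col: ("***" if col in policy.masked_columns else value)
         for col, value in row.items()}
        for row in rows
    ]


# The producing domain owns both the data and the policy (assumed example data).
orders = [{"order_id": 1, "customer_email": "a@example.com", "amount": 40.0}]
policy = UsagePolicy("sales.orders",
                     allowed_roles=frozenset({"analyst", "data_scientist"}),
                     masked_columns=frozenset({"customer_email"}))

visible = read_product(orders, policy, "analyst")
print(visible)
```

Here the policy lives with the producing domain, while the check itself is generic, so a central governance team could supply the enforcement function once and leave each domain to declare its own rules.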
Implementation Difficulties And Advantages Of The Data Mesh
The Data Mesh offers companies a systematic approach and defines clear ownership by entrusting the various teams with their respective responsibilities. In some cases, the Data Mesh can become the starting point for accelerating innovation processes and avoiding the pitfalls associated with creating giant monoliths. In other cases, Data Mesh has supported migrations from on-premise architectures to cloud solutions.
As with any innovation, the path toward adopting Data Mesh is not without obstacles. Sometimes, despite its countless advantages, one has the impression that the technology, instead of increasing productivity, makes our work more difficult and more complex. This happens, for example, when it is implemented with rules that are too strict or with logic that does not follow the users’ needs. This does not mean that the Data Mesh must be abandoned at the outset, only that it must be set up in a way that suits the context for its advantages to emerge. The advantages obtainable with Data Mesh can be summarized in a few main aspects:
- Better democratization of data: data mesh architectures facilitate self-service applications from multiple data sources by expanding access, so that not only technical figures such as data scientists, data engineers, and developers but also the users closest to the business can access the data. In this way, data silos and operational bottlenecks are reduced: business users can make faster, better-informed decisions, and technical users can prioritize the activities that most require their skills.
- Cost reduction: Data Mesh moves away from batch data processing in favor of cloud platforms and streaming pipelines that collect data in real time. The cloud offers a significant cost advantage thanks to the pay-per-use model, i.e., paying only for the space and resources used. This makes storage costs more transparent and, therefore, allows a better allocation of budget and resources.
- Security and compliance: as already mentioned, the data mesh promotes solid data governance through a cross-domain team that applies standards and manages access to sensitive data, which helps companies comply with current regulations. Log data provide observability of the system, allowing auditors to understand fully which users see which data.
- Greater interoperability: data owners agree on how domain-independent information is standardized, which facilitates interoperability: when a domain team structures its dataset, it applies the agreed rules so that data can be linked across domains quickly and easily. This cross-domain consistency lets data users interface easily through APIs and develop applications better suited to their business needs.
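The interoperability point can be sketched in a few lines of Python. The identifier format (`CUST-` followed by six digits), the `conforms` and `link` helpers, and the two example datasets are hypothetical assumptions used only to show how a rule agreed across domains makes their data linkable; real standards would be defined by the organization's own governance team.

```python
import re

# Assumed federated standard: every domain identifies customers as "CUST-" plus six digits.
CUSTOMER_ID = re.compile(r"CUST-\d{6}")


def conforms(rows, key="customer_id"):
    """Check that every row carries the shared key in the agreed format."""
    return all(CUSTOMER_ID.fullmatch(str(row.get(key, ""))) for row in rows)


def link(left, right, key="customer_id"):
    """Inner-join two domain datasets on the shared, standardized key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in left for b in index.get(a[key], [])]


# Two datasets owned by different (hypothetical) domains.
sales_orders = [
    {"customer_id": "CUST-000042", "amount": 40.0},
    {"customer_id": "CUST-000007", "amount": 15.5},
]
support_tickets = [
    {"customer_id": "CUST-000042", "open_tickets": 2},
]

both_conform = conforms(sales_orders) and conforms(support_tickets)
linked = link(sales_orders, support_tickets)
print(both_conform, linked)
```

Because both domains follow the same identifier rule, the join needs no per-domain translation step; if either dataset used its own customer key format, `conforms` would flag it before any linking was attempted.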
What is essential to keep in mind is that a well-made Data Mesh architecture requires a profound cultural and organizational change in the company, a change of mentality that leads the entire company to consider data as a product, thus freeing itself from the bottlenecks and limits related to the use of a Data Lake and reaping the benefits of a distributed architecture. For this reason, creating a Data Mesh is a significant undertaking. It is not just a question of “having” the data in the company: a change of mindset is necessary to exploit them to the fullest; otherwise, it will not be possible to derive any benefit. In this respect, a Data Mesh architecture truly represents a turning point.