Data Mesh

A data mesh is a practice born in industry for managing data across different domains. It focuses on the scalability of the data management infrastructure and on a logical division of data by domain.

---
title: data mesh logical view
---
flowchart LR
    subgraph domain 1
        A@{ shape: rect, label: data product}
        B@{ shape: rect, label: data product}
    end
    subgraph domain 2
        C@{ shape: rect, label: data product}
        D@{ shape: rect, label: data product}
        E@{ shape: rect, label: data product}
    end

Data product

A data product is the smallest component of a data mesh. It is composed of:

Formally:

Cite

an independently deployable, highly cohesive component encompassing all the structural elements required for its function (architectural quantum).

Data mesh component relations

The mesh starts to emerge when components interact with each other, either on the same level or across levels. For example, data products can consume input from other data products, and components at upper levels can aggregate data from different domains to correlate them.
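The relations described above can be pictured as a directed graph of data products. The following is a minimal sketch, not a reference implementation: the `DataProduct` class, the `upstream_domains` helper, and the product and domain names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A node of the mesh: a data product owned by one domain (hypothetical model)."""
    name: str
    domain: str
    inputs: list["DataProduct"] = field(default_factory=list)  # upstream products


def upstream_domains(product: DataProduct) -> set[str]:
    """Collect every domain this product depends on, directly or transitively."""
    domains: set[str] = set()
    visited: set[str] = set()
    stack = list(product.inputs)
    while stack:
        current = stack.pop()
        if current.name in visited:
            continue  # already explored through another path
        visited.add(current.name)
        domains.add(current.domain)
        stack.extend(current.inputs)
    return domains


orders = DataProduct("orders", domain="sales")
customers = DataProduct("customers", domain="crm")
# Same-level interaction: one data product consumes other data products.
enriched_orders = DataProduct("enriched_orders", domain="sales", inputs=[orders, customers])
# Upper-level component: aggregates data that originates in different domains.
revenue_by_segment = DataProduct("revenue_by_segment", domain="analytics", inputs=[enriched_orders])

print(upstream_domains(revenue_by_segment))
```

Walking the graph from `revenue_by_segment` reaches both the `sales` and the `crm` domains, which is the kind of cross-domain correlation the upper levels of the mesh are meant to provide.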

Data product

A data product has to meet these requirements:

---
title: data product architecture
---
flowchart LR
    subgraph inputs
        A@{shape: in-out, label: input port}
    end
    subgraph outputs
        direction TB
        B@{shape: in-out, label: output port}
        G@{shape: in-out, label: Discovery port}
        B ~~~ G
    end
    subgraph core
        C@{shape: db, label: data storage}
        D@{shape: doc, label: documentation}
        E@{shape: rect, label: CI CD pipelines}
        F@{shape: rect, label: observability}
        H@{shape: rect, label: tests}
        I@{shape: rect, label: data transformation}
        C ~~~ D ~~~ H
        E ~~~ F ~~~ I
    end
    inputs ~~~ core ~~~ outputs

Data contracts

A data product exposes its data as datasets. Each dataset is specified by a data contract, which defines the format of the data it contains.
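As an illustration, a data contract can be reduced to a dataset name plus the expected type of each column. The sketch below assumes that simplified shape; the `DataContract` class, its `violations` method, and the `orders` schema are hypothetical, and real contracts usually cover more than column types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: a dataset name plus the expected type of each column."""
    dataset: str
    schema: dict[str, type]

    def violations(self, rows: list[dict]) -> list[str]:
        """Return one message for every way a row breaks the contract."""
        problems = []
        for index, row in enumerate(rows):
            missing = self.schema.keys() - row.keys()
            if missing:
                problems.append(f"row {index}: missing {sorted(missing)}")
            for column, expected in self.schema.items():
                if column in row and not isinstance(row[column], expected):
                    problems.append(f"row {index}: {column} is not {expected.__name__}")
        return problems


orders_contract = DataContract(
    dataset="orders",
    schema={"order_id": str, "amount": float},
)

rows = [
    {"order_id": "o-1", "amount": 12.5},  # valid
    {"order_id": "o-2"},                  # missing the amount column
    {"order_id": 3, "amount": 7.0},       # wrong type for order_id
]
problems = orders_contract.violations(rows)
print(problems)
```

A consumer can run the same check on the data it receives, so both sides of an output port agree on what a valid dataset looks like.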

Data structure

A data product stores information in a domain-driven way: data are presented through mutable entities and immutable events.
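The distinction between the two can be sketched with Python dataclasses, where `frozen=True` makes an instance immutable. The `Order` entity and the `OrderShipped` event are hypothetical examples, not part of any specific data mesh implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderShipped:
    """Domain event: a fact that already happened, so it never changes."""
    order_id: str
    occurred_at: datetime


@dataclass
class Order:
    """Domain entity: a stable identity with a state that changes over time."""
    order_id: str
    status: str = "created"

    def apply(self, event: OrderShipped) -> None:
        """Update the entity state when an event about this order arrives."""
        if event.order_id == self.order_id:
            self.status = "shipped"


order = Order(order_id="o-1")
event = OrderShipped(order_id="o-1", occurred_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
order.apply(event)  # the entity mutates, the event stays exactly as it was
print(order.status)
```

Trying to assign to a field of `event` raises `dataclasses.FrozenInstanceError`, while `order.status` can be updated freely.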

Data retrieval

To obtain domain events and domain entities from the data source, the pipeline is the following:

flowchart LR
    subgraph operational_data
        direction LR
        A@{shape: proc, label: microservice}
        B@{shape: bolt, label: messaging}
        C@{shape: db, label: database}
        A -- publish --> B
        A -- persist --> C
        C -- CDC --> B
    end
    subgraph Analytical_data
        direction LR
        D@{shape: proc, label: raw}
        E@{shape: proc, label: events}
        F@{shape: proc, label: entities}
        D -- clean --> E & F
    end
    B --> D
    C --> D

The database publishes change data capture (CDC) events to notify the analytical layer, which then pulls the data that are ready for the cleanup process.

Events and entities are then published through a data contract.
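The cleanup step of this pipeline can be sketched as a function that splits raw records into events and entities. The record layout is an assumption made for this example: the `kind`, `id`, `ts`, and `row` fields are hypothetical, and the cleanup rules (drop malformed records, ignore redelivered events, keep the last change per entity) are just one plausible choice.

```python
def clean(raw: list[dict]) -> tuple[list[dict], dict[str, dict]]:
    """Split raw records into deduplicated events and the latest state per entity."""
    events: list[dict] = []
    entities: dict[str, dict] = {}
    seen_event_ids: set[str] = set()

    # Keep only well-formed records, then process them in timestamp order.
    valid = [r for r in raw if "id" in r and "ts" in r]
    for record in sorted(valid, key=lambda r: r["ts"]):
        if record.get("kind") == "event":
            if record["id"] not in seen_event_ids:  # messaging may redeliver
                seen_event_ids.add(record["id"])
                events.append(record)
        elif record.get("kind") == "cdc" and "row" in record:
            entities[record["id"]] = record["row"]  # later changes overwrite earlier ones
    return events, entities


raw = [
    {"kind": "cdc", "id": "o-1", "ts": 1, "row": {"status": "created"}},
    {"kind": "event", "id": "e-1", "ts": 2, "name": "OrderShipped"},
    {"kind": "event", "id": "e-1", "ts": 2, "name": "OrderShipped"},  # duplicate delivery
    {"kind": "cdc", "id": "o-1", "ts": 3, "row": {"status": "shipped"}},
    {"kind": "event", "name": "broken"},  # malformed: no id, no timestamp
]
events, entities = clean(raw)
print(events)
print(entities)
```

Here the duplicate and the malformed record are discarded, one event survives, and the entity `o-1` ends up in its most recent state.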

🔷 Note

For analytical purposes, the history of an entity can be stored alongside its latest state.
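One way to picture this is a store that keeps the current state per entity together with an append-only list of every version it has seen. The `EntityStore` class below is a hypothetical in-memory sketch; an actual data product would keep this in its own storage.

```python
class EntityStore:
    """Keeps the latest state per entity plus an append-only history of versions."""

    def __init__(self) -> None:
        self.latest: dict[str, dict] = {}
        self.history: dict[str, list[dict]] = {}

    def upsert(self, entity_id: str, state: dict) -> None:
        """Record a new version of the entity and make it the current one."""
        snapshot = dict(state)  # copy so later changes by the caller do not rewrite history
        self.history.setdefault(entity_id, []).append(snapshot)
        self.latest[entity_id] = snapshot


store = EntityStore()
store.upsert("o-1", {"status": "created"})
store.upsert("o-1", {"status": "shipped"})

print(store.latest["o-1"])
print(store.history["o-1"])
```

Queries about the present read from `latest`, while analyses over time (for example, how long orders stay in each status) read from `history`.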

References