Approaches to data management for microservices

In this paper I discuss four different approaches to data management in microservices.

Central database

Author: Simon Flühmann

Using a common database for all microservices of an application would be the traditional solution. The advantages of a centralized database are well proven in software design: the data of a system is available to all modules at any time and in its most current state, so consistency is not a problem. In addition, there is a uniform data format: although agreeing on a common schema is a compromise, it simplifies development and maintenance considerably.

Nevertheless, when implementing a microservice architecture, it is advisable to refrain from using a centralized storage system for the following reasons:

  • It contradicts the principle of decoupling and creates strong dependencies between services
  • Efficient maintenance becomes practically impossible
  • Autonomous deployment becomes practically impossible
  • Write accesses cause expensive locking of data
  • The common data model is a compromise

Microservices should therefore be decoupled with regard to data. However, this is difficult to implement consistently. Especially in a classic operations setup, running an application with polyglot persistence is very complex, since the entire operational organization must be set up and kept running for each technology (SLAs, backup, recovery, building up know-how, 2nd level support). DevOps brings some relief here.

If the advantages of a central database cannot be dispensed with, using completely separate schemas per microservice within one database offers a middle way. The requirements for a separate data model and for decoupling are then fulfilled architecturally, and the disadvantages listed above are largely avoided.

However, new difficulties arise. To meet the requirement for consistency among microservices, the data must be replicated at great expense and the data records converted to other schemas in order to store them in an integration DB. This is something one would like to avoid, because the schema transformations are expensive and the network traffic also hurts performance. Furthermore, this approach reduces resilience: the shared database becomes a single point of failure, and because of the high coupling its failure interrupts the whole application.

Microservices-1.png
Using a fictitious example, the graphic shows what a microservice landscape with a central database could look like.

In summary, it can be said that maintaining a central database is contrary to the guiding principle of decoupling microservices and brings with it various stumbling blocks. The use of separate schemas of a central database may be the right choice in certain applications. However, it is generally not advisable to operate a microservice system without a dedicated data store per service. A central database destroys the most important advantages of a modular design.

Independent persistence technology for each microservice

Complete decoupling and the resulting autonomy are the most important characteristics of microservice architectures. In the best case, each microservice implements its own solution for persisting its data. The paradigm of data sovereignty of each microservice is thus fulfilled. Furthermore, a persistence technology that matches the business logic of the service can be selected. This approach promises highly available data and high reliability. Scaling the data storage is very convenient, supported by automated, autonomous deployment of all modules. The microservices communicate with each other via APIs. If you choose this approach, you enjoy all the advantages of a microservice architecture.
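The idea of data sovereignty with API-only access can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the service names (CustomerService, OrderService), their methods and the in-memory dictionaries standing in for real databases are all assumptions made for this example.

```python
class CustomerService:
    """Owns customer data; no other service touches this store directly."""

    def __init__(self):
        self._store = {}  # private persistence (stand-in for a real database)

    def register(self, customer_id, name):
        self._store[customer_id] = {"name": name}

    def get_customer(self, customer_id):
        """Public API: returns a copy so callers cannot mutate the store."""
        record = self._store.get(customer_id)
        return dict(record) if record is not None else None


class OrderService:
    """Owns order data and reads customer data only via the customer API."""

    def __init__(self, customer_api):
        self._orders = []  # private persistence, could use another technology
        self._customer_api = customer_api

    def place_order(self, customer_id, item):
        customer = self._customer_api.get_customer(customer_id)
        if customer is None:
            raise ValueError(f"unknown customer {customer_id}")
        order = {"customer": customer["name"], "item": item}
        self._orders.append(order)
        return order


customers = CustomerService()
customers.register(1, "Ada")
orders = OrderService(customers)
first_order = orders.place_order(1, "book")
```

Because OrderService only depends on the public get_customer call, the customer store could be replaced by a different persistence technology without touching the order service.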

As is typical for highly distributed systems, however, there is a risk of temporarily inconsistent data. Such systems usually offer only «eventual consistency»: the data stores converge to the same state over time, but they cannot synchronize their states continuously across the entire software context. As a consequence, a microservice may work with outdated data. Frequent calculations and data queries via APIs can also reduce system performance under certain circumstances. Another disadvantage of this approach is, as already mentioned, the increased operating effort.

Microservices-2.png
The diagram shows the schematic structure of a microservice architecture.

Consistency through replication of data

The consistency of data among microservices can be achieved with replication. For this purpose, the approach of autonomous databases «per service» is supplemented with a replication job. This job periodically replicates the states of the microservices, transforms them into a uniform data model and maintains a synchronized data set in an integration database, from which the states in the system are updated again.

This approach partially solves the problem of inconsistent data. However, strict consistency remains out of reach here as well: it can take some time until the data is replicated throughout the system. The characteristic of «eventual consistency» must be taken into account in any case.
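A single run of such a replication job might look like the following sketch. The two services, their native record formats and the mapping functions are hypothetical, and only the direction from the services into the integration database is shown. The last lines illustrate the delay described above: a record written after a run stays invisible until the next run.

```python
def replicate(sources, integration_db):
    """One replication run: pull each service's state, map it to the common
    model and write it to the integration database."""
    for service_name, (read_state, to_common) in sources.items():
        for record in read_state():
            common = to_common(record)
            integration_db[(service_name, common["id"])] = common


# Two services with different native schemas (hypothetical).
billing_rows = [{"invoice_no": 7, "total_cents": 1250}]
shipping_rows = [{"parcel": "P-1", "state": "sent"}]

sources = {
    "billing": (
        lambda: billing_rows,
        lambda r: {"id": r["invoice_no"], "kind": "invoice",
                   "amount": r["total_cents"] / 100},
    ),
    "shipping": (
        lambda: shipping_rows,
        lambda r: {"id": r["parcel"], "kind": "parcel", "status": r["state"]},
    ),
}

integration_db = {}
replicate(sources, integration_db)

# A change made after the run is not visible until the next run.
billing_rows.append({"invoice_no": 8, "total_cents": 900})
stale = ("billing", 8) not in integration_db
replicate(sources, integration_db)
fresh = ("billing", 8) in integration_db
```

In a real system the job would run on a schedule, so the length of the interval directly determines how long readers of the integration database may see outdated data.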

Microservices-3.png
The diagram shows the schematic structure of a microservice architecture with replication.

Event-driven data management with event sourcing

The approaches discussed so far are only partially satisfactory in terms of performance and consistency. A newer approach is Event Sourcing. Traditionally, database systems manage the current state of data records through CRUD operations: only the latest state of a record is stored, overwriting the previous one. In contrast, Event Sourcing does not store the state itself but the sequence of events that led to the current state of the object.

Any state in the history of the microservice can be restored by replaying the events, or replicated on another platform, for example.
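The principle can be illustrated with a small sketch. The account example, the event kinds "deposited" and "withdrawn" and the function names are assumptions chosen for illustration. The state is never stored; it is derived by replaying the append-only event log, either completely or up to a chosen point in its history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def apply(balance, event):
    """Pure transition function: current state + event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None):
    """Rebuild the state by replaying the first `upto` events (all if None)."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


event_log = []  # append-only; entries are never updated or deleted
event_log.append(Event("deposited", 100))
event_log.append(Event("withdrawn", 30))
event_log.append(Event("deposited", 5))

current_balance = replay(event_log)          # state after all events
earlier_balance = replay(event_log, upto=2)  # state after the first two events
```

Since apply is a pure function, replaying the same log always yields the same state, which is what makes restoring a historical state or rebuilding it on another platform possible.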

Event-driven data management brings decisive advantages in connection with microservices:

  • In case of a failure, events can be replayed afterwards and messages can be delivered again. Thanks to asynchronous communication, several independent delivery attempts can be made.
  • The senders and recipients of events are decoupled and have no knowledge of each other. This decoupling and the asynchronous communication ensure that a failure of one component ideally has no side effects on surrounding microservices.
  • Expensive data manipulations (update, delete) are not necessary because events are only appended, so no locking of data records is required.
  • Events are immutable and thus provide an unchangeable view of the state of a microservice.
  • Components can be replicated easily, for example for scaling, in failure scenarios or even when the data model changes.
  • The time-consuming and compromising creation of object-relational models during software design is no longer necessary.
  • The storage of all events in the system has a very high business value: in-depth analysis of data is possible at any time, legal traceability is ensured, errors are easy to reproduce and debugging is efficient thanks to deterministic analysis.

...but it also brings challenges:
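The first two advantages, redelivery and decoupling of senders and recipients, can be sketched with a tiny in-memory event bus. The EventBus class, the topic name "order.placed" and the handler are hypothetical stand-ins for a real message broker and real services.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a message broker (hypothetical API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._pending = deque()  # (handler, event) pairs awaiting delivery

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who, if anyone, is listening.
        for handler in self._subscribers[topic]:
            self._pending.append((handler, event))

    def deliver(self):
        """Try every pending delivery once; keep failures for a later attempt.
        Returns the number of deliveries that are still outstanding."""
        failed = deque()
        while self._pending:
            handler, event = self._pending.popleft()
            try:
                handler(event)
            except Exception:
                failed.append((handler, event))
        self._pending = failed
        return len(failed)


received = []
inventory = {"online": False}  # simulated availability of the consumer


def inventory_handler(event):
    if not inventory["online"]:
        raise ConnectionError("inventory service is down")
    received.append(event)


bus = EventBus()
bus.subscribe("order.placed", inventory_handler)
bus.publish("order.placed", {"order_id": 1})

undelivered_first = bus.deliver()   # consumer is down, event stays queued
inventory["online"] = True
undelivered_second = bus.deliver()  # redelivery succeeds
```

The publisher is not affected by the outage of the consumer; the event simply waits until a later delivery attempt succeeds.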

  • Event Sourcing makes an already complex design even more complex.
  • «Eventual consistency» is also a fact in an event-driven architecture. Therefore, the following applies to data consistency: «Prioritization is key» - at best together with all stakeholders involved.

CQRS

In connection with event sourcing, another pattern must be mentioned: Command Query Responsibility Segregation (CQRS). This is an architectural pattern that divides an application internally into two areas of responsibility: the command part and the query part. The command part processes state changes, while the query part is only responsible for executing many read queries quickly and efficiently. This internal "Separation of Concerns" between reading and writing not only reduces complexity in the overall context, but also improves performance.

If additional microservices require access to the data of a module, they can either subscribe to events in order to react to a «StateChangedEvent», or query the data via the query API of the microservice. If necessary, a common message broker is also available in the application, from which the data can be retrieved centrally.

Microservices-4.png
Using an example, the graphic shows a possible implementation of a microservice with event sourcing. The service exposes both a command API and a query API (separation by CQRS). "StateChangedEvents" are lined up in an event queue and processed by an event handler. The events are persisted in the event store and also made available to the query side. The query side answers status queries via the query API with the current information.
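The flow described for the graphic can be approximated in code as follows. This is only a sketch under assumptions: the AccountService class, its deposit command and the dictionary-based read model are invented for illustration, and queue, event store and read model all live in memory.

```python
from collections import deque


class AccountService:
    """Hypothetical service with separate command and query sides."""

    def __init__(self):
        self.event_store = []   # write side: append-only event log
        self._queue = deque()   # events waiting for the event handler
        self._read_model = {}   # query side: current state per account

    # Command API: validates a request and emits a state change event.
    def deposit(self, account_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._queue.append({"type": "StateChangedEvent",
                            "account": account_id, "delta": amount})

    # Event handler: persists queued events and updates the read model.
    def process_events(self):
        while self._queue:
            event = self._queue.popleft()
            self.event_store.append(event)
            account = event["account"]
            self._read_model[account] = (
                self._read_model.get(account, 0) + event["delta"]
            )

    # Query API: reads only from the read model, never from the event store.
    def balance(self, account_id):
        return self._read_model.get(account_id, 0)


service = AccountService()
service.deposit("A", 50)
service.deposit("A", 25)
before = service.balance("A")  # events are queued but not yet processed
service.process_events()
after = service.balance("A")
```

The value of before differs from after because the query side only reflects events that the handler has already processed, which is the «eventual consistency» of this design in miniature.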