In this paper I discuss four different approaches to data management in microservices.
Author: Simon Flühmann
The use of a common database for all microservices of an application would be the traditional solution. The advantages of a centralized database are well established in software design: the system's data is available to all modules at any time and in its most current state, and consistency is not a problem. In addition, there is a uniform data format: although agreeing on a common schema is a compromise, it simplifies development and maintenance enormously.
Nevertheless, when implementing a microservice architecture, it is advisable to refrain from using a centralized storage system: the tight coupling it introduces works against the independence of the individual services.
Microservices should therefore be decoupled with regard to data. However, this is difficult to implement consistently: especially in traditional operations, maintaining an application with polyglot persistence is very complex, since the entire operational organization must be set up and kept running for each technology (SLAs, backup, recovery, knowledge building, second-level support). DevOps naturally brings some relief here.
If the advantages of a central database cannot be dispensed with, using completely separate schemas per microservice within one database offers a middle ground. In this way, the requirements regarding a separate data model and decoupling are architecturally fulfilled, and the disadvantages mentioned above are eliminated.
However, new difficulties arise. To meet the requirement for consistency among microservices, the data must be replicated at great expense and the records transformed to other schemas so they can be stored in an integration database. This is exactly what one would like to avoid: the schema transformations are costly, and the additional network traffic degrades performance. Furthermore, this approach reduces resilience. The shared database becomes a single point of failure, and due to the high coupling, an outage interrupts the entire application.
In summary, maintaining a central database runs contrary to the guiding principle of decoupled microservices and brings various stumbling blocks with it. Separate schemas within a central database may be the right choice for certain applications. In general, however, it is not advisable to operate a microservice system without a dedicated data store per service: a central database destroys the most important advantages of a modular design.
Complete decoupling and the resulting autonomy are the most important characteristics of microservice architectures. Ideally, each microservice implements its own solution for persisting its data, fulfilling the paradigm of data sovereignty per microservice. Furthermore, a persistence technology can be selected that matches the business logic perfectly. This approach promises highly available data and high reliability. Scaling the data storage is very convenient, supported by automated, autonomous deployment of all modules. Communication between the microservices takes place via APIs. Choosing this approach yields all the advantages of a microservice architecture.
As is typical for highly distributed systems, however, there is a risk of temporarily inconsistent data: the data stores are only guaranteed to converge over time («eventual consistency»). Not all data stores can synchronize their states continuously across the entire software context, so a microservice may end up working with stale data. Frequent calculations and data queries via APIs can also reduce system performance under certain circumstances. Another disadvantage of this approach is, as already mentioned, the increased operating effort.
The consistency of data among microservices can be approximated with replication. For this purpose, the approach of autonomous databases «per service» is supplemented with a replication job. This job periodically replicates the states of the microservices, transforms them into a uniform data model, and maintains a synchronized data stock in an integration database, from which the states in the system are updated again.
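The described replication job can be sketched as follows. This is a minimal illustration, not a production implementation: the store names, record fields, and schema mapping are assumptions made for the example.

```python
# Per-service stores with service-specific schemas (hypothetical data).
orders_db = {"o-1": {"customer": "c-42", "total_cents": 1999}}
customers_db = {"c-42": {"full_name": "Ada Lovelace"}}

# The integration database uses one uniform data model.
integration_db = {}

def replicate():
    """One run of the periodic replication job: pull state from each
    service's store, transform it to the uniform schema, and store it."""
    for order_id, order in orders_db.items():
        customer = customers_db.get(order["customer"], {})
        # Schema transformation: merge both service schemas into the
        # uniform integration record.
        integration_db[order_id] = {
            "order_id": order_id,
            "customer_name": customer.get("full_name", "unknown"),
            "total": order["total_cents"] / 100,
        }

replicate()
print(integration_db["o-1"])
# prints {'order_id': 'o-1', 'customer_name': 'Ada Lovelace', 'total': 19.99}
```

In a real system this function would run on a schedule and move data over the network, which is precisely where the transformation and traffic costs mentioned above arise.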
This approach partially solves the problem of inconsistent data. Strict consistency, however, remains out of reach here as well: it can take some time until the data is replicated throughout the system, so the characteristic of «eventual consistency» must still be taken into account.
The approaches discussed are only partially satisfactory in terms of performance and consistency. A newer approach is Event Sourcing. Traditionally, database systems manage the current states of records via CRUD operations: only the last state of a record is stored, overwriting the previous one. Event Sourcing, in contrast, does not store the current state itself but the sequence of events that led to it.
Any state in the history of the microservice can be restored by replaying the events, or reproduced on another platform, for example.
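The core mechanism can be shown in a few lines. This is a deliberately minimal sketch with invented event names: changes are appended to an event log instead of overwriting state, and any state, current or historical, is rebuilt by replaying the events.

```python
events = []  # append-only event log for one account (illustrative domain)

def apply(state, event):
    """Apply a single event to a state; unknown events are ignored."""
    kind, amount = event
    if kind == "Deposited":
        return state + amount
    if kind == "Withdrawn":
        return state - amount
    return state

def record(event):
    """Persist a state change as an event instead of overwriting state."""
    events.append(event)

def replay(upto=None):
    """Rebuild the state by re-executing events (optionally only a prefix,
    which restores any historical state)."""
    state = 0
    for event in events[:upto]:
        state = apply(state, event)
    return state

record(("Deposited", 100))
record(("Withdrawn", 30))
record(("Deposited", 50))

print(replay())        # current state: 120
print(replay(upto=2))  # historical state after the first two events: 70
```

Because the log is the single source of truth, the same replay can run on another platform to reproduce the state there, which is what makes the replication scenarios above cheap.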
Event-driven data management brings decisive advantages in connection with microservices, but also new challenges.
In connection with event sourcing, another pattern must be mentioned: Command Query Responsibility Segregation (CQRS). This architecture pattern divides an application internally into two areas of responsibility: a command part and a query part. The command part processes state changes, while the query part is solely responsible for executing many read queries quickly and efficiently. This internal separation of concerns between reading and writing not only reduces complexity in the overall context but also improves performance.
If additional microservices need access to a module's data, they can subscribe to events in order to react to a «StateChangedEvent», or query the data via the microservice's query API. If available, a common message broker in the application can also serve as a central point from which the data is retrieved.
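The subscription path can be illustrated with an in-process stand-in for the message broker; the `Broker` class and the «StateChangedEvent» payload shape are assumptions, since real systems would use a broker product over the network.

```python
from collections import defaultdict

class Broker:
    """Toy publish/subscribe broker standing in for a real message broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

broker = Broker()

# A downstream microservice keeps a local copy of the data it needs
# by reacting to events instead of querying the owner synchronously.
local_cache = {}

def on_state_changed(payload):
    local_cache[payload["id"]] = payload["state"]

broker.subscribe("StateChangedEvent", on_state_changed)

# The owning microservice publishes its state change.
broker.publish("StateChangedEvent", {"id": "order-7", "state": "shipped"})
print(local_cache["order-7"])  # prints "shipped"
```

The choice between subscribing and querying is the usual trade-off: the subscriber's local copy is fast to read but only eventually consistent, while the query API always returns the owner's current state at the cost of a network call.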