3 reasons why Kafka is driving data-driven business

Companies of all sizes are trying to determine the business value of their data in order to remain competitive. So-called data-driven businesses are already able to make more accurate decisions by making all types of data available immediately and securely so that it can be processed and analyzed throughout the enterprise.

Authors: Yves Brise & Fabio Hecht

Until recently, making large amounts of data (several million messages per second) available in real time on a single platform was an enormous challenge for companies. However, this is changing.

Unlocking data

Apache Kafka is an open source, highly available, cloud-native and horizontally scalable platform for real-time data streaming in the enterprise that solves this problem. Developers like it for its elegant design, APIs, toolset and support. Business users, in turn, value the possibilities it offers for collecting and analyzing large amounts of data in real time.

In his blog post, David Konatschnig explains what Kafka is, shows the business value of the platform and points out how it differs from other messaging systems. In this article, we present three reasons why companies should use Kafka as a streaming platform for their data-driven transformation.

1. Kafka is (also) a highly scalable and secure messaging system
At its core, Kafka is a publish/subscribe (pub/sub) messaging system. It scales particularly well: it supports a large number of producers and consumers, and adding servers increases capacity almost linearly. In many cases this makes it possible to process several million messages per second.
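The pub/sub idea can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the Kafka client API: the class `ToyTopic`, its methods and the group names are hypothetical, and `crc32` merely stands in for whatever hash a real partitioner uses. It shows two things: records with the same key end up in the same partition, and each consumer group keeps its own read position, so producers and consumers do not need to know about each other.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read position per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def publish(self, key, value):
        # Records with the same key always land in the same partition,
        # which preserves their relative order.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index

    def poll(self, group):
        # Return every record the group has not seen yet and advance its offsets.
        # Records stay in the log, so other groups can still read them.
        unread = []
        for index, log in enumerate(self.partitions):
            start = self.offsets[group][index]
            unread.extend(log[start:])
            self.offsets[group][index] = len(log)
        return unread


topic = ToyTopic(num_partitions=3)
topic.publish("customer-1", "order created")
topic.publish("customer-2", "order created")
topic.publish("customer-1", "order paid")

billing_view = topic.poll("billing")      # first group reads all three records
analytics_view = topic.poll("analytics")  # second group reads the same records independently
billing_again = topic.poll("billing")     # nothing new for the billing group
```

In this model, adding partitions is what would allow more consumers to share the work, which is the intuition behind the near-linear scaling described above.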

Its security features support data encryption and access control policy enforcement. It can therefore be used as a central messaging platform for the entire company.

These two characteristics (scalability and security) enable Kafka to replace obsolete, poorly scaling MQ systems that are costly to maintain, while decoupling components and buffering messages between them (see Fig. 1).

Fig. 1: Kafka decouples systems

2. Kafka breaks silos and integrates both old and modern systems

As business grows, technologies evolve and events such as acquisitions and mergers occur, companies must adapt quickly and cost-effectively. This can be achieved by extracting data from silos of information - including legacy systems - and making it available in a cloud-native, microservice-based way without changing existing core systems. Such a process is important in breaking up large, monolithic applications into smaller, agile pieces.

With Kafka Connect, data can be moved between Kafka and a wide variety of systems. The supported systems range from relational databases and messaging systems to cloud and mobile apps. Both batch and real-time data integration modes are supported (see Fig. 2).

In many cases, streaming data into or out of Kafka only requires configuring an existing Kafka Connect connector (see the list of available connectors). Where no suitable connector exists, a new one can be developed relatively easily thanks to the modern architecture, the documentation and the availability of open source connectors.
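To give an impression of what "configuring a connector" means, the following sketch builds a connector definition as JSON, which is the form a Kafka Connect worker in distributed mode accepts through its REST interface (`POST /connectors`). The connector class is the simple file source example that is distributed with Apache Kafka; the connector name, file path and topic are hypothetical placeholders, and the snippet only prints the payload rather than sending it anywhere.

```python
import json

# Hypothetical values: the connector name, file path and topic are placeholders.
connector = {
    "name": "legacy-export-source",
    "config": {
        # Simple example connector distributed with Apache Kafka.
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        # Each line appended to this file becomes one record in the topic.
        "file": "/var/exports/orders.txt",
        "topic": "legacy-orders",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

A production setup would more likely use a database or messaging connector, but the shape stays the same: a name plus a flat set of configuration keys, with no code changes in the source system.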

This also makes Kafka a suitable platform for the following use cases:

  • Real-time data integration: Sending data to legacy and modern systems such as databases, MQs, JMS, Salesforce, Amazon S3, HDFS, SAP HANA and many more. This allows new, loosely coupled microservices to be developed and integrated with legacy systems.
  • Website activity tracking: Publishing activities such as page views and searches as event streams to feed monitoring systems and data warehouses. Marketing, strategy and IT departments, for example, can then base decisions in their areas on real-time data.
  • Log aggregation: Sending application logs from different source systems to analysis tools such as Splunk and Elasticsearch. This allows the most appropriate tool to be used for each situation.
  • IoT: Thanks to its scalability, Kafka is well suited to collecting data from a large number of devices, e.g. via the MQTT protocol.
Fig. 2: Kafka as silo integrator

3. Data can be analyzed in real time inside or outside of Kafka

Data-driven business requires that data is not only collected and made available throughout the company, but also analyzed. Data that flows through Kafka can be analyzed with Kafka tools as well as with external tools.

Kafka's own tools include the Kafka Streams API, a Java library aimed at experienced developers, and KSQL, a SQL-like query language aimed at data analysts. The advantage of analyzing data within Kafka is that no additional system needs to be operated. In addition, Kafka can guarantee that each event is processed exactly once (exactly-once semantics).
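Why exactly-once processing matters for analytics can be shown with a toy page-view counter. This is a plain-Python sketch, not Kafka Streams or KSQL code, and the class `PageViewCounter` is hypothetical. Kafka itself achieves the guarantee with idempotent producers and transactions; the sketch only illustrates the observable effect for a single partition: a record that is delivered a second time after a retry must not be counted twice.

```python
from collections import Counter


class PageViewCounter:
    """Counts page views and remembers the last offset it has applied."""

    def __init__(self):
        self.counts = Counter()
        self.last_offset = -1

    def process(self, offset, page):
        # A record that was already applied (for example after a retry) is skipped,
        # so redelivery cannot inflate the counts.
        if offset <= self.last_offset:
            return False
        # Count and offset are updated together; a real system would have to
        # persist both atomically.
        self.counts[page] += 1
        self.last_offset = offset
        return True


events = [(0, "/home"), (1, "/pricing"), (2, "/home")]
redelivered = events + events[1:]  # simulate a retry that replays offsets 1 and 2

counter = PageViewCounter()
for offset, page in redelivered:
    counter.process(offset, page)

print(dict(counter.counts))  # {'/home': 2, '/pricing': 1}
```

Without the offset check, the replayed records would raise the counts to three and two, which is exactly the kind of silent distortion that exactly-once semantics is meant to rule out.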

If necessary, data can also be consumed by external data analysis systems (natively where supported, or via Kafka Connect) that are already established in a company or that solve specific problems (see Fig. 3).

Fig. 3: Data analyzed inside or outside of Kafka

Conclusion

Kafka is an important component in the data-driven enterprise because it makes data securely available to the entire company in real time. As a messaging platform, it decouples software components and increases the speed of innovation. It can extract data from legacy systems without affecting proven, functioning systems that are difficult to change. Finally, different departments can access and analyze cross-divisional data to generate data-driven insights.