A few steps to a self-managed Kafka Cluster in the public cloud

Until recently, running a Kafka cluster was considered a major challenge, as it requires a great deal of know-how and experience.

This has changed with the advent of Kubernetes operators. An operator provides support by automating complex operational steps. Operators are particularly well suited to proofs of concept (PoCs), as they make it very easy to get to an operational cluster and thus gain experience quickly. The public cloud is ideal as the basis for an elastic setup.

Author: David Konatschnig

Since I wrote my blog post Apache Kafka and what the hype is about three years ago, a lot has happened in the world of stream processing. Apache Kafka is definitely no longer hype, as impressive figures prove. According to Confluent,

  • 6 out of 10 companies in the travel industry
  • 7 out of 10 of the big banks
  • 8 out of 10 insurance companies
  • and 9 out of 10 telecommunications companies are using Kafka productively. 

One of the main reasons is surely the wide range of possible use cases. Mario Maric describes some of these in his blog post.
The great popularity has also created a strongly growing community that works intensively on the development of Apache Kafka as well as on tools and products in the Kafka ecosystem. 
However, setting up and running a Kafka cluster and the ecosystem (Kafka Connect, Schema Registry, KSQL, ...) poses challenges for IT departments. It requires a lot of technical know-how and an appropriately positioned team.

There are different approaches to running Kafka clusters:

  • Fully-managed (also known as Kafka-as-a-Service): Offered by Confluent, for example, this means that the provider takes care of the complete cluster setup and operation in the desired public cloud. This also includes automatic scaling as the load increases. Billing is based on consumption, directly via the corresponding public cloud provider.
  • Self-managed: The self-managed approach means nothing other than that you are responsible for setting up and operating the cluster yourself. This can be in the public cloud, private cloud or on-premise. In contrast to the fully-managed offering, you have full control over the setup, but this does not only have advantages: monitoring and scaling, for example, have to be implemented by you. Nevertheless, thanks to operators, it is now easier than ever to run a cluster yourself.

In this blog, I would like to show you what you need to deploy a self-managed Kafka cluster in the public cloud with the help of an operator in just a few steps.


Kafka in containers

Containers are indispensable in today's cloud-native world. This is also true for Kafka. Although Kafka can also be deployed on bare-metal and VMs, the de-facto standard today is container-based deployment with an orchestrator like Kubernetes. Besides the ease of scaling that Kubernetes brings, there are many other advantages. One of them is the Kubernetes operator pattern.

The Kubernetes Operator Pattern

A Kubernetes operator is essentially an extension to Kubernetes that establishes a desired state on a cluster. In simple terms, you define a state, i.e. what should be deployed on the cluster (e.g. a Kafka deployment with 3 brokers). The operator interprets this state, performs the operations needed to achieve it, and continuously checks that the state is maintained. Especially for a complex deployment like Kafka, where many configurations that are usually maintained "manually" have to interact, an operator is an enormous help.
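
With Strimzi, for example, this desired state is declared as a Kubernetes custom resource. A minimal sketch of a cluster with 3 brokers might look like this (the cluster name and storage sizes are illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # illustrative name
spec:
  kafka:
    replicas: 3             # the desired state: 3 brokers
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator:
    topicOperator: {}       # also manages topics declaratively
    userOperator: {}
```

Applying this manifest with `kubectl apply` is all it takes; the operator then creates the brokers, ZooKeeper nodes and supporting resources, and reconciles them whenever the actual state drifts from the declared one.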


Various commercial and open-source projects have taken advantage of this operator pattern by developing Kafka cluster operators. Two approaches in particular have proven successful: the operator built by Confluent, the driving force behind the open-source Apache Kafka project, and the open-source project Strimzi.

Infrastructure in the cloud

More and more companies are venturing into the cloud - and rightly so, because the advantages are obvious: time-to-market, optimisation of costs, more agility, etc.
Especially in terms of agility, the cloud can score points: managed Kubernetes clusters can be deployed and scaled in the cloud with just a few clicks or CLI commands. Gone are the days when you had to wait several days for a service or system to be made available after creating an order ticket.
While you can easily click together the infrastructure for a PoC, for a production setup you want this process to be predictable and repeatable. This task can be accomplished with the help of infrastructure-as-code (IaC) tools. One of the best-known tools in this area is Terraform from HashiCorp. With Terraform, cloud infrastructure can be described declaratively (i.e. on the basis of a desired target state) and provisioned accordingly, across all large hyperscalers such as Google, Amazon, Microsoft or even Alibaba Cloud. This is attractive if you pursue a multi-cloud strategy and thus serve several cloud providers simultaneously.
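
As a sketch, a managed Kubernetes cluster on Google Cloud could be described in Terraform roughly like this (the project ID, region and names are placeholders):

```hcl
provider "google" {
  project = "my-project-id"     # placeholder
  region  = "europe-west6"
}

# Managed Kubernetes cluster for the Kafka PoC
resource "google_container_cluster" "kafka_poc" {
  name     = "kafka-poc"
  location = "europe-west6"

  # manage the node pool separately instead of the default one
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "kafka_nodes" {
  name       = "kafka-nodes"
  cluster    = google_container_cluster.kafka_poc.name
  location   = "europe-west6"
  node_count = 3                # one node per broker as a starting point

  node_config {
    machine_type = "e2-standard-4"
  }
}
```

A `terraform apply` then creates (or later scales) exactly this target state; the equivalent resources exist for Amazon EKS and Azure AKS.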

Because the infrastructure is described declaratively in versioned configuration files, the whole approach also integrates very well into a GitOps process. This means that changes to the infrastructure are versioned and can only be carried out with an approval.
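
A minimal sketch of such a GitOps flow with GitHub Actions (the workflow layout and names are assumptions): a pull request triggers `terraform plan` for review, and only after approval and merge is `terraform apply` executed.

```yaml
# .github/workflows/infra.yml (illustrative)
name: infrastructure
on:
  pull_request:          # plan on every proposed change
  push:
    branches: [main]     # apply only after review and merge

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan
      - if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve
```

The branch protection rules of the repository then enforce that nothing reaches `main`, and thus production, without an approval.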


Possible procedure for creating a self-managed cluster in the public cloud

The following steps should be considered if you want to run Kafka self-managed in the public cloud:

  • Choosing a cloud provider
    Managed Kubernetes offerings are now available from all hyperscalers, i.e. Google Cloud, Amazon and Microsoft. Depending on how far along your company already is on its cloud journey, the provider may already be predetermined. It is also imperative to consider governance aspects here. This includes questions like: Where is the data stored? How must the data be encrypted? Who has access to the data?

  • Evaluate Kafka operators
    There are both commercial and open-source Kafka operators. The requirements should be carefully examined here, as not every feature is available in both variants.
    It is a good idea to test the operators in a PoC, preferably on the cloud provider chosen in the first step.
  • Determine the provisioning tool
    As already mentioned, it is relatively easy and quick to provision infrastructure via the web interfaces of the cloud providers. This may be sufficient at the start, but for productive setups it should ideally be done using an IaC tool. Terraform is one possible tool, but there are several alternatives.


Combine the capabilities of a Kafka operator with the advantages of the public cloud and you have the best of both worlds. The following points speak in favour of this combination:

  • Compared to classic on-premise landscapes, the public cloud offers greater agility and reduced costs, among other things, while maintaining very high security standards.
  • Kubernetes operators massively simplify deployments on Kubernetes by automatically mapping a desired state on a Kubernetes cluster.
  • There are commercial as well as open source Kafka operators.
  • Kafka operators also include components of the Kafka ecosystem such as Kafka Connect, Schema Registry, etc.
  • Infrastructure in the cloud can be provisioned in a predictable and repeatable way using IaC tools.
  • The GitOps approach can be used for both the infrastructure and the Kafka cluster state: the corresponding artefacts are stored and versioned in a repository, and changes are only possible via approvals.
  • The scaling of the cluster is very simple and can be adapted to the workload.
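
To illustrate the ecosystem point above: with Strimzi, for example, a Kafka Connect cluster is declared in the same way as the Kafka cluster itself (the names and the bootstrap address, which assumes the Kafka cluster from earlier, are illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect            # illustrative name
spec:
  replicas: 1
  # bootstrap service that the operator creates for a Kafka CR named "my-cluster"
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-offsets
    config.storage.topic: connect-configs
    status.storage.topic: connect-status
```

The operator takes over the wiring of workers, topics and configuration that you would otherwise have to maintain by hand.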

Your ipt expert

I look forward to hearing from you.