Introduction
Apache Kafka is an open source distributed event streaming platform written in Java and built for processing real-time data feeds. It is inherently scalable, with high throughput and availability. Developed by the Apache Software Foundation, Kafka has been widely adopted for its reliability, ease of use, and fault tolerance. It is used by some of the world's largest organizations to manage large volumes of data in a distributed and efficient manner.
In this tutorial, you will download and set up Apache Kafka. You will learn how to create and delete topics, as well as send and receive events using the provided scripts. You will also learn about similar projects with the same goal and how Kafka compares.
Prerequisites
- A device with at least 4 GB of RAM and 2 CPUs.
- Java 8 or higher installed on your Droplet or local machine.
Step 1 – Download and Configure Apache Kafka
In this section, you will download and extract Apache Kafka on your machine. For added security, you will set it up under your own user account. Then, you will configure and run it using KRaft.
First, you will create a separate user under which Kafka will run. Create a user named kafka by running the following command:
sudo adduser kafka

You will be asked for your account password; enter it to proceed. Then, set a strong password for the new user and press ENTER. Skip the additional information fields by pressing ENTER for each one.
Finally, switch to the newly created kafka user:
su kafka

Next, you will download the Kafka release package from the official downloads page. At the time of writing, the latest version was 3.6.1. If you are using macOS or Linux, you can download Kafka with curl.
Use this command to download Kafka to /tmp:
curl -o /tmp/kafka.tgz https://dlcdn.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz

You will store this release under ~/kafka in your home directory. Create that directory by running:
mkdir ~/kafka

Then, extract the archive into ~/kafka by running:
tar -xzf /tmp/kafka.tgz -C ~/kafka --strip-components=1

Since the archive you downloaded contains a root folder named after the Kafka version, --strip-components=1 skips that folder and extracts everything inside it.
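If you want to see what --strip-components=1 does before running it on the real archive, you can reproduce its effect with a throwaway tarball. The paths below (/tmp/strip-demo and /tmp/strip-demo-out) are arbitrary demo locations, not part of the Kafka installation:

```shell
# Build a toy archive whose contents live under a root folder,
# mimicking the kafka_2.13-3.6.1/ root inside the Kafka tarball.
mkdir -p /tmp/strip-demo/kafka_2.13-3.6.1/bin
echo "demo" > /tmp/strip-demo/kafka_2.13-3.6.1/bin/tool.sh
tar -czf /tmp/strip-demo.tgz -C /tmp/strip-demo kafka_2.13-3.6.1

# Extract while dropping the leading folder, as in the tutorial.
mkdir -p /tmp/strip-demo-out
tar -xzf /tmp/strip-demo.tgz -C /tmp/strip-demo-out --strip-components=1

# The file now sits directly under the target directory,
# with no kafka_2.13-3.6.1/ folder in between.
ls /tmp/strip-demo-out/bin/tool.sh
```

Without --strip-components=1, the same extraction would produce /tmp/strip-demo-out/kafka_2.13-3.6.1/bin/tool.sh instead.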
At the time of writing, Kafka 3 is the last major release line that supports two systems for metadata management: Apache ZooKeeper and KRaft (short for Kafka Raft). ZooKeeper is an open source project, also developed by the Apache Software Foundation, that provides a standard way of coordinating distributed data for applications.
Starting with Kafka 3.3, however, KRaft support is production ready. KRaft is a system purpose-built for coordinating only Kafka instances, which simplifies the installation process and allows for much greater scalability. With KRaft, Kafka itself takes full responsibility for its administrative metadata instead of storing it externally.
While still available, ZooKeeper support is expected to be removed from Kafka 4 and beyond. In this tutorial, you will set up Kafka using KRaft.
You need to create a unique identifier for your new Kafka cluster, which for now will consist of just one node. First, navigate to the directory where you extracted Kafka:
cd ~/kafka

With KRaft, Kafka keeps its configuration in config/kraft/server.properties, while the ZooKeeper-based configuration file is config/server.properties.
Before running it for the first time, you need to override some of the default settings. Open the file for editing by running:
nano config/kraft/server.properties

Find the following lines:

...
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
...

The log.dirs setting specifies where Kafka keeps its log files. By default, it stores them under /tmp/kafka-logs, which is guaranteed to be writable, albeit only temporarily. Replace the value with the highlighted path:
...
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/home/kafka/kafka-logs
...

Since you created a separate user for Kafka, you place the log directory under that user's home directory. Kafka will create it if it does not exist. When you are done, save and close the file.
Now that you have configured Kafka, run the following command to generate a random cluster ID:
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"

Then, format the storage for the log files by running the following command and passing in the ID:
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

The output will be:
Output
Formatting /home/kafka/kafka-logs with metadata.version 3.6-IV2.

Finally, you can start the Kafka server for the first time:
bin/kafka-server-start.sh config/kraft/server.properties

The end of the output will be similar to this:
Output
...
[2024-02-26 10:38:26,889] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.DataPlaneAcceptor)
[2024-02-26 10:38:26,890] INFO [BrokerServer id=1] Waiting for all of the authorizer futures to be completed (kafka.server.BrokerServer)
[2024-02-26 10:38:26,890] INFO [BrokerServer id=1] Finished waiting for all of the authorizer futures to be completed (kafka.server.BrokerServer)
[2024-02-26 10:38:26,890] INFO [BrokerServer id=1] Waiting for all of the SocketServer Acceptors to be started (kafka.server.BrokerServer)
[2024-02-26 10:38:26,890] INFO [BrokerServer id=1] Finished waiting for all of the SocketServer Acceptors to be started (kafka.server.BrokerServer)
[2024-02-26 10:38:26,890] INFO [BrokerServer id=1] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2024-02-26 10:38:26,891] INFO Kafka version: 3.6.1 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-26 10:38:26,891] INFO Kafka commitId: 5e3c2b738d253ff5 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-26 10:38:26,891] INFO Kafka startTimeMs: 1708943906890 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-26 10:38:26,892] INFO [KafkaRaftServer nodeId=1] Kafka Server started (kafka.server.KafkaRaftServer)

The output shows that Kafka has initialized successfully using KRaft and is accepting connections on 0.0.0.0:9092.
Press CTRL + C to exit the process. Since running Kafka in an open terminal session is not practical, in the next step you will create a service to run Kafka in the background.
Step 2 – Create a systemd service for Kafka
In this section, you will create a systemd service to run Kafka in the background at all times. systemd services can be started, stopped, and restarted in a consistent manner.
You will store the service configuration in a file named kafka.service under /etc/systemd/system, where systemd keeps its services. Create it using your text editor:
sudo nano /etc/systemd/system/kafka.service

Add the following lines:
[Unit]
Description=kafka-server
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/kraft/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target

Here, you first specify the service description. Then, in the [Service] section, you define the service type (simple means the ExecStart command runs directly as the service's main process) and provide the command that starts Kafka, redirecting its output to a log file. You also specify that the service runs as the kafka user and that it should restart automatically if Kafka exits abnormally.
The [Install] section instructs systemd to start this service as part of the server's normal multi-user boot target. When finished, save and close the file.
Start the Kafka service by running the following command:
sudo systemctl start kafka

Check that it started correctly by viewing its status:
sudo systemctl status kafka

You will see output similar to the following:
Output
● kafka.service - kafka-server
Loaded: loaded (/etc/systemd/system/kafka.service; disabled; preset: enabled)
Active: active (running) since Mon 2024-02-26 11:17:30 UTC; 2min 40s ago
Main PID: 1061 (sh)
Tasks: 94 (limit: 4646)
Memory: 409.2M
CPU: 10.491s
CGroup: /system.slice/kafka.service
├─1061 /bin/sh -c "/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/kraft/server.properties > /home/kafka/kafka/kafka.log 2>&1"
└─1062 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/home/kafka/kafka/bin/../logs/kaf>
Feb 26 11:17:30 kafka-test1 systemd[1]: Started kafka.service - kafka-server.

To automatically start Kafka after a server restart, enable its service by running the following command:
sudo systemctl enable kafka

At this point, you have created and enabled a systemd service for Kafka, so that it starts on every server boot. Next, you will learn how to create and delete topics in Kafka, as well as how to produce and consume text messages using the available scripts.
Step 3 – Producing and consuming messages in topics
Now that you have set up a Kafka server, you will be introduced to topics and how to manage them using the provided scripts. You will also learn how to send and receive messages from a topic. As explained in the Event Stream article, publishing and receiving messages revolves around topics. A topic can be thought of as a category to which a message belongs.
You can manage topics in Kafka from the CLI using the provided kafka-topics.sh script. To create a topic called first-topic, run the following command:
bin/kafka-topics.sh --create --topic first-topic --bootstrap-server localhost:9092

All of the provided Kafka scripts require the server address to be specified with --bootstrap-server.
The output will be:
Output
Created topic first-topic.

To list all available topics, pass --list instead of --create:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092

You will see the topic you just created:
Output
first-topic
You can get detailed information and statistics about a topic by passing --describe:
bin/kafka-topics.sh --describe --topic first-topic --bootstrap-server localhost:9092

The output will look like this:
Output
Topic: first-topic TopicId: VtjiMIUtRUulwzxJL5qVjg PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: first-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
The first line shows the topic name, ID, and replication factor, which is 1 because the topic exists only on the current machine. The second line, intentionally indented, shows information about the first (and only) partition of the topic. Kafka allows you to partition a topic, meaning that different parts of it can be distributed across different servers, increasing scalability. Here, there is only one partition.
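When a topic has multiple partitions, each keyed message is mapped deterministically to one of them. Kafka's default producer hashes the message key (using the murmur2 algorithm in the Java client); the sketch below substitutes a CRC32 hash purely to illustrate the property that matters: the same key always lands on the same partition, which preserves per-key ordering. The function name and keys are illustrative, not part of Kafka's API:

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Illustrative only: real Kafka producers use murmur2 for keyed
    messages, but any stable hash demonstrates the same property --
    a given key always maps to the same partition.
    """
    return zlib.crc32(key) % num_partitions

num_partitions = 3
for key in [b"user-1", b"user-2", b"user-1"]:
    print(key.decode(), "-> partition", pick_partition(key, num_partitions))
```

Note that the first and third messages, sharing the key user-1, go to the same partition, so a consumer of that partition sees them in order.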
Now that you have created a topic, you will produce messages for it using the kafka-console-producer.sh script. Run the following command to start the producer:
bin/kafka-console-producer.sh --topic first-topic --bootstrap-server localhost:9092

You will see an empty prompt:
>

The producer is waiting for your message. Type test and press ENTER. The prompt will now look like this:
>test
>

The producer is now waiting for the next message, meaning the previous message was successfully delivered to Kafka. You can enter as many test messages as you like. To exit the producer, press CTRL+C.
To retrieve messages from a topic, you need a consumer. Kafka provides a simple consumer in the form of the kafka-console-consumer.sh script. Run it with:
bin/kafka-console-consumer.sh --topic first-topic --bootstrap-server localhost:9092

However, there will be no output. This is because the consumer streams data from the topic, and nothing is currently being produced. To consume messages that were produced before the consumer started, you need to read the topic from the beginning by running:
bin/kafka-console-consumer.sh --topic first-topic --from-beginning --bootstrap-server localhost:9092

The consumer replays all events in the topic and fetches the messages:
Output
test
...

As with the producer, press CTRL+C to exit.
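The difference between the default behavior and --from-beginning comes down to the offset at which a consumer starts reading. A partition is an append-only log, and each consumer simply remembers the next offset it should read. The following sketch models that idea; it is a conceptual illustration, not Kafka's actual protocol, and the class names are made up:

```python
class PartitionLog:
    """A minimal model of one topic partition: an append-only list."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

class Consumer:
    """Tracks its own read position within the partition log."""

    def __init__(self, log, from_beginning=False):
        self.log = log
        # --from-beginning corresponds to starting at offset 0;
        # the default is to start at the current end of the log.
        self.offset = 0 if from_beginning else len(log.messages)

    def poll(self):
        new = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return new

log = PartitionLog()
log.append("test")

late = Consumer(log)                       # default: only future messages
replay = Consumer(log, from_beginning=True)

log.append("second test")
print(late.poll())    # ['second test']
print(replay.poll())  # ['test', 'second test']
```

The late consumer never sees "test" because it joined after that message was appended, exactly like the console consumer started without --from-beginning.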
To verify that the consumer actually streams data in real time, run it in a separate terminal. Open a secondary SSH session and start the consumer with its default configuration:
bin/kafka-console-consumer.sh --topic first-topic --bootstrap-server localhost:9092

In the initial session, run the producer:
bin/kafka-console-producer.sh --topic first-topic --bootstrap-server localhost:9092

Then, enter the messages of your choice:
>second test
>third test
>

You will immediately see the consumer receive them:
Output
second test
third test
After testing is complete, terminate both the producer and consumer.
To delete first-topic, pass --delete to kafka-topics.sh:
bin/kafka-topics.sh --delete --topic first-topic --bootstrap-server localhost:9092

There will be no output. You can list the topics to verify that it was indeed deleted:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092

The output will be:
Output
__consumer_offsets
__consumer_offsets is an internal Kafka topic that stores how far each consumer has read into a topic. At this point, you have created a Kafka topic and produced messages to it. Then, you consumed them using the provided scripts, both from the beginning and in real time. In the next section, you will learn how Kafka compares to other event brokers and similar software.
Comparison with similar architectures
Apache Kafka is widely regarded as the default choice for event streaming use cases. However, Apache Pulsar and RabbitMQ are also in broad use and stand out as capable options, albeit with differences in their approach.

The main difference between message queues and event streams is that a message queue's primary task is to deliver messages to clients as quickly as possible, regardless of their order. Such systems usually keep messages in memory only until consumers acknowledge them. Message filtering and routing are important aspects, as consumers can express interest in specific categories of data. RabbitMQ is a strong example of a traditional messaging system, where multiple consumers can subscribe to a topic and each receive a copy of a message.

Event streaming, on the other hand, focuses on persistence. Events should be archived, kept in order, and processed once. Routing them to specific consumers is less important, as the idea is that all consumers process events in the same way. Apache Pulsar is an open source messaging system, also developed by the Apache Software Foundation, that supports event streaming. Unlike Kafka, which was built for event streaming from the ground up, Pulsar started out as a traditional message queuing solution and gained event streaming capabilities later. Pulsar is therefore useful when a combination of both approaches is needed, without having to deploy separate systems.
Conclusion
You now have Apache Kafka running securely in the background of your server, configured as a systemd service. You have also learned how to manage topics from the command line, as well as produce and consume messages. Beyond the included scripts, however, the main appeal of Kafka is the wide variety of clients for integrating it into your applications.