What is Kafka?
Introduction
- Apache Kafka is a distributed streaming platform.
- It is a popular distributed message broker designed to efficiently handle large volumes of real-time data.
- A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ.
- It is generally used as a publish/subscribe messaging system, and many organizations also use it for log aggregation because it offers persistent storage for published messages.
Three Capabilities of a Streaming Platform
- Publish and subscribe to streams of records
- Store streams of records in a fault-tolerant, durable way
- Process streams of records
Kafka is used for two types of applications
- Building real-time streaming data pipelines that reliably move data between systems or applications.
- Building real-time streaming applications that transform or react to streams of data.
Five Core APIs of Kafka
Here is the list of APIs in the Kafka system.
PRODUCER API
It allows an application to publish a stream of records to one or more topics.
CONSUMER API
It allows an application to subscribe to topics and process the streams of records.
STREAMS API
It allows applications to act as a stream processor, consuming input streams from one or more topics and producing output streams to output topics, effectively transforming input streams into output streams.
CONNECTOR API
It helps to build and run reusable producers or consumers that connect Kafka topics to existing applications or data systems.
ADMIN API
It helps to manage and inspect Kafka objects such as topics, brokers, and configurations.
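None of these APIs requires writing code to try out: the Kafka distribution ships small command-line tools built on top of them. The commands below are a minimal sketch, assuming a broker running on localhost:9092 and a hypothetical topic named demo (a hands-on producer/consumer session appears at the end of this tutorial).
# Producer API – publish records typed on stdin to a topic
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo
# Consumer API – subscribe to a topic and print its records
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --from-beginning
# Admin API – inspect a topic's partitions and replicas
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo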
Prerequisites
This tutorial will help you to install Apache Kafka on CentOS 8 or RHEL 8 Linux systems. Before you begin, you will need:
- A CentOS 8 (or RHEL 8) server with shell access and a sudo or root privileged account; on a newly installed system it is recommended to complete the initial server setup first.
- A server with a minimum of 4 GB of RAM, which Kafka requires to run.
- Java installed on the server (covered in Step 1 below).
Step 1 – Install Java
You must have Java installed on your system to run Apache Kafka. You can install OpenJDK 11 on your machine by executing the following command. Also install any other tools used below, such as wget, if they are not already present.
sudo yum install java-11-openjdk.x86_64 -y
Once the package is installed, check the installed Java version using the below command:
java -version
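The Kafka systemd unit file created later in this guide needs the JVM installation path for its JAVA_HOME setting. As an optional check (an addition, not part of the original steps), you can resolve the real location of the java binary:
readlink -f /usr/bin/java | sed 's|/bin/java||'
Use the resulting directory, or a symlink to it such as the /usr/lib/jvm/jre-11-openjdk path used in the unit file below, as JAVA_HOME.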
Step 2 – Download Apache Kafka
Download the Apache Kafka binary files from its official download website. You can also select any nearby mirror to download from.
wget http://www-us.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz
Then extract the archive and move it to /usr/local/kafka:
tar xzf kafka_2.13-2.7.0.tgz
sudo mv kafka_2.13-2.7.0 /usr/local/kafka
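As a quick, optional sanity check, list the new installation directory; you should see the bin, config, and libs folders that the rest of this tutorial relies on:
ls /usr/local/kafka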
Step 3 – Set Up Kafka Systemd Unit Files
CentOS 8 uses systemd to manage the state of its services, so we need to create systemd unit files for the ZooKeeper and Kafka services. These unit files let us start and stop the Kafka services with systemctl.
First, create a systemd unit file for ZooKeeper with the below command:
sudo vim /etc/systemd/system/zookeeper.service
Add the below content:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Save the file and close it.
Next, create a Kafka systemd unit file using the following command:
sudo vim /etc/systemd/system/kafka.service
Add the below content. Make sure to set the correct JAVA_HOME path as per the Java installed on your system.
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/usr/bin/bash /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
Save the file and close it.
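Optionally, you can ask systemd to check both unit files for obvious mistakes before loading them (this verification step is an addition, not required by the tutorial):
systemd-analyze verify /etc/systemd/system/zookeeper.service /etc/systemd/system/kafka.service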
Reload the systemd daemon to apply changes.
sudo systemctl daemon-reload
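If you also want both services to start automatically at boot (optional, but typical for a server installation), enable them as well:
sudo systemctl enable zookeeper
sudo systemctl enable kafka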
Step 4 – Start Kafka Server
Kafka requires ZooKeeper, so first start a ZooKeeper server on your system. The unit file created above uses the single-node ZooKeeper start script that ships with Kafka.
sudo systemctl start zookeeper
Now start the Kafka server and view the running status:
sudo systemctl start kafka
sudo systemctl status kafka
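If the status shows the service as active (running), you can additionally confirm that the broker is listening on its default port, 9092 (assuming the stock server.properties shipped with Kafka):
sudo ss -tlnp | grep 9092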
All done. You have successfully installed Kafka on your CentOS 8 system. The next part of this tutorial will help you to create topics in the Kafka cluster and work with the Kafka producer and consumer services.
Step 5 – Creating Topics in Apache Kafka
Apache Kafka provides multiple shell scripts to work with it. First, create a topic named “testTopic” with a single partition and a single replica:
cd /usr/local/kafka
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testTopic
The command prints a confirmation once the topic exists:
Created topic testTopic.
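To try the new topic out, Kafka ships console producer and consumer scripts built on the Producer and Consumer APIs. The session below is a minimal sketch against the local broker: type a few messages into the producer, then run the consumer in a second terminal to read them back.
# Terminal 1 – publish messages typed on stdin to testTopic
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic testTopic
# Terminal 2 – read all messages on testTopic from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testTopic --from-beginning
Press Ctrl+C to stop either tool.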