Kafka
Basics of Kafka
Introduction
- Until now, we have written programs that store information in databases. Databases encourage us to think of the world as things, not as events.
- Because of this, some people realized it is better to store data as events rather than as things.
- Events can be stored in a different kind of storage called a log, rather than in a database.
- Apache Kafka is the system for managing these logs.
- In Apache Kafka, such a log is called a topic. A topic is simply an ordered collection of events stored in a durable way (a minimal producer sketch follows this list).
- Durable means the events are written to disk and replicated (stored on more than one disk and on more than one server), so that data is not lost during a hardware failure.
- A topic can retain data for a short time or for many years, and it can be very small or enormous: nothing about the economics of Kafka says a topic has to be large, and nothing about the architecture of Kafka says it has to be small.
- When dealing with databases, Kafka usually moves data in and out: it takes data in as records and makes those records available to services.
- Kafka Connect does the work of connecting Kafka to databases and other external systems.
- Kafka Streams is the library that handles the framework, infrastructure, and other undifferentiated work we would otherwise have to do ourselves to process these events (a small Streams sketch also follows this list).
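To make the idea of a topic as an ordered, durable log concrete, here is a minimal sketch of appending events with the Java producer client. The broker address localhost:9092 and the topic name "events" are assumptions for illustration, not anything fixed by Kafka.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventProducer {
    public static void main(String[] args) {
        // Connection and serialization settings; localhost:9092 is an assumed broker address.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Each send() appends one event to the end of the topic's log.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "order-42", "order created"));
            producer.send(new ProducerRecord<>("events", "order-42", "order shipped"));
        }
    }
}
```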
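And here is a minimal Kafka Streams sketch: it reads events from one topic, transforms each value, and writes the result to another topic. The application id, broker address, and topic names are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");      // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from one topic, transform each value, and write to another topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("events");
        source.mapValues(value -> value.toUpperCase()).to("events-uppercase");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```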
It’s okay to store data in Kafka
It is very common to wonder whether to use Kafka for longer-term storage or not, since we know Kafka stores logs of records.
- The first question that arises is whether we can treat this log like a file and use it as a source-of-truth store for your data. The answer is straightforward: it is easily possible. If you set the retention to “forever” or enable log compaction on a topic, the data will be kept for all time (a topic-configuration sketch follows this list).
- But most people don’t ask that first question; they are mainly interested in why we might want to do this. Many think it’s insane to do this, but it really isn’t, because people do it all the time; Kafka is actually designed for this work.
- So is it crazy to do this? The answer is no, there’s nothing crazy about storing data in Kafka: it works well for this because it was designed to do it. Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance, and accumulating more data does not make it slower.
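As one way to set this up, here is a sketch using the Java AdminClient to create a topic whose retention is effectively “forever” (retention.ms = -1). The broker address, topic name, and partition/replication counts are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // retention.ms = -1 means "keep forever"; topic name and counts are hypothetical.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("retention.ms", "-1"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```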
Log Compaction
- It ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition (a sketch of creating a compacted topic follows below).
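A minimal sketch, under the same assumptions as above, of creating a compacted topic: setting cleanup.policy=compact tells Kafka to compact the log by key rather than delete records by age.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps at least the latest value per key
            // instead of deleting by age; the topic name is hypothetical.
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

For example, if a producer writes ("user-1", "address A") and later ("user-1", "address B") to this topic, compaction eventually discards the older record and keeps only "address B" for that key.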
With all this, let’s end this blog, as that is it for Kafka basics. If you find any issue regarding a concept or the code, you can message me on my Twitter or LinkedIn. The next blog will be published on 26/02/23.
Some words about me
I’m Mohit.❤️ You can also call me Chessman. I’m a machine learning developer and a competitive programmer. Most of my time is spent staring at a computer screen. During the day, I am usually programming, working to derive insight from large datasets. My skills include data analysis, data visualization, machine learning, and deep learning. I have developed a strong acumen for problem-solving, and I enjoy occasional challenges.