Kafka Summit 2023

May 30, 2023 | by Radicalbit, Events, Fresh news

Kafka Summit is the unmissable event for developers, architects, data engineers, DevOps professionals, and streaming data enthusiasts. The two-day event hosted by Confluent brings the Apache Kafka community together to share best practices, learn how to build next-generation systems, and discuss the future of streaming technologies. Radicalbit joined Kafka Summit for the second year in a row: it’s always exciting to meet the Kafka community, listen to novel use cases and learn about approaches to Kafka challenges!

Radicalbit showcased its MLOps platform’s brand-new drift detection modules for real-time machine learning models, naturally implemented on top of Kafka, and the reception was awesome! We presented two real-time ML use case demos:

  • AI-powered telemetry for Formula 1 cars, displaying real-time driving performance assessment and real-time commentary powered by ChatGPT.
  • Real-Time Fraud Detection, enhanced by Concept Drift monitoring. In this use case, our MLOps platform goes beyond the mere identification of fraudulent activities: it can also detect the repeated occurrence of false negatives, which helps keep machine learning models relevant and accurate.

Are you curious about how our MLOps platform works, and how it simplifies and accelerates the development of adaptable AI and machine learning-powered decision support systems?

For the second year in a row, Radicalbit joined Kafka Summit as a speaker: Alessandro Conflitti, Head of Data Science at Radicalbit, presented his talk “Memory Matters: Drift Detection with Low Memory Footprint for ML Models on Kafka Streams.” The talk explores how Radicalbit’s streaming MLOps platform provides an effective solution to detect drift in models that use Kafka streams for input and output data.

Other significant talks held during the event are summarized below. The leitmotif of this year’s conference was: do you need stream processing capabilities? Flink is the way to go.

Keynote
Jay Kreps – CEO and Co-founder – Confluent,
Callum Blanchard – Lead Platform Engineer – Flutter,
Martijn Visser – Senior Product Manager – Confluent,
Olivier Jauze – Senior Fellow and CTO of Michelin Experiences – Michelin,
Shaun Clowes – Chief Product Officer – Confluent

Following its recent acquisition of Immerok, Confluent announced the imminent availability of a managed Flink service within Confluent Cloud, with particular emphasis on the Flink SQL engine. This has a major impact on the data streaming world: Confluent, which has traditionally been strongly committed to promoting Kafka Streams and KSQL, now recognizes Apache Flink as the de facto standard for stream processing, leading to a redefinition of hierarchies in the industry.

As early adopters of Flink for stream processing, we at Radicalbit are thrilled to offer a unique visual pipeline editor that seamlessly supports both Kafka Streams and Flink engines without any modifications.

While we still believe that there will be a place for Kafka Streams, especially in the context of building microservices, we anticipate an early disappearance for KSQL applications.

Kafka Summit 2023 Keynote Speech. Courtesy of Confluent

Don’t Drop it; Handle it: Reliable Message Reprocessing Patterns for Kafka

Dunith Dhanushka – Senior Developer Advocate – Redpanda Data
We initially expected a discussion about reprocessing strategies in Kappa architectures for long-retained topics (e.g. reapplying modified business logic), but this comprehensive talk actually focused on the following question: how do we effectively handle failures when consuming Kafka messages?

Dhanushka began by distinguishing between two kinds of failure for Kafka consumers: transient failures, such as errors when contacting a temporarily unavailable look-up server for message enrichment, and non-transient failures, such as schema compatibility errors. He then introduced the concept of dead-letter topics, special queues meant to support tasks such as back-off strategies. Dhanushka suggested creating a dead-letter alter ego for each topic that supports this strategy, keeping the same name with the suffix “_dlt”.
Finally, he also introduced the concept of expensive messages: a Kafka message is expensive when losing it has a business impact.
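
To make the pattern more concrete, here is a minimal sketch of how a consumer could combine retries with a dead-letter topic. It is our own illustration, not code from the talk: the exception classes, `InMemoryProducer` (a stand-in for a real Kafka producer) and `consume_with_dlt` are hypothetical names, and retrying inside the consumer before dead-lettering is just one possible arrangement.

```python
import time
from dataclasses import dataclass, field


class TransientError(Exception):
    """Failure that may succeed on retry (e.g. a look-up server is briefly down)."""


class NonTransientError(Exception):
    """Failure that retrying cannot fix (e.g. an incompatible schema)."""


def dead_letter_topic(topic: str) -> str:
    # Naming convention mentioned in the talk: same name, "_dlt" suffix.
    return f"{topic}_dlt"


@dataclass
class InMemoryProducer:
    """Hypothetical stand-in for a Kafka producer: records what would be sent."""
    sent: list = field(default_factory=list)

    def send(self, topic, value):
        self.sent.append((topic, value))


def consume_with_dlt(topic, message, handler, producer,
                     max_retries=3, base_delay=0.0):
    """Process one message; retry transient failures, dead-letter the rest."""
    for attempt in range(max_retries + 1):
        try:
            handler(message)
            return "processed"
        except NonTransientError:
            break  # retrying will not help, go straight to the dead-letter topic
        except TransientError:
            if attempt < max_retries:
                # Exponential back-off between attempts.
                time.sleep(base_delay * (2 ** attempt))
    # Retries exhausted or non-transient failure: park the message, don't drop it.
    producer.send(dead_letter_topic(topic), message)
    return "dead-lettered"
```

With this shape, a message that keeps failing on the `orders` topic would end up on `orders_dlt`, where it can be inspected or replayed later instead of being lost.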

A Kafka Client’s Request: There and Back Again

Danica Fine – Senior Developer Advocate – Confluent
Danica Fine from Confluent gave this awesome talk, enriched by metaphors drawn from the Lord of the Rings saga. She walked through the whole journey a request makes under the hood, on both the client side and the broker side. Did you know that Kafka brokers implement a Mordor purgatory? We strongly suggest taking a look at the slides as soon as they are available, since Danica shared very useful tips for fine-tuning producers and consumers.

The Dark and Dirty Side of Fixing Uneven Partitions

Olena Kutsenko – Sr. Developer Advocate – Aiven,
Olena Babenko – Senior Software Engineer – Aiven
This talk addressed the challenges Kafka users face when dealing with unevenly distributed partitions. We know from experience that skewed keys and poor partitioning strategies dramatically affect the scalability and performance of stream processing jobs.

Aiven engineers Olena Babenko and Olena Kutsenko presented potential solutions for re-distributing keys across partitions, which occasionally involve the creation of new topics with fine-tuned partitioning configuration. The downside of this approach is that we need to coordinate producers and consumers working with such topics in order to minimize downtime and prevent data loss.
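
Before choosing a fix, it helps to quantify how uneven a topic actually is. The following is a small sketch of our own (not taken from the talk) that maps keys to partitions and reports a skew ratio. Note the assumptions: `zlib.crc32` is only a stand-in hash, since Kafka’s default partitioner hashes keys with murmur2, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for key-based partitioning: hash(key) mod partition count.
    # Assumption: crc32 replaces the murmur2 hash used by Kafka's default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys, num_partitions):
    """Number of messages landing on each partition, indexed by partition id."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(counts):
    """Busiest partition divided by the mean load; 1.0 means perfectly even."""
    if not counts or sum(counts) == 0:
        return 0.0
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# A workload where a single "hot" key produces 90% of the traffic.
keys = ["hot-customer"] * 900 + [f"user-{i}" for i in range(100)]
counts = partition_counts(keys, num_partitions=4)
print(counts, round(skew_ratio(counts), 2))
```

Because every message with the same key lands on the same partition, the hot key alone keeps one partition at 900 messages or more, no matter how many partitions the topic has. That is why adding partitions is not enough and the keys themselves have to be redistributed.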

Kafka Summit 2023 main hall. Courtesy of Confluent

Real-time Fraudulent Trips Detection

Xueyao Jiang – Principal Data Engineer – FREENOW
FreeNow engineer Xueyao Jiang presented a Kafka Streams-based solution specifically designed to detect fraudulent trips and prevent future fraud. These fraudulent trips occur when drivers attempt to persuade passengers to pay for rides outside the application. Xueyao’s solution employs a sophisticated routing processing strategy.

Xueyao acknowledged that limitations of the high-level DSL compelled them to transition to the lower-level Processor API. At Radicalbit, we faced similar challenges while developing our low-code pipeline editor, which allows the codeless definition of Kafka Streams runnable topologies. Implementing out-of-the-happy-path join and aggregate strategies often requires going beyond the SDK’s capabilities and building customized solutions from scratch.

We highly recommend watching Xueyao’s talk once it becomes available to gain insights into their shared-state-store approach among processors. Additionally, if you’re unfamiliar with concepts such as static membership or you don’t know what the group.instance.id config does, we encourage you to review the accompanying slides for a better understanding.

Dataflows for Machine Learning Operations

Alex Rakowski – Seldon Technologies Ltd,
Andrei Paleyes – University of Cambridge
We were happy to attend the talk by Seldon’s engineering team: we have been passionate Seldon users since late 2020, and it was great to get insights into how they leveraged Kafka to design the architecture of seldon-core pipelines. Did you know that seldon-core added support for context-aware generative model operations? Make sure to explore this exciting feature!

Is Pseudonymization the Answer to Your GDPR Problems?

Pieter van der Meer – Dataworkz / Essent
This talk addresses the problem of GDPR compliance and proposes “pseudonymization”, a technique that replaces identifiable data with artificial values that still enable analysis and processing while complying with EU regulations.

The interesting part lies in being able to get back the original data using additional confidential information.

The main process goes as follows: as soon as a record is received through a Kafka topic, it is pseudonymized, and at the same time a second record is created with the private information required to get back the original data. The two records are placed on two separate Kafka topics, and the one with the additional confidential information is only available to those with the right clearance.
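
The two-record flow described above can be sketched in a few lines. This is our own simplified illustration rather than the speaker’s implementation: the topic names, field names and the use of a random UUID as the pseudonym are all assumptions, and the actual publishing to Kafka is left out.

```python
import uuid

# Hypothetical topic names, chosen for illustration only.
PUBLIC_TOPIC = "customers.pseudonymized"
PRIVATE_TOPIC = "customers.identity-map"  # readable only with the right clearance


def pseudonymize(record: dict, sensitive_fields: set):
    """Split one incoming record into a public record and a private record."""
    token = uuid.uuid4().hex  # artificial value that links the two records

    # Public record: sensitive fields removed, pseudonym added.
    public = {k: v for k, v in record.items() if k not in sensitive_fields}
    public["pseudonym"] = token

    # Private record: everything needed to restore the original data.
    private = {
        "pseudonym": token,
        "original": {k: v for k, v in record.items() if k in sensitive_fields},
    }
    return (PUBLIC_TOPIC, public), (PRIVATE_TOPIC, private)


def reidentify(public: dict, private: dict) -> dict:
    """Rebuild the original record; requires access to the private record."""
    if public["pseudonym"] != private["pseudonym"]:
        raise ValueError("records do not belong together")
    restored = {k: v for k, v in public.items() if k != "pseudonym"}
    restored.update(private["original"])
    return restored
```

Analytics consumers would only ever read the public topic, while re-identification needs both records and is therefore limited to whoever may read the private one.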

Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection

Tomas Neubauer – Quix
In this talk, the speaker explained some of the challenges his company faced in building an application that handles large volumes of data at high speed, namely the detection of and response to cyclist crashes.

The data involved are telemetry readings collected by an app on the cyclist’s smartphone. This was demonstrated live during the talk: when the speaker shook his phone, the plot on screen responded quickly to the movement. The talk was very interesting and the speaker shared several useful details of their work. Highly recommended.

Welcome to Kafka Summit 2023. Courtesy of Confluent

To wrap it up, Kafka Summit 2023 was yet another valuable opportunity to share insights about the trends in stream processing and see firsthand how companies and data professionals are tackling impressive business challenges.

If you’d like more details about Kafka Summit 2023, we invite you to also read the article from our sister company, Bitrock.

As Radicalbit, we confirmed our privileged position at the intersection of real-time data and artificial intelligence, offering cutting-edge technology to extract more value from stream processing.

If you want to learn more about our MLOps platform, visit our website and start your free trial!
