Comparing Message Brokers and Event Processing Tools

Message queue software, often called a message broker, enables inter-process communication between systems and applications. These queues provide asynchronous protocols that allow senders and receivers to communicate remotely and at different times. Messages can be requests, replies, alerts, or log entries, depending on the communication requirements. The message queue facilitates end-to-end or service-to-service communication by storing each message, making it available for processing, and deleting it once processing is complete. Message queues, which are typically used in large message-oriented middleware systems, can also follow the publish/subscribe pattern.
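The asynchronous hand-off described above can be sketched with Python's standard library. This is only an illustrative, in-process example and not a real broker; the function name run_demo and the upper-casing "processing" step are assumptions made for the demonstration.

```python
import queue
import threading


def run_demo(messages):
    """Send messages through a queue to a consumer running on another thread."""
    mq = queue.Queue()  # holds messages until the consumer takes them
    received = []

    def consumer():
        while True:
            msg = mq.get()  # blocks until a message is available
            if msg is None:  # sentinel value: no more messages will arrive
                mq.task_done()
                break
            received.append(msg.upper())  # stand-in for real processing
            mq.task_done()  # mark this message as fully processed

    worker = threading.Thread(target=consumer)
    worker.start()
    for m in messages:  # the sender does not wait for the receiver
        mq.put(m)
    mq.put(None)
    mq.join()  # wait until every message has been marked as processed
    worker.join()
    return received
```

Calling run_demo(["request", "reply", "alert"]) returns the processed messages in the order they were sent, because a single consumer drains the queue first-in, first-out.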

A variety of message brokers and event processing tools are available, and each has its own use case. Before we look at message brokers like Apache Kafka, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Confluent Cloud, as well as event processing tools like Amazon Kinesis, Azure Event Hubs, and Google Pub/Sub, let’s consider a brief description of a typical message broker architecture.

The message broker architecture

As highlighted above, the message broker’s fundamental role is to route messages between applications. It is similar to a mediator who has been called in to mediate between two parties whose communications have broken down. For instance, let’s assume that a complex legal contract must be drawn up between a construction company and a client, detailing the construction of an apartment block. Construction contracts are becoming increasingly complex and litigious. The average agreement can be as long as 500 pages, with the client attempting to place all responsibility on the construction company. This process can take a long time with multiple contract reviews, driving up the cost of the construction project and delaying the project’s start. Consequently, a mediator is called in during the initial stages of the contractual discussion to ensure that the process is concluded quickly and cost-effectively.

In the same way, the message broker facilitates communications by passing messages between disparate applications. The software message broker differs from the human mediator in that it also decouples the applications it serves: senders and receivers only need to know about the broker, not about each other. In other words, the broker architecture is structured so that the individual application components operate independently from each other while still interfacing with each other via the message broker.
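A minimal sketch of this decoupling, assuming a toy in-memory broker: the class name MessageBroker, the "orders" topic, and the two inbox lists are hypothetical and stand in for independent applications.

```python
class MessageBroker:
    """Routes messages from publishers to whichever handlers subscribed to a topic."""

    def __init__(self):
        self._handlers = {}  # topic name -> list of subscriber callbacks

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # The publisher never sees the subscribers; it only talks to the broker.
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


broker = MessageBroker()
billing_inbox, shipping_inbox = [], []  # two independent "applications"
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", shipping_inbox.append)
delivered = broker.publish("orders", {"order_id": 1})
```

Neither inbox knows the other exists, and the publisher knows only the topic name, so either side can be replaced without touching the other.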

The message broker architecture can be used to process messages or events. Event processing is “computing that performs operations on events as they are reported in a system that observes or listens for events.”

Event processing had its origins in active database systems, in which triggers are activated by the arrival of a specific event or record in the database. Event-based systems are built to react to events from the external environment, and event-based architecture follows event-driven best practices and principles. The Google Cloud documentation notes that “any architecture for complex event processing (CEP) must have the ability to import data from multiple, heterogeneous sources, apply complex business rules, and drive outbound actions.”
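The three abilities named in that quotation (import events from several sources, apply rules, drive actions) can be sketched as a tiny rule loop. This is a simplified illustration, not a CEP engine; the Rule class, the event fields, and the two example rules are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # business rule applied to each event
    action: Callable[[dict], str]      # outbound action to take when it matches


def process_events(events, rules):
    """Check every rule against each event as it arrives; collect triggered actions."""
    actions = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                actions.append(rule.action(event))
    return actions


RULES = [
    Rule(
        "high_temperature",
        lambda e: e.get("type") == "temperature" and e["value"] > 80,
        lambda e: f"alert: {e['source']} overheating",
    ),
    Rule(
        "failed_login",
        lambda e: e.get("type") == "login" and not e["ok"],
        lambda e: f"lock: {e['user']}",
    ),
]

# Events from two different (hypothetical) sources: sensors and an auth service.
EVENTS = [
    {"source": "sensor-1", "type": "temperature", "value": 95},
    {"source": "sensor-2", "type": "temperature", "value": 40},
    {"source": "auth", "type": "login", "user": "alice", "ok": False},
]
```

Here process_events(EVENTS, RULES) yields one alert for the overheating sensor and one lock action for the failed login, while the normal temperature reading triggers nothing.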

Now that we have an understanding of what a message broker is, its architecture, and how it is used to process events, let’s take a look at the following message brokers and event processing tools to determine which one best fits your use case.

Apache Kafka and Managed Kafka Distributions

Apache Kafka

The Apache Kafka website describes Kafka as an “open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.”

In other words, Kafka is a reliable message broker, a topic-based system, that enables applications to process, store, and reroute messages or streamed data between the applications.

Kafka uses a pull-based model for delivery. Producing applications write messages to the Kafka brokers, and consuming applications pull those messages from the brokers at their own pace. Another essential point to note is that Kafka is topic-based: producers publish each message to a named topic, and consumers subscribe to the specific topics they are interested in and read the data published there.
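To make the pull model concrete, here is a minimal in-memory sketch, loosely modeled on a single topic with one partition. It does not use the Kafka client API; the class names TopicLog and Consumer and their methods are hypothetical.

```python
class TopicLog:
    """An append-only list of records for one topic."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Pulls records at its own pace and remembers its own position (offset)."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = TopicLog()
for record in ["a", "b", "c"]:
    log.append(record)

fast, slow = Consumer(log), Consumer(log)
fast_batch = fast.poll()               # reads everything available
slow_batch = slow.poll(max_records=1)  # reads one record and stops
```

Because each consumer tracks its own offset, a slow consumer does not hold back a fast one, and the log itself is not changed by reads.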

Apache Kafka can be used on its own as a vanilla, self-managed message broker. There are also managed Kafka distributions, including Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Confluent Cloud.

1. Amazon Managed Streaming for Apache Kafka (Amazon MSK) 

This is a fully managed service offered by Amazon Web Services (AWS) that simplifies building and running applications that use Kafka to manage message queues and process streaming data. With MSK, you can use “native Kafka APIs to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications.”

The most important benefits are that MSK is fully managed, fully compatible with existing native Apache Kafka applications, highly available, and secure, and that it supports elastic stream processing.

Amazon MSK use cases include running native Kafka clusters as a way to scale up capacity and reduce operational overhead. MSK is also used to maintain and scale Kafka clusters, enabling an end-to-end ingestion pipeline supported by a fully managed service. Lastly, Kafka on MSK is used as a real-time message broker between different microservices. Because MSK simplifies setup and maintenance, the client has more time and capacity to build new functionality.

2. Confluent Cloud

Confluent Cloud is described as the “complete event streaming platform for Apache Kafka.” In other words, it is a fully managed cloud service for Kafka, allowing users to accelerate the development of event-driven services and real-time applications without having to manage the Kafka cluster. Confluent Cloud is available on AWS, so you can start streaming with “on-demand provisioning of elastically scalable clusters.”

Confluent Cloud is best used when a user wants to host Kafka in the cloud. It is useful as a message broker that facilitates communications between enterprise-level systems and integrates the data generated by each system into a central location such as a data lake on Amazon S3.

Public cloud-based event processing

Public cloud-based event processing is simply event processing using the services of cloud providers such as Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.

Let’s expand on this discussion and look at three of the top event processing tools: Amazon Kinesis, Azure Event Hubs, and Google Pub/Sub.

1. Amazon Kinesis

Amazon Kinesis is an event processing tool that “makes it easy to collect, process, and analyze real-time streaming data.” Much of the data collected, processed, and analyzed by Kinesis is voluminous unstructured data like video, audio, application logs, IoT telemetry data, and website clickstreams.

Kinesis’s real benefit is that it processes and analyzes data the instant it arrives, instead of waiting for all of the data to be streamed to a central location before processing starts.

Other benefits of using Kinesis include the ability to ingest, buffer, and process streaming data in real time. It is fully managed, so you can run Kinesis without needing to manage any infrastructure. It is also scalable: it can handle large volumes of streaming data and process data from hundreds of thousands of sources with very low latencies.
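The difference between processing data as it arrives and waiting for the full data set can be illustrated with a plain Python generator. This is a conceptual sketch only and does not use the Kinesis API; the function name running_average is an assumption.

```python
def running_average(readings):
    """Yield the average so far after each reading, without waiting for the full stream."""
    total = 0.0
    count = 0
    for value in readings:  # readings may be any iterable, including an endless stream
        total += value
        count += 1
        yield total / count  # a result is available as soon as each record arrives
```

For example, list(running_average([10, 20, 30])) produces an updated average after every reading rather than one figure at the end.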

2. Azure Event Hubs

Azure Event Hubs is a fully managed real-time data ingestion service. It streams millions of events per second from multiple sources to create dynamic data pipelines, which allow the business to respond instantly to any challenges that these event logs indicate.

Event Hubs also integrates with other Azure services to form a robust data architecture that returns valuable insights into the data ingested along these pipelines. It is also worth noting that Event Hubs exposes a Kafka-compatible endpoint, so existing Apache Kafka clients can connect to it.

3. Google Pub/Sub

Google Pub/Sub is an asynchronous messaging service that decouples services that produce events and services that process events. In other words, Pub/Sub acts as a mediator between event-creation services and event-processing services. Consequently, you can use Pub/Sub as an event ingestion and delivery tool for streaming analytics pipelines. Or it can also be used as message-oriented middleware, much like the mediator described in this article.

Google describes this tool’s offering as durable message storage and real-time message delivery with high availability and consistent performance at scale.

As with Kafka, Pub/Sub is a subscription-based message broker that organizes the messages it receives into topics. The subscriber must sign up to receive messages about a specific topic. Messages are either pushed to a subscriber’s endpoint, or the subscriber pulls them from its subscription to the Pub/Sub topic.
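The two delivery modes can be contrasted in a small in-memory sketch. It does not use the Google Cloud client library; the Topic class, its method names, and the pull helper are hypothetical.

```python
from collections import deque


class Topic:
    """A toy topic that supports both push and pull subscriptions."""

    def __init__(self):
        self._push_endpoints = []  # callables invoked as soon as a message is published
        self._pull_queues = []     # buffers that hold messages until a subscriber asks

    def add_push_subscription(self, endpoint):
        self._push_endpoints.append(endpoint)

    def add_pull_subscription(self):
        subscription = deque()
        self._pull_queues.append(subscription)
        return subscription

    def publish(self, message):
        for endpoint in self._push_endpoints:
            endpoint(message)             # push: the broker initiates delivery
        for subscription in self._pull_queues:
            subscription.append(message)  # pull: the message waits to be fetched


def pull(subscription, max_messages=1):
    """Fetch up to max_messages from a pull subscription, oldest first."""
    batch = []
    while subscription and len(batch) < max_messages:
        batch.append(subscription.popleft())
    return batch


topic = Topic()
pushed = []
topic.add_push_subscription(pushed.append)
pull_sub = topic.add_pull_subscription()
topic.publish("event-1")
topic.publish("event-2")
```

After the two publishes, the push subscriber has already received both events, while the pull subscriber receives nothing until it calls pull(pull_sub).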

Legacy tools 

Lastly, let’s consider two of the most prominent legacy message brokers that are still available for use today: RabbitMQ and ActiveMQ.

1. RabbitMQ

RabbitMQ was first released in 2007 and was primarily used as a component in SOA and messaging systems. It is a general-purpose message broker supporting protocols such as MQTT, STOMP, and AMQP.

It can handle high throughput and is ideal for use cases with an exceptionally high movement of transactional records, such as an online processing system. Additionally, it is well suited to handling background jobs or acting as a message broker between microservices. It is also a useful low-latency message broker because its primary mechanism is based on a push model. In other words, it distributes messages quickly and individually, making sure that the work is parallelized evenly and that messages are processed in the order they arrive in the queue.
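The even distribution of work described above resembles round-robin dispatch, which can be sketched in a few lines. This is a simplified model rather than RabbitMQ client code; the function name dispatch_round_robin and the worker names are assumptions.

```python
from itertools import cycle


def dispatch_round_robin(messages, worker_names):
    """Push each message to the next worker in turn, preserving queue order."""
    assignments = {name: [] for name in worker_names}
    next_worker = cycle(worker_names)  # endlessly repeats w1, w2, ..., w1, w2, ...
    for message in messages:           # messages leave the queue in arrival order
        assignments[next(next_worker)].append(message)
    return assignments
```

With five jobs and two workers, the first worker receives jobs 1, 3, and 5 and the second receives jobs 2 and 4, so neither worker sits idle while the other is overloaded.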

2. Apache ActiveMQ

Apache ActiveMQ is a popular open-source, Java-based message broker. Like RabbitMQ, it supports industry-standard protocols such as MQTT, STOMP, and AMQP.

There are currently two variants available, ActiveMQ 5 Classic and ActiveMQ Artemis. The biggest difference between the two is that Classic is the standard, long-established version: it is backwards compatible, endlessly pluggable, and supports generations of applications.

Artemis, on the other hand, is a “high-performance non-blocking architecture for the next generation of message-driven applications.”

The benefits of using ActiveMQ to manage your message queue include its load-balancing capabilities and its data security and data protection mechanisms. It uses integration patterns to integrate easily with enterprise architectures, and its deployment is flexible. In fact, it is most commonly deployed as a standalone service, isolating ActiveMQ from all other applications.

Published in: Blog, Cloud Architecture
Upsolver Team