Originally developed for large enterprises, message brokers allowed software engineers to break complex systems down into independent services. The broker acts as middleware between those services, decoupling components that previously depended directly on each other. This significantly improved maintainability and made systems easier to scale under increased workloads.
Message brokers are now widely used by organizations of all sizes, and they come with a variety of advanced features. Many of them are free and open source and run on all major operating systems; others are proprietary products, though often reasonably priced.
Broadway is an Elixir library that makes working with message brokers a breeze. It is built on top of GenStage, another Elixir library, which is used to exchange events (messages) between processes. If you are unsure how GenStage works, I strongly recommend reading my article about it before moving on.
Broadway makes it easy to build data processing pipelines that consume events from external sources. Because they ingest data from outside the application, such pipelines are also called data ingestion pipelines. Broadway supports the most popular message brokers and needs only a small amount of configuration to get up and running.
At the time of writing, Broadway officially supports the following message brokers:
- Amazon SQS via broadway_sqs
- Apache Kafka via broadway_kafka
- Google Cloud Pub/Sub via broadway_cloud_pub_sub
- RabbitMQ via broadway_rabbitmq
You can also find unofficial packages for Broadway on the Hex registry.
Broadway is an excellent choice when working with popular message brokers, but it is not limited to them. It is useful in a wide range of cases where you need a data processing pipeline with dynamic batching built in. All you have to do is bring your own GenStage producer, and Broadway takes care of the rest.
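To make the "bring your own producer" idea concrete, here is a minimal sketch of a custom GenStage producer plugged into Broadway. Counter, CounterPipeline, and the :ack_id/:ack_data terms are hypothetical names chosen for illustration, and the Mix.install line assumes Elixir 1.12 or later with network access so the file runs as a standalone script. The producer's transformer option wraps each raw event in a %Broadway.Message{}, and the pipeline module doubles as the message acknowledger.

```elixir
# Script-style setup so the sketch runs with `elixir counter.exs`
Mix.install([{:broadway, "~> 1.0"}])

defmodule Counter do
  # Hypothetical producer: emits consecutive integers, but only as many
  # as downstream stages have asked for (GenStage demand).
  use GenStage

  @impl true
  def init(first), do: {:producer, first}

  @impl true
  def handle_demand(demand, next) when demand > 0 do
    events = Enum.to_list(next..(next + demand - 1))
    {:noreply, events, next + demand}
  end
end

defmodule CounterPipeline do
  use Broadway
  @behaviour Broadway.Acknowledger

  alias Broadway.Message

  def start_link(report_to) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {Counter, 1},
        # Turns each raw integer into a %Broadway.Message{}
        transformer: {__MODULE__, :transform, []},
        concurrency: 1
      ],
      processors: [default: [concurrency: 1]],
      context: %{report_to: report_to}
    )
  end

  def transform(event, _opts) do
    %Message{data: event, acknowledger: {__MODULE__, :ack_id, :ack_data}}
  end

  # A real producer would confirm (or reject) delivery with its source here.
  @impl Broadway.Acknowledger
  def ack(:ack_id, _successful, _failed), do: :ok

  @impl Broadway
  def handle_message(_processor, message, %{report_to: pid}) do
    send(pid, {:processed, message.data})
    message
  end
end

{:ok, _pid} = CounterPipeline.start_link(self())

# Collect the first five processed integers reported by handle_message/3
first_five =
  for _ <- 1..5 do
    receive do
      {:processed, n} -> n
    after
      2_000 -> raise "no message was processed"
    end
  end

IO.inspect(first_five, label: "first five")
```

Because Counter only emits as many integers as the processor demands, the pipeline never runs ahead of its consumers even though the source itself is unbounded.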
The GenStage pipeline generated by Broadway follows current best practices. It is fault-tolerant and shuts down gracefully out of the box, minimizing message loss when something unexpected happens. It also comes with a number of other useful features, such as automatic acknowledgments, dynamic batching, and rate limiting. While you can certainly build all of this yourself with GenStage, Broadway gives you everything with just a few lines of configuration code.
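As an illustration of how little configuration is involved, this minimal sketch wires a producer, a processor stage, and a batcher together. MyPipeline and the doubling logic are hypothetical; Broadway.DummyProducer, which ships with Broadway for testing, stands in for a broker-specific producer such as the one provided by broadway_sqs. The Mix.install line assumes Elixir 1.12 or later.

```elixir
Mix.install([{:broadway, "~> 1.0"}])

defmodule MyPipeline do
  # Hypothetical pipeline; DummyProducer replaces a real broker producer.
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}, concurrency: 1],
      processors: [default: [concurrency: 2]],
      batchers: [default: [batch_size: 10, batch_timeout: 100]]
    )
  end

  # Runs in the processor stage, once per message
  @impl true
  def handle_message(_processor, %Message{} = message, _context) do
    Message.update_data(message, fn n -> n * 2 end)
  end

  # Receives up to 10 messages at a time, or fewer once 100 ms have passed
  @impl true
  def handle_batch(_batcher, messages, _batch_info, _context) do
    total = messages |> Enum.map(& &1.data) |> Enum.sum()
    IO.puts("batch of #{length(messages)} message(s), sum = #{total}")
    messages
  end
end

{:ok, _pid} = MyPipeline.start_link([])

# test_message/2 pushes one message and asks for an {:ack, ...} reply
ref = Broadway.test_message(MyPipeline, 21)

result =
  receive do
    {:ack, ^ref, successful, failed} -> {Enum.map(successful, & &1.data), failed}
  after
    2_000 -> :timeout
  end

IO.inspect(result, label: "acknowledged")
```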
Some of Broadway's built-in features
Back-pressure is a feature that Broadway inherits from GenStage. In both libraries, "consumers" tell "producers" how much data they are ready to accept, and producers never send more than that. When a consumer reaches its capacity, producers stop delivering data to it and resume once the consumer has worked through its backlog and asks for more.
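Because back-pressure comes straight from GenStage, it can be observed without Broadway at all. In this minimal sketch, DemandLogger and SlowConsumer are hypothetical stages: the producer reports every demand it receives, and the consumer subscribes with max_demand: 4 and min_demand: 2, so the producer is never asked for more than four events at a time.

```elixir
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule DemandLogger do
  # Hypothetical producer that reports each demand it receives
  use GenStage

  def start_link(report_to), do: GenStage.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to), do: {:producer, {report_to, 0}}

  @impl true
  def handle_demand(demand, {report_to, next}) do
    send(report_to, {:demand, demand})
    events = Enum.to_list(next..(next + demand - 1))
    {:noreply, events, {report_to, next + demand}}
  end
end

defmodule SlowConsumer do
  use GenStage

  def start_link(_opts), do: GenStage.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:consumer, :ok}

  @impl true
  def handle_events(_events, _from, state) do
    # Simulated slow work: no new demand is sent while this runs
    Process.sleep(10)
    {:noreply, [], state}
  end
end

{:ok, producer} = DemandLogger.start_link(self())
{:ok, consumer} = SlowConsumer.start_link([])

{:ok, _subscription} =
  GenStage.sync_subscribe(consumer, to: producer, max_demand: 4, min_demand: 2)

demands =
  for _ <- 1..5 do
    receive do
      {:demand, size} -> size
    after
      2_000 -> raise "no demand received"
    end
  end

IO.inspect(demands, label: "demand sizes")
```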
Broadway automatically acknowledges messages at the end of the pipeline, and also when processing fails. Failed messages are captured rather than silently dropped, and Broadway lets you decide how best to handle the errors and the data that caused them.
Batching is an easy way to add another step to your data ingestion pipeline for further processing. It lets you leverage concurrency and group related messages together so that operations can be performed in bulk.
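Static batching can be sketched as a fixed set of batchers declared in the configuration, with each message routed to one of them through Broadway.Message.put_batcher/2. ParityPipeline, the :even and :odd batchers, and the Collect helper are hypothetical; each batch is sent to the calling process only so the result can be inspected, whereas a real handle_batch/4 would typically perform one bulk operation per batch.

```elixir
Mix.install([{:broadway, "~> 1.0"}])

defmodule ParityPipeline do
  use Broadway

  alias Broadway.Message

  def start_link(report_to) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 2]],
      # Two batchers, known in advance (static batching)
      batchers: [
        even: [batch_size: 10, batch_timeout: 100],
        odd: [batch_size: 10, batch_timeout: 100]
      ],
      context: %{report_to: report_to}
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    if rem(message.data, 2) == 0,
      do: Message.put_batcher(message, :even),
      else: Message.put_batcher(message, :odd)
  end

  @impl true
  def handle_batch(batcher, messages, _batch_info, %{report_to: pid}) do
    # Stand-in for a bulk operation, e.g. one database insert per batch
    send(pid, {:batch, batcher, Enum.map(messages, & &1.data)})
    messages
  end
end

defmodule Collect do
  # Gathers {:batch, key, data} messages until 500 ms pass without one
  def batches(acc \\ %{}) do
    receive do
      {:batch, key, data} -> batches(Map.update(acc, key, data, &(&1 ++ data)))
    after
      500 -> Map.new(acc, fn {key, data} -> {key, Enum.sort(data)} end)
    end
  end
end

{:ok, _pid} = ParityPipeline.start_link(self())
Broadway.test_batch(ParityPipeline, [1, 2, 3, 4, 5, 6])
batches = Collect.batches()
IO.inspect(batches, label: "batches")
```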
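Static batching works well when you know in advance how to group your messages: you declare a fixed set of batchers in the pipeline configuration and route each message to one of them. Often, however, the right grouping depends on data inside the messages, so a fixed set of batchers is not enough. Thankfully, Broadway lets us define batches at runtime by setting a message's :batch_key with Broadway.Message.put_batch_key/2. This is also called dynamic batching.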
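Dynamic batching could look like the following sketch, which assumes messages are maps carrying hypothetical :user_id and :amount fields. A single :default batcher is used, and Broadway.Message.put_batch_key/2 splits its batches by user at runtime; OrdersPipeline and Collect are illustrative names.

```elixir
Mix.install([{:broadway, "~> 1.0"}])

defmodule OrdersPipeline do
  use Broadway

  alias Broadway.Message

  def start_link(report_to) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 2]],
      batchers: [default: [batch_size: 10, batch_timeout: 100]],
      context: %{report_to: report_to}
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # The key comes from the payload, so batches are only known at runtime
    Message.put_batch_key(message, message.data.user_id)
  end

  @impl true
  def handle_batch(:default, messages, batch_info, %{report_to: pid}) do
    # All messages in this batch share batch_info.batch_key
    amounts = Enum.map(messages, & &1.data.amount)
    send(pid, {:batch, batch_info.batch_key, amounts})
    messages
  end
end

defmodule Collect do
  # Gathers {:batch, key, data} messages until 500 ms pass without one
  def batches(acc \\ %{}) do
    receive do
      {:batch, key, data} -> batches(Map.update(acc, key, data, &(&1 ++ data)))
    after
      500 -> Map.new(acc, fn {key, data} -> {key, Enum.sort(data)} end)
    end
  end
end

{:ok, _pid} = OrdersPipeline.start_link(self())

Broadway.test_batch(OrdersPipeline, [
  %{user_id: "alice", amount: 10},
  %{user_id: "bob", amount: 20},
  %{user_id: "alice", amount: 30}
])

by_user = Collect.batches()
IO.inspect(by_user, label: "amounts per user")
```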
Fault tolerance with minimal data loss
Broadway's data pipelines are carefully designed to minimize data loss. Producers are isolated from the rest of the pipeline and are automatically resubscribed to in the event of a failure. User callbacks, on the other hand, are stateless, which allows errors to be handled locally.
"Broadway integrates with the VM to provide graceful shutdown. By starting Broadway as part of your supervision tree, it will guarantee all events are flushed once the VM shuts down."
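A minimal sketch of starting a pipeline under a supervisor, assuming a hypothetical SupervisedPipeline module. In a Mix project the children list would normally live in your application's start/2 callback; here Supervisor.start_link/2 is called directly so the script is self-contained, and stopping the supervisor at the end shuts the pipeline down with it.

```elixir
Mix.install([{:broadway, "~> 1.0"}])

defmodule SupervisedPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 1]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context), do: message
end

children = [
  # `use Broadway` defines child_spec/1, which calls start_link/1 above
  {SupervisedPipeline, []}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

ref = Broadway.test_message(SupervisedPipeline, :hello)

acked =
  receive do
    {:ack, ^ref, [_message], []} -> true
  after
    2_000 -> false
  end

# Stopping the supervisor terminates the pipeline it supervises
:ok = Supervisor.stop(sup)
IO.puts("acknowledged: #{acked}")
```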
Custom failure handling
"Broadway provides a handle_failed/2 callback where developers can outline custom code to handle errors. For example, if they want to move messages to another queue for further processing."
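Here is a sketch of custom failure handling, assuming string payloads that should parse as integers. Messages that cannot be parsed are marked with Broadway.Message.failed/2, and handle_failed/2 receives them before they are acknowledged as failed. ParsePipeline is hypothetical, and sending the message to the calling process merely stands in for publishing it to a dead-letter queue.

```elixir
Mix.install([{:broadway, "~> 1.0"}])

defmodule ParsePipeline do
  use Broadway

  alias Broadway.Message

  def start_link(report_to) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 1]],
      context: %{report_to: report_to}
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    case Integer.parse(message.data) do
      {n, ""} -> Message.update_data(message, fn _raw -> n end)
      _ -> Message.failed(message, :not_an_integer)
    end
  end

  # Receives the messages that failed in handle_message/3
  @impl true
  def handle_failed(messages, %{report_to: pid}) do
    for message <- messages do
      # Stand-in for forwarding to a dead-letter queue (hypothetical)
      send(pid, {:dead_letter, message.data, message.status})
    end

    messages
  end
end

{:ok, _pid} = ParsePipeline.start_link(self())
ok_ref = Broadway.test_message(ParsePipeline, "42")
bad_ref = Broadway.test_message(ParsePipeline, "oops")

parsed =
  receive do
    {:ack, ^ok_ref, [message], []} -> message.data
  after
    2_000 -> raise "valid message was not acknowledged"
  end

dead_letter =
  receive do
    {:dead_letter, data, status} -> {data, status}
  after
    2_000 -> raise "handle_failed/2 was not called"
  end

failed_count =
  receive do
    {:ack, ^bad_ref, [], failed} -> length(failed)
  after
    2_000 -> raise "invalid message was not acknowledged"
  end

IO.inspect(dead_letter, label: "dead letter")
```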
Broadway not only simplifies working with message brokers but also takes away the complexity of assembling data processing pipelines.
Broadway has even more features that this article doesn't cover, all of which are well documented online. Examples include rate limiting, partitioning, and metrics exposed through the telemetry library. There are also helpers that make testing Broadway pipelines easy. I encourage you to check the official documentation to learn more; I assure you it's worth it.
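One of those testing helpers is Broadway.test_message/3, which pushes a message into a running pipeline and sends the test process an {:ack, ref, successful, failed} tuple once the message is acknowledged. The ExUnit sketch below assumes a hypothetical UpcasePipeline that uses Broadway.DummyProducer; in a real project you would usually make the producer configurable so tests can swap it in.

```elixir
Mix.install([{:broadway, "~> 1.0"}])
ExUnit.start(autorun: false)

defmodule UpcasePipeline do
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      # In a real project, test config would select DummyProducer here
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 1]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    Message.update_data(message, &String.upcase/1)
  end
end

defmodule UpcasePipelineTest do
  use ExUnit.Case

  test "upcases the payload and acknowledges the message" do
    # Started under the test supervisor, so it is stopped after the test
    start_supervised!({UpcasePipeline, []})
    ref = Broadway.test_message(UpcasePipeline, "hello")
    assert_receive {:ack, ^ref, [%{data: "HELLO"}], []}, 1_000
  end
end

result = ExUnit.run()
```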