Using Hazelcast Jet pipeline API in Java applications

21 Sep 2023

Hazelcast Jet is an open-source, distributed computing platform for carrying out fast and efficient data processing tasks. It provides an easy-to-use Pipeline API that allows developers to build powerful data processing workflows and execute them in a distributed manner.

In this blog post, we will explore how to utilize the Hazelcast Jet Pipeline API in Java applications to perform data processing tasks.

Setting Up Hazelcast Jet

To get started, you need to include the Hazelcast Jet dependency in your Java project. You can do this by adding the following Maven or Gradle dependency to your build configuration:

<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-jet</artifactId>
    <version>4.5.1</version>
</dependency>

Once you have included the dependency, you are ready to start using the Hazelcast Jet Pipeline API.

Building a Simple Pipeline

Let’s begin by building a simple data processing pipeline using the Hazelcast Jet Pipeline API. Assume we have a collection of integers and we want to find the sum of all the numbers in the collection.

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class SimplePipelineExample {

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();

        Pipeline pipeline = Pipeline.create();

        pipeline
            .readFrom(Sources.list("numbers"))
            .map(integer -> integer * 2)
            .aggregate(summingInt(integer -> integer))
            .writeTo(Sinks.logger());

        jet.newJob(pipeline).join();

        Jet.shutdownAll();
    }
}

In this example, we create a pipeline and define the following stages:

Read: We read the data from a Hazelcast IMap named “numbers”.
Map: We multiply each number by 2.
Aggregate: We sum up all the numbers using the summingInt aggregation function.
Write: We log the final result using the logger() sink.

Finally, we submit the pipeline to the Hazelcast Jet cluster and wait for it to complete. Once the job is done, we shut down the Hazelcast Jet instance.

Conclusion

The Hazelcast Jet Pipeline API provides a simple and intuitive way to build and execute data processing workflows in Java applications. With its powerful set of operators and functions, you can perform complex data transformations and aggregations with ease.

In this blog post, we covered the basics of using the Hazelcast Jet Pipeline API, including setting up Hazelcast Jet, building a simple pipeline, and executing it in a distributed manner. You can now start exploring the full potential of Hazelcast Jet to solve your data processing challenges.

#jet #dataProcessing #Java #HazelcastJet