Map Reduce Example in Java 8

The map-reduce concept is one of the powerful concepts in computer programming, particularly on functional programming which utilizes the power of distributed and parallel processing to solve a big and heavy problem in quick time. From Java 8 onwards, Java also got this powerful feature from the functional programming world. Many of the services provided on the internet like Google Search are based on the concept of the map and reduce. In map reduce a job is usually split from the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework then sorts the outputs of the map operation, which are then supplied to the reduce tasks.

For example, suppose you want to calculate the average age of all the people in a given town, instead of counting sequentially you can divide the problem into locality or zip code and then calculate average age for each locality and then combine them using a reduction operation to calculate the average age for the town.

It wasn't possible earlier to perform functional programming in Java earlier, at least, not so easy, but Java 8 also introduced a powerful Stream API which provides many functional programming operations like mapfilterflatmap, reduce, collect and so on.

There are many such scenarios where map-reduce can really be a game changer and solve a problem in quick time which wasn't possible earlier due to the sheer scale of data.

As I have said, earlier Java didn't have the support of performing bulk data operation and it wasn't possible to load all the data into memory because of their size but from Java 8 onwards, we have an abstraction called Stream which allows us to perform the bulk data operation on a large chunk of data without loading them into memory.

By using these methods, together with implicit parallelism provided by Stream API, you can now write code in Java which can work efficiently with Big data which wasn't that easy earlier.

If you want to learn more about Stream API features and how to use them efficiently in your code, I suggest you take a look at this Java Streams API Developer Guide by Nelson Djalo. It will provide you with first-hand experience of using Stream API in your code.




Map Reduce Example in Java 8

In this Java 8 tutorial, we will go over the map function in Java 8. It is used to implement MapReduce type operations. Essentially we map a set of values then we reduce it with a function such as average or sum into a single number. Let's take a look at this sample which will do an average of numbers the old and new ways.

Again, the old method is many lines of code which is designed to do this all very sequentially. This code doesn't take advantage of a multi-core processor, which is available in all modern servers.

Sure, the example is simple, but imagine if you have millions of items you are processing and you have an 8 core machine or 32 core server. Most of those CPU cores would be completely wasted and idle during this long calculation and it will frustrate your clients which are waiting for the response.

Now if we look at the simple one-line Java 8 way of doing it:

double average = peoples.stream().mapToInt(p-> p.getAge())
                        .average().getAsDouble();

This uses the concept of parallelism, where it creates a parallel stream out of the array, which can be processed by multiple cores and then finally joined back into to map the results together.

The map function will create a stream containing only the values with meet the given criteria, then the average function will reduce this map into a single value.

Now all 8 of your cores are processing this calculation so it should run much faster.

As some wise man has said that, a picture is worth more than a thousand words, here is a picture which shows how map reduce concept works in practice.

Map Reduce Example in Java 8


In this, map function could be cutting operation which applies to each vegetable and then their result is combined using Reduce to prepare a nice Sandwich you can enjoy.

If you want to learn more about the map and reduce operation, or in general, functional programming features introduced in Java 8 then I suggest you see the Java 8 New Features In Simple Way course on Udemy. A nice little but a still useful course to learn all the Java 8 features which matter.




Java Program to demonstrate Map reduce operation

Here is our Java program which will teach you how to use the map and reduce in Java 8. The reduce operation is also known as the fold in the functional programming world and very helpful with Collection class which holds a lot of items.

You can perform a lot of bulk operation, calculating stats using the map and reduce in Java 8. As a Java developer you must know how to use map(), flatMap() and filter() method, these three are key methods for doing functional programming in Java 8.

If you want to learn more about functional programming in Java 8, I also suggest joining From Collections to Streams in Java 8 Using Lambda Expressions course on Pluralsight, one of the best course to start with functional programming in Java.

package test;
 
import java.util.ArrayList;
import java.util.List;
 
/**
 * Java Program to demonstrate how to do map reduce in Java. Map, reduce also
 * known as fold is common operation while dealing with Collection in Java.

 * @author Javin Paul
 */
public class Test {
 
    public static void main(String args[]) {
        List<Employee> peoples = new ArrayList<>();
        peoples.add(new Employee(101, "Victor", 23));
        peoples.add(new Employee(102, "Rick", 21));
        peoples.add(new Employee(103, "Sam", 25));
        peoples.add(new Employee(104, "John", 27));
        peoples.add(new Employee(105, "Grover", 23));
        peoples.add(new Employee(106, "Adam", 22));
        peoples.add(new Employee(107, "Samy", 224));
        peoples.add(new Employee(108, "Duke", 29));
      
        double average = calculateAverage(peoples);
        System.out.println("Average age of employees are (classic way) : " 
                            + average);
       
        average = average(peoples);
        System.out.println("Average age of employees are (lambda way) : " 
                            + average);
    }
   
    /**
     * Java Method to calculate average from a list of object without using
     * lambdas, doing it on classical java way.
     * @param employees
     * @return average age of given list of Employee
     */
    private static double calculateAverage(List<? extends Employee> employees){
        int totalEmployee = employees.size();
        double sum = 0;
        for(Employee e : employees){
            sum += e.getAge();
        }
       
        double average = sum/totalEmployee;
        return average;
    }
   
    /**
     * Java method which uses map reduce to calculate average of list of 
     * employees in JDK 8.
     * @param peoples
     * @return average age of given list of Employees
     */
    private static double average(List<? extends Employee> peoples){
        return peoples.stream().mapToInt(p-> p.getAge())
                               .average()
                               .getAsDouble();
    }
 
}
 
class Employee{
    private final int id;
    private final String name;
    private final int age;
   
    public Employee(int id, String name, int age){
        this.id = id;
        this.name = name;
        this.age = age;
    }
   
    public int getId(){
        return id;
    }
   
    public String getName(){
        return name;
    }
   
    public int getAge(){
        return age;
    }
}
 
 
Output:
Average age of employees are (classic way) : 49.25
Average age of employees are (lambda way) : 49.25

You can see that the Java 8 way is much more succinct and readable than the iterative version of pre-Java 8 code.

Though the code doesn't tell you the impact when the amount of data increases, you can easily make the Java 8 version parallel by just replacing stream() with parallelStream() but you need to do a lot of hard work to parallelize the iterative version of the code.

That's all about how to do map-reduce in Java 8.  This is rather a simple example of a powerful concept like map reduce but most important thing is that now you can use this concept in Java 8 with built-in map() and reduce() method of Stream class. If you are interested in learning more about Stream API or Functional Programming in Java, here are some of the useful resources to learn further and strengthen your knowledge.

Further Learning
The Complete Java MasterClass
Java SE 8 New Features
Java 8 in Action by Raoul-Gabriel Urma
From Collections to Streams in Java 8 Using Lambda Expressions


Other Java 8 tutorials you may like
  • 10 Example of Lambda Expression in Java 8 (see here)
  • 10 Example of forEach() method in Java 8 (example)
  • 10 Example of Joining String in Java 8 (see here
  • 10 Example of Stream API in Java 8 (see here)
  • How to use peek() method of Stream in Java 8? (example)
  • 10 Example of converting a List to Map in Java 8 (example)
  • 20 Example of LocalDate and LocalTime in Java 8 (see here)
  • Difference between map() and flatMap() in Java 8? (answer)
  • How to use Stream.flatMap in Java 8(example)
  • How to use Stream.map() in Java 8 (example)
  • 5 Books to Learn Java 8 and Functional Programming (list)
  • 5 Free Courses to learn Java 8 and 9 (courses)
  • Java 8 Interview Questions with Answers (course)

Thanks for reading this article so far. If you liked this article then please share with your friends and colleagues. If you have any questions or feedback then please drop a note.

P.S.: If you just want to learn more about new features in Java 8 then please see the course What's New in Java 8. It explains all the important features of Java 8 e.g. lambda expressions, streams, functional interfaces, Optional, new Date Time API and other miscellaneous changes.

5 comments:

  1. If you are performing map-reduce in large scale, which is always the case, you should use parallelStream() instead of stream() to get full advantage of multiple cores of your server.

    ReplyDelete
  2. Yes I think Javin is right. If you are using stream it will still be executed on a single core just like a normal sequential routine. For using parallelism we will have to use parallelSteam.

    ReplyDelete
  3. When you use parallel stream check you cpu in task manager. All core of cpu are used . So in future if we moved from quad core to octa core performance will increase.

    ReplyDelete
    Replies
    1. Hello Rohan, where exactly do you see the core get utilized? Do you mean Task Manager - Performance tab - Open resource Monitor and then CPU tab?

      Delete