How to Find the Highest Repeating Word in a Text File in Java - Word Count Problem

Finding the words in a text file and their counts is another frequently asked coding question in Java interviews. The logic is similar to what we have seen in how to find duplicate words in a String. In the first step, you build a word Map by reading the contents of the text file. This Map contains each word as a key and its count as the value. Once the Map is ready, you can sort its entries by value. If you don't know how to sort a Map on values, see this tutorial first; it teaches you by sorting a HashMap on values. Getting the keys and values in sorted order is then easy, but remember that HashMap doesn't maintain any order, so you need a List to keep the entries in sorted order. Once you have this list, you can loop over it and print the key and value of each entry. This way, you can also create a table of words and their counts in decreasing order. This problem is sometimes also asked as printing all words and their counts in tabular format.

How to find highest repeated word from a file

Here is the Java program to find the duplicate word that occurs the maximum number of times in a file. You can also print the frequency of words from highest to lowest, because you have a List that contains the words and their counts in sorted order. All you need to do is iterate over each entry of the List and print the keys and values.

The most important part of this solution is sorting all the entries. Since Map.Entry doesn't implement the Comparable interface, we need to write our own custom Comparator to sort the entries. If you look at my implementation, I am comparing entries on their values because that's what we want. Many programmers ask, why not use the LinkedHashMap class? Remember that LinkedHashMap only keeps entries in insertion order, and even TreeMap sorts by keys, not by values. So you need this special Comparator to compare values and store the sorted entries in a List.
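
If you are on Java 8 or later, the hand-written Comparator can probably be replaced with the built-in Map.Entry.comparingByValue method. The following is only a minimal sketch under that assumption, not the implementation used in this article; the class name ValueSortSketch and the method sortByValueDescending are made up for illustration. It also shows that a LinkedHashMap can hold the result once the entries are already sorted, because it remembers insertion order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original program.
class ValueSortSketch {

    // Returns a new map whose iteration order is highest count first.
    static Map<String, Integer> sortByValueDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // comparingByValue accepts a Comparator for the values; reverseOrder gives descending order.
        entries.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));

        // LinkedHashMap keeps insertion order, so it preserves the sorted order.
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("java", 3);
        counts.put("is", 1);
        counts.put("fun", 2);
        System.out.println(sortByValueDescending(counts)); // prints {java=3, fun=2, is=1}
    }
}
```

Words with equal counts may come out in any relative order, since the input HashMap itself has no defined order.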


Java Program to Print word and their count from File

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Java program to find count of repeated words in a file.
 *
 * @author
 */
public class Problem {

    public static void main(String args[]) {
        Map<String, Integer> wordMap = buildWordMap("C:/temp/words.txt");
        List<Entry<String, Integer>> list = sortByValueInDecreasingOrder(wordMap);
        System.out.println("List of repeated word from file and their count");
        for (Map.Entry<String, Integer> entry : list) {
            // print only words that occur more than once
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + " => " + entry.getValue());
            }
        }
    }

    public static Map<String, Integer> buildWordMap(String fileName) {
        // Using diamond operator for clean code
        Map<String, Integer> wordMap = new HashMap<>();
        // Using try-with-resources statement for automatic resource management
        try (FileInputStream fis = new FileInputStream(fileName);
                BufferedReader br = new BufferedReader(new InputStreamReader(fis))) {
            // words are separated by whitespace
            Pattern pattern = Pattern.compile("\\s+");
            String line = null;
            while ((line = br.readLine()) != null) {
                // do this if case sensitivity is not required i.e. Java = java
                line = line.toLowerCase();
                String[] words = pattern.split(line);
                for (String word : words) {
                    // blank lines and leading whitespace produce empty tokens
                    if (word.isEmpty()) {
                        continue;
                    }
                    if (wordMap.containsKey(word)) {
                        wordMap.put(word, (wordMap.get(word) + 1));
                    } else {
                        wordMap.put(word, 1);
                    }
                }
            }
        } catch (IOException ioex) {
            // report the failure instead of silently swallowing it
            System.err.println("Error reading file " + fileName + ": " + ioex.getMessage());
        }
        return wordMap;
    }

    public static List<Entry<String, Integer>> sortByValueInDecreasingOrder(Map<String, Integer> wordMap) {
        Set<Entry<String, Integer>> entries = wordMap.entrySet();
        List<Entry<String, Integer>> list = new ArrayList<>(entries);
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // reversed operands give decreasing order of count
                return (o2.getValue()).compareTo(o1.getValue());
            }
        });
        return list;
    }
}

List of repeated word from file and their count
its => 2
of => 2
programming => 2
java => 2
language => 2
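
If the interviewer asks for the words and their counts in tabular format, the sorted list can be printed with fixed-width columns. The snippet below is a rough sketch of one way to do that with String.format; the class WordTableSketch, its methods, and the column widths (15 and 5) are arbitrary choices for illustration, not part of the program above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class WordTableSketch {

    // Left-aligns the word in 15 columns and right-aligns the count in 5.
    static String formatRow(String word, int count) {
        return String.format("%-15s %5d", word, count);
    }

    // Prints a header followed by one row per entry, in list order.
    static void printTable(List<Map.Entry<String, Integer>> sortedEntries) {
        System.out.println(String.format("%-15s %5s", "Word", "Count"));
        for (Map.Entry<String, Integer> entry : sortedEntries) {
            System.out.println(formatRow(entry.getKey(), entry.getValue()));
        }
    }

    public static void main(String[] args) {
        // Stand-in data; in the real program this would be the sorted list.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new SimpleEntry<>("programming", 2));
        entries.add(new SimpleEntry<>("java", 2));
        printTable(entries);
    }
}
```

Words longer than 15 characters are not truncated by this format, so they would push the count column to the right.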

Things to note

If you are writing code in an interview, make sure it is production-quality code, which means you must handle as many errors as possible, write unit tests, comment the code, and do proper resource management. Here are a couple more points to remember:

1) Close files and streams once you are through with them; see this tutorial to learn the right way to close a stream. If you are on Java 7 or later, just use the try-with-resources statement.

2) Since the size of the file is not specified, the interviewer may grill you on cases like: what happens if the file is large? With a very large file, your program may run out of memory and throw java.lang.OutOfMemoryError: Java heap space. One solution is to do the task in chunks, e.g. first read 20% of the content and find the maximum repeated word in it, then read the next 20% and find the maximum repeated word, taking the previous maximum into consideration. This way, you don't need to store all the words in memory and you can process a file of arbitrary length.

3) Always use Generics for type-safety.
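
For the large-file question in point 2, a related and simpler idea is to read the file one line at a time and track the running maximum while counting, so the whole file is never held in memory and no final sort is needed. The sketch below illustrates only that idea and is not the chunking scheme described above; the class StreamingMaxSketch and the method mostFrequentWord are hypothetical names. Note that the counts map still grows with the number of distinct words, so this reduces memory use but does not remove the limit entirely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative.
class StreamingMaxSketch {

    // Reads line by line and returns the most frequent word, or null if there are no words.
    static String mostFrequentWord(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank lines and leading whitespace yield empty tokens
                }
                // merge adds 1 to the existing count, or stores 1 for a new word
                int count = counts.merge(word, 1, Integer::sum);
                if (count > bestCount) {
                    bestCount = count;
                    best = word;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file so the example runs anywhere.
        String text = "Java is great\nand java is everywhere\njava";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            System.out.println(mostFrequentWord(reader)); // prints java
        }
    }
}
```

Because the method takes a BufferedReader rather than a file name, it is also easy to unit test with an in-memory string.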

That's all on how to find repeated words in a file and print their counts. You can apply the same technique to find duplicate words in a String. Since you now have a sorted list of words and their counts, you can also find the maximum, the minimum, or the words that occur more than a specific number of times.
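
To show how the same technique carries over to a String, here is a short sketch that uses the Java 8 Stream API instead of an explicit loop. It is one possible variant, assuming Java 8 or later, and not the code from this article; the class StringWordCountSketch and the method countWords are names invented for the example. The counts come back as Long because Collectors.counting produces Long values.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class StringWordCountSketch {

    // Splits on whitespace, ignores case, and counts each distinct word.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("Java is a language and java is popular");
        // Print only repeated words, highest count first.
        counts.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(entry -> System.out.println(entry.getKey() + " => " + entry.getValue()));
    }
}
```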

Further Reading
If you are preparing for a programming job interview, you must prepare for all the important topics, e.g. data structures, strings, arrays, etc. One book that can help you with this task is Cracking the Coding Interview. It contains 150 programming questions and solutions, which is good enough to clear most coding interviews.


  1. Nice! On Java 8 it can be done in 4 lines of code:

    String data = String.join("", Files.readAllLines(Paths.get("D:\\lab.txt")));

    System.out.println(String.format("Word: %s - Occur: %d",text,list.size()));});

    1. Nice solution showing the power of streams, but it temporarily uses more memory.

    2. How does the second point solve your memory problem? Out of memory can occur if a line is too long or the word map gets too big. For a line that is too long you need a custom read method. For a big map you need some external storage solution. If you only store the maximum repeated words (like the top 10, did I understand this correctly?), you may get wrong results.

  2. For an interview, I would also expect that the source follows the "single level of abstraction" principle ( the buildWordMap function).

  3. How to print the line number in which the maximum occurrences occurred? I want to show the top 5 occurrences of the word with the line number.

  4. How to print the occurrences of the word based on the line number? I want to print the top 5 occurrences of the word.