3 ways to count words in Java String

You can count the words in a Java String by using the split() method of String. A word is nothing but a sequence of non-space characters that is separated from other words by one or more spaces. Splitting the String with a regular expression that matches whitespace gives you an array of all the words in it. This is the easy way to solve the problem, but if you are asked to write a program to count the number of words in a given String in Java without using any String utility method like String.split() or StringTokenizer, then it's a little bit challenging for a beginner programmer. It's actually one of the common Java coding questions, and I have seen it a couple of times in interviews for Java developers with 2 to 4 years of experience. The interviewer adds constraints, for example split() is not allowed and you can only use basic methods like charAt(), length(), and substring() along with loops, operators, and other basic programming tools.
In this article, I'll share all three ways to solve this problem: first by using String's split() method and a regular expression, second by using StringTokenizer, and third without using either of those library methods. The third one is the most interesting, and it is very difficult to write a complete solution that handles all special characters, e.g. non-printable ASCII characters. For our purpose, we assume that whitespace means a space, tab, or newline and, in the third solution, that a word is a run of characters for which Character.isLetter() returns true.

Btw, if you are looking for more String based coding problems, you can check the Cracking the Coding Interview book, which is a collection of more than 190 programming questions and solutions from tech giants like Amazon, Google, Facebook, and Microsoft. It also includes questions from service based companies like Infosys, TCS, and Cognizant.



Solution 1 - Counting words using String.split() method
In this solution, we will use the split() method of the java.lang.String class to count the number of words in a given sentence. This solution uses the regular expression "\\s+" to split the String on whitespace. The split() method returns an array, and the length of that array is the number of words in the given String.

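  public static int countWordsUsingSplit(String input) {
    if (input == null) {
      return 0;
    }

    // trim first: leading whitespace would otherwise produce an empty
    // first element and inflate the count by one
    String trimmed = input.trim();
    if (trimmed.isEmpty()) {
      return 0;
    }

    String[] words = trimmed.split("\\s+");
    return words.length;
  }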

If you are new to regular expressions in Java, \s is a character class that matches whitespace, including tabs and newlines. Since \ needs to be escaped in a Java string literal, it becomes \\s, and because there can be multiple spaces between words, we add the + quantifier, which means "one or more". Hence \\s+ matches a run of one or more whitespace characters and splits the String accordingly. See Core Java Volume 1 - Fundamentals by Cay S. Horstmann to learn more about the split() method of the String class. This is also the simplest way to count the number of words in a given sentence.
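To see what "\\s+" does in practice, here is a small sketch. The class name SplitWhitespaceDemo and the helper tokens() are made up for this illustration and are not part of the solution above. It shows that a run of spaces and tabs acts as a single separator, and that whitespace at the start of the String produces an empty first element, which is why trimming the input before splitting is a safe habit.

```java
import java.util.Arrays;

class SplitWhitespaceDemo {

  // Hypothetical helper for this illustration: splits on runs of whitespace.
  static String[] tokens(String input) {
    return input.split("\\s+");
  }

  public static void main(String[] args) {
    // two spaces and a tab between words still give three tokens
    System.out.println(Arrays.toString(tokens("Java  is \t fun")));
    // prints [Java, is, fun]

    // leading whitespace yields an empty first element
    System.out.println(Arrays.toString(tokens("  Java is fun")));
    // prints [, Java, is, fun]

    // trimming first removes the empty element
    System.out.println(tokens("  Java is fun".trim()).length);
    // prints 3
  }
}
```

Trailing whitespace is not a problem, because split() drops trailing empty strings from the array.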



Solution 2 - Counting words in String using StringTokenizer
In this solution, we use the java.util.StringTokenizer class. Its Javadoc describes the single-argument constructor as follows: it constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.


  public static int countWordsUsingStringTokenizer(String sentence) {
    if (sentence == null || sentence.isEmpty()) {
      return 0;
    }
    StringTokenizer tokens = new StringTokenizer(sentence);
    return tokens.countTokens();
  }

You can see that we have not given any explicit delimiter to StringTokenizer, so it uses the default set of delimiters, which is enough to find any whitespace. Since words are separated by whitespace, the number of tokens is equal to the number of words in the given String. See Java How to Program by Deitel for more information on the StringTokenizer class in Java.
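If you also want punctuation to separate words, StringTokenizer has a second constructor that accepts your own delimiter set. The following is only a sketch under that assumption; the class name TokenizerDelimiterDemo and the two-argument countWords() helper are hypothetical names made up for this example.

```java
import java.util.StringTokenizer;

class TokenizerDelimiterDemo {

  // same characters as StringTokenizer's default delimiter set
  static final String DEFAULT_DELIMITERS = " \t\n\r\f";

  // Hypothetical helper: counts tokens using a caller-supplied delimiter set.
  static int countWords(String sentence, String delimiters) {
    if (sentence == null) {
      return 0;
    }
    return new StringTokenizer(sentence, delimiters).countTokens();
  }

  public static void main(String[] args) {
    // with the default delimiters the comma is part of the token
    System.out.println(countWords("YouAre,best", DEFAULT_DELIMITERS));
    // prints 1

    // adding the comma to the delimiter set splits the token in two
    System.out.println(countWords("YouAre,best", DEFAULT_DELIMITERS + ","));
    // prints 2

    // leading, trailing, and repeated delimiters are skipped
    System.out.println(countWords("  Java   and C++ ", DEFAULT_DELIMITERS));
    // prints 3
  }
}
```

Unlike split(), StringTokenizer never returns empty tokens, so leading whitespace does not affect the count.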




Solution 3 - Counting words in String without using library method
Here is the code to count the number of words in a given String without using split() or StringTokenizer. This is what you might have written in C or C++. It iterates through the String's character array and checks every character. It assumes that a word starts with a letter and ends with something that is not a letter. Once it encounters a non-letter after a word, it increments the counter and starts searching again from the next position. A word that ends at the last character of the String has no non-letter after it, so it is counted by a separate check.

 public static int count(String word) {
    if (word == null || word.isEmpty()) {
      return 0;
    }

    int wordCount = 0;

    boolean isWord = false;
    int endOfLine = word.length() - 1;
    char[] characters = word.toCharArray();

    for (int i = 0; i < characters.length; i++) {

      // if the char is a letter, word = true.
      if (Character.isLetter(characters[i]) && i != endOfLine) {
        isWord = true;

        // if char isn't a letter and there have been letters before,
        // counter goes up.
      } else if (!Character.isLetter(characters[i]) && isWord) {
        wordCount++;
        isWord = false;

        // last word of String; if it doesn't end with a non letter, it
        // wouldn't count without this.
      } else if (Character.isLetter(characters[i]) && i == endOfLine) {
        wordCount++;
      }
    }

    return wordCount;
  }
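If the interviewer restricts you to charAt() and length(), and defines a word as anything separated by whitespace rather than as a run of letters, the same idea can be sketched as below. This is an alternative variant and not the method shown above: the class name CharAtWordCounter and the isSpace() helper are made up for this example, and treating carriage return as whitespace is an extra assumption on top of the space, tab, and newline mentioned earlier.

```java
class CharAtWordCounter {

  // Assumption: whitespace is space, tab, newline, or carriage return.
  static boolean isSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
  }

  // Counts words by counting transitions from whitespace to non-whitespace.
  static int count(String text) {
    if (text == null) {
      return 0;
    }
    int words = 0;
    boolean inWord = false;
    for (int i = 0; i < text.length(); i++) {
      if (isSpace(text.charAt(i))) {
        inWord = false;          // the current word, if any, has ended
      } else if (!inWord) {
        inWord = true;           // first character of a new word
        words++;
      }
    }
    return words;
  }

  public static void main(String[] args) {
    System.out.println(count("Java and C++"));   // prints 3
    System.out.println(count("  a \t b\n"));     // prints 2
    System.out.println(count("YouAre,best"));    // prints 1
  }
}
```

Because the counter goes up at the start of each word instead of at its end, no special case is needed for a word that ends at the last character. With this definition 'YouAre,best' is one word, the same answer split() and StringTokenizer give, whereas the letter-based method above reports two.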

If you want to practice more questions of this type, you can also check Cracking the Coding Interview, one of the biggest collections of programming questions and solutions from technical interviews.




Java Program to count the number of words in String

Here is our complete Java program to count the number of words in a given String sentence. It demonstrates all three approaches we have seen so far: using the String.split() method, using StringTokenizer, and writing your own method to count the number of words, all without any third-party library such as Google Guava or Apache Commons.


import java.util.StringTokenizer;

/*
 * Java Program to count number of words in String.
 * This program solves the problem in three ways,
 * by using String.split(), StringTokenizer, and
 * without any of them by just writing own logic
 */
public class Main {

  public static void main(String[] args) {

    String[] testdata = { "", null, "One", "O", "Java and C++", "a b c",
        "YouAre,best" };

    for (String input : testdata) {
      System.out.printf(
          "Number of words in string '%s' using split() is : %d %n", input,
          countWordsUsingSplit(input));
      System.out.printf(
          "Number of words in string '%s' using StringTokenizer is : %d %n",
          input, countWordsUsingStringTokenizer(input));
      System.out.printf("Number of words in string '%s' is : %d %n", input,
          count(input));
    }

  }

  /**
   * Count number of words in given String using split() and regular expression
   * 
   * @param input
   * @return number of words
   */
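  public static int countWordsUsingSplit(String input) {
    if (input == null) {
      return 0;
    }

    // trim first: leading whitespace would otherwise produce an empty
    // first element and inflate the count by one
    String trimmed = input.trim();
    if (trimmed.isEmpty()) {
      return 0;
    }

    String[] words = trimmed.split("\\s+");
    return words.length;
  }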

  /**
   * Count number of words in given String using StringTokenizer
   * 
   * @param sentence
   * @return count of words
   */
  public static int countWordsUsingStringTokenizer(String sentence) {
    if (sentence == null || sentence.isEmpty()) {
      return 0;
    }
    StringTokenizer tokens = new StringTokenizer(sentence);
    return tokens.countTokens();
  }

  /**
   * Count number of words in given String without split() or any other utility
   * method
   * 
   * @param word
   * @return number of words separated by space
   */
  public static int count(String word) {
    if (word == null || word.isEmpty()) {
      return 0;
    }

    int wordCount = 0;

    boolean isWord = false;
    int endOfLine = word.length() - 1;
    char[] characters = word.toCharArray();

    for (int i = 0; i < characters.length; i++) {

      // if the char is a letter, word = true.
      if (Character.isLetter(characters[i]) && i != endOfLine) {
        isWord = true;

        // if char isn't a letter and there have been letters before,
        // counter goes up.
      } else if (!Character.isLetter(characters[i]) && isWord) {
        wordCount++;
        isWord = false;

        // last word of String; if it doesn't end with a non letter, it
        // wouldn't count without this.
      } else if (Character.isLetter(characters[i]) && i == endOfLine) {
        wordCount++;
      }
    }

    return wordCount;
  }

}

Output
Number of words in string '' using split() is : 0 
Number of words in string '' using StringTokenizer is : 0 
Number of words in string '' is : 0 
Number of words in string 'null' using split() is : 0 
Number of words in string 'null' using StringTokenizer is : 0 
Number of words in string 'null' is : 0 
Number of words in string 'One' using split() is : 1 
Number of words in string 'One' using StringTokenizer is : 1 
Number of words in string 'One' is : 1 
Number of words in string 'O' using split() is : 1 
Number of words in string 'O' using StringTokenizer is : 1 
Number of words in string 'O' is : 1 
Number of words in string 'Java and C++' using split() is : 3 
Number of words in string 'Java and C++' using StringTokenizer is : 3 
Number of words in string 'Java and C++' is : 3 
Number of words in string 'a b c' using split() is : 3 
Number of words in string 'a b c' using StringTokenizer is : 3 
Number of words in string 'a b c' is : 3 
Number of words in string 'YouAre,best' using split() is : 1 
Number of words in string 'YouAre,best' using StringTokenizer is : 1 
Number of words in string 'YouAre,best' is : 2 


That's all about how to count the number of words in a Java String. I have shown you three ways to solve this problem: first by using the split() method and a regular expression, second by using the StringTokenizer class, and third by writing the logic directly without split() or StringTokenizer. Depending upon your need, you can use any of these methods. Interviewers usually ask you to do it the third way, so be ready for that. You can also solve more String problems from the Cracking the Coding Interview book to gain more practice and confidence.


Other String based coding problems you may like to solve
  • How to reverse a String in place in Java? (solution)
  • How to find all permutations of a given String in Java? (solution)
  • How to check if two given Strings are Anagram in Java? (solution)
  • How to check if a String contains duplicate characters in Java? (solution)
  • How to find the highest occurring word from a given file in Java? (solution)
  • How to count vowels and consonants in given String in Java? (solution)
  • How to check if given String is palindrome or not in Java? (solution)
  • How to remove duplicate characters from String in Java? (solution)
  • How to reverse words in a given String in Java? (solution)


References
java.lang.String documentation


1 comment:

  1. we can also use direct method str.length();
