Thursday 30 March 2023

How to Implement Fuzzy Search or Approximate String Search in Java

What Is Fuzzy Matching?

Fuzzy matching, also called approximate string matching, is a widely used technique for comparing text.
It identifies two pieces of text, strings, or entries that are approximately similar but not exactly the same.
In other words, fuzzy matching can be applied when no exact match is found for a phrase or sentence in a database. The method looks for candidates whose similarity score meets or exceeds a match-percentage threshold defined by the application.
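To make the threshold idea concrete, here is a minimal, dependency-free sketch. The class name SimilarityThresholdDemo and its methods are made up for illustration, and the score is only one possible definition of similarity (edit distance relative to the longer string), not the only way to measure it.

```java
final class SimilarityThresholdDemo {

    // Levenshtein distance via dynamic programming, keeping only two rows in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity as a percentage of the longer string's length (assumed definition).
    static int similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100; // two empty strings are identical
        }
        return (int) Math.round(100.0 * (longest - levenshtein(a, b)) / longest);
    }

    // A pair counts as a fuzzy match when its score reaches the application-defined threshold.
    static boolean isFuzzyMatch(String a, String b, int thresholdPercent) {
        return similarityPercent(a, b) >= thresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(similarityPercent("Jonathan", "Jonathon")); // 88
        System.out.println(isFuzzyMatch("Jonathan", "Jonathon", 70));  // true
        System.out.println(isFuzzyMatch("Jonathan", "Michael", 70));   // false
    }
}
```

"Jonathan" and "Jonathon" differ by a single substitution in eight characters, so they score 88 and pass a threshold of 70, while "Jonathan" and "Michael" do not.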

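The code below shows one way to implement fuzzy search. It treats two names as a match if any of three checks passes: a Soundex phonetic match, a FuzzyWuzzy token set ratio at or above the threshold, or a Levenshtein-based score at or above the threshold.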


import org.apache.commons.codec.language.Soundex;
import org.apache.commons.text.similarity.LevenshteinDistance;
import me.xdrop.fuzzywuzzy.FuzzySearch;

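// Named FuzzySearchUtil so it does not clash with the imported me.xdrop.fuzzywuzzy.FuzzySearch
public class FuzzySearchUtil {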

    // Returns true if any of the three checks (Soundex, token set ratio, Levenshtein score) passes.
    public boolean searchFuzzy(String searchableName, String targetName) {
        int fuzzySearchNameThreshold = 70; // minimum score (0-100) that counts as a match
        int tokenSetRatio = FuzzySearch.tokenSetRatio(searchableName, targetName);
        double levenshteinScore = levenshteinDistanceMatch(searchableName, targetName);
        System.out.println("FuzzySearch.tokenSetRatio(searchableName, targetName) >> " + tokenSetRatio);
        System.out.println("levenshteinDistanceMatch(searchableName, targetName) >> " + levenshteinScore);
        return soundexMatch(searchableName, targetName)
                || tokenSetRatio >= fuzzySearchNameThreshold
                || levenshteinScore >= fuzzySearchNameThreshold;
    }

   public boolean soundexMatch(String searchableName, String targetName) {
         Soundex soundex = new Soundex();
         return soundex.soundex(searchableName).equals(soundex.soundex(targetName));
   }

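    // Converts the Levenshtein edit distance into a 0-100 similarity score.
    public double levenshteinDistanceMatch(String searchableName, String targetName) {
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        // Divide by the longer length so the score stays within 0-100 and never divides by zero.
        int longerLength = Math.max(searchableName.length(), targetName.length());
        if (longerLength == 0) {
            return 100; // two empty strings are identical
        }
        double distance = levenshteinDistance.apply(searchableName, targetName);
        return 100 - Math.round((distance / longerLength) * 100);
    }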

    public static void main(String[] args) {
        FuzzySearchUtil fuzzySearchUtil = new FuzzySearchUtil();
        boolean flag = fuzzySearchUtil.searchFuzzy("Amn", "Aman Kumar");
        System.out.println("flag >> " + flag);
    }
}
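
To see what the Soundex check in soundexMatch is comparing, here is a simplified, dependency-free sketch of the classic American Soundex rules. The class SimpleSoundex and its encode method are made up for illustration and are not part of Apache Commons Codec, whose implementation may behave differently in edge cases.

```java
import java.util.Locale;

final class SimpleSoundex {

    // Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // Returns a four-character code: the first letter followed by up to three digits.
    static String encode(String input) {
        // Keep only the letters A-Z, upper-cased.
        StringBuilder letters = new StringBuilder();
        for (char c : input.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(letters.charAt(0));
        char last = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char c = letters.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W are skipped and do not separate equal codes
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code); // new consonant sound
            }
            last = code; // a vowel resets 'last', so a repeated code after it is kept
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // R163
        System.out.println(encode("Rupert")); // R163
        System.out.println(encode("Amn"));    // A500
        System.out.println(encode("Aman"));   // A550
    }
}
```

"Robert" and "Rupert" share the code R163, so a Soundex comparison treats them as a match. Under this sketch "Amn" and "Aman" get different codes (A500 and A550), which shows why the phonetic check is combined with the other two scores rather than used alone.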

----------------------- Maven Dependency -------------------------

<dependency>
    <groupId>me.xdrop</groupId>
    <artifactId>fuzzywuzzy</artifactId>
    <version>1.3.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/commons-codec/commons-codec -->
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.10</version>
</dependency>

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.3</version>
</dependency>