Matching imprecise company names in Java
I have a company database. My application receives data that references companies by name, but the name may not exactly match the value in the database. I need to match the incoming data with the company it refers to.
For example, my database might contain a company called "A. B. widgets & Co Ltd.", and my incoming data may refer to it as "ab widgets Limited", "a.b.widgets and CO" or "a B widgets".
Some words in the company name ("A", "B", "widgets") are more important for matching than others ("Co", "Ltd", "Inc", etc.). It is important to avoid mismatches.
The number of companies is small enough that I can keep a map of their names in memory, i.e. I can choose to use Java instead of SQL to find the correct name.
How would you do this with Java?
Solution
You could standardize the format of both your DB/map and the input as much as possible (e.g. convert everything to upper or lower case), and then use the Levenshtein (edit) distance metric, computed with dynamic programming, to score the input against all of your known names.
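As a rough illustration, here is a minimal Java 9+ sketch of that idea. It goes slightly beyond plain case-folding: it also treats "&" as "and", ignores punctuation and spacing, and drops a small, hypothetical list of low-value words (co, ltd, limited, inc, ...) so that the distinctive part of the name dominates the score. The class name `CompanyMatcher`, the `WEAK_WORDS` list and the helper method names are assumptions made for illustration, not an established API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CompanyMatcher {

    // Hypothetical list of low-value words to ignore; tune this for your own data.
    private static final Set<String> WEAK_WORDS =
            Set.of("and", "co", "company", "ltd", "limited", "inc");

    /** Lowercases, treats "&" as "and", splits on non-alphanumerics, drops weak words, joins the rest. */
    static String normalize(String name) {
        String lower = name.toLowerCase(Locale.ROOT).replace("&", " and ");
        StringBuilder sb = new StringBuilder();
        for (String token : lower.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !WEAK_WORDS.contains(token)) {
                sb.append(token);
            }
        }
        return sb.toString();
    }

    /** Dynamic-programming Levenshtein distance, keeping only two rows of the table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    /** Returns the known name whose normalized form is closest to the input (first one wins ties). */
    static String bestMatch(String input, List<String> knownNames) {
        String key = normalize(input);
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String known : knownNames) {
            int d = levenshtein(key, normalize(known));
            if (d < bestDist) {
                bestDist = d;
                best = known;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> known = List.of("A. B. widgets & Co Ltd.", "XYZ Gadgets Inc");
        for (String in : List.of("ab widgets Limited", "a.b.widgets and CO", "a B widgets")) {
            System.out.println(in + " -> " + bestMatch(in, known));
        }
    }
}
```

With these normalization rules all three example inputs reduce to "abwidgets", so each matches the stored name at distance 0. Because the question stresses avoiding mismatches, you would probably also want to reject the best candidate when its distance is above some threshold; a suitable threshold depends on your data.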
You could then have the user confirm the match and, if they don't like it, give them the option to enter that value into your list of known names (on second thought, that might be giving users too much power...).
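One possible shape for the confirmation step is sketched below, assuming Java 8+. It remembers every spelling the user has confirmed, so the same variant is matched instantly next time, and it deliberately does not add new company names on its own when a suggestion is rejected, leaving that decision to the application. `ConfirmingMatcher`, `resolve`, and the `suggest`/`askUser` hooks are hypothetical names; `suggest` would typically return the closest known name by edit distance, and `askUser` would wrap whatever UI you use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class ConfirmingMatcher {

    // Spellings the user has already confirmed, keyed by a lightly normalized form of the input.
    private final Map<String, String> confirmedAliases = new HashMap<>();
    // Hypothetical hook: proposes the closest known name for an input (or null if none).
    private final Function<String, String> suggest;
    // Hypothetical hook: asks the user whether the suggestion is right; true means "accept".
    private final BiPredicate<String, String> askUser;

    ConfirmingMatcher(Function<String, String> suggest, BiPredicate<String, String> askUser) {
        this.suggest = suggest;
        this.askUser = askUser;
    }

    // Lowercase, trim and collapse runs of whitespace so trivial variants share one alias entry.
    private static String key(String input) {
        return input.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    /** Returns the confirmed company name, or empty if the user rejected the suggestion. */
    Optional<String> resolve(String input) {
        String k = key(input);
        String known = confirmedAliases.get(k);
        if (known != null) {
            return Optional.of(known); // confirmed before: no need to ask again
        }
        String suggestion = suggest.apply(input);
        if (suggestion != null && askUser.test(input, suggestion)) {
            confirmedAliases.put(k, suggestion); // remember this spelling for next time
            return Optional.of(suggestion);
        }
        // Rejected: whether a user may add a brand-new known name is left to the caller.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stub suggester and a scripted "user" that accepts everything, standing in for real UI.
        ConfirmingMatcher matcher = new ConfirmingMatcher(
                input -> "A. B. widgets & Co Ltd.",
                (input, suggestion) -> {
                    System.out.println("Is '" + input + "' the same as '" + suggestion + "'? yes");
                    return true;
                });
        System.out.println(matcher.resolve("ab widgets Limited"));
        System.out.println(matcher.resolve("AB Widgets Limited")); // answered from the alias map
    }
}
```

Keeping confirmed aliases separate from the canonical company list means a wrong confirmation only affects that one spelling, and new canonical names can still go through whatever review process you prefer.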