Matching imprecise company names in Java

I have a company database. My application receives data that references companies by name, but the name may not exactly match the value in the database. I need to match the incoming data to the company it refers to.

For example, my database might contain a company called "A. B. widgets & Co Ltd." My incoming data may refer to "ab widgets Limited", "a.b.widgets and CO", or "a B widgets".

Some words in the company name (A, B, Widgets) are more important for matching than others (Co, Ltd, Inc, etc.). It is important to avoid mismatches.

The number of companies is small enough that I can keep a map of their names in memory, i.e. I can choose to use Java rather than SQL to find the correct name.

How would you do this with Java?

Solution

You could standardize the format as much as possible in both your DB/map and the input (e.g. convert everything to upper or lower case), and then use the Levenshtein (edit) distance metric, which is computed with dynamic programming, to score the input against all of your known names.
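A minimal sketch of that idea, under some assumptions that are not part of the answer itself: the class name CompanyNameMatcher, the method names, and the NOISE_WORDS list are all hypothetical, and dropping words such as "co" or "ltd" before scoring is just one possible way to make the less important words count for nothing. The Levenshtein routine is the usual two-row dynamic programming version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class CompanyNameMatcher {

    // Hypothetical list of low-importance words; adjust it for your own data.
    private static final List<String> NOISE_WORDS =
            List.of("ltd", "limited", "co", "company", "inc", "and");

    /**
     * Standardizes a name: lowercases it, treats "&" as "and", replaces
     * punctuation with spaces, drops noise words, and joins what is left
     * without spaces so that "a b" and "ab" compare equal.
     */
    static String normalize(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT)
                .replace("&", " and ")
                .replaceAll("[^a-z0-9 ]", " ");
        StringBuilder key = new StringBuilder();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!NOISE_WORDS.contains(token)) {
                key.append(token);
            }
        }
        return key.toString();
    }

    /** Levenshtein (edit) distance, keeping only two rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Scores the input against every known name; a lower score is a closer match. */
    static Map<String, Integer> scoreAgainst(String input, List<String> knownNames) {
        String key = normalize(input);
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String known : knownNames) {
            scores.put(known, levenshtein(key, normalize(known)));
        }
        return scores;
    }
}
```

With this particular noise-word list, "ab widgets Limited", "a.b.widgets and CO" and "a B widgets" all normalize to the same key as "A. B. widgets & Co Ltd.", so each scores a distance of 0 against it. Since mismatches matter here, it is probably safer to accept only very small distances automatically and treat everything else as unmatched.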

You could then ask the user to confirm the match and, if they don't like it, give them the option to enter that value into your list of known names (on second thought, that may be giving users too much power...).
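One way the confirmation step might look, purely as a sketch: AliasRegistry and its methods are hypothetical names, and the rule that a confirmed alias may only point at a company that is already known is an assumption added here to limit how much power a user gets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class AliasRegistry {

    // Canonical company names as stored in the database.
    private final Set<String> canonicalNames = new HashSet<>();
    // Lookup key (trimmed, lowercased spelling) -> canonical company name.
    private final Map<String, String> aliases = new HashMap<>();

    private static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    /** Registers a company from the database under its own name. */
    void register(String canonical) {
        canonicalNames.add(canonical);
        aliases.put(key(canonical), canonical);
    }

    /** Returns the company for a spelling seen before, if there is one. */
    Optional<String> lookup(String incoming) {
        return Optional.ofNullable(aliases.get(key(incoming)));
    }

    /**
     * Records a spelling the user has confirmed. The target must already be
     * a known company, so users can add aliases but not new companies.
     */
    void confirmAlias(String incoming, String canonical) {
        if (!canonicalNames.contains(canonical)) {
            throw new IllegalArgumentException("Unknown company: " + canonical);
        }
        aliases.put(key(incoming), canonical);
    }
}
```

Checking lookup first means a spelling that has been confirmed once never needs to be scored or confirmed again.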
