
---------- Original Message ----------
I'm working on a problem. I found buddy wearing ACM T-Shirt(World final, so cool!). So I post it here thinking maybe you can give me some hints. It's described as following: Given a string and a pool of strings, find which string in pool is best matched the given one. I know I can use dynamic programming to find out the largest common substring of two strings.
But how to apply on my problem?
Hi, I've looked at something like this before in the past. The terms you want to look for are "string edit distance" or "Levenshtein" on a google search. Basically it gives a score for how many additions and deletions are required to turn one string into another.
The pool has ~2 million strings.... Each string should be around 1k in length. so I think comparing one by one is not a good method. Any opinion? Sincerely, GUO Zhijun jamguo(a)sh163.net