[wlug] Re: String matching problem

28 Jan 2005


      ---------- Original Message ----------
...
...
I'm working on a problem.  I found buddy wearing ACM T-Shirt(World
final, so cool!).  So I post it here thinking maybe you can give me
some hints.  It's described as following: Given a string and a pool
of strings, find which string in pool is best matched the given one.
I know I can use dynamic programming to find out the largest common
substring of two strings.
...
But how to apply on my problem?
Hi,
I've looked at something like this before in the past. The terms you
want to look for are "string edit distance" or "Levenshtein" on
a google search. Basically it gives a score for how many additions
and deletions are required to turn one string into another.
The pool has ~2 million strings.... Each string should be around 1k in length.  so I think comparing one by one is not a good method.  Any opinion? 

Sincerely,
    GUO Zhijun
jamguo(a)sh163.net