|SOUNDEX String Comparison|
In the late nineteenth century, United States census officials faced a dilemma. During the process of counting the huddled masses, our public servants created a huge paperwork trail that the law required them to preserve for future historians. With amazing forethought, they realized that people searching for records might not know the exact spelling of their ancestor's name. Was it Smith or Smythe? Chapple, Chapel or Chapelle?
To ease these searches, census officials turned to the Soundex phonetic filing system. This system uses a simple phonetic algorithm to reduce each name to a four character alphanumeric code. The first letter of the code corresponds to the first letter of the last name. The remainder of the code consists of three digits derived from the syllables of the word.
Largely unused outside of the halls of
government and genealogy, the Soundex system is making a comeback in modern
databases. Database developers have long struggled with the problem of
matching words that might not look alike, but actually sound alike. Let's
take a look at the following database table:
You'll notice that you can now detect phonetic similarities in the names simply by looking at the numeric values returned by the SOUNDEX function. Fortunately, computers are much better at comparing numbers than comparing names. To ease this task, we're provided with the DIFFERENCE function to compare strings based upon their Soundex values. Microsoft's function returns an integer result ranging in value from 1 (least similar) to 4 (most similar).
Let's suppose overheard a male employee talking in the hallway but didn't quite catch his name. You might overhear a name that sounded like "Ann" but you're positive it was a male. The DIFFERENCE function is ideal for this type of situation. We'd probably begin by specifying a threshold value of 4 to limit the number of results returned. Here's the query:
SELECT FirstName, DIFFERENCE(FirstName,'Ann') AS 'Difference' FROM Employees WHERE DIFFERENCE(FirstName,'Ann')=4And the results:
FirstName Difference ---------- ----------- Ann 4 Anne 4Unfortunately, none of the results were male names. Let's try a lower threshold of 2:
SELECT FirstName, DIFFERENCE(FirstName,'Ann') AS 'Difference' FROM Employees WHERE DIFFERENCE(FirstName,'Ann')>=2And the results:
FirstName Difference ---------- ----------- Ann 4 Andrew 2 Janet 2 Laura 2 Anne 4
Hmmm... looks like Andrew might be our man! For more information on the use of SOUNDEX, consult the documentation for your favorite database platform.