I maintain a music database in MySQL. How do I return results stored under e.g. 'Tiësto' when people search for 'Tiesto'?
All the data is full-text indexed, if that makes any difference.
I'm already employing a combination of Levenshtein in PHP and REGEXP in SQL - not in trying to solve this problem, but just for increased searchability in general.
PHP:
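// Build fuzzy variants of $word: one inserted, deleted or substituted character
// per position. Named fuzzyVariants() because PHP already has a built-in
// levenshtein() and function names are case-insensitive, so it cannot be redeclared.
// '_' is the intended single-character placeholder (note: REGEXP treats '_'
// literally; its any-character wildcard is '.').
function fuzzyVariants($word) {
    $words = array();
    for ($i = 0; $i < strlen($word); $i++) {
        $words[] = substr($word, 0, $i) . '_' . substr($word, $i);     // insertion
        $words[] = substr($word, 0, $i) . substr($word, $i + 1);       // deletion
        $words[] = substr($word, 0, $i) . '_' . substr($word, $i + 1); // substitution
    }
    $words[] = $word . '_'; // insertion at the end
    return $words;
}
// NOTE: the search term goes into the SQL unescaped here; escape or bind it in real code.
$fuzzyartist = fuzzyVariants($_POST['searchartist']);
$searchimplode = "'".implode("', '", $fuzzyartist)."'";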
MySQL:
SELECT *
FROM new_track_database
WHERE artist REGEXP concat_ws('|', $searchimplode);
To add, I frequently perform character set conversions and string sanitisation in PHP, but these have always gone the OTHER way: standardising non-Latin characters. I can't get my head around performing the opposite process, and only in certain circumstances, depending on the data I've got stored.
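To illustrate the direction I'm used to, here is a minimal sketch of that kind of standardising. It is not my production code: the function name foldAccents and the tiny character map are made up for this example, and it assumes the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper: fold a few accented Latin letters to plain ASCII.
// The map is deliberately small and illustrative, not exhaustive.
function foldAccents(string $text): string
{
    $map = [
        'ä' => 'a', 'á' => 'a', 'à' => 'a',
        'ë' => 'e', 'é' => 'e', 'è' => 'e',
        'ï' => 'i', 'í' => 'i',
        'ö' => 'o', 'ó' => 'o',
        'ü' => 'u', 'ú' => 'u',
    ];
    // strtr() with an array replaces each key by its value; the keys are
    // matched as UTF-8 byte sequences, so no multibyte functions are needed.
    return strtr($text, $map);
}

echo foldAccents('Tiësto'), "\n"; // prints "Tiesto"
```

What I need is the equivalent match in the other direction: the stored value stays as 'Tiësto' and the plain search term 'Tiesto' still finds it.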