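You have several options.
For each option, you should probably normalize ("massage") the album names before comparing them. You can do this by stripping punctuation, sorting the words in the album name alphabetically (in some cases), and so on.
In each case, if you remove one of the album names from the array when a comparison matches, your comparison is order sensitive unless you have a rule for which album name to remove. So when two album names are compared and found to be "similar", it probably makes sense to always remove the longer one.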
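The normalization step described above could be sketched as follows. The helper name `normalizeAlbumName` and its `$sortWords` flag are illustrative assumptions, not part of the original answer; unlike the snippets further down, this sketch keeps single spaces between words.

```php
<?php
// Hypothetical helper: normalize an album name so that punctuation,
// letter case, and (optionally) word order do not affect comparisons.
function normalizeAlbumName(string $name, bool $sortWords = true): string
{
    // Lowercase, then drop everything except ASCII letters, digits, and spaces.
    $clean = preg_replace('/[^a-z0-9 ]/', '', strtolower($name));
    // Split on runs of spaces, discarding empty pieces.
    $words = preg_split('/ +/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    if ($sortWords) {
        sort($words, SORT_STRING);
    }
    return implode(' ', $words);
}

echo normalizeAlbumName('Band Of Horses - "Marry song"'), "\n";
// prints: band horses marry of song
```

Sorting the words makes the comparison insensitive to word order, at the cost of treating genuinely different titles that share the same words as identical.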
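The main comparison options are:
A simple substring comparison. Check whether one album name is contained in another. Strip punctuation first and compare case-insensitively (see my second code snippet below).
Use levenshtein() to check the similarity of the album names. This string comparison is more efficient than similar_text(). You should strip punctuation and put the words in alphabetical order first.
Use similar_text() to check the similarity of the album names. I had the most luck with this method. In fact, I was able to get it to select exactly the album names you wanted (see my first code snippet below).
There are various other string comparison functions you could also use, including soundex() and metaphone().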
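To get a feel for how the functions listed above behave, here is a small sketch that runs each of them on sample strings. The strings are made up for illustration and are assumed to be already normalized; no particular threshold is implied.

```php
<?php
// Two already-normalized sample strings (illustrative, not the question's data).
$a = 'band horses laredo of';
$b = 'band by horses laredo of on q tv';

// levenshtein(): edit distance; lower means more similar.
$distance = levenshtein($a, $b);

// similar_text(): returns the number of matching characters and
// writes a similarity percentage into the third argument.
$matching = similar_text($a, $b, $percent);

// soundex() / metaphone(): phonetic keys; equal keys suggest the
// inputs sound alike.
$sameSoundex   = soundex('Marry') === soundex('Mary');
$sameMetaphone = metaphone('Marry') === metaphone('Mary');

printf("levenshtein distance: %d\n", $distance);
printf("similar_text: %d matching chars, %.1f%%\n", $matching, $percent);
printf("soundex keys equal: %s\n", $sameSoundex ? 'yes' : 'no');
printf("metaphone keys equal: %s\n", $sameMetaphone ? 'yes' : 'no');
```

Note that levenshtein() yields a distance while similar_text() yields a similarity, so any threshold has to be chosen separately for each.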
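Anyway, here are 2 solutions.
The first uses similar_text(), but it only computes the similarity after all punctuation has been stripped and the words have been lowercased and put in alphabetical order. The downside is that you have to play with the similarity threshold. The second uses a simple case-insensitive substring test after all punctuation and whitespace have been stripped.
Both code snippets work by using array_walk() to run the compare() function on each album in the array. Then, inside compare(), I use foreach() to compare the current album with all the other albums. There is plenty of room to make things more efficient.
Note that I should be using the third argument of array_walk() to pass the array by reference; can someone help me do that? The current workaround is a global variable: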
// $array is assumed to hold the album names from the question.
function compare($value, $key)
{
    global $array; // Should use the 3rd argument of array_walk() instead

    // Strip punctuation, lowercase, sort the words, then collapse to one string
    $value = strtolower(preg_replace('/[^a-zA-Z0-9 ]/', '', $value));
    $value = explode(' ', $value);
    sort($value);
    $value = implode('', $value);
    $value = preg_replace('/\s/', '', $value); // Remove any leftover whitespace

    foreach ($array as $key2 => $value2)
    {
        if ($key !== $key2)
        {
            // Normalize the album being compared to in the same way
            $value2 = strtolower(preg_replace('/[^a-zA-Z0-9 ]/', '', $value2));
            $value2 = explode(' ', $value2);
            sort($value2);
            $value2 = implode('', $value2);
            $value2 = preg_replace('/\s/', '', $value2);

            // $sim receives the similarity as a percentage
            similar_text($value, $value2, $sim);
            if ($sim > 69)
            {
                // Remove the longer album name
                unset($array[(strlen($value) > strlen($value2)) ? $key : $key2]);
            }
        }
    }
}

// Caution: the PHP manual says that changing the array's structure (e.g. with
// unset) from inside an array_walk() callback is undefined behavior.
array_walk($array, 'compare');
$array = array_values($array);
print_r($array);
The output of the above is:
Array
(
[0] => Band of Horses - Is There a Ghost
[1] => Band Of Horses - No One's Gonna Love You
[2] => Band of Horses - The Funeral
[3] => Band of Horses - Laredo
[4] => Band of Horses - "The Great Salt Lake" Sub Pop Records
[5] => Band of Horses perform Marry Song at Tromso Wedding
[6] => Band of Horses, On My Way Back Home
[7] => Band of Horses - cigarettes wedding bands
[8] => Band Of Horses - I Go To The Barn Because I Like The
[9] => Our Swords - Band of Horses
[10] => Band of Horses - Monsters
)
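Notice that the short version of "Marry song" is missing, so it must have been a false positive against something else, since the long version is still in the list. Still, these are exactly the album names you wanted.
The substring method: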
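One way to avoid both the global variable and the unset-inside-array_walk() pattern is to use plain loops and collect the keys to drop. The following is only a sketch: the names `collapseName` and `dedupeSimilar` are hypothetical, the sample list is made up, and because pairs are visited in a different order it is not guaranteed to reproduce the output above on the question's data.

```php
<?php
// Hypothetical helper: the same normalization as in compare() above.
function collapseName(string $name): string
{
    $name = strtolower(preg_replace('/[^a-zA-Z0-9 ]/', '', $name));
    $words = explode(' ', $name);
    sort($words);
    return preg_replace('/\s/', '', implode('', $words));
}

// Hypothetical function: returns the list with near-duplicates removed.
// $threshold is the similar_text() percentage above which two names
// are treated as the same album.
function dedupeSimilar(array $albums, float $threshold = 69.0): array
{
    $normalized = array_map('collapseName', $albums);
    $keys = array_keys($albums);
    $removed = [];

    foreach ($keys as $i => $k1) {
        if (isset($removed[$k1])) {
            continue;
        }
        // Only look at later entries so each pair is compared once.
        foreach (array_slice($keys, $i + 1) as $k2) {
            if (isset($removed[$k2])) {
                continue;
            }
            similar_text($normalized[$k1], $normalized[$k2], $sim);
            if ($sim > $threshold) {
                // Rule: drop the longer name (the later one on a tie).
                $drop = strlen($normalized[$k1]) > strlen($normalized[$k2]) ? $k1 : $k2;
                $removed[$drop] = true;
                if ($drop === $k1) {
                    break; // the current album is gone; stop comparing it
                }
            }
        }
    }
    return array_values(array_diff_key($albums, $removed));
}

$sample = [
    'Band of Horses - Laredo',
    'Band Of Horses - "Laredo"',
    'Band of Horses - The Funeral',
];
print_r(dedupeSimilar($sample));
```

Normalizing every name once up front also removes the repeated work inside the inner loop, which is one of the efficiency gains hinted at earlier.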
// $array is assumed to hold the album names from the question.
function compare($value, $key)
{
    // I should be using &$array as a 3rd variable.
    // For some reason I couldn't get that to work, so I do this instead.
    global $array;

    // Take the current album name and remove all punctuation and white space
    $value = preg_replace('/[^a-zA-Z0-9]/', '', $value);

    // Compare the current album to all others
    foreach ($array as $key2 => $value2)
    {
        if ($key !== $key2)
        {
            // Collapse the album being compared to
            $value2 = preg_replace('/[^a-zA-Z0-9]/', '', $value2);
            $subject = $value2;
            // Safe as a pattern: $value now contains only letters and digits
            $pattern = '/' . $value . '/i';

            // If there's a match, remove the album being compared to
            if (preg_match($pattern, $subject))
            {
                unset($array[$key2]);
            }
        }
    }
}

// Caution: the PHP manual says that changing the array's structure (e.g. with
// unset) from inside an array_walk() callback is undefined behavior.
array_walk($array, 'compare');
$array = array_values($array);
echo "<pre>";
print_r($array);
echo "</pre>";
For your example strings, the output of the above is (it includes the 2 you didn't want shown):
Array
(
[0] => Band of Horses - Is There a Ghost
[1] => Band Of Horses - No One's Gonna Love You
[2] => Band of Horses - The Funeral
[3] => Band of Horses - Laredo
[4] => Band of Horses - "The Great Salt Lake" Sub Pop Records
[5] => Band of Horses perform Marry Song at Tromso Wedding // <== Oops
[6] => 'Laredo' by Band of Horses on Q TV // <== Oops
[7] => Band of Horses, On My Way Back Home
[8] => Band of Horses - cigarettes wedding bands
[9] => Band Of Horses - I Go To The Barn Because I Like The
[10] => Our Swords - Band of Horses
[11] => Band Of Horses - "Marry song"
[12] => Band of Horses - Monsters
)
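The substring test can also be written without a global variable or a regular expression by using stripos(). This is a sketch under assumptions: `dedupeBySubstring` is a hypothetical name and the sample list is made up. Like the snippet above, it is sensitive to word order, which is consistent with the two "Oops" entries surviving: their collapsed names do not contain any other collapsed name verbatim.

```php
<?php
// Hypothetical function: drop any album whose collapsed name contains
// another album's collapsed name (case-insensitive).
function dedupeBySubstring(array $albums): array
{
    // Remove all punctuation and whitespace once, up front.
    $collapsed = array_map(
        function (string $name): string {
            return preg_replace('/[^a-zA-Z0-9]/', '', $name);
        },
        $albums
    );

    $removed = [];
    foreach ($collapsed as $k1 => $needle) {
        // Skip empty names (they would match everything) and removed entries.
        if ($needle === '' || isset($removed[$k1])) {
            continue;
        }
        foreach ($collapsed as $k2 => $haystack) {
            if ($k1 === $k2 || isset($removed[$k2])) {
                continue;
            }
            if (stripos($haystack, $needle) !== false) {
                $removed[$k2] = true; // remove the album that contains the other
            }
        }
    }
    return array_values(array_diff_key($albums, $removed));
}

$sample = [
    'Band of Horses - Laredo',
    'Band Of Horses - "Laredo" (live)',
    'Band of Horses - Monsters',
];
print_r(dedupeBySubstring($sample));
```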