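【Question title】: MYSQL PHP Combine Multiple Rows into One Based on a Duplicate Column
【Posted】: 2012-12-28 02:46:29
【Question】:

I have an email list that contains a lot of duplicate data, and I want to merge the rows that have duplicate data in certain columns.

Here is my table: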

autoid,title,lastname,firstname,middlename,prefix,
fulladdress,address1,address2,
city,state,zip,country,county,phone1,phone2,email,id, ts

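I want to merge duplicate rows based on email and phone1. If both of these are identical in two rows, I want to merge the rows, fill in any blanks, and then delete the second row. Data in the row with the lower autoid takes precedence over data in the row with the higher autoid.

It would be great if this could be done with a single MySQL query, but if we have to use PHP, that is fine too.

【Comments】:

  • I'm not sure there is a MySQL query that will merge them automatically, but PHP is definitely an option. I'll let the MySQL gurus take a look first; if nothing comes of it, let me know and we will work out a PHP solution for you.
  • Ok SnareChops, let's write some workable PHP code.
  • @ToddWelch, when addressing someone, put an @ in front of their name. In this case SnareChops will be notified anyway, since he is the only one who has commented so far, but in general he might not be. See meta.stackexchange.com/a/43020/188688

Tags: php mysql merge duplicates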


【Solution 1】:

You can combine multiple rows into one using something like

GROUP BY email, phone1

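If you just plug this in, you get an arbitrary one of the rows in each group. If you want values to take precedence over NULL fields, you can use an aggregate function such as MIN: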

SELECT MIN(title), MIN(lastname), …
FROM tableName
GROUP BY email, phone1

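However, this decides separately for each column which value to take. A query that combines rows in exactly the way you describe would be rather tricky in MySQL. You could have a query that lists all rows ordered by the matching columns and then by autoid descending, and use user variables to fill in the gaps. But avoiding filling gaps across non-matching rows would be difficult, so one subquery per matching pair might work better. Given how such a query would perform and read, you are probably better off with a PHP solution overall.

In PHP, things should be fairly simple: query the database using

ORDER BY email, phone1, autoid ASC

Then on the PHP side, for each row read from the database, check whether it matches the previously read row in those two columns. If so, iterate over the columns and fill in NULLs as you go. I'm not much of a PHP coder these days, so someone else may be better suited to write a code snippet for this.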
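The loop described above could look roughly like the following. This is only a sketch, not tested against a real table: the function name mergeSortedRows is made up for illustration, rows are assumed to be associative arrays already sorted by email, phone1, autoid ASC, and both NULL and the empty string are treated as blank. It also assumes that rows whose email and phone1 are both blank should be left alone rather than grouped together.

```php
<?php
/**
 * Merge consecutive rows that share the same email and phone1.
 * Assumes $rows is sorted by email, phone1, autoid ASC, so the first
 * row of each group is the one with the lowest autoid (the survivor).
 * Returns the surviving rows and the autoid values that can be deleted.
 */
function mergeSortedRows(array $rows): array
{
    $merged = [];     // one surviving row per (email, phone1) pair
    $deleteIds = [];  // autoid values of rows folded into a survivor
    $last = -1;       // index of the current survivor in $merged

    foreach ($rows as $row) {
        // Assumption: a row with no email and no phone1 never joins a group.
        $hasKey = ($row['email'] ?? '') !== '' || ($row['phone1'] ?? '') !== '';
        $sameGroup = $hasKey
            && $last >= 0
            && $merged[$last]['email'] === $row['email']
            && $merged[$last]['phone1'] === $row['phone1'];

        if (!$sameGroup) {
            $merged[] = $row;
            $last++;
            continue;
        }

        // Fill only the gaps; values already present on the survivor win.
        foreach ($row as $column => $value) {
            if ($column === 'autoid') {
                continue;
            }
            $current = $merged[$last][$column] ?? null;
            $currentBlank = $current === null || $current === '';
            $valueBlank = $value === null || $value === '';
            if ($currentBlank && !$valueBlank) {
                $merged[$last][$column] = $value;
            }
        }
        $deleteIds[] = $row['autoid'];
    }

    return ['merged' => $merged, 'deleteIds' => $deleteIds];
}
```

The result can then be written back with one UPDATE per surviving row and a DELETE for the collected autoid values. Keeping the merge itself free of database calls makes it easy to try out on a handful of sample rows first.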

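【Comments】:

  • Thank you very much for the information, MvG. I think I'll try to hammer out some PHP code with SnareChops from above.
【Solution 2】:

We really don't like to hand out complete code solutions on StackOverflow; usually we help you write your own. But I'm not sure I can explain all the steps needed to solve this without writing the code myself, so here is some starter code.

This is raw, untested code.

First make a copy of your existing table and do everything on the copy until we know this code won't damage or drop your existing data. Once it has finished and you have verified that it works correctly, apply it to the real table.

Create the copy with the following commands: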

CREATE TABLE EmailListCopy LIKE EmailList; 
INSERT EmailListCopy SELECT * FROM EmailList;

PHP code:

<?php

//This script will first query the table for ALL results, store them in arrays, 
//then loop through the arrays to search the table for duplicates using individual
//sql queries and then compare the results, update the entry as needed and delete the
//duplicate row. THIS CODE IS NOT OPTIMIZED!! DO NOT RUN CONTINUOUSLY!! This should be
//used for OCCASIONAL merging ONLY!! i.e. Once-a-day or once-a-week etc...
$result="";
$duplicatesFound=0; //Counter must start at 0 before it is incremented below
//Setup arrays to hold the original query information
$autoidArray = array();
$titleArray = array();
$lastnameArray = array();
$firstnameArray = array();
$middlenameArray = array();
$prefixArray = array();
$fulladdressArray = array();
$address1Array = array();
$address2Array = array();
$cityArray = array();
$stateArray = array();
$zipArray = array();
$countryArray = array();
$countyArray = array();
$phone1Array = array();
$phone2Array = array();
$emailArray = array();
$idArray = array();
$tsArray = array();
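//mysqli_connect() expects host, user, password, database (in that order).
//$hostname, $username, $password, $dbname and $table are assumed to be defined before this point.
$link=mysqli_connect($hostname,$username,$password,$dbname);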
if(mysqli_connect_errno())
{
    $result="Error connecting to database: ".mysqli_connect_error();
}
else
{
    $stmt=mysqli_prepare($link,"SELECT autoid,title,lastname,firstname,middlename,prefix,fulladdress,address1,address2,city,state,zip,country,county,phone1,phone2,email,id,ts FROM " . $table);
    mysqli_stmt_execute($stmt);
    mysqli_stmt_bind_result($stmt, $autoid, $title, $lastname, $firstname, $middlename, $prefix, $fulladdress, $address1, $address2, $city, $state, $zip, $country, $county, $phone1, $phone2, $email, $id, $ts);
    if(mysqli_stmt_errno($stmt))
    {
        $result="Error executing SQL statement: ".mysqli_stmt_error($stmt);
    }
    else
    {
        mysqli_stmt_store_result($stmt);
        if(mysqli_stmt_num_rows($stmt)==0)
        {
            $result="0 rows returned (Empty table)";
        }
        else 
        {
            while(mysqli_stmt_fetch($stmt))
            {
                //Load results into arrays
                array_push($autoidArray, $autoid);
                array_push($titleArray, $title);
                array_push($lastnameArray, $lastname);
                array_push($firstnameArray, $firstname);
                array_push($middlenameArray, $middlename);
                array_push($prefixArray, $prefix);
                array_push($fulladdressArray, $fulladdress);
                array_push($address1Array, $address1);
                array_push($address2Array, $address2);
                array_push($cityArray, $city);
                array_push($stateArray, $state);
                array_push($zipArray, $zip);
                array_push($countryArray, $country);
                array_push($countyArray, $county);
                array_push($phone1Array, $phone1);
                array_push($phone2Array, $phone2);
                array_push($emailArray, $email);
                array_push($idArray, $id);
                array_push($tsArray, $ts);
            }
        }
        mysqli_stmt_free_result($stmt);
    }
    for($i=0;$i<count($emailArray);$i++)
    {
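        //Match on email OR phone1, but never on a blank value: otherwise every row with an
        //empty email (or phone1) would count as a duplicate of every other such row.
        $duplicatestmt=mysqli_prepare($link,"SELECT autoid,title,lastname,firstname,middlename,prefix,fulladdress,address1,address2,city,state,zip,country,county,phone1,phone2,email,id,ts FROM " . $table . " WHERE (email<>'' AND email=?) OR (phone1<>'' AND phone1=?)");
        //Bind both values as strings; phone numbers are not reliably integers
        mysqli_stmt_bind_param($duplicatestmt, 'ss', $emailArray[$i], $phone1Array[$i]);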
        mysqli_stmt_execute($duplicatestmt);
        mysqli_stmt_bind_result($duplicatestmt, $autoid, $title, $lastname, $firstname, $middlename, $prefix, $fulladdress, $address1, $address2, $city, $state, $zip, $country, $county, $phone1, $phone2, $email, $id, $ts);
        if(mysqli_stmt_errno($duplicatestmt))
        {
            $result="Error executing SQL statement: ".mysqli_stmt_error($duplicatestmt);
        }
        else
        {
            mysqli_stmt_store_result($duplicatestmt);
            if(mysqli_stmt_num_rows($duplicatestmt)==0)
            {
                //No duplicate entry found, continue with the next row
                echo "<p>No Duplicate Found</p>";
            }
            else 
            {
                while(mysqli_stmt_fetch($duplicatestmt))
                {
                    //Found a matching row (this can also be the search row itself)
                    echo "<p>Duplicate Found</p>";
                    if($autoid > $autoidArray[$i])
                    {
                        if($email=="" && $phone1=="")
                        {
                            echo "<p>Both email and phone1 are empty. Skipping...</p>";
                        }
                        elseif(isset($deletedIds[$autoidArray[$i]]))
                        {
                            //The row used for this search was itself merged away earlier; do not merge into it
                            echo "<p>Search row was already deleted. Skipping...</p>";
                        }
                        else
                        {
                        $duplicatesFound++;
                        //The autoid of the duplicate just found is greater than the autoid of the
                        //row used to find it (the older row). Therefore update the older row and
                        //remove the duplicate.
                        //
                        //For each column: if the lower-autoid row is blank and the duplicate has a value,
                        //copy that value into the same column of the lower-autoid row.
                        //NOTE: $table should point at the copy (EmailListCopy) while testing.
                        $fetched=compact('title','lastname','firstname','middlename','prefix','fulladdress','address1','address2','city','state','zip','country','county','phone1','phone2','email','id','ts');
                        foreach($fetched as $column => $value)
                        {
                            if(${$column."Array"}[$i]=="" && $value!="")
                            {
                                //Prepared statement, so quotes in the data cannot break the query
                                $update=mysqli_prepare($link,"UPDATE ".$table." SET `".$column."`=? WHERE autoid=?");
                                mysqli_stmt_bind_param($update,'si',$value,$autoidArray[$i]);
                                mysqli_stmt_execute($update);
                                mysqli_stmt_close($update);
                                //Remember the filled value so a later duplicate cannot overwrite it
                                ${$column."Array"}[$i]=$value;
                            }
                        }

                        //Now that it has been updated, delete the duplicate entry
                        $delete=mysqli_prepare($link,"DELETE FROM ".$table." WHERE autoid=?");
                        mysqli_stmt_bind_param($delete,'i',$autoid);
                        mysqli_stmt_execute($delete);
                        mysqli_stmt_close($delete);
                        $deletedIds[$autoid]=true;
                        echo "<p>Duplicate merged and deleted (autoid ".$autoid.")</p>";
                        }
                    }
                    else
                    {
                        //The autoid found is not higher than the one used for the search: it is either the
                        //search row itself or an older row that was already handled. Skip it.
                        echo "<p>Duplicate not to be updated</p>";
                    }

                }
                $result="Merged ".$duplicatesFound." rows.";
            }
            mysqli_stmt_free_result($duplicatestmt);
        }
    }
    //$duplicatestmt only exists if the loop above ran at least once
    if(isset($duplicatestmt)){mysqli_stmt_close($duplicatestmt);}
    mysqli_stmt_close($stmt);
    mysqli_close($link);
}
echo $result;
?>
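
The per-column updates in the script can also be collapsed into a single statement per merged row. The sketch below only builds the SQL text and its parameter list, so it can be checked without a database. buildFillUpdate is a hypothetical helper, not part of the script above, and it assumes $table and $columns come from trusted code (they are placed into the SQL directly), with NULL and the empty string both counting as blank.

```php
<?php
/**
 * Build one parameterised UPDATE that copies values from $duplicate into
 * the blank columns of $keeper. Returns null when there is nothing to fill.
 * $table and $columns must come from trusted code, never from user input.
 */
function buildFillUpdate(string $table, array $keeper, array $duplicate, array $columns): ?array
{
    $assignments = [];
    $params = [];

    foreach ($columns as $column) {
        $kept = $keeper[$column] ?? '';          // NULL counts as blank
        $candidate = $duplicate[$column] ?? '';
        if ($kept === '' && $candidate !== '') {
            $assignments[] = "`$column` = ?";
            $params[] = $candidate;
        }
    }

    if ($assignments === []) {
        return null;
    }

    // The keeper's autoid is always the last bound parameter.
    $params[] = $keeper['autoid'];
    $sql = "UPDATE `$table` SET " . implode(', ', $assignments) . " WHERE autoid = ?";

    return ['sql' => $sql, 'params' => $params];
}
```

The returned sql string would go to mysqli_prepare() and the params to mysqli_stmt_bind_param(), with one type character per parameter. This cuts the number of round trips from one per column to one per merged row.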

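【Comments】:

  • After fixing a few syntax errors I was able to get the script to run, but it deleted all but 7 of the 23,000 records. After thinking about it some more, I realized I made a mistake in my original post. For some records I have only an email or only a phone number, and some may have neither. The script should check whether phone1 or email is the same and merge if so. If both are blank, leave the row alone.
  • @ToddWelch I added a few lines below echo "<p>Found Duplicate</p>"; that check whether the email and phone1 fields are both empty and skip the row if so. Be sure to close the else with a } below the SQL delete command. Does the code work as expected after the change?
  • You know you can use COUNT in SQL? And thereby remove almost all of the script?
  • @Dave Sounds good. I'm not a very knowledgeable SQL person, so, per the OP's request (an SQL or PHP answer), I let the SQL folks have a go at it first. Then, when the OP reached out to me for a PHP solution, I answered it the only way I know how at my current level of knowledge. What would COUNT in SQL help eliminate?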
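
The revised matching rule from the comments (merge when email or phone1 is the same, never match on a blank value) can be written as a small predicate. This is only a sketch under a few assumptions: isBlank and rowsMatch are made-up names, rows are associative arrays, and values are compared case-insensitively after trimming, which may or may not match the collation of the real table.

```php
<?php
/** True for NULL, the empty string, and whitespace-only strings. */
function isBlank($value): bool
{
    return $value === null || trim((string) $value) === '';
}

/**
 * Two rows match when a non-blank email or a non-blank phone1 is the same
 * in both. Blank values never match, so rows without email and phone1
 * are left alone.
 */
function rowsMatch(array $a, array $b): bool
{
    foreach (['email', 'phone1'] as $key) {
        $left = $a[$key] ?? null;
        $right = $b[$key] ?? null;
        if (isBlank($left) || isBlank($right)) {
            continue;
        }
        // Assumption: comparison ignores case and surrounding whitespace.
        if (strcasecmp(trim((string) $left), trim((string) $right)) === 0) {
            return true;
        }
    }
    return false;
}
```

With a rule like this, two rows that both have an empty email are no longer treated as duplicates of each other. Matching on empty values is the likely reason the first run removed nearly every record.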