sqlite3 in Python 2.7: Comparing DB values with CSV values

【Question title】: sqlite3 in python 2.7: Comparing DB values with CSV values
【Posted】: 2017-07-17 13:25:19
【Question description】:

I have this code, which reads the first two columns of a CSV file and appends a two-element list to itemids for each row.

            with open(self.selected_file[0], 'rb') as csv_file:
                itemids = []
                csv_reader = csv.reader(csv_file, delimiter=',', quotechar="\"")
                for row in csv_reader:
                    itemids.append([row[0], row[1]])

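I have a database that already contains two tables, one for each kind of item. I want to check every row of the CSV file (each row is a pair of strings). If both strings are unique with respect to their own table (meaning row[0] is not yet in the first items table of my database and row[1] is not yet in the second items table), the values should be added to their respective tables. I tried the following: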

        for item in itemids:
            first_itemids = db_cursor.execute('''SELECT itemid FROM items_one''').fetchall()
            second_itemids = db_cursor.execute('''SELECT itemid from items_two''').fetchall()
            try:
                if not item[0] in first_itemids and not item[1] in second_itemids:
                    db_cursor.execute('''INSERT INTO items_one(itemid) VALUES (?)''', (item[0], ))
                    db_cursor.execute('''INSERT INTO items_two(itemid) VALUES (?)''', (item[1], ))
                    db_conn.commit()

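However, the check if not item[0] in first_itemids and not item[1] in second_itemids always evaluates to true, so duplicate, non-unique items are being added. I also tried the opposite approach, if item[0] in first_itemids or item[1] in second_itemids: pass, but that failed as well.

Note: these are not my actual variable names. I don't know whether identical column names in different tables of the same database could cause problems, but mine are not identical anyway; I only changed them here for readability.

Edit:

I also tried checking each CSV row before appending it to my list of items, like this: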

            with open(self.selected_file[0], 'rb') as csv_file:
                itemids = []
                csv_reader = csv.reader(csv_file, delimiter=',', quotechar="\"")
                first_itemids = db_cursor.execute('''SELECT itemid FROM items_one''').fetchall()
                second_itemids = db_cursor.execute('''SELECT itemid from items_two''').fetchall()
                for row in csv_reader:
                    if row[0] not in first_itemids and row[1] not in second_itemids:
                        itemids.append([row[0], row[1]])

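and then simply inserting the values from the list into the database. That did not work either.

【Question discussion】:

It looks like you run the same SELECT queries for every item in the CSV file. I would suggest doing the selects before the for loop.
Is it possible that an item is present in one table but not in the other? That could lead to duplicate entries. I would also make sure the values really match exactly; whitespace can make values differ.
You are right, I do run the same SELECT queries for every item. That way, duplicates within the CSV file itself are also checked. As for your other question, I checked with a database browser: both items were present when I tried running the code.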
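
The suggestions in these comments (query once before the loop, guard against stray whitespace, still catch duplicates inside the CSV itself) could be combined roughly as follows. This is only a sketch: the helper name new_pairs, the in-memory database, the TEXT column type and the sample CSV lines are all assumptions made for illustration, not part of the original code.

```python
import csv
import sqlite3


def new_pairs(rows, first_ids, second_ids):
    """Return the (first, second) pairs whose ids are both unseen so far."""
    accepted = []
    for row in rows:
        # Strip whitespace so 'b ' and 'b' count as the same id.
        first, second = row[0].strip(), row[1].strip()
        if first not in first_ids and second not in second_ids:
            accepted.append((first, second))
            # Remember accepted ids so later duplicates in the CSV are skipped
            # without querying the database again.
            first_ids.add(first)
            second_ids.add(second)
    return accepted


# Throwaway database standing in for the real one (assumed schema).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items_one (itemid TEXT)')
conn.execute('CREATE TABLE items_two (itemid TEXT)')
conn.execute("INSERT INTO items_one(itemid) VALUES ('a')")

# Query each table once, before the loop, and unwrap the 1-tuples into sets.
first_ids = set(r[0] for r in conn.execute('SELECT itemid FROM items_one'))
second_ids = set(r[0] for r in conn.execute('SELECT itemid FROM items_two'))

# csv.reader accepts any iterable of lines; a list stands in for the file.
csv_lines = ['a,x', 'b,y', 'b ,y', 'c,z']
pairs = new_pairs(csv.reader(csv_lines), first_ids, second_ids)

conn.executemany('INSERT INTO items_one(itemid) VALUES (?)', [(p[0],) for p in pairs])
conn.executemany('INSERT INTO items_two(itemid) VALUES (?)', [(p[1],) for p in pairs])
conn.commit()
```

Keeping the known ids in Python sets replaces the per-row SELECT while preserving the effect the asker wanted from it, namely that a pair repeated later in the same CSV file is rejected.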

Tags: python python-2.7 csv sqlite

【Solution 1】:

You can use an "upsert"-style conditional insert to put only unique items into each table:

for item in itemids:
    # INSERT ... SELECT ... WHERE NOT EXISTS adds the value only if it is missing
    db_cursor.execute('INSERT INTO items_one(itemid) SELECT ? WHERE NOT EXISTS (SELECT 1 FROM items_one WHERE itemid = ?)', (item[0], item[0]))
    db_cursor.execute('INSERT INTO items_two(itemid) SELECT ? WHERE NOT EXISTS (SELECT 1 FROM items_two WHERE itemid = ?)', (item[1], item[1]))
    db_conn.commit()

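However, this will not fail if the other item of the pair is already in the other table.

This is taken from another question on upsert in SQLite.

You should also be able to extend each query to check that the item is unique in the other table as well: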

db_cursor.execute('''
    INSERT INTO items_one(itemid)
    SELECT ?
    WHERE NOT EXISTS (
            SELECT 1 FROM items_one
            WHERE itemid = ?
        )
        AND NOT EXISTS (
            SELECT 1 FROM items_two
            WHERE itemid = ?
        )
''', (item[0], item[0], item[1]))
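
For reference, the conditional insert across both tables can be tried end to end with a small script. The sketch below uses INSERT ... SELECT ... WHERE NOT EXISTS, because SQLite's INSERT ... VALUES form does not accept a WHERE clause. The function name insert_pair_if_new, the in-memory database and the TEXT column type are assumptions for illustration only.

```python
import sqlite3


def insert_pair_if_new(conn, first_id, second_id):
    """Insert both ids only if neither is already in its own table.

    Returns True when the pair was inserted, False when it was skipped.
    """
    cur = conn.cursor()
    # The row is inserted into items_one only when both checks pass.
    cur.execute(
        'INSERT INTO items_one(itemid) '
        'SELECT ? '
        'WHERE NOT EXISTS (SELECT 1 FROM items_one WHERE itemid = ?) '
        'AND NOT EXISTS (SELECT 1 FROM items_two WHERE itemid = ?)',
        (first_id, first_id, second_id),
    )
    inserted = cur.rowcount == 1
    if inserted:
        # Both uniqueness checks already passed, so this insert is unconditional.
        cur.execute('INSERT INTO items_two(itemid) VALUES (?)', (second_id,))
    conn.commit()
    return inserted


# Demo on a throwaway in-memory database (the schema is an assumption).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items_one (itemid TEXT)')
conn.execute('CREATE TABLE items_two (itemid TEXT)')

results = [
    insert_pair_if_new(conn, 'a', 'x'),  # both ids are new
    insert_pair_if_new(conn, 'a', 'y'),  # 'a' is already in items_one
    insert_pair_if_new(conn, 'b', 'x'),  # 'x' is already in items_two
    insert_pair_if_new(conn, 'b', 'y'),  # both ids are new
]
```

Doing the check for both tables in the first statement and then inserting into items_two unconditionally avoids a second conditional query that would see the row just written to items_one. If the pairing constraint is not needed, a UNIQUE constraint on itemid would be another way to let the database reject duplicates, but that is a different design from the one in this answer.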

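【Solution 2】:

I figured it out... I was comparing the strings item[0] and item[1] against tuples of unicode values, although I had assumed Python would be able to perform this check. fetchall() returns a list of one-element tuples, so a bare string is never found in it.

I changed the if clause to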

if not (item[0],) in first_itemids and not (item[1],) in second_itemids:
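
The difference is easy to reproduce in isolation. The following sketch uses an in-memory database with a single sample row (both are assumptions for illustration) to show what fetchall() returns and why only the tuple form matches.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items_one (itemid TEXT)')
conn.execute("INSERT INTO items_one(itemid) VALUES ('abc')")

# fetchall() returns one tuple per row, even when a single column is selected.
rows = conn.execute('SELECT itemid FROM items_one').fetchall()

# A string never compares equal to a tuple, so this membership test fails.
found_as_string = 'abc' in rows

# Wrapping the value in a 1-tuple compares like with like.
found_as_tuple = ('abc',) in rows

# Alternative: unwrap the tuples once and test against plain values.
existing_ids = set(row[0] for row in rows)
found_in_set = 'abc' in existing_ids
```

Unwrapping into a set is an optional variation: it keeps the comparison in the if clause free of tuple syntax and makes each lookup constant time.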
