Proper way to insert iterative data into Cassandra using Python (answers)

【Question title】: Proper way to insert iterative data into Cassandra using Python
【Posted】: 2017-07-28 00:41:20
【Question】:

Suppose I have a Cassandra table defined like this:

CREATE TABLE IF NOT EXISTS {} (
    user_id bigint,
    username text,
    age int,
    PRIMARY KEY (user_id)
);

I have 3 lists of the same size, say 1,000,000 records in each. Is it good practice to insert the data with a for loop like this:

for index, user_id in enumerate(user_ids):
    query = "INSERT INTO TABLE (user_id, username, age) VALUES ({0}, '{1}', {2});".format(user_id, username[index], age[index])
    session.execute(query)

【Comments】:

Tags: python python-3.x cassandra datastax

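【Answer 1】:

Prepared statements executed concurrently will be your best option. The driver provides utility functions for concurrently executing a statement with a sequence of parameters, which is exactly what your lists are: execute_concurrent_with_args

Zipping your lists together produces a sequence of parameter tuples suitable as input to that function.

Something like this: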

from cassandra.concurrent import execute_concurrent_with_args

prepared = session.prepare("INSERT INTO table (user_id, username, age) VALUES (?, ?, ?)")
execute_concurrent_with_args(session, prepared, zip(user_ids, username, age))
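
A slightly fuller sketch of the same idea, assuming the DataStax `cassandra-driver` package is installed and `session` is already connected to the right keyspace. The table name `users`, the helper names `build_params` and `insert_users`, and the concurrency value of 50 are illustrative assumptions, not part of the original answer.

```python
def build_params(user_ids, usernames, ages):
    """Zip three equal-length lists into (user_id, username, age) tuples."""
    if not (len(user_ids) == len(usernames) == len(ages)):
        raise ValueError("all three lists must have the same length")
    return zip(user_ids, usernames, ages)


def insert_users(session, user_ids, usernames, ages, concurrency=50):
    """Insert all rows concurrently and return the parameters that failed."""
    # Imported lazily so build_params can be used without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    # "users" is a placeholder table name (assumption).
    prepared = session.prepare(
        "INSERT INTO users (user_id, username, age) VALUES (?, ?, ?)"
    )
    params = list(build_params(user_ids, usernames, ages))
    # raise_on_first_error=False collects failures instead of stopping early.
    results = execute_concurrent_with_args(
        session, prepared, params,
        concurrency=concurrency, raise_on_first_error=False,
    )
    # Each result is a (success, result_or_exc) pair, in the same order as params.
    return [p for p, (success, _) in zip(params, results) if not success]
```

Returning the failed parameter tuples lets the caller retry or log them, which may matter when a million rows are written in one go.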

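【Comments】:

【Answer 2】:

It might be a good idea to start with the getting started guide for the Python driver. Apologies if you have already seen it, but I think it is worth mentioning.

In general, you create a session object and then perform the inserts in a loop, probably using prepared statements (discussed further down the getting started page), as well as here and here

The example from the pages above serves as a starting point: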

user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")

users = []
for user_id in user_ids_to_query:
    user = session.execute(user_lookup_stmt, [user_id])
    users.append(user)
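
Adapting the lookup example above to the insert from the question might look like the following minimal sketch. The function name `insert_sequentially` and the table name `users` are assumptions for illustration; `session` is expected to be an already connected driver session.

```python
def insert_sequentially(session, user_ids, usernames, ages):
    """Insert rows one at a time using a single prepared statement."""
    # Prepare once, outside the loop; "users" is a placeholder table name.
    insert_stmt = session.prepare(
        "INSERT INTO users (user_id, username, age) VALUES (?, ?, ?)"
    )
    count = 0
    for user_id, username, age in zip(user_ids, usernames, ages):
        # Values are bound by the driver, so no string formatting or quoting.
        session.execute(insert_stmt, (user_id, username, age))
        count += 1
    return count
```

Each `session.execute` call waits for the server's response before the next one starts, so this form is simple but likely to be slow for a million rows compared with concurrent execution.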

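You might also find this blog helpful; it discusses improving throughput with the Python driver

The python driver github page may also be a useful resource. In particular, I found this example that uses a prepared statement here, which may help you as well.

【Comments】: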