【Posted】: 2017-08-02 18:11:41
【Question】:
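I tried running batch inserts in a single process and in multiple processes, but both take the same amount of time, so I get no performance improvement. The Cassandra keyspace uses SimpleStrategy, and I believe the cluster has only one node. Does either of these matter?
Here is my multiprocessing code. Can you help me find the problem?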
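import csv
import time
from multiprocessing import Lock, Pool, Value

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement
from tqdm import tqdm

# `files` (the list of CSV paths) is defined elsewhere and is not shown here.
lock = Lock()
ID = Value('i', 0)  # row ID counter shared by the worker processes

def copy(x):
    # Each worker process opens its own connection to the cluster.
    cluster = Cluster()
    session = cluster.connect('test')
    count = 0
    insertt = session.prepare("INSERT INTO table2(id, age, gender, name) values(?, ?, ?, ?)")
    batch = BatchStatement()
    for i in x:
        with open(files[i]) as csvfile:
            reader = csv.reader(csvfile, delimiter=',')
            for row in tqdm(reader):
                # Read the new ID while still holding the lock, so that
                # another process cannot change it before it is used.
                with lock:
                    ID.value += 1
                    row_ID = ID.value
                name_ID = row[1]
                gender_ID = row[2]
                age_ID = int(row[3])
                batch.add(insertt, (row_ID, age_ID, gender_ID, name_ID))
                count += 1
                if count > 60:
                    # The batch is full: send it and start a new one.
                    session.execute(batch)
                    batch = BatchStatement()
                    count = 0
    if count > 0:
        # Send the rows left over in the last, partial batch.
        session.execute(batch)
    cluster.shutdown()

if __name__ == '__main__':
    start = time.time()
    with Pool() as p:
        p.map(copy, [range(0,6),range(6,12),range(12,18),range(18,24)])
    end = time.time()
    t = end - start
    print(t)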
【Discussion】:
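A side note on the batching logic, separate from the speed question: the counter that decides when to send a batch can be replaced by a small chunking helper, which also takes care of the final, partial batch. The following is only a minimal sketch. `chunked` is a hypothetical helper name that is not part of the question's code or of the Cassandra driver, and the rows are stand-in data rather than real CSV content.

```python
from itertools import islice

def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            # The iterable is exhausted; nothing is left to send.
            return
        yield chunk

# Stand-in data: in the question, the rows would come from csv.reader.
rows = [("x", "name%d" % n, "F", str(20 + n)) for n in range(7)]
sizes = [len(c) for c in chunked(rows, 3)]
print(sizes)  # [3, 3, 1]
```

In the question's `copy` function, each chunk would presumably be turned into one `BatchStatement` and executed, so no separate `count` variable or end-of-loop flush would be needed.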
Tags: python cassandra multiprocessing batch-insert