Write data from one CSV to another in Python

【Question title】: Write data from one csv to another python
【Posted】: 2017-11-07 22:41:01
【Question description】:

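I have three CSV files with the attributes Product_ID, Name, Cost, and Description. Every file contains Product_ID. I want to combine Name (file1), Cost (file2), and Description (file3) into a new CSV file that holds Product_ID together with all three attributes. I need efficient code, because the files contain more than 130,000 rows.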

After merging all the data into the new file, I have to load that data into a dictionary, for example with Product_ID as the key and Name, Cost, Description as the value.

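【Question discussion】:

What have you tried so far? Show us your code so we can help you better.
All I have tried is merging the data from the three files into one dictionary and then writing it out, but I got errors. In the code below I am reading a file into a dictionary with rows[1] as the key and rows[2], rows[3] as the values, but I cannot append another file to the same dictionary. with open('train_1.csv', 'r', encoding="utf8") as file: text_file = csv.reader(file) next(text_file) for rows in text_file: maindict[rows[1]] = rows[2], rows[3]
@Sameer You might want to edit your question to include that code; it is not easy to read in comments.
I am using this approach for feature extraction, after which I have to apply multinomial Naive Bayes. I do not know much about that method yet, but I am learning it.
I don't know how to add new lines in comments.

Tags: python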

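【Solution 1】:

It is probably more efficient to read each input .csv into a dictionary before creating the aggregated result.

Here is a solution that reads each file and stores its columns in a dictionary keyed by Product_ID. I assume that every Product_ID value is present in every file and that each file has a header row. I also assume that, apart from Product_ID, no column appears in more than one file.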

import csv
from collections import defaultdict

entries = defaultdict(list)
files = ['names.csv', 'costs.csv', 'descriptions.csv']
headers = ['Product_ID']

for filename in files:
   with open(filename, newline='') as f:   # Open each file in files (Python 3).
      reader = csv.reader(f)            # Create a reader to iterate csv lines
      heads = next(reader)              # Grab first line (headers)

      pk = heads.index(headers[0])      # Get the position of 'Product_ID' in
                                        # the list of headers
      # Add the rest of the headers to the list of collected columns (skip 'Product_ID')
      headers.extend([x for i,x in enumerate(heads) if i != pk])

      for row in reader:
         # For each line, add new values (except 'Product_ID') to the
         # entries dict with the line's Product_ID value as the key
         entries[row[pk]].extend([x for i,x in enumerate(row) if i != pk])

# Open the output in text mode (Python 3) and close it when done
with open('result.csv', 'w', newline='') as out:
   writer = csv.writer(out)
   writer.writerow(headers)               # Write the headers first
   for key, value in entries.items():
      # Write the product ID concatenated with the other values
      writer.writerow([key] + value)
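
The question also asks for the merged data to be loaded into a dictionary keyed by Product_ID, which the code above does not do. A minimal sketch of that step, assuming the merged file has a header row containing a Product_ID column; the function name load_products and its parameters are hypothetical, not part of the original answer:

```python
import csv

def load_products(path, key='Product_ID'):
    """Return {product_id: {column: value, ...}} for a merged CSV file."""
    products = {}
    with open(path, newline='') as f:
        for row in csv.DictReader(f):     # each row is a dict keyed by header
            product_id = row.pop(key)     # take the key column out of the values
            products[product_id] = row
    return products
```

Calling load_products('result.csv') would then give one entry per product, with the remaining columns (Name, Cost, Description) as the value. All values stay strings; converting Cost to a number is left to the caller.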

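【Discussion】:

If I want to append more than one column from a CSV, the code above won't work. Suppose names.csv contains Product_ID, Names, Tags. What if I want to append both row[1] and row[2]?
You didn't include much information about the csv columns, so I assumed there was no other data in them. Instead of skipping the headers, you can read them from the first line to find the right row indices for the key and for the values to append. To clarify: you want to add every column from every file, with the product ID as the key?
I have edited the answer to include every column from every file.
Thanks for your help, I will look at the code you provided and comment further if needed.
With the code above I got an error: heads = reader.next() AttributeError: '_csv.reader' object has no attribute 'next'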
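
The AttributeError in the last comment is what Python 3 raises for reader.next(), since that method only exists in Python 2. The built-in next() works in both. A small sketch, using an in-memory file as a stand-in for names.csv (the sample rows are made up for illustration):

```python
import csv
import io

# Hypothetical sample data standing in for names.csv
sample = io.StringIO("Product_ID,Name\n1,Pen\n2,Cup\n")

reader = csv.reader(sample)
heads = next(reader)      # Python 3: call next(reader), not reader.next()
rows = list(reader)       # the remaining data rows
```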

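【Solution 2】:

A general solution that produces a record (possibly incomplete) for each id needs a dedicated data structure; luckily it is just a list with a pre-allocated number of slots.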

d = {id:[name,None,None] for id, name in [line.strip().split(',') for line in open(fn1)]}
for line in open(fn2):
    id, cost = line.strip().split(',')
    if id in d:
        d[id][1] = cost
    else:
        d[id] = [None, cost, None]
for line in open(fn3):
    id, desc = line.strip().split(',')
    if id in d:
        d[id][2] = desc
    else:
        d[id] = [None, None, desc]

for id in d:
    if all(d[id]):
        print(','.join([id] + d[id]))   # Python 3 print function
    else: # for this id you have not complete info,
          # so you have to decide on your own what you want, I have to
        pass

If you are sure you do not want to process incomplete records any further, the code above can be simplified:

d = {id:[name] for id, name in [line.strip().split(',') for line in open(fn1)]}
for line in open(fn2):
    id, cost = line.strip().split(',')
    if id in d: d[id].append(cost)
for line in open(fn3):
    id, desc = line.strip().split(',')
    if id in d: d[id].append(desc)

for id in d:
    if len(d[id])==3: print(','.join([id]+d[id]))
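
Splitting each line on ',' stops working once a field (a description, say) contains a quoted comma, and it treats a header row as data. A sketch of the same pre-allocated-slot idea built on the csv module, assuming each file has exactly two columns with the product id first and a header row; the names merge_slots and complete_records are hypothetical:

```python
import csv

def merge_slots(paths, has_header=True):
    """Merge two-column CSV files (id, value) into {id: [v1, v2, ...]}.

    Slot i holds the value from paths[i], or None if that file lacks the id.
    """
    merged = {}
    for slot, path in enumerate(paths):
        with open(path, newline='') as f:
            reader = csv.reader(f)
            if has_header:
                next(reader, None)        # skip the header row, if any
            for row in reader:
                if len(row) != 2:
                    continue              # skip blank or malformed lines
                product_id, value = row
                # Pre-allocate one slot per input file on first sight of an id
                record = merged.setdefault(product_id, [None] * len(paths))
                record[slot] = value
    return merged

def complete_records(merged):
    """Keep only the ids that have a value from every file."""
    return {k: v for k, v in merged.items() if None not in v}
```

Unlike the all(d[id]) check above, complete_records treats an empty string as a present value and only drops records with a missing slot.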

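【Discussion】:

@gboffi, I will study the code today, thanks for your help.
Could you please take a look at this question? stackoverflow.com/questions/54192260/…
@Barbie I have looked at that question of yours, but I have no working knowledge of pandas and the question is not clear to me either, so I'm afraid I can't help you, sorry.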