【Question Title】: How to split one CSV into multiple files in Python
【Posted】: 2020-06-11 00:27:44
【Question Description】:

I have a CSV file (world.csv) that looks like this:

"city","city_alt","lat","lng","country"
"Mjekić","42.6781","20.9728","Kosovo"
"Mjekiff","42.6781","20.9728","Kosovo"
"paris","42.6781","10.9728","France"
"Bordeau","16.6781","52.9728","France"
"Menes","02.6781","50.9728","Morocco"
"Fess","6.6781","3.9728","Morocco"
"Tanger","8.6781","5.9728","Morocco"

I want to split it into multiple files by country, like this:

Kosovo.csv:

"city","city_alt","lat","lng","country"
"Mjekić","42.6781","20.9728","Kosovo"
"Mjekiff","42.6781","20.9728","Kosovo"

France.csv:

"city","city_alt","lat","lng","country"
"paris","42.6781","10.9728","France"
"Bordeau","16.6781","52.9728","France"

Morocco.csv:

"city","city_alt","lat","lng","country"
"Menes","02.6781","50.9728","Morocco"
"Fess","6.6781","3.9728","Morocco"
"Tanger","8.6781","5.9728","Morocco"

【Question Discussion】:

Tags: python excel python-3.x csv export-to-csv


【Solution 1】:

If you can't use pandas, you can use the built-in csv module together with the itertools.groupby() function to group the rows by country.

from itertools import groupby
import csv

with open('world.csv', newline='', encoding='utf-8') as csv_file:
    reader = csv.reader(csv_file)
    header = next(reader)  # read the header row so it can be reused below

    # groupby() only groups consecutive rows, so sort by country (column 4) first
    lst = sorted(reader, key=lambda x: x[4])
    groups = groupby(lst, key=lambda x: x[4])

    # Write one file per country
    for k, g in groups:
        filename = k + '.csv'
        with open(filename, 'w', newline='', encoding='utf-8') as fout:
            csv_output = csv.writer(fout)
            csv_output.writerow(header)  # copy the original header
            csv_output.writerows(g)      # all rows for this country
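
Note that itertools.groupby() only merges *consecutive* items that share a key, which is why the rows are sorted first. A small sketch with made-up rows (hypothetical data, with the country in column 4 as above) shows the difference:

```python
from itertools import groupby

# Hypothetical rows: the country is in column index 4
rows = [
    ["Mjekić", "", "42.6781", "20.9728", "Kosovo"],
    ["paris", "", "42.6781", "10.9728", "France"],
    ["Mjekiff", "", "42.6781", "20.9728", "Kosovo"],
]

def country(row):
    return row[4]

# Without sorting, "Kosovo" shows up as two separate groups
unsorted_keys = [k for k, _ in groupby(rows, key=country)]

# After sorting by the same key, each country forms exactly one group
sorted_keys = [k for k, _ in groupby(sorted(rows, key=country), key=country)]

print(unsorted_keys)  # ['Kosovo', 'France', 'Kosovo']
print(sorted_keys)    # ['France', 'Kosovo']
```

Without the sort, a repeated key would reopen the same file in 'w' mode and overwrite the rows written for the earlier group.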

【Discussion】:

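  • It gives me many CSV files for the same country
  • I need one CSV per country
  • @FlutterLover Using your world.csv I get three files: Morocco.csv, Kosovo.csv and France.csv. Isn't that what you want? Which files do you get?
  • @FlutterLover I just noticed that your world.csv file seems to be missing a column: the header has five column names, but the data rows have only four. Maybe this is a copy-and-paste error? If the data contains five columns, try changing x[3] to x[4] in my code.
  • Yes, it works now after changing x[3] to x[4], thank you very much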
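
Following up on the column-count issue in the comments: a slightly more robust sketch looks up the country column by its header name instead of hardcoding x[3] or x[4]. The function name `split_csv_by_column` and its parameters are hypothetical, and it assumes a UTF-8 file whose data rows have as many fields as the header.

```python
import csv
import os
from itertools import groupby


def split_csv_by_column(src_path, column, out_dir="."):
    """Write one CSV per distinct value of `column`, copying the header row.

    The column position is looked up by name, so the code does not depend
    on a hardcoded index such as x[3] or x[4].
    """
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        idx = header.index(column)  # raises ValueError if the column is missing
        rows = sorted(reader, key=lambda r: r[idx])  # groupby needs sorted input

    written = []
    for value, group in groupby(rows, key=lambda r: r[idx]):
        path = os.path.join(out_dir, f"{value}.csv")
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(group)
        written.append(path)
    return written
```

For example, `split_csv_by_column('world.csv', 'country')` would write Kosovo.csv, France.csv and Morocco.csv to the current directory. Rows with fewer fields than the header (like the four-field sample rows in the question) would still raise an IndexError, so the data itself still needs fixing.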

【Solution 2】:

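The simplest way is as follows. It writes the files into a folder named "adata" in your working directory (created here if it does not exist yet) and uses glob to list them afterwards:

import glob
import os
import pandas as pd

df = pd.read_csv('world.csv')
os.makedirs('adata', exist_ok=True)  # output folder in the working directory

for i, g in df.groupby('country'):  # one group per country
    g.to_csv(os.path.join('adata', '{}.csv'.format(i)), header=True, index_label='Index')

filenames = sorted(glob.glob(os.path.join('adata', '*.csv')))
print(filenames)

for f in filenames:
    pass  # your intended processes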
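
Unlike itertools.groupby(), pandas' groupby() does not need the rows to be sorted. If pandas is not available, similar no-sort behaviour can be sketched with only the csv module by keeping one writer per country in a dict. This is a different technique from the answers here; `split_csv_one_pass` is a hypothetical name, and UTF-8 input is assumed.

```python
import csv
import os


def split_csv_one_pass(src_path, column, out_dir="."):
    """Split src_path into one CSV per value of `column` in a single pass.

    Rows are streamed instead of sorted in memory; one output file is kept
    open per distinct value, so this suits a modest number of groups.
    """
    handles = {}  # value -> (open file object, csv.writer)
    try:
        with open(src_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            idx = header.index(column)
            for row in reader:
                value = row[idx]
                if value not in handles:
                    # First row for this value: open its file and write the header
                    out = open(os.path.join(out_dir, f"{value}.csv"), "w",
                               newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    handles[value] = (out, writer)
                handles[value][1].writerow(row)
    finally:
        # Close every output file, even if reading fails part-way
        for out, _ in handles.values():
            out.close()
    return sorted(handles)
```

Rows keep their original order within each output file. The trade-off is that one file stays open per distinct country, which is fine for a handful of countries but may hit the operating system's open-file limit with many groups.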

【Discussion】:

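【Solution 3】:

Try this:

Filter the rows by country name, then use pandas' to_csv to write each subset to its own CSV file.

import pandas as pd

df = pd.read_csv('world.csv')

# keep only the rows for each country
france = df[df['country'] == 'France']
kosovo = df[df['country'] == 'Kosovo']
morocco = df[df['country'] == 'Morocco']

# index=False keeps the pandas row index out of the output files
france.to_csv('france.csv', index=False)
kosovo.to_csv('kosovo.csv', index=False)
morocco.to_csv('morocco.csv', index=False)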
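
The filters above hardcode each country name. A sketch that loops over whatever countries appear in the data instead, reusing the same boolean filter and `to_csv(..., index=False)` (the helper name `split_by_country` is hypothetical):

```python
import os

import pandas as pd


def split_by_country(df, out_dir="."):
    """Write one CSV per distinct value in df['country'] without hardcoding names."""
    paths = []
    # dropna() skips rows with an empty country instead of writing an empty nan.csv
    for country in df["country"].dropna().unique():
        subset = df[df["country"] == country]  # same boolean filter as above
        path = os.path.join(out_dir, f"{country}.csv")
        subset.to_csv(path, index=False)
        paths.append(path)
    return paths
```

Call it as `split_by_country(pd.read_csv('world.csv'))`. Unlike the version above, the output file names keep the country's original capitalisation (France.csv rather than france.csv).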
    

    【讨论】:
