Splitting a CSV into multiple CSVs depending on what is in column 1 using Python

【Question title】: Splitting a csv into multiple csv's depending on what is in column 1 using python
【Posted】: 2021-06-23 12:43:06
【Question description】:

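I currently have a large CSV containing data for many events.

The first column holds many dates, plus an ID for each event.

I would like to write something in Python that creates a new CSV whenever it encounters an ID number (AL.....). The new file should be named after that ID and contain all the data up to the next ID, so that I end up with one CSV per event.

For information, the full CSV has 8 columns, but the split into individual CSVs depends only on the first column.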

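Related question: "Use Python to split a CSV file with multiple headers"

I noticed that this question is very similar, but in my case each ID is AL followed by a different string of digits every time, and I want to name the new CSVs after those IDs.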

Tags: python csv split

【Solution 1】:

You can do this with pandas. First, let's generate some sample data:


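import pandas as pd
import numpy as np

def date_string():
    # Random day/month in 1997; randint's upper bound is exclusive
    return str(np.random.randint(1, 32)) + "/" + str(np.random.randint(1, 13)) + "/1997"

# 20 random dates, with an event ID placed at positions 0 and 10
l = [date_string() for _ in range(20)]
l[0] = "AL123"
l[10] = "AL321"
df = pd.DataFrame(l, columns=['idx'])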

# -->
|    | idx        |
|---:|:-----------|
|  0 | AL123      |
|  1 | 24/3/1997  |
|  2 | 8/6/1997   |
|  3 | 6/9/1997   |
|  4 | 31/12/1997 |
|  5 | 11/6/1997  |
|  6 | 2/3/1997   |
|  7 | 31/8/1997  |
|  8 | 21/5/1997  |
|  9 | 30/1/1997  |
| 10 | AL321      |
| 11 | 8/4/1997   |
| 12 | 21/7/1997  |
| 13 | 9/10/1997  |
| 14 | 31/12/1997 |
| 15 | 15/2/1997  |
| 16 | 21/2/1997  |
| 17 | 3/3/1997   |
| 18 | 16/12/1997 |
| 19 | 16/2/1997  |

So the interesting positions are 0 and 10, because those rows hold the AL* strings. To split the data at the AL* rows you can use:

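# True on every row whose first column holds an ID such as "AL123"
is_id = df['idx'].str.startswith('AL', na=False)
# Running count of IDs seen so far; all rows of one event share a number.
# (This replaces np.split(df, idx), which warns on recent pandas versions.)
group = is_id.cumsum()
for key, out in df.groupby(group):
    if key == 0:
        continue  # rows before the first ID belong to no event
    name = out.iloc[0, 0]  # the ID is the first row of each block
    out.to_csv(name + ".csv", index=False, header=False)  # one file per event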

This gives you two CSV files named AL123.csv and AL321.csv, each with its AL* string as the first row.
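
If the real file is large, the same idea can be sketched without pandas by streaming the rows with the standard csv module. This is only a minimal sketch under some assumptions: the function name split_by_id and the ID pattern ("AL" followed by letters or digits) are made up for illustration, the input has no header row, and any rows before the first ID are skipped. As in the pandas version, the ID row is kept as the first row of each output file, and all columns are written through unchanged.

```python
import csv
import re
from pathlib import Path

# Assumption: an event ID is "AL" followed by letters/digits, e.g. "AL123"
ID_PATTERN = re.compile(r"^AL\w+$")

def split_by_id(source, out_dir="."):
    """Write one CSV per event ID; return the list of files created."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    out_file = None
    writer = None
    try:
        with open(source, newline="") as src:
            for row in csv.reader(src):
                first = row[0].strip() if row else ""
                if ID_PATTERN.match(first):
                    # A new event starts: close the previous file, open the next
                    if out_file is not None:
                        out_file.close()
                    path = out_dir / f"{first}.csv"
                    out_file = open(path, "w", newline="")
                    writer = csv.writer(out_file)
                    created.append(path)
                if writer is not None:
                    # Rows seen before the first ID have no writer and are skipped
                    writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return created
```

For example, split_by_id("events.csv", "events") would read a (hypothetical) events.csv and write files such as events/AL123.csv. Only one row is held in memory at a time, which may matter for very large inputs.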
