【Question title】: Split CSV in multiple files, reading column 1 for naming output files
【Posted】: 2017-01-14 19:04:35
【Question】:

I have an employees.csv file with about 500 rows and 11 columns; every field is enclosed in double quotes:

"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0"
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
:
:
(more records)
:
:
"499","Maria","Maria's Role","Maria's Job Description","Maria's Department","11/10/2008","8","15","2","9","0"
"500","Peter","Peter's Role","Peters's Job Description","Peters's Department","11/10/2008","8","17","16","22","0"

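I am trying to split this file into multiple CSVs (one row = one file) based on the first field, which is a unique employee ID number. The command should produce 500 separate CSV files, each containing one row, named as follows: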

1.csv
2.csv
3.csv
:
:
:
499.csv
500.csv

I have been trying a combination of cat and awk, but there are some errors in my code:

for i in $(cat unix | awk -F\, '{print $1}' /myfolder/employees.csv);

    do
        grep $i "/myfolder/employees.csv" > "/myfolder/splittedfiles/$i";
    done

Thanks very much.

【Comments】:

    Tags: linux bash csv awk


    【Solution 1】:

    You can use GNU awk like this:

    awk 'BEGIN {FPAT="[^\"]+"} { print $0 > ("/myfolder/splittedfiles/" $1 ".csv") }' yourfile
    

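    FPAT defines what a field's content looks like by means of a regular expression (it requires GNU awk 4.0 or later). Here each field is a run of non-quote characters, which strips the quotes from $1.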

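    【Comments】:

    • Works perfectly, thank you. Is there a way, using the same command, to put the output files in a separate folder (e.g. a folder named splittedfiles)?
    【Solution 2】:

    Edit (this one has been tested): this gawk script did the job for me: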

    gawk -F'"' -- '{print $0 >> ("/myfolder/splittedfiles/" $2 ".csv")}' /myfolder/employees.csv
    

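    -F'"' splits the fields at each ", so the employee number ends up in $2 ($1 is the empty text before the first quote). ("/myfolder/splittedfiles/" $2 ".csv") then builds the file name you want, and print $0 >> ... appends the original line to that file.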
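The field numbering described above can be checked on a small sample. The following is only a sketch under stated assumptions: the scratch directory from mktemp and the variable names workdir, dir and out are made up for illustration, and plain awk is used instead of gawk since nothing here is GNU-specific. It also closes each file after writing, which is one way to avoid keeping hundreds of files open at once.

```shell
#!/bin/sh
# Hypothetical scratch directory so the sketch never touches real data.
workdir=$(mktemp -d)
mkdir "$workdir/splittedfiles"

# Three sample rows in the same layout as employees.csv.
cat > "$workdir/employees.csv" <<'EOF'
"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0"
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
EOF

# With -F'"' the text before the first quote is $1 (empty), so the ID is $2.
awk -F'"' -v dir="$workdir/splittedfiles" '{
    out = dir "/" $2 ".csv"   # build the per-employee file name
    print $0 > out            # write the untouched original line
    close(out)                # release the file handle before the next row
}' "$workdir/employees.csv"

ls "$workdir/splittedfiles"
```

Because the IDs are unique, each output file is written exactly once, so > and >> behave the same here. If an ID could repeat, >> (as in the answer) would be the safer choice.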


    Or, if the IDs always run in numeric order starting from 1, this should work (untested):

    split -l 1 /myfolder/employees.csv /myfolder/splittedfiles/EMPL
    empno=1
    for fname in /myfolder/splittedfiles/EMPL* ; do
        mv "$fname" "/myfolder/splittedfiles/${empno}.csv"
        empno=$((empno+1))
    done
    

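    split turns every line (-l 1) into a separate file, named EMPLaa, EMPLab, and so on. The for loop walks over those files in sorted order, which matches the order of the rows. mv renames each file to ${empno}.csv, starting from empno=1, and $((empno+1)) then increments empno.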
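The split approach only works while the row position and the ID agree. As a hedged alternative (a different technique, not what this answer proposes), a plain shell loop can read the ID from each line itself, so gaps or reordering do not matter. The scratch directory and the names scratch, line and id are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical scratch area; the real paths would be /myfolder/... instead.
scratch=$(mktemp -d)
mkdir "$scratch/out"

# Two rows with non-consecutive IDs, to show that order does not matter.
cat > "$scratch/in.csv" <<'EOF'
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"499","Maria","Maria's Role","Maria's Job Description","Maria's Department","11/10/2008","8","15","2","9","0"
EOF

while IFS= read -r line; do
    id=${line#\"}             # drop the leading double quote
    id=${id%%\"*}             # drop everything from the next double quote on
    [ -n "$id" ] || continue  # skip blank or malformed lines
    printf '%s\n' "$line" > "$scratch/out/$id.csv"
done < "$scratch/in.csv"

ls "$scratch/out"
```

For a 500-row file the loop is fast enough, although the awk one-liners above will generally scale better on large inputs.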

    【Comments】:
