【Question title】: Converting multiple text files to a csv to create a labelled dataset
【Posted】: 2020-10-28 06:21:21
【Question】:

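I have text files in multiple folders (each folder name is the name of a category/label). I want to generate a csv file (a dataset) with one column holding the label of each text, i.e. the name of the folder it came from.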

import csv
import os

folder = os.path.dirname("/home/jaideep/Desktop/folder/ML DS/Csv/Datasets/")
folder_list = os.listdir(folder)

with open("/home/jaideep/Desktop/folder/ML DS/Csv/data.csv", "w") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Label', 'Email','Message'])
    for f in folder_list:
        file_list = os.listdir(folder+"/"+f+"/")
        print(file_list)
        for file in file_list:
            with open(file, "r")  as infile:
                contents = infile.read()
                outfile.write(f+',')
                outfile.write(contents)

But I get:

File "/home/jaideep/Desktop/folder/ML DS/Csv/Main.py", line 15, in <module>
    with open(file, "r")  as infile:

FileNotFoundError: [Errno 2] No such file or directory: 'file2.txt'

I know similar questions have been asked before, but I could not adapt those solutions to my problem. Any help would be appreciated, thanks.

【Question comments】:

    Tags: python pandas dataset file-handling


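    【Solution 1】:

    os.listdir returns only the names of the entries in a directory, not their full paths, so open(file) looks for 'file2.txt' in the current working directory. You need to rebuild the full path before opening each file.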
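To make the cause concrete, here is a minimal sketch of that behaviour. The temporary directory and the `spam` folder are made up for illustration; only `file2.txt` comes from the traceback in the question.

```python
import os
import tempfile

# Hypothetical layout for illustration: <root>/spam/file2.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "spam"))
with open(os.path.join(root, "spam", "file2.txt"), "w") as fh:
    fh.write("hello")

# listdir gives bare entry names, with no directory prefix
names = os.listdir(os.path.join(root, "spam"))
print(names)

# A bare name is resolved against the current working directory,
# so it has to be joined back onto its parent directory before opening.
full_path = os.path.join(root, "spam", names[0])
with open(full_path) as fh:
    text = fh.read()
```

Calling `open(names[0])` here instead would most likely raise the same FileNotFoundError, unless the script happened to run from inside the `spam` folder.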

    You may also want to look at the glob module.

    The following version of your script should solve your problem:
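As a rough sketch of the glob route: the helper name `list_labelled_files` is hypothetical, and the `*.txt` pattern assumes every data file ends in `.txt` (the traceback only shows one such file). Unlike os.listdir, glob.glob returns paths that already include the directory prefix you searched in.

```python
import glob
import os


def list_labelled_files(root):
    """Return (label, path) pairs for every .txt file one level below root."""
    pairs = []
    # <root>/*/*.txt matches text files inside each label folder.
    # sorted() is used because glob does not guarantee any order.
    for path in sorted(glob.glob(os.path.join(root, "*", "*.txt"))):
        # The label is the name of the folder that contains the file.
        label = os.path.basename(os.path.dirname(path))
        pairs.append((label, path))
    return pairs
```

Each returned path can be passed straight to `open`, so no os.path.join step is needed afterwards.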

    import csv
    import os
    
    folder = os.path.dirname("/home/jaideep/Desktop/folder/ML DS/Csv/Datasets/")
    folder_list = os.listdir(folder)
    
    with open("/home/jaideep/Desktop/folder/ML DS/Csv/data.csv", "w") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['Label', 'Email','Message'])
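        for f in folder_list:
            # f is a bare folder name (the label); join it onto the parent folder
            file_list = os.listdir(os.path.join(folder, f))
            print(file_list)
            for file in file_list:
                # file is a bare file name too, so rebuild its full path before opening
                with open(os.path.join(folder, f, file), "r")  as infile:
                    contents = infile.read()
                    outfile.write(f+',')
                    outfile.write(contents)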
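
One caveat with the script above: it writes rows with outfile.write, so commas or line breaks inside a text file end up unquoted in the csv. A possible variant, under some assumptions of mine, sends each row through csv.writer instead. The function name `build_dataset` is hypothetical, the header is reduced to two columns because only a label and the text are written per file, and UTF-8 is assumed for all files.

```python
import csv
import os


def build_dataset(root, out_path):
    """Write one (Label, Message) row per file found in root's subfolders."""
    # newline="" lets the csv module handle line endings itself.
    # encoding="utf-8" is an assumption about the input and output files.
    with open(out_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["Label", "Message"])
        for label in sorted(os.listdir(root)):
            label_dir = os.path.join(root, label)
            if not os.path.isdir(label_dir):
                continue  # skip stray files sitting next to the label folders
            for name in sorted(os.listdir(label_dir)):
                path = os.path.join(label_dir, name)
                with open(path, "r", encoding="utf-8") as infile:
                    # writerow quotes fields containing commas or newlines
                    writer.writerow([label, infile.read()])
```

It could be called as `build_dataset("/home/jaideep/Desktop/folder/ML DS/Csv/Datasets", "/home/jaideep/Desktop/folder/ML DS/Csv/data.csv")`. When reading the result back with csv.reader, open the file with `newline=""` as well so that multi-line messages stay in one field.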
    

    【Comments】:
