Reading all CSV files in current working directory into pandas with correct filenames

【Question title】: Reading all CSV files in current working directory into pandas with correct filenames
【Posted】: 2016-12-17 15:41:51
【Question】:

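I am trying to use a loop to read in multiple CSVs (CSVs only for now, but in the future it will be a mix of CSV and xls files).

I want each data frame in pandas to have the same name as the corresponding file in my folder, without the file extension.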

import os 
import pandas as pd


files = filter(os.path.isfile, os.listdir( os.curdir ) )
files  # shows a list of the files in my directory that I want to use; they are all CSVs, if that matters

# I want to load these into pandas data frames with the corresponding filenames

# Not sure if this is the right approach.
# What goes wrong is that the variable is named 'weather_today.csv'; I need to drop the .csv or .xlsx or whatever the extension might be

for each_file in files:
    frame = pd.read_csv( each_file)
    each_file = frame

Bernie's answer looks great, but there is one problem:

for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    # Right below, I am assigning my looped data frame to the literal variable name "filename_only"
    # rather than to the value that filename_only holds (what I see if I print(filename_only))
    filename_only = frame

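For example, if the two files in my file list are weather_today.csv and earthquakes.csv (in that order), neither "earthquakes" nor "weather" is created.

However, if I simply type "filename_only" and press Enter in Python, I see the earthquakes data frame. If I had 100 files, only the last data frame in the loop would be available, under the name "filename_only". The other 99 would be gone, because each earlier assignment is overwritten by the next one, and the 100th overwrites them all.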

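【Question comments】:

Tags: csv pandas for-loop

【Solution 1】:

You can use os.path.splitext() for this, which will "Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period."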

for each_file in files:
    frame = pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    filename_only = frame
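
As the comments on this answer point out, the last line of that loop rebinds the name filename_only itself; Python does not look inside the string and create a variable called weather_today. A minimal sketch of that behavior, using hypothetical file names and plain strings as stand-ins for the DataFrames so it runs without any files on disk:

```python
import os

# Hypothetical file names; the strings built below stand in for DataFrames.
files = ["weather_today.csv", "earthquakes.csv"]

names_seen = []
for each_file in files:
    frame = "data from " + each_file        # stand-in for pd.read_csv(each_file)
    filename_only = os.path.splitext(each_file)[0]
    names_seen.append(filename_only)        # 'weather_today', then 'earthquakes'
    filename_only = frame                   # rebinds the name filename_only itself

# Only the last assignment survives, and no variable named after a file exists.
variable_was_created = "weather_today" in globals()
print(names_seen)
print(filename_only)
print(variable_was_created)
```

So splitext does produce the right names; they are simply discarded on the next line instead of being used as keys or labels.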

As asked in the comments, we want a way to filter for CSV files only, so you can do the following:

files = [file for file in os.listdir( os.curdir ) if file.endswith(".csv")]
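
Since the question mentions a future mix of CSV and xls files, the same filter could be generalized to several extensions. The sketch below is only one possible approach: list_files_with_extensions is a hypothetical helper name, and the case-insensitive match and the os.path.isfile check are assumptions about what is wanted, not part of the original answer.

```python
import os

def list_files_with_extensions(directory=os.curdir,
                               extensions=(".csv", ".xls", ".xlsx")):
    """Return sorted names of regular files whose extension is in `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        name
        for name in os.listdir(directory)
        # Skip sub-directories, then compare extensions case-insensitively.
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1].lower() in wanted
    )

# CSV files only, from the current working directory.
files = list_files_with_extensions(extensions=(".csv",))
```

Unlike file.endswith(".csv"), this version also picks up names such as DATA.CSV and ignores a directory that happens to be called something.csv.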

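【Comments】:

Wow, very simple. If you don't mind me asking about my first part, files = filter(os.path.isfile, os.listdir( os.curdir ) ): is there a way to restrict it to a specific extension? I'm not familiar with this kind of thing...
Isn't what you are doing just overwriting the filename stored in filename_only with the DataFrame? And it will be overwritten again on the next pass of the loop. I don't quite see the point.
I tried it again and it doesn't seem to work. I think it only appeared to work the first time because I had read them in through csv. I restarted my session, but all my files still get the name "today_weather.csv" rather than today_weather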

【Solution 2】:

Use a dictionary to store your frames:

frames = {}

for each_file in files:
    frames[os.path.splitext(each_file)[0]] = pd.read_csv(each_file)

Now you can get the DataFrame of your choice with:

frames[filename_without_ext]

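Simple, right? Watch your RAM usage, though: reading a bunch of files can quickly fill up system memory and cause a crash.

【Comments】:

Yes, but why does this work when the approach above doesn't? When I tried the intermediate print step above, I got a name, but the last part of the answer above doesn't seem to work for the assignment... I also want the directory's files loaded into memory rather than into a list.
It's a dictionary, not a list. The two are very different. Are you sure you copied the code from @bernie's answer correctly? And what you are asking for cannot strictly be done. Unless your program writes code and then executes it, you cannot declare variables. The code being executed cannot be modified in Python. The only thing you can do is modify your variables. A dictionary allows named keys, and you can attach values to those keys. That is what my answer does, and it is the simplest way to solve your problem.
OK, I understand that part, but I believe what went wrong is that the word FILENAME_ONLY is being used as the spelling of the variable, i.e. "FILENAME_ONLY" = weather_data.csv, rather than FILENAME_ONLY 'carrying' the extracted spelling weather_data and thereby setting weather_data = weather_data.csv
You're running circles around me! Let the code speak. Can you edit your question to include @bernie's answer, its output, and the expected output? That would be very helpful.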
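
If memory does become a concern, one option is to process the files one at a time instead of keeping every frame in a dictionary. The following is a rough sketch under that assumption, not part of the original answer; iter_frames and count_rows are hypothetical helper names.

```python
import os
import pandas as pd

def iter_frames(paths, **read_csv_kwargs):
    """Yield (name_without_extension, DataFrame) for one CSV file at a time."""
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]
        # Extra keyword arguments are passed straight through to pd.read_csv.
        yield name, pd.read_csv(path, **read_csv_kwargs)

def count_rows(paths):
    """Example consumer: row count per file, without storing any frame."""
    return {name: len(frame) for name, frame in iter_frames(paths)}
```

Because each frame is dropped once the loop moves on, roughly one file's worth of data is held at any moment. Passing read_csv options such as usecols or nrows through read_csv_kwargs can reduce the footprint further.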