How to concatenate crosstabs when reading in a for loop in pandas

【Question title】: How to concatenate crosstabs when reading in a for loop in pandas
【Posted】: 2018-03-02 06:41:58
【Question description】:

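I am using the pandas module in Python 3.5 to read data files recursively from subdirectories. I want to concatenate the crosstabs inside the for loop after calling pd.crosstab(), and write the output to an Excel file after the loop. After calling pd.crosstab(), I tried copying table1 into table3 (see the code below), but if some values do not occur in later data files, table3 shows NaN for those entries. I looked at pd.concat but could not find an example of how to use it in a for loop.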

The data files look like this (there are 100 files with many columns, but only the columns I am interested in are shown here):

    First Data File
    StudentID    Grade      
    3            A
    2            B
    1            A

    Second Data File
    StudentID   Grade
    1            B
    2            A
    3            A

    Third Data File
    StudentID   Grade
    2            C
    1            B
    3            A

    and so on ....
    At the end the output should be like:

    Grade       A   B   C
    StudentID
    1           1   2   0
    2           1   1   1
    3           3   0   0

My Python program looks like this (imports removed from the top of the file):

.....

fields = ['StudentID', 'Grade']
path= 'C:/script_testing/'
i=0

for filename in glob.glob('C:/script_testing/**/*.txt', recursive=True):
    temp = pd.read_csv(filename, sep=',', usecols=fields)
    table1 = pd.crosstab(temp.StudentID, temp.Grade)
    # Note: the if condition is executed only once, to initialize table3
    if(i==0):
        table3 = table1
        i = i + 1
    table3 = table3 + table1

writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()

【Question comments】:

Tags: python-3.x pandas crosstab

【Solution 1】:

pd.concat([df1, df2, df3]).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))

Grade      A  B  C
StudentID         
1          1  2  0
2          1  1  1
3          3  0  0
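
For readers who want to run the one-liner above, here is a minimal self-contained sketch. It assumes that df1, df2, and df3 hold the three sample files from the question. They are built inline here rather than read from disk.

```python
import pandas as pd

# Assumed stand-ins for the three sample files shown in the question.
df1 = pd.DataFrame({"StudentID": [3, 2, 1], "Grade": ["A", "B", "A"]})
df2 = pd.DataFrame({"StudentID": [1, 2, 3], "Grade": ["B", "A", "A"]})
df3 = pd.DataFrame({"StudentID": [2, 1, 3], "Grade": ["C", "B", "A"]})

# Stack the raw rows first, then build a single crosstab over all of them.
combined = pd.concat([df1, df2, df3], ignore_index=True)
table = pd.crosstab(combined["StudentID"], combined["Grade"])
print(table)
```

Because the crosstab is computed once over all rows, a grade that is missing from an individual file is counted as 0 rather than showing up as NaN.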

Here is my attempt at translating your code:

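import glob

import pandas as pd

fields = ['StudentID', 'Grade']

# Read only the two columns of interest from one file.
parse = lambda f: pd.read_csv(f, usecols=fields)

# Stack the rows of every file, then build one crosstab over all of them.
table3 = pd.concat(
    [parse(f) for f in glob.glob('C:/script_testing/**/*.txt', recursive=True)]
).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))

# The context manager saves the workbook on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('Report.xlsx', engine='xlsxwriter') as writer:
    table3.to_excel(writer, sheet_name='StudentID_vs_Grade')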
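
If the per-file crosstabs themselves need to be kept and combined, as the question originally attempted, the following sketch may help. It first shows why adding two crosstabs with `+` yields NaN: pandas aligns on row and column labels, and a label present on only one side has no partner. It then combines the crosstabs with pd.concat followed by a groupby on the index. The inline CSV strings and the names `files`, `tables`, and `naive` are illustrative stand-ins, not part of the original post.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for three of the data files.
files = [
    "StudentID,Grade\n3,A\n2,B\n1,A\n",
    "StudentID,Grade\n1,B\n2,A\n3,A\n",
    "StudentID,Grade\n2,C\n1,B\n3,A\n",
]

tables = []
for text in files:
    temp = pd.read_csv(io.StringIO(text), usecols=["StudentID", "Grade"])
    tables.append(pd.crosstab(temp.StudentID, temp.Grade))

# Plain addition aligns on labels: grade C exists only in the third
# crosstab, so that column is NaN in the sum.
naive = tables[0] + tables[2]

# Concatenate the crosstabs, treat missing cells as 0, then sum the rows
# that share a StudentID.
table3 = (
    pd.concat(tables)
    .fillna(0)
    .groupby(level=0)
    .sum()
    .astype(int)
)
print(table3)
```

This variant costs more than concatenating the raw rows, so it is probably only worth using when the individual crosstabs are needed for something else as well.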

【Comments】:

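Thank you very much! It works. One follow-up question, please: when reading from glob.glob(), is there a way to exclude certain files? For example, I want to read all files named Data*.txt but exclude those named Data*Old.txt.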
You're welcome. That is really a glob question. Alternatively, you could do something like [parse(f) for f in glob.glob('C:/script_testing/**/Data*.txt', recursive=True) if not 'DataOld' in f]
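
The substring test in the comment above only excludes names that contain the literal text DataOld. For the pattern described in the follow-up, Data*Old.txt, one possible sketch filters on the file's basename with fnmatch from the standard library. The helper name `wanted_files` and the sample paths are hypothetical, and fnmatchcase is used so that matching is case-sensitive on every platform.

```python
import fnmatch
import os


def wanted_files(paths):
    """Keep Data*.txt files but drop those matching Data*Old.txt."""
    keep = []
    for p in paths:
        name = os.path.basename(p)
        is_data = fnmatch.fnmatchcase(name, "Data*.txt")
        is_old = fnmatch.fnmatchcase(name, "Data*Old.txt")
        if is_data and not is_old:
            keep.append(p)
    return keep


# Hypothetical paths, shaped like what glob.glob(..., recursive=True) returns.
sample = [
    "C:/script_testing/a/Data1.txt",
    "C:/script_testing/a/Data1Old.txt",
    "C:/script_testing/b/DataOld.txt",
    "C:/script_testing/b/Notes.txt",
]
print(wanted_files(sample))
```

In the list comprehension from the answer, the glob result could be passed through such a helper before parsing, for example [parse(f) for f in wanted_files(glob.glob('C:/script_testing/**/*.txt', recursive=True))].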