How to put all docx data into separate dataframe columns in Python: answers

【Question title】: How to put all docx data into separate dataframe columns in Python
【Posted】: 2021-09-21 03:08:10
【Question description】:

I couldn't find anything about this problem on Stack Overflow, so please bear with me. I have no idea how to solve it.

Here is my code:

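import os
from docx import Document  # python-docx

paths = '.'  # assumed: the folder to search
v_doc = []

for root, dirs, files in os.walk(paths):
    for t in files:
        if t.endswith('.docx'):
            # os.walk yields bare file names, so join them with their folder
            v_doc.append(Document(os.path.join(root, t)))

            # Say there are 3 docx files containing simple sentences. How do I put
            # each docx's sentences into a separate dataframe column? I have many (n) docx files.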

Sample documents:

docx1 contains:

Hello guys how are you all, hope you guys doing good.

docx2 contains:

I don't know what to write here

docx3 contains:

We are strong together ! do we ?

Expected output:

dataframe:
column1                                                column2                          column3
Hello guys how are you all, hope you guys doing good.  I don't know what to write here  We are strong together ! do we ?

I hope I can get some responses. Thanks in advance.

【Question discussion】:

This is not a minimal reproducible code snippet; try to make it reproducible.

Tags: python arrays regex dataframe numpy

【Solution 1】:

Here you go:

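import os
import docx  # python-docx

dataframe = {}

def get_files(extension, location):
    """Return the full paths of all files under `location` ending with `extension`."""
    v_doc = []

    for root, dirs, files in os.walk(location):
        for t in files:
            if t.endswith(extension):
                # os.walk yields bare file names, so join them with their folder
                v_doc.append(os.path.join(root, t))
    return v_doc

file_list = get_files('.docx', '.')
for index, file in enumerate(file_list, start=1):
    doc = docx.Document(file)
    column_label = f'column{index}'
    # Only the first paragraph (e.g. the title) is taken here
    data_content = doc.paragraphs[0].text
    # Add a key instead of rebinding the dict, so earlier columns are kept
    dataframe[column_label] = data_content

print(dataframe)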
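The answer builds a plain dict rather than a pandas DataFrame. A minimal sketch of the conversion, assuming pandas is installed and using the question's sample sentences as stand-in data: wrapping the dict in a list gives one row with one column per document, while a dict of pd.Series handles documents with different numbers of paragraphs by padding the shorter columns with NaN. The paragraph lists in the second part are hypothetical.

```python
import pandas as pd

# Stand-in data: a dict shaped like the one the answer prints, one key per document
columns = {
    "column1": "Hello guys how are you all, hope you guys doing good.",
    "column2": "I don't know what to write here",
    "column3": "We are strong together ! do we ?",
}

# A list containing one dict -> one row, one column per key
df = pd.DataFrame([columns])
print(df)

# Hypothetical per-document paragraph lists of different lengths
paragraphs = {
    "column1": ["Title", "First line", "Second line"],
    "column2": ["Only one paragraph"],
}

# Each Series becomes a column; shorter columns are padded with NaN
ragged = pd.DataFrame({name: pd.Series(texts) for name, texts in paragraphs.items()})
print(ragged)
```

Passing the dict directly, as pd.DataFrame(columns), raises a ValueError because every value is a scalar and no index is given; the list wrapper avoids that.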

【Discussion】:

{'column1': 'content of example1.docx', 'column2': 'content of example2.docx', 'column3': 'content of example3.docx'}
doc.paragraphs[0].text shows nothing, but for x in data_content: print(x.text) does.
It should only grab the title, i.e. the first paragraph. If that paragraph is empty, the result will be empty too.
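The empty-title case can be handled by skipping blank paragraphs instead of indexing paragraphs[0], which also raises IndexError if a document's paragraph list is empty. A minimal sketch; first_nonempty_paragraph is a hypothetical helper that takes plain strings, so with python-docx you would pass it (p.text for p in doc.paragraphs).

```python
def first_nonempty_paragraph(paragraph_texts, default=""):
    """Return the first paragraph whose text is not just whitespace.

    Falls back to `default` when every paragraph is blank or there are none,
    so it never raises IndexError the way paragraphs[0] can.
    """
    for text in paragraph_texts:
        if text.strip():
            return text
    return default


print(first_nonempty_paragraph(["", "   ", "Title", "Body text"]))
```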
Oh, I see. But is it possible to grab all the content of docx1, docx2 and put it into column1 and column2 of the dataframe?
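A minimal sketch of one way to grab every paragraph of each file rather than only the first. It swaps python-docx for the standard library: a .docx file is a ZIP archive whose body text lives in word/document.xml, with paragraphs stored as <w:p> elements and their text runs as <w:t>. The helper names read_docx_paragraphs and docx_columns are hypothetical, blank paragraphs are dropped and the rest are joined with newlines (an assumption about the desired layout), and tabs, line breaks and text boxes are ignored.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML, the format inside a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path):
    """Return the text of every <w:p> paragraph in a .docx file (stdlib only)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across <w:t> runs; concatenate them
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def docx_columns(location):
    """Map column1..columnN to the full text of each .docx found under location."""
    paths = []
    for root, _dirs, files in os.walk(location):
        for name in files:
            # Skip Word's temporary lock files such as "~$report.docx"
            if name.endswith(".docx") and not name.startswith("~$"):
                paths.append(os.path.join(root, name))
    paths.sort()  # deterministic column order
    columns = {}
    for index, path in enumerate(paths, start=1):
        # Drop blank paragraphs and join the rest into one string per document
        text = "\n".join(p for p in read_docx_paragraphs(path) if p)
        columns[f"column{index}"] = text
    return columns
```

With python-docx installed, [p.text for p in docx.Document(path).paragraphs] can stand in for read_docx_paragraphs, though python-docx's Document.paragraphs lists only top-level body paragraphs, whereas this sketch also picks up paragraphs inside tables. pandas.DataFrame([docx_columns('.')]) then gives a one-row DataFrame with columns column1 ... columnN.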
Works great!!!! I changed the code a bit. Thanks a lot!