Transform entire column as a corpus

【Question title】: Transform entire column as a corpus
【Posted】: 2021-04-07 06:20:51
【Question】:

df has two columns that contain text. I want to convert each of them into its own corpus.

id | Description 1                   |Description 2       |
-----------------------------------------------------------
1  |that book is good                | better than book2  |
2  |book 2 is not better than 1      | not good           |
.  |            .                    |      .             |
.  |            .                    |      .             |
.  |            .                    |      .             |

Treat Description 1 as the documents and Description 2 as the queries.

Expected output

Corpus 1: that book is good book 2 is not better than 1..................
Corpus 2: better than book2 not good.....................

【Comments】:

Can you give an example of the expected result?
Corpus 1: that book is good book 2 is not better than 1........ Corpus 2: better than book2 not good......................

Tags: python dataframe corpus

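【Solution 1】:

Use the join function to concatenate every row of a column into one string, then append that string to a list. The output is a list with one entry per column.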

corpus = []
# One joined string per column; str() guards against non-string cells such as the id column.
for i in range(len(df.columns)):
    corpus.append(' '.join(str(df.iloc[j, i]) for j in range(len(df))))
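As a more compact alternative to the explicit loop, the same result can be sketched by joining each column directly. This is only a minimal sketch: the sample frame is made up to mirror the table in the question, the column names Description 1 and Description 2 are taken from that table, and astype(str) is an added assumption so that non-string cells do not break the join.

```python
import pandas as pd

# Sample data mirroring the table in the question (assumed for illustration).
df = pd.DataFrame({
    "id": [1, 2],
    "Description 1": ["that book is good", "book 2 is not better than 1"],
    "Description 2": ["better than book2", "not good"],
})

# Only the text columns become corpora; the id column is deliberately skipped.
text_columns = ["Description 1", "Description 2"]

# One space-joined string per column, in the order listed above.
corpus = [" ".join(df[col].astype(str)) for col in text_columns]

print(corpus[0])  # documents (Description 1)
print(corpus[1])  # queries (Description 2)
```

Selecting the text columns by name keeps the id column out of the result, so the list positions line up with Corpus 1 and Corpus 2 from the question.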

【Comments】:

I tried corpus = [] for i in range(len(df_processed2.columns)): corpus.append(' '.join(df_processed2['Input_Description'].iloc[j,i] for j in range(len(df_processed2.iloc[:,i])))) but it raised an error, IndexingError: Too many indexers
Yes, in the append line you passed too many indexers. df_processed2['Input_Description'] is already a single column, so .iloc[j,i] has one index too many, which is why the error appears.
Instead of this -> corpus.append(' '.join(df_processed2['Input_Description'].iloc[j,i] for j in range(len(df_processed2.iloc[:,i])))) change it to corpus.append(' '.join(df_processed2.iloc[j,i] for j in range(len(df_processed2.iloc[:,i]))))
If you think the answer is correct, please accept it.
OK! Now I can get them as corpus[0] and corpus[1], right?
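To make the exchange above concrete, here is a small sketch of why the first attempt failed and what corpus[0] and corpus[1] hold afterwards. It assumes a frame named df_processed2 with made-up contents; only the column name Input_Description comes from the comment, and Query_Description is a hypothetical second column. A single selected column is one-dimensional, so giving .iloc two positions fails, while the same call on the whole DataFrame works.

```python
import pandas as pd

# Made-up frame; "Query_Description" is a hypothetical second column.
df_processed2 = pd.DataFrame({
    "Input_Description": ["that book is good", "book 2 is not better than 1"],
    "Query_Description": ["better than book2", "not good"],
})

# A single column is a 1-D Series, so two positional indexers are one too many.
error_name = None
try:
    df_processed2["Input_Description"].iloc[0, 0]
except Exception as exc:  # the thread reports IndexingError: Too many indexers
    error_name = type(exc).__name__

# Indexing the DataFrame itself with (row, column) positions works.
corpus = []
for i in range(len(df_processed2.columns)):
    corpus.append(" ".join(df_processed2.iloc[j, i] for j in range(len(df_processed2))))

print(error_name)
print(corpus[0])  # first column joined into one string
print(corpus[1])  # second column joined into one string
```

The list follows column order, so corpus[0] and corpus[1] match the two text columns only when the frame contains nothing else. If an id column comes first, it would occupy corpus[0] instead.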