Create new dataframe based on the absence / presence of all words in a dictionary: answers

【Question title】: Create new dataframe based on the absence / presence of all words in a dictionary
【Posted】: 2020-09-07 14:25:32
【Question】:

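I want to process a list of sentences into a new dataframe that should have a maximum number of columns based on the number of unique words in the vocabulary.

In the dataframe, each column should indicate whether the corresponding vocabulary word is present in the sentence: 1 if it is, 0 if it is not.

List of sentences: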

sentence = [['I','like','fruit'],['cars','are','great'],['great','time','eating','fruit']]

Vocabulary containing all unique words (total length of the vocabulary = 8):

vocab = ['I','like','fruit','cars','are','great','time','eating']

Finally, I want to add the corresponding label to each sentence.

Labels:

labels = ['Fruit','Cars','Fruit']

The dataframe filled with 0 values is currently created like this:

new_df = pd.DataFrame(index=np.arange(4), columns=np.arange(8))
new_df = new_df.fillna(0)

Expected result:

            Word1  Word2  Word3  Word4  Word5  Word6  Word7  Word8  Label
Sentence1       1      1      1      0      0      0      0      0  Fruit
Sentence2       0      0      0      1      1      1      0      0  Cars
Sentence3       0      0      1      0      0      1      1      1  Fruit

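【Comments on the question】:

What problem have you run into so far? How are you initializing your dataframe?
@Manakin I don't know how to initialize a dataframe with x columns named iteratively as Word1, Word2, etc. I probably should have added that to the question!

Tags: python-3.x pandas dictionary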

【Solution 1】:

import pandas as pd

sentences = [
    ['I','like','fruit'],
    ['cars','are','great'],
    ['great','time','eating','fruit']
]

# For each sentence, create a dictionary of <word>: 1 for each word
words_dict = [{word: 1 for word in sentence} for sentence in sentences]

# Convert to data frame, fill the missing values with 0 (cast back to int,
# since the NaNs make the columns float) and rename the columns as required
df = pd.DataFrame(words_dict).fillna(0).astype(int)
df.columns = ['Word{}'.format(i+1) for i in range(len(df.columns))]

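This is a naive approach; you would have to look into how efficient pandas is at building a DataFrame from a list of dictionaries and at filling a sparse dataframe.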
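
One possible alternative that skips the intermediate dictionaries is to fill the indicator matrix directly with NumPy, using the `vocab` list from the question to fix the column order. This is only a sketch: the `word_to_col` name is made up for illustration, and nothing here was timed, so whether it is actually faster on real data is an open assumption.

```python
import numpy as np
import pandas as pd

sentences = [
    ['I', 'like', 'fruit'],
    ['cars', 'are', 'great'],
    ['great', 'time', 'eating', 'fruit'],
]
vocab = ['I', 'like', 'fruit', 'cars', 'are', 'great', 'time', 'eating']

# Map each vocabulary word to its column position (hypothetical helper name)
word_to_col = {word: i for i, word in enumerate(vocab)}

# Start from an all-zero integer matrix: one row per sentence, one column per word
matrix = np.zeros((len(sentences), len(vocab)), dtype=int)
for row, sentence in enumerate(sentences):
    for word in sentence:
        if word in word_to_col:  # words outside the vocabulary are ignored
            matrix[row, word_to_col[word]] = 1

df = pd.DataFrame(
    matrix,
    index=['Sentence{}'.format(i + 1) for i in range(len(sentences))],
    columns=['Word{}'.format(i + 1) for i in range(len(vocab))],
)
```

Because the columns follow `vocab` rather than the order in which words first appear, Word1 to Word8 stay stable even if the sentences are reordered.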

【Discussion】:

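This obviously doesn't create the Label column from your example, as I'm not sure how you derive it from the sentences.
Thanks. The original dataframe I'm working with has only two columns (one called sentence, the other called label)! I should have stated that more clearly.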
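
Given that clarification, the Label column could be carried over from the original two-column dataframe. The sketch below assumes the columns are literally named `sentence` and `label` and that `sentence` holds lists of tokens; the names `raw` and `result` are placeholders, not something from the thread.

```python
import pandas as pd

# Assumed shape of the original data: token lists in one column, labels in the other
raw = pd.DataFrame({
    'sentence': [
        ['I', 'like', 'fruit'],
        ['cars', 'are', 'great'],
        ['great', 'time', 'eating', 'fruit'],
    ],
    'label': ['Fruit', 'Cars', 'Fruit'],
})
vocab = ['I', 'like', 'fruit', 'cars', 'are', 'great', 'time', 'eating']

# Build one row of 0/1 indicators per sentence, in vocabulary order
rows = []
for tokens in raw['sentence']:
    present = set(tokens)
    rows.append([int(word in present) for word in vocab])

result = pd.DataFrame(
    rows,
    index=['Sentence{}'.format(i + 1) for i in range(len(raw))],
    columns=['Word{}'.format(i + 1) for i in range(len(vocab))],
)

# Copy the labels positionally; to_numpy() avoids index alignment against raw
result['Label'] = raw['label'].to_numpy()
print(result)
```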