How to efficiently convert the entries of a dictionary into a dataframe (answers)

【Question title】: How to efficiently convert the entries of a dictionary into a dataframe
【Posted】: 2016-09-28 04:31:33
【Question description】:

I have a dictionary like this:

mydict = {'A': 'some thing',
          'B': 'couple of words'}

All values are strings of words separated by spaces. My goal is to convert the dictionary into a dataframe that looks like this:

  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words

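So I want to split each string and then add one row to the dataframe for every word, holding the associated key and that word.

A quick implementation could look like this: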

import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}

all_words = " ".join(mydict.values()).split()
df = pd.DataFrame(columns=['key_val', 'splitted_words'], index=range(len(all_words)))

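indi = 0
for key, value in mydict.items():
    for word in value.split():
        # Use a single .loc call; chained indexing (df.iloc[i][col] = ...)
        # may assign to a temporary copy and leave df unchanged.
        df.loc[indi, 'key_val'] = key
        df.loc[indi, 'splitted_words'] = word
        indi += 1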

This gives me the desired output.

However, I wonder whether there is a more efficient solution?

【Question discussion】:

Tags: python performance dictionary pandas

【Solution 1】:

Here is my one-line approach:

df = pd.DataFrame([(k, s) for k, v in mydict.items() for s in v.split()], columns=['key_val','splitted_words'])

If I split it into two steps, it becomes:

d = [(k, s) for k, v in mydict.items() for s in v.split()]
df = pd.DataFrame(d, columns=['key_val', 'splitted_words'])

Output:

Out[41]: 
  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words
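
For comparison, here is a minimal sketch of a pandas-only variant that is not part of the original answer. It assumes pandas 0.25 or later, where `Series.explode` is available, and it has not been benchmarked against the list comprehension above; treat it as an alternative to try, not as a known improvement.

```python
import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}

# Assumption: pandas >= 0.25 (Series.explode was added in that release).
s = pd.Series(mydict, name='splitted_words')
df = (
    s.str.split()             # each value becomes a list of words
     .explode()               # one row per word, key repeated in the index
     .rename_axis('key_val')  # name the index so it becomes the key column
     .reset_index()           # move the keys into a regular column
)
print(df)
```

Whether this is faster than building a list of tuples first depends on the size and shape of the input, so it is worth timing both on real data.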

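【Discussion】:

Nice. Maybe use .split() instead of .split(' ').
Works great! I am upvoting now and will accept later, depending on the quality of the other answers.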
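
The difference between the two calls mentioned in the first comment only shows up when the input contains repeated or trailing spaces. A small illustration, using a made-up string that is not from the question:

```python
text = 'couple  of words '   # note the double space and the trailing space

# With an explicit separator, every single space is a split point,
# so consecutive or trailing spaces yield empty strings.
with_sep = text.split(' ')

# Without arguments, runs of whitespace are treated as one separator
# and leading/trailing whitespace is ignored.
without_sep = text.split()

print(with_sep)     # ['couple', '', 'of', 'words', '']
print(without_sep)  # ['couple', 'of', 'words']
```

With `.split(' ')`, the empty strings would end up as extra rows in the dataframe, which is why `.split()` is the safer default here.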

【Solution 2】:

Building on @qu-dong's idea and using a generator function for better readability, here is a working example:

#! /usr/bin/env python
from __future__ import print_function
import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}


def splitting_gen(in_dict):
    """Generator function to split in_dict items on space."""
    for k, v in in_dict.items():
        for s in v.split():
            yield k, s

df = pd.DataFrame(splitting_gen(mydict), columns=['key_val', 'splitted_words'])
print(df)

#   key_val splitted_words
# 0       A           some
# 1       A          thing
# 2       B         couple
# 3       B             of
# 4       B          words

# real    0m0.463s
# user    0m0.387s
# sys     0m0.057s

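However, this only addresses "efficiency" in the sense of the elegance and readability of the requested solution.

If you look at the timings, they are all about the same: a little under 500 ms. So before feeding in larger texts, one may want to profile further to avoid unpleasant surprises ;-)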
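
As a starting point for that kind of profiling, here is a hedged sketch using `timeit`. The `real`/`user`/`sys` figures above look like whole-script timings, which presumably also include interpreter startup and the pandas import, so this sketch times only the dataframe construction. The input `big_dict`, its size, and the function names are hypothetical and chosen just for illustration; no timing results are claimed here.

```python
import timeit
import pandas as pd

# Hypothetical larger input: 200 keys, each mapped to a 5-word string.
big_dict = {'key%d' % i: 'some couple of more words' for i in range(200)}
COLUMNS = ['key_val', 'splitted_words']


def build_with_loop(d):
    """Preallocate the frame and fill it cell by cell, as in the question."""
    n_rows = sum(len(v.split()) for v in d.values())
    df = pd.DataFrame(columns=COLUMNS, index=range(n_rows))
    row = 0
    for k, v in d.items():
        for s in v.split():
            df.at[row, 'key_val'] = k
            df.at[row, 'splitted_words'] = s
            row += 1
    return df


def build_with_comprehension(d):
    """Collect (key, word) pairs first, then build the frame once."""
    return pd.DataFrame([(k, s) for k, v in d.items() for s in v.split()],
                        columns=COLUMNS)


# number=5 is an arbitrary choice; increase it for steadier measurements.
t_loop = timeit.timeit(lambda: build_with_loop(big_dict), number=5)
t_comp = timeit.timeit(lambda: build_with_comprehension(big_dict), number=5)
print('loop:          %.4f s' % t_loop)
print('comprehension: %.4f s' % t_comp)
```

Scaling `big_dict` up and down shows how each approach behaves as the input grows, which the single whole-script timing cannot reveal.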

【Discussion】: