Convert a list of tuples into a DataFrame in pandas: answers

【Question title】: Convert a list of tuples into a dataframe in pandas
【Posted】: 2014-08-02 05:53:10
【Question】:

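I have a list of tuples (y) that I wish to convert to a DataFrame x. There are five tuples in y. Each tuple in y has 33 elements. Element 1 in all five tuples is text and is the same. Element 2 in all five tuples is text and is the same. Element 3 in each tuple is text and is the same.

I want the first three elements in y to be the column names in the DataFrame. I want to convert the list of tuples to a 10 x 3 DataFrame. The tricky part is that row 1 in the DataFrame would be elements 4, 5, 6 in y[1], row 2 in the DataFrame would be elements 7, 8, 9 in y[1], row 3 would be 10, 11, 12... etc.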

y looks like this (not showing the entire list):

List of tuples y
y[0]      y[1]      y[2]      y[3]      y[4]

Formula   Formula   Formula   Formula   Formula
Phase     Phase     Phase     Phase     Phase
Value     Value     Value     Value     Value
"a"       "a"       "a"       "a"       "a"
"nxxx"    "nxxx"    "nxxx"    "nxxx"    "nxxx"
3.2       3.7       22.4      18.2      9.7
"h45"     "h45"     "h45"     "h45"     "h45"
"cacpp"   "cacpp"   "cacpp"   "cacpp"   "cacpp"
45.2      61.76     101.2     171.89    203.7
"trx"     "trx"     "trx"     "trx"     "trx"
"v2o5p"   "v2o5p"   "v2o5p"   "v2o5p"   "v2o5p"
0.24      0.81      0.97      1.2       1.98
"blnt"    "blnt"    "blnt"    "blnt"    "blnt"
"g2o3"    "g2o3"    "g2o3"    "g2o3"    "g2o3"
807.2     905.8     10089     10345     10979

I would like to convert y to DataFrame x as follows:

DataFrame x
column 1  column 2  column 3

Formula   Phase     Value
"a"       "nxxx"    3.2
"h45"     "cacpp"   45.2
"trx"     "v2o5p"   0.24
"blnt"    "g2o3"    807.2
"a"       "nxxx"    3.7
"h45"     "cacpp"   61.76
"trx"     "v2o5p"   0.81
"blnt"    "g2o3"    905.8
"a"       "nxxx"    22.4
"h45"     "cacpp"   101.2
"trx"     "v2o5p"   0.97
"blnt"    "g2o3"    10089
etc       etc       etc

I know there must be a simple way to iterate through the list of tuples. But I am new to pandas and relatively new to Python, so I am struggling to find a clean way to do this.

【Comments】:

Tags: python pandas

【Solution 1】:

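Basically, you need to:
1) drop the first 3 elements of each tuple (you only need one copy of them, as the column headers)
2) concatenate all the remaining elements in y
3) reshape to 3 columns

All of this can be done with numpy, which you need to be familiar with anyway if you use pandas.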

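# Assumes: import numpy as np; import pandas as pd
# Steps 1) and 2) above: drop the 3 header fields and concatenate the rest.
In [83]: data = np.concatenate([z[3:] for z in y])

# Step 3): reshape to 3 columns.
In [84]: data = data.reshape(-1, 3)

# Now data is a numpy array that looks like what you need: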
In [85]: data
Out[85]: 
array([['a', 'nxxx', '3.2'],
       ['h45', 'cacpp', '45.2'],
       ['trx', 'v2o5p', '0.24'],
       ['blnt', 'g2o3', '807.2'],
       ['a', 'nxxx', '3.7'],
       ['h45', 'cacpp', '61.76'],
       ['trx', 'v2o5p', '0.81'],
       ['blnt', 'g2o3', '905.8'],
       ['a', 'nxxx', '22.4'],
       ['h45', 'cacpp', '101.2'],
       ['trx', 'v2o5p', '0.97'],
       ['blnt', 'g2o3', '10089'],
       ['a', 'nxxx', '18.2'],
       ['h45', 'cacpp', '171.89'],
       ['trx', 'v2o5p', '1.2'],
       ['blnt', 'g2o3', '10345'],
       ['a', 'nxxx', '9.7'],
       ['h45', 'cacpp', '203.7'],
       ['trx', 'v2o5p', '1.98'],
       ['blnt', 'g2o3', '10979']], 
      dtype='|S6')

You can then put data into a pandas DataFrame:

In [86]: df = pd.DataFrame(data, columns=y[0][:3])

In [87]: df
Out[87]: 
   Formula  Phase   Value
0        a   nxxx     3.2
1      h45  cacpp    45.2
2      trx  v2o5p    0.24
3     blnt   g2o3   807.2
4        a   nxxx     3.7
5      h45  cacpp   61.76
6      trx  v2o5p    0.81
7     blnt   g2o3   905.8
8        a   nxxx    22.4
9      h45  cacpp   101.2
10     trx  v2o5p    0.97
11    blnt   g2o3   10089
12       a   nxxx    18.2
13     h45  cacpp  171.89
14     trx  v2o5p     1.2
15    blnt   g2o3   10345
16       a   nxxx     9.7
17     h45  cacpp   203.7
18     trx  v2o5p    1.98
19    blnt   g2o3   10979
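
Note that the array above stores every entry as text (dtype='|S6'), including the Value column. The following is a minimal sketch of one way to keep the numbers numeric, assuming Python 3 and a reasonably recent numpy and pandas. It is a variation on the answer rather than the answerer's own code: the two-tuple stand-in for y is shortened sample data (values taken from the question), and the use of object arrays plus pd.to_numeric is my assumption about what is wanted.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for y: 3 header fields, then (Formula, Phase, Value) triples.
y = [
    ("Formula", "Phase", "Value", "a", "nxxx", 3.2, "h45", "cacpp", 45.2),
    ("Formula", "Phase", "Value", "a", "nxxx", 3.7, "h45", "cacpp", 61.76),
]

# Steps 1 and 2: drop the header fields and join the rest into one flat array.
# dtype=object keeps the floats as floats instead of coercing them to text.
flat = np.concatenate([np.array(z[3:], dtype=object) for z in y])

# Step 3: one row per group of three values.
df = pd.DataFrame(flat.reshape(-1, 3), columns=list(y[0][:3]))

# The columns are still object dtype; turn Value into a numeric column.
df["Value"] = pd.to_numeric(df["Value"])
print(df.dtypes)
```

With a numeric Value column, operations such as df["Value"].mean() or sorting by value behave as expected instead of working on strings.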

【Comments】:

This looks great, but I still get an error.... def phs_tab(y): data = np.concatenate([z[3:] for z in y]) data = data.reshape(-1, 3) df = pd.DataFrame(data, columns=y[0][:3]) print df phs_tab(y) data = data.reshape(-1, 3) ValueError: total size of new array must be unchanged
Try running this to check whether every tuple in y has the expected length: for z in y: print len(z)
This is the step that raises the error: data = data.reshape(-1, 3)
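
The reshape error reported above is raised when the length of the concatenated array is not a multiple of 3. As a rough sketch of the length check suggested in the comments, written for Python 3: the helper name check_tuple_lengths and the sample data are hypothetical, and the check is applied per tuple (slightly stricter than what reshape itself requires) so that no row straddles two tuples.

```python
def check_tuple_lengths(y, n_header=3, group=3):
    """Return (index, length) for each tuple whose data part is not a multiple of `group`."""
    bad = []
    for i, z in enumerate(y):
        # Everything after the header fields must split evenly into groups.
        if (len(z) - n_header) % group != 0:
            bad.append((i, len(z)))
    return bad


# Hypothetical data: the second tuple is missing its last value.
y = [
    ("Formula", "Phase", "Value", "a", "nxxx", 3.2),
    ("Formula", "Phase", "Value", "a", "nxxx"),
]
problems = check_tuple_lengths(y)
print(problems)  # [(1, 5)]
```

An empty result means every tuple splits cleanly into rows of three, so reshape(-1, 3) should not fail for this reason.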

【Solution 2】:

Assuming some dummy data:

In [122]: y1 = ('Formula', 'Phase', 'Value', 1, 2, 3, 4, 5, 6)
In [123]: y2 = ('Formula', 'Phase', 'Value', 7, 8, 9, 10, 11, 12)
In [124]: y = [y1, y2]

And using the "grouper" recipe from this answer to iterate in groups:

In [125]: from itertools import izip_longest

In [126]: def grouper(iterable, n, fillvalue=None):
     ...:     args = [iter(iterable)] * n
     ...:     return izip_longest(*args, fillvalue=fillvalue)

Then you could do something like the following. grouper(y_tuple[3:], 3) iterates over the tuple in groups of 3, excluding the first 3 elements.

In [127]: columns = y[0][:3]

In [128]: data = []
     ...: for y_tuple in y:
     ...:     for group_of_3 in grouper(y_tuple[3:], 3):
     ...:         data.append(list(group_of_3))
     ...:         

In [129]: data
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [130]: pd.DataFrame(data=data, columns=columns)
Out[130]: 
   Formula  Phase  Value
0        1      2      3
1        4      5      6
2        7      8      9
3       10     11     12
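
The session above uses itertools.izip_longest, which exists only in Python 2; in Python 3 the same function is named itertools.zip_longest. A minimal Python 3 sketch of the same approach, using the same dummy data (the list comprehension is just a compact rewrite of the nested loop):

```python
from itertools import zip_longest

import pandas as pd


def grouper(iterable, n, fillvalue=None):
    """Return an iterator of n-length tuples, padding the last one with fillvalue."""
    # n references to the same iterator, so each tuple consumes n items in order.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


y = [
    ("Formula", "Phase", "Value", 1, 2, 3, 4, 5, 6),
    ("Formula", "Phase", "Value", 7, 8, 9, 10, 11, 12),
]

# Skip the 3 header fields of each tuple, then collect the rest in groups of 3.
rows = [list(group) for y_tuple in y for group in grouper(y_tuple[3:], 3)]
df = pd.DataFrame(rows, columns=list(y[0][:3]))
print(df)
```

Because the rows are built as plain Python lists, numeric values keep their types, and an incomplete final group is padded with fillvalue (None by default) rather than raising an error.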

【Comments】: