【Posted】: 2018-02-27 16:20:26
【Question】:
Given the DataFrames df and df2:
>>> df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
...                    [2, 'c', 'd'], [1, 'f', 'o'],
...                    [2, 'b', 'a']], columns=['x', 'y', 'z'])
>>> df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
...                     [3, 'pear']], columns=['x', 'fruit'])
>>> df
   x  y  z
0  1  a  b
1  1  c  d
2  2  c  d
3  1  f  o
4  2  b  a
>>> df2
   x   fruit
0  1   apple
1  2  orange
2  3    pear
How can I create a new column in df that holds the fruit values from df2, matched on the shared x column?
Expected output:
>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
I have tried the following, and it works, but I am sure there is a simpler way to do this:
>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
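Note that the DataFrames above have no index set (they use the default integer index). If the DataFrames are indexed by x, the attempt above no longer works: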
>>> df = df.set_index('x')
>>> df2 = df2.set_index('x')
>>> df
   y  z   fruit
x
1  a  b   apple
1  c  d   apple
2  c  d  orange
1  f  o   apple
2  b  a  orange
>>> df2
    fruit
x
1   apple
2  orange
3    pear
>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2062, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)
KeyError: 'x'
【Comments】:
- Simply merge on the x column: pd.merge(df, df2, on='x')
- For exactly the same row order: df.merge(df2, how='left')
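The merge suggestion can be sketched end to end on the non-indexed frames from the question. This is a minimal illustration rather than a definitive answer; the Series.map variant and the names merged, lookup and mapped are additions for the example and do not come from the comments.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])

# Left join on the shared column: every row of df is kept, in its original order.
merged = df.merge(df2, on='x', how='left')
print(merged)

# Alternative (not from the comments): build a Series keyed by x
# and look each x value up in it.
lookup = df2.set_index('x')['fruit']
mapped = df.assign(fruit=df['x'].map(lookup))
print(mapped)
```

Both variants leave df itself unchanged; assign the result back (for example df = merged) to keep the new column.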
Tags: python pandas join dataframe merge