【Question title】: How to create a new column with column values based on the shared column across 2 DataFrames?
【Posted】: 2018-02-27 16:20:26
【Question】:

Given the DataFrames df and df2:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,'a','b'], [1, 'c', 'd'], 
                       [2, 'c', 'd'], [1, 'f', 'o'], 
                       [2, 'b', 'a']], columns=['x', 'y', 'z'])

>>> df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'], 
                        [3, 'pear']], columns=['x', 'fruit'])

>>> df
   x  y  z
0  1  a  b
1  1  c  d
2  2  c  d
3  1  f  o
4  2  b  a

>>> df2
   x   fruit
0  1   apple
1  2  orange
2  3    pear

How can I create a new column holding the fruit values from df2, matched on the shared x column?

Desired output:

>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange

I have tried the following, and it works, but I'm sure there is a simpler way to do it:

>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange

Note that the DataFrames above are not indexed by x. If x is set as the index, the attempted approach no longer works:

>>> df = df.set_index('x')
>>> df2 = df2.set_index('x')
>>> df
   y  z   fruit
x              
1  a  b   apple
1  c  d   apple
2  c  d  orange
1  f  o   apple
2  b  a  orange
>>> df2
    fruit
x        
1   apple
2  orange
3    pear
>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2062, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)
KeyError: 'x'
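After set_index('x'), x is no longer a column of either frame, so df2['x'] raises KeyError: 'x'. A minimal sketch of an index-based alternative, assuming both frames are indexed by x as above (they are rebuilt here so the block runs on its own): Index.map accepts a Series, so each label in df's index can be looked up in df2's x-indexed fruit column.

```python
import pandas as pd

# Rebuild the example frames and move x into the index, as in the question
df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z']).set_index('x')
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit']).set_index('x')

# Each x label in df's (non-unique) index is looked up in df2['fruit'],
# whose index holds the unique x values
df['fruit'] = df.index.map(df2['fruit'])
print(df)
```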

【Comments】:

  • A simple merge on the x column: pd.merge(df, df2, on='x')
  • To keep exactly the same row order: df.merge(df2, how='left')
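The second comment relies on how='left': a left merge keeps every row of df in its original order, and without an on= argument pandas joins on the columns the two frames share (here only x). A short sketch checking that this reproduces the desired output:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])

# Left merge on the shared column x; the result keeps df's row order
result = df.merge(df2, how='left')
print(result)
```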

Tags: python pandas join dataframe merge


【Solution 1】:

Use merge:

df.merge(df2, on='x')

Output:

   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  1  f  o   apple
3  2  c  d  orange
4  2  b  a  orange
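merge defaults to how='inner', so a row of df whose x has no match in df2 is dropped from the result, while a left merge keeps it with NaN in fruit. A small sketch that adds a hypothetical row with x = 4 (not in df2) to illustrate the difference:

```python
import pandas as pd

# The question's df plus a hypothetical row with x=4, which has no match in df2
df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a'], [4, 'q', 'r']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])

inner = df.merge(df2, on='x')              # default how='inner': the x=4 row is dropped
left = df.merge(df2, on='x', how='left')   # the x=4 row is kept, with fruit = NaN

print(len(inner), len(left))
```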

【Comments】:

    【Solution 2】:

    Or use map:

    import pandas as pd
    
    df = pd.DataFrame([[1,'a','b'], [1, 'c', 'd'],
                               [2, 'c', 'd'], [1, 'f', 'o'],
                               [2, 'b', 'a']], columns=['x', 'y', 'z'])
    
    df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                            [3, 'pear']], columns=['x', 'fruit'])
    
    df['fruit']=df.x.map(df2.set_index('x').fruit)
    
    
    df
    Out[257]: 
       x  y  z   fruit
    0  1  a  b   apple
    1  1  c  d   apple
    2  2  c  d  orange
    3  1  f  o   apple
    4  2  b  a  orange
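Series.map also accepts a Series as the mapper: each value of df.x is looked up in the index of df2.set_index('x').fruit. An equivalent sketch using a plain dict built from df2 (the name lookup is only for illustration); note that a value with no entry in the mapping comes back as NaN instead of dropping the row:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])

# Hypothetical helper: an x -> fruit dictionary built from df2
lookup = dict(zip(df2['x'], df2['fruit']))
df['fruit'] = df['x'].map(lookup)

# A key missing from the mapping (5 here) yields NaN
missing = pd.Series([1, 5]).map(lookup)
print(missing)
```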
    

    If you have already called set_index('x') on both frames, merge on the indexes instead:

    df = df.set_index('x')
    df2 = df2.set_index('x')
    
    df.merge(df2,left_index=True,right_index=True)
    
    Out[260]: 
       y  z   fruit
    x              
    1  a  b   apple
    1  c  d   apple
    1  f  o   apple
    2  c  d  orange
    2  b  a  orange
    

    【Comments】:

      【Solution 3】:

      For completeness:

      df.join(df2.set_index('x'), on='x')
      
         x  y  z   fruit
      0  1  a  b   apple
      1  1  c  d   apple
      2  2  c  d  orange
      3  1  f  o   apple
      4  2  b  a  orange
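With on='x', join matches the x column of df against the index of the right-hand frame, which is why df2 is indexed by x first; like a left merge, it keeps all of df's rows in their original order. A quick sketch checking that it produces the same fruit column as the map approach from Solution 2:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])

# join(on=...) lines up df's x column with df2's x index
via_join = df.join(df2.set_index('x'), on='x')

# map looks each x up in the same x-indexed fruit Series
via_map = df.assign(fruit=df['x'].map(df2.set_index('x')['fruit']))

same = via_join['fruit'].tolist() == via_map['fruit'].tolist()
print(same)
```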
      

      【Comments】:
