【Question title】: join two unique combinations of single DataFrame, convert it into column name
【Posted】: 2021-07-30 19:16:57
【Question description】:

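I have an interesting problem that I tried to solve without success. I have a time-series DataFrame with 4 columns: Source, Target, timestamp, and values.

Each timestamp has multiple Source, Target, and values entries, as in the code below: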

import pandas as pd
data = [
    ['a','None','01.01.2020',20], ['a','None','02.01.2020',15],['a','None','03.01.2020',11],
    ['a','b','01.01.2020',100], ['a','b','02.01.2020',105], ['a','b','03.01.2020',101],
    ['c','d','01.01.2020',0], ['c','d','02.01.2020',0], ['c','d','03.01.2020',1],
    ['b','c','01.01.2020',50.45], ['b','c','02.01.2020',10.5], ['b','c','03.01.2020',500],
    ['a','d','01.01.2020',5000], ['a','d','02.01.2020',1500], ['a','d','03.01.2020',25],
    ['c','a','01.01.2020',2.2538], ['c','a','02.01.2020',105], ['c','a','03.01.2020',110]]

df = pd.DataFrame(data, columns = ['Source', 'Target', 'timestamp', 'values'])

I want to return the data in a new format, as the DataFrame defined below:

resultdata = [['01.01.2020',20,100,0,50.45,5000,2.2538], ['02.01.2020',15,105,0,10.5, 1500,105],
          ['03.01.2020',11,101,1,500,25,110]]
result = pd.DataFrame(resultdata, columns = ['timestamp', 'aNone', 'ab', 'cd', 'bc', 'ad', 'ca'])

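To do this, I tried joining the string columns and dropping the duplicate timestamps, then iterating, but I only get the data from the last iteration, in dictionary form.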

df['Source Target'] = df['Source']  + ' ' + df['Target']
st = df['Source Target'].drop_duplicates(keep= 'first').reset_index(drop=True)
timestamp = df['timestamp'].drop_duplicates(keep= 'first')

d ={}
for j in range(len(timestamp)):
    Time = timestamp ['timestamp'][j]
    for k in range(len(st)):
        Column = st[k] 
        for i in range(len(df)):
            time =  df['timestamp'][i]
            columnname =  df['Source Target'][i]
            if time==Time and columnname == Column:
                d[Column] = (time,df['values'][i])
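
One likely reason only the last iteration survives: `d` is keyed by `Column` alone, so each later timestamp overwrites the entry written for the previous one. A minimal sketch of a loop-based fix, assuming one value per (timestamp, Source, Target) combination, keys the dictionary by timestamp first. The names `rows` and `result` are illustrative only, and the data is a shortened version of the example above.

```python
import pandas as pd

data = [['a', 'None', '01.01.2020', 20], ['a', 'b', '01.01.2020', 100],
        ['a', 'None', '02.01.2020', 15], ['a', 'b', '02.01.2020', 105]]
df = pd.DataFrame(data, columns=['Source', 'Target', 'timestamp', 'values'])

# Outer key: timestamp. Inner key: joined Source + Target label.
rows = {}
for src, tgt, ts, val in df.itertuples(index=False, name=None):
    rows.setdefault(ts, {})[src + tgt] = val

# One row per timestamp, one column per Source/Target combination
result = (pd.DataFrame.from_dict(rows, orient='index')
            .rename_axis('timestamp')
            .reset_index())
print(result)
```

This keeps the explicit loop for comparison; a reshaping function such as `pivot_table` does the same job without iterating in Python.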

【Question discussion】:

    Tags: python pandas dataframe for-loop time-series


    【Solution 1】:

    Let's use pivot_table instead:

    import pandas as pd
    
    data = [['a', 'None', '01.01.2020', 20], ['a', 'None', '02.01.2020', 15],
            ['a', 'None', '03.01.2020', 11], ['a', 'b', '01.01.2020', 100],
            ['a', 'b', '02.01.2020', 105], ['a', 'b', '03.01.2020', 101],
            ['c', 'd', '01.01.2020', 0], ['c', 'd', '02.01.2020', 0],
            ['c', 'd', '03.01.2020', 1], ['b', 'c', '01.01.2020', 50.45],
            ['b', 'c', '02.01.2020', 10.5], ['b', 'c', '03.01.2020', 500],
            ['a', 'd', '01.01.2020', 5000], ['a', 'd', '02.01.2020', 1500],
            ['a', 'd', '03.01.2020', 25], ['c', 'a', '01.01.2020', 2.2538],
            ['c', 'a', '02.01.2020', 105], ['c', 'a', '03.01.2020', 110]]
    
    df = pd.DataFrame(data, columns=['Source', 'Target', 'timestamp', 'values'])
    
    # Create Pivot Table
    df = df.pivot_table(index='timestamp', 
                        columns=['Source', 'Target'], 
                        values='values').reset_index()
    
    # Reduce multi-index columns to flat string labels
    df.columns = df.columns.map(''.join)
    
    # Fix dtypes
    df = df.convert_dtypes()
    
    # For Display
    print(df.to_string())
    

    df:

        timestamp  aNone   ab    ad     bc      ca  cd
    0  01.01.2020     20  100  5000  50.45  2.2538   0
    1  02.01.2020     15  105  1500   10.5   105.0   0
    2  03.01.2020     11  101    25  500.0   110.0   1
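
Note that the pivoted columns come out sorted (aNone, ab, ad, ...), while the expected result in the question lists them in order of first appearance (aNone, ab, cd, bc, ...). A small sketch of one way to restore that order, assuming the joined labels are unique; the names `order` and `wide` are illustrative and the data is a shortened version of the example:

```python
import pandas as pd

data = [['a', 'None', '01.01.2020', 20], ['a', 'None', '02.01.2020', 15],
        ['a', 'b', '01.01.2020', 100], ['a', 'b', '02.01.2020', 105],
        ['c', 'd', '01.01.2020', 0], ['c', 'd', '02.01.2020', 0],
        ['b', 'c', '01.01.2020', 50.45], ['b', 'c', '02.01.2020', 10.5]]
df = pd.DataFrame(data, columns=['Source', 'Target', 'timestamp', 'values'])

# Joined labels in order of first appearance in the long-format data
order = (df['Source'] + df['Target']).drop_duplicates().tolist()

wide = df.pivot_table(index='timestamp',
                      columns=['Source', 'Target'],
                      values='values').reset_index()
wide.columns = wide.columns.map(''.join)

# Reorder the flattened columns to match first appearance
wide = wide[['timestamp'] + order]
print(wide.to_string())
```

Also keep in mind that `pivot_table` aggregates duplicate (timestamp, Source, Target) rows with the mean by default, which is harmless here because each combination occurs once.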
    

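    【Discussion】:

    • I think you might get a speedup if you do the column concatenation (source + target) after pivoting ... there are fewer values to join.
    • I think you're right. Updated to reduce the multi-index columns after pivoting instead of concatenating across all rows. Thanks!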
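
The two orderings discussed in these comments can be sketched side by side. This is only an illustration on a shortened version of the example data, with hypothetical names `long_first` and `pivot_first`; it shows that both give the same table, not how much faster either one is.

```python
import pandas as pd

data = [['a', 'None', '01.01.2020', 20], ['a', 'None', '02.01.2020', 15],
        ['a', 'b', '01.01.2020', 100], ['a', 'b', '02.01.2020', 105],
        ['b', 'c', '01.01.2020', 50.45], ['b', 'c', '02.01.2020', 10.5]]
df = pd.DataFrame(data, columns=['Source', 'Target', 'timestamp', 'values'])

# Option A: join Source + Target on every row of the long frame, then pivot
long_first = (df.assign(key=df['Source'] + df['Target'])
                .pivot_table(index='timestamp', columns='key', values='values')
                .reset_index())
long_first.columns.name = None

# Option B: pivot first, then join only the resulting column labels
pivot_first = df.pivot_table(index='timestamp',
                             columns=['Source', 'Target'],
                             values='values').reset_index()
pivot_first.columns = pivot_first.columns.map(''.join)

print(long_first)
print(pivot_first)
```

Option A performs one string concatenation per input row, while Option B performs one per output column, which is the saving the comment points at.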
    【Solution 2】:

    You can also reshape with pivot_wider from pyjanitor; names_sep helps merge the multi-index columns:

    # pip install pyjanitor
    import pandas as pd
    import janitor
    df.pivot_wider(index="timestamp", 
                   names_from=["Source","Target"],
                   values_from="values", 
                   names_sep="").convert_dtypes()
     
        timestamp  aNone   ab  cd     bc    ad      ca
    0  01.01.2020     20  100   0  50.45  5000  2.2538
    1  02.01.2020     15  105   0   10.5  1500   105.0
    2  03.01.2020     11  101   1  500.0    25   110.0
    

    pivot_wider is an abstraction over the pandas pivot function.
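
For comparison, a rough equivalent using only `DataFrame.pivot` might look like the sketch below. It assumes a pandas version in which `pivot` accepts a list for `columns` (1.1 or later), and it is not meant to describe how pyjanitor implements `pivot_wider` internally. The data is a shortened version of the example and `wide` is an illustrative name.

```python
import pandas as pd

data = [['a', 'None', '01.01.2020', 20], ['a', 'None', '02.01.2020', 15],
        ['a', 'b', '01.01.2020', 100], ['a', 'b', '02.01.2020', 105],
        ['b', 'c', '01.01.2020', 50.45], ['b', 'c', '02.01.2020', 10.5]]
df = pd.DataFrame(data, columns=['Source', 'Target', 'timestamp', 'values'])

# Pure reshape: no aggregation is performed
wide = df.pivot(index='timestamp', columns=['Source', 'Target'], values='values')

# Join each (Source, Target) tuple into a single flat label
wide.columns = wide.columns.map(''.join)

wide = wide.reset_index().convert_dtypes()
print(wide.to_string())
```

Unlike `pivot_table`, `pivot` does not aggregate: it raises a `ValueError` if a (timestamp, Source, Target) combination appears more than once, which can be a useful check when each combination is expected to be unique.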

    【Discussion】:
