【Question title】: Convert a dataframe to a list of tuples
【Posted】: 2021-08-11 22:05:24
【Question description】:

I have a table in a pandas DataFrame that looks like this:

   Slave  start_addr0  end_addr0  start_addr1  end_addr1  start_addr2  end_addr2
0      0     10000000   1FFFFFFF          NaN        NaN          NaN        NaN
1      1     20000000   2007FFFF     40000000   40005FFF          NaN        NaN
2      1     20000000   2007FFFF     20100000   201FFFFF          NaN        NaN
3      2     20200000   202FFFFF     20080000   20085FFF     40006000   400FFFFF
4      3            0   0FFFFFFF          NaN        NaN          NaN        NaN
5      4     20300000   203FFFFF          NaN        NaN          NaN        NaN
6      5     20400000   204FFFFF          NaN        NaN          NaN        NaN

For each Slave number, I need to convert the rows into a list of (start, end) range tuples. For example,

Slave1_list = ( (20000000, 2007FFFF), (40000000, 40005FFF), (20100000, 201FFFFF))

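The number of slaves (rows) and the number of address pairs (columns) can both vary.

Thanks

Edit

Run the following code to load the sample data into a dataframe: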

import pandas as pd
import io

f = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(f, sep='|', engine='python', index_col=None)

【Comments】:

  • Could you post a df that we can copy and run? Also, please give the exact expected output for that dataframe. Thanks.

Tags: python pandas dataframe


【Solution 1】:

Something like the following:

import pandas as pd
from collections import defaultdict

data = [{'Slave': 1, 'start_addr0': 12, 'end_addr0': 189, 'start_addr1': 9, 'end_addr1': 17},
        {'Slave': 1, 'start_addr0': 3, 'end_addr0': 6, 'start_addr1': 1, 'end_addr1': 4},
        {'Slave': 3, 'start_addr0': 1, 'end_addr0': 7, 'start_addr1': 2, 'end_addr1': 14}]

df = pd.DataFrame(data)

print(df)
result = defaultdict(list)
rows = df.to_dict(orient='records')
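for row in rows:
    slave = row['Slave']
    for key, start_value in row.items():
        if not key.startswith('start_addr'):
            continue
        # Take the whole numeric suffix so start_addr10 and above also match.
        idx = key[len('start_addr'):]
        end_value = row.get('end_addr' + idx)
        # Skip the NaN padding of slaves that have fewer address pairs.
        if pd.isna(start_value) or pd.isna(end_value):
            continue
        result[slave].append((start_value, end_value))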

print('result:')
print(result)

Output

   Slave  start_addr0  end_addr0  start_addr1  end_addr1
0      1           12        189            9         17
1      1            3          6            1          4
2      3            1          7            2         14
result:
defaultdict(<class 'list'>, {1: [(12, 189), (9, 17), (3, 6), (1, 4)], 3: [(1, 7), (2, 14)]})
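As a rough sketch, the same loop can be run against the sample data from the question. Two things here are my own assumptions rather than part of the answer above: the addresses are read as text (`dtype=str`) so the hex strings are not parsed as numbers, and a range that repeats for the same slave is kept only once. The names `suffixes`, `ranges` and `slave_ranges` are made up for the illustration.

```python
import io
from collections import defaultdict

import pandas as pd

raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
# Assumption: keep every address as text; 'NaN' cells still become missing values.
df = pd.read_csv(raw, sep='|', dtype=str)
df['Slave'] = df['Slave'].astype(int)

# Discover the pair suffixes ('0', '1', ...) from the column names,
# so the number of address pairs does not have to be known up front.
suffixes = [c[len('start_addr'):] for c in df.columns if c.startswith('start_addr')]

ranges = defaultdict(list)
for row in df.to_dict(orient='records'):
    for suffix in suffixes:
        start, end = row['start_addr' + suffix], row['end_addr' + suffix]
        if pd.isna(start) or pd.isna(end):
            continue  # this slave has no range in this slot
        pair = (start, end)
        if pair not in ranges[row['Slave']]:  # keep each range once per slave
            ranges[row['Slave']].append(pair)

# Freeze the lists into tuples, one entry per slave.
slave_ranges = {slave: tuple(pairs) for slave, pairs in ranges.items()}
print(slave_ranges[1])
```

For slave 1 this gives the three ranges listed in the question, in the same order, with the addresses still as strings.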


【Solution 2】:

I think this is what you are looking for:

def make_tuples(x):
    # Build one (start, end) tuple from a row.
    return (x['start_addr0'], x['end_addr0'])

# simple tuples (first address pair only, not grouped by Slave)
result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).tolist())
print(result)

# unique tuples
unique_result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).unique().tolist())
print(unique_result)
    

Output

((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
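The output above mixes integers and strings because `read_csv` infers `start_addr0` as an integer column (its values happen to contain only digits), while `end_addr0` contains letters and stays text. If the addresses are meant to be hexadecimal, which is my assumption from values like `1FFFFFFF`, a minimal sketch of one way to get consistent numeric tuples is to read both columns as text and parse them in base 16. The shortened `csv_text` and the names `inferred`, `as_text` and `ranges` are only for illustration.

```python
import io

import pandas as pd

csv_text = '''Slave|start_addr0|end_addr0
0|10000000|1FFFFFFF
1|20000000|2007FFFF
3|0|0FFFFFFF
'''

# Default inference: start_addr0 becomes an integer column, end_addr0 stays text.
inferred = pd.read_csv(io.StringIO(csv_text), sep='|')

# Assumption: the addresses are hexadecimal, so read both columns as text.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    sep='|',
    dtype={'start_addr0': str, 'end_addr0': str},
)

# Parse each (start, end) pair as base-16 integers.
ranges = tuple(
    (int(start, 16), int(end, 16))
    for start, end in zip(as_text['start_addr0'], as_text['end_addr0'])
)
print([(hex(start), hex(end)) for start, end in ranges])
```

This version has no missing values; with the full sample data the NaN cells would need to be skipped before calling `int`.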
    


【Solution 3】:

You can try the following.

One option, via wide_to_long:

      
df = df.reset_index()
result = pd.wide_to_long(df, stubnames=['start_addr', 'end_addr'], i=['index', 'Slave'], j='add_num', sep='').dropna(
).reset_index([0, -1], drop=True).apply(tuple, 1).groupby(level=0).agg(list)
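To make the wide_to_long step easier to follow, here is a small sketch of the intermediate long-format frame on a two-row stand-in for the question's data. The trimmed-down frame and the name `long_df` are my own; the call itself uses the same arguments as above.

```python
import numpy as np
import pandas as pd

# Two-row stand-in for the question's data, addresses kept as strings.
df = pd.DataFrame({
    'Slave': [0, 1],
    'start_addr0': ['10000000', '20000000'],
    'end_addr0': ['1FFFFFFF', '2007FFFF'],
    'start_addr1': [np.nan, '40000000'],
    'end_addr1': [np.nan, '40005FFF'],
})

# wide_to_long needs a unique row id, hence the reset_index().
df = df.reset_index()

# Each start_addrN/end_addrN column pair becomes one row; the numeric
# suffix N moves into the new index level 'add_num'.
long_df = pd.wide_to_long(
    df,
    stubnames=['start_addr', 'end_addr'],
    i=['index', 'Slave'],
    j='add_num',
    sep='',
)
print(long_df)

# Slots a slave does not use show up as all-NaN rows and can be dropped.
print(long_df.dropna())
```

After `dropna()`, every remaining row is one (start_addr, end_addr) pair, which is what the rest of the chain turns into tuples and groups by Slave.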
      

Another option, via groupby:

# Start from the original df, before the reset_index() used in the first option.
k = df.set_index('Slave').stack().reset_index()
result = k.groupby(k.index//2).agg({'Slave': 'first', 0 : tuple}).groupby('Slave').agg({0 : set})
      

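Explanation

df.set_index('Slave').stack().reset_index() stacks the dataframe into long form and drops the NaN values.

k.groupby(k.index//2) groups each pair of consecutive rows (a start value followed by its end value) and performs the required aggregation; this is the step that forms the tuples.

.groupby('Slave').agg({0 : set}) -> the final groupby collects the unique tuples for each slave.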
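The three steps can be walked through one at a time on a cut-down frame. This is only a sketch under my own assumptions: the addresses are stored as strings, the stacked columns are renamed to `column` and `value` for readability, and an explicit `dropna()` is added as a precaution in case the installed pandas version keeps missing values in `stack()`.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data, addresses kept as strings.
df = pd.DataFrame({
    'Slave': [0, 1, 1],
    'start_addr0': ['10000000', '20000000', '20000000'],
    'end_addr0': ['1FFFFFFF', '2007FFFF', '2007FFFF'],
    'start_addr1': [np.nan, '40000000', '20100000'],
    'end_addr1': [np.nan, '40005FFF', '201FFFFF'],
})

# Step 1: one row per address value; the missing slots are removed.
k = df.set_index('Slave').stack().dropna().reset_index()
k.columns = ['Slave', 'column', 'value']

# Step 2: rows 0-1, 2-3, ... are start/end pairs, so integer division of the
# positional index by 2 gives every pair its own group label.
pair_label = k.index // 2
pairs = k.groupby(pair_label).agg({'Slave': 'first', 'value': tuple})

# Step 3: collect the unique pairs for each slave.
result = pairs.groupby('Slave')['value'].agg(set)
print(result)
```

Because the last step builds a set, the order of the ranges within a slave is not preserved; aggregating into a list and removing duplicates by hand would keep the order.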

Output:

                                                                            0
Slave                                                                        
0                                                      {(10000000, 1FFFFFFF)}
1      {(40000000.0, 40005FFF), (20100000.0, 201FFFFF), (20000000, 2007FFFF)}
2      {(20080000.0, 20085FFF), (40006000.0, 400FFFFF), (20200000, 202FFFFF)}
3                                                             {(0, 0FFFFFFF)}
4                                                      {(20300000, 203FFFFF)}
5                                                      {(20400000, 204FFFFF)}
      

Note: I am assuming that every start_addr has a matching end_addr.

