【Question title】: How can I do this split process in Python?
【Posted】: 2021-12-30 13:25:55
【Question】:

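I'm trying to label the data in a table. The position labels must repeat in every row, but each column needs its own Enum class.

What I have so far uses the same Enum class for every column.

A solution that handles the columns separately, as lists, would also be acceptable. What is the best way to solve this?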

import pandas as pd
from enum import Enum


df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})
df

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos+1).name}")

Result:

                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price

product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L

Desired result:

        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second

# In that case the sequence is (B_first, I_first, L_first, L_first...), and when the column changes it becomes B_second, I_second, L_second...

【Comments】:

    Tags: python pandas enums


    【Solution 1】:

    You can use a dict mapping instead of an Enum. If you flatten your DataFrame, you can also avoid the loops:

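    # Dict mapping from word position to label, used in place of the Enum.
    # (This definition is not shown in the answer as posted; 0-based keys are
    # assumed here because cumcount() starts counting at 0.)
    Tipos = {0: 'B', 1: 'I', 2: 'L'}

    out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
    out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
                     + '_' + out.index.get_level_values(0)
    out = out.reset_index(drop=True)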
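
To see what each step of the chain contributes, the intermediate results can be built one at a time. This is only an illustrative breakdown using the question's sample data; the names `flat`, `words`, and `position` are not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['product and other', 'product2 and other', 'price'],
    'second': ['product and prices', 'price2', 'product3 and price'],
})

# 1. unstack() turns the frame into a Series indexed by (column name, row label).
flat = df.unstack()

# 2. str.split() turns each cell into a list of words; explode() gives every
#    word its own entry and repeats the (column, row) index for each of them.
words = flat.str.split().explode()

# 3. Sorting on the row level groups the entries row by row, and within
#    each row by column name.
words = words.sort_index(level=1)

# 4. cumcount() numbers the words inside each (column, row) cell from 0;
#    these numbers are what gets mapped to the B/I/L labels.
position = words.groupby(level=[0, 1]).cumcount()

print(position.tolist())
print(words.index.get_level_values(0).tolist())
```

The column name used for the `_first` / `_second` suffix comes from level 0 of the index, which is why the chain reads it with `get_level_values(0)`.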
    

    Output:

    >>> out
            Word       Ent
    0    product   B_first
    1        and   I_first
    2      other   L_first
    3    product  B_second
    4        and  I_second
    5     prices  L_second
    6   product2   B_first
    7        and   I_first
    8      other   L_first
    9     price2  B_second
    10     price   B_first
    11  product3  B_second
    12       and  I_second
    13     price  L_second
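
If the Enum from the question should be kept, a loop-based variant can produce the same table. The following is a minimal sketch rather than part of the answer: it assumes that iterating over `row.items()` is an acceptable way to get each column name next to its cell, and the names `records`, `coluna`, and `out_loop` are illustrative.

```python
from enum import Enum

import pandas as pd


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


df = pd.DataFrame({
    'first': ['product and other', 'product2 and other', 'price'],
    'second': ['product and prices', 'price2', 'product3 and price'],
})

records = []
for _, row in df.iterrows():
    # row.items() yields (column name, cell text) pairs, so the column
    # name is available when building the label suffix.
    for coluna, sentenca in row.items():
        for pos, palavra in enumerate(sentenca.split()):
            # Enum values start at 1, while enumerate() starts at 0.
            etiqueta = f"{Tipos(pos + 1).name}_{coluna}"
            records.append({'Word': palavra, 'Ent': etiqueta})

out_loop = pd.DataFrame(records)
print(out_loop)
```

Note that `Tipos(pos + 1)` raises a `ValueError` for any cell with more than three words, since the Enum only defines the values 1 to 3. The sample data never exceeds that length.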
    

    【Comments】:
