Removing characters from the dataframe python: answers

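【Question title】: Removing characters from the dataframe python
【Posted】: 2020-08-13 09:05:01
【Question description】:

I want to replace a string in one column of a table. Example: I want to remove b"SET and b"MULTISET from the df columns. How can this be done? I need output like the one detailed below.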

columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill'] 
df = pd.DataFrame(data=t, columns=columns)
df
    
        cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"             601.10

Desired output:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1','12345678','NewYork, US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2','12344444','Florida, US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1','12345555','Georgia, US')               601.10

【Question comments】:

Please run print(t) and include the output in the post.
The output of print(df) is as follows: cust_id cust_name vehicle details bill 0 101 b"SET{'Tom','C'}" b"MULTISET{'Toyota','Cruiser'}" b"ROW('Street 1','12345678','NewYork, US')" 1200.00 1 102 b"SET{'Rachel','Green'}" b"MULTISET{'Ford','se'}" b"ROW('Street 2','12344444','Florida, US')" 2400.00 2 103 b"SET{'Chandler','Bing'}" b"MULTISET{'Dodge','mpv'}" b"ROW('Street 1','12345555','Georgia, US')" 601.10
Hi Sushanth, sorry, I mixed it up with the print(df) output. The print(t) output is as follows: [(101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}", b"ROW('Street 1','12345678','NewYork, US')", 1200.0), (102, b"SET{'Rachel', 'Green'}", and so on.

Tags: python

【Solution 1】:

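Here is one possible solution.

Let's define the columns of interest:

columns = ['cust_name', 'vehicle', 'details']

Use a regular expression to extract the value between {} or (). Inside a character class the | is a literal pipe rather than an alternation, so it is left out here:

regex_ = r"([{(].*[})])"

Putting it all together, where str.decode('ascii') converts the column values from bytes to strings:

columns = ['cust_name', 'vehicle', 'details']

# Capture everything from the first { or ( to the last } or )
regex_ = r"([{(].*[})])"

for col in columns:
    # Decode bytes to str, then keep only the bracketed part.
    # expand=False returns a Series instead of a one-column DataFrame.
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)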

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork, US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida, US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia, US')   601.1
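
To see what the decode-and-extract step does to a single cell, here is a minimal sketch in plain Python that uses the standard-library re module instead of pandas. The helper name strip_prefix and the sample values are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Same idea as the answer's pattern: capture from the first { or (
# up to the last } or ).
PATTERN = re.compile(r"([{(].*[})])")


def strip_prefix(raw: bytes) -> str:
    """Decode a bytes cell and drop the SET/MULTISET/ROW prefix.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("ascii")
    match = PATTERN.search(text)
    # Assumption: fall back to the decoded text when nothing matches
    # (pandas str.extract would give NaN in that case instead).
    return match.group(1) if match else text


samples = [
    b"SET{'Tom','C'}",
    b"MULTISET{'Toyota','Cruiser'}",
    b"ROW('Street 1','12345678','NewYork, US')",
]

for raw in samples:
    print(strip_prefix(raw))
```

Each printed value is the bracketed part of the cell, which is what the loop in the answer stores back into the column.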

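【Comments】:

Continuing with the scenario above: if I only want to access Tom, can I access the first value of cust_name? How would I do that?
The output should be as follows: in the case above I want to access the first value of cust_name. In the first row that is cust_name[0], Tom, then cust_name[1], 'C'. In the second row I want to access 'Rachel' and then 'Green'. Is there a way to do this?
See this post: stackoverflow.com/a/56842372/4985099
I followed the link above: df['cust_name2'] = df['cust_name'].apply(ast.literal_eval), and df['cust_name2'] gives me the following output: 0 {C, Tom} 1 {Rachel, Green}. But I want the first value of the row to be C and the second to be Tom. In row 2 the first value should be Rachel and the second Green.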
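
The order changes in the comment above because ast.literal_eval parses a value such as {'Tom','C'} as a Python set, and sets are unordered. One possible workaround, sketched here on the assumption that the cleaned cells look exactly like {'Tom','C'}, is to swap the outer braces for square brackets before parsing so that the result is a list. The helper name parse_ordered is hypothetical.

```python
import ast


def parse_ordered(cell: str) -> list:
    """Parse a cleaned cell into a list that keeps the written order.

    Hypothetical helper; assumes the cell is a brace- or
    parenthesis-delimited sequence of quoted strings.
    """
    cell = cell.strip()
    if cell.startswith("{") and cell.endswith("}"):
        # A {...} literal would become an unordered set, so rewrite
        # it as a [...] list literal before parsing.
        cell = "[" + cell[1:-1] + "]"
    # Parenthesised cells parse as tuples, which are already ordered.
    return list(ast.literal_eval(cell))


names = parse_ordered("{'Tom','C'}")
print(names[0])  # Tom
print(names[1])  # C
```

With pandas, something like df['cust_name'].apply(parse_ordered) should give a column of lists, after which individual positions can be picked out by index.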