How to get a substring in a pandas data frame when certain characters exist in the row? (Answer)

[Question title]: How to get substring in pandas data frame when certain characters exist in the row?
[Posted]: 2017-05-27 06:16:54
[Question]:

I have a data frame in which some rows contain the special character '#'.

Here is my data. I can already find the index position of '#':

import pandas as pd

df = pd.DataFrame(data=['fig#abc', 'strawberry', 'applepie#efg'], columns=['fruitname'])
# Position of the first '#' in each row; -1 means the row has no '#'
ind = df.fruitname.str.find("#")
print(df)
print(ind)


    fruitname
0   fig#abc
1   strawberry
2   applepie#efg

0    3
1   -1
2    8

If the index of '#' is greater than 4, I want a new column whose value is the characters before '#'; otherwise it should keep the original value:

   fruitname_new
0  fig#abc
1  strawberry
2  applepie
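As a minimal sketch, assuming the same three-row `df`, the rule above can also be written with pandas' vectorized string methods instead of a row-by-row loop: `str.find` gives the position of '#', `str.split('#').str[0]` gives the text before it, and `Series.where` keeps that prefix only where the position is greater than 4. The variable name `prefix` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=['fig#abc', 'strawberry', 'applepie#efg'], columns=['fruitname'])

# Position of the first '#' in each row; -1 where there is no '#'.
ind = df['fruitname'].str.find('#')

# Text before the first '#' (the whole string when '#' is absent).
prefix = df['fruitname'].str.split('#').str[0]

# Keep the prefix where '#' sits after index 4; otherwise fall back to the original value.
df['fruitname_new'] = prefix.where(ind > 4, df['fruitname'])
print(df)
```

`Series.where(cond, other)` keeps its own values where `cond` is True and takes `other` elsewhere, so rows without '#' (index -1) and rows with an early '#' both keep `fruitname`.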

What is the best way to get this result?


Tags: python-2.7 pandas dataframe substring

[Answer 1]:

# Use apply to split fruitname on '#', then check the length of the part before '#' before setting the new fruitname_new column.

df['fruitname_new'] = df.apply(lambda x: x.fruitname if len(x.fruitname.split('#')[0])<=4 else x.fruitname.split('#')[0], axis=1)

df
Out[484]: 
      fruitname fruitname_new
0       fig#abc       fig#abc
1    strawberry    strawberry
2  applepie#efg      applepie
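A variation on this answer, sketched under the assumption that only the `fruitname` column matters: because the rule looks at a single column, `Series.map` with a plain function avoids building a row Series for every call the way `apply(..., axis=1)` does, and `str.partition` splits only at the first '#' instead of splitting the string twice. The helper `shorten` and its `sep`/`min_pos` parameters are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=['fig#abc', 'strawberry', 'applepie#efg'], columns=['fruitname'])


def shorten(name, sep='#', min_pos=4):
    """Hypothetical helper: return the text before `sep` if `sep` occurs
    after index `min_pos`, otherwise return `name` unchanged."""
    # partition splits at the first separator only; `found` is '' when sep is absent.
    head, found, _ = name.partition(sep)
    # When sep is present, len(head) equals name.find(sep).
    if found and len(head) > min_pos:
        return head
    return name


# Only one column is involved, so map over that Series instead of apply(axis=1).
df['fruitname_new'] = df['fruitname'].map(shorten)
print(df)
```

Since `len(head)` equals the index of '#' whenever '#' is present, this is the same test as the answer's `len(x.fruitname.split('#')[0]) <= 4`, with the branches swapped.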
