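【Posted】:2021-03-19 00:30:14
【Question】:
I have simplified this a lot: the actual column will contain up to 500 characters, and the substring list will have 60 values of between 10 and 80 characters each.
The dataframe is more complex than this (the list will contain 60 values and the DF will have 11,000 rows), but this is what I am trying to do.
I have a dataframe and a list like this: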
import pandas as pd
my_list = ['alabama 500', 'beta 15', 'carthouse', 'd320 blend']
df = pd.DataFrame({'col1':['left side alabama 500 on the right side carthouse', '1st entry is at beta 15', 'this one takes a mix of d320 blend and beta 15']})
                                                col1
0  left side alabama 500 on the right side carthouse
1                            1st entry is at beta 15
2     this one takes a mix of d320 blend and beta 15
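I am trying to write a function that returns the following: it keeps the first column intact and returns each matching substring in a new column, with the original column value repeated once per match.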
df['col2']
                                                col1         col2
0  left side alabama 500 on the right side carthouse  alabama 500
1  left side alabama 500 on the right side carthouse    carthouse
2                            1st entry is at beta 15      beta 15
3     this one takes a mix of d320 blend and beta 15      beta 15
4     this one takes a mix of d320 blend and beta 15   d320 blend
Here is what I tried:
def add_new_col(data, col_name, my_list):
    #function looks at the column col_name in a dataframe data, if the substring exists, it adds a new
    #column with only that substring, keeping multiples
    for i in my_list:
        if data[col_name].str.contains(i):
            data['col2'] = i
        else:
            continue
    return data
Running the function in a notebook:
my_list = ['a', 'b', 'c', 'd']
add_new_col(df, 'col1', my_list)
returns this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
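For context, a minimal sketch of why this error appears, using a two-row frame invented purely for illustration: `str.contains` returns one boolean per row, so the result cannot be used directly as an `if` condition. It has to be reduced to a single value with `.any()` or `.all()`, or used as a row mask.

```python
import pandas as pd

# Hypothetical two-row frame, only to illustrate the error
df = pd.DataFrame({'col1': ['left side alabama 500 on the right side carthouse',
                            '1st entry is at beta 15']})

# str.contains yields a boolean Series (one value per row), not a single bool
mask = df['col1'].str.contains('beta 15', regex=False)
print(mask.tolist())

# `if mask:` raises the ValueError above; reduce the Series explicitly instead
has_any_match = bool(mask.any())   # True if at least one row matches
has_all_match = bool(mask.all())   # True only if every row matches

# The mask can also select the matching rows directly
matching_rows = df[mask]
```

Reducing with `.any()` silences the error but still does not say which rows matched, so on its own it would not produce the per-row output wanted here.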
Based on some other answers, I also tried this:
def add_new_col(data, col_name, my_list):
    #function looks at the column col_name in a dataframe data, if the substring exists, it adds a new
    #column with only that substring, keeping multiples
    for i in my_list:
        if data[data[col_name].str.contains(i)]:
            data['col2'] = i
        else:
            continue
    return data
This gives the same error.
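One possible way to get the desired shape, offered as a sketch rather than a definitive solution: build a single regular expression from the list, collect every match per row with `str.findall`, then use `explode` to turn each match into its own row. This assumes pandas 0.25 or later (for `DataFrame.explode`) and that the substrings should be matched literally. The names `pattern` and `result` are illustrative only.

```python
import re
import pandas as pd

my_list = ['alabama 500', 'beta 15', 'carthouse', 'd320 blend']
df = pd.DataFrame({'col1': ['left side alabama 500 on the right side carthouse',
                            '1st entry is at beta 15',
                            'this one takes a mix of d320 blend and beta 15']})

# Escape each substring so it is matched literally, and try longer
# substrings first in case one entry is a prefix of another
pattern = '|'.join(re.escape(s) for s in sorted(my_list, key=len, reverse=True))

result = (
    df.assign(col2=df['col1'].str.findall(pattern))  # list of matches per row
      .explode('col2')                               # one row per match
      .reset_index(drop=True)
)
print(result)
```

Matches come out in the order they appear in the text, so the last row pair is `d320 blend` then `beta 15` rather than the other way round. Rows with no match end up with `NaN` in `col2` and can be removed with `dropna(subset=['col2'])` if they are not wanted.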
【Comments】:
- I don't understand the logic for col2. For example, rows 0 and 1 in col1 are both "acd", but in col2 we get "a" for row 0 and "c" for row 1.
Tags: python pandas string dataframe