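【Posted】: 2018-07-13 18:54:29
【Question】:
My dataframe has two columns, "Subject" and "Description". I am trying to clean the Description column by splitting out the Subject text, since it is included in every row of the Description.
Here is a snippet of the Subject column: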
Subject
1 Question about the program
2 Technical issue with the site
And the Description column:
Description \
1 An HTML only email was received and a rough conversion is below.
Please refer to the Emails related list for the HTML contents of the
message. Question about the program Hello Hello I was wondering if there
is going to be a product review coming up soon?
2 An HTML only email was received and a rough conversion is below.
Please refer to the Emails related list for the HTML contents of the
message. Technical issue with the site Reviews I received emails stating
that I need to rewrite two of my reviews
For example, in row 1 I want to split the Description on "Question about the program" and capture only the text that comes after that string.
I tried
df['Description'] = df.apply(lambda x: x['Description'].split(x['Subject'], 1), axis=1)['Description']
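but I had no luck: on the indexes where the Description does not contain the Subject, I get the error "TypeError: ('must be str or None, not float')". How can I handle rows that do not contain this exact text while still splitting the rows that do?
Any help would be greatly appreciated. Thanks.
I also tried the suggested answer and got this error: IndexError: ('list index out of range', 'occurred at index 1')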
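For reference, here is a minimal sketch of one way to handle both failure cases. It rests on two assumptions that the post does not confirm: that the TypeError comes from rows where Subject is missing (pandas stores NaN as a float, which `str.split` rejects as a separator), and that the IndexError comes from indexing `[1]` on a split result when the Subject text is absent. The helper name `text_after_subject` is hypothetical.

```python
def text_after_subject(description, subject):
    """Return the part of description that follows the first occurrence of subject.

    Non-string values (e.g. NaN, which is a float) and rows where subject
    does not occur are returned unchanged.
    """
    # Guard against missing values and empty subjects before touching str methods.
    if not isinstance(description, str) or not isinstance(subject, str) or not subject:
        return description
    # partition() splits on the first occurrence only and never raises when
    # the separator is absent; in that case the middle element is empty.
    _before, found, after = description.partition(subject)
    return after.strip() if found else description
```

Applied row by row, this would look something like `df['Description'] = df.apply(lambda row: text_after_subject(row['Description'], row['Subject']), axis=1)`. Because the lambda returns a single value per row, `apply` yields a Series that can be assigned straight back to the column.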
【Comments】:
- I tried that, but got this error: "AttributeError: ("'str' object has no attribute 'str'", 'occured at index 1')"
Tags: python pandas dataframe split