【Question title】: Splitting one column's text based on another column's in Pandas dataframe
【Posted】: 2018-07-13 18:54:29
【Question】:

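My dataframe has two columns, "Subject" and "Description". I am trying to clean up the Description column by splitting it on the text from the Subject column, because the subject is included in every row of the description.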

Here is a snippet of the Subject column:

Subject
1     Question about the program   
2  Technical issue with the site    

And the Description column:

Description \
1  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Question about the program Hello Hello I was wondering if there 
is going to be a product review coming up soon?

2  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Technical issue with the site Reviews I received emails stating 
that I need to rewrite two of my reviews    

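For example, in row 1 I want to split the Description on "Question about the program" and capture only the text after that string.

I tried df['Description'] = df.apply(lambda x: x['Description'].split(x['Subject'], 1), axis=1)['Description'] but had no luck: on the indices where the description does not contain the subject I get the error "TypeError: ('must be str or None, not float')". How can I handle the rows that do not contain this exact text while still splitting the rows that do?

Any help would be greatly appreciated. Thanks.

I also tried the suggested answer and got this error: IndexError: ('list index out of range', 'occurred at index 1')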

【Question comments】:

  • I tried that, but got this error: "AttributeError: ("'str' object has no attribute 'str'", 'occured at index 1')"

Tags: python pandas dataframe split


【Solution 1】:

You need to split each string in df['Description'] on the corresponding value in Subject, then take the part that comes after the split.

df.apply(lambda x: x['Description'].split(x['Subject'])[1], axis=1)

Output:

0     Hello Hello I was wondering if there is going...
1     Reviews I received emails stating that I need...
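The one-liner above assumes every row holds strings and that each Description contains its Subject. When a value is missing (NaN is a float) or the subject does not appear, it raises the TypeError or IndexError reported in the question. Below is a minimal sketch of a more defensive version. It assumes that rows which cannot be split should be left unchanged; the helper name `text_after_subject`, the `Cleaned` column, and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd


def text_after_subject(row):
    """Return the text that follows the first occurrence of Subject.

    Assumption: rows with non-string values, or where Subject is not
    found in Description, are returned unchanged.
    """
    desc, subj = row["Description"], row["Subject"]
    # NaN is a float, so guard before calling any string method
    if not isinstance(desc, str) or not isinstance(subj, str) or not subj:
        return desc
    # partition never raises: sep is '' when subj is not found
    before, sep, after = desc.partition(subj)
    return after.strip() if sep else desc


# Illustrative data modelled on the question (not the asker's real data)
df = pd.DataFrame({
    "Subject": [
        "Question about the program",
        "Technical issue with the site",
        "Other",
        np.nan,
    ],
    "Description": [
        "message. Question about the program Hello Hello I was wondering",
        "message. Technical issue with the site Reviews I received emails",
        "No subject text in here",
        np.nan,
    ],
})

df["Cleaned"] = df.apply(text_after_subject, axis=1)
```

`str.partition` is used instead of `split(...)[1]` because it always returns three parts, so there is no index to go out of range when the subject is absent.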

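【Discussion】:

  • OK, I've tried this and now I get an IndexError. What could be causing it? IndexError: ('list index out of range', 'occurred at index 1')
  • For which row do you get the error? This tells me that, for some of your data, the description does not contain the subject as a substring.
  • All rows. It either throws list index out of range or says list has no attr str, both of which make zero sense given the context.
  • @Garglesoap As you can see, it worked for the OP. :((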
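To check which rows actually trigger the errors discussed above, one option is to flag every row whose Description is missing or does not contain its Subject. This is only a sketch: the column names come from the question, while `contains_subject`, `problem_rows`, and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data (not the asker's real data)
df = pd.DataFrame({
    "Subject": ["Question about the program", "Technical issue with the site", "Other"],
    "Description": [
        "message. Question about the program Hello Hello",
        np.nan,                      # missing value: would raise on .split
        "No subject text in here",   # no match: split(...)[1] would raise IndexError
    ],
})

# True only when both values are strings and Subject occurs in Description
contains_subject = pd.Series(
    [
        isinstance(d, str) and isinstance(s, str) and s in d
        for d, s in zip(df["Description"], df["Subject"])
    ],
    index=df.index,
)

# Rows that the plain split(...)[1] approach cannot handle
problem_rows = df[~contains_subject]
```

As for the AttributeError mentioned in the question comments: inside `df.apply(..., axis=1)` each `x['Description']` is a plain Python string rather than a Series, so the `.str` accessor is not available there.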