Python: singularize words from a pandas DataFrame

[Question title]: Python singularize words from pandas dataframe
[Posted]: 2018-08-06 12:13:34
[Question]:

I want to convert the plural words in the "Phrase" column to singular. How do I iterate over each row and each item?

import pandas as pd

my_data = [('Audi Cars', 'Vehicles'),
           ('Two Parrots', 'animals'),
           ('Tall Buildings', 'Landmark')]
test = pd.DataFrame(my_data)
test.columns = ["Phrase", "Connection"]
test

I tried:

test["Phrase"] = test["Phrase"].str.lower().str.split()
import inflection as inf
test["Phrase"].apply(lambda x:inf.singularize([item for item in x]))

My desired output is:

Phrase:         Connection:
Audi Car        Vehicles
Two Parrot      animals
Tall Building   Landmark

Please note that I only want to singularize one column, Phrase.

[Comments]:

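Does it work with test["Phrase"].apply(lambda x:[inf.singularize(item) for item in x])?
@xdze2 Thanks, yes, it works, but I also have to join the words back together. So the answer provided by vivek below works as expected.
@xdze2 As explained in the comments, in your list comprehension x must be x.split(), with ' '.join() around the outside.
What about rstrip(), something like test['Phrase'] = test['Phrase'].apply(lambda x:x.rstrip("s"))?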
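
On the rstrip("s") idea in the last comment: it happens to work for the three phrases in the question, but it strips every trailing "s" from the end of the whole string, so it only touches the last word and can mangle words that legitimately end in "s". A small sketch in plain Python; the last three phrases are made up for illustration and are not from the question:

```python
# Phrases 1-3 come from the question; the rest are hypothetical examples.
phrases = ["Audi Cars", "Two Parrots", "Tall Buildings",
           "Wine Glass", "City Bus", "Cars Parrots"]

# str.rstrip("s") removes ALL trailing "s" characters from the whole string.
stripped = [p.rstrip("s") for p in phrases]

for before, after in zip(phrases, stripped):
    print(f"{before!r} -> {after!r}")
```

"Wine Glass" loses both trailing letters, "City Bus" becomes "City Bu", and in "Cars Parrots" only the final word changes.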

Tags: python pandas loops nlp

[Solution 1]:

A slight change:

test['clean'] = test['Phrase'].apply(lambda x: ' '.join([inf.singularize(item) for item in x.split()]))

Output

           Phrase Connection          clean
0       Audi Cars   Vehicles       Audi Car
1     Two Parrots    animals     Two Parrot
2  Tall Buildings   Landmark  Tall Building

Explanation

In your existing code, you are doing this:

test["Phrase"].apply(lambda x:inf.singularize([item for item in x]))

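Let's take the first example and see what happens. In this case x will be 'Audi Cars' (assuming Phrase still holds the original strings, without the earlier .str.lower().str.split() step).

[item for item in x] returns a list of characters, ['A', 'u', 'd', 'i', ' ', 'C', 'a', 'r', 's'], so singularize does not work: the code is iterating over individual characters rather than words, and singularize is handed a whole list instead of a single word.

The trick is to split the phrase into words with x.split(), and then call singularize on each word inside the list comprehension.

Finally, use ' '.join() to get a string back.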
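
The steps above can be sketched in plain Python. To keep the example runnable without installing anything, it uses a deliberately naive stand-in called naive_singularize (a hypothetical helper, not part of inflection) and a hypothetical wrapper singularize_phrase; neither name comes from the original answer.

```python
def naive_singularize(word):
    """Hypothetical stand-in for inflection.singularize.

    Only drops a single trailing 's' (but leaves a trailing 'ss' alone);
    the real library handles far more cases.
    """
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def singularize_phrase(phrase, singularize=naive_singularize):
    """Split into words, singularize each word, then join back into a string."""
    return " ".join(singularize(word) for word in phrase.split())


phrase = "Audi Cars"

chars = [item for item in phrase]  # iterating over a string yields characters
words = phrase.split()             # splitting on whitespace yields words

print(chars)
print(words)
print(singularize_phrase(phrase))
```

With inflection installed, passing the real function, for example test['Phrase'].apply(lambda x: singularize_phrase(x, inf.singularize)), should behave like the one-liner in the answer.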

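[Comments]:

Thanks, it works well, and I also appreciate your great explanation. In this case I don't need to write test["Phrase"] = test["Phrase"].str.lower().str.split()
Always prefer the string methods (e.g. str.lower) over apply. Apply should always be the last resort, because it is slow and does not take advantage of pandas vectorization.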
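
To illustrate the last comment, here is a minimal sketch comparing the two spellings for lowercasing, which has a built-in string method. It only shows that the results match and makes no timing claims. The frame is rebuilt from the question's Phrase column; via_str and via_apply are names chosen for this example.

```python
import pandas as pd

# Same phrases as in the question.
test = pd.DataFrame({"Phrase": ["Audi Cars", "Two Parrots", "Tall Buildings"]})

# Built-in string accessor: no Python-level lambda needed.
via_str = test["Phrase"].str.lower()

# Equivalent result through apply, calling a Python function per element.
via_apply = test["Phrase"].apply(lambda x: x.lower())

print(via_str.tolist())
print(via_str.equals(via_apply))
```

Singularizing has no built-in .str equivalent, so apply (or a list comprehension) with an external function remains the practical choice for that step.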