【Question title】: Lowercase sentences in lists in a pandas DataFrame
【Posted】: 2018-08-04 18:33:41
【Question】:

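I have a pandas DataFrame like the one below, where each row holds a list of sentences. I want to convert all of the text to lowercase. How can I do this in Python?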

Sample of the DataFrame:

[Nah I don't think he goes to usf, he lives around here though]

[Even my brother is not like to speak with me., They treat me like aids patent.]

[I HAVE A DATE ON SUNDAY WITH WILL!, !]

[As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers., Press *9 to copy your friends Callertune]

[WINNER!!, As a valued network customer you have been selected to receivea £900 prize reward!, To claim call 09061701461., Claim code KL341., Valid 12 hours only.]

What I tried:

def toLowercase(fullCorpus):
    lowerCased = [sentences.lower() for sentences in fullCorpus['sentTokenized']]
    return lowerCased

I get this error:

lowerCased = [sentences.lower() for sentences in fullCorpus['sentTokenized']]
AttributeError: 'list' object has no attribute 'lower'
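The error arises because each cell of sentTokenized is itself a Python list of sentence strings, so the comprehension hands a whole list to .lower(). A minimal sketch, assuming a small made-up fullCorpus with the same shape as the sample above, reproduces the error and shows that lowercasing needs a second, inner loop over the sentences:

```python
import pandas as pd

# Hypothetical two-row corpus shaped like the sample: one list of sentences per cell
fullCorpus = pd.DataFrame({
    "sentTokenized": [
        ["I HAVE A DATE ON SUNDAY WITH WILL!", "!"],
        ["WINNER!!", "Claim code KL341."],
    ]
})

# Iterating over the column yields lists, not strings, so .lower() fails
try:
    [sentences.lower() for sentences in fullCorpus["sentTokenized"]]
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'lower'

# Lowercase each sentence inside each row's list instead
lowerCased = [[sentence.lower() for sentence in sentences]
              for sentences in fullCorpus["sentTokenized"]]
print(lowerCased)
```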

【Question comments】:

    Tags: python pandas nlp


    【Solution 1】:

    It's simple:

    df.applymap(str.lower)

    or, for a single column:

    df['col'].apply(str.lower)
    df['col'].map(str.lower)
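These one-liners assume every cell is a plain string: df.applymap(str.lower) lowercases every cell of the whole DataFrame, and the Series versions do the same for one column (in pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map). A small sketch with a made-up object-dtype column col shows one practical difference: the vectorized Series.str.lower() passes missing values through as NaN, while map(str.lower) raises TypeError on them.

```python
import numpy as np
import pandas as pd

# Hypothetical string column (explicit object dtype) with one missing value
df = pd.DataFrame({"col": pd.Series(["Nah I Don't Think", "WINNER!!", np.nan], dtype=object)})

# Vectorized .str accessor: lowercases strings and leaves NaN untouched
lowered = df["col"].str.lower()
print(lowered.tolist())

# map(str.lower) calls str.lower on every value, including the float NaN
map_failed = False
try:
    df["col"].map(str.lower)
except TypeError:
    map_failed = True
print("map(str.lower) raised TypeError:", map_failed)  # True
```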
    

    OK, so each row holds a list. In that case:

    df['col'].map(lambda x: list(map(str.lower, x)))
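Applied to a made-up column col whose cells are lists, like the question's data, this keeps one list per row and only changes the case of each sentence. A minimal sketch, writing the result to a hypothetical new column col_lower so the original stays intact:

```python
import pandas as pd

# Hypothetical list-valued column mirroring the question's data
df = pd.DataFrame({
    "col": [
        ["Nah I don't think he goes to usf, he lives around here though"],
        ["WINNER!!", "Claim code KL341.", "Valid 12 hours only."],
    ]
})

# For each row, lowercase every sentence in its list and keep the result a list
df["col_lower"] = df["col"].map(lambda x: list(map(str.lower, x)))
print(df["col_lower"].tolist())
```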
    

    【Comments】:

    • I get these errors - TypeError: ("descriptor 'lower' requires a 'str' object but received a 'list'", 'occured at index sentTokenized'), TypeError: descriptor 'lower' requires a 'str' object but received a 'list'
    • @Kabilesh See the update.
    • Thanks. Found it useful using Colab.

    【Solution 2】:

    You can also convert each list to a string, apply str.lower, and then parse the result back into a list:

    import ast
    df.sentTokenized.astype(str).str.lower().transform(ast.literal_eval)
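This works because astype(str) turns each list into its Python repr, which ast.literal_eval can parse back. A quick sketch, assuming a made-up df with a sentTokenized column of lists, checks that the round trip gives the same result as lowercasing each element directly. It relies on every element being a plain string whose repr is a valid literal, and it costs an extra serialize-and-parse step per row.

```python
import ast
import pandas as pd

# Hypothetical data shaped like the question's sentTokenized column
df = pd.DataFrame({
    "sentTokenized": [
        ["Even my brother is not like to speak with me.", "They treat me like aids patent."],
        ["WINNER!!", "To claim call 09061701461."],
    ]
})

# Lists -> their string repr -> lowercase -> parsed back into lists
via_string = df.sentTokenized.astype(str).str.lower().transform(ast.literal_eval)

# Direct per-element lowercasing, for comparison
direct = df.sentTokenized.map(lambda sents: [s.lower() for s in sents])

print(via_string.tolist() == direct.tolist())  # True
```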
    

    【Comments】:

    • This saves me a lot of time.

    【Solution 3】:

    You can try using apply:

    def toLowercase(fullCorpus):
        # apply calls the lambda once per row; map lowercases every sentence in that row's list
        lowerCased = fullCorpus['sentTokenized'].apply(lambda row: list(map(str.lower, row)))
        return lowerCased
    

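    【Comments】:

    【Solution 4】:

    NumPy also has a nice way to do this:

    import numpy as np
    fullCorpus['sentTokenized'] = [np.char.lower(x).tolist() for x in fullCorpus['sentTokenized']]  # .tolist() turns each ndarray back into a list of str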
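On its own, np.char.lower returns a NumPy array rather than a list, so the cells change type unless they are converted back. A small sketch with a made-up fullCorpus shows this and uses .tolist() to restore plain lists of Python str:

```python
import numpy as np
import pandas as pd

# Hypothetical corpus shaped like the question's data
fullCorpus = pd.DataFrame({
    "sentTokenized": [
        ["I HAVE A DATE ON SUNDAY WITH WILL!", "!"],
        ["Press *9 to copy your friends Callertune"],
    ]
})

# np.char.lower is applied per row and returns a NumPy array of strings
lowered = fullCorpus["sentTokenized"].map(np.char.lower)
print(type(lowered.iloc[0]))  # <class 'numpy.ndarray'>

# .tolist() converts each array back to a list of plain Python str
fullCorpus["sentTokenized"] = lowered.map(lambda arr: arr.tolist())
print(fullCorpus["sentTokenized"].tolist())
```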
      

    【Comments】:
