【Question title】: How to filter a list of tuples for every row in a pandas dataframe?
【Posted】: 2021-10-03 14:18:18
【Question description】:

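Hello, I am trying to clean my dataframe by filtering a list of tuples so that only the tuples whose second element starts with "V" remain.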

I have a pandas dataframe called "df_my_string"; a sample is:

verbs_tokens
[('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'), ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')]
[('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'), ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')]

What I need is, for every row, to keep only the tuples whose second value starts with "V".

I have tried many approaches, but I cannot figure out how to do it:

#df_my_string['clean_verbs_tokens']=filter((lambda x: x[1].startswith('V')),df_my_string[['verbs_tokens']])
#df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: str(x[0][1]).startswith('V'))
#df_my_string['clean_verbs_tokens'] = [tup for tup in df_my_string['verbs_tokens'] if str(tup[0][1])=='V']
#df_my_string['clean_verbs_tokens'] = [item  for item in df_my_string['verbs_tokens'] if pd.Series(re.search('^V.*',item[0][1])).reset_index(drop=True).values]

Expected output:

verbs_tokens
[('was', 'VBD')]
[('is', 'VBZ')]

【Comments】:

    Tags: python pandas list tuples nltk


    【Solution 1】:

    Try:

    df_my_string['clean_verbs_tokens'] = df_my_string["verbs_tokens"].apply(lambda x: [t for t in x if t[1].lower().startswith("v")])
    
    >>> df_my_string['clean_verbs_tokens']
    0    [(was, VBD)]
    1     [(is, VBZ)]
    Name: clean_verbs_tokens, dtype: object
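
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the two sample rows from the question and applies the same per-row list comprehension. The helper name `keep_verbs` is hypothetical (it does not appear in the answer), and unlike the one-liner above it matches "V" case-sensitively, on the assumption that the tags are upper-case NLTK-style tags.

```python
import pandas as pd

# Sample rows copied from the question.
df_my_string = pd.DataFrame({
    "verbs_tokens": [
        [('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'),
         ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')],
        [('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'),
         ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')],
    ]
})


def keep_verbs(tokens):
    """Return only the (word, tag) pairs whose tag starts with 'V'.

    Hypothetical helper: case-sensitive, assumes upper-case tags.
    """
    return [(word, tag) for word, tag in tokens if tag.startswith("V")]


# apply() calls keep_verbs once per row, passing that row's list of tuples.
df_my_string["clean_verbs_tokens"] = df_my_string["verbs_tokens"].apply(keep_verbs)

print(df_my_string["clean_verbs_tokens"].tolist())
# [[('was', 'VBD')], [('is', 'VBZ')]]
```

A named function instead of a lambda is only a readability choice; the filtering logic is the same.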
    

    【Discussion】:

      【Solution 2】:
      # This is wrong: x is a list of tuples, so x[0][1] tests
      # only the tag of the first tuple and returns a single
      # boolean per row instead of a filtered list.
      df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: str(x[0][1]).startswith('V'))
      
      # Try this instead: keep every tuple whose tag begins with "V".
      df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: [tup for tup in x if tup[1][0]=="V"])
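
The difference between the two lines can be seen on a single row without pandas. The following is a small sketch, with the variable names `first_only` and `all_verbs` chosen here for illustration. It also uses `str.startswith` rather than indexing the first character, since `tup[1][0]` would raise `IndexError` if a tag were ever an empty string (an assumption about possible input, not something present in the sample data).

```python
# One row from the question, shortened.
row = [('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB')]

# The attempt from the question: looks only at row[0], so the result is one bool.
first_only = str(row[0][1]).startswith('V')

# Looks at every tuple and keeps the matching ones.
all_verbs = [(word, tag) for word, tag in row if tag.startswith('V')]

# startswith is safe on an empty tag, where tag[0] would raise IndexError.
with_empty_tag = [t for t in [('x', ''), ('is', 'VBZ')] if t[1].startswith('V')]

print(first_only)      # False
print(all_verbs)       # [('was', 'VBD')]
print(with_empty_tag)  # [('is', 'VBZ')]
```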
      

      【Discussion】:

        【Solution 3】:

        Here is a solution:

        import pandas as pd

        df = pd.DataFrame({
            'Tuples': [
                [('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'), ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')],
                [('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'), ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')],
            ]
        })
        
        

        Define a function that finds the tuples whose second element starts with a given character:

        def find_char(tuples, char):
            start_with_char = []

            for tp in tuples:
                # tp[1][:1] is the first character of the tag ('' if the tag is empty)
                if tp[1][:1] == char:
                    start_with_char.append(tp)

            return start_with_char
            
        

        Apply the function to your dataframe:

        df['Tuples'].apply(lambda row: find_char(row, 'V'))
        
        

        Result:

        0     [(is, VBZ)]
        1    [(was, VBD)]
        
        

        Note: this solution returns, for each row, the list of tuples whose second element starts with the given character.
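
If a longer prefix than one character is needed (for example "VB"), the same idea can be sketched with `str.startswith`, which accepts either a single string or a tuple of strings. The function name `find_prefix` and the shortened sample row below are hypothetical and only illustrate the variation; they are not part of the answer above.

```python
def find_prefix(tuples, prefix):
    """Keep the (word, tag) tuples whose tag starts with `prefix`.

    `prefix` may be a string or a tuple of strings, as str.startswith allows.
    """
    return [tp for tp in tuples if tp[1].startswith(prefix)]


# Shortened, made-up row in the same (word, tag) shape as the question's data.
row = [('was', 'VBD'), ('always', 'RB'), ('is', 'VBZ'), ('future', 'NN')]

print(find_prefix(row, 'V'))           # [('was', 'VBD'), ('is', 'VBZ')]
print(find_prefix(row, ('NN', 'RB')))  # [('always', 'RB'), ('future', 'NN')]
```

On a dataframe it would be used the same way as `find_char`, e.g. `df['Tuples'].apply(lambda row: find_prefix(row, 'V'))`.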

        【Discussion】:
