【Question title】:How do I iterate formulas in a Python dictionary and save the results in a pandas DataFrame?
【Posted】:2020-11-14 10:48:03
【Question】:

I have a dictionary called reviews:

reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
          2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

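For each review in the dictionary (here, 1 and 2), I need to iterate two formulas over the values of its words. The formulas compute 'neg_post_prob' and 'pos_post_prob' for each review.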

The formulas are:

  1. 'neg_post_prob' = (neg_prior * neg) / (neg_prior * neg + pos_prior * pos)
  2. 'pos_post_prob' = (pos_prior * pos) / (neg_prior * neg + pos_prior * pos)

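where:

  • 'neg' and 'pos' are the first and second values stored for the current word,
  • 'neg_prior' is the 'neg_post_prob' computed in the previous word's iteration, and
  • 'pos_prior' is the 'pos_post_prob' computed in the previous word's iteration.

For the first word of each review, both priors should equal 0.5.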
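
The recurrence above can be sketched as a plain loop over one review. This is a minimal sketch, not code from the original post: the function name posterior_for_review is hypothetical, and it assumes that each word maps to a [neg, pos] pair and that the dictionary preserves insertion order (Python 3.7+).

```python
def posterior_for_review(word_values, prior=0.5):
    """Fold the two update formulas over the words of one review.

    word_values maps each word to [neg, pos]; both priors start at `prior`.
    (Hypothetical helper, written only to illustrate the recurrence.)
    """
    neg_prior = pos_prior = prior
    for neg, pos in word_values.values():
        # Shared denominator of both formulas
        scale = neg_prior * neg + pos_prior * pos
        # The right-hand side uses the old priors; both names are rebound together
        neg_prior, pos_prior = neg_prior * neg / scale, pos_prior * pos / scale
    return neg_prior, pos_prior


review_1 = {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]}
neg_post_prob, pos_post_prob = posterior_for_review(review_1)
print(round(neg_post_prob, 2), round(pos_post_prob, 2))  # 0.17 0.83
```

Because both posteriors share one denominator, they sum to 1 after every word, which is what the 'validation' value checks.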

Here is my code for reviews 1 and 2:

#Review 1: 

# the prior before starting the iteration is 0.5
prior = 0.5

# priors after the first word "like"
neg_prior_like = (prior*0.0005) / (prior * 0.0005 + prior * 0.0025)
pos_prior_like = (prior*0.0025) / (prior * 0.0005 + prior * 0.0025)


# priors after the second word "the"
neg_prior_like_the = (neg_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
pos_prior_like_the = (pos_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)


# post_prob after last word "acting"
neg_post_prob = (neg_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
pos_post_prob = (pos_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)


validation = neg_post_prob + pos_post_prob

#Review 2: 

# the prior before starting the iteration is 0.5
prior = 0.5

# priors after the first word "plot"
neg_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)
pos_prior_plot = (prior*0.5) / (prior * 0.5 + prior * 0.5)


# priors after the second word "hate"
neg_prior_plot_hate = (neg_prior_plot * 0.0029) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)
pos_prior_plot_hate = (pos_prior_plot * 0.0002) / (neg_prior_plot * 0.0029 + pos_prior_plot * 0.0002)


# post_prob after last word "story"
neg_post_prob = (neg_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)
pos_post_prob = (pos_prior_plot_hate * 0.5) / (neg_prior_plot_hate * 0.5 + pos_prior_plot_hate * 0.5)


validation = neg_post_prob + pos_post_prob

But the result I want is:

import pandas as pd

sentiment = {'review': [1, 2],
    'neg_post_prob': [0.17, 0.94],
    'pos_post_prob': [0.83, 0.06],
    'validation': [1, 1]
    }

sentiment = pd.DataFrame(sentiment, columns = ['review', 'neg_post_prob', 'pos_post_prob', 'validation'])

print(sentiment)

【Comments】:

    Tags: python dictionary iteration


    【Solution 1】:

    Use reduce from the functools module.
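
For readers unfamiliar with reduce, here is a minimal sketch of how it folds a sequence, with and without a starting value. The step function is a hypothetical toy used only for illustration; it is not part of the answer's code.

```python
from functools import reduce

def step(acc, item):
    # acc is the running result; item is the next element of the sequence
    return acc + item

# Without an initializer, the first element seeds the accumulator
assert reduce(step, [1, 2, 3]) == 6

# With an initializer, folding starts from that value instead
total = reduce(step, [1, 2, 3], 10)
print(total)  # 16
```

In this problem the initializer plays the role of the starting priors (0.5, 0.5), and each item is one word's [neg, pos] pair.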

    Code

    from functools import reduce
    import pandas as pd
    
    def update(priors, values):
        '''
            Provides updated probabilities based upon previous pair of neg, pos
        '''
        # Previous neg, pos pair
        neg, pos = priors
        
        # New negative and positive (using the OP's update equations)
        scale = neg * values[0] + pos * values[1]    # shared denominator
        new_neg = (neg * values[0]) / scale
        new_pos = (pos * values[1]) / scale
        return new_neg, new_pos                      # new update pair
        
    def calc(reviews):
        ''' Main function to perform calculations and 
            produce pandas data frame
        '''
        sentiment = {'review':[],
                     'neg_post_prob': [],
                     'pos_post_prob': [],
                     'validation': []}
        
        for review_id, word_values in reviews.items():
            # word_values is a dictionary of [neg, pos] values for the words in the review
            values = word_values.values()  # view of [neg, pos] pairs, in word order
            
            # Use reduce to iteratively apply the update function to the sequence of values
            result = reduce(update, values, [0.5, 0.5])
            neg, pos = result
            validation = neg + pos
            
            # Update results
            sentiment['review'].append(review_id)
            sentiment['neg_post_prob'].append(neg)
            sentiment['pos_post_prob'].append(pos)
            sentiment['validation'].append(validation)
            
        
        return pd.DataFrame(sentiment)
            
    

    Test

    reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
              2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}
    
    df = calc(reviews)
    

    df

        review  neg_post_prob   pos_post_prob   validation
    0   1       0.166667        0.833333        1.0
    1   2       0.935484        0.064516        1.0
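
To inspect the posterior after each word (the intermediate values the question computes by hand), one option is itertools.accumulate, which yields every partial result of the same fold. This is a sketch under assumptions rather than part of the answer: it redefines update so that it runs on its own, and the initial= keyword requires Python 3.8 or later.

```python
from itertools import accumulate

def update(priors, values):
    # One step of the OP's formulas: (neg_prior, pos_prior) -> new pair
    neg_prior, pos_prior = priors
    neg, pos = values
    scale = neg_prior * neg + pos_prior * pos
    return neg_prior * neg / scale, pos_prior * pos / scale

review_2 = {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}

# steps[0] is the starting pair; steps[i] is the posterior after the i-th word
steps = list(accumulate(review_2.values(), update, initial=(0.5, 0.5)))

for word, (neg, pos) in zip(review_2, steps[1:]):
    print(word, round(neg, 4), round(pos, 4))
# plot 0.5 0.5
# hate 0.9355 0.0645
# story 0.9355 0.0645
```

The last entry of steps matches what reduce returns for the same review.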
    
