【Question title】:Loading Pandas Dataframe with skipped sentiment
【Posted】:2021-11-24 15:17:12
【Question】:

I have this dataset for sentiment analysis, and I load the data with the following code:

import pandas as pd

url = 'https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/amazon_cells_labelled.tsv'
df = pd.read_csv(url, sep='\t', names=["Sentence", "Feeling"])

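The problem is that some rows of the DataFrame have NaN in the Feeling column, but each of those rows is only part of a longer sentence.

The current output looks like this: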

sentence                      feeling
I do not like it.             NaN
I give it a bad score.        0

The output should look like this:

sentence                                    feeling
I do not like it. I give it a bad score     0

Can you help me join the rows, or load the dataset, based on the score?

【Comments】:

    Tags: python pandas loading sentiment-analysis


    【Solution 1】:

    Create virtual groups first, then combine the rows with groupby and agg:

    grp = df['Feeling'].notna().cumsum().shift(fill_value=0)
    out = df.groupby(grp).agg({'Sentence': ' '.join, 'Feeling': 'last'})
    print(out)
    
    # Output:
                                                      Sentence  Feeling
    Feeling                                                            
    0        I try not to adjust the volume setting to avoi...      0.0
    1                              Good case, Excellent value.      1.0
    2        I thought Motorola made reliable products!. Ba...      1.0
    3        When I got this item it was larger than I thou...      0.0
    4                                        The mic is great.      1.0
    ...                                                    ...      ...
    996      But, it was cheap so not worth the expense or ...      0.0
    997      Unfortunately, I needed them soon so i had to ...      0.0
    998      The only thing that disappoint me is the infra...      0.0
    999      No money back on this one. You can not answer ...      0.0
    1000     It's rugged. Well this one is perfect, at the ...      NaN
    
    [1001 rows x 2 columns]
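
    To see why the group key works, here is a minimal sketch on toy data (the three rows below are made up for illustration and are not taken from the file). It assumes, as in the question, that a NaN in Feeling means the sentence continues on the next row and that the label sits on the last fragment. The name `group` is only there to keep the index name distinct from the Feeling column.

```python
import numpy as np
import pandas as pd

# Toy data (assumption, not from the real file): NaN marks a fragment
# whose label appears on a later row.
df = pd.DataFrame({
    "Sentence": ["I do not like it.", "I give it a bad score.", "The mic is great."],
    "Feeling": [np.nan, 0, 1],
})

labelled = df["Feeling"].notna()      # False, True, True
counter = labelled.cumsum()           # 0, 1, 2  (increments on each labelled row)
# Shift by one so a labelled row stays in the same group as the
# unlabelled fragments that precede it.
grp = counter.shift(fill_value=0).rename("group")   # 0, 0, 1

out = df.groupby(grp).agg({"Sentence": " ".join, "Feeling": "last"})
print(out)
```

    Without the shift, the labelled row would start a new group instead of closing the current one, so each fragment would be joined to the wrong label.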
    

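    【Discussion】:

    • It seems to work. Do you have any idea about the last row? It is actually not the last one in the file.
    • Yes, it is the last one. The last line ends with "Hopefully Kyocera will be better!" but has no Feeling value.
    • From line 2914 ("It's rugged.") to the end of the file there is no Feeling.
    • I mean this one: "It's rugged. Well this one is perfect..." with NaN.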
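
    If the trailing rows really have no label, one possible way to handle them is to drop the group whose Feeling is still NaN after aggregation. This is only a sketch on made-up toy rows; whether dropping is appropriate depends on what you want to do with the unlabelled text.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): one labelled row followed by trailing
# fragments that never receive a label.
df = pd.DataFrame({
    "Sentence": ["Good case.", "It's rugged.", "Well this one is perfect."],
    "Feeling": [1, np.nan, np.nan],
})

grp = df["Feeling"].notna().cumsum().shift(fill_value=0).rename("group")
out = df.groupby(grp).agg({"Sentence": " ".join, "Feeling": "last"})

# The trailing group contains no labelled row, so 'last' yields NaN for it.
# Drop it, then restore an integer dtype for the labels.
clean = out.dropna(subset=["Feeling"]).astype({"Feeling": int})
print(clean)
```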