【Question title】: How to merge rows in one column to match non-empty rows in the other column?
【Posted】: 2017-11-25 11:41:46
【Question】:

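I have a .csv file with two columns. The first is an ID and the second is a text field. However, the text in the text field is split into sentences that spill over onto separate rows, so the file looks like this: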

ID TEXT
TXT_1 This is the first sentence
NA This is the second sentence
NA This is the third sentence
TXT_2 This is the first sentence of the second text
NA This is the second sentence of the second text

What I would like to do is merge the text fields so that the result looks like this:

ID TEXT
TXT_1 This is the first sentence This is the second sentence This is the third sentence
TXT_2 This is the first sentence of the second text This is the second sentence of the second text

Is there a simple way to do this in R?

【Question comments】:

    Tags: r csv text


    【Solution 1】:

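    We create a grouping variable from the non-NA elements of 'ID', so that each non-NA ID starts a new group, and then paste the 'TEXT' values within each group together: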

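    library(dplyr)
    df1 %>%
        # every non-NA ID starts a new group
        group_by(Grp = cumsum(!is.na(ID))) %>%
        # keep the group's ID and join its TEXT rows with single spaces
        summarise(ID = ID[!is.na(ID)], TEXT = paste(TEXT, collapse = ' ')) %>%
        ungroup() %>%
        select(-Grp)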
    # A tibble: 2 x 2
    #     ID                                                                                         TEXT
    #    <chr>                                                                                        <chr>
    #1 TXT_1            This is the first sentence This is the second sentence This is the third sentence
    #2 TXT_2 This is the first sentence of the second text This is the second sentence of the second text
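
    To make the grouping step easier to follow, here is a minimal base R sketch of the same idea. The construction of df1 is an assumption (the question does not show how the file is read), and the vapply/split step is only a stand-in for summarise, not the answer's own code.

```r
# Hypothetical reconstruction of the question's data; in practice df1 would
# come from reading the original .csv file.
df1 <- data.frame(
  ID = c("TXT_1", NA, NA, "TXT_2", NA),
  TEXT = c("This is the first sentence",
           "This is the second sentence",
           "This is the third sentence",
           "This is the first sentence of the second text",
           "This is the second sentence of the second text"),
  stringsAsFactors = FALSE
)

# The running count of non-NA IDs labels each row with its group.
grp <- cumsum(!is.na(df1$ID))
grp
# [1] 1 1 1 2 2

# One row per group: the group's ID plus its TEXT rows joined by spaces.
merged <- data.frame(
  ID = df1$ID[!is.na(df1$ID)],
  TEXT = unname(vapply(split(df1$TEXT, grp), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
merged
```

    Because the groups come from row order rather than from the ID values, this keeps the original order of the texts and only ever merges adjacent rows.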
    

    Or, as @Jaap suggested:

    df1 %>% 
       group_by(ID = zoo::na.locf(ID)) %>%
       summarise(TEXT = paste(TEXT, collapse = ' ')) 
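
    The variant above fills each NA with the last non-NA ID ("last observation carried forward") and then groups on the filled column. A rough base R sketch of that fill-forward idea follows; it is not zoo's implementation, the vectors are a hypothetical copy of the question's data, and it assumes the first ID is not NA.

```r
# Hypothetical copy of the question's two columns.
ID <- c("TXT_1", NA, NA, "TXT_2", NA)
TEXT <- c("This is the first sentence",
          "This is the second sentence",
          "This is the third sentence",
          "This is the first sentence of the second text",
          "This is the second sentence of the second text")

# Position of the most recent non-NA ID for every row
# (assumes ID[1] is not NA; otherwise the index would start at 0).
idx <- cummax(seq_along(ID) * !is.na(ID))
filled <- ID[idx]
filled
# [1] "TXT_1" "TXT_1" "TXT_1" "TXT_2" "TXT_2"

# Group on the filled IDs and join each group's TEXT rows with spaces.
merged <- tapply(TEXT, filled, paste, collapse = " ")
merged[["TXT_1"]]
```

    Grouping on the ID values themselves means that rows sharing an ID are merged even when they are not adjacent, and the result is ordered by ID rather than by position in the file.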
    

    【Comments】:

    • Or: df1 %>% group_by(ID = zoo::na.locf(ID)) %>% summarise(TEXT = paste(TEXT, collapse = ' '))