【Question title】: R function to filter a vector of sentences with a vector of words
【Posted】: 2020-12-12 16:17:59
【Question】:

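I have been trying to extract sentences from a vector. For each word in Vector1, I want to keep only the sentences in Vector2 (separated by |) that contain that word. An image with the correct formatting was attached. Thanks in advance!

For example: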

Vector1    Vector2
One        One day, it was sunny| There was no rain| There was One dollar on the floor
Two        Two day, it was rainy| There was no sun
Three      There was Three dollars on the floor| It was wet| Three of ants on floor|

Expected output:

Key        Sentence1                              Sentence2                           Sentence3
One        One day, it was sunny                  There was One dollar on the floor
Two        Two day, it was rainy
Three      There was Three dollars on the floor   Three of ants on floor

【Question discussion】:

    Tags: r vector


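    【Solution 1】:

    You can split Vector2 on "|" to get the data in long format, keep only the rows whose sentence contains the Vector1 word, and then reshape back to wide format.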

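    library(dplyr)
    library(tidyr)
    
    df %>%
      # one row per sentence: split Vector2 on "|" plus any following spaces
      separate_rows(Vector2, sep = '\\|\\s*') %>%
      # keep only sentences containing the Vector1 word as a whole word
      filter(stringr::str_detect(Vector2, paste0('\\b', Vector1, '\\b'))) %>%
      # number the remaining sentences within each key
      group_by(Vector1) %>%
      mutate(col = paste0('Sentence', row_number())) %>%
      # back to wide format: one column per sentence
      pivot_wider(names_from = col, values_from = Vector2)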
    
    # Vector1 Sentence1                            Sentence2                        
    #  <chr>   <chr>                                <chr>                            
    #1 One     One day, it was sunny                There was One dollar on the floor
    #2 Two     Two day, it was rainy                NA                               
    #3 Three   There was Three dollars on the floor Three of ants on floor       
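The `\\b` anchors in the `filter()` step are what turn this into a whole-word match. Below is a minimal base-R sketch comparing a plain substring test with the same word-boundary pattern. The sentence "Onerous chores were done" is a made-up counter-example, not part of the question's data.

```r
key <- "One"
sentences <- c("One day, it was sunny",
               "There was no rain",
               "There was One dollar on the floor",
               "Onerous chores were done")  # hypothetical counter-example

# Plain substring match: also hits "Onerous"
substring_hit <- grepl(key, sentences, fixed = TRUE)

# Whole-word match, same pattern shape as paste0('\\b', Vector1, '\\b')
word_hit <- grepl(paste0("\\b", key, "\\b"), sentences, perl = TRUE)

print(sentences[substring_hit])
print(sentences[word_hit])
```

One caveat: because Vector1 is pasted into a regular expression, keys containing regex metacharacters (for example `.` or `+`) would need escaping first.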
    

    Data

    df <- structure(list(Vector1 = c("One", "Two", "Three"), 
    Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
    "Two day, it was rainy| There was no sun", 
    "There was Three dollars on the floor| It was wet| Three of ants on floor"
    )), class = "data.frame", row.names = c(NA, -3L))
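If loading dplyr/tidyr is not an option, the same split, filter and reshape steps can be sketched in base R. This is an alternative technique, not the answer's code. The intermediate names `pieces`, `kept`, `wide` and `result` are only illustrative, and the sketch assumes Vector2 is a character column (hence `stringsAsFactors = FALSE`).

```r
df <- data.frame(
  Vector1 = c("One", "Two", "Three"),
  Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
              "Two day, it was rainy| There was no sun",
              "There was Three dollars on the floor| It was wet| Three of ants on floor"),
  stringsAsFactors = FALSE
)

# 1. Split each Vector2 entry on "|" plus any following whitespace
pieces <- strsplit(df$Vector2, "\\|\\s*", perl = TRUE)

# 2. For each row, keep sentences containing the key as a whole word
kept <- Map(function(key, s) s[grepl(paste0("\\b", key, "\\b"), s, perl = TRUE)],
            df$Vector1, pieces)

# 3. Pad every row to the same length with NA and bind into a matrix
n <- max(lengths(kept))
wide <- do.call(rbind, lapply(kept, function(s) s[seq_len(n)]))
rownames(wide) <- NULL
colnames(wide) <- paste0("Sentence", seq_len(n))

result <- data.frame(Vector1 = df$Vector1, wide, stringsAsFactors = FALSE)
print(result)
```

Indexing `s[seq_len(n)]` past the end of a shorter vector yields `NA`, which mirrors the `NA` that `pivot_wider()` produces for the Two row.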
    

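    【Discussion】:

    • Hi Ronak, thanks for the answer. I tried it, but it doesn't split Vector2 by |.
    • So what does it return? Is there an error? Did you assign the output to a new object, i.e. df1 <- df %>% separate_rows....rest of the code... ? If it still doesn't work, could you provide a reproducible example of your dataset using dput(head(df)), where df is your data frame?
    • @Volatile Did you check this? Did it work? If not, please add data for the first few rows using dput, as shown above. I believe it's a small change.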
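Regarding the comment that Vector2 is not being split: one way to narrow this down is to test the split step on its own. The helper `count_pieces()` below is hypothetical, and the third string deliberately uses " / " instead of "|" to show what a failed split looks like.

```r
# Hypothetical helper: how many sentences does each string split into?
# as.character() also covers the case where the column is a factor.
count_pieces <- function(x, sep = "\\|\\s*") {
  lengths(strsplit(as.character(x), sep, perl = TRUE))
}

example <- c("One day, it was sunny| There was no rain| There was One dollar on the floor",
             "Two day, it was rainy| There was no sun",
             "It was wet / Three of ants on floor")  # wrong separator on purpose

print(count_pieces(example))
```

If the real data gives 1 for every row, the pattern `\\|\\s*` does not match anywhere in those strings, which points to a different separator character in the actual data.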