【Question title】: Getting the last 10 words from a string, applied to a vector of strings
【Posted】: 2022-01-21 06:28:39
【Question description】:

I have a vector of texts in a data frame (df1$text), and I am trying to create a new vector (df1$last.ten) holding the last 10 words of each text. I tried the following, without success:

df1$last.ten = mapply(function(x,y) paste(word(x,y), collapse=" "), df1$text, -1:-10)

But all I get is a single word rather than a string of ten words:

> df1$last.ten[1]
[1] "end."

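It works fine when I give it a single string, so I seem to be using mapply incorrectly.

I have also tried using gsub for this, but could not figure out the syntax. A solution using word() or gsub() would be appreciated.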

【Question comments】:
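
A likely reason for the single-word result: mapply() steps through its vector arguments in parallel, so each text is paired with just one element of -1:-10 (the first text gets -1, i.e. its last word) rather than with the whole range. The sketch below illustrates that pairing and one way around it, passing the positions through MoreArgs. It uses base R only; word_from_end() is a hypothetical stand-in for stringr's word() called with negative positions, and texts is made-up data.

```r
# Hypothetical stand-in for stringr::word(s, k) with negative k:
# returns the word(s) at position k counted from the end (-1 = last word).
word_from_end <- function(s, k) {
  w <- strsplit(s, " ", fixed = TRUE)[[1]]
  w[length(w) + k + 1]
}

texts <- c("a b c d", "e f g h")

# Parallel pairing: texts[1] gets -1, texts[2] gets -2, so one word each.
paired <- mapply(function(x, y) paste(word_from_end(x, y), collapse = " "),
                 texts, -1:-2, USE.NAMES = FALSE)

# MoreArgs hands the whole position vector to every call.
whole <- mapply(function(x, y) paste(word_from_end(x, y), collapse = " "),
                texts, MoreArgs = list(y = -2:-1), USE.NAMES = FALSE)

print(paired)  # "d"   "g"
print(whole)   # "c d" "g h"
```

Writing the positions as -10:-1 rather than -1:-10 also keeps the words in their original order.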

    Tags: r string dataframe gsub mapply


    【Solution 1】:

    If this is your data frame (toy data):

    df1
                                                                text
    1 one two three four five six seven eight nine ten eleven twelve
    2 one two three four five six seven eight nine ten eleven twelve
    3 one two three four five six seven eight nine ten eleven twelve
    

    Then extract the last 10 words like this:

    rnge <- 10:1
    
    df1$last.ten <- apply( t(apply( as.data.frame(df1$text), 1, function(x)
      rev( unlist( strsplit(x, " ") ) ) )[rnge,]), 1, paste, collapse=" " )
    
    df1
                                                                text
    1 one two three four five six seven eight nine ten eleven twelve
    2 one two three four five six seven eight nine ten eleven twelve
    3 one two three four five six seven eight nine ten eleven twelve
                                                    last.ten
    1 three four five six seven eight nine ten eleven twelve
    2 three four five six seven eight nine ten eleven twelve
    3 three four five six seven eight nine ten eleven twelve
    

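    If you adjust the range rnge, the same code extracts words from any position, counted from the end of the string. For example, 5:3 returns the fifth-, fourth- and third-from-last words: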

    rnge <- 5:3
    
    df1$mid <- apply( t(apply( as.data.frame(df1$text), 1, function(x)
      rev( unlist( strsplit(x, " ") ) ) )[rnge,]), 1, paste, collapse=" " )
    
    df1
                                                                text
    1 one two three four five six seven eight nine ten eleven twelve
    2 one two three four five six seven eight nine ten eleven twelve
    3 one two three four five six seven eight nine ten eleven twelve
                                                    last.ten            mid
    1 three four five six seven eight nine ten eleven twelve eight nine ten
    2 three four five six seven eight nine ten eleven twelve eight nine ten
    3 three four five six seven eight nine ten eleven twelve eight nine ten
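
The nested apply() above relies on every row having the same number of words, since only then does the inner apply() return a matrix that can be indexed with [rnge, ]. As a rough sketch for texts of differing lengths, the same reverse-and-index idea can be applied to each string separately. The helper name words_from_end() and the sample texts are made up for illustration, and positions that fall before the start of a short text are simply dropped.

```r
# Sketch: pick words by position counted from the end, one string at a time.
# rnge has the same meaning as above: 10:1 = last ten words, 5:3 = 5th..3rd from last.
words_from_end <- function(text, rnge) {
  vapply(strsplit(as.character(text), " ", fixed = TRUE), function(w) {
    picked <- rev(w)[rnge]                         # NA where the text is too short
    paste(picked[!is.na(picked)], collapse = " ")  # drop missing positions
  }, character(1))
}

texts <- c("one two three four five six seven eight nine ten eleven twelve",
           "a b c")

last_ten <- words_from_end(texts, 10:1)
mid      <- words_from_end(texts, 5:3)

print(last_ten)
print(mid)
```

On a data frame this would be used as df1$last.ten <- words_from_end(df1$text, 10:1).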
    

    【Comments】:

      【Solution 2】:

      I made some sample data. You may not need the apply functions at all.

      
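      library(stringr)  # provides word() and str_count()

      df1 <- data.frame(text = c("one two three four five six seven eight nine ten eleven","one two three four five six seven eight nine ten eleven twelve"))

      # n = number of space-separated words per text; pmax() keeps the start at 1 for texts shorter than 10 words
      n <- str_count(df1$text, '\\S+')
      df1$last.ten <- word(df1$text, pmax(n - 9, 1), n)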
      
      

      【Comments】:

        【Solution 3】:

        Here is a base R option:

        #example data
        df1 <- data.frame(text = c('This is a long text which consists of words more than 10', 
                                   'This is another one which is similar to first one but even longer'))
        
        #split string on space for every word and paste the last 10 words in one string
        df1$last.ten <- sapply(strsplit(df1$text, '\\s+'), function(x) 
                               paste0(tail(x, 10), collapse = ' '))
        df1
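
Since the question also asked about gsub, here is a sketch of a regex variant using sub() with perl = TRUE. The helper last_n_words() is hypothetical and not part of any package. It treats any run of non-whitespace characters as a word and returns the whole text, trimmed, when there are fewer than n words.

```r
# Sketch: keep only the last n whitespace-separated words of each string.
# The lazy prefix (.*?) is discarded; the capture group holds up to n - 1
# "word + whitespace" pairs followed by the final word.
last_n_words <- function(text, n = 10) {
  pattern <- sprintf("(?s)^.*?((?:\\S+\\s+){0,%d}\\S+)\\s*$", n - 1)
  sub(pattern, "\\1", text, perl = TRUE)
}

texts <- c("This is a long text which consists of words more than 10",
           "This is another one which is similar to first one but even longer",
           "a b c")

result <- last_n_words(texts)
print(result)
```

Strings that contain no words at all (empty or whitespace only) do not match the pattern and are returned unchanged.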
        

        【Comments】:
