【Question title】: Remove row if it starts with "RT"
【Posted】: 2014-05-27 12:26:11
【Question】:

How can I delete every row whose first column starts with "RT"?

    structure(list(text = structure(c(4L, 6L, 1L, 2L, 5L, 3L), .Label = c("@AirAsia @AirAsiaId finally they let us fly with 9.20 flight today. Manual boarding pass. Phew, that was a great relief!", 
    "@AirAsia your direct debit (Maybank) payment gateways is not working. Is it something you are working to fix?", 
    "RT @AirAsia: Kindly note that CIMB Direct Debit service will be unavailable tonight from (GMT+8) 1145hrs on 31 Jan until 0600hrs on 3 Feb 2…", 
    "RT @AirAsia: Skipped breakfast this morning? Now you can enjoy a great breakfast onboard with our new breakfast meals! http://t.co/957ZaLjY…", 
    "xdek ke flight @AirAsia Malaysia to LA... hahah..bagi la promo murah2 sikit, kompom aku beli...", 
    "You know there is a problem when customer service asks you to wait for 103 minutes and your no is 42 in the queue. @AirAsia"
    ), class = "factor"), created = structure(c(5L, 4L, 4L, 3L, 2L, 
    1L), .Label = c("1/2/2014 16:14", "1/2/2014 17:00", "3/2/2014 0:54", 
    "3/2/2014 0:58", "3/2/2014 1:28"), class = "factor")), .Names = c("text", 
    "created"), class = "data.frame", row.names = c(NA, -6L))

【Comments】:

Tags: regex r substring


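【Solution 1】:

Assuming your data frame is called tweets,

    no.rts <- tweets[grep("^RT ", tweets$text, invert=TRUE),]

does what you want and stores the result in a new data frame called no.rts.

The pattern "^RT " matches values of tweets$text that start (^) with RT followed by a space. With invert=TRUE, grep returns the indices of the rows that do not match, so those rows are kept; without invert=TRUE it would select the rows that start with RT instead.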
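To see what invert=TRUE changes, here is a minimal sketch on a made-up character vector (x is hypothetical and is not the question's data):

```r
# Toy vector; the values are invented for illustration only.
x <- c("RT @a: hello", "plain tweet", "RTFM is not a retweet", "RT @b: bye")

# Indices of elements that start with "RT " (note the trailing space).
rt.idx <- grep("^RT ", x)                    # 1 4

# Indices of elements that do NOT start with "RT ".
keep.idx <- grep("^RT ", x, invert = TRUE)   # 2 3

# Without the trailing space, "RTFM ..." is matched as well.
loose.idx <- grep("^RT", x)                  # 1 3 4
```

The trailing space in the pattern is a design choice: it avoids dropping text that merely begins with the letters "RT".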

【Comments】:

  • Hi, what if I want to remove the rows where tweets$text = "NA"?
  • no.nas <- tweets[!is.na(tweets$text),] will give you that.
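
The question in the comments writes "NA" in quotes, which could mean either a real missing value or the literal string "NA". A rough sketch of both cases, assuming a hypothetical one-column data frame called toy:

```r
# Hypothetical data: row 2 is a real missing value, row 3 is the string "NA".
toy <- data.frame(text = c("RT @a: hi", NA, "NA", "hello"),
                  stringsAsFactors = FALSE)

# Drops only the real missing value (row 2).
# drop = FALSE keeps the result a data frame even with a single column.
no.nas <- toy[!is.na(toy$text), , drop = FALSE]

# Drops the missing value and the literal string "NA" (rows 2 and 3).
no.na.strings <- toy[!is.na(toy$text) & toy$text != "NA", , drop = FALSE]
```

If the column is a factor, as in the question's dput output, the same tests should still apply, but converting with as.character() first makes the intent clearer.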
【Solution 2】:

grepl works too. Assuming d is the dataset,

    > d[!grepl("^RT", d$text), ]
    ##                        text        created
    ## 2 You know there...@AirAsia  3/2/2014 0:58
    ## 3 @AirAsia... great relief!  3/2/2014 0:58
    ## 4 @AirAsia...orking to fix?  3/2/2014 0:54
    ## 5 xdek ke flight ...        1/2/2014 17:00
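
The reason ! works here is that grepl returns one TRUE/FALSE per row rather than a vector of indices. A small sketch on invented data (this d is a stand-in, not the question's data frame); the startsWith variant is an alternative that the thread does not mention and assumes base R 3.3.0 or later:

```r
# Toy data frame standing in for d; the contents are invented.
d <- data.frame(text = c("RT @a: hi", "hello", "RT @b: bye"),
                created = c("t1", "t2", "t3"),
                stringsAsFactors = FALSE)

# One logical value per row, so the mask can be negated with !.
is.rt <- grepl("^RT", d$text)   # TRUE FALSE TRUE
kept <- d[!is.rt, ]

# Same prefix test without a regex. startsWith needs a character vector,
# so factor columns must go through as.character() first.
kept2 <- d[!startsWith(as.character(d$text), "RT"), ]
```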

【Comments】:

【Solution 3】:

Or use the stri_sub function from the stringi package to extract the first two characters, then check whether they equal "RT":

    require(stringi)
    df[stri_sub(df$text,1,2)!="RT",]
    

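【Comments】:

  • Check out base::substr.
  • stri_sub is better because negative values let you count from the end of the string. So to get the last two characters: stri_sub("abc123",-2,-1).
  • Hi, what if I want to remove the rows where df$text = "NA"?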
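
For comparison with the comments above, a minimal base R sketch of the same two extractions on invented strings. substr has no negative indexing, so the start position for the last two characters is computed from nchar:

```r
# Toy strings, invented for illustration.
x <- c("RT @a: hi", "abc123")

# First two characters with base R.
first2 <- substr(x, 1, 2)                    # "RT" "ab"

# Last two characters with base R: derive the start from the string length.
last2 <- substr(x, nchar(x) - 1, nchar(x))   # "hi" "23"
```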
【Solution 4】:

All of the above work; I prefer subset because it is more readable:

    no.rts <- subset( tweets, ! grepl("^RT ", text) )
    

【Comments】:

  • It also removes the rows where is.na(text).