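【Posted】: 2021-08-02 02:11:18
【Question】:
I'm scraping comments from Reddit and trying to remove the empty rows/comments.
Many rows show up as empty, but I can't seem to remove them. When I use is_empty, they are not reported as empty.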
> Reddit[25,]
[1] ""
> is_empty(Reddit$text[25])
[1] FALSE
> Reddit <- subset(Reddit, text != "")
> Reddit[25,]
[1] ""
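Am I missing something? I've tried several other ways to remove these rows, but none of them worked either.
Edit: Including a dput example in response to the comments: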
RedditSample <- data.frame(text=
c("I liked coinbase, used it before. But the fees are simply too much. If they were to take 1% instead 2.5% I would understand. It's much simpler and long term it doesn't matter as much.",
"But Binance only charges 0.1% so making the switch is worth it fairly quickly. They also have many more coins. Approval process took me less than 10 minutes, but always depends on how many register at the same time.",
"", "Here's a 10%/10% referal code if you chose to register: KHELMJ94",
"What is a spot wallet?"))
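【Comments】:
-
Could you update your sample data to include the relevant rows? You can use
dput(Reddit[23:27,]) to get a reproducible copy of the data with the relevant values. -
Try
subset(Reddit, nchar(text) != 0). rlang::is_empty("") returns FALSE, but rlang::is_empty(character(0)) returns TRUE. -
@Lief Esbenshade I've tried to add it as you suggested. Apologies if I misunderstood.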
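A minimal sketch of the nchar() suggestion above, using base R only. The data frame here is a hypothetical, shortened stand-in for the sample in the question, and the whitespace-only row and the trimws() step are assumptions added for illustration; the thread does not confirm what the problem rows actually contain.

```r
# Hypothetical stand-in for the question's sample data
# (shortened strings; the "   " row is an added assumption).
RedditSample <- data.frame(
  text = c("What is a spot wallet?", "", "   ", "Another comment"),
  stringsAsFactors = FALSE
)

# An empty string is still a character vector of length 1, so a
# length-based emptiness check (such as is_empty) does not flag it.
len_empty_string <- length("")            # one element
len_empty_vector <- length(character(0))  # no elements
chars_in_empty   <- nchar("")             # zero characters

# Drop rows whose text has zero characters, as suggested in the comments.
cleaned <- subset(RedditSample, nchar(text) != 0)

# Assumption: whitespace-only comments should count as empty too.
# trimws() strips leading/trailing whitespace before counting characters.
cleaned_ws <- subset(RedditSample, nchar(trimws(text)) != 0)

print(cleaned)
print(cleaned_ws)
```

If rows still look empty after this, comparing nchar(text) with nchar(text, type = "bytes") on a suspect row may help reveal characters that do not display in the console.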
Tags: r web-scraping data-cleaning is-empty