Subset multiple rows with condition (answers)

【Question title】: Subset multiple rows with condition
【Posted】: 2015-01-05 16:14:54
【Question description】:

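I have a .txt file read into a table called power, which holds more than 2 million observations of 9 variables. I am trying to subset power down to only the rows whose Date is "01/02/2007" or "02/02/2007". After creating the subset, the RStudio environment says I end up with zero observations, although the variables are the same.

How do I get a subset of the data containing only the rows for "01/02/2007" and "02/02/2007"?

I have seen a similar post, but I still get errors with my dataset. See link: Select multiple rows conditioning on ID in R

My data: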

# load data
> power <- read.table("textfile.txt", stringsAsFactors = FALSE, header = TRUE)
# inspect the first column, called Date
> head(power$Date)
#[1] 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006

> str(power$Date)
 chr [1:2075259] "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...

My code:

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))

The subsetted data:

> str(powersub$Date)
 chr(0)

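【Question comments】:

Apart from this subsetting issue, I would suggest converting your character dates to a real date format before continuing with your analysis.
Also, 16/12/2006 looks like dd/mm/yyyy format. The target dates themselves are ambiguous, so your subset criteria may have been written with mm/dd/yyyy in mind.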
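
The date conversion suggested in these comments could look roughly like the following. This is only a minimal sketch: it assumes the column really is in dd/mm/yyyy order, and both the small `power` data frame and the `DateParsed` column name are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the real `power` table (assumption: dd/mm/yyyy dates)
power <- data.frame(
  Date = c("01/02/2007", "16/12/2006", "02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# Parse the character column into Date objects; values that do not
# match the format become NA instead of raising an error
power$DateParsed <- as.Date(power$Date, format = "%d/%m/%Y")

# Subset on real dates: 1 and 2 February 2007
wanted   <- as.Date(c("2007-02-01", "2007-02-02"))
subpower <- power[power$DateParsed %in% wanted, ]
```

Writing the targets in ISO form (yyyy-mm-dd) removes the day/month ambiguity mentioned above.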

Tags: r subset

【Solution 1】:

Try:

> subpower = power[power$Date %in% c("01/02/2007", "02/02/2007") ,]
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28

(Using the power data from @akrun's answer.)

Also, your own code will work if you use the correct name for the subset: "subpower" rather than "powersub"!

> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
> subpower
        Date Val
1 01/02/2007  14
8 02/02/2007  28
>
> str(subpower)
'data.frame':   2 obs. of  2 variables:
 $ Date: chr  "01/02/2007" "02/02/2007"
 $ Val : int  14 28

【Comments】:

【Solution 2】:

I suspect the Date column in your dataset has leading/trailing whitespace, because with clean data the subset works:

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#       Date Val
#1 01/02/2007  14
#8 02/02/2007  28

If I change two of the values to

power$Date[1] <- '01/02/2007 '
power$Date[8] <- ' 02/02/2007'

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#[1] Date Val
#<0 rows> (or 0-length row.names)
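
Before trimming anything, it may help to confirm that stray whitespace is really present. The following is a hedged sketch on a hypothetical `dates` vector holding the same kind of padded values; on the real data one would pass `power$Date` instead.

```r
# Hypothetical vector with one trailing and one leading space
dates <- c("01/02/2007 ", "16/12/2006", " 02/02/2007")

# A clean dd/mm/yyyy string has exactly 10 characters,
# so any other length in this table is suspect
table(nchar(dates))

# Flag the values that start or end with a whitespace character
has_space <- grepl("^\\s|\\s$", dates)
dates[has_space]
```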

You can use str_trim from stringr

library(stringr)
subset(power, str_trim(Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28

Or use gsub

subset(power, gsub("^ +| +$", "", Date) %in% c('01/02/2007', '02/02/2007'))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28
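
As a possible alternative to the two trimming approaches above, base R also provides `trimws()` (available from R 3.2.0, so this assumes a reasonably recent installation). The sketch below cleans the column once, up front, rather than inside each `subset()` call; the small `power` data frame is a hypothetical stand-in.

```r
# Hypothetical stand-in data with padded Date values
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# Remove leading and trailing whitespace in place
power$Date <- trimws(power$Date)

# The plain %in% comparison now matches
subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
```

Cleaning the column once means every later comparison works without remembering to trim again.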

Another option, which does not remove the whitespace, is to use grepl

subset(power, grepl('01/02/2007|02/02/2007', Date))
#         Date Val
#1 01/02/2007   14
#8  02/02/2007  28
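
One caveat with the `grepl` approach is that an unanchored pattern matches anywhere in the string. If that is a concern, a stricter pattern that tolerates only surrounding whitespace might look like this; the `dates` vector, including its deliberately malformed last entry, is hypothetical.

```r
# Hypothetical values: two padded dates, one other date, one malformed entry
dates <- c("01/02/2007 ", " 02/02/2007", "16/12/2006", "x01/02/2007y")

# Anchor the alternation and allow optional whitespace on either side
pattern <- "^\\s*(01/02/2007|02/02/2007)\\s*$"
matched <- grepl(pattern, dates)

# For comparison: the unanchored pattern also accepts the malformed entry
loose <- grepl("01/02/2007|02/02/2007", dates)
```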

Data

power <- structure(list(Date = c("01/02/2007", "16/12/2006", "16/12/2006", 
"16/12/2006", "16/12/2006", "16/12/2006", "16/12/2006", "02/02/2007"
), Val = c(14L, 24L, 23L, 22L, 23L, 25L, 23L, 28L)), .Names = c("Date", 
"Val"), class = "data.frame", row.names = c(NA, -8L))

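【Comments】:

Or perhaps they could use the strip.white argument of read.table (I hope I am remembering that correctly)?
@beginneR Yes, that is also possible. However, in another recent post the OP had used all of these and still got those spaces because of some formatting issue.
Ah, I didn't know that. Thanks, akrun.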
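
For reference, the `strip.white` idea from this comment thread could be tried along these lines. `strip.white` only takes effect when `sep` is given explicitly, so this sketch assumes a ";"-separated file; the inline `txt` contents are hypothetical, and the real file may use a different separator.

```r
# Hypothetical file contents; the ";" separator is an assumption
txt <- "Date;Val\n01/02/2007 ;14\n16/12/2006;24\n 02/02/2007;28"

# strip.white = TRUE removes leading/trailing whitespace from
# unquoted character fields while the data is being read
power <- read.table(text = txt, sep = ";", header = TRUE,
                    strip.white = TRUE, stringsAsFactors = FALSE)

subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
```

As the reply above notes, this may not help if the stray characters come from some other formatting problem in the file.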

【Solution 3】:

Your approach is correct. Try reading the text file with:

power <- read.table("textfile.txt", stringsAsFactors = FALSE)

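【Comments】:

I made the edit that converts the variables to character, but the result is the same. The environment in RStudio says 0 obs. of 9 variables. The subset removed all observations.
Hmm, I would suggest the link from jazz.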