Selecting non-consecutive columns by name in data.table (R): answers

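【Question title】: Selecting non-consecutive columns by name in data.table r
【Posted】: 2020-08-23 11:51:54
【Question description】:

I have a database on disk, and I would occasionally like to use : in data.table to select multiple columns by name.

Previous answers only cover selecting columns by index, which is not desirable in my case.

An example follows: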

library(data.table)
library(gapminder)
data(gapminder)
setDT(gapminder)

names(gapminder) # [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"

# I would like to select columns from `country` to `year` and pop

gapminder[,country:year] # this one works



gapminder[,country:year + pop] # doesn't work
gapminder[,c(country:year,pop)] # doesn't work either

gapminder[,.SD, .SDcols = c(country:year,pop)] # doesn't work

I'm scratching my head over this. I would appreciate any suggestions.

【Question comments】:

Tags: r data.table

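【Solution 1】:

Update

I consulted the data.table FAQ, specifically this part:

1.2 Why does DT[,"region"] return a 1-column data.table rather than a vector? See the answer above. Try DT$region instead. Or DT[["region"]].

1.3 Why does DT[, region] return a vector for the "region" column? I'd like a 1-column data.table. Try DT[ , .(region)] instead. .() is an alias for list() and ensures a data.table is returned.

I realized there is an even simpler solution.

To keep the column names when using cbind, you need to pass two data tables. The column ended up named V4 because a vector was passed to cbind.

But you can control whether data.table returns a vector or a 1-column data.table. Applied to your case: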

newest_gapminder2 <- cbind(gapminder[, country:year], gapminder[, 'pop'])

or

newest_gapminder3 <- cbind(gapminder[, country:year], gapminder[, .(pop)])
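
To make the vector versus 1-column data.table distinction concrete, here is a minimal sketch on a toy table. The table and its column names (x, y, z) are made up for illustration and are not from the question's data.

```r
library(data.table)

# Toy table; the column names x, y, z are assumptions for this sketch.
dt <- data.table(x = 1:3, y = letters[1:3], z = c(1.5, 2.5, 3.5))

vec <- dt[, z]      # bare column name in j: returns a plain vector
one <- dt[, .(z)]   # wrapped in .(): returns a 1-column data.table
chr <- dt[, "z"]    # quoted column name: also a 1-column data.table

is.data.table(vec)  # FALSE
is.data.table(one)  # TRUE
is.data.table(chr)  # TRUE

# Binding a vector loses the column name; binding a data.table keeps it.
lost <- cbind(dt[, x:y], dt[, z])
kept <- cbind(dt[, x:y], dt[, .(z)])

names(lost)  # third column gets an auto-generated V-name instead of "z"
names(kept)  # "x" "y" "z"
```

Either the quoted form or the .() form should do; which one reads better is a matter of taste.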

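Original answer

I found your question because I had the same problem! I wanted to create a subset of a data table without listing every column.

I tried a few different things and found something I can live with...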

library(data.table)

## create a data table for this example
dt <- data.table("col1"=1:5, "col2"=2:6, "col3"=letters[2:6], "col4"=letters[1:5], "col5"=3:7)
dim(dt)
dt

## the goal is to create a subset of this data frame that contains col1, col3, col4, and col5

Sorry, I didn't use your data. It should work the same way, though.

# method 1
## subset out a vector and give the column name
col1 <- dt[, col1]

## use cbind on the object and the data table subset
## the object name takes the place of the column name in the table
new_dt <- cbind(col1, dt[, col3:col5])

## check that the result is a data.table
class(new_dt)
dim(new_dt)
new_dt

This works OK but feels a bit hack-y. Here is the problem I ran into when trying something like this:

dt_alt <- cbind(dt[, col1], dt[, col3:col5])

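But dt[, col1] creates a vector rather than a data table, and when it is coerced inside cbind the resulting column is named V1. So I figured it might be easier to avoid binding a single column and instead delete the unwanted column afterwards.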

# method 2
## take two different subsets/slices and cbind them 
new_dt2 <- cbind(dt[, col1:col2], dt[, col3:col5])

## take out col2
new_dt2[, col2 := NULL]

class(new_dt2)
dim(new_dt2)
new_dt2

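This is a little better, but then I wondered about something more streamlined. I thought about chaining in data.table and wanted to combine it with method 2. Credit to this post for the := NULL technique.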

# method 3
## thinking about how data.table works, can the := NULL be chained? 
## spoiler: it can!
## this feels like kind of a hack but...
new_dt3 <- cbind(dt[, col1:col2][, col2 := NULL], dt[, col3:col5])

class(new_dt3)
dim(new_dt3)
new_dt3

OK, after all that, I felt bad about not using the gapminder data from your question, so here is my method #3 applied to your data:

gapminder <- cbind(gapminder[, country:year], gapminder[, pop:gdpPercap][, gdpPercap := NULL])

I timed it using the technique described here.

   user  system elapsed 
   0.00    0.00    0.02

All three techniques take about the same time. I'm not sure how this would perform on a multi-GB data set, though.
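
For anyone who wants to check this on something larger, here is a rough sketch of how the timing could be repeated with system.time. The synthetic table, its size, and its column names are assumptions for illustration, and the comparison against a plain character-vector selection via .SDcols is my own addition rather than one of the three methods above. No timings are quoted because they depend on the machine.

```r
library(data.table)

# Synthetic stand-in for a larger table; size and column names are arbitrary.
set.seed(1)
n <- 1e5
big <- data.table(
  id = seq_len(n),
  g  = sample(letters, n, replace = TRUE),
  v1 = rnorm(n),
  v2 = rnorm(n),
  v3 = rnorm(n)
)

# Approach A: cbind a name range with a 1-column data.table.
t_cbind <- system.time(
  res_cbind <- cbind(big[, id:g], big[, .(v2)])
)

# Approach B: select everything in one step from a character vector of names.
cols <- c("id", "g", "v2")
t_sdcols <- system.time(
  res_sdcols <- big[, .SD, .SDcols = cols]
)

# Both approaches should produce the same table.
all.equal(res_cbind, res_sdcols)

t_cbind
t_sdcols
```

Increasing n should give a feel for how each approach scales before trying it on the real data.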

【Comments】:

【Solution 2】:

Another option:

gapminder[, c(.SD, .(pop=pop)), .SDcols=country:year]

or, if you have more columns,

cols <- setNames(c("pop", "lifeExp"), c("pop", "lifeExp"))
gapminder[, c(.SD, mget(cols)), .SDcols=country:year]

Output:

          country continent year      pop lifeExp
   1: Afghanistan      Asia 1952  8425333  28.801
   2: Afghanistan      Asia 1957  9240934  30.332
   3: Afghanistan      Asia 1962 10267083  31.997
   4: Afghanistan      Asia 1967 11537966  34.020
   5: Afghanistan      Asia 1972 13079460  36.088
  ---                                            
1700:    Zimbabwe    Africa 1987  9216418  62.351
1701:    Zimbabwe    Africa 1992 10704340  60.377
1702:    Zimbabwe    Africa 1997 11404948  46.809
1703:    Zimbabwe    Africa 2002 11926563  39.989
1704:    Zimbabwe    Africa 2007 12311143  43.487

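【Comments】:

This method adds an arbitrary name, V4, for pop. That makes me hesitant to use this approach.
@MatthewSon, I have addressed your concern in the update.
The quick fix looks quite neat, but is there any way to use multiple colons : in data.table? I mean, for example, selecting (a:c) and (e:g).
Not inside [, but you can write a function that collects the columns before passing them to .SDcols.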
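
One possible shape for such a function, as a minimal sketch: cols_between is a hypothetical helper (it is not part of data.table) that expands "from:to" strings into column names using the table's column order, so the result can be handed to .SDcols. The toy table is also made up for illustration.

```r
library(data.table)

# Hypothetical helper, not a data.table function: expands specs such as
# "a:c" (a name range) or "e" (a single name) into a character vector of
# column names, following the column order of dt.
cols_between <- function(dt, ...) {
  nm <- names(dt)
  specs <- c(...)
  expanded <- lapply(strsplit(specs, ":", fixed = TRUE), function(parts) {
    idx <- match(parts, nm)
    if (anyNA(idx)) {
      stop("unknown column: ", paste(parts[is.na(idx)], collapse = ", "))
    }
    # First and last position of the range (identical for a single name).
    nm[idx[1]:idx[length(idx)]]
  })
  unique(unlist(expanded))
}

# Toy table with columns a to g.
dt <- data.table(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)

sel <- cols_between(dt, "a:c", "e:g")
sel                               # "a" "b" "c" "e" "f" "g"
res <- dt[, .SD, .SDcols = sel]   # columns a, b, c, e, f, g

# Applied to the question, the call would look like this (assuming the
# gapminder table from the question is loaded):
# gapminder[, .SD, .SDcols = cols_between(gapminder, "country:year", "pop")]
```

Passing the ranges as strings keeps the helper free of non-standard evaluation, at the cost of not supporting column names that themselves contain a colon.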

【Solution 3】:

I'm not sure there is really a simple solution in data.table, but perhaps you can cbind a column range with a single named column.

library(data.table)
cbind(gapminder[,country:year], gapminder[, 'pop'])

However, dplyr supports the behavior you want directly.

library(dplyr)
gapminder %>% select(country:year, pop)


#       country continent year      pop
#1: Afghanistan      Asia 1952  8425333
#2: Afghanistan      Asia 1957  9240934
#3: Afghanistan      Asia 1962 10267083
#4: Afghanistan      Asia 1967 11537966
#5: Afghanistan      Asia 1972 13079460
#6: Afghanistan      Asia 1977 14880372

【Comments】: