Question title: How to transform the values of some columns in place with the data.table package in R [duplicate]
Posted: 2021-06-16 22:19:30
Question:

I want to convert some columns from "int" or "chr" to "factor", leaving the remaining columns unaffected. Here is my code:

>library("data.table")
>titanic <- fread("titanic.csv")
>str(titanic)
Classes ‘data.table’ and 'data.frame':  887 obs. of  8 variables:
 $ Survived               : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass                 : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name                   : chr  "Mr. Owen Harris Braund" "Mrs. John Bradley (Florence Briggs Thayer) Cumings" "Miss. Laina Heikkinen" "Mrs. Jacques Heath (Lily May Peel) Futrelle" ...
 $ Sex                    : chr  "male" "female" "female" "female" ...
 $ Age                    : num  22 38 26 35 35 27 54 2 27 14 ...
 $ Siblings/Spouses Aboard: int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parents/Children Aboard: int  0 0 0 0 0 0 0 1 2 0 ...
 $ Fare                   : num  7.25 71.28 7.92 53.1 8.05 ...
>titanic_tmp <- titanic[, lapply(.SD,function(x) factor(x,levels = unique(x))),.SDcols =c(1,2,4,6,7)]
>titanic <- cbind(titanic_tmp,titanic[,c(3,5,8)]) 

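The code above solves my problem, but it is cumbersome. I know the ":=" operator can update the columns of a data.table in place. How can I use ":=" here to update columns 1, 2, 4, 6 and 7? Or is there another convenient, simpler way?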
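For readers without titanic.csv, here is a minimal sketch of the question's cbind() approach on a hypothetical three-row table with the same eight-column layout (the values are made up). It also makes one side effect visible: the converted columns end up first, because cbind() places titanic_tmp before the untouched columns.

```r
library(data.table)

# Hypothetical stand-in for titanic.csv: same 8 columns, made-up values
titanic <- data.table(
  Survived = c(0L, 1L, 1L),
  Pclass   = c(3L, 1L, 3L),
  Name     = c("Braund", "Cumings", "Heikkinen"),
  Sex      = c("male", "female", "female"),
  Age      = c(22, 38, 26),
  `Siblings/Spouses Aboard` = c(1L, 1L, 0L),
  `Parents/Children Aboard` = c(0L, 0L, 0L),
  Fare     = c(7.25, 71.28, 7.92)
)

# Question's approach: build the converted columns as a separate table...
titanic_tmp <- titanic[, lapply(.SD, function(x) factor(x, levels = unique(x))),
                       .SDcols = c(1, 2, 4, 6, 7)]
# ...then bind the untouched columns back on (a new object, not an in-place update)
titanic <- cbind(titanic_tmp, titanic[, c(3, 5, 8)])

# Column order is now: the 5 converted columns, then Name, Age, Fare
names(titanic)
```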

Comments:

  • See the examples in ?`:=`, starting from "## using lapply & .SD".

Tags: r data.table


Answer 1:

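The canonical and most efficient way in data.table to modify multiple columns with lapply (or similar) is to use a vector of column names, both in .SDcols= and on the LHS of the := assignment: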

cols <- names(titanic)[c(1,2,4,6,7)]
titanic[, c(cols) := lapply(.SD, factor), .SDcols = cols]
#    Survived Pclass                                    Name    Sex   Age Siblings/Spouses Aboard Parents/Children Aboard  Fare
#      <fctr> <fctr>                                  <char> <fctr> <num>                  <fctr>                  <fctr> <num>
# 1:        0      3                  Mr. Owen Harris Braund   male    22                       1                       0  7.25
# 2:        1      1 Mrs. John Bradley (Florence Briggs T... female    38                       1                       0 71.28
# 3:        1      3                   Miss. Laina Heikkinen female    26                       0                       0  7.92
# 4:        1      1 Mrs. Jacques Heath (Lily May Peel) F... female    35                       1                       0 53.10

## and if you need the columns reordered,
setcolorder(titanic, c(1,2,4,6,7,3,5,8))
titanic
#    Survived Pclass    Sex Siblings/Spouses Aboard Parents/Children Aboard                                    Name   Age  Fare
#      <fctr> <fctr> <fctr>                  <fctr>                  <fctr>                                  <char> <num> <num>
# 1:        0      3   male                       1                       0                  Mr. Owen Harris Braund    22  7.25
# 2:        1      1 female                       1                       0 Mrs. John Bradley (Florence Briggs T...    38 71.28
# 3:        1      3 female                       0                       0                   Miss. Laina Heikkinen    26  7.92
# 4:        1      1 female                       1                       0 Mrs. Jacques Heath (Lily May Peel) F...    35 53.10
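A self-contained sketch of the answer's approach, again on a hypothetical three-row stand-in for titanic.csv (made-up values). Only the columns named in `cols` are passed through .SD, and each converted column is written back under the same name by reference, so no second table and no cbind() are needed. The setcolorder() call builds the new order from names (converted columns first, then the rest) instead of positions; that is just an alternative spelling of the positional call above.

```r
library(data.table)

# Hypothetical stand-in for titanic.csv: same 8 columns, made-up values
titanic <- data.table(
  Survived = c(0L, 1L, 1L),
  Pclass   = c(3L, 1L, 3L),
  Name     = c("Braund", "Cumings", "Heikkinen"),
  Sex      = c("male", "female", "female"),
  Age      = c(22, 38, 26),
  `Siblings/Spouses Aboard` = c(1L, 1L, 0L),
  `Parents/Children Aboard` = c(0L, 0L, 0L),
  Fare     = c(7.25, 71.28, 7.92)
)

# Names of the columns to convert (positions 1, 2, 4, 6, 7)
cols <- names(titanic)[c(1, 2, 4, 6, 7)]

# Convert in place: .SD holds only `cols`; results are assigned back by reference
titanic[, c(cols) := lapply(.SD, factor), .SDcols = cols]

# Optional: move the converted columns to the front, also by reference
setcolorder(titanic, c(cols, setdiff(names(titanic), cols)))

sapply(titanic, class)
```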

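FYI, I shortened lapply(.SD, function(x) factor(x, levels=unique(x))) to lapply(.SD, factor), because factor() already uses the unique values found as its levels by default. One difference: the default sorts those levels (levels = sort(unique(x))), whereas levels = unique(x) keeps them in order of first appearance. If that order matters to you, revert to the longer lapply form.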
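A small sketch comparing the two forms on a hypothetical two-column table (made-up values): plain factor() gives sorted levels, while levels = unique(x) keeps the order in which values first appear. copy() is used so each variant starts from the same unconverted data, since := would otherwise modify `dt` itself.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(Pclass = c(3L, 1L, 3L, 2L),
                 Sex    = c("male", "female", "female", "male"))
cols <- c("Pclass", "Sex")

# Short form: default factor(), levels are the sorted unique values
a <- copy(dt)
a[, c(cols) := lapply(.SD, factor), .SDcols = cols]
levels(a$Pclass)  # "1" "2" "3"
levels(a$Sex)     # "female" "male"

# Longer form from the question: levels in order of first appearance
b <- copy(dt)
b[, c(cols) := lapply(.SD, function(x) factor(x, levels = unique(x))), .SDcols = cols]
levels(b$Pclass)  # "3" "1" "2"
levels(b$Sex)     # "male" "female"
```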

