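【Posted】:2021-05-05 20:49:01
【Question】:
I have tried to replicate the approach described in this post (Trying to create a new column using multiple if else statements in R).
I want to classify the severity of patients' blood test results. My goal is to assign each patient's existing blood work value a specific score (i.e. 0, 1, 2, 3) and then save these new scores in a new column. The cutoffs are: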
if value is >=150000, score = 0
if value is <150000, score = 1
if value is <100000, score = 2
if value is <50000, score = 3
if value is <20000, score = 4
The input is:
> dput (platelets_v1)
structure(list(ID = c(13055908, 13059026, 13154920, 13201107,
13207119, 13207948, 13234892, 13261022, 13082943, 13193903, 13259391,
13283776, 13262499, 13154288, 13207315, 13269178, 13135316, 13055690,
13207670, 13220627, 13233898, 13055009, 13044947, 13181075, 13261607,
13186960, 13240091, 13060589, 13201616, 13260671, 13302375, 13021555,
13054278, 13062360, 13035346, 13077712, 13128769, 13267480, 13160156,
13040172, 13160971, 13239318, 12977871, 13090190, 13321288, 13040530,
13100979, 13124511, 13192142, 13289317, 13315577, 13154966, 13044653,
13079694, 13128639, 13165362, 13207352, 13049409, 12999835, 13210994,
13283675, 13223721, 13064865, 13104602, 13036280, 13040507, 12964437,
13029805, 13029001, 12993036, 13072516, 13060586, 13119819, 13040632
), platelets = c("469.000", "NA", "NA", "243.000", "NA", "NA",
"NA", "334.000", "522.000", "NA", "NA", "NA", "NA", "312.000",
"421.000", "NA", "321.000", "NA", "NA", "NA", "298.000", "263.000",
"109.000", "280.000", "NA", "NA", "430.000", "288.000", "159.000",
"528.000", "NA", "163.000", "NA", "439.000", "NA", "477.000",
"NA", "473.000", "NA", "459.000", "183.000", "343.000", "285.000",
"459.000", "253.000", "NA", "227.000", "NA", "569.000", "NA",
"NA", "NA", "239.000", "382.000", "270.000", "NA", "362.000",
"NA", "146.000", "367.000", "NA", "531.000", "NA", "363000",
"NA", "257000", "158000", "56000", "417", "NA", "171000", "NA",
"NA", "NA")), row.names = c(NA, -74L), class = c("tbl_df", "tbl",
"data.frame"))
I tried the following:
> labels <- c('0', '1', '2','3', '4')
> breaks <- c(500000, 150000, 100000, 50000, 20000)
> teste01 <- platelets_v1 %>% mutate(platelets_v1 = cut(platelets_v1, breaks = breaks, labels = labels, include.lowest = TRUE))
Desired result:
ID platelets score
13055908 469000 0
13059026 NA NA
13154920 NA NA
13201107 243000 0
and so on.
Any light on this would be greatly appreciated.
【Comments】:
- Use case_when or cut().
- Since the output is numeric, findInterval is the best choice.
- The platelets column in the data frame appears to be of character type.
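The comments above can be combined into a minimal base R sketch: convert the character column to numbers first, then score it with findInterval. It uses a five-row sample taken from the question's data and assumes that "." in values such as "469.000" is a thousands separator (so the value means 469000, as the desired result suggests). That assumption does not resolve entries such as "417", which would need to be checked by hand. The names raw, counts and cutoffs are illustrative only.

```r
# Small sample in the same format as the question's data
platelets_v1 <- data.frame(
  ID = c(13055908, 13059026, 13044947, 12999835, 13029805),
  platelets = c("469.000", "NA", "109.000", "146.000", "56000"),
  stringsAsFactors = FALSE
)

# Assumption: "." is a thousands separator, so "469.000" means 469000
raw <- platelets_v1$platelets
raw[raw == "NA"] <- NA                                   # literal "NA" strings become real missing values
counts <- as.numeric(gsub(".", "", raw, fixed = TRUE))   # drop the separator, then convert

# Lower bounds of the score bands, ascending as findInterval() requires
cutoffs <- c(20000, 50000, 100000, 150000)

# findInterval() gives 0 below the first cutoff and 4 at or above the last,
# so 4 minus that index yields 0 for >= 150000 down to 4 for < 20000
platelets_v1$platelets <- counts
platelets_v1$score <- 4 - findInterval(counts, cutoffs)

print(platelets_v1)
```

Missing values pass through as NA, which matches the desired result. The same bands could also be written with case_when or cut(), but cut() would additionally need numeric input, ascending breaks, and one fewer label than breaks.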
Tags: r dataframe if-statement dplyr