【Question title】: Filling NAs with multiple values in R
【Posted】: 2013-07-12 18:16:07
【Question】:

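I am working with a dataset in R that has missing observations in my vector FirstOfHCPCS.Code. I would like to code these NA HCPCS codes based on the values in another vector, FirstOfService.Description. Not every NA will be filled with the same value; instead, an NA can be coded as one of 6 possible values. I tried running a loop to fill in the NAs, but I think that because I have not listed every FirstOfService.Description in the loop, R does not know what to do with those values. Here is my loop code and the resulting error (updated per canary's suggestions):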

    for (i in 1:248308){
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
      "Local Psychiatric Hospital/IMD PT68", "Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22"))
{Master$FirstOfHCPCS.Code[i]=2}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Inpatient Hospital Ancillary Services - Room and Board",
      "Inpatient Hospital Ancillary Services - Leave of Absence",
      "Inpatient Hospital Ancillary Services - Pharmacy",
      "Inpatient Hospital Ancillary Services - Medical/Surgical Supplies and Devices",
      "Inpatient Hospital Ancillary Services - Laboratory",
      "Inpatient Hospital Ancillary Services -EKG/ECG",
      "Inpatient Hospital Ancillary Services - EEG",
      "Inpatient Hospital Ancillary Services - Psychiatric/Psychological Treatments/Services",
      "Inpatient Hospital Ancillary Services - Other Diagnosis Services",
      "Inpatient Hospital Ancillary Services - Other Therapeutic Services",
      "Inpatient Hospital Ancillary Services - Radiology",
      "Inpatient Hospital Ancillary Services - Respiratory Services",
      "Inpatient Hospital Ancillary Services -Physical Therapy",
      "Inpatient Hospital Ancillary Services - Occupational Therapy",
      "Inpatient Hospital Ancillary Services - Speech-Language Pathology",
      "Inpatient Hospital Ancillary Services - Emergency Room",
      "Inpatient Hospital Ancillary Services - Pulmonary Function",
      "Inpatient Hospital Ancillary Services - Audiology",
      "Inpatient Hospital Ancillary Services - Magnetic Resonance Technology (MRT)",
      "Inpatient Hospital Ancillary Services - Pharmacy",
      "Additional Codes-ECT Facility Charge")){Master$FirstOfHCPCS.Code[i]=1}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Pharmacy (Drugs and Other Biologicals)")){Master$FirstOfHCPCS.Code[i]=3}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Crisis Observation Care")){Master$FirstOfHCPCS.Code[i]=4}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Outpatient Partial Hospitalization")){Master$FirstOfHCPCS.Code[i]=5}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Other")){Master$FirstOfHCPCS.Code[i]=6}}

Error in if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in%  : 
  argument is of length zero

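I also ran sum(is.na(Master$FirstOfHCPCS.Code)) to find out how many rows have NA, then replaced 248308 in the loop code with that number (27186), but I still get the same error as above. How can I fill NAs with multiple values? Thanks for your help!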

As requested, here is sample code and the desired output (Desired_FirstOfHCPCS.Code):

   ##Sample Code##

FirstOfService.Description<-c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65","Wraparound", "Inpatient Hospital Ancillary Services - Room and Board",
                              "Pharmacy (Drugs and Other Biologicals)","Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22","Case Management","Crisis Observation Care","Outpatient Partial Hospitalization",
                              "Other")
Desired_FirstOfHCPCS.Code<-c(2, 85, 1, 3, 2, 2, 11, 4, 5, 6)

FirstOfHCPCS.Code<-c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)

df<-data.frame(FirstOfService.Description, FirstOfHCPCS.Code)

df

Output:

                                    FirstOfService.Description FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                NA
2                                                   Wraparound                85
3       Inpatient Hospital Ancillary Services - Room and Board                NA
4                       Pharmacy (Drugs and Other Biologicals)                NA
5            Local Psychiatric Hospital - Acute Community PT73                NA
6                  State Psychiatric Hospital - Inpatient PT22                NA
7                                              Case Management                11
8                                      Crisis Observation Care                NA
9                           Outpatient Partial Hospitalization                NA
10                                                       Other                NA

What I would like it to look like:

#Desired Output
df2<-data.frame(FirstOfService.Description, Desired_FirstOfHCPCS.Code)
df2

                                    FirstOfService.Description Desired_FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                         2
2                                                   Wraparound                        85
3       Inpatient Hospital Ancillary Services - Room and Board                         1
4                       Pharmacy (Drugs and Other Biologicals)                         3
5            Local Psychiatric Hospital - Acute Community PT73                         2
6                  State Psychiatric Hospital - Inpatient PT22                         2
7                                              Case Management                        11
8                                      Crisis Observation Care                         4
9                           Outpatient Partial Hospitalization                         5
10                                                       Other                         6

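【Comments】:

  • Note that you have to use the is.na function to test for NA in R.
  • I am pretty sure you need neither a for loop nor the horrible if construct you came up with. Provide a reproducible example and show the expected output, and someone will show you how to do this more efficiently with less code. You might also be interested in reading ?match

Tags: r na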


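【Solution 1】:

First, it would be useful to have some reproducible code so that we know what you are working with (we do not know what your data frame consists of).

Otherwise, it looks like there are two problems.

1) You cannot use == NA; use is.na() instead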

NA == NA
[1] NA
is.na(NA)
[1] TRUE

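2) The other problem is that you are using AND where you need OR. In the first example, your description cannot be both "State Mental Retardation Facility..." and "Local Psychiatric Hospital..." at the same time.

Try using %in% instead, for example: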

is.na(Master$FirstOfHCPCS.Code[i]) & 
Master$FirstOfService.Description[i] %in% c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68")

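There are plenty of other ways to clean up this code (the for loop and the manual assignments are time-consuming and error-prone here), but this is a start.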
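As one possible cleanup, here is a minimal sketch that replaces the loop with a vectorized lookup using match(), run against the sample data from the question. The name code_lookup is hypothetical, and it lists only some of the descriptions from the question; the remaining "Inpatient Hospital Ancillary Services" entries would need to be added in the same way.

```r
# Sample data from the question
FirstOfService.Description <- c(
  "State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
  "Wraparound",
  "Inpatient Hospital Ancillary Services - Room and Board",
  "Pharmacy (Drugs and Other Biologicals)",
  "Local Psychiatric Hospital - Acute Community PT73",
  "State Psychiatric Hospital - Inpatient PT22",
  "Case Management",
  "Crisis Observation Care",
  "Outpatient Partial Hospitalization",
  "Other"
)
FirstOfHCPCS.Code <- c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)
df <- data.frame(FirstOfService.Description, FirstOfHCPCS.Code,
                 stringsAsFactors = FALSE)

# Hypothetical lookup vector: names are descriptions, values are fill codes.
# Only a subset of the descriptions from the question is listed here.
code_lookup <- c(
  "State Mental Retardation Facility - Inpatient (ICF/MR) PT65" = 2,
  "Local Psychiatric Hospital/IMD PT68" = 2,
  "Local Psychiatric Hospital - Acute Community PT73" = 2,
  "State Psychiatric Hospital - Inpatient PT22" = 2,
  "Inpatient Hospital Ancillary Services - Room and Board" = 1,
  "Pharmacy (Drugs and Other Biologicals)" = 3,
  "Crisis Observation Care" = 4,
  "Outpatient Partial Hospitalization" = 5,
  "Other" = 6
)

# Rows whose code is missing
missing <- is.na(df$FirstOfHCPCS.Code)

# match() gives the position of each description among the lookup names,
# or NA when the description is not listed, so unlisted rows stay NA.
idx <- match(as.character(df$FirstOfService.Description[missing]),
             names(code_lookup))
df$FirstOfHCPCS.Code[missing] <- unname(code_lookup[idx])

df
```

Only rows that are currently NA get touched, so existing codes such as 85 and 11 are left alone. On the full data you would use Master in place of df.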

【Comments】:

  • When I run this code, I get this error: Error in if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfSerivce.Description[i] %in% : argument is of length zero
  • I ran one piece of the loop as a test. This is the code I ran: for (i in 1:248308){if(is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfSerivce.Description[i]%in%c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68", "Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22")) {Master$FirstOfHCPCS.Code[i]=2}}
  • It works fine, except that you misspelled FirstOfSerivce in your original code and I copied it by mistake. Correct it to FirstOfService
  • Oops! But even with that change, I still get the error above. I have updated and corrected the code in the original post.
  • Given your sample code and the corrected script (using df instead of Master), I do not see any error. When troubleshooting, break each piece down into its smallest parts. For example, set i <- 1 and print only is.na(df$FirstOfHCPCS.Code[i]), then print df$FirstOfService.Description[i], then print df$FirstOfService.Description[i] %in% c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68"). Make sure each of these returns what you expect before you try to run the whole block, and you will find your error.
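To illustrate the piece-by-piece check described in the last comment, here is a small sketch of how a misspelled column name can lead to "argument is of length zero". The two-row Master below is a made-up stand-in, not the asker's data, and this is only one plausible cause of the error.

```r
# Toy stand-in for Master (hypothetical data)
Master <- data.frame(
  FirstOfService.Description = c("Other", "Wraparound"),
  FirstOfHCPCS.Code = c(NA, 85),
  stringsAsFactors = FALSE
)
i <- 1

# Misspelled column: `$` on a data frame returns NULL for a name it lacks
bad <- Master$FirstOfSerivce.Description[i]
is.null(bad)

# NULL %in% x is logical(0), and combining it with & stays length zero,
# which is what if() complains about
cond_bad <- is.na(Master$FirstOfHCPCS.Code[i]) & bad %in% c("Other")
length(cond_bad)

# With the correct spelling the condition is a single TRUE or FALSE
cond_ok <- is.na(Master$FirstOfHCPCS.Code[i]) &
  Master$FirstOfService.Description[i] %in% c("Other")
cond_ok
```

Running names(Master) is a quick way to confirm the exact column spellings before looping over the data.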