Mapping Data in R with missing values: Answers

【Question title】: Mapping Data in R with missing values
【Posted】: 2016-10-20 03:21:11
【Question description】:

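I'm new to R, and I'm trying to map data dictionary definitions onto a set of data to produce more readable text.

For example, based on the data dictionary of the Ames, Iowa housing dataset currently on Kaggle, I'm trying to map the zoning classification of the houses.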

mapping <- list(
  'A'='Agriculture',
  'C (all)'='Commercial',
  'FV'='Floating Village Residential',
  'I'='Industrial',
  'RH'='Residential High Density',
  'RL'='Residential Low Density',
  'RP'='Residential Low Density Park',
  'RM'='Residential Medium Density'
)

housingData$MSZoning <- as.factor(as.character(mapping[origData$MSZoning]))

However, the original dataset does not contain values for all of these data points.

> table(origData$MSZoning)

C (all)      FV      RH      RL      RM 
     10      65      16    1151     218

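After mapping with my code, the key-value pairs do not line up. (For example, Agriculture is mapped to "C (all)".) I believe the null values in the source data are affecting my mapping.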

> table(housingData$MSZoning, origData$MSZoning)

                               C (all)   FV   RH   RL   RM
  Agriculture                       10    0    0    0    0
  Commercial                         0   65    0    0    0
  Floating Village Residential       0    0   16    0    0
  Industrial                         0    0    0 1151    0
  Residential High Density           0    0    0    0  218

What is a more appropriate way to make sure these keys and values line up correctly?
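
A likely cause of the misalignment, sketched here with toy data rather than the Kaggle file: if `MSZoning` was read in as a factor, then `mapping[origData$MSZoning]` indexes the list by the factor's integer codes (positions 1, 2, 3, ...) instead of by its labels, so the first level present picks up the first list entry. The `zoning` vector below is a hypothetical stand-in for `origData$MSZoning`, and whether the real column is a factor is an assumption.

```r
# Toy stand-in for origData$MSZoning (hypothetical values, not the real data)
zoning <- factor(c("RL", "C (all)", "FV", "RL", "RM"))

mapping <- list(
  "A"       = "Agriculture",
  "C (all)" = "Commercial",
  "FV"      = "Floating Village Residential",
  "I"       = "Industrial",
  "RH"      = "Residential High Density",
  "RL"      = "Residential Low Density",
  "RP"      = "Residential Low Density Park",
  "RM"      = "Residential Medium Density"
)

# A factor used as an index is treated as its integer codes, so this
# selects list elements by position, not by name
by_position <- as.character(mapping[zoning])

# Converting to character first makes `[` look the elements up by name
by_name <- as.character(mapping[as.character(zoning)])

# Side-by-side comparison of the two lookups
data.frame(code = zoning, by_position, by_name)
```

In this sketch the positional lookup pairs "C (all)" with "Agriculture", the same shift seen in the cross-tab above, while the lookup by name pairs it with "Commercial".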

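【Question discussion】:

Wow, 4.5 years and this is your first question? That's impressive ... Seriously, though, perhaps recode would work for you? Also, while they certainly have some advantages, is there a specific reason you're using factors?
Thanks, r2evans. With your help, I was able to answer the question. As for the factors, I copied the code from an "Excellent, Good, Fair, Poor" series. I agree that this implementation is probably not the most appropriate use of factors.
(In addition to answering your own question, you should "accept" it, unless you're waiting for someone else to provide an answer.)
Will do. At this point it says I can't accept my own answer until two days have passed, but I'll be sure to check back and do so.

Tags: r data-dictionary data-mapping

【Solution 1】:

Using the recode command, I was able to get this code working.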

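library(car)  # provides recode(); note that dplyr also exports a recode()

# Assumes housingData$MSZoning still holds the original zoning codes
# (for example, housingData is a fresh copy of origData)
housingData$MSZoning <- recode(housingData$MSZoning,
  "'A'='Agriculture';
  'C (all)'='Commercial';
  'FV'='Floating Village Residential';
  'I'='Industrial';
  'RH'='Residential High Density';
  'RL'='Residential Low Density';
  'RP'='Residential Low Density Park';
  'RM'='Residential Medium Density'"
)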

Now, running the cross-tabulation, I can see that the values are mapped correctly.

> table(housingData$MSZoning, origData$MSZoning)

                               C (all)   FV   RH   RL   RM
  Commercial                        10    0    0    0    0
  Floating Village Residential       0   65    0    0    0
  Residential High Density           0    0   16    0    0
  Residential Low Density            0    0    0 1151    0
  Residential Medium Density         0    0    0    0  218
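
For comparison, a base R sketch of the same recoding without `car`, assuming the column may be a factor: keep the dictionary as a named character vector and index it with `as.character()` so the lookup goes by name rather than by position. The names `zoning_labels`, `codes`, and `recoded` are hypothetical, and the toy `codes` vector stands in for `origData$MSZoning`.

```r
# Data dictionary as a named character vector: names are codes, values are labels
zoning_labels <- c(
  "A"       = "Agriculture",
  "C (all)" = "Commercial",
  "FV"      = "Floating Village Residential",
  "I"       = "Industrial",
  "RH"      = "Residential High Density",
  "RL"      = "Residential Low Density",
  "RP"      = "Residential Low Density Park",
  "RM"      = "Residential Medium Density"
)

# Toy stand-in for origData$MSZoning (hypothetical values, not the real data)
codes <- factor(c("RL", "C (all)", "FV", "RH", "RM", "RL"))

# as.character() forces lookup by name; unname() drops the code names
labels <- unname(zoning_labels[as.character(codes)])

# Build a factor in dictionary order, then drop labels that never occur
recoded <- droplevels(factor(labels, levels = unname(zoning_labels)))

# Cross-tabulate recoded labels against the original codes
table(recoded, codes)
```

With this approach, a code that is missing from the dictionary comes back as `NA`, which makes unmapped values easy to spot.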

【Discussion】: