[Posted]: 2018-10-30 08:41:57
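[Question]:
I am working with the json training file from Kaggle's https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data
to analyze the features and data, and to try other algorithms to see whether I can improve accuracy.
For example, I have a column named features.
Sample: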
l <- structure(list(`4` = c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"
), `6` = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
"Hardwood Floors", "No Fee"), `9` = c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"
), `10` = list(), `15` = c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building")), .Names = c("4", "6", "9", "10", "15"
))
I want to build a data frame that looks like this:
name nested list
4 <list = list(c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"))>
6 <list = list(c("Doorman", "Elevator", "Laundry in Building", "Dishwasher", "Hardwood Floors", "No Fee"))>
9 <list = list(c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"))>
10 <list = list(c())>
15 <list = list(c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building"))>
Please advise how to do this.
I am a bit confused about how to convert it.
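For reference, a minimal base R sketch of the shape I am after, assuming a list column is acceptable. The column name `features` is my own choice, and converting the empty `list()` for id 10 to `character(0)` is an assumption about how empty entries should be stored:

```r
# Sample data from the question, rebuilt by hand with the same values.
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# One row per listing id, taken from the names of the list.
df <- data.frame(name = names(l), stringsAsFactors = FALSE)

# as.character() turns the empty list() for id 10 into character(0),
# so every cell of the list column holds a character vector.
df$features <- unname(lapply(l, as.character))

str(df)
```

Each row then carries its whole feature vector in one cell, and `df$features[[1]]` returns the features of listing 4.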
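My end goal is to build a data frame whose columns are the union of all these features, where each of 4, 6, 10, 15 ... gets its own 1s and 0s depending on whether it has each feature, i.e. a one-hot encoding.
Please advise.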
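To make the target concrete, here is a rough base R sketch of the one-hot encoding described above. It is one possible approach under my own assumptions (alphabetically sorted feature columns, an all-zero row for a listing with no features), not necessarily the best one for the full Kaggle file; the names `all_features` and `onehot` are made up for illustration:

```r
# Sample data from the question, rebuilt by hand with the same values.
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# Union of every feature seen in any listing.
all_features <- sort(unique(unlist(l)))

# For each listing, mark which of the features it has (1) or lacks (0).
# vapply() returns a features-by-listings matrix, so transpose it.
onehot <- t(vapply(
  l,
  function(x) as.integer(all_features %in% unlist(x)),
  integer(length(all_features))
))
colnames(onehot) <- all_features

# Rows are listing ids, columns are features.
onehot_df <- as.data.frame(onehot)
print(onehot_df)
```

Listing 10 has no features, so its row is all zeros.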
[Discussion]:
Tags: r json dplyr tidyverse tidyr