【Question title】: How to extract the list names and values to a dataframe
【Posted】: 2018-10-30 08:41:57
【Question description】:

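I am working with the JSON training file from Kaggle's https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data

to analyze the features and data, and to apply other algorithms to check whether I can improve accuracy.

For example, I have a column named features.

Example: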

    l <- structure(list(
      `4` = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
              "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
      `6` = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
              "Hardwood Floors", "No Fee"),
      `9` = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
              "Dishwasher", "Hardwood Floors"),
      `10` = list(),
      `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
    ), .Names = c("4", "6", "9", "10", "15"))

I want to build a data frame that looks like this:

name     nested list
4        <list = list(c("Dining Room", "Pre-War", "Laundry in Building", 
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"))>
6        <list = list(c("Doorman", "Elevator", "Laundry in Building", "Dishwasher", "Hardwood Floors", "No Fee"))>
9        <list = list(c("Doorman", "Elevator", 
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"))>  
10       <list = list(c())>
15       <list = list(c("Doorman", "Elevator", "Fitness Center", 
"Laundry in Building"))>

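Please advise how to do this.

I am a bit confused about how to convert it.

My end goal is to build a data frame with the union of all these features, in which each of 4, 6, 10, 15, ... gets its own 1s and 0s depending on whether it has each feature, i.e. a one-hot encoding.

Please advise.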

【Question discussion】:

    Tags: r json dplyr tidyverse tidyr


    【Solution 1】:

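    One approach is to use the data.table::rbindlist() function, which has a fill = TRUE argument. This lets you bind data frames that have different numbers of columns. In your case, however, the trick is to make the empty data frame show up as well. To achieve that, we add an if statement that creates an NA data frame for empty list elements, i.e.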

    library(data.table)
    # One row per list element; an empty element becomes a single NA column
    to_row <- function(i) {d <- as.data.frame(t(i)); if (!ncol(d)) d <- data.frame(V1 = NA); d}
    rbindlist(lapply(l, to_row), fill = TRUE)
    

    which gives,

                V1       V2                  V3                  V4              V5              V6           V7 
    1: Dining Room  Pre-War Laundry in Building          Dishwasher Hardwood Floors    Dogs Allowed Cats Allowed 
    2:     Doorman Elevator Laundry in Building          Dishwasher Hardwood Floors          No Fee         <NA> 
    3:     Doorman Elevator Laundry in Building     Laundry in Unit      Dishwasher Hardwood Floors         <NA> 
    4:        <NA>     <NA>                <NA>                <NA>            <NA>            <NA>         <NA> 
    5:     Doorman Elevator      Fitness Center Laundry in Building            <NA>            <NA>         <NA> 
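
For the name / nested-list layout requested in the question, a minimal base R sketch could look like the following. It assumes `l` is the list from the question (redefined here so the snippet runs on its own). The column name `features` is an illustrative choice, not something from the original post, and empty elements are normalised to `character(0)` so that every cell holds a character vector.

```r
# The list from the question, redefined so the snippet is self-contained
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# One row per list element, keyed by the element's name
df <- data.frame(name = names(l), stringsAsFactors = FALSE)

# Attach the values as a list column ("features" is a hypothetical name);
# unlist() turns the empty list() into NULL, which as.character() maps to character(0)
df$features <- lapply(unname(l), function(x) as.character(unlist(x)))

str(df)
```

Each cell of `df$features` is then a character vector, and `df$features[[4]]` (the entry for `10`) has length zero.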
    

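    【Discussion】:

    • OK, thank you very much, but how can I turn this into dummy variables?
    • What is the expected output for the above? FYI, take a look at model.matrix() or library(dummies)
    • I would like to get 0s and 1s alongside the other columns V1...V7, showing whether each row does or does not have each feature in the union of all features.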
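
For the 0/1 columns asked about in the comments, one possible base R sketch (not the answerer's method) is to take the union of all features and test membership for each list element. `l` is redefined from the question so the snippet runs on its own; `all_features`, `one_hot`, and `one_hot_df` are illustrative names.

```r
# The list from the question, redefined so the snippet is self-contained
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# Union of every feature that appears anywhere in the list
all_features <- sort(unique(unlist(l)))

# For each element, flag which of the union's features it contains.
# vapply() returns features x elements, so transpose to elements x features.
one_hot <- t(vapply(
  l,
  function(x) as.integer(all_features %in% unlist(x)),
  integer(length(all_features))
))
colnames(one_hot) <- all_features

# check.names = FALSE keeps feature names with spaces intact
one_hot_df <- data.frame(name = names(l), one_hot, check.names = FALSE)
print(one_hot_df)
```

Each row corresponds to one list element and each column to one feature. The empty element `10` comes out as a row of zeros.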