【Question Title】: How do I turn a data frame into an edge list with two columns?
【Posted】: 2020-06-11 19:01:36
【Question】:

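I have a specific question about reshaping data into two columns so that I can build an edge list. I've attached a screenshot of the data. It goes up to V10, and each row lists the artists who created the same song. I want to create an edge list from the artists' names. For example, for a row containing persons A, B, C and D, I want to create: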

A B

A C

A D

B C

B D

C D

The code I'm currently using is:

reltest <- t(do.call(cbind, lapply(cleanartists[sapply(cleanartists, length) >= 2], combn, 2)))

But this gives me every possible combination of artist names, not just the pairs that actually share a song. This is what my data looks like:

 > head(cleanartists, n = 20)
                        V1                        V2              V3              V4   V5   V6   V7   V8   V9  V10
1             Bethel Music              Jenn Johnson            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2                Gal Costa            Caetano Veloso            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3                     JAYZ                Kanye West            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4                     2Pac                 Danny Boy            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5                 Ludacris                   Shawnna            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6         Richard Armitage            The Dwarf Cast            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7                 Ludacris                     TPain            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8   The Velvet Underground                  Lou Reed            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9     The Stanley Brothers  The Clinch Mountain Boys            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10      The Carter Sisters           Mother Maybelle            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
11               Lady Gaga              Colby ODonis            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
12                 Rihanna                      JAYZ            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
13              Lil Yachty              Trippie Redd            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
14              Brian Tuey            James McCawley  Kevin Sherwood  Treyarch Sound <NA> <NA> <NA> <NA> <NA> <NA>
15   Sister Rosetta Tharpe              The Rosettes            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
16             Bing Crosby       The Andrews Sisters            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
17            Stone Poneys            Linda Ronstadt            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
18                  J Cole                     Drake            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
19 The Last Shadow Puppets               Alex Turner      Miles Kane            <NA> <NA> <NA> <NA> <NA> <NA> <NA>
20               Gal Costa            Caetano Veloso            <NA>            <NA> <NA> <NA> <NA> <NA> <NA> <NA>

【Comments】:

Tags: r dataframe dplyr


【Solution 1】:

Sticking with base R functions, but adding magrittr (%>%) to make the code more readable, try this:

# add the pipe (%>%) operator
library(magrittr)

# tibble is used only to build a small example dataset easily
dtf <- tibble::tribble(
  ~V1, ~V2, ~V3, ~V4, ~V5,
  "A", "B", NA, NA, NA,
  "A", "B", "C", NA, NA,
  "D", "E", "F", NA, NA,
  "F", "G", NA, NA, NA
) %>% as.data.frame()


dtf %>% 
  apply(., 1, function(.x){   # for each row in the dataset
    .x[!is.na(.x)] %>%        # as char vector, remove the NA values
      combn(2) %>%            # make combinations of 2 of the elements 
      t() %>%                 # transpose the matrix output of combn
      as.data.frame()         # transform the matrix in a data frame
  }) %>% 
  do.call(rbind, .)           # row-bind the per-row data frames

You will get:

  V1 V2
1  A  B
2  A  B
3  A  C
4  B  C
5  D  E
6  D  F
7  E  F
8  F  G

This is equivalent to the following code:

# without '%>%' operator
do.call(rbind,apply(dtf, 1, function(.x){as.data.frame(t(combn(.x[!is.na(.x)],2)))}))
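One caveat: combn(x, 2) raises an error when a row has fewer than two non-NA entries. The rows shown in the question all have at least two artists, but if the full data contains single-artist rows, they need to be skipped. A minimal sketch of that, assuming a made-up data frame dtf2 and a hypothetical helper row_pairs() (neither is from the original answer):

```r
# Hypothetical example data: the third row has only one artist
dtf2 <- data.frame(
  V1 = c("A", "D", "G"),
  V2 = c("B", "E", NA),
  V3 = c("C", NA, NA),
  stringsAsFactors = FALSE
)

# row_pairs() is an illustrative helper, not part of any package
row_pairs <- function(x) {
  x <- x[!is.na(x)]                  # drop the NA padding
  if (length(x) < 2) return(NULL)    # no pair can be formed from this row
  as.data.frame(t(combn(x, 2)), stringsAsFactors = FALSE)
}

# Build the pairs row by row; unlist() turns a one-row data frame into a vector
pairs_list <- lapply(seq_len(nrow(dtf2)), function(i) {
  row_pairs(unlist(dtf2[i, ], use.names = FALSE))
})

# rbind() ignores the NULL entries produced by single-artist rows
edges <- do.call(rbind, pairs_list)
names(edges) <- c("from", "to")
rownames(edges) <- NULL
edges
```

Here the first row contributes three pairs, the second row one pair, and the third row none.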

【Comments】:

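    【Solution 2】:

    You can use apply to run your function on each row, keeping only the elements that are not NA. With the approach from here, you can then remove duplicate pairs.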

    test_data <- data.frame(V1 = c("A", "D"),
                            V2 = c("B", "B"),
                            V3 = c("C", NA),
                            V4 = c("D", NA),
                            stringsAsFactors = FALSE)
    
    combinations <- t(do.call("cbind", apply(test_data, 1, function(x) combn(x[!is.na(x)], 2))))
    
    library(dplyr)
    combinations_cleaned <- data.frame(combinations, stringsAsFactors = FALSE) %>%
      # order-independent key; the separator prevents e.g. "AB"+"C" colliding with "A"+"BC"
      mutate(key = paste(pmin(X1, X2), pmax(X1, X2), sep = "|")) %>%
      distinct(key, .keep_all = TRUE) %>% 
      select(-key)
    
    combinations_cleaned
      X1 X2
    1  A  B
    2  A  C
    3  A  D
    4  B  C
    5  B  D
    6  C  D
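The same de-duplication can also be sketched in base R without building a key column: put each pair in alphabetical order with pmin()/pmax(), then drop repeated rows with unique(). The edge list below is made up for illustration, and the column names from and to are assumptions rather than anything used in the answer above.

```r
# Hypothetical edge list containing a reversed duplicate: (D, B) vs (B, D)
edges <- data.frame(
  X1 = c("A", "B", "D", "A"),
  X2 = c("B", "D", "B", "B"),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so that (D, B) and (B, D) become identical
undirected <- data.frame(
  from = pmin(edges$X1, edges$X2),
  to   = pmax(edges$X1, edges$X2),
  stringsAsFactors = FALSE
)

# unique() on a data frame removes repeated rows, comparing all columns
edges_unique <- unique(undirected)
rownames(edges_unique) <- NULL
edges_unique
```

This treats the edge list as undirected. If the direction of a pair matters, skip the pmin()/pmax() step and call unique() on the original columns.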
    
    

    【Comments】:
