Separate an element of a data frame and split into two columns in alphabetical manner

[Question title]: Separate an element of a data frame and split into two columns in alphabetical manner
[Posted]: 2022-12-02 00:47:44
[Question description]:

I have this data frame:

> d
      gene_pair
1   ABHD4_ABHD5
2     ABL1_ABL2
3       ABR_BCR
4   ACAP2_ACAP3
5  ACTX_ACTR1B
6 ACVR2A_ACVR2B

This is the dput:

> dput(d)
structure(list(gene_pair = c("ABHD4_ABHD5", "ABL1_ABL2", "ABR_BCR", 
"ACAP2_ACAP3", "ACTX_ACTR1B", "ACVR2A_ACVR2B")), row.names = c(NA, 
6L), class = "data.frame")

I would like to create a new column called sorted_gene_pair, in which the two genes of each pair are in alphabetical order.

I have tried:

d %>%
  rowwise() %>% 
  mutate(paste(sort(strsplit(gene_pair, '_')), collapse = '_'))

But I get an error saying that 'x' must be atomic.

Expected outcome of the sorted_gene_pair column:

> d
    sorted_gene_pair
1   ABHD4_ABHD5
2     ABL1_ABL2
3       ABR_BCR
4   ACAP2_ACAP3
5  ACTR1B_ACTX
6 ACVR2A_ACVR2B

[Question discussion]:

Tags: r tidyverse

[Solution 1]:

strsplit() returns a list, and sort() needs an atomic vector, so you'll need to unlist() the result before sorting:

library(dplyr)

d |>
  rowwise() |>
  mutate(sorted_gene_pair = paste(sort(unlist(strsplit(gene_pair, '_'))), collapse = '_')) |>
  ungroup()

Output:

# A tibble: 6 × 2
  gene_pair     sorted_gene_pair
  <chr>         <chr>        
1 ABHD4_ABHD5   ABHD4_ABHD5  
2 ABL1_ABL2     ABL1_ABL2    
3 ABR_BCR       ABR_BCR      
4 ACAP2_ACAP3   ACAP2_ACAP3  
5 ACTX_ACTR1B   ACTR1B_ACTX  
6 ACVR2A_ACVR2B ACVR2A_ACVR2B
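
As a possible alternative, the same result can be sketched in base R without rowwise(). This is only an illustration, not part of the original answer: it relies on strsplit() returning a list with one character vector per row, and the names parts and genes are made up for the example.

```r
# Same data as in the question
d <- data.frame(
  gene_pair = c("ABHD4_ABHD5", "ABL1_ABL2", "ABR_BCR",
                "ACAP2_ACAP3", "ACTX_ACTR1B", "ACVR2A_ACVR2B")
)

# strsplit() is vectorised: it returns a list with one character
# vector per element of gene_pair (this is why sort() fails on it directly)
parts <- strsplit(d$gene_pair, "_", fixed = TRUE)

# Sort each vector on its own and paste it back together;
# vapply() checks that every result is a single string
d$sorted_gene_pair <- vapply(
  parts,
  function(genes) paste(sort(genes), collapse = "_"),
  character(1)
)

d
```

Because each list element is already an atomic character vector, no unlist() is needed here. Note that sort() on character data follows the collation rules of the current locale, which may matter if gene names mix upper and lower case.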

[Discussion]:

Thanks!