【Question title】: Keep one column after cbind of two data frames
【Posted】: 2021-04-03 14:50:13
【Description】:

Here are my two data frames:

dput(head(C1_com))
structure(list(Term = c("GO:0030198", "GO:0043062", "GO:0001944", 
"GO:0072358", "GO:0001568", "GO:0048514"), LogP = c(-17.4296193682, 
-16.3090192653, -17.0759726333, -17.0759726333, -15.9170353092, 
-14.7864136301)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
dput(head(C2_com))
structure(list(Term = c("GO:0030198", "GO:0043062", "GO:0030335", 
"GO:0040017", "GO:0051272", "GO:2000147"), LogP = c(-11.3445846204, 
-10.5074739613, -10.1220888832, -9.9838733854, -9.5214690772, 
-9.3731567195)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to keep only one copy of the common column after cbind, which currently gives me this:

 head(C1_C2)
        Term      LogP       Term       LogP
1 GO:0030198 -17.42962 GO:0030198 -11.344585
2 GO:0043062 -16.30902 GO:0043062 -10.507474
3 GO:0001944 -17.07597 GO:0030335 -10.122089
4 GO:0072358 -17.07597 GO:0040017  -9.983873
5 GO:0001568 -15.91704 GO:0051272  -9.521469
6 GO:0048514 -14.78641 GO:2000147  -9.373157

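I would like to keep only one Term column for the common terms. I could run cbind and then drop the extra Term columns, keeping only the first one, but that is a long process. Is there something I can use together with cbind that keeps only one Term column?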

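Update

Both of my starting data frames have the same column names. Is there a way to label the columns when doing the cbind, so that I know the first two come from C1_com and the 3rd and 4th come from C2_com?

Here is my final output: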

dput(head(C1_C2))
structure(list(Term = c("GO:0042330", "GO:0006935", "GO:0098609", 
"GO:0001655", "GO:0072001", "GO:0001822"), LogP = c(-15.5665740868, 
-15.3333915705, -15.1730394873, -14.2710870407, -13.0316539848, 
-11.7720012424), Term = c("GO:0006935", "GO:0042330", "GO:0098609", 
"GO:0030155", "GO:0045785", "GO:0048589"), LogP = c(-9.1846695955, 
-9.0333614068, -8.2012718158, -6.9630841551, -3.1110110087, -5.6023202524
), Term = c("GO:0098609", "GO:0030155", "GO:0045785", "GO:0002009", 
"GO:0048729", "GO:0060562"), LogP = c(-8.400270409, -5.1046710312, 
-2.2877603428, -5.0328708902, -4.8403582471, -3.367532764), Term = c("GO:0048589", 
"GO:0042330", "GO:0006935", "GO:0048729", "GO:0001655", "GO:0002009"
), LogP = c(-12.0251459649, -7.4342736812, -7.2221883529, -11.3806941521, 
-10.2926537215, -9.6593776685), Term = c("GO:0006935", "GO:0042330", 
"GO:0048729", "GO:0002009", "GO:0060562", "GO:0072073"), LogP = c(-7.1913732375, 
-7.1140368886, -7.668196714, -4.6060571139, -3.1414409878, -2.5797852608
), Term = c("GO:0006935", "GO:0042330", "GO:0098609", "GO:0030155", 
"GO:0045785", "GO:0048589"), LogP = c(-10.6304171879, -10.5285058082, 
-8.2142677691, -7.8757600983, -6.1772502878, -7.4503144922)), row.names = c(NA, 
6L), class = "data.frame")

I want to keep only the first Term column

head(C1_C2)
        Term      LogP       Term      LogP       Term      LogP       Term       LogP       Term      LogP       Term
1 GO:0042330 -15.56657 GO:0006935 -9.184670 GO:0098609 -8.400270 GO:0048589 -12.025146 GO:0006935 -7.191373 GO:0006935
2 GO:0006935 -15.33339 GO:0042330 -9.033361 GO:0030155 -5.104671 GO:0042330  -7.434274 GO:0042330 -7.114037 GO:0042330
3 GO:0098609 -15.17304 GO:0098609 -8.201272 GO:0045785 -2.287760 GO:0006935  -7.222188 GO:0048729 -7.668197 GO:0098609
4 GO:0001655 -14.27109 GO:0030155 -6.963084 GO:0002009 -5.032871 GO:0048729 -11.380694 GO:0002009 -4.606057 GO:0030155
5 GO:0072001 -13.03165 GO:0045785 -3.111011 GO:0048729 -4.840358 GO:0001655 -10.292654 GO:0060562 -3.141441 GO:0045785
6 GO:0001822 -11.77200 GO:0048589 -5.602320 GO:0060562 -3.367533 GO:0002009  -9.659378 GO:0072073 -2.579785 GO:0048589
        LogP
1 -10.630417
2 -10.528506
3  -8.214268
4  -7.875760
5  -6.177250
6  -7.450314

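and remove the rest of the Term columns, because they all hold the same terms but with different p-values, which come from different comparisons. My goal is to see how the enrichment of each term changes across comparisons, reported here as p-values.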

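【Comments】:

  • The Term column has different values in each data frame. Which one do you want to keep?
  • Yes, I realized that later. What I did was take the intersection and keep only the rows common to both, and then I need to keep both p-value columns.

Tags: r dataframe cbind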


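【Solution 1】:

If you use left_join (from dplyr), you will keep only one copy of the Term column, i.e. new_df <- left_join(C1_com, C2_com, by = "Term"). Is this what you want? Of course, if the Term columns are not actually identical, you will get some odd results: Terms in C1_com with no match in C2_com get NA in the second LogP column, and Terms that appear only in C2_com are dropped.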
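A minimal sketch of the join suggested above, assuming the dplyr package is installed. The data are the six rows from the question's dput(head(...)) output. The suffix argument and the labels "_C1" and "_C2" are illustrative additions that were not part of the original answer; they are one way to mark which data frame each LogP column came from.

```r
library(dplyr)

# Sample rows copied from the question's dput(head(...)) output.
C1_com <- data.frame(
  Term = c("GO:0030198", "GO:0043062", "GO:0001944",
           "GO:0072358", "GO:0001568", "GO:0048514"),
  LogP = c(-17.4296193682, -16.3090192653, -17.0759726333,
           -17.0759726333, -15.9170353092, -14.7864136301),
  stringsAsFactors = FALSE
)
C2_com <- data.frame(
  Term = c("GO:0030198", "GO:0043062", "GO:0030335",
           "GO:0040017", "GO:0051272", "GO:2000147"),
  LogP = c(-11.3445846204, -10.5074739613, -10.1220888832,
           -9.9838733854, -9.5214690772, -9.3731567195),
  stringsAsFactors = FALSE
)

# Join on Term so only one Term column remains. The suffix labels
# (an assumption, chosen for illustration) are appended to the
# clashing LogP columns so their source stays visible.
new_df <- left_join(C1_com, C2_com, by = "Term", suffix = c("_C1", "_C2"))

# Every row of C1_com is kept; Terms missing from C2_com get NA in LogP_C2.
print(new_df)
```

With these sample rows only the first two Terms occur in both data frames, so the remaining four rows carry NA in LogP_C2.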

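【Comments】:

  • "Of course, if the Term columns are not actually identical, you will get some odd results." My goal is to keep the Term column the same and see how the p-value for each term differs, i.e. the numeric columns.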
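The intersection approach described in the comments can be sketched in base R, without extra packages. This is only one possible way to do it, not the asker's actual code: the list name comparisons and the labels C1 and C2 are hypothetical, and the same pattern would extend to more data frames by adding entries to the list.

```r
# Sample rows copied from the question's dput(head(...)) output.
C1_com <- data.frame(
  Term = c("GO:0030198", "GO:0043062", "GO:0001944",
           "GO:0072358", "GO:0001568", "GO:0048514"),
  LogP = c(-17.4296193682, -16.3090192653, -17.0759726333,
           -17.0759726333, -15.9170353092, -14.7864136301),
  stringsAsFactors = FALSE
)
C2_com <- data.frame(
  Term = c("GO:0030198", "GO:0043062", "GO:0030335",
           "GO:0040017", "GO:0051272", "GO:2000147"),
  LogP = c(-11.3445846204, -10.5074739613, -10.1220888832,
           -9.9838733854, -9.5214690772, -9.3731567195),
  stringsAsFactors = FALSE
)

# Hypothetical named list; the names become the column labels.
comparisons <- list(C1 = C1_com, C2 = C2_com)

# Rename LogP to LogP_<name> so each column records its source.
labelled <- Map(function(df, nm) {
  names(df)[names(df) == "LogP"] <- paste0("LogP_", nm)
  df
}, comparisons, names(comparisons))

# merge() defaults to an inner join, so chaining it with Reduce()
# keeps only the Terms present in every data frame.
combined <- Reduce(function(x, y) merge(x, y, by = "Term"), labelled)
print(combined)
```

Because the columns are renamed before merging, the result has a single Term column followed by one labelled LogP column per comparison, which makes the per-term p-values easy to compare side by side.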