【Question title】: How to make the rows of a data.frame into individual unique data.frames
【Posted】: 2019-03-01 19:13:24
【Question】:

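I have a data.frame with many rows, each holding a unique ID followed by an amino acid sequence. Is there a way to split the rows into separate data.frames, one per ID?

Here is an example: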

bigdf

>ENSCAFP00000018847.4  
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.3  
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.2  
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.1  
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

It would be great if each new data.frame could be named after its ID. I am hoping for a result that looks like this:

ENSCAFP00000018847.4

>ENSCAFP00000018847.4  
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.3

>ENSCAFP00000018847.3  
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.2

>ENSCAFP00000018847.2  
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.1

>ENSCAFP00000018847.1  
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

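I know this is probably a strange thing to do, but I need to do it for thousands of different amino acid sequences, so it would be great to find a way to split them all in R.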

dput(df[1:3, c(1)])
c("> ENSCAFP00000018847.4 MFFINIISLIIPILLAVAFLTLVERKVLGYMQLRKGPNIVGPYGLLQPIADAVKLFTKEPLRPLTSSMSMFILAPILALSLALTMWIPLPMPYPLINMNLGVLFMLAMSSLAVYSILWSGWASNSKYALIGALRAVAQTISYEVTLAIILLSVLLMNGSFTLSTLIITQEHMWLIFPAWPLAMMWFISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFALFFLAEYANIIMMNILTTILFFGAFHNPFMPELYSINFTMKTLLLTICFLWIRASYPRFRYDQLMHLLWKNFLPLTLALCMWHVALPIITASIPPQT", 
"> ENSCAFP00000018847.3 MKPPILIIIMATIMTGTMIVMLSSHWLLIWIGFEMNMLAIIPILMKKYNPRAMEASTKYFLTQATASMLLMMGVTINLLYSGQWVISKISNPIASIMMTTALTMKLGLSPFHFWVPEVTQGITLMSGMILLTWQKIAPMSILYQISPSINTNLLMLMALTSVLVGGWGGLNQTQLRKIMAYSSIAHMGWMAAIITYNPTMMVLNLTLYILMTLSTFMLFMLNSSTTTLSLSHMWNKFPLITSMILILMLSLGGLPPLSGFIPKWMIIQELTKNNMIIIPTLMAITALLNLYFYLRLTYSTALTMFPSTNNMKMKWQFEYTKKATLLPPLIITSTMLLPLTPMLSVLD", 
"> ENSCAFP00000018847.2 MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFAHWFPLFSGYTLNDTWAKIHFTIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTTWNTVSSMGSFISLTAVMLMIFMIWEAFASKREVAMVELTTTNIEWLHGCPPPYHTFEEPTYVIQK"
)

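【Question comments】:

  • Does bigdf have only one column? Are the two lines after the ID one continuous string or two strings?
  • That's right, it has only one column. It is one continuous string.
  • Would you object to keeping them all in a single data frame with an ID column and a sequence column? That is far tidier than loading many separate data frames.
  • Can you share a few example rows with dput? The better approach is the one @Lyngbakr suggested.
  • Your dput only shares one column. Are there other columns we need to worry about, or do you just want to turn this one column vector into multiple data frames?

Tags: r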
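The single-table layout suggested in the comments could look like the sketch below. It assumes each element is one continuous string of the form `> ID SEQUENCE`, as in the `dput` output. The sequences are shortened placeholders, and the names `raw`, `body`, `tidy`, `id`, and `sequence` are illustrative, not from the question.

```r
# Shortened placeholder strings in the "> ID SEQUENCE" shape of the dput output
raw <- c("> ENSCAFP00000018847.4 MFFINIISLI",
         "> ENSCAFP00000018847.3 MKPPILIIIM",
         "> ENSCAFP00000018847.2 MFINRWLFST")

# Drop the leading ">" and any whitespace that follows it
body <- sub("^>\\s*", "", raw)

# Everything up to the first whitespace is the ID; the rest is the sequence
tidy <- data.frame(
  id       = sub("\\s.*$", "", body),
  sequence = sub("^\\S+\\s*", "", body),
  stringsAsFactors = FALSE
)

# Look up a single sequence by its ID
tidy$sequence[tidy$id == "ENSCAFP00000018847.3"]
```

With this layout, one sequence is a row filter away, and no extra objects are created in the workspace.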


【Solution 1】:

You can put all the rows into a named list of data frames and then use list2env() to place them in the global environment, like this:

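library(stringr)
# One single-row data frame per row of bigdf
dfs <- apply(bigdf, MARGIN = 1, as.data.frame)
# Name each one by its ID: the first run of characters that is neither ">" nor whitespace
names(dfs) <- str_extract(bigdf[, 1], "[^>\\s]+")
list2env(dfs, envir = .GlobalEnv)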
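For thousands of sequences, keeping the named list and indexing it by ID may be easier to manage than thousands of global objects. Below is a minimal base R sketch, assuming a one-column `bigdf` whose strings look like the `dput` output. The column name `V1` and the shortened sequences are placeholders.

```r
# Placeholder one-column data frame in the "> ID SEQUENCE" shape
bigdf <- data.frame(
  V1 = c("> ENSCAFP00000018847.4 MFFINIISLI",
         "> ENSCAFP00000018847.3 MKPPILIIIM"),
  stringsAsFactors = FALSE
)

# ID = the text between the leading ">" and the next whitespace
ids <- sub("^>\\s*(\\S+).*$", "\\1", bigdf[[1]])

# split() returns one single-row data frame per row; name each by its ID
dfs <- setNames(split(bigdf, seq_len(nrow(bigdf))), ids)

names(dfs)
dfs[["ENSCAFP00000018847.3"]]
```

If separate objects are still wanted, `list2env(dfs, envir = .GlobalEnv)` can be applied to this list afterwards. Because the IDs contain a dot followed by digits, they are valid R names only when accessed as shown or with backticks.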

【Comments】:

    【Solution 2】:

    You can use the apply function across the rows together with as.data.frame:

    mydfs <- apply(df, 1, as.data.frame)
    

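    mydfs will be a list with one data frame per row. Note that the values will be coerced: apply converts the data frame to a matrix first, so every value in a row ends up with a common type.

    【Comments】: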
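The coercion only matters when columns differ in type. The sketch below uses a toy two-column data frame (illustrative, not the asker's data) and compares `apply` with `split`, a base R alternative swapped in here that subsets the data frame directly.

```r
# Toy data frame with mixed column types (illustrative only)
df <- data.frame(id = c("a", "b"), n = c(1.5, 2), stringsAsFactors = FALSE)

# apply() goes through as.matrix(), so the numeric column is no longer numeric
via_apply <- apply(df, 1, as.data.frame)

# split() subsets the data frame by row, so each column keeps its type
via_split <- split(df, seq_len(nrow(df)))

is.numeric(via_apply[[1]][[1]])   # FALSE
class(via_split[[1]]$n)           # "numeric"
```

For the one-column character data in the question both approaches return the same strings, so the difference is only relevant if more columns are added later.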
