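【Posted】: 2020-12-16 17:45:40
【Question】:
I have a txt file containing target prediction information that I want to parse into a data frame in R. The information in the file is already laid out as simply as possible. Each line should become one row of the data frame, which has only 4 columns and should look like this: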
MicroRNA Transcript Type Energy
miR-981|LQNS02278082.1_33127_3p TRINITY_GG_20135_c0_g1_i5.mrna1 7_A1 -0.70
However, what I have tried in R does not work.
library(readr)  # provides read_lines()
a <- read_lines("results")
> head(a)
[1] "MicroRNA = miR-981|LQNS02278082.1_33127_3p\t\tTranscript = TRINITY_GG_20135_c0_g1_i5.mrna1 Dir=antisense TAG=Neuronal acetylcholine receptor subunit alpha-9\t\tType = 7_A1\t\tEnergy = -0.70 Kcal/mol"
[2] "MicroRNA = miR-981|LQNS02278082.1_33127_3p\t\tTranscript = TRINITY_GG_20135_c0_g1_i5.mrna1 Dir=antisense TAG=Neuronal acetylcholine receptor subunit alpha-9\t\tType = 7_A1\t\tEnergy = -5.77 Kcal/mol"
[3] "MicroRNA = LQNS02278125.1_38470_3p\t\tTranscript = TRINITY_GG_22182_c1_g1_i2.mrna1 Dir=antisense TAG=Uncharacterized protein\t\tType = 7_A1\t\tEnergy = -1.77 Kcal/mol"
[4] "MicroRNA = LQNS02278125.1_38470_3p\t\tTranscript = TRINITY_GG_22182_c1_g1_i2.mrna1 Dir=antisense TAG=Uncharacterized protein\t\tType = 7_A1\t\tEnergy = -5.20 Kcal/mol"
[5] "MicroRNA = LQNS02278075.1_32377_3p\t\tTranscript = TRINITY_GG_143691_c0_g1_i3.mrna1 Dir=sense TAG=Acidic phospholipase A2 PA4\t\tType = 7_A1\t\tEnergy = -3.30 Kcal/mol"
[6] "MicroRNA = miR-317|LQNS02000228.1_2413_3p\t\tTranscript = TRINITY_GG_4592_c2_g1_i10.mrna1 Dir=sense TAG=Serine/threonine-protein phosphatase 2A regulatory subunit B'' subunit gamma\t\tType = 7_m8\t\tEnergy = -6.35 Kcal/mol"
dput(head(a,4))
c("MicroRNA = miR-981|LQNS02278082.1_33127_3p\t\tTranscript = TRINITY_GG_20135_c0_g1_i5.mrna1 Dir=antisense TAG=Neuronal acetylcholine receptor subunit alpha-9\t\tType = 7_A1\t\tEnergy = -0.70 Kcal/mol",
"MicroRNA = miR-981|LQNS02278082.1_33127_3p\t\tTranscript = TRINITY_GG_20135_c0_g1_i5.mrna1 Dir=antisense TAG=Neuronal acetylcholine receptor subunit alpha-9\t\tType = 7_A1\t\tEnergy = -5.77 Kcal/mol",
"MicroRNA = LQNS02278125.1_38470_3p\t\tTranscript = TRINITY_GG_22182_c1_g1_i2.mrna1 Dir=antisense TAG=Uncharacterized protein\t\tType = 7_A1\t\tEnergy = -1.77 Kcal/mol",
"MicroRNA = LQNS02278125.1_38470_3p\t\tTranscript = TRINITY_GG_22182_c1_g1_i2.mrna1 Dir=antisense TAG=Uncharacterized protein\t\tType = 7_A1\t\tEnergy = -5.20 Kcal/mol"
)
re <- rex(
capture(name = "MicroRNA", alpha),
"[",
spaces,
capture(name = "Transcript", alpha),
"[",
spaces,
capture(name = "Type", alpha),
"[",
spaces,
capture(name = "Energy", digits),
"]:")
re_matches(a, re)
MicroRNA Transcript Type Energy
1 <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA>
Any idea how to do this in R or in the shell? Thanks!
【Comments】:
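For reference, the desired four-column table could be produced roughly as follows. This is only a minimal sketch in base R, not a tested solution for the full file. It assumes every line carries the four fields in the same order, separated by tabs, and that the wanted value is the first whitespace-delimited token after each `= `. The names `parse_predictions`, `pattern`, `example` and `df` are illustrative and do not come from the original post.

```r
# Hypothetical helper: parse "Key = value" lines into a data frame.
# Assumes the fields appear in the order MicroRNA, Transcript, Type, Energy
# and are separated by one or more tab characters.
parse_predictions <- function(lines) {
  pattern <- paste0(
    "^MicroRNA = (\\S+)\\t+",
    "Transcript = (\\S+)[^\\t]*\\t+",  # skip the trailing Dir=... TAG=... text
    "Type = (\\S+)\\t+",
    "Energy = (-?[0-9.]+)"             # numeric part only, drops " Kcal/mol"
  )
  m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  # Keep only lines that matched: full match plus four capture groups.
  m <- m[lengths(m) == 5]
  if (length(m) == 0) {
    stop("no line matched the expected layout")
  }
  # Bind matches into a matrix and drop the full-match column.
  fields <- do.call(rbind, m)[, -1, drop = FALSE]
  data.frame(
    MicroRNA   = fields[, 1],
    Transcript = fields[, 2],
    Type       = fields[, 3],
    Energy     = as.numeric(fields[, 4]),
    stringsAsFactors = FALSE
  )
}

# Usage with the first line shown in the question.
example <- paste0(
  "MicroRNA = miR-981|LQNS02278082.1_33127_3p\t\t",
  "Transcript = TRINITY_GG_20135_c0_g1_i5.mrna1 Dir=antisense ",
  "TAG=Neuronal acetylcholine receptor subunit alpha-9\t\t",
  "Type = 7_A1\t\tEnergy = -0.70 Kcal/mol"
)
df <- parse_predictions(example)
print(df)
```

Lines that do not fit the assumed layout are silently dropped in this sketch, so it may be worth comparing `nrow(df)` with `length(a)` after parsing the real file.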