[Posted]: 2020-03-21 03:03:34
[Question]:
I have two tab-delimited files, for example:
file1.txt
Clustername Seqname1 Seqname2
Cluster1 Seq1(+) SeqA
Cluster1 Seq2(-) SeqA
Cluster1 Seq3(+) SeqB
Cluster1 Seq300(+) SeqB
Cluster1 Seq90(+) SeqL
Cluster1 Seq90(+) SeqO
Cluster1 Seq2(-) SeqC
Cluster2 Seq8(-) SeqY
Cluster2 Seq8(-) SeqH
Cluster2 Seq8(-) SeqP
Cluster2 Seq79(-) SeqY
Cluster3 Seq10(+) SeqK
Cluster3 Seq10(+) SeqS
Cluster3 Seq10(+) SeqT
Cluster4 Seq300(+) SeqB
file2.txt
Clustername Names
Cluster1 SeqA
Cluster1 Seq1(+)
Cluster1 SeqC
Cluster1 Seq2(-)
Cluster1 SeqO
Cluster1 Seq3(+)
Cluster1 Seq90(+)
Cluster1 SeqB
Cluster1 SeqG
Cluster2 Seq8(-)
Cluster2 SeqY
Cluster2 SeqH
Cluster3 Seq10(+)
Cluster3 SeqK
Cluster4 SeqB
Cluster4 Seq300(+)
As you can see in file2.txt, SeqL does not exist in Cluster1, so I want to remove the row:
Cluster1 Seq90(+) SeqL
from file1.txt.
Seq300(+) does not exist in Cluster1 either, so I also remove the row:
Cluster1 Seq300(+) SeqB
from file1.txt.
The same applies to:
Cluster2 Seq8(-) SeqP
Cluster2 Seq79(-) SeqY
In file2.txt, Cluster2 contains neither SeqP nor Seq79(-), so I remove the rows:
Cluster2 Seq8(-) SeqP
Cluster2 Seq79(-) SeqY
from file1.txt.
The same applies to:
Cluster3 Seq10(+) SeqS
Cluster3 Seq10(+) SeqT
Because SeqS and SeqT are not in Cluster3 in file2.txt, I remove the following two rows from file1.txt:
Cluster3 Seq10(+) SeqS
Cluster3 Seq10(+) SeqT
In the end I should get a filtered file1.txt such as:
Clustername Seqname1 Seqname2
Cluster1 Seq1(+) SeqA
Cluster1 Seq2(-) SeqA
Cluster1 Seq3(+) SeqB
Cluster1 Seq90(+) SeqO
Cluster1 Seq2(-) SeqC
Cluster2 Seq8(-) SeqY
Cluster2 Seq8(-) SeqH
Cluster3 Seq10(+) SeqK
Cluster4 Seq300(+) SeqB
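
Read literally, the rule is: keep a row of file1.txt only when both its Seqname1 and its Seqname2 are listed under the same Clustername in file2.txt. Below is a minimal sketch of that check in plain Python (not pandas), assuming the fields are whitespace-separated and contain no embedded spaces. The function name `filter_rows` and the inlined strings `FILE1` and `FILE2` are illustrative and not part of the original post.

```python
# Sample data from the question, inlined so the sketch runs on its own.
FILE1 = """\
Clustername Seqname1 Seqname2
Cluster1 Seq1(+) SeqA
Cluster1 Seq2(-) SeqA
Cluster1 Seq3(+) SeqB
Cluster1 Seq300(+) SeqB
Cluster1 Seq90(+) SeqL
Cluster1 Seq90(+) SeqO
Cluster1 Seq2(-) SeqC
Cluster2 Seq8(-) SeqY
Cluster2 Seq8(-) SeqH
Cluster2 Seq8(-) SeqP
Cluster2 Seq79(-) SeqY
Cluster3 Seq10(+) SeqK
Cluster3 Seq10(+) SeqS
Cluster3 Seq10(+) SeqT
Cluster4 Seq300(+) SeqB
"""

FILE2 = """\
Clustername Names
Cluster1 SeqA
Cluster1 Seq1(+)
Cluster1 SeqC
Cluster1 Seq2(-)
Cluster1 SeqO
Cluster1 Seq3(+)
Cluster1 Seq90(+)
Cluster1 SeqB
Cluster1 SeqG
Cluster2 Seq8(-)
Cluster2 SeqY
Cluster2 SeqH
Cluster3 Seq10(+)
Cluster3 SeqK
Cluster4 SeqB
Cluster4 Seq300(+)
"""


def filter_rows(file1_text, file2_text):
    """Return file1 lines (header included) whose two names both
    appear under the same cluster in file2."""
    # Build a set of (cluster, name) pairs from file2, skipping its header.
    members = {
        tuple(line.split())
        for line in file2_text.splitlines()[1:]
        if line.strip()
    }
    # Drop blank lines, then separate the header from the data rows.
    header, *rows = [line for line in file1_text.splitlines() if line.strip()]
    kept = [header]
    for row in rows:
        cluster, seq1, seq2 = row.split()
        # Both names must be members of this row's cluster.
        if (cluster, seq1) in members and (cluster, seq2) in members:
            kept.append(row)
    return kept


result = filter_rows(FILE1, FILE2)
print("\n".join(result))
```

On the sample data this prints the header followed by the nine rows of the expected output. For real files, the two strings would be replaced by the contents of file1.txt and file2.txt. Since the question is tagged pandas, the same membership test could presumably be written there as well, for example with `DataFrame.merge` on the (cluster, name) columns, provided file2.txt has no duplicate rows.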
[Discussion]:
Tags: python python-3.x pandas dataframe merge