【Posted】: 2022-01-27 23:02:41
【Question】:
I have 8 populations in a CSV file that also has a FORMAT column: pop
I am trying to extract only the AD and DP values with this code:
import io
import os
import pandas as pd

def read_vcf(path1):
    with open(path1, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})

def extract_AD(info):
    AD = int((info.split(':')[1]).split(',')[0])
    return AD

path1 = "C://Users//USER//Desktop//Anas/VCFs_1/test_1.vcf"
file = read_vcf(path1)
pop1 = file[["FORMAT","NEN_001","NEN_003","NEN_200","NEN_300","LAB_004","LAB_300","LAB_400","LAB_500"]]
cols_to_apply = ["NEN_001","NEN_003","NEN_200","NEN_300","LAB_004","LAB_300","LAB_400","LAB_500"]
tst1pop1 = pd.DataFrame(pop1)
AD = tst1pop1[cols_to_apply].applymap(extract_AD)
#AD= pop1["NEN_001"].apply(extract_AD)

def extract_DP(info):
    DP = info.split(':')[2:3]
    return DP

print("AD Values:"+"\n", AD)
DP = tst1pop1[cols_to_apply].applymap(extract_DP)
print("DP Values:\n", DP)
Sum1 = AD.sum(axis=1)
print(Sum1)
SumAD = sum(Sum1)
print(SumAD)
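But it gives me the DP values wrapped in lists, so I cannot sum them.
Output: Output
How can I get the DP values out of the lists as integers, so that I can sum them row by row?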
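For illustration, here is a minimal sketch of one way this could be approached. It relies on the fact that slicing with `[2:3]` returns a one-element list, while indexing with `[2]` returns the string itself, which can then be passed to `int`. The helper name `extract_dp`, the toy DataFrame, and the choice to count missing values (`.`) as 0 are assumptions made for this example and do not come from the original data; it also assumes DP is the third colon-separated field, as in the code above.

```python
import pandas as pd

def extract_dp(sample_field: str) -> int:
    """Return DP as an int from a 'GT:AD:DP' style sample string."""
    parts = sample_field.split(':')
    # parts[2] is a string; parts[2:3] would be a one-element list
    value = parts[2] if len(parts) > 2 else '.'
    # Assumption: treat missing or non-numeric DP (e.g. '.') as 0
    return int(value) if value.isdigit() else 0

# Hypothetical toy data standing in for the real VCF sample columns
demo = pd.DataFrame({
    "FORMAT": ["GT:AD:DP", "GT:AD:DP"],
    "NEN_001": ["0/1:12,3:15", "0/0:20,0:20"],
    "LAB_004": ["1/1:0,9:9", "./.:.:."],
})
cols = ["NEN_001", "LAB_004"]

# Apply the helper to every cell, one column at a time
dp = demo[cols].apply(lambda col: col.map(extract_dp))

dp_row_sums = dp.sum(axis=1)       # sum of DP across samples, per row
dp_total = int(dp_row_sums.sum())  # grand total over all rows

print(dp)
print(dp_row_sums)
print(dp_total)
```

Using `apply` with `Series.map` per column avoids depending on `applymap`, which newer pandas versions deprecate; with the real data, `cols` would be replaced by `cols_to_apply`.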
【Discussion】:
Tags: python bioinformatics