[Posted]: 2022-11-17 22:23:01
[Question]:
I have two separate DataFrames, named df1 and df2, shown below:
    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                51              58              0.879310
1          1        16                20                95             115              0.826087
2          2         9                 9                33              42              0.785714
3          2        12                86                51             137              0.372263
4          2        67                41                98             139              0.705036
5          3         8                 0                 0               0              0.000000
6          4        99                32                26              58              0.448276
7          4       101               100                24             124              0.193548
8          4       115                69                26              95              0.273684
9          5         6                40                57              97              0.587629
10         5        19                53                87             140              0.621429
    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                64              71              0.901408
1          1        16                10                90             100              0.900000
2          2         9                79                86             165              0.521212
3          2        12                12                73              85              0.858824
4          2        67                54                96             150              0.640000
5          3         8                 0                 0               0              0.000000
6          4        99                86                28             114              0.245614
7          4       101                32                25              57              0.438596
8          4       115                97                16             113              0.141593
9          5         6                86                43             129              0.333333
10         5        19                59                27              86              0.313953
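I have already summed the allele counts and Coverage_Depth across df1 and df2, but I now need to divide the combined Alt_Allele_Count by the combined Coverage_Depth to get a refined overall allele frequency (AF). When I tried converting the two variables to float before dividing, I got this error:
TypeError: float() argument must be a string or a number, not 'DataFrame'
When I leave them as DataFrames and divide, the result is this table: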
    Alt_Allele_Count  Coverage_Depth
0                NaN             NaN
1                NaN             NaN
2                NaN             NaN
3                NaN             NaN
4                NaN             NaN
5                NaN             NaN
6                NaN             NaN
7                NaN             NaN
8                NaN             NaN
9                NaN             NaN
10               NaN             NaN
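The all-NaN table above is consistent with how pandas aligns operands: dividing one DataFrame by another matches values on both row index and column label, and two frames that share no column label have nothing to pair up. The sketch below reproduces both symptoms on a two-row stand-in; the values are the sums of rows 0 and 1 from the tables above, and the variable names `alt`, `depth`, `misaligned` and `float_error` are illustrative only.

```python
import pandas as pd

# Stand-in data: sums of rows 0 and 1 from the two tables above
# (51 + 64, 95 + 90 and 58 + 71, 115 + 100).
alt = pd.DataFrame({"Alt_Allele_Count": [115, 185]})
depth = pd.DataFrame({"Coverage_Depth": [129.0, 215.0]})

# DataFrame / DataFrame aligns on column labels. The two frames have no
# label in common, so every cell of the result is NaN.
misaligned = alt / depth
print(misaligned)

# float() expects a scalar, so passing a whole DataFrame raises TypeError.
float_error = None
try:
    float(alt)
except TypeError as exc:
    float_error = exc
    print("TypeError:", exc)
```

The result has one column per label found in either operand, which is why the NaN table shows both Alt_Allele_Count and Coverage_Depth.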
My code so far:
import csv
import pandas as pd
import numpy as np
df1 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_1.csv')
df2 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_2.csv')
print(df1)
print(df2)
Ref_Allele_Count = (df1[['Ref_Allele_Count']] + df2[['Ref_Allele_Count']])
print(Ref_Allele_Count)
Alt_Allele_Count = (df1[['Alt_Allele_Count']] + df2[['Alt_Allele_Count']])
print(Alt_Allele_Count)
Coverage_Depth = (df1[['Coverage_Depth']] + df2[['Coverage_Depth']]).astype(float)
print(Coverage_Depth)
AF = Alt_Allele_Count / Coverage_Depth
print(AF)
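One possible fix, sketched under a few assumptions: selecting each column with single brackets (`df1['Alt_Allele_Count']`) returns a Series rather than a one-column DataFrame, so the sums and the final division align on row index only. The frames below are stand-ins built from rows 0, 1 and 5 of the tables above instead of the CSV files, the `combined` frame is a hypothetical name, and filling 0/0 with 0 is an assumption that mirrors the 0.000000 shown for the zero-coverage site.

```python
import pandas as pd

# Stand-in rows copied from the tables above (rows 0, 1 and 5); the real
# script would load the full data with pd.read_csv instead.
cols = ["Scaffold", "Position", "Ref_Allele_Count",
        "Alt_Allele_Count", "Coverage_Depth"]
df1 = pd.DataFrame([[1, 11, 7, 51, 58],
                    [1, 16, 20, 95, 115],
                    [3, 8, 0, 0, 0]], columns=cols)
df2 = pd.DataFrame([[1, 11, 7, 64, 71],
                    [1, 16, 10, 90, 100],
                    [3, 8, 0, 0, 0]], columns=cols)

# Single brackets select a Series, so arithmetic aligns on row index only.
alt_total = df1["Alt_Allele_Count"] + df2["Alt_Allele_Count"]
depth_total = df1["Coverage_Depth"] + df2["Coverage_Depth"]

# Collect the combined counts next to the site coordinates.
combined = df1[["Scaffold", "Position"]].copy()
combined["Alt_Allele_Count"] = alt_total
combined["Coverage_Depth"] = depth_total

# 0 / 0 yields NaN in pandas; replacing it with 0.0 is an assumption.
combined["Alt_Allele_Frequency"] = (alt_total / depth_total).fillna(0.0)
print(combined)
```

This relies on both files listing the same sites in the same row order. If that is not guaranteed, merging the two frames on Scaffold and Position before summing would be a safer variant.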
[Comments]:
Tags: python python-3.x pandas dataframe