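Thank you very much for your help :) It does what I need, but now I'm trying to improve it a little: if no other entry in the DataFrame meets the criteria, the original row should be removed.
So the idea is that I want my data to be "compressed"/cleaned.
The final result should therefore contain only rows where at least two rows match each other in RT and DT (within ±0.1, as in the loop below).
If a row is "unique" in terms of RT and DT, it should not appear in the final result.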
```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 760, 36.00, 14.1, 15000, 22], [3, 104, 35.95, 14.13, 12000, 22],
     [4, 120, 34, 13, 16000, 22], [2, 184, 36.05, 14.12, 11000, 22],
     [8, 8, 8, 8, 8, 22], [7, 7, 7, 7, 7, 22], [6, 6, 6, 6, 6, 22]],
    columns=["ID", "mz", "DT", "RT", "area", "random"],
)

# expected result (ID, mz, DT, RT, area):
result = ([1, 760, 36.00, 14.1, 15000], [2, 184, 36.05, 14.12, 11000], [3, 104, 35.95, 14.13, 12000])
```
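A loop-free sketch of the same rule, assuming NumPy is available alongside pandas: compare every DT and every RT against all others at once with broadcasting, ignore the diagonal (each row trivially matches itself), and keep the rows that have at least one partner. `keep_matched_rows` and `tol` are illustrative names, not part of the original code. The intermediate arrays are n × n, so memory grows quadratically with the number of rows; this suits a few thousand rows but not very large files. Comparisons that land exactly on ±0.1 may differ from the loop version because of floating-point rounding.

```python
import numpy as np
import pandas as pd

# Example data from the question
df1 = pd.DataFrame(
    [[1, 760, 36.00, 14.1, 15000, 22], [3, 104, 35.95, 14.13, 12000, 22],
     [4, 120, 34, 13, 16000, 22], [2, 184, 36.05, 14.12, 11000, 22],
     [8, 8, 8, 8, 8, 22], [7, 7, 7, 7, 7, 22], [6, 6, 6, 6, 6, 22]],
    columns=["ID", "mz", "DT", "RT", "area", "random"],
)


def keep_matched_rows(df, tol=0.1):
    """Keep rows whose DT and RT are both within ±tol of at least one other row."""
    dt = df["DT"].to_numpy()
    rt = df["RT"].to_numpy()
    # close[i, j] is True when rows i and j agree in DT and in RT
    close = (np.abs(dt[:, None] - dt[None, :]) <= tol) & (
        np.abs(rt[:, None] - rt[None, :]) <= tol
    )
    np.fill_diagonal(close, False)  # a row matching only itself does not count
    return df[close.any(axis=1)]


# Sort by ID and drop "random" to mirror the expected result in the question
result = keep_matched_rows(df1).sort_values("ID").drop(columns="random")
print(result)
```

On the example data this keeps the rows with ID 1, 2 and 3, and no rows at all if `tol` is lowered to 0.01.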
The code below more or less does the job, but it is very slow because of the two nested for loops...
```python
import pandas as pd

df1 = pd.read_csv("test7.csv")
#df1 = pd.DataFrame([[1, 760, 36.00, 14.1, 15000, 22], [3, 104, 35.95, 14.13, 12000, 22], [4, 120, 34, 13, 16000, 22], [2, 184, 36.05, 14.12, 11000, 22], [8, 8, 8, 8, 8, 22], [7, 7, 7, 7, 7, 22], [6, 6, 6, 6, 6, 22]], columns=["ID", "mz", "DT", "RT", "area", "random"])

matches = []  # every row b that lies within ±0.1 of row i in both DT and RT
a = len(df1)
for i in range(a):
    current_row_dt = df1.loc[i, "DT"]
    current_row_RT = df1.loc[i, "RT"]
    for b in range(a):
        compared_row_dt = df1.loc[b, "DT"]
        compared_row_RT = df1.loc[b, "RT"]
        if (current_row_dt - 0.1) <= compared_row_dt <= (current_row_dt + 0.1):
            if (current_row_RT - 0.1) <= compared_row_RT <= (current_row_RT + 0.1):
                matches.append(df1.loc[b])

# DataFrame.append was removed in pandas 2.0, so build the frame once from the list.
# Each row matches itself once; rows that also match another row appear at least twice.
df_list = pd.DataFrame(matches)
df_dup = df_list[df_list.duplicated(keep=False)]
df_final = df_dup.drop_duplicates()
print(df_final)
df_final.to_csv("test7_sorted.csv")
```
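The nested loops compare every row with every other row. A sketch that avoids most of those comparisons, assuming the data fit in memory and that each DT window of ±0.1 contains only a few rows: sort by DT, use `np.searchsorted` to find, for each row, the contiguous block of rows whose DT lies within ±tol, and test RT only inside that block. The function name `keep_matched_rows_sorted` is hypothetical. It still has one Python loop over the rows, but each iteration only touches that row's DT neighbours instead of the whole frame, and rows come back in their original order.

```python
import numpy as np
import pandas as pd


def keep_matched_rows_sorted(df, tol=0.1):
    """Keep rows with at least one other row within ±tol in both DT and RT."""
    order = np.argsort(df["DT"].to_numpy(), kind="stable")
    dt = df["DT"].to_numpy()[order]
    rt = df["RT"].to_numpy()[order]
    # For sorted row k, rows lo[k]:hi[k] are those with DT in [dt[k] - tol, dt[k] + tol]
    lo = np.searchsorted(dt, dt - tol, side="left")
    hi = np.searchsorted(dt, dt + tol, side="right")
    keep = np.zeros(len(dt), dtype=bool)
    for k in range(len(dt)):
        hits = np.abs(rt[lo[k]:hi[k]] - rt[k]) <= tol
        # row k is always inside its own window, so a real match needs at least 2 hits
        keep[k] = hits.sum() >= 2
    # Map back to the original row positions and restore the original order
    return df.iloc[np.sort(order[keep])]


# Example data from the question
df1 = pd.DataFrame(
    [[1, 760, 36.00, 14.1, 15000, 22], [3, 104, 35.95, 14.13, 12000, 22],
     [4, 120, 34, 13, 16000, 22], [2, 184, 36.05, 14.12, 11000, 22],
     [8, 8, 8, 8, 8, 22], [7, 7, 7, 7, 7, 22], [6, 6, 6, 6, 6, 22]],
    columns=["ID", "mz", "DT", "RT", "area", "random"],
)
df_final = keep_matched_rows_sorted(df1)
print(df_final)
```

With the example data this keeps the rows with ID 1, 3 and 2, in the order they appear in `df1`. For the CSV, the same call would be `keep_matched_rows_sorted(pd.read_csv("test7.csv"))`.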