【Posted】: 2018-01-17 16:40:41
【Question】:
I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'start' : [5, 10, '$%%', 20], 'stop' : [10, 20, 30, 40]})
df['length_of_region'] = pd.Series([0 for i in range(0, len(df['start']))])
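I only want to compute the region length for rows whose values are non-zero numbers. If a value is invalid, the row should get an error note and be skipped by the calculation. This is what I have so far: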
df['Notes'] = pd.Series(["" for i in range(0, len(df['region_name']))])
for i in range(0, len(df['start'])):
    if pd.isnull(df['start'][i]) == True:
        df['Notes'][i] += 'Error: Missing value for chromosome start at region %s, required value;' % (df['region_name'][i])
        df['critical_error'][i] = True
        num_error = num_error+1
    else:
        try:
            #print (df['start'][i]).isnumeric()
            start = int(df['start'][i])
            #print start
            #print df['start'][i]
            if start == 0:
                raise ValueError
        except:
            df['Notes'][i] += 'Error: Chromosome start should be a non zero number at region %s; ' % (df['region_name'][i])
            #print df['start'][i]
            df['critical_error'][i] = True
            num_error = num_error+1
for i in range(0, len(df['start'])):
    if df['critical_error'][i] == True:
        continue
    df['length_of_region'][i] = (df['stop'][i] - df['start'][i]) + 1.0
However, pandas treats df['start'] as str, and even though I cast the value with int, I get the following error:
df['length_of_region'][i] = (df['stop'][i] - df['start'][i]) + 1.0
TypeError: unsupported operand type(s) for -: 'numpy.int64' and 'str'
What am I missing here? Thanks for your time!
【Comments】:
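One likely cause, as far as I can tell from the snippet: `start = int(df['start'][i])` only assigns a local variable, so the column itself keeps its original values (including the string '$%%'), and the later subtraction reads those originals. A minimal sketch of a vectorized alternative follows, assuming invalid rows should end up with NaN as their length. The names `start` and `valid` are introduced here for illustration, and the region name is left out of the note because the sample frame has no 'region_name' column.

```python
import pandas as pd

df = pd.DataFrame({'start': [5, 10, '$%%', 20], 'stop': [10, 20, 30, 40]})

# A column mixing ints and strings is stored with dtype object, so each
# element keeps its own Python type (the string stays a string).
print(df['start'].dtype)

# Coerce to numbers; values that cannot be parsed become NaN instead of raising.
start = pd.to_numeric(df['start'], errors='coerce')

# A row is valid only if start parsed successfully and is non-zero.
valid = start.notna() & (start != 0)

# Annotate the invalid rows (column names mirror the question).
df['Notes'] = ''
df.loc[~valid, 'Notes'] = 'Error: Chromosome start should be a non zero number; '

# Compute the length from the numeric copy; invalid rows are left as NaN.
df['length_of_region'] = (df['stop'] - start + 1.0).where(valid)
print(df)
```

If you would rather keep the row-by-row loop, the same idea applies: do the arithmetic on the converted value, not on df['start'][i].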