I think you need to add the parameter skipinitialspace to read_csv:
skipinitialspace : boolean, default False. Skip spaces after the delimiter.
Test:
import pandas as pd
import numpy as np
import io

temp = u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

print(pd.read_csv(io.StringIO(temp)))
   uid  f_1   f_2
0    1  "1"  1.19
1    2  "2"  2.30
2    3  "0"  4.80

# specifying dtype alone does not work
print(pd.read_csv(io.StringIO(temp), dtype={'f_1': np.int64}).dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object

print(pd.read_csv(io.StringIO(temp), skipinitialspace=True).dtypes)
uid      int64
f_1      int64
f_2    float64
dtype: object
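A small sketch of why the dtype mapping above has no effect, assuming the same sample data: without skipinitialspace the space after each comma stays part of the header, so there is no column literally named 'f_1' for the mapping to match. The variable names raw and clean are only for illustration.

```python
import io

import pandas as pd

temp = """uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

# Default parsing keeps the space that follows each delimiter.
raw = pd.read_csv(io.StringIO(temp))
# skipinitialspace=True drops it, so headers and quoted fields parse cleanly.
clean = pd.read_csv(io.StringIO(temp), skipinitialspace=True)

raw_columns = list(raw.columns)
clean_columns = list(clean.columns)

print(raw_columns)         # header names still carry the leading space
print(clean_columns)       # header names without the leading space
print(clean["f_1"].dtype)  # the quoted digits are now parsed as integers
```

Checking list(df.columns) like this is a quick way to spot stray whitespace in headers before reaching for dtype or converters.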
If you want to remove the first and last character " from column f_1, use converters:
import pandas as pd
import io

temp = u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

print(pd.read_csv(io.StringIO(temp)))
   uid  f_1   f_2
0    1  "1"  1.19
1    2  "2"  2.30
2    3  "0"  4.80

# strip the surrounding double quotes
def converter(x):
    return x.strip('"')

# map each column name to its converter
converters = {'f_1': converter}

df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, converters=converters)
print(df)
   uid f_1   f_2
0    1   1  1.19
1    2   2  2.30
2    3   0  4.80

print(df.dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object
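The converter returns strings, which is why f_1 ends up as object above. If integers are what you want in the end, one possible follow-up (a sketch, not the only way) is to cast the column after reading. The helper name strip_quotes is just an illustrative name.

```python
import io

import pandas as pd

temp = """uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

def strip_quotes(value):
    # Remove surrounding whitespace first, then the double quotes.
    return value.strip().strip('"')

df = pd.read_csv(
    io.StringIO(temp),
    skipinitialspace=True,
    converters={"f_1": strip_quotes},
)

# The converter yields strings; cast explicitly to get an integer column.
df["f_1"] = df["f_1"].astype("int64")
print(df.dtypes)
```

The cast raises a ValueError if any cleaned value is not a valid integer, which is usually preferable to silently keeping text in a numeric column.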
If you need to convert the integer column f_1 to string, use dtype:
import pandas as pd
import io

temp = u"""uid, f_1, f_2
1, 1, 1.19
2, 2, 2.3
3, 0, 4.8"""

print(pd.read_csv(io.StringIO(temp)).dtypes)
uid      int64
f_1      int64
f_2    float64
dtype: object

df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, dtype={'f_1': str})
print(df)
   uid f_1   f_2
0    1   1  1.19
1    2   2  2.30
2    3   0  4.80

print(df.dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object
Note: don't forget to replace io.StringIO(temp) with the path to your file, e.g. 'a.csv'.
An explanation of str vs object is here.
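For completeness, a sketch of the same call against a real file instead of io.StringIO. The file a.csv is written into a temporary directory here only so the example is self-contained; in your case you would pass your own path.

```python
import os
import tempfile

import pandas as pd

temp = """uid, f_1, f_2
1, 1, 1.19
2, 2, 2.3
3, 0, 4.8"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for your real file; only the path passed to read_csv matters.
    path = os.path.join(tmp_dir, "a.csv")
    with open(path, "w") as handle:
        handle.write(temp)

    df = pd.read_csv(path, skipinitialspace=True, dtype={"f_1": str})

values = df["f_1"].tolist()
print(values)  # the f_1 entries are Python strings, not integers
```

Whatever label pandas reports for the column dtype, the individual values in f_1 are plain Python str objects.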