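【Posted】: 2021-05-12 03:46:32
【Question】:
I am a Python beginner. I have several XLSX input files. I want to read each XLSX into a pandas DataFrame, check that the fields I am interested in have the correct data type, and then convert the DataFrame to CSV.
This is the script I am currently using: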
# import needed modules
import pandas as pd
import numpy as np
import os
# Select the input folder
GMR_folder = r'C:\Users\Me\Desktop\MyFolder'
# List all the files within the folder
files = os.listdir(GMR_folder)
# Keep only the xlsx files within the folder
files_xls = [f for f in files if f.endswith('.xlsx')]
for file in files_xls:
    last_path = file
    # File name without the '.xlsx' extension, used as the member ID
    member = file[:-5]
    file_path = os.path.join(GMR_folder, last_path)
    # print(file_path)
    # Read the excel using specified datatype for the specified column
    dataExcel = pd.read_excel(file_path,
                              skiprows=range(0, 3),
                              # define the datatype for each column we're interested in
                              # (plain str replaces the removed alias np.str)
                              dtype={'Col1': str,
                                     'Col2': str,
                                     'Col3': str,
                                     'Col4': str,
                                     'Col5': str,
                                     'GPS Latitude (DD format) *': np.float32,
                                     'GPS Longitude (DD format) *': np.float32,
                                     'GPS Latitude (Degrés décimaux) *': np.float32,
                                     'GPS Longitude (Degrés décimaux) *': np.float32,
                                     'Col10': np.int64,
                                     'Col11': np.float64,
                                     'Col12': np.float64,
                                     'Col13': np.int64,
                                     'Col14': np.float64,
                                     'Col15': np.float64
                                     })
    # Insert the member ID in Col1
    dataExcel['Col1'] = member
    # Export the dataframe into a csv using the right encoding, useful to avoid strange chars
    dataExcel.to_csv(member + '.csv',
                     encoding='utf-8-sig')
    print(member + ' csv created')
print('all csv created')
The script works for some of the XLSX files, but for others I get the following error:
Unable to convert column Col13 to type <class 'numpy.int64'>
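The same happens for other columns that have to be converted to float32.
How can I fix this error? It would be great to get an NA value in the rows that cannot be converted to the right data type. What should I do?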
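One possible way to get NA in the cells that cannot be converted is to read the problem columns without a forced dtype and coerce them afterwards with `pd.to_numeric(..., errors="coerce")`. The sketch below is only an illustration under assumptions: the sample frame `raw` and the helper `coerce_numeric` are hypothetical and stand in for a worksheet whose numeric columns contain stray text.

```python
import pandas as pd

# Hypothetical sample data standing in for one worksheet whose
# numeric columns arrived as text (not taken from the real files)
raw = pd.DataFrame({
    "Col11": ["1.5", "n/a", "3.25"],
    "Col12": ["2", "", "abc"],
})


def coerce_numeric(df, columns):
    """Return a copy of df with the listed columns converted to numbers.

    Cells that cannot be parsed become NaN instead of raising an error.
    Columns that are missing from df are skipped.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# "Col99" does not exist in the sample and is silently ignored
clean = coerce_numeric(raw, ["Col11", "Col12", "Col99"])
print(clean)
```

In the original script the same idea would mean dropping the float and integer entries from the `dtype` mapping and calling such a helper on `dataExcel` before `to_csv`. If float32 is really required, `.astype("float32")` can be applied after the coercion, since NaN is a valid float32 value.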
【Comments】:
-
Try
pd.Int64Dtype() (string alias "Int64") to get a nullable integer column.
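A minimal sketch of what the comment suggests, assuming the integer column contains a few cells that are empty or not numeric. The series `col13` is hypothetical sample data, not the real Col13. A plain `numpy.int64` column cannot hold missing values, which is why the conversion fails, while the nullable `"Int64"` dtype stores them as `<NA>`.

```python
import pandas as pd

# Hypothetical values standing in for a column like Col13 read as text
col13 = pd.Series(["10", "oops", None, "42"])

# Step 1: anything that is not a number becomes NaN (result is float64)
as_float = pd.to_numeric(col13, errors="coerce")

# Step 2: cast to the nullable integer dtype; NaN is stored as <NA>
as_int = as_float.astype("Int64")

print(as_int)
print(as_int.dtype)
```

Note that the cast in step 2 raises an error if a value has a fractional part (for example 1.5), so such columns are probably better kept as floats.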
Tags: pandas numpy types integer number-formatting