【Question Title】: Pandas dataframe only reading first value, NaN for everything else
【Posted】: 2021-01-22 15:39:10
【Question】:

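I'm trying to read a CSV with pandas and then insert it into a SQL table. When I print(data), the CSV is read correctly, but once I put it into a DataFrame, only the first column is kept and every other value from the CSV becomes NaN. Code and output below: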

data = pd.read_csv(localFilePath)
print(data)
df = pd.DataFrame(data, columns= ['Date','EECode','LastName','FirstName', \
           'HomeDepartmentCode','HomeDepartmentDesc','PayClass','InPunchTime', \
           'OutPunchTime','DepartmentCode','DepartmentDesc','JobCodesCode', \
           'JobCodesDesc','TeamCode','TeamDesc','EarnCode'])
print(df)

for row in df.itertuples():
    SQLInsert = ('''
                INSERT INTO [Reporting].[dbo].[Paycom_Missing_Punch] 
                (Date, EECode, LastName, FirstName, HomeDepartmentCode, 
                HomeDepartmentDesc, PayClass, InPunchTime, OutPunchTime, 
                DepartmentCode, DepartmentDesc, JobCodesCode, JobCodesDesc, 
                TeamCode, TeamDesc, EarnCode)
                VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
                '''
                )
    args = row.Date, row.EECode, row.LastName, row.FirstName, \
                row.HomeDepartmentCode, row.HomeDepartmentDesc, row.PayClass, row.InPunchTime, \
                row.OutPunchTime, row.DepartmentCode, row.DepartmentDesc, row.JobCodesCode, \
                row.JobCodesDesc, row.TeamCode, row.TeamDesc, row.EarnCode
                          
    #print(SQLInsert) 
    #print(args)
    cursor.execute(SQLInsert, args)     
conn.commit()

Output of print(data):

         Date  EE Code  ...               Team Desc Earn Code
0  01/21/2021     1435  ...             Indiana DWD       NaN
1  01/21/2021     1435  ...             Indiana DWD       NaN
2  01/22/2021     1180  ...             Supervisors       NaN
3  01/21/2021     1664  ...  Technical Support Desk       NaN
4  01/21/2021     1078  ...             Supervisors       NaN

Output after putting it into the DataFrame, print(df):

         Date  EECode  LastName  ...  TeamCode  TeamDesc  EarnCode
0  01/21/2021     NaN       NaN  ...       NaN       NaN       NaN
1  01/21/2021     NaN       NaN  ...       NaN       NaN       NaN
2  01/22/2021     NaN       NaN  ...       NaN       NaN       NaN
3  01/21/2021     NaN       NaN  ...       NaN       NaN       NaN
4  01/21/2021     NaN       NaN  ...       NaN       NaN       NaN

I think the problem is in how I'm passing the values to the DataFrame, but from everything I've read or seen, the way I'm doing it looks correct.

【Comments】:

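  • Not sure if this is the cause, but your args assignment appears to be outside the loop's scope?
  • That was a formatting problem from copy/pasting into Stack... I've updated it.
  • Changing it to data.itertuples() gave the same result: it reads the first column and inserts NaN for every other column. Is it possible that it's reading the delimiter wrong, and that's why it only gets the first value?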
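
The delimiter question in the last comment can be checked by looking at the column labels pandas actually parsed. Below is a minimal sketch, assuming a hypothetical three-column sample in place of the real file; the sample row and the shortened list of expected names are illustrative only. If the delimiter were wrong, the whole header line would typically show up as a single label. Here the labels are split correctly but contain spaces, so they don't match the requested names.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV (assumption, not the asker's file).
sample_csv = io.StringIO(
    "Date,EE Code,Last Name\n"
    "01/21/2021,1435,Smith\n"
)
data = pd.read_csv(sample_csv)

# Names the code asks for (shortened for the example).
expected = ["Date", "EECode", "LastName"]

# repr() makes embedded spaces or an unsplit header line easy to spot.
actual = list(data.columns)
print([repr(name) for name in actual])

# Labels that were requested but are absent from the file.
# Each of these would come back as an all-NaN column.
missing = [name for name in expected if name not in actual]
print(missing)  # ['EECode', 'LastName']
```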

Tags: python sql pandas


【Solution 1】:

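The problem is in how you build df. pd.read_csv already returns a DataFrame (data). You then try to build a second DataFrame from it using column names that don't exist in it: the file's headers contain spaces ('EE Code', 'Team Desc'), while the names you request do not ('EECode', 'TeamDesc'), so pandas finds no matching columns and fills them with NaN. To fix it, simply do the following: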

>>> col_names = ['Date', 'EECode', 'LastName', 'FirstName',
...              'HomeDepartmentCode', 'HomeDepartmentDesc', 'PayClass', 'InPunchTime',
...              'OutPunchTime', 'DepartmentCode', 'DepartmentDesc', 'JobCodesCode',
...              'JobCodesDesc', 'TeamCode', 'TeamDesc', 'EarnCode']
>>>
>>> df = pd.read_csv(localFilePath)
>>> df.columns = col_names
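
To see the difference between selecting by label and renaming, here is a small self-contained sketch. It assumes a cut-down, three-column in-memory CSV built from two of the rows shown in the question, not the real file. Passing an existing DataFrame together with columns= selects columns by label, so labels that aren't present come back as NaN. Assigning to .columns only relabels the data that is already there.

```python
import io

import pandas as pd

# Cut-down stand-in for the real file (assumption: only three of the columns).
sample_csv = io.StringIO(
    "Date,EE Code,Team Desc\n"
    "01/21/2021,1435,Indiana DWD\n"
    "01/22/2021,1180,Supervisors\n"
)
data = pd.read_csv(sample_csv)

# Selecting by label: 'EECode' and 'TeamDesc' don't exist in data,
# so those columns are created and filled with NaN.
selected = pd.DataFrame(data, columns=["Date", "EECode", "TeamDesc"])
print(selected)

# Renaming: the values stay, only the labels change.
renamed = data.copy()
renamed.columns = ["Date", "EECode", "TeamDesc"]
print(renamed)
```

Note that assigning to .columns requires the list to have exactly as many names as the file has columns, in the same order.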

【Comments】:

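  • When I do that, I get the error "Pandas doesn't allow columns to be created via a new attribute name", but if I instead do df = pd.read_csv(localFilePath, names=col_names), I get an error about trying to convert the date from a string. So it looks like that solves the original problem, but I've run into a new one.
  • In case anyone reads this: the new problem was that I had to explicitly convert the column to a date, because pandas was reading it as a string. This is how I did it: df['Date'] = pd.to_datetime(df['Date'])
  • You can do this while loading the file by using parse_dates. See the documentation for read_csv().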
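
The suggestions in these comments can be combined into a single read_csv call. This is a minimal sketch, again assuming a cut-down in-memory sample rather than the real file. One thing to watch for: when names= is passed without header=0, pandas treats the file's own header line as a data row, so the literal text "Date" ends up in the Date column. That may be one reason a date conversion fails, though the thread does not confirm it was the cause here.

```python
import io

import pandas as pd

# Cut-down stand-in for the real file (assumption: only three of the columns).
sample_csv = io.StringIO(
    "Date,EE Code,Team Desc\n"
    "01/21/2021,1435,Indiana DWD\n"
    "01/22/2021,1180,Supervisors\n"
)

df = pd.read_csv(
    sample_csv,
    names=["Date", "EECode", "TeamDesc"],  # replacement column labels
    header=0,                              # discard the file's own header row
    parse_dates=["Date"],                  # convert the Date column while loading
)
print(df.dtypes)
print(df)
```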