【Question title】: ValueError: You are trying to merge on object and int64 columns when using pandas merge
【Posted】: 2020-10-16 08:41:42
【Question】:

The data in test.csv looks like this:

device_id,upload_time,latitude,longitude,mileage,other_vals,speed,upload_time_1
11115304371,2020-08-05 05:10:05+00:00,23.140366,114.18685,0,,0,202008
1234,2020-08-05 05:10:33+00:00,22.994716,114.2998,0,,0,202008
11115304371,2020-08-05 05:20:55+00:00,22.994716,114.2998,0,,3.8,202008
11115304371,2020-08-05 05:24:02+00:00,22.994916,114.299683,0,,2.1,202008
11115304371,2020-08-05 05:24:30+00:00,22.99545,114.2998,0,,6.5,202008
11115304371,2020-08-05 05:29:30+00:00,22.995433,114.299766,0,,3.4,202008
11115304371,2020-08-05 05:34:30+00:00,22.995433,114.299766,0,,3.4,202008
11115304371,2020-08-05 05:39:30+00:00,22.995433,114.299766,0,,3.4,202008
822649e2d142a486,2020-08-05 05:44:30+00:00,22.995433,114.299766,0,,3.4,202008
11115304371,2020-08-05 05:44:53+00:00,22.995433,114.299766,0,,3.4,202008
11115304371,2020-08-05 05:45:40+00:00,22.995433,114.299766,0,,5.8,202008

And the data in info.csv looks like this:

car_id,device_id,car_type,car_num,marketer_name
1,11110110037,1,AAA,T1
2,11115304371,1,BBB,T2
3,11111100345,1,CCC,T3
4,11111100242,1,DDD,T4
5,12221100034,1,EEE,T5
6,12221100230,1,FFF,T6
7,14465301234,1,GGG,T7

I merge the two DataFrames with this code:

import pandas as pd

df_device_data = pd.read_csv(r'E:/test.csv', encoding='utf-8', parse_dates=[1], low_memory=False)
df_common_car_info = pd.read_csv(r'E:/info.csv', encoding='utf-8', low_memory=False)
result = pd.merge(df_device_data, df_common_car_info, how='left', on='device_id')
result.to_csv(r'E:/result.csv', index=False, mode='w', header=True)

and this error occurs:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

How can I fix this?

【Comments】:

  • Because of the device_id "822649e2d142a486", device_id in your test.csv is read as an object (string) column, while device_id in the other file is an int. Convert device_id in info.csv to a string.

Tags: python pandas
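
The mismatch described in the comment can be checked directly by inspecting the key column on both sides. The following is only a minimal sketch: it uses trimmed-down inline copies of the two files instead of the real CSVs, and the `io.StringIO` buffers and the names `left_is_numeric`, `right_is_numeric` and `merge_error` are illustrative assumptions, not part of the original post.

```python
import io

import pandas as pd

# Trimmed-down stand-ins for test.csv and info.csv (assumption: same column
# names as in the question, far fewer rows and columns).
test_csv = io.StringIO(
    "device_id,speed\n"
    "11115304371,0\n"
    "822649e2d142a486,3.4\n"
)
info_csv = io.StringIO(
    "car_id,device_id,car_num\n"
    "2,11115304371,BBB\n"
)

df_device_data = pd.read_csv(test_csv)
df_common_car_info = pd.read_csv(info_csv)

# One non-numeric ID is enough for pandas to read the whole column as strings,
# while the all-digit column in the other file is read as integers.
left_is_numeric = pd.api.types.is_numeric_dtype(df_device_data["device_id"])
right_is_numeric = pd.api.types.is_numeric_dtype(df_common_car_info["device_id"])
print(left_is_numeric, right_is_numeric)

# Merging on keys of different kinds is what raises the ValueError.
merge_error = None
try:
    pd.merge(df_device_data, df_common_car_info, how="left", on="device_id")
except ValueError as exc:
    merge_error = exc
print(merge_error)
```

If the two flags differ, the key columns need to be brought to a common type before merging.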


【Solution 1】:

Solution:
Just add the lines below to your code, so that both device_id columns are strings before the merge:

df_device_data['device_id'] = df_device_data['device_id'].astype(str)
df_common_car_info['device_id'] = df_common_car_info['device_id'].astype(str)

Final code:

import pandas as pd

df_device_data = pd.read_csv(r'/home/piyushsambhi/Downloads/test.csv', encoding='utf-8', parse_dates=[1], low_memory=False)
df_common_car_info = pd.read_csv(r'/home/piyushsambhi/Downloads/info.csv', encoding='utf-8', low_memory=False)

# For this data the column is already read as strings, so this first cast is not strictly required; it is kept as a safeguard.
df_device_data['device_id'] = df_device_data['device_id'].astype(str)
df_common_car_info['device_id'] = df_common_car_info['device_id'].astype(str)

result = pd.merge(df_device_data, df_common_car_info, how='left', on='device_id')
result.to_csv(r'/home/piyushsambhi/Downloads/result.csv', index=False, mode='w', header=True)

Output:

device_id,upload_time,latitude,longitude,mileage,other_vals,speed,upload_time_1,car_id,car_type,car_num,marketer_name
11115304371,2020-08-05 05:10:05,23.140366,114.18685,0,,0.0,202008,2.0,1.0,BBB,T2
1234,2020-08-05 05:10:33,22.994716,114.2998,0,,0.0,202008,,,,
11115304371,2020-08-05 05:20:55,22.994716,114.2998,0,,3.8,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:24:02,22.994916,114.299683,0,,2.1,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:24:30,22.99545,114.2998,0,,6.5,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:29:30,22.995433,114.29976599999999,0,,3.4,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:34:30,22.995433,114.29976599999999,0,,3.4,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:39:30,22.995433,114.29976599999999,0,,3.4,202008,2.0,1.0,BBB,T2
822649e2d142a486,2020-08-05 05:44:30,22.995433,114.29976599999999,0,,3.4,202008,,,,
11115304371,2020-08-05 05:44:53,22.995433,114.29976599999999,0,,3.4,202008,2.0,1.0,BBB,T2
11115304371,2020-08-05 05:45:40,22.995433,114.29976599999999,0,,5.8,202008,2.0,1.0,BBB,T2

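【Discussion】:

  • device_id 11115304371 is in both test.csv and info.csv, but in result.csv the rows for device_id 11115304371 are not matched.
  • I will have to check why the object type causes this problem when merging the DataFrames, but it works with str. I have updated the code accordingly.
  • Happy to help :)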
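
When rows that should match come back empty, as reported in the first comment, the `indicator=True` option of `pd.merge` can show which rows found a partner. This is a sketch under assumptions: the inline data is a shortened copy of the files from the question, and the names `checked`, `match_counts` and `unmatched_ids` are hypothetical.

```python
import io

import pandas as pd

# Shortened stand-ins for test.csv and info.csv (assumption: illustrative only).
test_csv = io.StringIO(
    "device_id,speed\n"
    "11115304371,0\n"
    "1234,0\n"
    "822649e2d142a486,3.4\n"
)
info_csv = io.StringIO(
    "car_id,device_id,car_num\n"
    "1,11110110037,AAA\n"
    "2,11115304371,BBB\n"
)

left = pd.read_csv(test_csv)
right = pd.read_csv(info_csv)

# Cast both key columns to strings so the keys are comparable.
left["device_id"] = left["device_id"].astype(str)
right["device_id"] = right["device_id"].astype(str)

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for rows that found no partner in the right frame.
checked = pd.merge(left, right, how="left", on="device_id", indicator=True)
match_counts = checked["_merge"].value_counts()
unmatched_ids = checked.loc[checked["_merge"] == "left_only", "device_id"].tolist()

print(match_counts)
print(unmatched_ids)
```

Rows marked `left_only` are the ones whose car columns stay empty in the merged output.
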
【Solution 2】:

When I use astype(str), as in this code:

import pandas as pd

df_device_data = pd.read_csv(r'E:/test.csv', encoding='utf-8', parse_dates=[1], low_memory=False) 
df_device_data['device_id'] = df_device_data['device_id'].astype(str)
df_common_car_info = pd.read_csv(r'E:/info.csv', encoding='utf-8', low_memory=False) 
df_common_car_info['device_id'] = df_common_car_info['device_id'].astype(str)
result = pd.merge(df_device_data, df_common_car_info, how='left', on='device_id')
result.to_csv(r'E:/result.csv', index=False, mode='w', header=True)

the result is correct.
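
A possible variation, not taken from the answers above, is to request the string type at read time through the `dtype` parameter of `pd.read_csv`, so no separate cast is needed afterwards. The sketch below again assumes shortened inline data in place of the real files; the variable names are illustrative.

```python
import io

import pandas as pd

# Shortened stand-ins for test.csv and info.csv (assumption: illustrative only).
test_csv = io.StringIO(
    "device_id,speed\n"
    "11115304371,0\n"
    "822649e2d142a486,3.4\n"
)
info_csv = io.StringIO(
    "car_id,device_id,car_num\n"
    "2,11115304371,BBB\n"
)

# dtype={"device_id": str} makes both key columns strings as they are parsed.
df_device_data = pd.read_csv(test_csv, dtype={"device_id": str})
df_common_car_info = pd.read_csv(info_csv, dtype={"device_id": str})

result = pd.merge(df_device_data, df_common_car_info, how="left", on="device_id")
print(result)
```

Reading the IDs as text from the start also means they are never interpreted as numbers in between.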
