【Posted】:2019-10-13 03:56:45
【Question】:
I have two DataFrames, sessions1 and sessions2, that I want to join on the field "ga:dimension1".
sessions1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15775 entries, 0 to 15774
Data columns (total 9 columns):
ga:dimension1 15775 non-null object
ga:date 15775 non-null object
ga:deviceCategory 15775 non-null object
ga:landingPagePath 15775 non-null object
ga:userType 15775 non-null object
ga:operatingSystem 15775 non-null object
ga:operatingSystemVersion 15775 non-null object
ga:sessions 15775 non-null int64
ga:bounces 15775 non-null int64
dtypes: int64(2), object(7)
memory usage: 1.1+ MB
sessions2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15774 entries, 0 to 15773
Data columns (total 9 columns):
ga:dimension1 15774 non-null object
ga:source 15774 non-null object
ga:medium 15774 non-null object
ga:campaign 15774 non-null object
ga:adContent 15774 non-null object
ga:keyword 15774 non-null object
ga:channelGrouping 15774 non-null object
ga:sessions 15774 non-null int64
ga:bounces 15774 non-null int64
dtypes: int64(2), object(7)
memory usage: 1.1+ MB
Looking at the first few rows, the key values at least appear to be the same:
sessions1.head()
ga:dimension1 ga:date ... ga:sessions ga:bounces
0 1567331564026.evxjzuot 20190901 ... 1 1
1 1567331572999.vtnsczsj 20190901 ... 1 1
2 1567331693070.fkdbmcj6 20190901 ... 1 1
3 1567335919816.ctz12xcl 20190901 ... 1 0
4 1567345181556.b3yowmbh 20190901 ... 1 1
sessions2.head()
ga:dimension1 ga:source ... ga:sessions ga:bounces
0 1567331564026.evxjzuot (direct) ... 1 1
1 1567331572999.vtnsczsj (direct) ... 1 1
2 1567331693070.fkdbmcj6 (direct) ... 1 1
3 1567335919816.ctz12xcl (direct) ... 1 0
4 1567345181556.b3yowmbh (direct) ... 1 1
However, when I try this:
sessions_combined = sessions1.join(sessions2,
on = 'ga:dimension1',
how = 'left')
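I get an error message:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Why does this happen, and how should I join the two DataFrames together?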
【Comments】:
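- You need to use merge instead of join. With on='ga:dimension1', join matches the ga:dimension1 column of sessions1 against the index of sessions2, which is an integer RangeIndex. That is where the object vs. int64 mismatch comes from.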
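
A minimal sketch of that suggestion, using two tiny stand-in frames rather than the real Google Analytics data (the row values and the "_1"/"_2" suffixes are assumptions for illustration). It shows merge matching column against column, and the equivalent join once the key of the right-hand frame has been moved into its index:

```python
import pandas as pd

# Small stand-ins for the frames in the question; values are illustrative only.
sessions1 = pd.DataFrame({
    "ga:dimension1": ["1567331564026.evxjzuot", "1567331572999.vtnsczsj"],
    "ga:date": ["20190901", "20190901"],
    "ga:sessions": [1, 1],
})
sessions2 = pd.DataFrame({
    "ga:dimension1": ["1567331564026.evxjzuot", "1567331572999.vtnsczsj"],
    "ga:source": ["(direct)", "(direct)"],
    "ga:sessions": [1, 1],
})

# merge compares the "ga:dimension1" column of both frames.
# Both frames also have "ga:sessions", so suffixes keep the two copies apart.
merged = sessions1.merge(
    sessions2,
    on="ga:dimension1",
    how="left",
    suffixes=("_1", "_2"),
)

# join compares a column of the caller with the INDEX of the other frame,
# so the key has to become the index of sessions2 first.
joined = sessions1.join(
    sessions2.set_index("ga:dimension1"),
    on="ga:dimension1",
    how="left",
    rsuffix="_2",
)

print(merged.columns.tolist())
print(joined.columns.tolist())
```

Either form should work for the frames in the question. merge is usually the simpler choice when the key is an ordinary column in both frames, while join is convenient when the right-hand frame is already indexed by the key.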