【Posted】: 2019-10-18 06:28:48
【Question】:
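I am trying to concat() two DataFrames in pandas. One of the DataFrames is just a few columns that I took from the other and transformed, so I am not reusing them in any way. But when I try to concatenate them I get a warning, and the two end up joined almost diagonally: the number of rows doubles (each frame has the same rows) and the number of columns is the columns of one plus the columns of the other.
Ideally, I want the number of rows to stay the same and the number of columns to be the columns of one plus the columns of the other. Here is my code: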
## In the below code I create new names for the scaled fields by adding SC_ to
## their existing names
SC_ExplanVars = []
for var in explan_vars:
    sc_var = "SC_" + var
    SC_ExplanVars.append(sc_var)
## Scale the columns from my dataframe that will be used as explanatory
## variables
X_Scale = preprocessing.scale(data[explan_vars])
## Put my newly scaled explanatory variables into a DataFrame with the same
## headers but with SC_ in front
X_Scale = pd.DataFrame(X_Scale, columns = SC_ExplanVars)
## Concatenate scaled variables onto original dataset
datat = pd.concat([data, X_Scale], axis=1)
I get this warning:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\api.py:77: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
result = result.union(other)
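The location named in the warning (pandas/core/indexes/api.py, result.union(other)) suggests it is raised while pandas unions the row indexes of the two frames, which could happen if one index mixes string and integer labels. A minimal diagnostic sketch, assuming a hypothetical frame named mixed that stands in for data:

```python
import pandas as pd

# Hypothetical frame (an assumption, not the asker's data) whose row labels
# mix a string with integers.
mixed = pd.DataFrame({"Col1": [297, 275, 400]}, index=["0", 1, 2])

# Collect the Python type name of every row label; more than one entry
# means the index holds incomparable label types.
label_types = sorted({type(label).__name__ for label in mixed.index})
print(label_types)
```

Running the same comprehension over data.index and X_Scale.index would show whether the real row labels mix types.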
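Edit
Below are the tables I described. These are only the first 10 rows, and I have cut the data down to a single column, but it still seems to give me the same problem.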
Data=
Col1
297
297
297
297
275
275
275
400
400
400
X_Scale =
SC_Col1
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.4644471998668502
-0.8849343767010354
-0.8849343767010354
-0.8849343767010354
1.5041973098568349
1.5041973098568349
1.5041973098568349
After concatenation
datat =
Col1 SC_Col1
297.0 NaN
297.0 NaN
297.0 NaN
297.0 NaN
275.0 NaN
275.0 NaN
275.0 NaN
400.0 NaN
400.0 NaN
400.0 NaN
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.4644471998668502
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN -0.8849343767010354
NaN 1.5041973098568349
NaN 1.5041973098568349
NaN 1.5041973098568349
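A diagonal result like the one above is what pd.concat(..., axis=1) produces when the two frames have no row labels in common, because rows are aligned on the index rather than on position. The sketch below reproduces that shape under an assumption: the index values [10, 11, 12, 13] are hypothetical and stand in for whatever non-default index the real data may carry (for example after filtering rows).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: sample values with a non-default row
# index (an assumption about the real data).
data = pd.DataFrame({"Col1": [297, 297, 275, 400]}, index=[10, 11, 12, 13])

# A DataFrame built from a bare NumPy array gets a fresh 0..n-1 index.
x_scale = pd.DataFrame(
    np.array([[-0.46], [-0.46], [-0.88], [1.50]]), columns=["SC_Col1"]
)

# axis=1 aligns rows by label; with no shared labels every row of one frame
# is padded with NaN in the other frame's columns.
diagonal = pd.concat([data, x_scale], axis=1)

print(data.index.equals(x_scale.index))
print(diagonal.shape)
```

Checking data.index.equals(X_Scale.index) on the real frames is a quick way to see whether this is what is happening.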
【Comments】:
-
Could you show a sample of your DataFrames and post an MCVE? Since you have not said what explan_vars, data, and preprocessing are, the error cannot be reproduced...
-
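Using the two DataFrames you posted in your edit works fine. I cannot reproduce your behavior: my concatenated DataFrame has two columns, ten rows, and no NaN. I can only think the problem lies somewhere earlier. From the warning, perhaps some of your integers are actually strings.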
-
Have you tried doing the concat with ignore_index=True?
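One caveat about the ignore_index suggestion: with axis=1 the concatenation axis is the columns, so ignore_index=True relabels the columns as 0, 1, ... and the rows are still aligned by index. A sketch of that behavior next to one possible alternative, reusing data.index when building the scaled frame. The index values are hypothetical, and plain NumPy standardization is swapped in for preprocessing.scale to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index is an assumption standing in for whatever
# row labels the real `data` carries.
data = pd.DataFrame({"Col1": [297.0, 275.0, 400.0]}, index=[5, 6, 7])
explan_vars = ["Col1"]
sc_names = ["SC_" + name for name in explan_vars]

# Zero-mean, unit-variance scaling in NumPy (swapped in for
# sklearn's preprocessing.scale); the result is a bare array with no index.
values = data[explan_vars].to_numpy()
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

# With axis=1, ignore_index=True renames the columns only; the row
# indexes are still unioned, so the row count doubles.
unaligned = pd.DataFrame(scaled, columns=sc_names)
relabelled = pd.concat([data, unaligned], axis=1, ignore_index=True)

# Reusing data.index keeps the scaled rows aligned with the originals.
aligned = pd.DataFrame(scaled, columns=sc_names, index=data.index)
combined = pd.concat([data, aligned], axis=1)

print(relabelled.shape)
print(combined.shape)
```

Calling reset_index(drop=True) on data before concatenating would be another option, at the cost of discarding the original row labels.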
Tags: python-3.x pandas dataframe scikit-learn