【Question title】:Pandas concatenation of multiple dataframes returns null values
【Posted】:2016-08-26 18:24:40
【Question】:

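I have a dataframe (df) that I break into 4 new dfs (media, client, code_type, date). media has one column of null values, while the other three are just 1-dim dfs, each consisting entirely of null values. After replacing the nulls in each dataframe, I try to pd.concat them into a single df and get the results below.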

 code_type
0   P
1   P
2   P
3   P
4   P
5   P

code_name   media_type  acq.    revenue
0   RASH    NaN         50.0     34004.0
1   100     NaN         10.0     1035.0
2   NEWS    NaN         61.0     3475.0
3   DR      NaN         53.0     4307.0
4   SPORTS  NaN         45.0     6503.0
5   DOUBL   NaN         13.0     4205.0

    client_id
0   2.0
1   2.0
2   2.0
3   2.0
4   2.0
5   2.0

    date
0   2016-08-15
1   2016-08-15
2   2016-08-15
3   2016-08-15
4   2016-08-15
5   2016-08-15

To replace the NaNs under media.media_type, I used pd.merge on media with another, separate df, which appended a new media_type_y column:

code_name   media_type_x    acq.    revenue  media_type_y
0   RASH       NaN          282     34004.0  Radio
1   100        NaN          119     1035.0   NaN
2   NEWS       NaN           81     3475.0   SiriusXM
3   DR         NaN           33     4307.0   SiriusXM
4   SPORTS     NaN           25     6503.0   SiriusXM
5   DOUBL      NaN           23     4205.0   Podcast

Then I drop media_type_x and rename media_type_y to media_type:

final = m.loc[:,('code_name','media_type_y', 'acquisition', 'revenue')]
final = final.rename(columns={'media_type_y': 'media_type'})
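
As a side note, the drop-and-rename step can be avoided. pd.merge adds the _x and _y suffixes only when both frames carry a non-key column with the same name, so dropping the all-NaN media_type column before merging leaves no clash. A minimal sketch, assuming small stand-in frames (the `lookup` frame and its values are hypothetical, not the asker's real data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames described in the question.
media = pd.DataFrame({
    "code_name": ["RASH", "100", "NEWS"],
    "media_type": [np.nan, np.nan, np.nan],
    "revenue": [34004.0, 1035.0, 3475.0],
})
lookup = pd.DataFrame({
    "code_name": ["RASH", "NEWS"],
    "media_type": ["Radio", "SiriusXM"],
})

# Drop the all-NaN column first, so the merge has no name clash
# and therefore adds no _x/_y suffixes.
merged = media.drop(columns="media_type").merge(lookup, on="code_name", how="left")
print(merged)
```

With how="left", codes that are missing from the lookup frame (here "100") keep a NaN media_type, which matches the merged table shown above.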

That way, when I concatenate, I should have a complete df.

clean = pd.concat([media, client, code_type, date], axis=1)  

    code    media       acq.    revenue   client code_type  date
0   RASH    Radio       50.0    34004.0     NaN     NaN     NaT
1   100     NaN         10.0    1035.0      NaN     NaN     NaT
2   NEWS    SiriusXM    61.0    3475.0      NaN     NaN     NaT
3   DR      SiriusXM    53.0    4307.0      NaN     NaN     NaT
4   SPORTS  SiriusXM    45.0    6503.0      NaN     NaN     NaT
5   DOUBL   Podcast     13.0    4205.0      NaN     NaN     NaT


clean.client should be 2 in every row
clean.code_type should be P in every row
clean.date should be 08/15/2016 in every row

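Each df displays its data correctly on its own; the information is lost only when I concatenate. I think this may have something to do with the index, but I'm not sure. It may also be related to the fact that one column contains both str and int values (see clean.code above), which could be why I get the runtime warning listed below.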

//anaconda/lib/python3.5/site-packages/pandas/indexes/api.py:71: RuntimeWarning: unorderable types: int()

【Comments on the question】:

Tags: python pandas concatenation


【Solution 1】:

Starting from this:

  code_name media_type  acq.  revenue
0      RASH      Radio  50.0  34004.0
1       100        NaN  10.0   1035.0
2      NEWS   SiriusXM  61.0   3475.0
3        DR   SiriusXM  53.0   4307.0
4    SPORTS   SiriusXM  45.0   6503.0
5     DOUBL    Podcast  13.0   4205.0

Try this:

df['client_id'] = 2
df['date']      = '08/15/2016'
df['code_type'] = 'P'
df

    code_name media_type  acq.  revenue  client_id        date code_type
0      RASH      Radio  50.0  34004.0          2  08/15/2016         P
1       100        NaN  10.0   1035.0          2  08/15/2016         P
2      NEWS   SiriusXM  61.0   3475.0          2  08/15/2016         P
3        DR   SiriusXM  53.0   4307.0          2  08/15/2016         P
4    SPORTS   SiriusXM  45.0   6503.0          2  08/15/2016         P
5     DOUBL    Podcast  13.0   4205.0          2  08/15/2016         P
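
If the separate frames do need to be concatenated rather than rebuilt with constant columns, the index is worth checking first, as the question suspects. pd.concat with axis=1 aligns rows by index label, not by position, so frames whose labels differ produce NaN wherever a label is missing from one of them. A minimal sketch with made-up index labels (the asker's actual indexes are not shown, so the mismatch here is an assumption):

```python
import pandas as pd

# Hypothetical frames: same number of rows, but different index labels.
media = pd.DataFrame({"code_name": ["RASH", "100", "NEWS"]}, index=[0, 1, 2])
client = pd.DataFrame({"client_id": [2.0, 2.0, 2.0]}, index=[10, 11, 12])

# axis=1 aligns on index labels, so no row of `client` pairs with a row of `media`.
misaligned = pd.concat([media, client], axis=1)
print(misaligned)

# Resetting every index to 0..n-1 makes the rows pair up by position.
parts = [df.reset_index(drop=True) for df in (media, client)]
clean = pd.concat(parts, axis=1)
print(clean)
```

Comparing `media.index` with `client.index` (for example with `media.index.equals(client.index)`) before concatenating shows whether this is what is happening.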

【Comments】:
