【问题标题】:Pandas: enter missing rows in a dataframePandas: enter missing rows in a dataframe
【发布时间】:2022-12-01 22:30:08
【问题描述】:

I'm collecting time series data, but sometimes for some time points there is no data to be collected. Just say for example I am collecting data across four time points, I might get a dataframe like this:

df_ = pd.DataFrame({'group': ['A']*3+['B']*3,
                    'time': [1,2,4,1,3,4],
                    'value': [100,105,111,200,234,222]})

sometimes there is a datapoint missing and so there is no row for that point, I would like groupby and to forward fill with the previous value to create a new row form which would look like this:

df_missing_completed = pd.DataFrame({'group': ['A']*4+['B']*4,
                                     'time': [1,2,3,4,1,2,3,4],
                                     'value': [100, 101, 105,111,200, 202, 234,222]})

I had the idea that I could create an new dataframe as a template with all the dates and time points, without any values, join it with the real data which would induce NA's, and do a ffillon the value column to fill in the missing data, like below:

df_template = pd.DataFrame({'group': ['A']*4+['B']*4,
                                 'time': [1,2,3,4,1,2,3,4]})
df_final = pd.merge(df_template, df_, on = ['group', 'time'], how='left')
df_final['filled_values'] = df_final['value'].fillna(method='ffill')

but this seems like a messy solution, and with the real data the df_templete will be more complex to create. Does anyone know a better one? Thanks!

【问题讨论】:

  • You don't really do a ffill here, can you clarify the logic?
  • you may want to have a look at reindex with method='ffill'
  • Thanks @mozway. I edited my question to clarify how it would work with a template dataframe and ffill. But I don't find it to be a satisfying solution.

标签: python pandas


【解决方案1】:

I would use:

(df_.pivot(index='time', columns='group', values='value')
    .ffill().unstack().reset_index(name='value')
)

Output:

  group  time  value
0     A     1  100.0
1     A     2  105.0
2     A     3  105.0
3     A     4  111.0
4     B     1  200.0
5     B     2  200.0
6     B     3  234.0
7     B     4  222.0

【讨论】:

    猜你喜欢
    • 2022-12-19
    • 2022-11-20
    • 2022-12-01
    • 1970-01-01
    • 2022-12-01
    • 2021-11-30
    • 2022-12-02
    • 2010-09-23
    • 2022-12-15
    相关资源
    最近更新 更多