【Question title】: Pandas Tick Data Averaging By Hour and Plotting For Each Week Of History
【Posted】: 2013-08-16 16:59:38
【Question】:

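I have been following the answer here:

Pandas: how to plot yearly data on top of each other

It takes a time series and plots the last data point of each day on a new chart. Each line on the chart represents one week of data (for example, 5 data points per week):

I used the following code to do this: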

#Chart by last price
daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s.iloc[-1])
daily.index = pd.MultiIndex.from_tuples(daily.index, names=['W', 'D'])
dofw = "Mon Tue Wed Thu Fri Sat Sun".split()
grid = daily.unstack('D').rename(columns=lambda x: dofw[x-1])
grid[-5:].T.plot()

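What I would like to do is aggregate by hour instead of by the last data point of the day (that is, average the data for each hour) and plot the hourly data for each week. The chart would look similar to the one in the linked image, except that each line would have 24 data points per day instead of just one.

Is there a way to paste a Pandas DataFrame into this post? When I click copy and paste, it pastes as a list.

Edit:

Final code, which accounts for incomplete data in the most recent week for charting purposes: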

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv(r"MYFILE.csv", header=None,
                 parse_dates=[0], index_col=0).resample('h').mean().dropna()
# Then we add a week field so we can filter it by the week
df['week'] = df.index.map(lambda x: x.isocalendar()[1])
# Sort the week numbers explicitly; a set has no guaranteed order
weeks = sorted(set(df['week']))
start_range = weeks[-3]
end_range = weeks[-1]
# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(start_range, end_range + 1):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values. The last week
    # may be incomplete, so it reuses the daily ticks of the previous week;
    # the weeks are already aligned, so no extra shift is needed.
    if week != end_range:
        resampled = dfw.resample('D').mean()
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()

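【Question discussion】:

  • You could export it as csv and paste a sample of your data, or put it in a gist. That would be very helpful.
  • Love the gist concept! Here is a link to the time series data I am using:
  • This is not a csv? How do you parse this?
  • Sorry. Taking one row as an example, "23:37.9 26.1" becomes two separate items: "23:37.9" is the index and "26.1" is the value.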
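
As a rough illustration of the first comment's suggestion, a small sample can be exported as CSV text, pasted into a post, and read back by whoever answers. The values and the column name `price` below are made up for the example and are not the asker's data.

```python
import io

import pandas as pd

# Hypothetical tick data: timestamp index, one price column (illustrative values)
df = pd.DataFrame(
    {"price": [26.1, 26.3, 26.2]},
    index=pd.to_datetime(
        ["2013-08-16 09:00:05", "2013-08-16 09:00:41", "2013-08-16 09:01:12"]
    ),
)
df.index.name = "time"

# Text that can be pasted straight into a question
sample_text = df.head(10).to_csv()
print(sample_text)

# Anyone reading the post can rebuild the frame from the pasted text
restored = pd.read_csv(io.StringIO(sample_text), index_col=0, parse_dates=[0])
```

Keeping the full timestamp in the sample avoids the ambiguity of truncated values such as "23:37.9".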

Tags: python matplotlib pandas


【Solution 1】:

The solution is explained in the code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv('trayport.csv', header=None,
                 parse_dates=[0], index_col=0).resample('h').mean().dropna()
# Then we add a week field so we can filter it by the week
df['week'] = df.index.map(lambda x: x.isocalendar()[1])

# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(1, 4):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values
    resampled = dfw.resample('D').mean()
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()

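The result looks like this:

【Discussion】:

  • The code works well, thank you. Any more thoughts on the X axis? Would it help if we made the chart bigger?
  • Could we make a simple legend/key with the week numbers? (Just 1 - 4 would be fine.)
  • I figured it out and updated the answer. It does everything you specified.
  • Looks great! One small thing: if the last week only has two days of data, the x labels only run from Monday to Tuesday. I tried adding an extra line to set the x axis based on the previous week (which should have complete data), but that did not work: dfw2 = df[df['week'] == week-1].copy()
  • Thanks for taking the time to look into this, by the way. Much appreciated!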
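
For the incomplete-last-week issue raised in the comments, one possible approach is to stop deriving the tick positions from the data and to plot against an "hour of week" axis instead, so the seven weekday ticks are always fixed. The sketch below is not the answerer's code: it assumes synthetic hourly data, and `hour_of_week` is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic hourly data (assumption): two full weeks plus two days of a third.
# 2013-07-29 is a Monday.
idx = pd.date_range("2013-07-29", periods=24 * 16, freq="h")
hourly = pd.Series(np.random.default_rng(0).normal(26, 1, len(idx)), index=idx)

def hour_of_week(index):
    """Hours elapsed since Monday 00:00 of the same week (0..167)."""
    return np.asarray(index.dayofweek * 24 + index.hour)

weekdays = "Mon Tue Wed Thu Fri Sat Sun".split()
# ISO week number of every timestamp, as in the code above
week_no = np.array([t.isocalendar()[1] for t in hourly.index])

fig, ax = plt.subplots()
for week, chunk in hourly.groupby(week_no):
    # Every week shares the same 0..167 x axis, so the lines overlap
    ax.plot(hour_of_week(chunk.index), chunk.values, label="week %s" % week)

# Tick positions are fixed at the start of each day, independent of the data
ax.set_xticks(np.arange(7) * 24)
ax.set_xticklabels(weekdays)
ax.legend()
```

With this layout a week that stops on Tuesday simply draws a shorter line, while the axis still shows Mon through Sun.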
【Solution 2】:

You can use the resample method (of a DataFrame or Series):

df.resample('H')

By default it uses how='mean' (that is, this averages the results by hour).
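
In pandas releases newer than the one this answer was written for, resample returns a resampler object and the aggregation is written as an explicit method call. A minimal sketch of the same hourly averaging, using synthetic ticks (the 10-minute spacing and the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic tick data (assumption): one tick every 10 minutes over 3 days
n = 6 * 24 * 3
ticks = pd.Series(
    np.arange(n, dtype=float),
    index=pd.date_range("2013-08-12", periods=n, freq="10min"),
)

# Average all ticks that fall within each hour
hourly = ticks.resample("h").mean()

# The first hour averages ticks 0..5, giving 2.5
print(hourly.head())
```

Older pandas versions spell the hourly alias as 'H', as in the answer above.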

【Discussion】:
