DateFormatter returning wrong dates - Matplotlib/Pandas [closed]

【Question title】: DateFormatter returning wrong dates - Matplotlib/Pandas [closed]
【Posted】: 2013-09-22 13:42:39
【Question】:

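I am trying to plot some data using matplotlib and pandas. However, when I use DateFormatter, the dates are rendered incorrectly depending on what I filter out of the DataFrame:

In the two examples below, matplotlib renders the dates as "August 20 00 2013", as expected: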

df['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

df[df['metric1']>1000]['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

But with the code below, the dates are rendered as "February 01 00 1048":

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

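【Comments】:

It is hard to diagnose the problem without seeing some of the data.
Possibly related: stackoverflow.com/questions/13988111/…, since pandas is still messing with the date-handling code.
The dates in the original file look like "2013-08-18 00", followed by the browser (in the format shown above) and 3 metrics. Here is how I read the data from the file into pandas: def dateParserHour(time_string): return datetime.datetime.strptime(time_string, '%Y-%m-%d %H') and pd.read_table('file.txt', index_col=0, parse_dates=True, date_parser=dateParserHour)
Could you just show df.head() or some other subset of the data instead of trying to describe it? Thanks.
I found a workaround. For some reason, matplotlib does not work with my TimeSeries when I plot the third example above. If I rebuild the index with the code below and then plot (using the same DateFormatter() call), it works fine. df2 = df[df['browser']=='Chrome/29']['metric2']; df2.index = df2.index.astype(datetime.datetime);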
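
One possible explanation for the "for some reason" above, offered as a hypothesis rather than something confirmed in this thread: when the filtered series is evenly spaced by the hour, pandas may plot it using its own x values (hours since 1970-01-01), while DateFormatter reads x values as matplotlib date numbers (days, with 1.0 at 0001-01-01 on the legacy scale in use in 2013). The arithmetic below is a minimal sketch under those two assumptions; the variable names are illustrative only.

```python
import datetime as dt

# Assumption: pandas plotted the hourly series using "hours since 1970-01-01"
# as x values instead of matplotlib date numbers.
first_point = dt.datetime(2013, 8, 18, 0)
hours_since_1970 = int((first_point - dt.datetime(1970, 1, 1)).total_seconds() // 3600)

# Assumption: the legacy matplotlib date scale, where a date number counts days
# and 1.0 corresponds to 0001-01-01 (the same scale as Python date ordinals).
misread = dt.date.fromordinal(hours_since_1970)

print(hours_since_1970)     # 382440
print(misread.isoformat())  # 1048-02-01
```

Under these assumptions an hourly timestamp from August 2013 is read back as February 1, 1048, which matches the label reported in the question. It would also fit the observation that the irregularly spaced subsets plot correctly.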

Tags: python matplotlib pandas

【Answer 1】:

We need a concrete set of data and a program to refer to. There is no problem here:

data.txt:

2013-08-18 00   IE  1000    500 3000
2013-08-19 00   FF  2000    250 6000
2013-08-20 00   Opera   3000    450 9000
2001-03-21 00   Chrome/29   3000    450 9000
2013-08-21 00   Chrome/29   3000    450 9000
2014-01-22 00   Chrome/29   3000    750 9000

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as md
import datetime as dt


df = pd.read_table(
    'data.txt', 
    index_col=0, 
    parse_dates=True,
    date_parser=lambda s: dt.datetime.strptime(s, '%Y-%m-%d %H'),
    header=None,
    names=['browser', 'metric1', 'metric2', 'metric3']
)

print(df)

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))
plt.draw()
plt.show()


--output:--
              browser  metric1  metric2  metric3
2013-08-18         IE     1000      500     3000
2013-08-19         FF     2000      250     6000
2013-08-20      Opera     3000      450     9000
2001-03-21  Chrome/29     3000      450     9000
2013-08-21  Chrome/29     3000      450     9000
2014-01-22  Chrome/29     3000      750     9000

And adjusting the axes so that you can see the points better (setting the date range of the x-axis and the range of the y-axis):

...
df[df['browser']=='Chrome/29']['metric2'].plot(style='r--')
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))

ax.set_xlim(dt.datetime(2000, 1, 1,), dt.datetime(2017, 1, 1))
ax.set_ylim(400, 1000)
...
...

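As long as you refuse to post a minimal example along with the data that produces the output you don't want...

【Comments】:

I don't understand why this answer was downvoted.
I initially downvoted because all this answer shows is the expected behavior (not really useful, since the OP is not seeing that behavior). However, the downvote was probably overkill. My apologies.
Sorry for the delayed reply. The sample I prepared is the same as the one above. The only difference is that my index is named "hour" (the label of the column in the original file). I created a new file containing only the first 5 rows of the original file and re-ran the analysis. When I do that, the dates in matplotlib display as expected. Could some value in the TimeSeries in the original file be causing the problem? Just scanning the unique values in the original file, I did not find anything wrong.
"The only difference is that my index is named 'hour'": can you explain what that means?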
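
For readers who hit the same symptom, here is a minimal sketch of the idea behind the OP's workaround: hand matplotlib plain datetime objects so that the x-axis holds matplotlib date numbers, which is what DateFormatter expects. The hourly data is made up for illustration and is not the OP's file. Whether pandas' own tick handling caused the original problem is an assumption.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical evenly spaced hourly series standing in for the filtered 'metric2'.
start = dt.datetime(2013, 8, 18)
index = pd.DatetimeIndex([start + dt.timedelta(hours=i) for i in range(48)])
series = pd.Series(range(48), index=index, name="metric2")

fig, ax = plt.subplots()
# Plot through matplotlib directly, passing Python datetime objects as x values,
# so the axis uses matplotlib's own date units.
ax.plot(series.index.to_pydatetime(), series.values)

formatter = md.DateFormatter("%B %d %H %Y")
ax.xaxis.set_major_formatter(formatter)
fig.canvas.draw()

# The axis limits are now matplotlib date numbers that map back to 2013.
lo, hi = ax.get_xlim()
print(md.num2date(lo).year, md.num2date(hi).year)
```

pandas also documents an x_compat=True option for its plot() method that suppresses its own tick resolution adjustment. It may be a shorter route to the same result, though it was not tried in this thread.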