Answers: Calculate standard deviation and plot a curve fit using Python 2.7

【Question title】: Calculate standard deviation and plot the curve fit using Python 2.7
【Posted】: 2016-03-31 07:54:05
【Question】:

I want to write Python code that analyzes the standard deviation of each month across the 100 years of data on this web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm).

  datum  m_ta m_tax     m_taxd m_tan     m_tand
------- ----- ----- ---------- ----- ----------
1901-01  -4.7   5.0 1901-01-23 -12.2 1901-01-10
1901-02  -2.1   3.5 1901-02-06  -7.9 1901-02-15
1901-03   5.8  13.5 1901-03-20   0.6 1901-03-01
1901-04  11.6  18.2 1901-04-10   7.4 1901-04-23
1901-05  16.8  22.5 1901-05-31  12.2 1901-05-05
1901-06  21.0  24.8 1901-06-03  14.6 1901-06-17
1901-07  22.4  27.4 1901-07-30  16.9 1901-07-04
1901-08  20.7  25.9 1901-08-01  14.7 1901-08-29
....
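If the page is saved or copied as plain text, the table can also be parsed without scraping. Below is a minimal sketch. It assumes the text keeps the layout shown above: a header line, a dashed ruler line, and then whitespace-separated rows. `TEXT` and `parse_table` are hypothetical names, only the first three rows are included, and the code targets Python 3 with a recent pandas.

```python
import io

import pandas as pd

# First rows of the table above, pasted as plain text (hypothetical input;
# the real page has about 100 years of rows).
TEXT = """\
  datum  m_ta m_tax     m_taxd m_tan     m_tand
------- ----- ----- ---------- ----- ----------
1901-01  -4.7   5.0 1901-01-23 -12.2 1901-01-10
1901-02  -2.1   3.5 1901-02-06  -7.9 1901-02-15
1901-03   5.8  13.5 1901-03-20   0.6 1901-03-01
"""


def parse_table(text):
    """Parse the whitespace-separated table into a DataFrame indexed by month."""
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",    # columns are separated by runs of spaces
        skiprows=[1],  # drop the '------- -----' ruler line
    )
    # '1901-01' -> Timestamp('1901-01-01'), which makes df.index.month available
    df["datum"] = pd.to_datetime(df["datum"], format="%Y-%m")
    return df.set_index("datum")


df = parse_table(TEXT)
print(df[["m_ta", "m_tax", "m_tan"]])
```

The date columns `m_taxd` and `m_tand` stay as strings here. If they are needed as dates, pass them through `pd.to_datetime` in the same way.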

The standard deviation code I wrote is:

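import pandas as pd

def sd(x):
    # x: DataFrame indexed by 'YYYY-MM' strings, e.g. '1901-01'
    rows = []
    for e in range(1, 13):
        # rows of calendar month e: index contains '-01', '-02', ..., '-12'
        r = x[x.index.str.contains("-" + str(e).zfill(2))]
        # std of each numeric column (skips string columns such as m_taxd/m_tand)
        rows.append(r.std(numeric_only=True))
    # DataFrame.append was removed in pandas 2.0, so build the result in one go.
    # Row 0 is January, ..., row 11 is December.
    return pd.DataFrame(rows).reset_index(drop=True)

standard = sd(df)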
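The month-by-month loop in `sd()` can also be written as a single `groupby`. This is a minimal sketch that assumes the index holds `'YYYY-MM'` strings, as in the sample table. `monthly_std` and `demo` are hypothetical names, and the demo values are made up.

```python
import pandas as pd


def monthly_std(x):
    """Standard deviation of every numeric column, one row per calendar month.

    Assumes x is indexed by 'YYYY-MM' strings, as in the question's table.
    """
    month = x.index.str[5:7]                       # '1901-01' -> '01'
    numeric = x.select_dtypes(include="number")    # skip string/date columns
    return numeric.groupby(month).std()            # sample std (ddof=1)


# Tiny made-up example: two years, January and February only.
demo = pd.DataFrame(
    {"m_ta": [-4.7, -2.1, -1.0, 0.5], "m_taxd": ["a", "b", "c", "d"]},
    index=["1901-01", "1901-02", "1902-01", "1902-02"],
)
print(monthly_std(demo))
```

Grouping on the month label lets pandas do the splitting. The result is indexed by `'01'` to `'12'` instead of by row position.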

Now I want to plot m_ta together with a curve fitted to the data. Can someone help me with how to plot it? Thanks!
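The question does not say which curve should be fitted. For monthly temperatures, one common choice is a seasonal sine/cosine term plus a linear trend, but that choice is an assumption and is not taken from the thread. Ordinary least squares fits this model with `numpy.linalg.lstsq`. `fit_annual_cycle` is a hypothetical helper, and the demo uses synthetic noise-free data so the recovered coefficients can be checked.

```python
import numpy as np
import pandas as pd


def fit_annual_cycle(series):
    """Least-squares fit of a + b*t + c*cos(2*pi*t) + d*sin(2*pi*t).

    series: monthly values with a DatetimeIndex (e.g. df.m_ta).
    t is time in years since January of the first year, so b is a linear
    trend per year and (c, d) describe the seasonal cycle.
    Returns the coefficients [a, b, c, d] and the fitted curve as a Series.
    """
    series = series.dropna()  # lstsq cannot handle NaN
    idx = series.index
    t = (np.asarray(idx.year - idx.year[0], dtype=float)
         + (np.asarray(idx.month) - 1) / 12.0)
    X = np.column_stack([
        np.ones_like(t),
        t,
        np.cos(2 * np.pi * t),
        np.sin(2 * np.pi * t),
    ])
    coef = np.linalg.lstsq(X, series.to_numpy(dtype=float), rcond=None)[0]
    return coef, pd.Series(X @ coef, index=idx)


# Synthetic check: 10 years of noise-free monthly data with known coefficients.
idx = pd.date_range("1901-01-01", periods=120, freq="MS")
t = np.arange(120) / 12.0
demo = pd.Series(10 + 0.01 * t - 11 * np.cos(2 * np.pi * t) + 2 * np.sin(2 * np.pi * t),
                 index=idx)
coef, fitted = fit_annual_cycle(demo)
print(np.round(coef, 3))
```

To draw the fit over the real data, call `coef, fitted = fit_annual_cycle(df.m_ta)`, then `ax = df.m_ta.plot()` and `fitted.plot(ax=ax)`. If only a long-term trend is wanted, `np.polyfit(t, y, 1)` on the time axis is the simpler option.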

【Question discussion】:

Tags: python-2.7 pandas

【Answer 1】:

A popular plotting library is matplotlib, and pandas provides a convenient interface to it. To draw a line plot, you just call df.column_name.plot().
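Here is a minimal, self-contained illustration of `df.column_name.plot()`. It uses the first four `m_ta` values from the question's table and selects matplotlib's off-screen `Agg` backend so it also runs without a display. That backend choice is an assumption for headless use and is not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The first four m_ta values from the question's table.
df = pd.DataFrame(
    {"m_ta": [-4.7, -2.1, 5.8, 11.6]},
    index=pd.date_range("1901-01-01", periods=4, freq="MS"),
)
ax = df.m_ta.plot(title="m_ta")  # Series.plot() draws with matplotlib, returns the Axes
ax.set_xlabel("month")
# plt.show()  # uncomment in an interactive session to open the window
```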

Anyway, I hope this helps:

import requests
from lxml import html

# GET THE DATA
# body > div > pre > font
tree = html.fromstring(requests.get('http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm').text)
lines = [l.text.split() for l in tree.xpath('//body/div/pre/font')]

# IMPORT DATA INTO PANDAS
import pandas as pd
import numpy as np

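df = pd.DataFrame(lines[2:], columns=lines[0])
# convert_objects() was deprecated and later removed from pandas;
# convert the numeric columns explicitly instead
num_cols = ['m_ta', 'm_tax', 'm_tan']
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')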
df['datum'] = pd.to_datetime(df.datum, format='%Y-%m')
df = df.set_index('datum')

print('Standard deviation of m_ta: %f' % df.m_ta.std())  # works in Python 2 and 3

# PLOT
from matplotlib import pyplot as plt
df.m_ta.plot()
plt.show()

The std is 7.962143. Here is the resulting plot: [image not available]
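The 7.962143 above is the standard deviation over every month of the series at once. For temperatures, it is probably dominated by the summer/winter difference. The question asks about something else: the year-to-year variation within each calendar month. The sketch below contrasts the two quantities on synthetic data, using a made-up seasonal cycle plus noise rather than the Budapest series. All names and parameters are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data: 100 years of months = fixed seasonal cycle + random noise.
rng = np.random.default_rng(0)
idx = pd.date_range("1901-01-01", periods=100 * 12, freq="MS")
month = np.asarray(idx.month)
season = 10 - 11 * np.cos(2 * np.pi * (month - 1) / 12.0)
m_ta = pd.Series(season + rng.normal(0, 2, len(idx)), index=idx)

overall = m_ta.std()                              # one value for the whole series
per_month = m_ta.groupby(m_ta.index.month).std()  # one value per calendar month
print("overall std: %.2f" % overall)
print(per_month.round(2))
```

On the real data, the per-month equivalent is `df.m_ta.groupby(df.index.month).std()`.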

【Discussion】:

Thanks. I understand the std plot, but I'm still stuck on the curve fit.
I think I misread the question. Anyway, @Alexander has already solved it.

【Answer 2】:

Thanks to @Yakym for providing a way to load the data as df.

Once you have it, you can extract the month and use it for grouping:

df['month'] = df.index.month
df['monthly_mean'] = df.groupby('month').m_ta.transform('mean')
df['monthly_std'] = df.groupby('month').m_ta.transform('std')
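The answer uses `transform` rather than an aggregation such as `mean()`. `transform` returns a result with one value per original row, so it can be assigned straight back as a column. An aggregation returns one value per group. The small sketch below, using made-up values, shows the difference.

```python
import pandas as pd

# Made-up values: two Januaries and two Februaries.
df = pd.DataFrame({
    "month": [1, 2, 1, 2],
    "m_ta": [-4.7, -2.1, -1.0, 0.5],
})
g = df.groupby("month")["m_ta"]

per_month = g.mean()           # aggregated: one value per month (index 1, 2)
per_row = g.transform("mean")  # broadcast: one value per row of df
df["monthly_mean"] = per_row   # aligns on df's index, so it can be a column
print(per_month)
print(df)
```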

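Since there is too much data to view in a single chart, you may want to treat each month's data as a separate DataFrame. I used a dictionary comprehension to do this.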

dfs = {m: df.loc[df.month == m, :] for m in df.month.unique()}
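The dictionary comprehension above filters `df` once for every month. A possibly simpler alternative is to iterate over the `groupby` object, which yields `(month, sub-DataFrame)` pairs. This is a swap for the answer's approach, not something the answer uses. The data below is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"m_ta": [-4.7, -2.1, -1.0, 0.5]},
    index=pd.to_datetime(["1901-01", "1901-02", "1902-01", "1902-02"], format="%Y-%m"),
)
df["month"] = df.index.month

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so every month is split off in a single pass over the data.
dfs = {m: g for m, g in df.groupby("month")}
print(dfs[1])
```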

Now you can look at the results for each month separately. For example, here is January:

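import matplotlib.pyplot as plt

n = 1  # month to show (1 = January)
ax = dfs[n].m_ta.plot(title='Month {0}'.format(n))
dfs[n].monthly_mean.plot(ax=ax)
(dfs[n].monthly_mean + dfs[n].monthly_std).plot(ax=ax)  # mean + 1 std
(dfs[n].monthly_mean - dfs[n].monthly_std).plot(ax=ax)  # mean - 1 std
plt.show()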
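The monthly mean and std are constant within one month, so the mean ± 1 std band can also be drawn as a shaded strip with matplotlib's `axhspan` instead of two extra lines. This is a variant of the answer's plot, not taken from it. The January values below are made up and stand in for `dfs[1].m_ta`. The `Agg` backend is assumed only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() interactively instead
import matplotlib.pyplot as plt
import pandas as pd

# Made-up January values for five years (a stand-in for dfs[1].m_ta).
jan = pd.Series(
    [-4.7, -1.0, -2.3, 0.8, -3.1],
    index=pd.to_datetime(["1901-01", "1902-01", "1903-01", "1904-01", "1905-01"],
                         format="%Y-%m"),
)
mean, std = jan.mean(), jan.std()  # identical for every row of the month

fig, ax = plt.subplots()
ax.plot(jan.index.to_pydatetime(), jan.to_numpy(), marker="o", label="m_ta")
ax.axhline(mean, color="black", label="mean")
ax.axhspan(mean - std, mean + std, alpha=0.2, label="mean ± 1 std")
ax.set_title("Month 1")
ax.legend()
```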

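【Discussion】:

@Alexander... Thank you very much. Did you plot all of m_ta, m_tax and m_tan here?
No, this is only m_ta. It looks like m_tax and m_tan may be the maximum and minimum for the given month, respectively. m_ta: monthly mean temperature; m_tax: highest daily mean temperature of the month; m_tan: lowest daily mean temperature of the month.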
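The column interpretation in the comment above can be sanity-checked on the data. If m_tan and m_tax are the lowest and highest daily means of the month, then each row should satisfy m_tan ≤ m_ta ≤ m_tax. The sketch below runs that check on the eight rows shown in the question. The check itself is an illustration and is not part of the thread.

```python
import pandas as pd

# The eight rows shown in the question's table.
df = pd.DataFrame(
    {
        "m_ta": [-4.7, -2.1, 5.8, 11.6, 16.8, 21.0, 22.4, 20.7],
        "m_tax": [5.0, 3.5, 13.5, 18.2, 22.5, 24.8, 27.4, 25.9],
        "m_tan": [-12.2, -7.9, 0.6, 7.4, 12.2, 14.6, 16.9, 14.7],
    },
    index=pd.date_range("1901-01-01", periods=8, freq="MS"),
)

# The monthly mean should lie between the lowest and highest daily mean.
consistent = (df.m_tan <= df.m_ta) & (df.m_ta <= df.m_tax)
print(consistent.all())
```

With the full 100-year table, `df[['m_ta', 'm_tax', 'm_tan']].groupby(df.index.month).std()` gives the per-month standard deviation of all three columns in one call.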