Calculate mean of weather station data in R: answers

【Question title】: Calculate mean of weather station data in R
【Posted】: 2014-10-22 20:23:53
【Question description】:

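I am working with a data set of about 800 weather stations, each with monthly temperature values from 1986 to 2014. The data has three columns: (1) station name, (2) date (year and month), and (3) temperature. In general, the data looks like this: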

STATION    DATE    TEMP
Station 1  198601  -15
Station 1  198602  -16
Station 1  201401  -10
Station 1  201402  -14
Station 2  198601  -11
Station 2  198602  -9
Station 2  201401  -5
Station 2  201402  -4

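I need to extract the mean temperature of each weather station for a given month over different ranges of years. For example, I might need the mean July temperature of each station for 1986-1990. My ideal output would be a new list or data frame giving the mean temperature of each station for the date range I specify.

I am sure this can be done with a for loop, but I am not very good at writing that kind of code. Any suggestions would be greatly appreciated.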

【Question discussion】:

Tags: r

【Solution 1】:

Using dplyr instead of data.table

weather <- data.frame(station = c("Station 1", "Station 1", "Station 1", "Station 1",
                              "Station 2", "Station 2", "Station 2", "Station 2"),
                  date = c(198601, 198602, 201401, 201402, 198601, 198602, 201401, 201402),
                  temp = c(-15, -16, -10, -14, -11, -9, -5, -4))


library(dplyr)
library(stringr)
# get month and year columns in data
# (year is converted to integer so it can be compared numerically later)
weather <- mutate(weather,
              year = as.integer(str_extract(as.character(date), "^\\d{4}")),
              month = str_extract(as.character(date), "\\d{2}$"))

# get the mean for each station for each month
mean_station <- group_by(weather, station, month) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))

If you only need this for a specific range of years, you can add a filter on year:

mean_station <- weather %>%
  filter(year >= 1986, year <= 2015) %>%
  group_by(station, month) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))
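The question asks for one month over a chosen range of years, so the filter can also include the month. The following is only a sketch of that idea: `station_month_mean` and its arguments are hypothetical names, and it assumes `date` is a six-digit `YYYYMM` number as in the sample data.

```r
library(dplyr)

# Same sample data as in the answer above.
weather <- data.frame(
  station = rep(c("Station 1", "Station 2"), each = 4),
  date = rep(c(198601, 198602, 201401, 201402), times = 2),
  temp = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

# Hypothetical helper: mean temperature per station for one calendar month
# across an inclusive range of years. Assumes date is numeric YYYYMM.
station_month_mean <- function(data, target_month, first_year, last_year) {
  data %>%
    mutate(year = date %/% 100,      # 198601 -> 1986
           month = date %% 100) %>%  # 198601 -> 1
    filter(month == target_month,
           year >= first_year,
           year <= last_year) %>%
    group_by(station) %>%
    summarise(mean_temp = mean(temp, na.rm = TRUE))
}

# January means for 1986-1990; the sample data has only 1986 in that range.
jan_1986_1990 <- station_month_mean(weather, 1, 1986, 1990)
print(jan_1986_1990)
```

For the July 1986-1990 example from the question, the call would be `station_month_mean(weather, 7, 1986, 1990)` on the full data set.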

【Discussion】:

【Solution 2】:

Something like this...?

> df$month <- substr(df$DATE, 5, 6)
> result <- aggregate(TEMP~STATION+month, mean, data=df)
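> # Note: Year is recycled across the rows below, so it does not indicate which years each mean covers
> data.frame(Year=unique(substr(df$DATE, 1, 4)), result)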
  Year  STATION month  TEMP
1 1986 Station1    01 -12.5
2 2014 Station2    01  -8.0
3 1986 Station1    02 -15.0
4 2014 Station2    02  -6.5

【Discussion】:

【Solution 3】:

Perhaps

library(data.table)
setDT(df)[, list(MeanTemp = mean(TEMP)), 
                by = list(STATION, Mon = substr(DATE, 5, 6))]

#      STATION Mon MeanTemp
# 1: Station 1  01    -12.5
# 2: Station 1  02    -15.0
# 3: Station 2  01     -8.0
# 4: Station 2  02     -6.5
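The result above averages over all years. To restrict it to a range of years, as the question asks, a row filter can go in the first argument of `[`. This is a minimal sketch assuming `DATE` is a six-digit `YYYYMM` number; the `Year` and `Mon` columns and the 1986-1990 range are chosen here for illustration.

```r
library(data.table)

# Sample data in the question's layout.
dt <- data.table(
  STATION = rep(c("Station 1", "Station 2"), each = 4),
  DATE = rep(c(198601, 198602, 201401, 201402), times = 2),
  TEMP = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

# Split the numeric YYYYMM date into year and month columns (added by reference).
dt[, `:=`(Year = DATE %/% 100, Mon = DATE %% 100)]

# Mean per station and month, using only rows from 1986 to 1990.
means_1986_1990 <- dt[Year >= 1986 & Year <= 1990,
                      .(MeanTemp = mean(TEMP, na.rm = TRUE)),
                      by = .(STATION, Mon)]
print(means_1986_1990)
```

Adding `Mon == 7` to the filter would keep only July.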

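【Discussion】:

【Solution 4】:

I am also learning R and may not be able to answer your question directly, but I would like to mention that the seas package is helpful for analysing this kind of data.

For example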

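library(seas)
pdf("test.pdf")  # send all plots to one PDF file
# STATION is assumed to be a vector of station ids and mdata the full data set
for (i in seq_along(STATION)) {
  d1 <- mksub(mdata, id = STATION[i])  # making a subset for each station based on name/unique id
  dat.ss <- seas.sum(d1)
  plot(dat.ss)
}
graphics.off()  # close all open graphics devices, including the PDF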

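You must make sure that the str() of your data set is in the format seas expects. With a data set this large, I suggest that loops and functions help make the analysis quick. If there is another way to loop, I would appreciate it if you could share it.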
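One common alternative to an explicit for loop in base R is to split the data by station and apply a function to each piece. The sketch below assumes the question's three-column layout rather than the format seas requires, and `weather_df` is a hypothetical name used only for this example.

```r
# Sample data in the question's layout (not the seas input format).
weather_df <- data.frame(
  STATION = rep(c("Station 1", "Station 2"), each = 4),
  DATE = rep(c(198601, 198602, 201401, 201402), times = 2),
  TEMP = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

# split() returns one data frame per station; sapply() then runs the
# same function on each piece and collects the results in a named vector.
by_station <- split(weather_df, weather_df$STATION)
station_means <- sapply(by_station, function(d) mean(d$TEMP, na.rm = TRUE))
print(station_means)
```

The function passed to `sapply()` can be replaced by any per-station computation, which keeps the per-station logic in one place without an index variable.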

【Discussion】: