在 R 中使用 NetCDF 嵌套 for 循环答案

【问题标题】：Nested for loops with NetCDFs in R在 R 中使用 NetCDF 嵌套 for 循环
【发布时间】：2018-02-19 23:03:59
【问题描述】：

我非常感谢所有想帮助我解决问题的人我真的被困住了。但提前：这是一个复杂的话题，我尽力解释我打算用我的代码做什么。它是关于 NetCDF 文件中的气候数据，其中包含 1971 年至 2000 年和 2071 年至 2100 年期间的月度温度 (tas) 和降水量 (pr) 数据。历史时期的 nc 文件包含大约。 440x400 网格点（欧洲地图）。未来时期的 nc 文件包含 1x1 网格点（用于感兴趣的城市）。每个网格点有 360 个温度或降水值（取决于模型），30 年期间的每个月都有一个值。换句话说：每个网格点有360个点的分布。现在，我想迭代计算单个城市网格点（2071-2100）的分布与每个欧洲（1971-2000）网格点分布之间的统计差异。我将获得每个欧洲网格点的平均绝对距离。这个想法是在欧洲网格栅格中找到其温度或降水分布与未来感兴趣的城市分布最相似的网格点。我必须对 30 种不同的气候模型进行计算。

# List filenames of the directory

hist.files <- list.files("/historical", full.names = TRUE)
rcp.files <- list.files("/rcp", full.names = TRUE)

#Create array for desired ‘similarity indices’. One matrix per climate model run.

sim.array <- array(NA, dim = c(440,400,30))

#Looping through the models of the period 1971-2000. Some containing precipitation data others temperature (see if…else) 

for(k in 1:length(hist.files))   {
        hist.data <- nc_open(hist.files[k])   

   if(grepl("pr", hist.data$filename)){
    hist.tas <- ncvar_get(hist.data, "pr")
        }else{
    hist.tas <- ncvar_get(hist.data, "tas") 
    hist.tas <- kelvin.to.celsius(hist.tas, round=2)
   }

#Looping through the models of the 2071 to 2100 period (city). Some containing precipitation data others temperature (see if…else)

for(r in 1:length(rcp.files)) {
    rcp.data <- nc_open(rcp.files[r])
    if(grepl("pr", rcp.data$filename)){
    rcp.tas <- ncvar_get(rcp.data, "pr") 
        }else{
    rcp.tas <- ncvar_get(rcp.data, "tas")
    rcp.tas <- kelvin.to.celsius(rcp.tas, round=2)
        }

#This if statement because hist contains more models than rcp and I want to exclusively use the models contained in both of them.  

if(hist.data %in% rcp.data) {  

#Looping through the grid points of ‘hist’ model k. Lastly the function that calculates for each grid point of the model a difference value (always to the one grid point of ‘rcp’). My idea of the break statement was to loop nrow and ncol the same times, but I’m not sure if break does what I intended to.       

for(i in 1:nrow(hist.tas)) { 
       for(j in 1:ncol(hist.tas)) {
    sim.array[i,j,k] <- abs(sum(rcp.tas - hist.tas[i,j,])/360)
break
    }
  print(sim.array[i,j,k])
  }
 }
}   
}
sim.array[1,1,1]

嗯，我得到了一个充满 NA 的数组。没有错误消息，但是出了点问题！谁能找到错误？我很感激任何帮助。非常感谢您！

更新：您的建议似乎是一个合理的解决方案！直到现在我还没有时间申请它们，但我会稍后再做！我一直在考虑向量化，但没有设法从 3 维数组中制作向量，而最终没有一个充满不同向量的凌乱代码......我都不知道如何删除与 hist 和 rcp 不匹配的模型。使用 intersect 和 %in% 我知道不匹配文件的索引...但是必须有比手动记录所有这些索引以进行删除更好的方法，不是吗？请查看一些历史文件名：

> hist.files.tas <- list.files("/historical", full.names = TRUE, pattern = "tas")
> hist.files.tas
 [1] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc"   
 [2] "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc"       
 [3] "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc"  
 [4] "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc"           
 [5] "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc"         
 [6] "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc"                
 [7] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc"             
 [8] "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc"              
 [9] "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc"                 
[10] "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc"          
[11] "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc"               
[12] "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc"         
[13] "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc"             
[14] "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc"

有更多带有变量 tasmax 和 tasmin 的模型。 hist 总共有 71 个文件，而 rcp 只有 30 个。你能给我举个例子，说明如何编写自动代码来删除不匹配的 hist 文件吗？十分感谢！

【问题讨论】：

您查看过 1 个文件（summary、plot）吗？是否有任何 NA（-32767 或 -9999）？ sum(..., na.rm = TRUE) 有帮助吗？
是的，不幸的是，NAs 不是导致错误的原因...谢谢！

标签： r for-loop nested-loops netcdf4

【解决方案1】：

在我看来，以下内容毫无意义，并且总是 FALSE：

if (hist.data %in% rcp.data)

所以sim_array 没有任何反应

我会先做这样的事情：

hist.files.pr <- list.files("/historical", full.names = TRUE, pattern="pr")
hist.files.tas <- list.files("/historical", full.names = TRUE, pattern="tas")
rcp.files.pr <- list.files("/rcp", full.names = TRUE, pattern="pr")
rcp.files.tas <- list.files("/rcp", full.names = TRUE, pattern="tas")

此时您可以从“hist”中删除不在“rcp”中的模型的文件

hist.files.tas <- c( "/historical/tas_CNRM-CERFACS-CNRM-CM5_CLMcom-CCLM4-8-17_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_CNRM-ALADIN53_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_RMIB-UGent-ALARO-0_r1i1p1.nc", "/historical/tas_CNRM-CERFACS-CNRM-CM5_SMHI-RCA4_r1i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_CLMcom-CCLM4-8-17_r12i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_DMI-HIRHAM5_r3i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r12i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_KNMI-RACMO22E_r1i1p1.nc", "/historical/tas_ICHEC-EC-EARTH_SMHI-RCA4_r12i1p1.nc", "/historical/tas_IPSL-IPSL-CM5A-MR_INERIS-WRF331F_r1i1p1.nc", "/historical/tas_IPSL-IPSL-CM5A-MR_SMHI-RCA4_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_CLMcom-CCLM4-8-17_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_KNMI-RACMO22E_r1i1p1.nc", "/historical/tas_MOHC-HadGEM2-ES_SMHI-RCA4_r1i1p1.nc")

# in this example, fut files is a subset of hist files; that should be OK if their filename structure is the same

rcp.files.tas <- hist.files.tas[1:7]

getModels <- function(ff) {
    base <- basename(ff)
    s <- strsplit(base, "_")
    sapply(s, function(i) i[[2]])
}

getHistModels <- function(hist, fut) {
    h <- getModels(hist)
    uh <- unique(h)
    uf <- unique(getModels(fut))
    uhf <- uh[uh %in% uf]
    hist[h %in% uhf]
}


hist.files.tas.selected <- getHistModels(hist.files.tas, rcp.files.tas)
# hist.files.pr.selected <- getHistModels(hist.files.pr, rcp.files.pr)

可以通过执行以下操作来避免双循环 (k, r)：

library(raster)
his.pr <- values(stack(hist.files.pr.selected, var="pr")))
his.tas <- values(stack(hist.files.tas.selected, var="tas"))
rcp.pr <- values(stack(hist.files.pr, var="pr"))
rcp.tas <- values(stack(hist.files.tas, var="tas"))

并且可能也可以避免行和列上的 (i, j) 循环。 R是矢量化的。也就是说，您可以执行(1:10) - 2 之类的操作。

无论哪种方式，使用所有这些嵌套循环都很难阅读您的代码。如果您确实需要它们，最好调用函数。如需更多帮助，请提供一些示例数据而不是我们没有的文件，或提供一些文件。

【讨论】：

您好罗伯特，感谢您的回答！请参阅我上面关于文件名的更新。
我已经使用这些文件名扩展了我的答案

【解决方案2】：

因为在我的数据集中，除了“tas”和“pr”之外，实际上还有两个变量“tasmax”和“tasmin”，Robert 的方法对我的案例来说已经写了很多。因此，我尝试了另一种方法，最终成功了，尽管它没有单独列出每个变量的文件（一个缺点，是的！）。

history 和 rcp 的列表和匹配文件：

为了匹配文件，我需要没有目录的文件的纯名称，否则 which(!hist %in% rcp) 将始终为 FALSE（如 Robert 所示）。

hist

no.match.h

由于我需要 nc_open 包含目录的文件名，我必须创建一个相应的文件列表并减去不匹配的文件

hist.files

hist.files.cl

rcp.files.cl

【讨论】：