Extract data using a matching matrix pair of data in R: answers

【Question title】: Extract data using a matching matrix pair of data in R
【Posted】: 2012-10-30 21:53:16
【Question】:

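I have two datasets containing latitude, longitude, and temperature data. One dataset corresponds to a geographic region of interest; its lat/long pairs make up the boundary and the interior of that region (matrix dimensions = 4518x2).

The other dataset contains lat/long and temperature data for a larger region that encloses the region of interest (matrix dimensions = 10875x3).

My question is: how do I extract the appropriate rows (latitude, longitude, temperature) from the second dataset that match the lat/long data in the first dataset?

I have tried various "for loops", "subset", and "unique" commands, but I cannot get the matching temperature data.

Thanks in advance!

Edit 10/31: I forgot to mention that I am using "R" to process this data.

The lat/long data for the region of interest is provided as a list of 4,518 files, each with the lat/long coordinates in its name: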

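x <- dir()                  # file names, one file per lat/long pair

lenx <- length(x)

g <- strsplit(x, "_")       # every name is assumed to split into exactly 3 parts
parts <- unlist(g)          # flatten once, outside the loop

coord1 <- matrix(NA, nrow = lenx, ncol = 1)
coord2 <- matrix(NA, nrow = lenx, ncol = 1)

for (i in seq_len(lenx)) {
  coord1[i, 1] <- parts[2 + 3 * (i - 1)]   # 2nd part of the i-th file name
  coord2[i, 1] <- parts[3 + 3 * (i - 1)]   # 3rd part of the i-th file name
}

coord1 <- as.numeric(coord1)
coord2 <- as.numeric(coord2)

coord <- cbind(coord1, coord2)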
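
For comparison, the same parsing can be written without an explicit loop. This is only a sketch: the file names below are hypothetical stand-ins of the form prefix_number_number, since the post does not show the real names.

```r
# Hypothetical file names (made up for illustration)
x <- c("cell_45.5_-100.25", "cell_46_-99.75", "cell_46.5_-99.25")

parts <- strsplit(x, "_", fixed = TRUE)

# Pick the 2nd and 3rd piece of every name and convert to numbers
coord1 <- as.numeric(vapply(parts, `[`, character(1), 2))
coord2 <- as.numeric(vapply(parts, `[`, character(1), 3))

coord <- cbind(coord1, coord2)
```

Using vapply() rather than indexing into unlist(g) means a file name with more or fewer than three pieces does not shift the positions of all later coordinates.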

The lat/long and temperature data were obtained from an NCDF file that holds temperature data for 10,875 lat/long pairs:

# Dimension sizes of the "Temp" variable
long <- tempcd$var[["Temp"]]$size[1]
lat  <- tempcd$var[["Temp"]]$size[2]
time <- tempcd$var[["Temp"]]$size[3]
proj <- tempcd$var[["Temp"]]$size[4]

temp   <- matrix(NA, nrow = lat * long, ncol = time)
lat_c  <- matrix(NA, nrow = lat * long, ncol = 1)
long_c <- matrix(NA, nrow = lat * long, ncol = 1)

counter <- 1

# Read the full time series for one grid cell at a time
for (i in 1:lat) {
    for (j in 1:long) {
        temp[counter, ] <- get.var.ncdf(tempcd, varid = "Temp", count = c(1, 1, time, 1), start = c(j, i, 1, 1))
        counter <- counter + 1
    }
}

temp_gcm <- cbind(lat_c, long_c, temp)

So the question now is: how do I extract from "temp_gcm" the values that correspond to the lat/long pairs in "coord"?

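【Question comments】:

A very interesting question. Does the set of lat/long pairs for the region of interest simply bound the region, or is it the set of all lat/long pairs in the region that have temperature data?
What language are we using here? Could we get a short code sample showing what your data structures look like?
@Nathaniel Ford: The lat/long dataset corresponds to the region boundary and to the centroid of every grid cell inside the region of interest.
@slashingweapon Oh, right, I am using the "R" language to process this data. I will provide an example of the data structures soon!
Can you separate the boundary points from the grid centroid points? You could use the boundary points to create a polygon and then use one of the "point in polygon" functions (e.g. in package sp) to select the points that lie inside the region.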
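
The point-in-polygon idea from the last comment can be sketched as follows. Package sp provides point.in.polygon() for this purpose; the sketch below swaps in a small hand-written ray-casting helper (point_in_poly, a hypothetical name) so that it runs in base R. It assumes the boundary points are available separately and listed in order around the region, which the post does not confirm, and all coordinates are made up.

```r
# Ray-casting test: TRUE where point (px, py) lies inside the polygon whose
# vertices (vx, vy) are listed in order around the boundary.
point_in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n                                   # previous vertex, starts at the last one
  for (i in seq_len(n)) {
    # Does edge (j -> i) cross the horizontal ray running right from the point?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)         # an odd number of crossings means inside
    j <- i
  }
  inside
}

# Made-up square region and three candidate grid points
bound_long <- c(0, 4, 4, 0)
bound_lat  <- c(0, 0, 4, 4)
grid_long  <- c(2, 5, -1)
grid_lat   <- c(2, 2, 1)

inside <- point_in_poly(grid_long, grid_lat, bound_long, bound_lat)
```

The logical vector inside can then be used to subset the rows of the larger dataset. Points lying exactly on the boundary are not handled specially in this sketch.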

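Tags: r extract match subset

【Solution 1】:

Noe,

I can think of a number of ways to do this. The simplest, though not the most efficient, is to use R's which() function, which takes a logical argument, while iterating over the data frame you want to match against. Of course, this assumes that there is at most one match in the larger dataset. Based on your dataset, I would do something like this: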

# temp_gcm and coord are matrices, so pull the needed columns out as plain vectors
lat_c  <- temp_gcm[, 1]
long_c <- temp_gcm[, 2]
temp   <- temp_gcm[, 3]    # first temperature column
coord1 <- coord[, 1]
coord2 <- coord[, 2]

matched.temp <- rep(NA_real_, nrow(coord)) # To store matching results
for (i in seq_len(nrow(coord))) {
   hit <- which(lat_c == coord1[i] & long_c == coord2[i])
   if (length(hit) > 0) matched.temp[i] <- temp[hit[1]]   # stays NA when nothing matches
}

# Now add the results as a column next to the coordinates (indexes match)
coord <- cbind(coord, temperature = matched.temp)

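The call which(lat_c == coord1[i] & long_c == coord2[i]) returns a vector of all rows in temp_gcm for which lat_c and long_c match coord1 and coord2, respectively, for row i of the iteration (note: I'm assuming this vector has length 1, i.e. there is only one possible match). matched.temp[i] is then assigned the value of the column temp of temp_gcm at the row that satisfies the logical condition. Note that the goal is to build a vector whose matched values correspond, by index, to the rows of coord.

I hope this helps. Note that this is a basic approach; I suggest you look up the functions merge() and apply() to do this more concisely.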
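
A minimal sketch of the merge() route mentioned above, assuming both tables can be turned into data frames and that a single temperature column is enough for illustration. The toy values and the column names on temp_gcm are made up; only the names coord1, coord2, lat_c, long_c, and temp come from the post.

```r
# Toy stand-ins for the matrices in the question (values are made up)
coord <- cbind(coord1 = c(45.5, 46.0),
               coord2 = c(-100.25, -99.75))
temp_gcm <- cbind(lat_c  = c(45.0, 45.5, 46.0, 46.5),
                  long_c = c(-100.75, -100.25, -99.75, -99.25),
                  temp   = c(10.1, 11.2, 12.3, 13.4))

# merge() performs an inner join by default: only rows whose lat/long pair
# appears in both tables are kept.
matched <- merge(as.data.frame(coord), as.data.frame(temp_gcm),
                 by.x = c("coord1", "coord2"),
                 by.y = c("lat_c", "long_c"))
```

Here matched holds one row per coordinate pair found in both tables, with the temperature alongside. Exact matching on numeric keys relies on the coordinates being stored identically in both tables; rounding both sides to a fixed number of decimals beforehand is one way to guard against small floating-point differences.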

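【Comments】:

Glad I could help, Noe. Just one point about R: the base packages alone contain tons of functions for sorting and cleaning data. Users coming from a traditional object-oriented programming background (such as Java or C++) often fall back on loops or other "tricks". Since the S language (of which R is a dialect) was built primarily for statistical computing, many of these functions already exist. There is almost always a function that lets you condense your code. Those of us who have used R for years still discover new functions and packages almost every day!
Thanks @R_User! This method seems to work! This afternoon I devised another approach: I added an extra column of "0"s and a for loop with an if statement that tests each lat/long pair. If the statement is true, the 0 is changed to a 1. I then extract only the rows where the last column = 1. I will post the code later.

【Solution 2】:

I added an additional column of all zeros to hold the result of the IF statement. "x" is the number of rows in temp_gcm. "y" is the number of columns (representing time steps). "temp_s" is the standardized temperature data.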

indicator <- matrix(0, nrow = x, ncol = 1)

precip_s <- cbind(precip_s, indicator)
temp_s <- cbind(temp_s, indicator)

for (aa in 1:x) {
    current_lat <- latitudes[aa, 1]    # Latitudes corresponding to larger area
    current_long <- longitudes[aa, 1]  # Longitudes corresponding to larger area

    for (ab in 1:lenx) {  # lenx corresponds to nrow(coord)
        if (current_lat == coord[ab, 1] && current_long == coord[ab, 2]) {
            precip_s[aa, (y/12 + 1)] <- 1  # y/12+1 is the index of the "indicator" column
            temp_s[aa, (y/12 + 1)] <- 1
        }
    }
}

precip_s <- precip_s[precip_s[, (y/12 + 1)] > 0, ]  # Removes rows with "0"s remaining in the "indicator" column
temp_s <- temp_s[temp_s[, (y/12 + 1)] > 0, ]

precip_s <- precip_s[, -(y/12 + 1)]  # Removes the "indicator" column
temp_s <- temp_s[, -(y/12 + 1)]
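
The same indicator idea can be expressed without the nested loops by building one text key per lat/long pair and testing membership with %in%. This is a sketch under assumptions: the toy values are made up, and key_all, key_want, and keep are hypothetical names introduced here; latitudes, longitudes, temp_s, and coord follow the post.

```r
# Toy stand-ins (made-up values)
latitudes  <- matrix(c(45.0, 45.5, 46.0, 46.5), ncol = 1)
longitudes <- matrix(c(-100.75, -100.25, -99.75, -99.25), ncol = 1)
temp_s     <- matrix(c(0.1, 0.2, 0.3, 0.4,
                       1.1, 1.2, 1.3, 1.4), ncol = 2)  # 4 grid cells x 2 time steps
coord      <- cbind(c(45.5, 46.0), c(-100.25, -99.75))

# One key per lat/long pair, for the large area and for the region of interest
key_all  <- paste(latitudes[, 1], longitudes[, 1])
key_want <- paste(coord[, 1], coord[, 2])

# Logical indicator: TRUE for rows whose pair occurs in coord
keep <- key_all %in% key_want

# Keep only the matching rows; drop = FALSE preserves the matrix shape
temp_s <- temp_s[keep, , drop = FALSE]
```

Because keep is a logical vector, no extra indicator column has to be appended and removed again, and the same vector can be reused to subset precip_s.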

【Comments】: