【Posted】: 2013-05-20 12:47:50
【Question】:
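I am very new to R. I have searched the forum for this problem but could not find a close enough solution. I am trying to map IP addresses to their geographic locations, and I have two datasets.
Set-a (160,000 rows):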
ip(int) | ID(int)
Set-b (about 1,600,000 rows):
Ip1(int) | Ip2(int) | Code(str) | Country(str) | Area1(str) | Area2(str)
What I want to do: if ip lies between Ip1 and Ip2 (inclusive), add the Country and region (Area1, Area2) of that row to Set-a.
This is what I am doing now (clearly not a good approach):
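# convert to numeric (double): IPv4 values can exceed R's 32-bit integer range
ip  <- as.numeric(a$ip)
ip1 <- as.numeric(b$Ip1)
ip2 <- as.numeric(b$Ip2)
country <- b$Country
area1   <- b$Area1
area2   <- b$Area2
# preallocate the result columns so element-wise assignment works
a$country <- NA_character_
a$area1   <- NA_character_
a$area2   <- NA_character_
# compare every ip in Set-a with every range in Set-b: nrow(a) * nrow(b) tests
for (i in seq_len(nrow(a))) {
  for (j in seq_len(nrow(b))) {
    # ranges are inclusive at both ends (Ip2 of one row + 1 = Ip1 of the next)
    if (ip[i] >= ip1[j] && ip[i] <= ip2[j]) {
      a$country[i] <- country[j]
      a$area1[i]   <- area1[j]
      a$area2[i]   <- area2[j]
    }
  }
}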
Can anyone show me an efficient way to do this? It is very slow: i = 1 to 100 alone takes about 10 minutes.
A sample of Set-b:
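A first, minimal improvement keeps the loop over Set-a but replaces the inner loop with one vectorized comparison against all ranges. It is sketched here on toy data. The Set-b rows are copied from the sample in this question, the Set-a IPs and IDs are made up, and `find_row` is a hypothetical helper name. This still performs nrow(a) × nrow(b) comparisons. It only removes the R-level inner loop, so on the full data it will still be slow.

```r
# Toy Set-b: the five sample rows from the question, stored as numbers.
b <- data.frame(
  Ip1     = c(0, 16777216, 16777472, 16778240, 16778496),
  Ip2     = c(16777215, 16777471, 16778239, 16778495, 16778751),
  Country = c("-", "AUSTRALIA", "CHINA", "AUSTRALIA", "AUSTRALIA"),
  Area1   = c("-", "QUEENSLAND", "FUJIAN", "VICTORIA", "NEW SOUTH WALES"),
  Area2   = c("-", "SOUTH BRISBANE", "FUZHOU", "MELBOURNE", "SYDNEY"),
  stringsAsFactors = FALSE
)
# Hypothetical Set-a: made-up ips and IDs for illustration only.
a <- data.frame(ip = c(16777300, 16778000, 5, 99999999), ID = 1:4)

# Hypothetical helper: index of the first range containing x, or NA if none.
find_row <- function(x, lo, hi) {
  hit <- which(x >= lo & x <= hi)   # one vectorized test over all ranges
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

# Still one call per ip, but no R-level loop over Set-b.
idx <- vapply(a$ip, find_row, integer(1), lo = b$Ip1, hi = b$Ip2)
a$country <- b$Country[idx]   # NA index gives NA
a$area1   <- b$Area1[idx]
a$area2   <- b$Area2[idx]
a
```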
Ip1, Ip2, Code, Country, Area1, Area2
"0","16777215","-","-","-","-"
"16777216","16777471","AU","AUSTRALIA","QUEENSLAND","SOUTH BRISBANE"
"16777472","16778239","CN","CHINA","FUJIAN","FUZHOU"
"16778240","16778495","AU","AUSTRALIA","VICTORIA","MELBOURNE"
"16778496","16778751","AU","AUSTRALIA","NEW SOUTH WALES","SYDNEY"
The ranges are contiguous and sorted in increasing order.
The output of dput(head(a)) and dput(head(b)), respectively (for the sample data above):
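Because the ranges are sorted and do not overlap, each lookup can be a binary search instead of a scan. The sketch below uses base R's `findInterval()`. It assumes the IP columns are already numeric and the Ip1..Ip2 ranges do not overlap. The Set-b rows come from the sample above, and the Set-a values are hypothetical.

```r
# Toy Set-b: the five sample rows from the question, stored as numbers.
b <- data.frame(
  Ip1     = c(0, 16777216, 16777472, 16778240, 16778496),
  Ip2     = c(16777215, 16777471, 16778239, 16778495, 16778751),
  Country = c("-", "AUSTRALIA", "CHINA", "AUSTRALIA", "AUSTRALIA"),
  Area1   = c("-", "QUEENSLAND", "FUJIAN", "VICTORIA", "NEW SOUTH WALES"),
  Area2   = c("-", "SOUTH BRISBANE", "FUZHOU", "MELBOURNE", "SYDNEY"),
  stringsAsFactors = FALSE
)
# Hypothetical Set-a: made-up ips and IDs for illustration only.
a <- data.frame(ip = c(16777300, 16778000, 16778600, 20000000), ID = 1:4)

# findInterval() requires its breakpoints sorted in ascending order.
b <- b[order(b$Ip1), ]

# For each ip: index of the last range with Ip1 <= ip (0 if ip < Ip1[1]).
idx <- findInterval(a$ip, b$Ip1)
idx[idx == 0L] <- NA
# Reject ips that lie past Ip2 of that range (a gap, or beyond the table).
idx[!is.na(idx) & a$ip > b$Ip2[idx]] <- NA

a$country <- b$Country[idx]
a$area1   <- b$Area1[idx]
a$area2   <- b$Area2[idx]
a
```

The cost is one sort of Set-b plus one binary search per IP, instead of one comparison per (IP, range) pair. For 160,000 IPs against 1.6 million ranges, this should run in seconds rather than hours.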
structure(list(IP_Addr = c("38825563", "38921619", "42470287", "42471923", "42473368", "42473428"),
Desc_value = c("0", "1.2", "4.97", "1", "5.9", "22.06")), .Names = c("IP_Addr", "Desc_value"), row.names = c(NA, 6L), class = "data.frame")
structure(list(Ip1 = c("0", "16777216", "16777472", "16778240",
"16778496", "16778752"), Ip2 = c("16777215", "16777471", "16778239",
"16778495", "16778751", "16779263"), Code = c("-", "AU", "CN",
"AU", "AU", "AU"), Country = c("-", "AUSTRALIA", "CHINA", "AUSTRALIA",
"AUSTRALIA", "AUSTRALIA"), Area1 = c("-", "QUEENSLAND", "FUJIAN",
"VICTORIA", "NEW SOUTH WALES", "-"), Area2 = c("-", "SOUTH BRISBANE",
"FUZHOU", "MELBOURNE", "SYDNEY", "-")), .Names = c("Ip1", "Ip2",
"Code", "Country", "Area1", "Area2"), row.names = c(NA, 6L), class = "data.frame")
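Both dput outputs show the IP columns stored as character strings. This causes two pitfalls, sketched below on the Set-a values from dput(head(a)). First, comparing strings is lexicographic rather than numeric. Second, the largest IPv4 address (4294967295) exceeds R's 32-bit integer maximum, which is why converting with `as.numeric()`, as the question's code does, is safer than `as.integer()`.

```r
# Set-a columns copied from dput(head(a)): note they are character.
a <- data.frame(
  IP_Addr    = c("38825563", "38921619", "42470287", "42471923", "42473368", "42473428"),
  Desc_value = c("0", "1.2", "4.97", "1", "5.9", "22.06"),
  stringsAsFactors = FALSE
)

# Pitfall 1: string comparison is lexicographic, not numeric.
"9" > "16777215"                          # TRUE  ("9" sorts after "1")
as.numeric("9") > as.numeric("16777215")  # FALSE

# Pitfall 2: IPv4 values can exceed the 32-bit integer range.
.Machine$integer.max                        # 2147483647
suppressWarnings(as.integer("4294967295"))  # NA
as.numeric("4294967295")                    # exact as a double

# Convert once, up front, before any range comparison.
a$IP_Addr    <- as.numeric(a$IP_Addr)
a$Desc_value <- as.numeric(a$Desc_value)
str(a)
```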
【Discussion】:
-
Could you provide a small sample dataset?
-
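It would be much easier if the Ip1 to Ip2 ranges in Set-b were distinct for each country/region. Can you tell us whether that is the case? Even if not, I bet that sorting your sets first, so the IP values are in order, leads to a simpler "sieve" algorithm. -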
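The "sieve" idea in this comment can be sketched as a single merge-style sweep. Visit the IPs in ascending order and move one pointer forward through the sorted ranges, so each range is passed at most once. `sweep_lookup` is a hypothetical name. The sketch assumes the ranges are sorted ascending and non-overlapping, which the asker confirms below. The toy ranges are the sample Set-b rows, and the IPs are made up.

```r
# Hypothetical helper: for each ip, the index of the range [lo, hi] that
# contains it, or NA. Assumes lo/hi sorted ascending and non-overlapping.
sweep_lookup <- function(ip, lo, hi) {
  out <- rep(NA_integer_, length(ip))
  m <- length(lo)
  j <- 1L
  for (k in order(ip)) {          # visit ips from smallest to largest
    x <- ip[k]
    # skip ranges that end before x; they cannot match any later ip either
    while (j <= m && hi[j] < x) j <- j + 1L
    if (j > m) break              # all remaining ips are past the last range
    if (x >= lo[j]) out[k] <- j   # otherwise x falls in a gap before range j
  }
  out
}

lo <- c(0, 16777216, 16777472, 16778240, 16778496)
hi <- c(16777215, 16777471, 16778239, 16778495, 16778751)
res <- sweep_lookup(c(16778600, 5, 20000000, 16777300), lo, hi)
res
```

Apart from sorting the IPs, the work is linear in nrow(a) + nrow(b). The returned values are row indices into Set-b, so `b$Country[res]` and similar expressions attach the location columns.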
@Frank, Carl: Yes, the ranges are distinct and in continuously increasing order. A sample dataset is: Ip1, Ip2, Code, Country, Area1, Area2 "16777216","16777471","AU","AUSTRALIA","QUEENSLAND","SOUTH BRISBANE" "16777472","16778239","CN","CHINA","FUJIAN","FUZHOU" "16778240","16778495","AU","AUSTRALIA","VICTORIA","MELBOURNE"
-
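Please provide a sample dataset that we can read in. :) Also, it is better to edit your question to add information. I forgot to mention: you can use
dput(a) and dput(b)... -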
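Welcome to Stack Overflow! As you can see, we are quite confused about the state of your data. Please create a reproducible example. As Frank said, pasting the output of
dput(head(a)) and dput(head(b)) would be very helpful.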
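`dput()` is useful here because its output is R code that rebuilds the object exactly, including column types, so answerers can see that the IPs are character strings. A minimal round-trip sketch on a made-up two-row data frame:

```r
# Made-up two-row data frame in the same shape as Set-b's first columns.
df <- data.frame(Ip1 = c("0", "16777216"), Country = c("-", "AUSTRALIA"),
                 stringsAsFactors = FALSE)

# dput() prints R source for the object; capture it as text.
txt <- paste(capture.output(dput(df)), collapse = "\n")
cat(txt, "\n")

# Pasting that text back into R recreates the same object.
df2 <- eval(parse(text = txt))
all.equal(df, df2)   # TRUE
```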
Tags: r nested-loops apply