【Question title】: Extracting country names in R
【Posted】: 2020-04-13 12:02:04
【Question】:

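I have a list of badly formatted locations. For each entry I need to extract the name of the city and the country, and I'm not sure how to go about it.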

The list looks like this:

c("Groningen", "Netherlands, Groningen", "Netherlands", "Jerusalem, Israel",
 "Nesher, Israel", "Western, United States", "U.S.", "United States",
 "Sacramento, California, USA")

Thanks, TK

【Discussion】:

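  • What have you tried so far? How would you extract the country name when it isn't present in the entry? Do you have 100, 1,000 or 10 million observations?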

Tags: r data-manipulation


【Answer 1】:

Ideally, you would look for a package that lets you look the locations up on Google Maps.

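If there isn't one, I would start by splitting each entry on its commas, matching the parts against country names with the countrycode package, and take it from there.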

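library("countrycode")
library("data.table")

d <- data.table(raw = c("Groningen", "Netherlands, Groningen", "Netherlands", "Jerusalem, Israel",
  "Nesher, Israel", "Western, United States", "U.S.", "United States", "Sacramento, California, USA"))

# Split each entry into columns V1, V2, V3 (missing parts become NA).
# Splitting on ", " rather than "," keeps leading spaces out of V2 and V3.
d[, c("V1", "V2", "V3") := tstrsplit(raw, ", ", fixed = TRUE)]

# If the first part is a country name, the second part is the city ...
d[, country := countrycode(V1, "country.name", "country.name")]
d[!is.na(country), city := V2]
# ... otherwise treat the first part as the city and try the second part as the country
d[is.na(country), city := V1]
d[is.na(country), country := countrycode(V2, "country.name", "country.name")]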

                           raw            V1             V2   V3       country       city
1:                   Groningen     Groningen           <NA> <NA>          <NA>  Groningen
2:      Netherlands, Groningen   Netherlands      Groningen <NA>   Netherlands  Groningen
3:                 Netherlands   Netherlands           <NA> <NA>   Netherlands       <NA>
4:           Jerusalem, Israel     Jerusalem         Israel <NA>        Israel  Jerusalem
5:              Nesher, Israel        Nesher         Israel <NA>        Israel     Nesher
6:      Western, United States       Western  United States <NA> United States    Western
7:                        U.S.          U.S.           <NA> <NA> United States       <NA>
8:               United States United States           <NA> <NA> United States       <NA>
9: Sacramento, California, USA    Sacramento     California  USA          <NA> Sacramento
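
Because only V1 and V2 are checked, row 9 gets no country even though its last part, "USA", is one. Below is a minimal sketch of one possible extension, not the answer's own method. It assumes that the country, when present, is the last comma-separated part countrycode recognises, and that a US state name (base R's built-in `state.name`) implies the United States. `extract_location` is a hypothetical helper name, and `warn = FALSE` only silences countrycode's warnings about unmatched parts.

```r
library("countrycode")

locations <- c("Groningen", "Netherlands, Groningen", "Netherlands", "Jerusalem, Israel",
  "Nesher, Israel", "Western, United States", "U.S.", "United States",
  "Sacramento, California, USA")

# Hypothetical helper (not from any package): inspect every comma-separated part
extract_location <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  # Standardise each part; parts that are not country names become NA
  countries <- countrycode(parts, "country.name", "country.name", warn = FALSE)
  # Assumption: a US state name (base R's state.name) implies the United States
  countries[is.na(countries) & parts %in% state.name] <- "United States"
  # Assumption: the country is the last recognised part
  matched <- countries[!is.na(countries)]
  country <- if (length(matched) > 0) unname(tail(matched, 1)) else NA_character_
  # Take the first part that is neither a country nor a state as the city
  rest <- parts[is.na(countries)]
  city <- if (length(rest) > 0) rest[1] else NA_character_
  c(city = city, country = country)
}

result <- data.frame(raw = locations,
                     t(sapply(locations, extract_location, USE.NAMES = FALSE)),
                     stringsAsFactors = FALSE)
print(result)
```

An entry like "Groningen" still ends up without a country, because nothing in the string names one; resolving those needs a gazetteer or a geocoding lookup.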

【Discussion】:

  • "a package that lets you look the locations up on Google Maps" - library(googleway) does this.