Assigning City Name by Latitude/Longitude Values in a Pandas DataFrame

[Question title]: Assigning City Name by Latitude/Longitude values in Pandas Dataframe
[Posted]: 2019-01-09 17:47:15
[Question]:

I have this DataFrame:

    userId      latitude    longitude        dateTime
0   121165      30.314368   76.384381   2018-02-01 00:01:57
1   95592       13.186810   77.643769   2018-02-01 00:02:17
2   111435      28.512889   77.088154   2018-02-01 00:04:02
3   129532      9.828420    76.310357   2018-02-01 00:06:03
4   95592       13.121986   77.610539   2018-02-01 00:08:54

I want to create a new column, like this:

    userId      latitude    longitude        dateTime         city
0   121165      30.314368   76.384381   2018-02-01 00:01:57   Bengaluru
1   95592       13.186810   77.643769   2018-02-01 00:02:17   Delhi
2   111435      28.512889   77.088154   2018-02-01 00:04:02   Mumbai
3   129532      9.828420    76.310357   2018-02-01 00:06:03   Chennai
4   95592       13.121986   77.610539   2018-02-01 00:08:54   Delhi

I found this code here, but it doesn't work.

This is the code given there:

from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
import json

def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country

# iterate over the (latitude, longitude) pairs row by row
for i, j in zip(df['latitude'], df['longitude']):
    getplace(i, j)

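I get an error on this line:

components = j['results'][0]['address_components']

IndexError: list index out of range

I tried some other latitude/longitude values in the UK and it worked, but it doesn't work for Indian states.

So now I want to try something like this: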

if i,j in zip(range(79,80),range(83,84)):
    df['City']='Bengaluru'
elif i,j in zip(range(13,14),range(70,71)):
    df['City']='Delhi'

and so on. So how can I assign cities from latitude and longitude values in a more practical way?

[Question discussion]:

Tags: python pandas google-maps numpy dataframe

[Answer 1]:

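The code snippet you are using is from 2013; the Google API has changed and 'postal_town' is no longer available.

You can use the following code instead. It uses the requests library and adds a guard for the case where no results are returned.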

In [47]: import requests

In [48]: def location(lat, long):
    ...:     url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false'.format(lat, long)
    ...:     r = requests.get(url)
    ...:     r_json = r.json()
    ...:     # guard: an empty 'results' list would make results[0] raise IndexError
    ...:     if len(r_json['results']) < 1: return None, None
    ...:     res = r_json['results'][0]['address_components']
    ...:     # first component of each type, or None if that type is absent
    ...:     country  = next((c['long_name'] for c in res if 'country' in c['types']), None)
    ...:     locality = next((c['long_name'] for c in res if 'locality' in c['types']), None)
    ...:     return locality, country
    ...:

In [49]: location(28.512889, 77.088154)
Out[49]: ('Gurugram', 'India')
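If the function returns (None, None) for every row, it helps to see why the response was empty. The sketch below splits URL building from parsing so the parsing can be checked offline. It assumes you have a Google Maps API key (passed as the Geocoding API's `key` parameter; keyless requests are typically refused) and checks the response's top-level `status` field before indexing `results[0]`. `geocode_url` and `parse_locality` are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlencode

# Base endpoint of the Google Geocoding web service (HTTPS).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(lat, lng, api_key):
    """Build a reverse-geocoding URL; api_key is a placeholder you must supply."""
    return GEOCODE_URL + "?" + urlencode({"latlng": f"{lat},{lng}", "key": api_key})


def parse_locality(r_json):
    """Return (locality, country) from a parsed response, or (None, None).

    The top-level 'status' is checked first, so a refused or empty
    response yields (None, None) instead of an IndexError on results[0].
    """
    if r_json.get("status") != "OK" or not r_json.get("results"):
        return None, None
    components = r_json["results"][0]["address_components"]
    # First matching component of each type, or None if the type is absent.
    country = next((c["long_name"] for c in components if "country" in c["types"]), None)
    locality = next((c["long_name"] for c in components if "locality" in c["types"]), None)
    return locality, country
```

Usage, assuming `requests` is installed: `r_json = requests.get(geocode_url(28.512889, 77.088154, 'YOUR_API_KEY')).json()`, then `parse_locality(r_json)`. When it returns (None, None), printing `r_json['status']` (and `r_json.get('error_message')`, which the service usually includes on errors) shows whether the coordinates had no match or the request itself was rejected.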

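This function looks for 'locality', which actually returns nothing for the second row of the DataFrame. You can pick the field you want by inspecting the result; here are the address components for lat, long = 28.512889, 77.088154: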

[{'long_name': 'Udyog Vihar',
  'short_name': 'Udyog Vihar',
  'types': ['political', 'sublocality', 'sublocality_level_2']},
 {'long_name': 'Kapas Hera Estate',
  'short_name': 'Kapas Hera Estate',
  'types': ['political', 'sublocality', 'sublocality_level_1']},
 {'long_name': 'Gurugram',
  'short_name': 'Gurugram',
  'types': ['locality', 'political']},
 {'long_name': 'Gurgaon',
  'short_name': 'Gurgaon',
  'types': ['administrative_area_level_2', 'political']},
 {'long_name': 'Haryana',
  'short_name': 'HR',
  'types': ['administrative_area_level_1', 'political']},
 {'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']},
 {'long_name': '122016', 'short_name': '122016', 'types': ['postal_code']}]
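Since 'locality' is missing for some points, one option is to try several component types in order of preference and take the first one present. The sketch below does that over an `address_components` list like the one above (a trimmed copy is used as sample data). `first_component` is a hypothetical helper, and the preference order `('locality', 'administrative_area_level_2')` is only an example.

```python
# Trimmed copy of the address_components shown above, used as sample data.
components = [
    {"long_name": "Udyog Vihar", "short_name": "Udyog Vihar",
     "types": ["political", "sublocality", "sublocality_level_2"]},
    {"long_name": "Gurugram", "short_name": "Gurugram", "types": ["locality", "political"]},
    {"long_name": "Gurgaon", "short_name": "Gurgaon",
     "types": ["administrative_area_level_2", "political"]},
    {"long_name": "Haryana", "short_name": "HR",
     "types": ["administrative_area_level_1", "political"]},
]


def first_component(components, preferred_types, key="long_name"):
    """Return components[i][key] for the earliest type in preferred_types that occurs."""
    for type_name in preferred_types:
        for c in components:
            # Exact membership test: 'locality' does not match 'sublocality'.
            if type_name in c["types"]:
                return c[key]
    return None


print(first_component(components, ("locality", "administrative_area_level_2")))  # Gurugram

# Simulate a point with no 'locality' component: falls back to the district.
no_locality = [c for c in components if "locality" not in c["types"]]
print(first_component(no_locality, ("locality", "administrative_area_level_2")))  # Gurgaon
```

Inside location(), `first_component(res, ('locality', 'administrative_area_level_2'))` could replace the `locality = next(...)` line so rows without a locality still get a district name.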

To apply it to your DataFrame, you can use numpy's vectorize like this (keep in mind that the second row won't return anything):

In [71]: import numpy as np

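In [72]: # location() returns (locality, country), so keep only the locality;
    ...: # otypes=[object] keeps None as None and avoids vectorize's extra trial call
    ...: df['locality'] = np.vectorize(lambda lat, long: location(lat, long)[0], otypes=[object])(df['latitude'], df['longitude'])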

In [73]: df
Out[73]:
   userId   latitude  longitude             dateTime   locality
0  121165  30.314368  76.384381  2018-02-01 00:01:57    Patiala
1   95592  13.186810  77.643769  2018-02-01 00:02:17       None
2  111435  28.512889  77.088154  2018-02-01 00:04:02   Gurugram
3  129532   9.828420  76.310357  2018-02-01 00:06:03  Ezhupunna
4   95592  13.121986  77.610539  2018-02-01 00:08:54  Bengaluru

P.S. I noticed that the cities in your desired output don't match their coordinates.

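P.P.S. You should also be aware that this can take a while, because the function has to query the API once per row; np.vectorize is essentially a Python loop, not a speed-up.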
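One way to cut the number of API calls (a sketch, not something measured in this answer) is to query each distinct coordinate pair only once, after rounding, and map the results back to all rows. `geocode_unique` is a hypothetical helper; `lookup` can be any function of (lat, lng), and rounding to 2 decimal places (about 1 km of latitude) is an assumption you should tune.

```python
import pandas as pd


def geocode_unique(df, lookup, decimals=2):
    """Call lookup(lat, lng) once per distinct rounded coordinate pair.

    Rows whose coordinates round to the same pair share one lookup result.
    """
    keys = list(zip(df["latitude"].round(decimals), df["longitude"].round(decimals)))
    cache = {}
    for key in keys:
        if key not in cache:
            cache[key] = lookup(*key)  # one call (e.g. one API request) per distinct key
    # Map the cached results back onto every row, preserving the original index.
    return pd.Series([cache[k] for k in keys], index=df.index)
```

For example, `df['locality'] = geocode_unique(df, lambda lat, lng: location(lat, lng)[0])`. How much this saves depends on how many rows fall on the same rounded coordinates.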

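You could also write a location function based on coarse latitude/longitude boxes, but it will be very crude, and each box may cover far too large an area. You can then apply that function in the same way as shown earlier: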

In [21]: def location(lat, long):
    ...:     # crude 1-degree boxes; labels taken from the question's desired output
    ...:     if 9 <= lat < 10 and 76 <= long < 77:
    ...:         return 'Chennai'
    ...:     elif 13 <= lat < 14 and 77 <= long < 78:
    ...:         return 'Delhi'
    ...:     elif 28 <= lat < 29 and 77 <= long < 78:
    ...:         return 'Mumbai'
    ...:     elif 30 <= lat < 31 and 76 <= long < 77:
    ...:         return 'Bengaluru'
    ...:     return None  # outside every box
    ...:

In [22]: df['city'] = np.vectorize(location, otypes=[object])(df['latitude'], df['longitude'])

In [23]: df
Out[23]: 
   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi
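With np.vectorize the if/elif function still runs once per row in Python. For a large frame (the discussion below mentions about 2.2 million rows), building one boolean mask per box over whole columns is usually much faster. The sketch below reuses the same boxes and labels (which come from the question's desired output, not real city limits); `assign_city` and `BOXES` are hypothetical names.

```python
import pandas as pd

# Each box: (lat_min, lat_max, lng_min, lng_max, label). Boxes and labels mirror the
# if/elif example; the labels come from the question's desired output, not real city limits.
BOXES = [
    (9, 10, 76, 77, "Chennai"),
    (13, 14, 77, 78, "Delhi"),
    (28, 29, 77, 78, "Mumbai"),
    (30, 31, 76, 77, "Bengaluru"),
]


def assign_city(df, boxes=BOXES):
    """Label each row with the first box that contains it; rows in no box get None."""
    city = pd.Series([None] * len(df), index=df.index, dtype=object)
    lat, lng = df["latitude"], df["longitude"]
    for lat_min, lat_max, lng_min, lng_max, label in boxes:
        # Half-open ranges [min, max), same comparisons as the if/elif chain.
        inside = (lat >= lat_min) & (lat < lat_max) & (lng >= lng_min) & (lng < lng_max)
        # Only fill rows not already claimed by an earlier box (mimics elif order).
        city.loc[inside & city.isna()] = label
    return city
```

Use it as `df['city'] = assign_city(df)`. Rows outside every box stay None, which matches the "fill the rest with null" requirement.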

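[Discussion]:

How big is your DataFrame?
About 2.2 million rows.
Is there another option? I only want 5-6 major cities, so I'd like to put the lat and long values into ranges (like the zip idea) and fill the rest with null.
I've updated the answer. If it helps, please accept and upvote it. Thanks.
Yeah, the second part (the if/else approach) works, but I don't know why fetching from the Google API doesn't work, even though when I print the result by calling the location function directly it works very quickly.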
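For the requirement in the comments (5-6 major cities, null for everything else), an alternative to rectangular boxes is to assign each row to the nearest city centre, but only if it lies within some radius. The sketch below uses the haversine great-circle distance with NumPy broadcasting. The city coordinates are approximate centres assumed for illustration, and the 50 km radius, `CITIES`, and `nearest_city` are hypothetical choices to adjust.

```python
import numpy as np
import pandas as pd

# Approximate city-centre coordinates (assumed values; replace or extend as needed).
CITIES = {
    "Delhi": (28.61, 77.21),
    "Mumbai": (19.08, 72.88),
    "Bengaluru": (12.97, 77.59),
    "Chennai": (13.08, 80.27),
}
EARTH_RADIUS_KM = 6371.0


def nearest_city(df, cities=CITIES, max_km=50.0):
    """Name of the closest city within max_km of each row, else None."""
    names = list(cities)
    c_lat = np.radians([cities[n][0] for n in names])      # shape (k,)
    c_lng = np.radians([cities[n][1] for n in names])      # shape (k,)
    lat = np.radians(df["latitude"].to_numpy())[:, None]   # shape (n, 1)
    lng = np.radians(df["longitude"].to_numpy())[:, None]  # shape (n, 1)
    # Haversine distance from every row to every city, broadcast to shape (n, k).
    a = (np.sin((c_lat - lat) / 2) ** 2
         + np.cos(lat) * np.cos(c_lat) * np.sin((c_lng - lng) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    # Closest city per row; rows farther than max_km from every city become None.
    labels = np.array(names, dtype=object)[dist.argmin(axis=1)]
    labels[dist.min(axis=1) > max_km] = None
    return pd.Series(labels, index=df.index)
```

With the default 50 km radius, the five sample rows from the question should come out as None, Bengaluru, Delhi, None, Bengaluru. The intermediate arrays hold one value per (row, city) pair, so for millions of rows it may be worth processing the DataFrame in chunks.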