Using a function to create a dataframe from a webscrape: answers

【Question title】: Using a function to create a dataframe from a webscrape
【Posted】: 2020-12-30 04:24:28
【Question】:

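I have a function that takes a latitude, a longitude, and a time in UNIX format, and outputs a single-row dataframe with these weather-related columns: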

['time', 'summary', 'icon', 'precipIntensity', 'precipProbability','precipType', 'temperature', 'apparentTemperature', 'dewPoint','humidity', 'pressure', 'windSpeed', 'windBearing', 'cloudCover','uvIndex', 'visibility']

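import pandas as pd
import requests

def get_weather(latitude, longitude, unix):
    # Build the request URL; the f-string also accepts numeric arguments
    url = f"https://dark-sky.p.rapidapi.com/{latitude},{longitude},{unix}"
    headers = {
        'x-rapidapi-key': "xxxxxxxxxxxxxxMYKEYxxxxxxxxxxxxxxx",
        'x-rapidapi-host': "dark-sky.p.rapidapi.com",
    }
    response = requests.request("GET", url, headers=headers)
    data = response.json()
    # 'currently' holds the weather fields listed above
    weather = data['currently']
    # Wrap the dict of scalars in a single-row dataframe and return it
    return pd.DataFrame(weather, index=[0])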

I want to iterate over my dataset (10,000 rows) and create a new dataset containing the corresponding weather data for each row.

【Comments】:

First build a list containing all the rows (using append() on the list), then convert that list to a DataFrame.
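A minimal sketch of that suggestion, under some assumptions: `fetch_currently` is a hypothetical offline stand-in for the API call that returns a dict shaped like `data['currently']` with made-up values, and `location_df` is a small example frame with `lat`, `long`, and `unix` columns. Neither name comes from the question.

```python
import pandas as pd

def fetch_currently(latitude, longitude, unix):
    # Hypothetical stand-in for the API call: returns a dict shaped like
    # data['currently'], with made-up values, so the sketch runs offline.
    return {"time": unix, "temperature": 20.0, "humidity": 0.5}

# Example input frame (assumed column names)
location_df = pd.DataFrame(
    [[10, 10, 5], [2, 3, 8], [9, 9, 10]], columns=["lat", "long", "unix"]
)

rows = []  # plain Python list; appending to it is cheap
for lat, lon, unix in location_df.itertuples(index=False):
    rows.append(fetch_currently(lat, lon, unix))

# Build the dataframe once, after the loop, instead of growing it row by row
weather_df = pd.DataFrame(rows)
```

Building the dataframe once at the end avoids copying an ever-growing frame on each iteration, which is likely what the comment is getting at.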

Tags: python pandas dataframe web-scraping python-requests

【Solution 1】:

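As I understand it, you have a dataset containing latitude, longitude, and unix values, and you want to iterate over it to build a new dataframe using the function above.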

Suppose your location dataframe is

location_df = pd.DataFrame([[10,10,5], [2,3,8], [9,9,10]],
columns=['lat','long','unix'])

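To iterate over each row, use df.iterrows(), and use append with ignore_index=True so that the index increments automatically. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this loop runs only on older pandas versions. In your case, assuming the function returns the weather dataframe: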

precipitation_df = pd.DataFrame(columns=['precipIntensity', 'precipProbability','temp']) # assume 3 values returned
for index, row in location_df.iterrows():
    latitude = row['lat']
    longitude = row['long']
    unix = row['unix']
    precipitation_df = precipitation_df.append(get_weather(latitude,longitude,unix), ignore_index=True)
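A possible variant for newer pandas versions that lack DataFrame.append: collect the single-row frames in a list and combine them once with `pd.concat`. This is only a sketch. The `get_weather` below is an offline stand-in with made-up values, used so the example runs without an API key, and it is not the function from the question.

```python
import pandas as pd

def get_weather(latitude, longitude, unix):
    # Offline stand-in for the question's function (made-up values):
    # returns a single-row dataframe, as the answer assumes.
    return pd.DataFrame(
        {"precipIntensity": 0.0, "precipProbability": 0.1, "temp": 15.0},
        index=[0],
    )

location_df = pd.DataFrame([[10, 10, 5], [2, 3, 8], [9, 9, 10]],
                           columns=["lat", "long", "unix"])

# One single-row frame per input row
frames = [
    get_weather(row["lat"], row["long"], row["unix"])
    for _, row in location_df.iterrows()
]

# Concatenate once; ignore_index=True renumbers the rows 0..n-1
precipitation_df = pd.concat(frames, ignore_index=True)
```

The result has one row per row of `location_df`, in the same order, so it can be placed next to the original data if needed.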

I'd be curious to know whether anyone has a more efficient approach.

【Comments】: