【Title】:Web scraping data from a JSON source: why do I get only 1 row?
【Posted】:2019-07-18 18:22:49
【Question】:

I'm trying to get some information from a website, an online shop, with Python.

I tried this:

def proba():

    my_url = requests.get('https://www.telekom.hu/shop/categoryresults/?N=10994&contractType=list_price&instock_products=1&Ns=sku.sortingPrice%7C0%7C%7Cproduct.displayName%7C0&No=0&Nrpp=9&paymentType=FULL')
    data = my_url.json()
    results = []
    products = data['MainContent'][0]['contents'][0]['productList']['products']
    for product in products:
        name = product['productModel']['displayName']
        try:
            priceGross = product['priceInfo']['priceItemSale']['gross']
        except:
            priceGross = product['priceInfo']['priceItemToBase']['gross']
        url = product['productModel']['url']
        results.append([name, priceGross, url])
    df = pd.DataFrame(results, columns = ['Name', 'Price', 'Url'])    
    # print(df)  ## print df
    df.to_csv(r'/usr/src/Python-2.7.13/test.csv', sep=',', encoding='utf-8-sig',index = False )

while True:
    mytime=datetime.now().strftime("%H:%M:%S")
    while mytime < "23:59:59":
        print mytime
        proba()
        mytime=datetime.now().strftime("%H:%M:%S")

The shop lists 9 items, but I only see 1 row in the CSV file.

【Comments】:

    Tags: python json python-2.7 csv web-scraping


    【Solution 1】:

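    I'm not entirely sure what end result you want. Do you want to update an existing file, or collect the data over time and write it all out at once? Below is an example of the latter: the function hands back each new DataFrame with a return statement, and I add it to one overall DataFrame.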

    import time
    from datetime import datetime

    import pandas as pd
    import requests

    def proba():
        # Fetch the product listing and return it as a DataFrame.
        my_url = requests.get('https://www.telekom.hu/shop/categoryresults/?N=10994&contractType=list_price&instock_products=1&Ns=sku.sortingPrice%7C0%7C%7Cproduct.displayName%7C0&No=0&Nrpp=9&paymentType=FULL')
        data = my_url.json()
        results = []
        products = data['MainContent'][0]['contents'][0]['productList']['products']
        for product in products:
            name = product['productModel']['displayName']
            try:
                priceGross = product['priceInfo']['priceItemSale']['gross']
            except KeyError:
                # No sale price listed; fall back to the base price.
                priceGross = product['priceInfo']['priceItemToBase']['gross']
            url = product['productModel']['url']
            results.append([name, priceGross, url])
        df = pd.DataFrame(results, columns=['Name', 'Price', 'Url'])
        return df

    headers = ['Name', 'Price', 'Url']
    df = pd.DataFrame(columns=headers)

    # Poll until the end of the day. There is no outer "while True" here:
    # with it, the loop never exits and the to_csv call below is never reached.
    end_of_day = datetime.now().replace(hour=23, minute=59, second=59, microsecond=0)
    while datetime.now() < end_of_day:
        print(datetime.now().strftime("%H:%M:%S"))
        dfCurrent = proba()
        df = pd.concat([df, dfCurrent], ignore_index=True)
        time.sleep(60)  # pause between requests; the 60 s interval is an arbitrary choice

    df.to_csv(r"C:\Users\User\Desktop\test.csv", encoding='utf-8')
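
The other option mentioned above, updating an existing file on every poll, could look roughly like the sketch below. It assumes Python 3 and the same JSON nesting as the code above. It uses the standard library csv module rather than pandas, so it runs without extra dependencies. The names extract_rows and append_rows and the sample payload are made up for illustration; the real response from the shop may differ.

```python
import csv
import os

COLUMNS = ["Name", "Price", "Url"]


def extract_rows(data):
    """Turn one decoded JSON payload into [name, price, url] rows."""
    products = data["MainContent"][0]["contents"][0]["productList"]["products"]
    rows = []
    for product in products:
        price_info = product["priceInfo"]
        # Prefer the sale price; fall back to the base price when it is
        # missing or empty (assumed to mirror the try/except above).
        price = price_info.get("priceItemSale") or price_info["priceItemToBase"]
        rows.append([
            product["productModel"]["displayName"],
            price["gross"],
            product["productModel"]["url"],
        ])
    return rows


def append_rows(path, rows):
    """Append rows to a CSV file; write the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerows(rows)


# Made-up payload with the same nesting, so the sketch runs offline.
sample = {"MainContent": [{"contents": [{"productList": {"products": [
    {"productModel": {"displayName": "Phone A", "url": "/shop/a"},
     "priceInfo": {"priceItemSale": {"gross": 100}}},
    {"productModel": {"displayName": "Phone B", "url": "/shop/b"},
     "priceInfo": {"priceItemToBase": {"gross": 200}}},
]}}]}]}

rows = extract_rows(sample)
```

In the polling loop, calling append_rows(path, extract_rows(my_url.json())) once per request would grow the file as data arrives instead of holding everything in memory until the end of the day. Keeping the parsing in its own function also makes it easy to check how many rows a saved response actually yields.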
    

    【Comments】:
