【Question title】: How to pass the value from a list in a url string
【Posted】: 2018-09-17 11:59:58
【Question】:

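I have a problem with this Python script. I'm trying to pass values from a list into the main URL string. I've attached the script. In the command page = requests.get("https://www.google.dz/search?q=lista[url]"), whatever I want to search for on Google has to go after search?q=. I want to search for several keywords, so I made a list, but I don't know how to pass the values from the list into that command...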

import requests
import re
from bs4 import BeautifulSoup

lista = []
lista.append("Samsung S9")
lista.append("Samsung S8")
lista.append("Samsung Note 9")

list_scrape = []

for url in lista:
    page = requests.get("https://www.google.dz/search?q=lista[url]")
    soup = BeautifulSoup(page.content)
    links = soup.findAll("a")
    for link in soup.find_all("a", href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
        list_scrape.append(re.split(":(?=http)",link["href"].replace("/url?q=","")))

print(list_scrape)

Thanks!

【Question discussion】:

Tags: python url beautifulsoup


【Solution 1】:

Use format:

for url in lista:
    page = requests.get("https://www.google.dz/search?q={}".format(url))

or

page = requests.get("https://www.google.dz/search?q=%s" % url)
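A minimal offline sketch of the two formatting styles above, using the question's lista: both build the same string. It also shows why the question's lista[url] could not work even as a real expression: inside the loop, url is already the keyword string, not an index. Note that neither style URL-encodes the space in the keyword. The helper build_urls is hypothetical, introduced only for illustration.

```python
def build_urls(keywords):
    """Hypothetical helper: return one Google search URL per keyword."""
    urls = []
    for url in keywords:
        # str.format and %-formatting substitute the item the same way
        with_format = "https://www.google.dz/search?q={}".format(url)
        with_percent = "https://www.google.dz/search?q=%s" % url
        assert with_format == with_percent
        urls.append(with_format)
    return urls


# Keywords taken from the question
lista = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

for u in build_urls(lista):
    print(u)  # the space inside each keyword is left unencoded here

# Inside "for url in lista", url is the keyword itself, not an index,
# so indexing the list with it raises TypeError
try:
    lista[lista[0]]
except TypeError as exc:
    print("TypeError:", exc)
```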

【Discussion】:

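  • Suppose the requests URL pointed at some dev/prod box instead of Google: blindly appending to the URL like this could be a disaster. You need to encode the value properly first.
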
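The point about encoding can be illustrated with the standard library's urllib.parse. This sketch uses a made-up keyword containing & and # (not from the question) to show what blind appending does: the & splits the keyword into a separate query parameter, and everything after # becomes a fragment that is never sent to the server.

```python
from urllib.parse import quote_plus, urlencode

# Made-up keyword with characters that are special inside a URL
keyword = "Samsung S9 & Note#9"

# Blind concatenation: '&' starts a new parameter, '#' starts a fragment
unsafe = "https://www.google.dz/search?q=" + keyword

# quote_plus escapes reserved characters and turns spaces into '+'
safe = "https://www.google.dz/search?q=" + quote_plus(keyword)

# urlencode does the same for a whole dict of query parameters
safe_too = "https://www.google.dz/search?" + urlencode({"q": keyword})

print(unsafe)
print(safe)
```

urlencode is the more convenient choice once there is more than one parameter, since it joins them with & and encodes each value.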
【Solution 2】:

Try this:

for url in lista:
    page = requests.get("https://www.google.dz/search?q=" + url)

or

for url in lista:
    page = requests.get("https://www.google.dz/search?q={}".format(url))
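One practical difference between the two versions, sketched below with hypothetical data: concatenation with + only works when the list item is already a str, while str.format converts its argument to text. The list containing an int and the helpers concat_url and format_url are assumptions for illustration only.

```python
# Hypothetical mixed list: the last item is an int
queries = ["Samsung S9", "Samsung Note", 9]


def concat_url(query):
    # '+' only works when query is already a str
    return "https://www.google.dz/search?q=" + query


def format_url(query):
    # str.format converts the argument to text, so non-strings work too
    return "https://www.google.dz/search?q={}".format(query)


for q in queries:
    try:
        print(concat_url(q))
    except TypeError as exc:
        # e.g. can only concatenate str (not "int") to str
        print("concatenation failed:", exc)
        print(format_url(q))
```

With concatenation, str(query) would be needed to get the same behavior.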

【Discussion】:

【Solution 3】:

Instead, you can use an f-string, which I think is the more Pythonic way to do string formatting:

requests.get(f"https://www.google.dz/search?q={url}")
# or
for query in queries:
    html = requests.get(f"https://www.google.dz/search?q={query}")
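Because any expression is allowed inside the braces of an f-string, each keyword can also be URL-encoded inline. A sketch using urllib.parse.quote_plus from the standard library (an addition, not part of the original answer); f-strings require Python 3.6 or newer.

```python
from urllib.parse import quote_plus

queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

# quote_plus runs inside the f-string, so each keyword is encoded
# (spaces become '+') before it lands in the query string
urls = [f"https://www.google.dz/search?q={quote_plus(query)}" for query in queries]

for url in urls:
    print(url)
```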
    

Note that the next problem you may run into is Google blocking your requests, because no user-agent is specified.

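That is because the default requests user-agent is python-requests (followed by the library version). Google can recognize it and block the request, since it is not a "real" user visit. Check what your user-agent is.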
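To see which User-Agent would actually be sent, without contacting Google, a sketch like the following prepares the request through a requests.Session and inspects it. Session.prepare_request merges the session's default headers (which include the python-requests User-Agent) with the per-request headers, so the custom value replaces the default. The browser string is the one from the answer's code; the single query parameter is illustrative.

```python
import requests

# Default User-Agent that requests sends, e.g. "python-requests/2.x"
print(requests.utils.default_user_agent())

headers = {
    # Browser-like User-Agent string copied from the answer's code
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582",
}

session = requests.Session()
# Build the request the way requests.get() would, but without sending it
prepared = session.prepare_request(
    requests.Request(
        "GET",
        "https://www.google.com/search",
        headers=headers,
        params={"q": "Samsung S9"},
    )
)

print(prepared.url)                    # query string built and encoded by requests
print(prepared.headers["User-Agent"])  # header lookup is case-insensitive
```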


Code:

from bs4 import BeautifulSoup
import requests  # BeautifulSoup's "lxml" parser below also needs the lxml package installed

headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

for query in queries:
  params = {
    "q": query,
    "gl": "uk",
    "hl": "en"
  }

  # requests builds and URL-encodes the query string from params
  html = requests.get("https://www.google.com/search", headers=headers, params=params)
  soup = BeautifulSoup(html.text, "lxml")

  for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']

    print(f"{title}\n{link}\n")

-------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/

Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8

Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases

Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''
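Google generates its class names, so the selectors above may stop matching over time. A sketch for checking them offline against a hypothetical, simplified HTML snippet that mimics one result: the markup structure is an assumption, and only the class names and the example title/link come from the answer. It uses Python's built-in "html.parser" instead of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup for one organic result;
# class names tF2Cxc, yuRUbf and DKV0Md are taken from the answer
html = """
<div class="tF2Cxc">
  <div class="yuRUbf">
    <a href="https://en.wikipedia.org/wiki/Samsung_Galaxy_S8">
      <h3 class="DKV0Md">Samsung Galaxy S8 - Wikipedia</h3>
    </a>
  </div>
</div>
"""

# html.parser ships with Python, so lxml is not needed for this check
soup = BeautifulSoup(html, "html.parser")

results = []
for result in soup.select(".tF2Cxc"):
    title = result.select_one(".DKV0Md").text
    link = result.select_one(".yuRUbf a")["href"]
    results.append((title, link))

print(results)
```

If the live page returns no results while this snippet does, the class names have most likely changed.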
    

Alternatively, you can achieve the same thing with the Google Organic Results API from SerpApi. It's a paid API with a free plan.

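The difference in your case is that you don't have to figure out how to extract certain elements or why something stopped working. All you need to do is iterate over the structured JSON and pick out the data you want.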

Code to integrate:

import os
from serpapi import GoogleSearch

queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

for query in queries:
  params = {
      "engine": "google",
      "q": query,
      "hl": "en",
      "gl": "uk",
      "api_key": os.getenv("API_KEY"),
  }

  search = GoogleSearch(params)
  results = search.get_dict()

  for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
    print()

------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/

Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8

Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases

Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''
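The "iterate over structured JSON" step can be tried without an API key. A sketch over a hypothetical, trimmed-down dict shaped like what the code above reads (an organic_results list whose items have title and link); a real response carries other keys as well. Using .get() keeps the loop from raising KeyError if organic_results is missing, for example when a query returns nothing. The helper titles_and_links is hypothetical.

```python
# Hypothetical, trimmed-down response; entries reuse titles/links shown above
results = {
    "organic_results": [
        {"title": "Samsung Galaxy Note 9 - Wikipedia",
         "link": "https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9"},
        {"title": "Samsung Galaxy Note 9 Cases - Mobile Fun",
         "link": "https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases"},
    ]
}


def titles_and_links(results):
    """Collect (title, link) pairs, tolerating a missing organic_results key."""
    return [(r.get("title"), r.get("link")) for r in results.get("organic_results", [])]


for title, link in titles_and_links(results):
    print(title)
    print(link)
    print()
```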
    

Disclaimer: I work for SerpApi.

【Discussion】:
