【Question title】: Grabbing Instagram feed using Python
【Posted】: 2017-03-14 11:40:48
【Question】:

I am trying to get all of a specific user's Instagram posts using Python. My code is below:

import requests
from bs4 import BeautifulSoup


def get_images(user):
    url = "https://www.instagram.com/" + str(user)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for image in soup.findAll('img'):
        href = image.get('src')
        print(href)

get_images('instagramuser')

However, I get the following error:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 14 of the file C:/Users/Bedri/PycharmProjects/untitled1/main.py. To get rid of this warning, change code that looks like this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))

So my question is: what am I doing wrong?

【Comments】:

    Tags: python beautifulsoup web-crawler


    【Solution 1】:

    This is a warning, not an error. To silence it, pass a parser explicitly to BeautifulSoup:

    soup = BeautifulSoup(plain_text, "html.parser")
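To illustrate why a warning does not stop the program, here is a minimal sketch that uses only the standard-library `warnings` module. The function `parse_with_default` is a hypothetical stand-in written for this example; it is not BeautifulSoup's actual implementation, only an imitation of the "warn, then fall back to a default" behaviour described in the message above.

```python
import warnings


def parse_with_default(markup, parser=None):
    # Hypothetical stand-in for a library call; NOT BeautifulSoup's real code.
    # When no parser is given, it warns and falls back to a default.
    if parser is None:
        warnings.warn("No parser was explicitly specified", UserWarning)
        parser = "html.parser"
    return parser


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = parse_with_default("<img src='a.jpg'>")
    explicit = parse_with_default("<img src='a.jpg'>", "html.parser")

# Both calls returned normally; only the call without a parser warned.
print(implicit, explicit, len(caught))
```

Under these assumptions, both calls return a value and execution continues; the only difference is that the call without an explicit parser leaves one `UserWarning` behind.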
    

    【Comments】:

      【Solution 2】:
      soup = BeautifulSoup(plain_text,'lxml')
      

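      I recommend lxml over html.parser as the parser, and urlopen instead of requests.get.

      Here is the code, with the fetch and the parse on a single line:

      from urllib import request
      from bs4 import BeautifulSoup

      def get_images(user):
          # Fetch the profile page and parse it with lxml in one expression
          soup = BeautifulSoup(request.urlopen("https://www.instagram.com/" + str(user)), 'lxml')
          for image in soup.findAll('img'):
              href = image.get('src')
              print(href)

      get_images('user')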
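Since lxml is a separate third-party package, the parser choice above fails on a system where it is not installed. The following is a minimal sketch, not part of the original answer, of one way to prefer lxml and fall back to the built-in html.parser. The helper names `make_soup` and `image_sources` and the sample markup are assumptions made for this example; it parses an inline string rather than a live Instagram page so that it runs without network access.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise use html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is unavailable;
        # html.parser ships with Python, so it is always present.
        return BeautifulSoup(markup, "html.parser")


def image_sources(markup):
    """Return the src value of every <img> tag that has one."""
    soup = make_soup(markup)
    return [img["src"] for img in soup.find_all("img", src=True)]


# Hypothetical sample markup standing in for a downloaded page.
sample = '<div><img src="one.jpg"><img src="two.jpg"><img alt="no src"></div>'
print(image_sources(sample))
```

Either parser is named explicitly here, so the UserWarning from the question should not appear in either branch.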
      

      【Comments】:
