Grabbing Instagram feed using Python (answers)

【Question title】: Grabbing Instagram feed using Python
【Posted】: 2017-03-14 11:40:48
【Question】:

I am trying to fetch all Instagram posts of a specific user with Python. My code is below:

import requests
from bs4 import BeautifulSoup


def get_images(user):
    url = "https://www.instagram.com/" + str(user)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for image in soup.findAll('img'):
        href = image.get('src')
        print(href)

get_images('instagramuser')

However, I get this error:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 14 of the file C:/Users/Bedri/PycharmProjects/untitled1/main.py. To get rid of this warning, change code that looks like this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))

So my question is: what am I doing wrong?

【Comments】:

Tags: python beautifulsoup web-crawler

【Solution 1】:

This is a warning, not an error. You should pass a parser to BeautifulSoup explicitly:

soup = BeautifulSoup(plain_text, "html.parser")
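To see the fix in isolation, here is a minimal sketch that parses a small inline HTML string instead of a live page, so it runs without network access. The `SAMPLE_HTML` string and the `extract_image_urls` helper are hypothetical names made up for illustration; they are not part of the question's code and the markup is not real Instagram output.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; not real Instagram markup.
SAMPLE_HTML = """
<html><body>
  <img src="https://example.com/a.jpg">
  <img src="https://example.com/b.jpg">
  <img alt="no source attribute">
</body></html>
"""


def extract_image_urls(markup):
    """Return the src value of every <img> tag that has one."""
    # Naming the parser explicitly is what silences the UserWarning.
    soup = BeautifulSoup(markup, "html.parser")
    # src=True keeps only <img> tags that actually carry a src attribute.
    return [img["src"] for img in soup.find_all("img", src=True)]


if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_HTML):
        print(url)
```

Returning a list rather than printing inside the loop keeps the helper easy to reuse; the same function could be given `requests.get(url).text` from the question's code.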

【Discussion】:

【Solution 2】:

soup = BeautifulSoup(plain_text,'lxml')

I recommend lxml over html.parser.

Instead of requests.get, use urlopen.

Here is the code, with the fetch and the parse on a single line:

from urllib import request
from bs4 import BeautifulSoup


def get_images(user):
    # Fetch the profile page and parse it with lxml in a single expression
    soup = BeautifulSoup(request.urlopen("https://www.instagram.com/" + str(user)), 'lxml')
    for image in soup.findAll('img'):
        href = image.get('src')
        print(href)


get_images('user')

【Discussion】: