【Question title】: Can't scrape Instagram profile [closed]
【Posted】: 2021-03-23 12:52:32
【Question】:
from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.instagram.com/marcelo.codes/")
soup = BeautifulSoup(page.content, "html.parser")

profileName = soup.find('h2', class_="_7UhW9       fKFbl yUEEX   KV-D4              fDxYl     ")

followers = soup.find('span', class_="g47SY")

bio = soup.find('div', class_="-vDIg")

postsAmount = soup.find('span', class_ ="g47SY lOXF2")

print(f"""  
Name: {profileName}
followers: {followers}
bio: {bio}
posts: {postsAmount}
""")

This is my code, and every time I run it the result is:

python3 er.py 
  
Name: None
followers: None
bio: None
posts: None

What should I change to get the result I want?

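【Comments】:

  • You should print the page content to verify that the elements you want are actually there (and that Instagram is not blocking your request).
  • What have you tried to solve the problem? Where are you stuck?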
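
Following the first comment, here is a minimal sketch of that check. The helper name `missing_classes` is hypothetical, and the substring test is deliberately crude: it only tells you whether a class name occurs anywhere in the raw HTML that `requests` received, which is enough to see whether the markup you inspected in the browser is present at all.

```python
def missing_classes(html, class_names):
    """Return the class names that do not occur anywhere in the raw HTML."""
    return [name for name in class_names if name not in html]


# Usage against the live page (needs network access, so left commented out):
# import requests
# page = requests.get("https://www.instagram.com/marcelo.codes/")
# print(page.status_code)
# print(missing_classes(page.text, ["g47SY", "-vDIg"]))
```

If every class name comes back as missing, the elements are not in the HTML the server returned, and no `soup.find` call on that response can locate them.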

Tags: python web-scraping beautifulsoup python-requests instagram


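【Answer 1】:

The page stores its data in a JavaScript variable embedded in the HTML, so the elements you are looking for are not in the markup that requests receives. You can use this script to extract the data from that variable: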

import re
import json
import requests


headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
}

url = "https://www.instagram.com/marcelo.codes/"
data = json.loads(
    re.search(
        r"<script type=\"text/javascript\">window\._sharedData = (.*});",
        requests.get(url, headers=headers).text,
    ).group(1)
)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print("Bio:")
print(data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["biography"])

# edge_followed_by counts the accounts that follow this profile
print("\nFollowers:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_followed_by"][
        "count"
    ]
)

# edge_follow counts the accounts this profile follows
print("\nFollowing:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_follow"][
        "count"
    ]
)

Prints:

Bio:
✏ Trying my best and showing my journey into coding.
?? Brazilian.
? Learning Python right now! 
? Taking doubts, and showing my progress.
??  Links.

Followers:
102

Following:
10
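
The script above raises an exception when the regex finds nothing, for example if Instagram serves a login page instead of the profile. Below is a slightly more defensive sketch of the same idea. The function name `extract_user` is hypothetical, and the code assumes the page still embeds `window._sharedData` with the key layout used in the script above, which may no longer hold.

```python
import json
import re

# Same pattern as in the script above; the page layout it targets is an assumption.
SHARED_DATA_RE = re.compile(
    r'<script type="text/javascript">window\._sharedData = (.*});'
)


def extract_user(html):
    """Return the embedded user dict, or None if the page does not contain it."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    data = json.loads(match.group(1))
    pages = data.get("entry_data", {}).get("ProfilePage") or []
    if not pages:
        return None
    return pages[0].get("graphql", {}).get("user")


# Usage against the live page (needs network access, so left commented out):
# import requests
# headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"}
# user = extract_user(requests.get("https://www.instagram.com/marcelo.codes/", headers=headers).text)
# if user is None:
#     print("No profile data found in the response")
# else:
#     print(user["biography"])
```

Returning `None` instead of raising lets the caller tell "no profile data in this response" apart from a genuine bug, and keeping the parsing in a function that takes the HTML as a string makes it possible to test without touching the network.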
