Web scraping with Python: how to get the text

【Question title】: Web scraping with python how to get to the text
【Posted】: 2019-11-29 01:07:12
【Question description】:

I am trying to get the text from a website, but I cannot find a way to do it. How should I write this?

import requests
from bs4 import BeautifulSoup

link = "https://www.ynet.co.il/articles/0,7340,L-5553905,00.html"
response = requests.get(link)

soup = BeautifulSoup(response.text, 'html.parser')
info = soup.find('div', attrs={'class': 'text14'})
name = info.text.strip()
print(name)

This is what it looks like:

I get nothing back every time.

【Question discussion】:

Your screenshot shows the DOM, whereas BeautifulSoup works on the page source. The two can differ.
Have you tried response = requests.get(link).text?
@Amir It gives the same result.

Tags: python python-3.x web-scraping python-requests

【Solution 1】:

import json

import requests
from bs4 import BeautifulSoup

link = "https://www.ynet.co.il/articles/0,7340,L-5553905,00.html"
response = requests.get(link)
soup = BeautifulSoup(response.text, 'html.parser')
# Take the first <script type="application/ld+json"> block and parse it as JSON
info = soup.find_all('script', attrs={'type': 'application/ld+json'})[0].text.strip()
jsonDict = json.loads(info)
print(jsonDict['articleBody'])

The page appears to store all of the article data as JSON inside a <script> tag, so try this code.
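
A slightly more defensive version of the same idea can be sketched as follows. It is only an illustration: SAMPLE_HTML is made-up markup standing in for the downloaded page (not the real ynet source), and extract_article_body is a hypothetical helper name. It assumes the page may contain several JSON-LD blocks and that some of them may lack articleBody.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; not the real page markup.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "WebPage", "name": "Front page"}</script>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example", "articleBody": "Full article text."}
</script>
</head><body></body></html>
"""


def extract_article_body(html):
    """Return the first articleBody found in any JSON-LD script block, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
        raw = script.string
        if not raw:
            continue  # empty script tag, nothing to parse
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # A JSON-LD block may hold a single object or a list of objects
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "articleBody" in item:
                return item["articleBody"]
    return None


print(extract_article_body(SAMPLE_HTML))
```

With a live page, response.text would be passed in place of SAMPLE_HTML. Returning None rather than raising makes it easy to fall back to another approach when a page has no JSON-LD article data.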

【Discussion】:

What about this case: ynetnews.com/articles/0,7340,L-5554655,00.html ? Any idea how to get the text? Neither my approach nor yours works there.

【Solution 2】:

The solution is:

info = soup.find('meta', attrs={'property':'og:description'})

It gave me the text I needed.
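
Note that soup.find returns the whole <meta> tag, so the text itself still has to be read from its content attribute. A minimal sketch, assuming made-up markup in SAMPLE_HTML and a hypothetical helper named extract_og_description:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; not the real page markup.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example headline">
<meta property="og:description" content="Short summary of the article.">
</head><body></body></html>
"""


def extract_og_description(html):
    """Return the og:description content attribute, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"property": "og:description"})
    if tag is None:
        return None  # the page has no og:description meta tag
    return tag.get("content")


print(extract_og_description(SAMPLE_HTML))
```

The og:description field is usually a short summary rather than the full article body, so whether it is enough depends on how much of the text is needed.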

【Discussion】: