【Question title】: How to remove unwanted text when retrieving a page title with Python
【Posted】: 2022-01-11 15:31:39
【Question description】:

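Hi everyone, I have written a Python program that retrieves the title of a page. It works, but for some pages it also returns unwanted text. How can I avoid this?

Here is my program: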

# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)

Here is my output:

atlas obscura - curious and wondrous travel destinations
aoc-full-screen
aoc-heart-solid
aoc-compass
aoc-flipboard
aoc-globe
aoc-pocket
aoc-share
aoc-cancel
aoc-video
aoc-building
aoc-clock
aoc-clipboard
aoc-help
aoc-arrow-right
aoc-arrow-left
aoc-ticket
aoc-place-entry
aoc-facebook
aoc-instagram
aoc-reddit
aoc-rss
aoc-twitter
aoc-accommodation
aoc-activity-level
aoc-add-a-photo
aoc-add-box
aoc-add-shape
aoc-arrow-forward
aoc-been-here
aoc-chat-bubbles
aoc-close
aoc-expand-more
aoc-expand-less
aoc-forum-flag
aoc-group-size
aoc-heart-outline
aoc-heart-solid
aoc-home
aoc-important
aoc-knife-fork
aoc-library-books
aoc-link
aoc-list-circle-bullets
aoc-list
aoc-location-add
aoc-location
aoc-mail
aoc-map
aoc-menu
aoc-more-horizontal
aoc-my-location
aoc-near-me
aoc-notifications-alert
aoc-notifications-mentions
aoc-notifications-muted
aoc-notifications-tracking
aoc-open-in-new
aoc-pencil
aoc-person
aoc-pinned
aoc-plane-takeoff
aoc-plane
aoc-print
aoc-reply
aoc-search
aoc-shuffle
aoc-star
aoc-subject
aoc-trip-style
aoc-unpinned
aoc-send
aoc-phone
aoc-apps
aoc-lock
aoc-verified

Instead of all this, I want to get only this one line:

"atlas obscura - curious and wondrous travel destinations"

Please suggest some ideas. Most other websites work correctly; only a few show this problem.

【Comments】:

    Tags: python web-scraping beautifulsoup scrapy


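    【Solution 1】:

    The problem is that find_all('title') returns every <title> element in the page, not just the document title. Beautiful Soup has a title attribute that returns only the first <title> tag, which is what you are trying to get. Here is your modified code: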

    # importing the modules
    import requests
    from bs4 import BeautifulSoup
    
    # target url
    url = 'https://atlasobscura.com'
    
    # making requests instance
    reqs = requests.get(url)
    
    # using the BeautifulSoup module
    soup = BeautifulSoup(reqs.text, 'html.parser')
    title_data = soup.title.text.lower().strip()
    
    # displaying the title
    print("Title of the website is : ")
    print(title_data)
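
The difference between find_all('title') and soup.title can be checked offline with a small sketch. The HTML below is a made-up stand-in for the real page, and it assumes the extra "aoc-..." entries come from <title> elements nested inside inline SVG icons, which the output suggests but the thread does not confirm. The helper name page_title is hypothetical, and the head > title selector is an extra safeguard that the answer above does not use.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page: one document title in
# <head>, plus <title> elements nested inside inline SVG icons (assumption).
SAMPLE_HTML = """
<html>
  <head><title> Atlas Obscura - Curious and Wondrous Travel Destinations </title></head>
  <body>
    <svg><symbol id="a"><title>aoc-full-screen</title></symbol></svg>
    <svg><symbol id="b"><title>aoc-heart-solid</title></symbol></svg>
  </body>
</html>
"""


def page_title(html):
    """Return the lowercased, stripped document title, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer the <title> directly inside <head>; otherwise fall back to the
    # first <title> found anywhere in the document.
    tag = soup.select_one("head > title") or soup.title
    return tag.get_text().strip().lower() if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all collects every <title>, including the ones inside the SVG icons.
all_titles = [t.get_text().strip() for t in soup.find_all("title")]
print(all_titles)

# The helper returns only the document title.
print(page_title(SAMPLE_HTML))
```

Returning None when no <title> exists avoids the AttributeError that soup.title.text would raise on such a page.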
    

    【Comments】:
