【Question title】: How do I parse the data in these HTML tags?
【Posted】: 2017-05-27 04:48:26
【Question】:

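I'm new to Python. I'm trying to get some data from my school's website. Below is the code I wrote to scrape just the news items. It works, but I want the title, date, and paragraph to each appear on their own line. I feel like something is missing from my code, but I can't work out what. I need your help.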

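from urllib.request import urlopen

from bs4 import BeautifulSoup

# Fetch the university home page and parse it.
page = urlopen("http://www.kibabiiuniversity.ac.ke")
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly to avoid bs4's warning

# Print the text of each news item, followed by a divider.
for i in soup.find_all("div", {"class": "blog-thumbnail-inside"}):
    print(i.get_text())
    print("----------" * 20)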

Here is the HTML structure of the page I'm trying to scrape.

<div class="blog-thumbnail-inside">
    <h2 class="blog-thumbnail-title post-widget-title-color gdl-title">
        <a href="http://www.kibabiiuniversity.ac.ke">
            Completion of fees & collection of exam cards.
        </a>
    </h2>
    <div class="blog-thumbnail-info post-widget-info-color gdl-divider">
        <div class="blog-thumbnail-date">Posted on 09 Jan 2017</div>
    </div>
    <div class="blog-thumbnail-context">
        <div class="blog-thumbnail-content">
            Download the information on fee payment and collection of exam cards..
        </div>
    </div>
</div>

【Comments】:

    Tags: beautifulsoup python-3.5


    【Solution 1】:
    for i in soup.find_all("div", {"class": "blog-thumbnail-inside"}):
        print(i.get_text('\n'))  # the first argument is the separator used to join the pieces of text
        print("----------" * 20)
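To see the separator's effect without hitting the live site, here is a minimal sketch that runs on the markup from the question (with `&` written as `&amp;`). The `strip=True` argument is an addition not used in the answer: it trims each text fragment and drops whitespace-only ones, which matters here because the sample markup is indented. The variable names `html`, `item`, and `text` are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (one news item).
html = """
<div class="blog-thumbnail-inside">
    <h2 class="blog-thumbnail-title post-widget-title-color gdl-title">
        <a href="http://www.kibabiiuniversity.ac.ke">
            Completion of fees &amp; collection of exam cards.
        </a>
    </h2>
    <div class="blog-thumbnail-info post-widget-info-color gdl-divider">
        <div class="blog-thumbnail-date">Posted on 09 Jan 2017</div>
    </div>
    <div class="blog-thumbnail-context">
        <div class="blog-thumbnail-content">
            Download the information on fee payment and collection of exam cards..
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("div", class_="blog-thumbnail-inside")

# "\n" is the separator placed between text fragments;
# strip=True trims each fragment and skips the empty ones.
text = item.get_text("\n", strip=True)
print(text)
```

With both arguments the title, date, and paragraph should each land on their own line. The output the answer reported from the live page follows.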
    

    Output:

    Final Undergraduate Examination Timetable for Semester 1 2016/2017
    Posted on 11 Jan 2017
    Download Undergraduate Timetable
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Vacancies for Administrative and Teaching Positions
    Posted on 11 Jan 2017
    Kibabii University is a fully fledged public institution of higher education and research in Kenya with a student population of 6400 and staff population of 346. The University seeks to appoint innovative individuals with experience and excellent credentials
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
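If the three fields are needed separately rather than as one block of text, an alternative (not part of the answer above) is to look each one up by its class name. This is a sketch that assumes every news item follows the structure shown in the question; `parse_news_items` and the dictionary keys are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup based on the structure shown in the question.
html = """
<div class="blog-thumbnail-inside">
    <h2 class="blog-thumbnail-title post-widget-title-color gdl-title">
        <a href="http://www.kibabiiuniversity.ac.ke">
            Completion of fees &amp; collection of exam cards.
        </a>
    </h2>
    <div class="blog-thumbnail-info post-widget-info-color gdl-divider">
        <div class="blog-thumbnail-date">Posted on 09 Jan 2017</div>
    </div>
    <div class="blog-thumbnail-context">
        <div class="blog-thumbnail-content">
            Download the information on fee payment and collection of exam cards..
        </div>
    </div>
</div>
"""


def text_or_empty(tag):
    """Return the stripped text of a tag, or "" if the tag was not found."""
    return tag.get_text(" ", strip=True) if tag is not None else ""


def parse_news_items(markup):
    """Return one dict per news item with title, date, and content."""
    soup = BeautifulSoup(markup, "html.parser")
    items = []
    for block in soup.find_all("div", class_="blog-thumbnail-inside"):
        items.append({
            "title": text_or_empty(block.find("h2", class_="blog-thumbnail-title")),
            "date": text_or_empty(block.find("div", class_="blog-thumbnail-date")),
            "content": text_or_empty(block.find("div", class_="blog-thumbnail-content")),
        })
    return items


items = parse_news_items(html)
for entry in items:
    print(entry["title"])
    print(entry["date"])
    print(entry["content"])
    print("-" * 40)
```

Looking fields up by class keeps the output stable if the page later adds extra text inside a news item, at the cost of depending on those class names not changing.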
    

    【Comments】:

    • Thanks, this is exactly what I was missing.