[Posted]: 2022-01-23 04:08:18
[Question]:
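I am trying to web-scrape a page built from HTML lists and unordered lists
(nested inside other lists and unordered lists),
but I can't scrape them because the tags have no attributes.
Each <ul> tag under a day heading contains that day's data. I know how to scrape nested <ul> and <li>
tags, but I can't do it here because there are no attributes to select on. I would like to know whether I can take the parsed page and look up the tags that follow the line containing the day, so that I can scrape them all in one pass. Any help would be appreciated.
Here is a bit of the code: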
<div class="show-content user_content clearfix enhanced" data-uw-styling-context="true">
<h1 class="page-title" data-uw-styling-context="true">Unit 3 I Week 3</h1>
<div style="background-color: #184366; color: white; padding: 15px;" data-uw-styling-context="true">
<h2 data-uw-styling-context="true"><span style="font-size: 30pt;" data-uw-styling-context="true">Unit 3 | Week 3: January 18th-21st</span></h2>
</div>
<h2 data-uw-styling-context="true">Essential Questions</h2>
<ul data-uw-styling-context="true">
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">How does voice relate to the audience and purpose?</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">What techniques does the author use to get his/her point across and communicate?</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">How can technology be beneficial and/or detrimental to society?</span></li>
</ul>
<h2 data-uw-styling-context="true">Objectives</h2>
<ul data-uw-styling-context="true">
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Analyze the concept of utopia/dystopia as presented in the novel</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Create a utopia to represent the ideas of the group and backed up with research</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Analyze expository/informational text </span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Understand rhetorical devices and logical fallacies</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Interpret elements of media including television and digital graphics</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Create a TV newscast that organizes and presents research with certain purposes and audiences in mind</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Collaborate to create a professional product</span></li>
<li aria-level="1" data-uw-styling-context="true"><span data-uw-styling-context="true">Explain author’s purpose and message within a text</span></li>
</ul>
<p data-uw-styling-context="true"><img src="https://fisd.instructure.com/courses/56950/files/4791824/download" alt="tear drop line 3.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791824" data-api-returntype="File" style="max-width: 676px;" data-uw-styling-context="true"></p>
<h2 data-uw-styling-context="true">Monday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">No School</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true">Tuesday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Read Chapter 4</li>
<li data-uw-styling-context="true">Annotations </li>
<li data-uw-styling-context="true">Book Study</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems</li>
<li data-uw-styling-context="true">Annotations and Book Study 1-4 due BOC Wed</li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true">Wednesday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Subject Complement Notes </li>
<li data-uw-styling-context="true">"There Will Come Soft Rains" </li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Annotations and Book Study Ch. 1-4</li>
</ul>
</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems </li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true">Thursday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Subject Complement Practice</li>
<li data-uw-styling-context="true">TWCSR</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Study Stems </li>
</ul>
</li>
</ul>
</li>
</ul>
<hr data-uw-styling-context="true">
<h2 data-uw-styling-context="true">Friday</h2>
<ul data-uw-styling-context="true">
<li style="list-style-type: none;" data-uw-styling-context="true">
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">In Class Today:
<ul data-uw-styling-context="true">
<li data-uw-styling-context="true">Stems Quiz 5 Major Grade</li>
<li data-uw-styling-context="true">TWCSR (Due Monday BOC)</li>
</ul>
</li>
<li data-uw-styling-context="true">Due Today:</li>
<li data-uw-styling-context="true">Homework for Next Class:</li>
</ul>
</li>
</ul>
<p data-uw-styling-context="true"><img src="https://fisd.instructure.com/courses/56950/files/4791824/download" alt="tear drop line 3.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791824" data-api-returntype="File" style="max-width: 676px;" data-uw-styling-context="true"></p>
<p data-uw-styling-context="true"><img style="float: left; max-width: 72px;" src="https://fisd.instructure.com/courses/56950/files/4791827/download" alt="Left Arrow (1).png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791827" data-api-returntype="File" data-uw-styling-context="true"></p>
<p data-uw-styling-context="true"><br data-uw-styling-context="true"> <a title="Unit 3 Overview" href="https://fisd.instructure.com/courses/111538/pages/unit-3-overview" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/111538/pages/unit-3-overview" data-api-returntype="Page" data-uw-styling-context="true">Unit 3 Homepage</a></p>
<p data-uw-styling-context="true"> </p>
<p data-uw-styling-context="true"><a title="Home" href="https://fisd.instructure.com/courses/111538/pages/home" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/111538/pages/home" data-api-returntype="Page" data-uw-styling-context="true"><img style="float: left; max-width: 72px;" src="https://fisd.instructure.com/courses/56950/files/4791834/download?wrap=1" alt="Home Black.png" data-api-endpoint="https://fisd.instructure.com/api/v1/courses/56950/files/4791834" data-api-returntype="File" data-uw-styling-context="true"> <br data-uw-styling-context="true">Course Homepage</a></p>
<p data-uw-styling-context="true"> </p>
</div>
Here is a screenshot of the page:
[Comments]:
-
Is this from a public web page?
-
No, it is from a school page behind a login.
Tags: python html web-scraping beautifulsoup python-requests
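
One possible way to do what the question describes, offered as a sketch and not a tested solution for the real page: since the <ul> tags carry no useful attributes, anchor on each day's <h2> text and take the <ul> that follows it as a sibling. The example below assumes BeautifulSoup 4 with the built-in "html.parser" backend. `SAMPLE_HTML` is a shortened, hypothetical stand-in for the markup above, and `DAYS`, `own_text`, and `scrape_days` are illustrative names, not part of any library. Day headings are matched by substring, so any prefix characters before the day name are tolerated.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical, shortened stand-in for the page markup shown in the question.
SAMPLE_HTML = """
<div class="show-content">
  <h2>Objectives</h2>
  <ul><li>Analyze expository/informational text</li></ul>
  <h2>Monday</h2>
  <ul><li style="list-style-type: none;"><ul><li>No School</li></ul></li></ul>
  <hr>
  <h2>Tuesday</h2>
  <ul><li style="list-style-type: none;"><ul>
    <li>In Class Today:
      <ul><li>Read Chapter 4</li><li>Annotations </li></ul>
    </li>
    <li>Due Today:</li>
    <li>Homework for Next Class:
      <ul><li>Study Stems</li></ul>
    </li>
  </ul></li></ul>
</div>
"""

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")


def own_text(tag):
    """Return the text sitting directly inside `tag`, ignoring nested tags."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


def scrape_days(html):
    """Map each day heading to {section label: [items]} using sibling lookup."""
    soup = BeautifulSoup(html, "html.parser")
    schedule = {}
    for h2 in soup.find_all("h2"):
        heading = h2.get_text(strip=True)
        day = next((d for d in DAYS if d in heading), None)
        if day is None:
            continue  # not a day heading, e.g. "Objectives"
        outer = h2.find_next_sibling("ul")  # first <ul> after this heading
        if outer is None:
            continue
        # The sample wraps each day's list in a bullet-less <li>; unwrap it if present.
        wrapper = outer.find("li", recursive=False)
        inner = wrapper.find("ul", recursive=False) if wrapper is not None else None
        top = inner if inner is not None else outer
        sections = {}
        for li in top.find_all("li", recursive=False):
            sub = li.find("ul", recursive=False)
            items = []
            if sub is not None:
                items = [c.get_text(strip=True) for c in sub.find_all("li", recursive=False)]
            sections[own_text(li)] = items
        schedule[day] = sections
    return schedule


if __name__ == "__main__":
    for day, sections in scrape_days(SAMPLE_HTML).items():
        print(day, sections)
```

Because the page sits behind a login, the HTML string passed to `scrape_days` would have to come from an authenticated session; how that session is obtained is outside this sketch.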