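A web page is no longer just one page; it is usually assembled from many requests. The result of requests.get() is a single piece of data, possibly the initial HTML page that a browser would read, interpret, and then use to request more data.
Python requests does not do that: it does not read and interpret the page. It only fetches what you asked for.
So you need something that works like a browser: it fetches the first page, loads the additional resources, and interprets the JavaScript (which may cause even more resources to load). Selenium is a good tool for this.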
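As a rough sketch of the browser route, assuming Selenium 4 and Chrome are installed (the helper names open_browser and rendered_html are made up for illustration, and the headless flag is just one reasonable choice):

```python
def open_browser():
    """Start a headless Chrome session (assumes Selenium 4 and Chrome)."""
    # Third-party package, imported here so rendered_html() below
    # does not depend on it directly.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible browser window
    return webdriver.Chrome(options=options)


def rendered_html(driver, url):
    """Load url in a real browser and return the HTML after scripts ran."""
    try:
        # With the default page load strategy, get() returns once the
        # page's load event has fired.
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

Something like `rendered_html(open_browser(), "https://www.edx.org")` would then give you the page as the browser sees it. Content that scripts load later may still need an explicit wait before you read page_source.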
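Alternatively, you can inspect the page to see what it loads and make that request yourself.
For example, looking at the www.edx.org page with the browser debugger, you can see that it (the home page) loads a file named subjects (actually https://www.edx.org/api/v1/catalog/subjects)
And if you look at that file, you will see that it is JSON: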
{"count": 31,
"next": null,
"previous": null,
"results": [
{
"name": "Computer Science",
"subtitle": "<p>Take online computer science courses from top institutions including Harvard, MIT and Microsoft. Learn to code with computer science courses including programming, web design, and more.</p>",
"description": "<p>Enroll in the latest computer science courses covering Python, C programming, R, Java, artificial intelligence, cybersecurity, software engineering, and more. Learn from Harvard, MIT, Microsoft, IBM, and other top institutions. Join today.</p>\n<p>Related Topics - <a href=\"/learn/computer-programming\">Programming</a> | <a href=\"/learn/android-development\">Android Development</a> | <a href=\"/learn/apache-spark\">Apache Spark</a> | <a href=\"/learn/app-development\">App Development</a> | <a href=\"/learn/artificial-intelligence\">Artificial Intelligence</a> | <a href=\"/learn/azure\">Azure</a> | <a href=\"https://www.edx.org/learn/big-data\">Big Data</a> | <a href=\"/learn/blockchain-cryptography\">Blockchain</a> | <a href=\"https://www.edx.org/learn/c-programming\">C</a> | <a href=\"https://www.edx.org/learn/c-plus-plus\">C++</a> | <a href=\"https://www.edx.org/learn/c-sharp\">C#</a> | <a href=\"/learn/cloud-computing\">Cloud Computing</a> | <a href=\"/learn/cybersecurity\">Cybersecurity</a> | <a href=\"https://www.edx.org/learn/data-science\">Data Science</a> | <a href=\"https://www.edx.org/learn/data-analysis\">Data Analysis</a> | <a href=\"/learn/databases\">Databases</a> | <a href=\"https://www.edx.org/learn/devops\">Devops</a> | <a href=\"/learn/front-end-web-development\">Front End Web Development</a> | <a href=\"/learn/hadoop\">Hadoop</a> | <a href=\"/learn/html\">HTML</a> | <a href=\"/learn/information-technology\">Information Technology</a> | <a href=\"/learn/java\">Java</a> | <a href=\"/learn/javascript\">JavaScript</a> | <a href=\"/learn/linux\">Linux</a> | <a href=\"/learn/machine-learning\">Machine Learning</a> | <a href=\"/learn/matlab\">Matlab</a> | <a href=\"/learn/mobile-development\">Mobile Development</a> | <a href=\"/learn/python\">Python</a> | <a href=\"https://www.edx.org/learn/r-programming\">R</a> | <a href=\"/learn/robotics\">Robotics</a> | <a href=\"https://www.edx.org/learn/software-engineering\">Software 
Engineering</a> | <a href=\"https://www.edx.org/learn/sql\">SQL</a> | <a href=\"/learn/t-sql\">T-SQL</a> | <a href=\"https://www.edx.org/learn/user-experience-ux\">UX Design</a> | <a href=\"https://www.edx.org/learn/virtual-reality\">Virtual Reality</a> | <a href=\"/learn/web-development\">Web Development</a> | <a href=\"https://www.edx.org/learn/web-design\">Web Design</a> | <a href=\"https://www.edx.org/masters/online-master-science-computer-science-utaustinx\">Master's in Computer Science</a> | <a href=\"https://www.edx.org/masters/online-master-science-analytics-georgia-tech\">Master's in Analytics</a> | <a href=\"https://www.edx.org/masters/online-master-data-science-uc-san-diego\">Master's in Data Science</a></p>",
"banner_image_url": "https://www.edx.org/sites/default/files/cs-1440x210.jpg",
"card_image_url": "https://www.edx.org/sites/default/files/subject/image/card/computer-science.jpg",
"slug": "computer-science",
"uuid": "e52e2134-a4e4-4fcb-805f-cbef40812580"
},
... etc.
So, depending on what you want to do, you could call requests.get('https://www.edx.org/api/v1/catalog/subjects'), convert the result from JSON to a Python object, and problem solved!
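A minimal sketch of that last step, assuming the endpoint still responds with the structure shown above (fetch_subjects and subject_names are illustrative names, not part of any library):

```python
SUBJECTS_URL = "https://www.edx.org/api/v1/catalog/subjects"


def fetch_subjects(url=SUBJECTS_URL):
    """Request the JSON endpoint directly and return the decoded payload."""
    # Third-party package (pip install requests), imported here so that
    # subject_names() below can be used without it.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()       # decode the JSON body into dicts and lists


def subject_names(payload):
    """Collect the "name" field of every entry under "results"."""
    return [subject["name"] for subject in payload["results"]]
```

With the response shown above, `subject_names(fetch_subjects())` should start with "Computer Science". If "next" is ever non-null, the remaining pages would have to be requested as well.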