Print article with a Web-Crawler on Python

【Question title】: Print article with a Web-Crawler on Python
【Posted】: 2014-09-15 19:19:27
【Question description】:

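I am new to Python and I'm trying to make a web crawler that prints only the article (for example, from this site: http://techcrunch.com/2014/09/15/microsoft-has-acquired-minecraft/) and nothing else on the page. I tried this (which doesn't work):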

import requests
from bs4 import BeautifulSoup

source_code = requests.get('http://techcrunch.com/2014/09/15/microsoft-has-acquired-minecraft/')
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'html.parser')

for link in soup.findAll('div', {'class': 'article-entry text'}):
    title = link.string
    print(title)

It prints: None. Thanks.

【Comments】:

Tags: python web-scraping web-crawler

【Solution 1】:

You only need to replace your for loop:

for link in soup.findAll('div', {'class': 'article-entry text'}):
  title = link.string
  print(title)

with:

# get_text() collects the text of every nested tag, unlike .string
title = soup.find('h1', {'class': 'alpha tweet-title'}).get_text()
article = soup.find('div', {'class': 'article-entry text'}).get_text()
print(title)
print(article)

You will get only the title and the article.

The BeautifulSoup documentation may be helpful.
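To see why the original loop printed None, here is a minimal sketch that runs without network access. The `SAMPLE_HTML` snippet is a made-up stand-in that reuses the class names from the answer; the real page markup may well differ. It illustrates that `.string` is None for a tag with several children, while `get_text()` gathers the text of all descendants.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<h1 class="alpha tweet-title">Sample headline</h1>
<div class="article-entry text">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "article-entry text"})

# .string is None here because the div has several children,
# not a single text node.
string_value = div.string

# get_text() concatenates the text of every descendant instead.
title = soup.find("h1", {"class": "alpha tweet-title"}).get_text()
article = div.get_text()

print(string_value)
print(title)
print(article)
```

Note that `find` returns None when nothing matches, so against a live page it is probably worth checking the result before calling `get_text()` on it.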

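【Comments】:

Thanks, it's working. But it prints [...] on the left and right and on different lines. Is there any way to get it without the [...], or to get it all in one string?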
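One possible way to get the article as a single string, assuming the unwanted pieces are line breaks and surrounding whitespace, is to collapse all whitespace after extracting the text. The helper name `collapse_whitespace` and the sample value `raw` below are hypothetical, used only for illustration.

```python
def collapse_whitespace(text):
    """Join all whitespace-separated pieces of text with single spaces."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, and drops empty pieces.
    return " ".join(text.split())


# Hypothetical example of what get_text() might return for an article body.
raw = "\n  First paragraph.\n\n  Second paragraph.\n"
one_line = collapse_whitespace(raw)
print(one_line)
```

BeautifulSoup's `get_text()` also accepts a separator and a `strip` argument, for example `get_text(" ", strip=True)`, which may be enough on its own depending on the page.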