【Question title】: How to format output from Beautiful Soup and Selenium?
【Posted】: 2018-01-14 06:16:55
【Question description】:

I use the following code to retrieve economic data from a website:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'

driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']

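    # .format() must be called on the string, not on the result of print()
    print(u'\t{}\t{}\t{}\t{}'.format(time, currency, event, volatility))

The print statement produces the following output: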

23:30   
AUD                                     
AiG Performance of Construction Index (Jul)
    Moderate volatility expected
    23:50   
JPY                                     
JP Foreign Reserves (Jul)
    Low volatility expected
    24h 
CAD                                     
August Civic Holiday
    No volatility expected
    01:30   
AUD                                     
ANZ Job Advertisements (Jun)
    Low volatility expected
    n/a 
CNY                                     
Foreign Exchange Reserves (MoM) (Jul)
    Low volatility expected
    05:00   
JPY                                     
Coincident Index (Jun)Preliminary
    Moderate volatility expected
    05:00

Is it possible to format this output so that each event prints on a single line, like this?

    23:30   AUD   AiG Performance of Construction Index (Jul)   Moderate volatility expected
    23:50   JPY   JP Foreign Reserves (Jul)                     Low volatility expected
    24h     CAD   August Civic Holiday                          No volatility expected
    01:30   AUD   ANZ Job Advertisements (Jun)                  Low volatility expected
    n/a     CNY   Foreign Exchange Reserves (MoM) (Jul)         Low volatility expected
    05:00   JPY   Coincident Index (Jun)Preliminary             Moderate volatility expected

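The ultimate goal is to cut this output and paste it into an Excel file. Thanks in advance!

【Comments】:

  • Maybe try stripping the newlines?
  • So you want it like this..?
  • print('something', end='') # the default end is \n
  • Yes, but there are newlines in the middle, not just at the end
  • Why use Selenium to fetch the page instead of a simple url.get()? It seems unnecessary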
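
The exchange above about end='' can be illustrated with a small sketch. The strings below are hypothetical stand-ins for what .text might return for a table cell; they are not taken from the real page. The point is that end='' only changes what print() appends, so whitespace stored inside the string has to be removed from the string itself.

```python
# Hypothetical cell text, shaped like the padded values seen in the output above.
raw_time = "\n    23:30   \n"
raw_event = "Coincident Index (Jun)\n   Preliminary"

# end="" suppresses only the newline that print() adds after the text;
# the newlines embedded in raw_time are still written out.
print(raw_time, end="")

# strip() trims whitespace (spaces, tabs, newlines) from both ends.
clean_time = raw_time.strip()

# split() followed by join() also collapses whitespace runs in the middle.
clean_event = " ".join(raw_event.split())

print(clean_time, clean_event, sep="\t")
```

Whether a space should be inserted where the inner newline was is a judgment call; the sketch simply assumes one space is wanted.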

Tags: python selenium beautifulsoup


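【Solution 1】:

To complement the other answer: since you mention that "the ultimate goal is to cut this output and paste it into an Excel file", you may also be interested in generating a .csv file from the data, which can be opened in Excel directly instead of copying and pasting. After adding import csv, change the loop to: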

with open("data.csv", "w") as csv_file:
    for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
        event = tr.find('div', {'class': 'fxit-event-title'}).text
        currency = tr.find('div', {'class': 'fxit-event-name'}).text
        actual = tr.find('div', {'class': 'fxit-actual'}).text
        forecast = tr.find('div', {'class': 'fxit-consensus'}).text
        previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
        time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
        volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']

        line = [time.strip(),currency.strip(),event.strip(),volatility.strip()]
        writer = csv.writer(csv_file, delimiter=',')
        writer.writerow(line)
        print(line)

【Comments】:

  • You might want to move the creation of the csv.writer object to before the "for tr" loop...
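
A minimal sketch of what that comment suggests, assuming Python 3. The rows list and the write_rows helper are hypothetical and only stand in for the values scraped in the loop; the csv.writer is created once, and the file is opened with newline="" so that the csv module controls line endings.

```python
import csv

# Hypothetical rows standing in for (time, currency, event, volatility)
# as scraped in the loop above, still carrying stray whitespace.
rows = [
    ("\n23:30   \n", "\nAUD   ", "AiG Performance of Construction Index (Jul)\n",
     "Moderate volatility expected"),
    ("\n23:50   \n", "\nJPY   ", "JP Foreign Reserves (Jul)\n",
     "Low volatility expected"),
]


def write_rows(path, rows):
    """Write stripped rows to a CSV file, creating the writer only once."""
    # newline="" prevents extra blank lines between rows on Windows.
    with open(path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        for row in rows:
            writer.writerow([field.strip() for field in row])


write_rows("data.csv", rows)
```

In the scraping script, the body of the for tr loop would take the place of the loop over rows.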
【Solution 2】:

Try stripping the newlines like this:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'

driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']

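    # .format() must be called on the string, not on the result of print()
    print(u'\t{}\t{}\t{}\t{}'.format(time.strip(), currency.strip(), event.strip(), volatility.strip()))

This way, none of the strings has leading or trailing newlines.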
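
Once the fields are stripped, the aligned layout asked for in the question can be approximated by padding each column to a fixed width. This is only a sketch: the rows are hypothetical, already-stripped values, and format_row and the column widths are assumptions chosen to fit the sample.

```python
# Hypothetical, already-stripped rows: (time, currency, event, volatility).
rows = [
    ("23:30", "AUD", "AiG Performance of Construction Index (Jul)",
     "Moderate volatility expected"),
    ("24h", "CAD", "August Civic Holiday", "No volatility expected"),
]


def format_row(row, widths=(8, 6, 46)):
    """Left-justify every column except the last to a fixed width."""
    *head, last = row
    padded = [value.ljust(width) for value, width in zip(head, widths)]
    return "".join(padded) + last


for row in rows:
    print(format_row(row))

# Tab-separated lines are an alternative when the text will be pasted
# into a spreadsheet rather than read on screen.
excel_lines = ["\t".join(row) for row in rows]
```

Fixed-width padding is for reading in a terminal. For pasting into Excel, the tab-separated form is probably the better fit, since pasted tabs are normally split into separate cells.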

【Comments】:

  • Plain .strip() instead of .strip('\n') works just as well! :)
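
One nuance behind that comment, shown on a hypothetical value: .strip('\n') removes only newline characters from the ends, while .strip() with no argument removes any leading or trailing whitespace. The two agree when newlines are the only padding, but they differ when spaces sit next to the newlines, as the padded output in the question suggests they do.

```python
# Hypothetical cell text with spaces before the closing newline.
raw = "\n23:30   \n"

only_newlines = raw.strip("\n")   # trims newline characters only
all_whitespace = raw.strip()      # trims spaces, tabs and newlines

print(repr(only_newlines))
print(repr(all_whitespace))
```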