Python--Web scraping a table and writing only specific columns into a CSV file: answers

【Question title】: Python--Web scraping a table and writing only specific columns into a CSV file
【Posted】: 2015-12-03 23:24:34
【Question】:

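I'm running into a couple of problems. First, when I try to write the scraped data to a CSV file, nothing gets written. The file is saved, but it is completely blank. Eventually I want to open it and read the water temperature column to compute the average.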

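My other problem is that I only want a few of the table's columns in the CSV file. Can someone check whether what I've done is correct? I only want the first 3 columns and then column 14.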
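Picking several fields out of a row by position can be done with operator.itemgetter. A minimal sketch, assuming each data line is whitespace-separated; select_columns is a hypothetical helper name. Note that Python indexes from 0, so itemgetter(0, 1, 2, 14), like cells[14] in the code below, returns the 15th field rather than the 14th. Check the file's header line to confirm which index actually holds the water temperature.

```python
from operator import itemgetter

# Indices 0, 1, 2 are the first three fields; index 14 is the 15th field.
pick_columns = itemgetter(0, 1, 2, 14)


def select_columns(line):
    """Split one whitespace-separated line and keep only the chosen fields.

    Raises IndexError if the line has fewer than 15 fields.
    """
    return list(pick_columns(line.split()))


# Usage with a made-up line whose fields are just their own positions:
print(select_columns(" ".join(str(i) for i in range(19))))
```

Because itemgetter with several arguments returns a tuple, the helper wraps it in list() so the result can be passed straight to csv.writer.writerow.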

Thanks!

import sys
import urllib2
import csv
import requests 
from bs4 import BeautifulSoup

r_temp1 = requests.get('http://www.ndbc.noaa.gov/data/realtime2/BZBM3.txt')
html_temp1 = r_temp1.text
soup = BeautifulSoup(html_temp1, "html.parser")
table_temp1 = soup.find('table')
rows_temp1 = table.findAll('tr')
rows_temp1 = rows_temp1[1:]

#writing to a csv file
csvfile_temp1 = open("temp1.csv","wb")
output_temp1 = csv.writer(csvfile_temp1, delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)
for row in rows_temp1:
    Year = cells[0].text.strip()
    Month = cells[1].text.strip()
    Day = cells[2].text.strip()
    W_temp = cells[14].text.strip()
    output.writerow([Year,Month,Day,W_temp])
csvfile_temp1.close()

【Question discussion】:

Tags: python csv web-scraping

【Solution 1】:

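You don't see anything in the file because rows_temp1 contains no rows. The list is empty because the text file has no table rows: it looks like you expected an HTML file with a table, but this file is just plain text.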

Here is a version that does what you need:

import csv
import requests

r_temp1 = requests.get('http://www.ndbc.noaa.gov/data/realtime2/BZBM3.txt')
rows_temp1 = r_temp1.text.split('\n')

# writing to a csv file (Python 3: open in text mode with newline='')
with open("temp1.csv", "w", newline="") as csvfile_temp1:
    output_temp1 = csv.writer(csvfile_temp1, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in rows_temp1:
        # skip blank lines and the '#' header lines at the top of the file
        if not row.strip() or row.startswith('#'):
            continue
        cells = row.split()  # fields are separated by runs of spaces
        Year = cells[0]
        Month = cells[1]
        Day = cells[2]
        W_temp = cells[14]
        output_temp1.writerow([Year, Month, Day, W_temp])
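The question's end goal was the average of the water temperature column. A minimal sketch, assuming temp1.csv holds the four columns written above (Year, Month, Day, W_temp), so the temperature sits at index 3; average_water_temp is a hypothetical helper name. Any cell that does not parse as a number is skipped, which covers header text and, assuming the data uses it, NDBC's MM missing-value marker.

```python
import csv


def average_water_temp(path, column=3):
    """Return the mean of the numeric values in `column` of a CSV file.

    Rows that are too short are ignored, and cells that float() cannot parse
    (header text, missing-value markers such as "MM") are skipped.
    Returns None if no numeric values were found.
    """
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) <= column:
                continue  # empty or short row
            try:
                value = float(row[column])
            except ValueError:
                continue  # not a number: header text or a missing value
            total += value
            count += 1
    return total / count if count else None


if __name__ == "__main__":
    print(average_water_temp("temp1.csv"))
```

Skipping unparseable cells instead of failing keeps the average honest: a missing reading is left out rather than counted as zero.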

【Discussion】:

OK, that makes sense. Thanks a lot!

【Solution 2】:

Running your code gives:

  File "hh.py", line 11, in <module>
    rows_temp1 = table.findAll('tr')
NameError: name 'table' is not defined

Indeed, on line 10 you define table_temp1, not table. I don't know whether you have other problems, but start by reading the errors you get.

【Discussion】: