[Posted]: 2016-06-03 18:22:41
[Question]:
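I'm new to Python and would like some help with web scraping.
I have a Raspberry Pi 3 running Python. I want to use BeautifulSoup to extract some data from a web page and write it, with a timestamp, to a text file. The Pi stays on 24x7, so I want the Python script to repeat itself at a fixed interval so that I can later use the values to create graphs.
To start, I tried: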
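Repeating a job at a fixed interval can be done with a plain loop and `time.sleep`. The sketch below is one possible approach, not the only one; `run_every` and its `iterations` parameter are hypothetical names introduced for illustration (the limit only exists to make the function testable; leave it as `None` to run forever on the Pi). It schedules against `time.monotonic()` so that the time spent scraping does not make the interval drift.

```python
import time

def run_every(interval_s, job, iterations=None):
    """Call job() every interval_s seconds; stop after `iterations` calls if given."""
    next_run = time.monotonic()
    count = 0
    while iterations is None or count < iterations:
        if count:  # wait before every call except the first one
            next_run += interval_s
            # Sleep only for what is left of this interval, so a slow job() does not cause drift.
            time.sleep(max(0.0, next_run - time.monotonic()))
        job()
        count += 1
    return count

# Example: run_every(15 * 60, log_quota) would call a hypothetical log_quota() every 15 minutes.
```

On a Raspberry Pi, a cron entry that starts the script every N minutes is a common alternative to keeping a Python loop running.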
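from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://172.30.83.14/bsnlfup/usage.php")
# Name the parser explicitly; otherwise bs4 warns and uses whichever parser it finds installed
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.td)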
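In BeautifulSoup, attribute access such as `bsObj.td` is shorthand for `bsObj.find("td")`, so it only ever returns the first `<td>` in the document, which on this page is the header-image cell. This small self-contained sketch uses an inline HTML string modeled on the page's header table (an assumption, in place of the live URL) to show the difference from `find_all`:

```python
from bs4 import BeautifulSoup

# Inline stand-in for part of the page; the real page is fetched with urlopen.
html = """
<table>
  <tr><td align="right"><a href="usage.php"><img src="images/fuph.jpg"></a></td></tr>
  <tr><td>78.647 GB</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.td               # same as soup.find("td"): the first match only
every = soup.find_all("td")   # a list of every <td> in the document

print(first.get("align"))                         # right
print([td.get_text(strip=True) for td in every])  # ['', '78.647 GB']
```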
But the output is something else:
<td align="right">
<a href="usage.php"><img alt="" border="0" height="152" src="images/fuph.jpg" width="100%"/></a>
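The data is inside a td tag, but the page has many td tags, so this doesn't return it. I also don't know how to make the script write the data to a txt file.
HTML source: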
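Because the value has no unique attributes of its own, one way to reach it is to anchor on the `<th>` label text and step to the next row. A minimal sketch, run on a trimmed copy of the usage table from the source below; `remaining_quota` is a hypothetical helper name, and it assumes the page keeps the label row directly followed by the value row:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the usage table from the page source.
html = """
<table border="0" width="100%" cellpadding="0" cellspacing="0">
<tr align="left">
<th>Download Remaining with High(FUP-original)Speed </th>
</tr>
<tr align="left">
<td>78.647 GB</td>
<td><a href="top_up.php"><img name="addBytes" src="images/btn1.png"></a></td>
</tr>
</table>
"""

def remaining_quota(soup, label="Download Remaining with High(FUP-original)Speed"):
    """Return the text of the first <td> in the row after the <th> containing label."""
    th = soup.find("th", string=lambda s: s is not None and label in s)
    if th is None:
        return None  # label not found, e.g. the page layout changed
    # Go up to the label's <tr>, then to the next <tr> sibling (skipping whitespace).
    row = th.find_parent("tr").find_next_sibling("tr")
    return row.find("td").get_text(strip=True)

soup = BeautifulSoup(html, "html.parser")
print(remaining_quota(soup))  # 78.647 GB
```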
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="No-cache">
<meta http-equiv="Cache-Control" content="no-cache">
<meta name="keywords" content="High-Speed, Broadband, IPTV, Internet, VoIP">
<meta name="description" content="Leading provider of high-speed communication services.">
<link rel="stylesheet" type="text/css" href="css/npm.css">
<title>BSNL BROADBAND</title>
<script language="Javascript" type="text/javascript" src="js/npmcommon.js"></script>
</head>
<body onload="TINIT();" topmargin="0" leftmargin="0" marginheight="0" marginwidth="0" bgcolor="#ffffff">
<div class="portalheader" align="left">
<table style="width: 100%;" border="0" cellspacing="0" cellpadding="0" bgcolor="white">
<tr>
<td align="right">
<a href="usage.php"><img src="images/fuph.jpg" alt="" border="0" height="152" width="100%"></a>
</td>
</tr>
<tr>
<td style="width: 100%; height: 10px; background-color: rgba(29, 117, 182, 1);"></td>
</tr>
</table>
</div>
<div class="serviceservlet">
<table style="width: 100%;" border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td style="width: 165px; vertical-align: top; background-color: rgb(f, f, f);">
<table border="0" cellpadding="0" cellspacing="0" width="165">
<tbody>
<tr>
<td colspan="3" height="48">
<br>
</td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="165">
<tbody>
<tr>
<td style="width: 10px;">
<br>
</td>
</tr>
</tbody>
</table>
</td>
<td valign="top" width="100%">
<table style="width: 100%; height: 204px;" border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr></tr>
<tr>
<td colspan="2">
<font size="-1" face="Verdana, Arial, Helvetica, sans-serif">
<br>
<b>
You are logged in as
'abcdef_ghijkl@bsnl.in' at 117.000.000.000.
<br>
<br>
</b>
<br>
<br>
</font>
<!--Display the available metered time usage stats-->
<table border="0" width="100%" cellpadding="0" cellspacing="0">
<noscript>
<tr>
<td>
<a href="help.php#Java_script" target="new">
<font color="#FF0000">
<u>You must have JavaScript enabled in order to view usage stats.</u>
</font>
</a>
<br>
<br>
</td>
</tr>
</noscript>
<tr>
<td colspan="4">
<font color="#0A63BF">
<b> </b>
</font>
</td>
</tr>
<tr>
<td>
<i></i>
</td>
</tr>
</table>
<br>
<table border="0" width="100%" cellpadding="0" cellspacing="0">
<noscript>
<tr>
<td>
<a href="help.php#Java_script" target="new">
<font color="#FF0000">
<u>You must have JavaScript enabled in order to view usage stats.</u>
</font>
</a>
<br>
<br>
</td>
</tr>
</noscript>
<tr>
<td colspan="7">
<font color="#0A63BF">
<b> </b>
</font>
</td>
</tr>
<tr align="left">
<th>Download Remaining with High(FUP-original)Speed </th>
</tr>
<tr align="left">
<td>78.647 GB</td>
<td>
<a href="top_up.php?service=HS-I-H-50MB-90GB-10MB-B-M&timeMetered=false"><img name="addBytes" src="images/btn1.png" border="0" alt="[AddBytes]" title="Top up volume quota"></a>
</td>
</tr>
<tr height="10">
<td>
<font color="#0A63BF"></font>
</td>
</tr>
</table>
<p>
<p></p>
</p>
</td>
<td style="width: 10px; background-color: rgb(f,f,f);">
<br>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
<div class="portalfooter" align="left">
<td style="vertical-align: top;">
<table style="width: 100%; height: 86px;" border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td colspan="3" rowspan="1" style="background-color: rgb(f, f, f);">
<br>
</td>
</tr>
<tr valign="top">
<td style="width: 165px; height: 10px;" border="0">
<br>
</td>
<td class="npm10Text" height="10">
<br>
<br>
<p align="right">2014 BSNL . All rights reserved.</p>
<br>
<br>
</td>
<td align="right" style="vertical-align: middle;"></td>
</tr>
<tr>
<td colspan="3" rowspan="1" style="background-color: rgba(29, 117, 182, 1);">
<br>
</td>
</tr>
</tbody>
</table>
</td>
</div>
</body>
</html>
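I want to export the data in the tag that follows "Download Remaining with High(FUP-original)Speed", i.e. I want to write 78.647 GB to a text file together with a timestamp. Then, after a time interval, repeat and append the newly exported value to the same text file.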
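For the logging step, opening the file in append mode (`"a"`) adds a line per run instead of overwriting it, and storing the number without its unit makes graphing easier later. A sketch, assuming the page always reports the value in GB; `append_reading` is a hypothetical helper name:

```python
import re
from datetime import datetime

def append_reading(path, value_text, now=None):
    """Append one 'timestamp<TAB>value' line to the file at path; return the line."""
    # Extract the number from text such as "78.647 GB" (assumes the unit is always GB).
    match = re.search(r"(\d+(?:\.\d+)?)\s*GB", value_text)
    if match is None:
        raise ValueError("unexpected quota text: {!r}".format(value_text))
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    line = "{}\t{}\n".format(stamp, match.group(1))
    # Mode "a" appends, so every run adds a line instead of overwriting the file.
    with open(path, "a") as f:
        f.write(line)
    return line
```

Each run then adds one line holding a timestamp and the number, separated by a tab, which spreadsheet and plotting tools can read as two columns.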
[Discussion]:
-
Use something like Chrome's developer tools to get the XPath of the element you want to parse.
Tags: python linux python-3.x
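BeautifulSoup does not evaluate XPath expressions itself, so an XPath is usually applied with lxml instead. A hedged sketch, assuming the third-party `lxml` package is installed; the expression is written by hand against the HTML source above rather than copied from Chrome, because paths generated from the browser's DOM may include `tbody` elements that the raw HTML (for example the usage table) does not contain:

```python
import lxml.html

# Trimmed copy of the usage table from the page source.
html = """<table>
<tr align="left"><th>Download Remaining with High(FUP-original)Speed </th></tr>
<tr align="left"><td>78.647 GB</td><td></td></tr>
</table>"""

tree = lxml.html.fromstring(html)
# Find the <th> by its text, step up to its row, then take the first cell of the next row.
cells = tree.xpath(
    "//th[contains(., 'Download Remaining')]/parent::tr"
    "/following-sibling::tr[1]/td[1]/text()"
)
print(cells)  # ['78.647 GB']
```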