【Posted】: 2015-12-09 16:05:03
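【Question】:
I am using Python 2.7.8 and am a bit surprised: I get all of the text except the text that follows the last <br> in each paragraph. My HTML page looks like this: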
<html>
<body>
<div class="entry-content" >
<p>Here is a listing of C interview questions on “Variable Names” along with answers, explanations and/or solutions:
</p>
<p>Which of the following is not a valid C variable name?<br>
a) int number;<br>
b) float rate;<br>
c) int variable_count;<br>
d) int $main;</p> <!--not getting-->
<p> more </p>
<p>Which of the following is true for variable names in C?<br>
a) They can contain alphanumeric characters as well as special characters<br>
b) It is not an error to declare a variable to be one of the keywords(like goto, static)<br>
c) Variable names cannot start with a digit<br>
d) Variable can be of any length</p> <!--not getting -->!
</div>
</body>
</html>
And my code:
import urllib2
from urllib2 import Request
from bs4 import BeautifulSoup, NavigableString, Tag

url = "http://www.sanfoundry.com/c-programming-questions-answers-variable-names-1/"
#url="http://www.sanfoundry.com/c-programming-questions-answers-variable-names-2/"
req = Request(url)
resp = urllib2.urlopen(req)
htmls = resp.read()
soup = BeautifulSoup(htmls)
for br in soup.findAll('br'):
    # The node right after the <br> must be a text node.
    next = br.nextSibling
    if not (next and isinstance(next, NavigableString)):
        continue
    # The text is only printed when another <br> follows it.
    next2 = next.nextSibling
    if next2 and isinstance(next2, Tag) and next2.name == 'br':
        text = str(next).strip()
        if text:
            print "Found:", next.encode('utf-8')
# print '...........sfsdsds.............',answ[0].encode('utf-8') #
Output:
Found:
a) int number;
Found:
b) float rate;
Found:
c) int variable_count;
Found:
a) They can contain alphanumeric characters as well as special characters
Found:
b) It is not an error to declare a variable to be one of the keywords(like goto, static)
Found:
c) Variable names cannot start with a digit
However, I am not getting the last piece of text in each paragraph, for example:
d) int $main
and
d) Variable can be of any length
which come after the last <br>.
The output I would like to get:
Found:
a) int number;
Found:
b) float rate;
Found:
c) int variable_count;
Found:
d) int $main
Found:
a) They can contain alphanumeric characters as well as special characters
Found:
b) It is not an error to declare a variable to be one of the keywords(like goto, static)
Found:
c) Variable names cannot start with a digit
d) Variable can be of any length
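A likely cause, judging only from the code and HTML shown above: the last option in each `<p>` is followed by `</p>` rather than by another `<br>`, so its next sibling is `None` and the `next2.name == 'br'` check rejects it. The sketch below is one possible adjustment, not a confirmed fix for the live page. It is written for Python 3 with the `html.parser` backend, and the function name `options_after_br` and the trimmed `HTML` sample are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the HTML from the question (assumed representative).
HTML = """
<div class="entry-content">
<p>Which of the following is not a valid C variable name?<br>
a) int number;<br>
b) float rate;<br>
c) int variable_count;<br>
d) int $main;</p>
</div>
"""


def options_after_br(html):
    """Return the stripped text following each <br>, including the last one."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        if not isinstance(node, NavigableString):
            continue
        after = node.next_sibling
        # Accept text followed by another <br> OR by nothing at all,
        # which is the situation of the last option before </p>.
        if after is None or (isinstance(after, Tag) and after.name == "br"):
            text = node.strip()
            if text:
                found.append(text)
    return found


for option in options_after_br(HTML):
    print("Found:", option)
```

The only change from the original condition is the extra `after is None` branch; everything else follows the same sibling walk.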
【Comments】:
-
Add more print statements. When you `continue`, print what you are skipping. Put an else clause on the if statements and print what you are skipping there too.
-
OK, I am trying that...
-
Why are you still using the old approach instead of the one I suggested here?..
-
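Well, I ran into some problems with it because my real code is much bigger. Your suggestion did solve my previous problem, but here I am facing the same situation with your solution as well.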
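The first comment's suggestion of printing whatever gets skipped could look roughly like the sketch below. It uses simplified versions of the question's two conditions on a tiny made-up snippet; `trace_br_siblings` and the sample `HTML` are hypothetical names, and the code assumes Python 3 with the `html.parser` backend.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Tiny stand-in for one question paragraph (made up for illustration).
HTML = "<p>Question?<br>a) first<br>b) last</p>"


def trace_br_siblings(html):
    """Apply the question's checks and record what is kept or skipped."""
    soup = BeautifulSoup(html, "html.parser")
    log = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        if not isinstance(node, NavigableString):
            log.append(("skipped: not text", repr(node)))
            continue
        after = node.next_sibling
        if isinstance(after, Tag) and after.name == "br":
            log.append(("found", str(node).strip()))
        else:
            # This branch is the "else" the comment asks for.
            log.append(("skipped: no <br> after", str(node).strip()))
    return log


for status, value in trace_br_siblings(HTML):
    print(status, "->", value)
```

On this sample the trace reports the final option as skipped because no `<br>` follows it, which matches the missing lines in the question's output.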
Tags: python html beautifulsoup html-parsing