【Question title】: How to print specific elements from parsed HTML?
【Posted】: 2019-09-29 22:08:47
【Question description】:

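I'm trying to print specific lines from a string of HTML that I parsed with BS4.

I want my final result to look like this:

HOW DO I PRINT THIS QUESTION?

a. I WANT TO PRINT THIS

b. I WANT TO PRINT THIS TOO

c. I WANT TO PRINT THIS ALSO

d. I WANT TO PRINT THIS AS WELL

THE CORRECT ANSWER IS: I WANT TO PRINT THIS

Here's my BS4 result, prettified and copied into a text editor to make it easier to navigate by eye. My final result consists of printing lines 23, 33, 39, 45, 51 and 63 of it. How do I achieve this?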

<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>

Given furas's insight, I realize I should have provided more information.

I am now using:

from bs4 import BeautifulSoup as BS
text = (input('Enter Source Code File Name - '))
with open(text) as file:
  data = file.read()
soup = BS(data, 'html.parser')
for qtext in soup.find_all('div', class_='qtext'):
  print(qtext.text.strip())
for labels in soup.find_all('label'):
  print(labels.text.strip())
for ras in soup.find_all('div', class_='rightanswer'):
  print(ras.text.strip())

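Each source-code txt file I have contains 10 multiple-choice questions, and I want the code to print them in the following format:

QText

Answers 0-3

Right answer

(then repeat this cycle until none are left, i.e. 9 more times)

---As it stands now, it returns the following---

QText (x10)

Answers 0-3 (x10)

Right answer (x10)

How can I change this so that it completes one cycle, i.e. retrieves 1 qtext, the 4 answers 0-3 and then 1 right answer, before starting the next cycle?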

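【Question discussion】:

  • Always put code, error messages and data in the question as text, not as an image.
  • Show the code and the full error message.

Tags: python html python-3.x parsing beautifulsoup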


【Solution 1】:

All of these elements can be found using their tag and class:

print(soup.find('div', class_='qtext').text.strip())

# HOW DO I PRINT THIS QUESTION?

for item in soup.find_all('label'):
    print(item.text.strip())

# a. I WANT TO PRINT THIS
# b. I WANT TO PRINT THIS TOO
# c. I WANT TO PRINT THIS ALSO
# d. I WANT TO PRINT THIS AS WELL

print(soup.find('div', class_='rightanswer').text.strip())

# THE CORRECT ANSWER IS: I WANT TO PRINT THIS

You can also use .get_text(strip=True) instead of .text.strip()
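The two spellings agree for an element that holds a single run of text, such as the qtext div, but they are not identical in general. A minimal sketch, assuming a cut-down fragment of the HTML from the question: with strip=True, get_text() strips every text fragment separately and joins them with the separator, which is empty by default, so nested tags can end up glued together.

```python
from bs4 import BeautifulSoup

# Cut-down fragment modelled on the question's HTML (an assumption for the demo).
html = '''
<h3 class="no">
 Question
 <span class="qno">
  7
 </span>
</h3>
<div class="qtext">
 HOW DO I PRINT THIS QUESTION?
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# One text fragment: both spellings return the same string.
qtext = soup.find("div", class_="qtext")
same = qtext.text.strip() == qtext.get_text(strip=True)

# Nested tags: each fragment is stripped, then joined with the separator.
heading = soup.find("h3", class_="no")
joined = heading.get_text(strip=True)       # default separator is ""
spaced = heading.get_text(" ", strip=True)  # explicit single-space separator

print(same)
print(joined)
print(spaced)
```

Passing a separator is worth considering whenever the element being printed contains child tags.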


Full code:

data = '''
<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>
'''

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

print(soup.find('div', class_='qtext').text.strip())
for item in soup.find_all('label'):
    print(item.text.strip())
print(soup.find('div', class_='rightanswer').text.strip())

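EDIT: If you have more questions in the HTML, you can find the tag that holds one question together with its choices and its right answer, e.g. <div class="que multichoice deferredfeedback correct" id="q7">. Find all such tags first, then search inside each of them.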

for questions in soup.find_all('div', class_='multichoice'):

    print(questions.find('div', class_='qtext').text.strip())
    for item in questions.find_all('label'):
        print(item.text.strip())
    print(questions.find('div', class_='rightanswer').text.strip())

Full code - I duplicated the same HTML to simulate two questions:

data = '''
<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>
<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>
'''  

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

for questions in soup.find_all('div', class_='multichoice'):

    print(questions.find('div', class_='qtext').text.strip())
    for item in questions.find_all('label'):
        print(item.text.strip())
    print(questions.find('div', class_='rightanswer').text.strip())
    print('---')
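One caveat with the loop above: find() returns None when a block has no matching element, so a question without a rightanswer div would raise AttributeError on .text. The sketch below is one possible way to guard against that while collecting the results into dictionaries. The function name extract_questions, the dictionary keys and the shortened sample HTML are illustrative assumptions, not part of the original pages.

```python
from bs4 import BeautifulSoup


def extract_questions(html):
    """Return one dict per question block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="multichoice"):
        qtext = block.find("div", class_="qtext")
        right = block.find("div", class_="rightanswer")
        results.append({
            # find() returns None when nothing matches, so check before reading text.
            "question": qtext.get_text(strip=True) if qtext else None,
            "choices": [label.get_text(strip=True) for label in block.find_all("label")],
            "right_answer": right.get_text(strip=True) if right else None,
        })
    return results


# Shortened, hypothetical sample: the second block has no rightanswer div.
sample = '''
<div class="que multichoice" id="q1">
 <div class="qtext">First question?</div>
 <label>a. one</label>
 <label>b. two</label>
 <div class="rightanswer">The correct answer is: one</div>
</div>
<div class="que multichoice" id="q2">
 <div class="qtext">Second question?</div>
 <label>a. three</label>
 <label>b. four</label>
</div>
'''

for q in extract_questions(sample):
    print(q["question"])
    for choice in q["choices"]:
        print(choice)
    print(q["right_answer"] or "(no right answer found)")
    print("---")
```

Keeping the extraction separate from the printing also makes it easy to run the same function over each of several source files.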

Or you can use a for-loop with slicing to group the items:

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

all_questions = soup.find_all('div', class_='qtext')
all_choices = soup.find_all('label')
all_answers = soup.find_all('div', class_='rightanswer')

for x in range(len(all_questions)):
    print(all_questions[x].text.strip())

    y = x*4
    for item in all_choices[y:y+4]:
        print(item.text.strip())

    print(all_answers[x].text.strip())
    print('---')

Full code:

data = '''
<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>
<div class="que multichoice deferredfeedback correct" id="q7">
   <div class="info">
    <h3 class="no">
     Question
     <span class="qno">
      7
     </span>
    </h3>
    <div class="state">
     Correct
    </div>
    <div class="grade">
     Mark 1.00 out of 1.00
    </div>
   </div>
   <div class="content">
    <div class="formulation">
     <h4 class="accesshide">
      Question text
     </h4>
     <input name="q7391425:7_:sequencecheck" type="hidden" value="3"/>
     <div class="qtext">
      HOW DO I PRINT THIS QUESTION?
     </div>
     <div class="ablock">
      <div class="prompt">
       Select one:
      </div>
      <div class="answer">
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer0" name="q7391425:7_answer" type="radio" value="0"/>
        <label for="q7391425:7_answer0">
         a. I WANT TO PRINT THIS
        </label>
       </div>
       <div class="r1 correct">
        <input checked="checked" disabled="disabled" id="q7391425:7_answer1" name="q7391425:7_answer" type="radio" value="1"/>
        <label for="q7391425:7_answer1">
         b. I WANT TO PRINT THIS TOO
        </label>
       </div>
       <div class="r0">
        <input disabled="disabled" id="q7391425:7_answer2" name="q7391425:7_answer" type="radio" value="2"/>
        <label for="q7391425:7_answer2">
         c. I WANT TO PRINT THIS ALSO
        </label>
       </div>
       <div class="r1">
        <input disabled="disabled" id="q7391425:7_answer3" name="q7391425:7_answer" type="radio" value="3"/>
        <label for="q7391425:7_answer3">
         d. I WANT TO PRINT THIS AS WELL
        </label>
       </div>
      </div>
     </div>
    </div>
    <div class="outcome">
     <h4 class="accesshide">
      Feedback
     </h4>
     <div class="feedback">
      <div class="rightanswer">
       THE CORRECT ANSWER IS: I WANT TO PRINT THIS
      </div>
     </div>
    </div>
   </div>
  </div>
'''  

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

all_questions = soup.find_all('div', class_='qtext')
all_choices = soup.find_all('label')
all_answers = soup.find_all('div', class_='rightanswer')

for x in range(len(all_questions)):
    print(all_questions[x].text.strip())

    y = x*4
    for item in all_choices[y:y+4]:
        print(item.text.strip())

    print(all_answers[x].text.strip())
    print('---')
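The slicing approach only works when every question has exactly four labels and one rightanswer div. The following sketch makes that assumption explicit and pairs the lists with zip() instead of index arithmetic. The names group_flat_lists, make_question and CHOICES_PER_QUESTION are hypothetical helpers for the demo, and the generated HTML is a simplified stand-in for the real pages.

```python
from bs4 import BeautifulSoup

CHOICES_PER_QUESTION = 4  # assumption taken from the question, not a rule of the HTML


def group_flat_lists(html, per_question=CHOICES_PER_QUESTION):
    """Pair each question with its slice of labels and its right answer."""
    soup = BeautifulSoup(html, "html.parser")
    questions = soup.find_all("div", class_="qtext")
    choices = soup.find_all("label")
    answers = soup.find_all("div", class_="rightanswer")

    # Grouping by position is only valid when the three lists line up exactly.
    if len(choices) != per_question * len(questions) or len(answers) != len(questions):
        raise ValueError("element counts do not line up")

    # Cut the flat list of labels into consecutive chunks, one chunk per question.
    chunks = [choices[i:i + per_question] for i in range(0, len(choices), per_question)]
    return [
        (q.get_text(strip=True), [c.get_text(strip=True) for c in chunk], a.get_text(strip=True))
        for q, chunk, a in zip(questions, chunks, answers)
    ]


def make_question(number):
    """Build a tiny hypothetical question block for the demo."""
    labels = "".join(f"<label>{letter}. choice {number}{letter}</label>" for letter in "abcd")
    return (f'<div class="que multichoice"><div class="qtext">Question {number}?</div>'
            f'{labels}<div class="rightanswer">Answer {number}</div></div>')


sample = make_question(1) + make_question(2)

for question, choice_texts, answer in group_flat_lists(sample):
    print(question)
    for text in choice_texts:
        print(text)
    print(answer)
    print("---")
```

If the counts can vary between questions, searching inside each question's own div is the safer of the two approaches, because it does not depend on positions in a flat list.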

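【Discussion】:

  • OK, so I left out some information because I didn't know how it might affect getting help with this question. I have multiple pages of similar source code, and each page has 10 iterations of the code shown. With this code, it lists all 10 questions, then all 40 multiple-choice answers, then all 10 right answers. How can I change it to return them in this order: question, 4 multiple-choice answers, right answer, repeated 9 more times?
  • You can find the tag that holds all the elements of one question (question, answers, right answer), e.g. <div class="que multichoice deferredfeedback correct" id="q7">. First find all of these tags, then search inside each one for the question, the answers and the right answer. Or you can try grouping the elements with a for-loop and/or zip(), e.g. for question, correct_answer in zip(all_questions, all_correct_answers), and you can use slicing to select the choices: for x in range(0, 40, 4): choices[x:x+4]
  • I added examples of both methods to the answer.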