【Question title】: Beautiful Soup: how to print a tag while iterating over it
【Posted】: 2011-01-01 20:25:15
【Question】:

My XML looks like this, and I want to get the locations.

<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
 <trackList>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/ipod%20on%20sep%2009/Coldplay-Sparks.mp3</location>
   <title>Coldplay-Sparks</title>
  </track>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/gud%201s/Coldplay%20Warning%20sign.mp3</location>
   <title>Coldplay Warning sign</title>
  </track>....

I am trying:

from BeautifulSoup import BeautifulSoup as bs
soup = bs (the_above_xml_text)
for track in soup.tracklist:
    print track.location.string

But this does not work, because I get:

AttributeError: 'NavigableString' object has no attribute 'location'

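How can I get this result? Thanks in advance.

【Comments】:

  • Yes, I left that part out for brevity. Do you want me to show all of it?
  • Yes. Also explain what "not working" means.

Tags: python xml beautifulsoup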


【Solution 1】:

Use lxml; it is faster and supports XPath:

>>> import lxml.etree
>>> doc = lxml.etree.fromstring(yourxml)
>>> doc.xpath('//n:location/text()', namespaces={'n': 'http://xspf.org/ns/0/'})
['file:///home/ashu/Music/Collections/randomPicks/ipod%20on%20sep%2009/Coldplay-Sparks.mp3',
'file:///home/ashu/Music/Collections/randomPicks/gud%201s/Coldplay%20Warning%20sign.mp3']
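
If lxml is not installed, the same namespace-aware lookup can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in alternative, not the lxml call above: ElementTree supports only a subset of XPath (no text() step), so the text is read from each matched element instead. The sample is the two-track excerpt from the question with the closing tags assumed, and track_locations is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Two-track excerpt from the question; the closing </trackList> and
# </playlist> tags are assumed, since the original listing is truncated.
XSPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
 <trackList>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/ipod%20on%20sep%2009/Coldplay-Sparks.mp3</location>
   <title>Coldplay-Sparks</title>
  </track>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/gud%201s/Coldplay%20Warning%20sign.mp3</location>
   <title>Coldplay Warning sign</title>
  </track>
 </trackList>
</playlist>
"""

# Map a prefix of our choosing to the document's default namespace.
NS = {"n": "http://xspf.org/ns/0/"}


def track_locations(xml_bytes):
    """Return the text of every <location> element in the playlist."""
    root = ET.fromstring(xml_bytes)
    # ".//n:location" finds <location> at any depth below the root.
    return [loc.text for loc in root.findall(".//n:location", NS)]


if __name__ == "__main__":
    for location in track_locations(XSPF):
        print(location)
```

The namespace mapping plays the same role as the namespaces argument in the lxml call: without it, a plain "location" path matches nothing, because every element in the playlist lives in the http://xspf.org/ns/0/ namespace.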

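【Comments】:

  • @Bunny: Learning enough XPath to solve 80% of all problems takes only 30 minutes.
  • nosklo, thanks for the suggestion. I will definitely learn it, but probably next weekend :P
【Solution 2】:

You can use findAll: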

>>> for track in soup.findAll('track'):
...     print track.title.string
...     print track.location.string
... 
Coldplay-Sparks
file:///home/ashu/Music/Collections/randomPicks/ipod%20on%20sep%2009/Coldplay-Sparks.mp3
Coldplay Warning sign
file:///home/ashu/Music/Collections/randomPicks/gud%201s/Coldplay%20Warning%20sign.mp3
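
The AttributeError in the question comes from iterating over a tag directly: that yields every child node, including the whitespace strings between elements, and those NavigableString objects have no location attribute. The sketch below is an assumption-laden port to the newer bs4 package (where findAll is spelled find_all), with lxml installed to back the "xml" parser. It uses the excerpt from the question with assumed closing tags, and both function names are hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Two-track excerpt from the question; the closing </trackList> and
# </playlist> tags are assumed, since the original listing is truncated.
XSPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
 <trackList>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/ipod%20on%20sep%2009/Coldplay-Sparks.mp3</location>
   <title>Coldplay-Sparks</title>
  </track>
  <track>
   <location>file:///home/ashu/Music/Collections/randomPicks/gud%201s/Coldplay%20Warning%20sign.mp3</location>
   <title>Coldplay Warning sign</title>
  </track>
 </trackList>
</playlist>
"""


def locations_by_search(soup):
    """Look up every <track> tag by name, wherever it sits in the tree."""
    return [str(track.location.string) for track in soup.find_all("track")]


def locations_by_iteration(soup):
    """Walk the direct children of <trackList>, skipping non-tag nodes."""
    # The "xml" parser keeps tag names case-sensitive, hence trackList.
    return [
        str(child.location.string)
        for child in soup.trackList
        if isinstance(child, Tag)  # whitespace strings are not Tag objects
    ]


if __name__ == "__main__":
    soup = BeautifulSoup(XSPF, "xml")
    for location in locations_by_iteration(soup):
        print(location)
```

find_all matches tag names in the parsed tree rather than patterns in the raw text, so the document structure is still in play. To stay strictly inside the track list, soup.trackList.find_all("track", recursive=False) limits the search to direct children.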

【Comments】:

  • Does findAll use some kind of pattern matching? Using findAll seems to defeat the whole point of having a structured document.