【Posted】: 2011-09-14 17:16:19
【Question】:
I load an external XML file in my views.py file:
import urllib2
from xml.dom import minidom
from django.shortcuts import render_to_response

def test(request):
    # Fetch the remote XML feed and parse it into a DOM tree
    url = urllib2.urlopen("http://someurl.com?xml")
    dom = minidom.parse(url)
    groups = dom.getElementsByTagName("group")
    deal_holder = []
    # Iterate over each DOM group element:
    for group in groups:
        # Iterate over each child node
        for groupChild in group.childNodes:
            deal_holder.append(groupChild)
    return render_to_response('folder/test.html', {'deal_holder': deal_holder})
This is what the loaded XML file looks like:
<page>
  <site>
    <siteid>25550</siteid>
    <sitename>
      <![CDATA[ Some Text Here ]]>
    </sitename>
    <sitelink>
      http://somelinkehere.com
    </sitelink>
    <timezone>
      <![CDATA[ Pacific Time ]]>
    </timezone>
  </site>
  <groups>
    <enablefeaturedgroup>OFF</enablefeaturedgroup>
    <group>
      <groupid>467246</groupid>
      <groupname>
        <![CDATA[ Today's Deal ]]>
      </groupname>
      <groupdescription>
        <![CDATA[ ]]>
      </groupdescription>
    </group>
    <group>
      <groupid>467247</groupid>
      <groupname>
        <![CDATA[ Past Deals ]]>
      </groupname>
      <groupdescription>
        <![CDATA[ ]]>
      </groupdescription>
    </group>
  </groups>
</page>
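For reference, here is a small sketch of what a loop over group.childNodes collects for one group of this XML. It assumes a trimmed copy of the sample is parsed from a string (SAMPLE and kinds are illustrative names, not part of the view). Besides the child elements, minidom also returns the line breaks between tags as whitespace-only text nodes.

```python
from xml.dom import minidom

# Trimmed copy of the sample above (an assumption made for illustration).
SAMPLE = """<groups>
<group>
<groupid>467246</groupid>
<groupname>
<![CDATA[ Today's Deal ]]>
</groupname>
</group>
</groups>"""

dom = minidom.parseString(SAMPLE)
group = dom.getElementsByTagName("group")[0]

# Classify each direct child of <group> the way a childNodes loop sees it.
kinds = []
for child in group.childNodes:
    if child.nodeType == child.ELEMENT_NODE:
        kinds.append("element:" + child.tagName)
    elif child.nodeType == child.TEXT_NODE:
        kinds.append("whitespace" if not child.data.strip() else "text")

print(kinds)
```

So a list built this way mixes Element nodes with whitespace text nodes, and neither is the plain string a template would want to print.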
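The problem is that all the examples I have seen use something similar to what I am using, except that they typically deal with XML tags that look like this: <weather:forecast day="Wed" date="14 Sep 2011" low="56" high="72" text="AM Clouds/PM Sun" code="30"/> and can retrieve the information from attributes such as day="Wed", date="14 Sep 2011", low="56", etc. The information I want to retrieve, however, sits between the tags, as in <siteid>25550</siteid>.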
Any suggestions or information would be greatly appreciated.
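One possible approach, sketched under assumptions: with minidom, the value between an opening and closing tag is stored in the element's child nodes (text nodes and CDATA sections), not in attributes, so it can be read from each child's data. The helpers element_text and first_text, the SAMPLE string, and the dictionary keys are hypothetical names introduced for this illustration; the sample is a shortened copy of the XML in the question.

```python
from xml.dom import minidom

# Shortened copy of the XML from the question (an assumption for illustration).
SAMPLE = """<page>
<site>
<siteid>25550</siteid>
</site>
<groups>
<group>
<groupid>467246</groupid>
<groupname>
<![CDATA[ Today's Deal ]]>
</groupname>
</group>
<group>
<groupid>467247</groupid>
<groupname>
<![CDATA[ Past Deals ]]>
</groupname>
</group>
</groups>
</page>"""


def element_text(element):
    """Join the text and CDATA children of an element and trim whitespace."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts).strip()


def first_text(parent, tag_name):
    """Return the text of the first descendant named tag_name, or '' if absent."""
    matches = parent.getElementsByTagName(tag_name)
    return element_text(matches[0]) if matches else ""


dom = minidom.parseString(SAMPLE)
site_id = first_text(dom, "siteid")

# Build plain dictionaries instead of passing DOM nodes around.
deals = [
    {"id": first_text(group, "groupid"), "name": first_text(group, "groupname")}
    for group in dom.getElementsByTagName("group")
]

print(site_id)
print(deals)
```

In the view, a list like deals could be passed to render_to_response in place of the raw nodes, so the template would only handle plain strings. The same helpers should work on the document returned by minidom.parse(url).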
【Discussion】:
Tags: python xml django django-views web-scraping