HTML parser for Python that can keep track of a tag's location within an HTML document

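【Question title】: HTML parser for Python that can keep track of the location of a tag within an HTML document
【Posted】: 2021-02-05 09:35:42
【Question description】:

I am parsing HTML pages with Python. I need to find certain tags and measure the distance between them in bytes. I used BeautifulSoup, but it does not give me the position of the tags it finds. Is there a Python library that can do this? Thanks.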

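【Comments】:

Could you post an example of what you mean by the "location" of a tag?
Do you mean the character position in the document at which each tag starts?
Yes, the character position in the document at which each tag starts.
@user1354033 Did you see my answer? Please update the status of the question.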

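Tags: python html html-parsing

【Solution 1】:

If I understand correctly that you want the character position at which each tag starts, you can use the code below. I took it from one of my coding challenges, where it finds every position at which a term/tag starts and counts the occurrences. You can adapt it to your own needs.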

import urllib.parse
import urllib.request

def getTopicCount(topic):
    url = "http://www.google.com/search?q="
    # read() already returns bytes, so search with a bytes needle;
    # every index found below is therefore a byte offset into the response.
    contents = urllib.request.urlopen(url + urllib.parse.quote(topic)).read()
    needle = topic.encode('utf-8')
    count = 0
    pos = contents.find(needle)  # index of the first occurrence, or -1 if absent
    while pos != -1:
        count += 1
        pos = contents.find(needle, pos + 1)  # resume the search just past the previous match
    return count

print(getTopicCount("<div"))
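
The substring search above will also match "<div" inside comments, scripts, or attribute values. If you need positions of real tags, one option is the standard library's html.parser: HTMLParser.getpos() returns the current (line, column), and inside handle_starttag that is where the tag begins. Below is a minimal sketch, not a definitive implementation. The names TagPositionParser and tag_byte_offsets are made up for illustration, and it assumes the document is available as a str and that the byte distances should be measured in UTF-8.

```python
from html.parser import HTMLParser


class TagPositionParser(HTMLParser):
    """Hypothetical helper: records the UTF-8 byte offset of each start tag."""

    def __init__(self, text):
        super().__init__()
        self._text = text
        self.positions = []  # list of (tag_name, byte_offset)
        # Character index at which each line starts. HTMLParser counts
        # lines by "\n" only, so split on "\n" only as well.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]

    def handle_starttag(self, tag, attrs):
        lineno, col = self.getpos()  # 1-based line number, 0-based column
        char_index = self._line_starts[lineno - 1] + col
        # Convert the character index to a byte offset (assumes UTF-8).
        # Re-encoding the prefix is simple but slow for very large pages.
        byte_offset = len(self._text[:char_index].encode("utf-8"))
        self.positions.append((tag, byte_offset))


def tag_byte_offsets(text):
    """Return [(tag_name, byte_offset), ...] for every start tag in text."""
    parser = TagPositionParser(text)
    parser.feed(text)
    parser.close()
    return parser.positions


sample = "<p>héllo</p>\n<div>x</div>"
offsets = tag_byte_offsets(sample)
print(offsets)                         # [('p', 0), ('div', 14)]
print(offsets[1][1] - offsets[0][1])   # 14 bytes between <p> and <div>
```

The distance in bytes between two tags is then just the difference of their offsets. If the page was fetched as bytes in some other encoding, decode it with that encoding and use the same encoding in the conversion step.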

【Comments】: