Question title: Python list search, comparison and elimination of elements
Posted: 2015-06-11 17:06:46
Question:

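I want to get all the elements that don't have a pair. This is a list of XML tags, read from top to bottom, with the angle brackets removed. I want to find the pairs (for example, the opening tag 'note' and the closing tag '/note'), remove them from the list, and be left with the tags that have no pair.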

How do you iterate over the list, compare each tag with every other tag, and then say: aha, I found another "note" tag that starts with a forward slash?

Thanks.

Is there a better way to find unmatched tags?

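PS: I do want to keep the order of the list and, if possible, use equality when comparing a tag with another tag in the list. Using the 'in' operator won't work, because if a tag name is a single letter like 'a', the search returns every element that contains an a, not the element that is exactly 'a'.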

tags = ['note', 'to', 'bbb', 'bbb', 'firstname', '/firstname', 'lastname', '/lastname', 'from', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', '/from', '/to', 'elephant', 'll', 'from', '/from', 'a1', 'img', 'a2', 'from', 'from', '/from', '/from', '/a2', '/img', '/a1', 'heading', '/heading', 'body', '/body', '/note']
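On the concern in the PS: for a list, `x in some_list` compares each element with `==`, so a one-letter tag such as 'a' only matches an element that is exactly 'a'. Substring matching happens only when the right-hand operand is itself a string. A quick check, using a small made-up list (`tags_sample` is hypothetical, not the question's data):

```python
tags_sample = ['a', 'a1', 'a2', '/a']  # hypothetical sample, not from the question

# Membership on a list tests equality, element by element.
print('a' in tags_sample)    # True: the element 'a' is present
print('img' in tags_sample)  # False: no element equals 'img'

# Membership on a string tests for a substring instead.
print('a' in 'a1')           # True: 'a' occurs inside 'a1'

# Every exact match, in list order, using ==.
print([i for i, t in enumerate(tags_sample) if t == 'a'])  # [0]
```

So the order-preserving, exact-equality search the question asks for is already what a plain loop or comprehension with `==` gives you.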

Discussion:

    Tags: python list search compare


    Solution 1:

    You can build a set of all the closing tags and then use that set to filter the tags.

    >>> closing = {t for t in tags if t.startswith("/")}
    >>> [t for t in tags if "/" + t not in closing and t not in closing]
    ['bbb', 'bbb', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', 'elephant', 'll']
    

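    Note, however, that this does not really respect "pairs" of tags; it only checks whether a "closing" variant of the same tag exists somewhere in the list. For example, given tags = ["a", "a", "/a"] or tags = ["a", "/a", "a"], it removes both instances of a from the list.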
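The caveat is easy to reproduce by wrapping the two lines above in a function. The name `drop_closed` is only for illustration; the body is the same set-based filter.

```python
def drop_closed(tags):
    """Solution 1's set-based filter, wrapped in a hypothetical helper."""
    closing = {t for t in tags if t.startswith("/")}
    return [t for t in tags if "/" + t not in closing and t not in closing]

print(drop_closed(["a", "a", "/a"]))  # [] although one "a" is never closed
print(drop_closed(["a", "/a", "a"]))  # [] although the second "a" is unpaired
```

Both calls return an empty list because the filter only asks whether some "/a" exists, not whether each "a" has its own partner.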

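    Comments:

    • Thanks. That does the trick for finding the opening and closing tags. How do I find the paired tags?
    • @user1552294 Not sure what you are asking. Show some example input and output.
    Solution 2: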

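    The first part of the program collects all the tags in the XML text. Notice that this is really the problem of finding unmatched brackets: it can be solved by treating the list as a stack and working out, while iterating, which tags are problematic.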

    import re
    
    def clean_attr(attr):
        attr_list = re.split(r'\s+', attr)
        if len(attr_list) == 1:
            return attr
        else:
            return attr_list[0] + '>'
    
    line="""
    <?xml version="1.0"?>
    <catalog>
       <book id="bk101">
          <author>Gambardella, Matthew</author>
          <title>XML Developer's Guide</title>
          <genre>Computer</genre>
          <price>44.95</price>
          <publish_date>2000-10-01</publish_date>
          <description>An in-depth look at creating applications 
          with XML.</description>
       </book>
       <book id="bk102">
          <author>Ralls, Kim</author>
          <title>Midnight Rain</title>
          <genre>Fantasy</genre>
          <price>5.95</price>
          <publish_date>2000-12-16</publish_date>
          <description>A former architect battles corporate zombies, 
          an evil sorceress, and her own childhood to become queen 
          of the world.</description>
       </book>
       <book id="bk103">
          <author>Corets, Eva</author>
          <title>Maeve Ascendant</title>
          <genre>Fantasy</genre>
          <price>5.95</price>
          <publish_date>2000-11-17</publish_date>
          <description>After the collapse of a nanotechnology 
          society in England, the young survivors lay the 
          foundation for a new society.</description>
       </book>
       <book id="bk104">
          <author>Corets, Eva</author>
          <title>Oberon's Legacy</title>
          <genre>Fantasy</genre>
          <price>5.95</price>
          <publish_date>2001-03-10</publish_date>
          <description>In post-apocalypse England, the mysterious 
          agent known only as Oberon helps to create a new life 
          for the inhabitants of London. Sequel to Maeve 
          Ascendant.</description>
       </book>
       <book id="bk105">
          <author>Corets, Eva</author>
          <title>The Sundered Grail</title>
          <genre>Fantasy</genre>
          <price>5.95</price>
          <publish_date>2001-09-10</publish_date>
          <description>The two daughters of Maeve, half-sisters, 
          battle one another for control of England. Sequel to 
          Oberon's Legacy.</description>
       </book>
       <book id="bk106">
          <author>Randall, Cynthia</author>
          <title>Lover Birds</title>
          <genre>Romance</genre>
          <price>4.95</price>
          <publish_date>2000-09-02</publish_date>
          <description>When Carla meets Paul at an ornithology 
          conference, tempers fly as feathers get ruffled.</description>
       </book>
       <book id="bk107">
          <author>Thurman, Paula</author>
          <title>Splish Splash</title>
          <genre>Romance</genre>
          <price>4.95</price>
          <publish_date>2000-11-02</publish_date>
          <description>A deep sea diver finds true love twenty 
          thousand leagues beneath the sea.</description>
       </book>
       <book id="bk108">
          <author>Knorr, Stefan</author>
          <title>Creepy Crawlies</title>
          <genre>Horror</genre>
          <price>4.95</price>
          <publish_date>2000-12-06</publish_date>
          <description>An anthology of horror stories about roaches,
          centipedes, scorpions  and other insects.</description>
       </book>
       <book id="bk109">
          <author>Kress, Peter</author>
          <title>Paradox Lost</title>
          <genre>Science Fiction</genre>
          <price>6.95</price>
          <publish_date>2000-11-02</publish_date>
          <description>After an inadvertant trip through a Heisenberg
          Uncertainty Device, James Salway discovers the problems 
          of being quantum.</description>
       </book>
       <book id="bk110">
          <author>O'Brien, Tim</author>
          <title>Microsoft .NET: The Programming Bible</title>
          <genre>Computer</genre>
          <price>36.95</price>
          <publish_date>2000-12-09</publish_date>
          <description>Microsoft's .NET initiative is explored in 
          detail in this deep programmer's reference.</description>
       </book>
          <author>O'Brien, Tim</author>
          <title>MSXML3: A Comprehensive Guide</title>
          <genre>Computer</genre>
          <price>36.95</price>
          <publish_date>2000-12-01</publish_date>
          <description>The Microsoft MSXML3 parser is covered in 
          detail, with attention to XML DOM interfaces, XSLT processing, 
          SAX and more.</description>
       </book>
       <book id="bk112">
          <author>Galos, Mike</author>
          <title>Visual Studio 7: A Comprehensive Guide</title>
          <genre>Computer</genre>
          <price>49.95</price>
          <publish_date>2001-04-16</publish_date>
          <description>Microsoft Visual Studio 7 is explored in depth,
          looking at how Visual Basic, Visual C++, C#, and ASP+ are 
          integrated into a comprehensive development 
          environment.
       </book>
    </catalog>
    
    """
    # Every opening tag (with or without attributes) and every closing tag, in document order.
    # The <?xml ...?> declaration is skipped because '?' and '.' are not in the character class.
    all_attrs = re.findall(r'<[\w\s="]+>|</\w+>', line)
    
    # Reduce '<book id="bk101">' to '<book>'; a list (not map) so it also works on Python 3
    all_attrs_cleaned = [clean_attr(attr) for attr in all_attrs]
    
    list_as_stack = []  # opening tags still waiting for their closing tag
    not_closed = []     # closing tags that did not match the innermost open tag
    
    for an_attr in all_attrs_cleaned:
        if not an_attr.startswith('</'):
            list_as_stack.append(an_attr)
        elif list_as_stack and (re.search(r'\w+', list_as_stack[-1]).group(0)
                                == re.search(r'\w+', an_attr).group(0)):
            list_as_stack.pop()  # matched pair: drop the opening tag
        else:
            # Also covers a closing tag arriving while the stack is empty
            not_closed.append(an_attr)
    
    print(list_as_stack)  # ['<catalog>', '<book>', '<description>']
    print(not_closed)     # ['</book>', '</book>', '</catalog>']
    

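    In the program above, the first list shows the opening tags that were never closed, and the second list shows the closing tags that did not match the innermost open tag (for example, a closing tag that has no opening tag).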
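The same stack idea can be applied directly to the bracket-less tags list from the question. The sketch below is a variant of the code above, not a copy of it. `split_pairs` is a hypothetical helper, and it assumes a closing tag is exactly '/' + name. When a closing tag matches an open tag deeper in the stack, it counts every tag opened after that one as unpaired; that recovery rule is a choice made for illustration. It tracks indices so the unpaired tags come back in their original order, and it compares names with ==.

```python
def split_pairs(tags):
    """Hypothetical helper: return (pairs, unpaired), where pairs holds
    (open_index, close_index) tuples and unpaired lists the unmatched
    tags in their original order."""
    stack = []     # (index, name) of opening tags still waiting to be closed
    pairs = []     # (open_index, close_index) for each matched pair
    unpaired = []  # indices of tags that never found a partner

    for i, tag in enumerate(tags):
        if not tag.startswith("/"):
            stack.append((i, tag))
            continue
        name = tag[1:]
        # Search from the innermost open tag outwards, using exact equality.
        for depth in range(len(stack) - 1, -1, -1):
            if stack[depth][1] == name:
                # Everything opened after the match was never closed.
                unpaired.extend(j for j, _ in stack[depth + 1:])
                pairs.append((stack[depth][0], i))
                del stack[depth:]
                break
        else:
            unpaired.append(i)  # closing tag with no matching opening tag

    unpaired.extend(j for j, _ in stack)  # opening tags left open at the end
    return pairs, [tags[j] for j in sorted(unpaired)]


tags = ['note', 'to', 'bbb', 'bbb', 'firstname', '/firstname', 'lastname', '/lastname', 'from', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', '/from', '/to', 'elephant', 'll', 'from', '/from', 'a1', 'img', 'a2', 'from', 'from', '/from', '/from', '/a2', '/img', '/a1', 'heading', '/heading', 'body', '/body', '/note']

pairs, unpaired = split_pairs(tags)
print(unpaired)
# ['bbb', 'bbb', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', 'elephant', 'll']
```

On the question's data this gives the same list as Solution 1. The difference shows up on the two inputs from Solution 1's caveat: both ["a", "a", "/a"] and ["a", "/a", "a"] come back as ['a'], because only one "a" is paired. The returned `pairs` also lists which opening and closing tags belong together.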
