【Question title】: How to regex for entry between quotes?
【Posted】: 2017-09-04 10:55:34
【Question description】:

I need to get the entries between quotes, as in this example: a regex that, given href="x....dkjads...href="y", returns x and y.

[<a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie.jpg" title=""><img alt="Sprachschule EC San Diego" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie_d1def1bf4a.jpg" title="Sprachschule EC San Diego (Copyright EC San Diego. All rights reserved.)" width="80"/></a>, <a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_1_.jpg" title=""><img alt="Sprachschule EC San Diego 2" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie-_1__fd87630014.jpg" title="Sprachschule EC San Diego 2 (Copyright EC San Diego. All rights reserved.)" width="80"/></a>, <a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_10_.jpg" title=""><img alt="Sprachschule EC San Diego 3" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie-_10__a8ed60c277.jpg" title="Sprachschule EC San Diego 3 (Copyright EC San Diego. All rights reserved.)"

How do I write a regex that only starts matching after a specific sequence of several exact characters?

The pattern (?<=\").*?(?=\") returns everything between any pair of quotes, and (?<=\{href="}).*?(?=\") does not work.

【Question comments】:

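  • Do you need a regex at all? Python should be able to parse the HTML for you. stackoverflow.com/questions/2782097/…
  • Thanks. I am just starting to learn regular expressions, which is why I want to use them even though other solutions exist.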
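
For readers who prefer the parser route suggested in the first comment, here is a minimal sketch using only the standard library's html.parser. The class name HrefCollector and the shortened sample markup (including its paths) are made up for illustration and are not from the question.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Shortened stand-in for the markup in the question (paths are invented).
sample = (
    '<a class="lightbox" href="images/a.jpg" title="">'
    '<img alt="A" src="thumbs/a.jpg"/></a>, '
    '<a class="lightbox" href="images/b.jpg" title="">'
    '<img alt="B" src="thumbs/b.jpg"/></a>'
)

collector = HrefCollector()
collector.feed(sample)
collector.close()
print(collector.hrefs)
```

Because the parser only reports attributes of real tags, the src and title values are never mixed in with the href values.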

Tags: python regex


【Solution 1】:

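If you want to match the <content> in href="<content>", the pattern to use is href=\"(.*?)\" (regex101 demo). The parentheses form a capture group, so only the text between the quotes is returned, and the lazy .*? stops at the first closing quote.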
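
To see why the lazy quantifier matters here, the following small sketch compares it with a greedy .* and with a negated character class. The sample string is invented for the comparison and is much shorter than the markup in the question.

```python
import re

# Invented sample with two href attributes and other quoted attributes between them.
text = 'href="a.jpg" title="" src="b.jpg" href="c.jpg"'

# Lazy: each match stops at the first closing quote after href=".
lazy = re.findall(r'href="(.*?)"', text)

# Greedy: .* runs to the last quote on the line, swallowing everything in between.
greedy = re.findall(r'href="(.*)"', text)

# Negated class: an alternative that cannot cross a quote at all.
negated = re.findall(r'href="([^"]*)"', text)

print(lazy)
print(greedy)
print(negated)
```

The lazy and negated-class versions each return the two href values, while the greedy version returns a single string that runs from the first href value to the last quote.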

With Python's re module, you can do:

>>> a = """
... [<a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie.jpg" title=""><img alt="Sprachschule EC San Diego" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie_d1def1bf4a.jpg" title="Sprachschule EC San Diego (Copyright EC San Diego. All rights reserved.)" width="80"/></a>, <a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_1_.jpg" title=""><img alt="Sprachschule EC San Diego 2" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie-_1__fd87630014.jpg" title="Sprachschule EC San Diego 2 (Copyright EC San Diego. All rights reserved.)" width="80"/></a>, <a class="lightbox" href="fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_10_.jpg" title=""><img alt="Sprachschule EC San Diego 3" border="0" height="80" src="typo3temp/pics/EC_San_Diego_Galerie-_10__a8ed60c277.jpg" title="Sprachschule EC San Diego 3 (Copyright EC San Diego. All rights reserved.)"
... 
... """
>>> import re
>>> re.findall(r'href=\"(.*?)\"', a)
['fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie.jpg', 'fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_1_.jpg', 'fileadmin/user_upload/images/Sprachen/Englisch/USA/San_Diego/San_Diego_EC/EC_San_Diego_Galerie-_10_.jpg']
>>> 
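
As a side note on the lookbehind attempt from the question: in (?<=\{href="}) the braces are treated as literal characters, so the pattern only matches text preceded by the literal string {href="}, which never occurs in the markup. Dropping the braces gives a working lookaround version. This is a sketch on an invented, shortened sample; Python's re requires the lookbehind to have a fixed width, which href=" satisfies.

```python
import re

# Invented, shortened sample in the style of the markup from the question.
text = '<a class="lightbox" href="images/a.jpg" title=""><img src="thumbs/a.jpg"/></a>'

# The attempt from the question: the braces are literal, so nothing matches.
broken = re.findall(r'(?<=\{href="}).*?(?=\")', text)

# Same idea without the braces: preceded by href=", followed by a quote.
fixed = re.findall(r'(?<=href=").*?(?=")', text)

print(broken)
print(fixed)
```

With lookarounds the whole match is the wanted text, so no capture group is needed; the capture-group pattern above gives the same values and is usually easier to read.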

Hope this helps.

【Comments】:
