【Question title】: How do I avoid text between {{ }} [duplicate]
【Posted】: 2011-12-27 12:48:41
【Question】:

Possible duplicate:
My regex is not working properly

Suppose I have a long text. From the following text I need only the abstract part. How do I skip the text between {{ }}? Thanks `

{{ Info extra text}}
{{Infobox film
| name           = Papori
| released       = 1986
| runtime        = 144 minutes
| country        = Assam, {{IND}}
| budget         = [[a]]
| followed by    = free
}}
Albert Einstein ( /'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] ( listen); 14 March 1879 – 18 April 1955)
 was a German-born theoretical physicist who developed the theory of general relativity, effecting a
 revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics 
 and one of the most prolific intellects in human history.`

Output:

Albert Einstein ( /'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] ( listen); 14 March 1879 – 18 April 1955)
 was a German-born theoretical physicist who developed the theory of general relativity, effecting a
 revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics 
 and one of the most prolific intellects in human history.

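【Comments on the question】:

  • If you are really just asking how to get the abstract from a Wikipedia article, note that the fine folks at DBpedia provide Wikipedia articles in a structured form (and handle the wiki markup for you).
  • @John Flatness Does DBpedia provide an API?
  • This is your third question about doing the same thing. If the earlier answers did not work for you, update the question; don't just ask over and over.

Tags: python regex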


【Solution 1】:

What I did:

>>> import re
>>> text
"{{ Info extra text}}\n{{Infobox film\n| name           = Papori\n| released       = 1986\n| runtime        = 144 minutes\n| country        = Assam, {{IND}}\n| budget         = [[a]]\n| followed by    = free\n}}\nAlbert Einstein ( /'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] ( listen); 14 March 1879 – 18 April 1955)\n was a German-born theoretical physicist who developed the theory of general relativity, effecting a\n revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics \n and one of the most prolific intellects in human history.`"
>>> re.sub(r"\{\{[\w\W\n\s]*\}\}", "", text)
"\nAlbert Einstein ( /'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] ( listen); 14 March 1879 – 18 April 1955)\n was a German-born theoretical physicist who developed the theory of general relativity, effecting a\n revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics \n and one of the most prolific intellects in human history.`"

Edit: Bart's comment is correct.

Consider this alternative instead:

>>> re.sub(r"\{\{[^\}]*\}\}", "", "{{a\n   oaheduh}} b {{c}} d")
' b  d'
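
One caveat with the character-class pattern: it stops at the first }} it meets, so a nested template such as the {{IND}} inside the infobox from the question leaves the tail of the outer template behind. A minimal sketch of a nesting-aware alternative follows, assuming the braces in the input are balanced; the function name `strip_templates` and the shortened `sample` string are made up for illustration and do not come from the original answer.

```python
import re


def strip_templates(text: str) -> str:
    """Drop every {{ ... }} span, nested ones included, by tracking brace depth."""
    out = []
    depth = 0  # how many {{ are currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # keep characters only when we are outside every template
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


# Shortened stand-in for the text in the question (hypothetical sample).
sample = (
    "{{ Info extra text}}\n"
    "{{Infobox film\n"
    "| country = Assam, {{IND}}\n"
    "| budget = [[a]]\n"
    "}}\n"
    "Albert Einstein was a physicist."
)

print(strip_templates(sample).strip())  # Albert Einstein was a physicist.

# For comparison: the character-class regex ends its match at the }} of {{IND}},
# so the "| budget ..." line and the closing }} survive.
regex_result = re.sub(r"\{\{[^\}]*\}\}", "", sample)
print("budget" in regex_result)  # True
```

A plain Python regex cannot count nesting depth, which is why this sketch walks the string instead. For real wiki markup a dedicated parser is likely a safer choice than either approach.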

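【Comments】:

  • This matches the first {{ and then consumes everything up to the last }}. That may work for the (single) example the OP posted, but it would also remove the "b" in "{{a}} b {{c}}".
  • Also, you can drop \n\s from [\w\W\n\s]; those characters are already matched by \W.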
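
The greedy behaviour described in these comments can be checked with a short sketch. The variable names are illustrative, and the lazy quantifier `*?` is shown as one possible fix rather than the one the answer adopted.

```python
import re

s = "{{a}} b {{c}}"

# Greedy: [\w\W]* runs to the LAST }}, so the whole string is removed.
greedy = re.sub(r"\{\{[\w\W]*\}\}", "", s)

# Lazy: [\w\W]*? stops at the NEAREST }}, so the text between templates survives.
lazy = re.sub(r"\{\{[\w\W]*?\}\}", "", s)

# [\w\W] already matches every character, so adding \n\s changes nothing.
verbose = re.sub(r"\{\{[\w\W\n\s]*\}\}", "", s)

print(repr(greedy))       # ''
print(repr(lazy))         # ' b '
print(greedy == verbose)  # True
```

Like the character-class version, the lazy pattern still ends at the first }} and therefore does not handle nested templates.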