Regex: Don't match if other text is found before

【Question title】: Regex : Don't match if other text is found before
【Posted】: 2021-01-12 13:05:41
【Question description】:

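I'm trying to parse a markdown document with a regex to find out whether it contains a title (# title). I've managed to do that with this regex: (?m)^#{1}(?!#) (.*). The problem is that my markdown can also contain code sections, where the # title format can show up as a comment.

My idea is to look for the # title, but not match if there is a ```language somewhere before that line.

Here is a sample text where I only need to match # my title and not the # helloworld.py further down, especially when # my title is missing (which is what I need to find out):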

<!--
.. title: Structuring a Python application
.. medium: yes
.. devto: yes
-->

# my title

In this short article I will explain all the different ways of structuring a Python application, from a quick script to a more complex web application.

## Single python file containing all the code


```python
#!/usr/bin/env python
# helloworld.py

test

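【Question comments】:

Does the title have to be present? If it is present, does it always come right after the first passage, followed by a #?
If you don't need possible titles after a code block, you can use document.partition("```")[0] instead of the full document.
The title doesn't have to be present, and if it is, it isn't necessarily the first passage either, but I found the answer below! @Chase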
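
The partition suggestion from the comments can be sketched roughly as follows. This is a minimal sketch, assuming any heading after the first code fence can be ignored; the helper name `title_before_first_fence` and the sample strings are made up for illustration. `str.partition` returns the whole string as its first element when the separator is absent, so a document without any code block needs no special handling.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence in the code


def title_before_first_fence(document):
    """Return the first H1 title found before the first code fence, or ''."""
    # partition() yields (head, separator, tail); head is the whole
    # document when no fence is present.
    head = document.partition(FENCE)[0]
    match = re.search(r"^# (.+)", head, flags=re.M)
    return match.group(1) if match else ""


# Hypothetical sample documents
with_title = "# my title\n\n## Section\n\n" + FENCE + "python\n# helloworld.py\n"
without_title = "## Section\n\n" + FENCE + "python\n# helloworld.py\n"

print(repr(title_before_first_fence(with_title)))     # 'my title'
print(repr(title_before_first_fence(without_title)))  # ''
```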

Tags: python regex markdown

【Solution 1】:

This could get very messy with regex alone. But since it looks like you're using Python anyway, it can be trivial.

import re

mkdwn = '''<!--
.. title: Structuring a Python application
.. medium: yes
.. devto: yes
-->

# my title

In this short article I will explain all the different ways of structuring a Python application, from a quick script to a more complex web application.

## Single python file containing all the code


```python
#!/usr/bin/env python
# helloworld.py

test'''

# Get the first occurrence of a substring that you're 100% certain
# **will not be present** before the title but **will be present**
# in the document after the title (if the title exists).
# Note: str.index raises ValueError if the substring is absent.
idx = mkdwn.index('```')

# Now, try to extract the title using regex, starting from the string start but ending at `idx`
title_match = re.search(r'^# (.+)', mkdwn[:idx], flags=re.M)
# Get the 1st group if a match was found, else empty string
title = title_match.group(1) if title_match else ''
print(title)

You can also reduce this

title_match = re.search(r'^# (.+)', mkdwn[:idx], flags=re.M)
# Get the 1st group if a match was found, else empty string
title = title_match.group(1) if title_match else ''

to the following, if you're into that kind of thing:

title = getattr(re.search(r'^# (.+)', mkdwn[:idx], flags=re.M), 'group', lambda _: '')(1)

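getattr returns the attribute group if it exists (i.e. when a match was found). Otherwise it returns the dummy function (lambda _: ''), which accepts a dummy argument and returns an empty string, and that empty string is what gets assigned to title.

The returned function is then called with the argument 1, which returns the first group if a match was found. If no match was found, the argument doesn't matter and the call simply returns an empty string.

Output

my title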
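
To see both branches of the getattr one-liner side by side, here is a small sketch run on two made-up inputs. The sample strings and the variable names `found` and `missing` are illustrative and not part of the original answer.

```python
import re

pattern = r"^# (.+)"

# Match found: getattr returns the bound method Match.group,
# which is then called with 1 and returns the first group.
found = getattr(re.search(pattern, "intro\n# my title\n", flags=re.M),
                "group", lambda _: "")(1)

# No match: re.search returns None, which has no 'group' attribute,
# so the fallback lambda is called instead and returns ''.
missing = getattr(re.search(pattern, "## only an h2\n", flags=re.M),
                  "group", lambda _: "")(1)

print(repr(found))    # 'my title'
print(repr(missing))  # ''
```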

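【Comments】:

Yes, you're right, I should split the work up, and it works! FYI, I needed to use this regex: r"(?<!#)#{1} .+", otherwise it would also match h2 titles like ## title h2
@MathieuDugue No, it shouldn't match ## title h2. r"(?<!#)#{1} .+" seems overly complicated; a plain ^# followed by a space is enough to select only a hash that sits at the start of a line and is immediately followed by a space, in other words a single hash. Here's a demo
My bad, you're right! I added a test to check whether there is an idx, because I can have a document without any code block, in which case I can run the regex over the whole document. Works like a charm, thanks!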
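
The asker's actual check is not shown in the thread, so the following is only one assumed way to write it: `str.find` returns -1 instead of raising when no fence exists, and the whole document is then searched. The function name `find_title` and the sample strings are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence in the code


def find_title(document):
    """Return the H1 title before the first code fence, or '' if there is none."""
    idx = document.find(FENCE)  # -1 when the document has no code block
    searchable = document if idx == -1 else document[:idx]
    # '^# ' only matches a single hash at line start, so '## title h2' is skipped.
    match = re.search(r"^# (.+)", searchable, flags=re.M)
    return match.group(1) if match else ""


# Hypothetical sample documents
no_code_block = "intro\n\n## title h2\n\n# my title\n"
only_code_comment = "## title h2\n\n" + FENCE + "python\n# helloworld.py\n"

print(repr(find_title(no_code_block)))      # 'my title'
print(repr(find_title(only_code_comment)))  # ''
```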

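【Solution 2】:

This is a task for three regular expressions. The first temporarily masks all code snippets, the second processes the markdown, and the third unmasks the code.

"Masking" means storing each code snippet in a dictionary and replacing it in the markdown with a special marker that carries the dictionary key.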
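
The answer gives no code, so here is a rough sketch of the mask / process / unmask idea under some assumptions: code blocks are delimited by triple-backtick fences at the start of a line, an unclosed fence runs to the end of the document, and the marker format `@@CODE<n>@@` never occurs in the real text. All names (`CODE_BLOCK`, `mask_code`, `unmask_code`) and the marker format are made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence in the code

# Regex 1: a fenced block, from an opening fence at line start up to the
# closing fence line, or to the end of the text if the fence is never closed.
CODE_BLOCK = re.compile(
    r"^" + FENCE + r".*?(?:^" + FENCE + r"[^\n]*$|\Z)", flags=re.M | re.S
)


def mask_code(text):
    """Replace each code block with a marker; return (masked_text, store)."""
    store = {}

    def stash(match):
        key = f"@@CODE{len(store)}@@"  # hypothetical marker format
        store[key] = match.group(0)
        return key

    return CODE_BLOCK.sub(stash, text), store


def unmask_code(text, store):
    """Regex 3: put the stored code blocks back in place of their markers."""
    return re.sub(r"@@CODE\d+@@", lambda m: store.get(m.group(0), m.group(0)), text)


doc = ("## Section\n\n" + FENCE + "python\n# helloworld.py\n" + FENCE
       + "\n\n# real title\n")
masked, store = mask_code(doc)

# Regex 2: the actual markdown processing, here just the title lookup.
match = re.search(r"^# (.+)", masked, flags=re.M)
title = match.group(1) if match else ""
restored = unmask_code(masked, store)

print(repr(title))      # 'real title'
print(restored == doc)  # True
```

Unlike cutting the document at the first fence, this variant can still find a title that appears after a code block, at the cost of more moving parts.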

【Comments】: