Parsing and updating a markdown file with Python: answers

【Question title】: Parsing and updating markdown file with Python
【Posted】: 2023-03-26 15:00:01
【Question】:

I am writing a script that will go through a markdown file and update any image tags from

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)

to

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif?alt-text="Daffy Duck")

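I am new to Python, so I am unsure about the syntax and my approach. My current idea is to create a new empty string, walk through the original markdown line by line, and, whenever an image tag is detected, splice the alt text into the right place and append the line to the new markdown string. My code so far looks like this: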

import markdown
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension


originalMarkdown = '''
## New Article
Lorem ipsum dolor sit amet, consectetur adipiscing elit. In pretium nunc ligula. Quisque bibendum vel lectus sed pulvinar. Phasellus id magna ac arcu iaculis facilisis. Curabitur tincidunt sed ipsum vel lacinia. Nulla et semper urna. Quisque ultrices hendrerit magna nec tempor. 

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)
Quisque accumsan sem mi. Nunc orci justo, laoreet vel metus nec, interdum euismod ipsum. 
![Bugs Bunny](http://www.nationalnannies.com/wp-content/uploads/2012/03/bugsbunny.png)
 Suspendisse augue odio, pharetra ac erat eget, volutpat ornare velit. Sed ac luctus quam. Sed id mauris erat. Duis lacinia faucibus metus, nec vehicula metus consectetur eu.
'''

updatedMarkdown = ""

# First create the treeprocessor
class AltTextExtractor(Treeprocessor):
    def run(self, doc):
        "Find all alt_text and append to markdown.alt_text. "
        self.markdown.alt_text = []
        for image in doc.findall('.//img'):
            self.markdown.alt_text.append(image.get('alt'))

# Then traverse the markdown file and concatenate the alt text to the end of any image tags
class ImageTagUpdater(Treeprocessor):
    def run(self, doc):
        # Set a counter
        count = 0
        # Go through markdown file line by line
        for line in doc:
            # if line is an image tag
            if line > ('.//img'):
                # grab the array of the alt text
                img_ext = ImgExtractor(md)
                # find the second to last character of the line
                location = len(line) - 1
                # insert the alt text
                line += line[location] + '?' + '"' + img_ext[count] + '"'
            # add line to new markdown file
            updatedMarkdownadd.add(line)

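The code above is pseudocode. I can successfully extract the strings I need from the original file, but I cannot concatenate those strings onto their respective image tags and update the original file.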

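【Question comments】:

What exactly is your question? What does your code do wrong? Please give an example.
The code is pseudocode. I can successfully extract the strings I need, but I cannot concatenate them onto the image tags and save the original file.
Which markdown module are you using (Python does not ship with one)?
@martineau python-markdown.github.io/reference
From reading the module's documentation on extensions, it seems to me that you could do what you want in one step by preprocessing the input file with your own custom Treeprocessor subclass.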

Tags: python markdown

【Solution 1】:

If your file is not very large, it may be easier to overwrite the whole file than to try to insert small pieces here and there.

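orig = '![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)'
new = '![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif?alt-text="Daffy Duck")'

filename = 'article.md'  # placeholder path; point this at your own markdown file

with open(filename, 'r') as f:
    # splitlines() drops the trailing newlines, so a line can compare equal to orig
    lines = f.read().splitlines()

# Swap the matching line for the new tag and keep every other line unchanged
new_text = "\n".join(new if line == orig else line for line in lines)

with open(filename, 'w') as f:
    f.write(new_text)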

You could also use the regular-expression function re.sub, but I suppose that is a matter of preference.
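For reference, the re.sub route might look roughly like the sketch below. It is only an illustration under a few assumptions: the pattern, the IMAGE_TAG name and the add_alt_text helper are made up for this example, image tags are assumed to be written inline as ![alt](url) with no title and no spaces or parentheses in the URL, and the alt text is assumed to contain no square brackets.

```python
import re

# Hypothetical pattern for an inline image tag: ![alt text](url)
# Assumes no "]" in the alt text and no whitespace or ")" in the URL.
IMAGE_TAG = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def add_alt_text(markdown_text):
    """Append ?alt-text="<alt>" to the URL of every inline image tag."""
    def replace(match):
        alt, url = match.group(1), match.group(2)
        if 'alt-text=' in url:
            # Already updated; return the tag unchanged so reruns are harmless.
            return match.group(0)
        return f'![{alt}]({url}?alt-text="{alt}")'

    # re.sub calls replace() once per match and uses its return value.
    return IMAGE_TAG.sub(replace, markdown_text)


sample = (
    "## New Article\n"
    "![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)\n"
    "Quisque accumsan sem mi.\n"
)
print(add_alt_text(sample))
```

To update a file, you would read its contents into a string, pass that string through add_alt_text, and write the result back, as in the open/write code above. Unlike the exact-line comparison, this handles every image tag in the text without listing each one by hand.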

【Comments】: