Add symbol to UTF8 BOM string using re.sub [duplicate]

【Question title】: Add symbol to UTF8 BOM string using re.sub [duplicate]
【Posted】: 2023-01-31 23:25:53
【Question description】:

I have a text file encoded as UTF-8 with a BOM:

1
00:00:05,850 --> 00:00:07,713
Welcome to the Course.

2
00:00:08,550 --> 00:00:10,320
This course has been designed to teach you

3
00:00:10,320 --> 00:00:12,750
all about the,
...

I need to add a ";" at the end of each group number. I use this code to do it:

import re

with open("/content/file.srt", "r", encoding='utf-8-sig') as strfile:
    str_file_content = strfile.read()
    print(str_file_content)

test = re.sub(r'^(\d{1,3})$', r'\1;', str_file_content)
test

Result:

1\n00:00:05,850 --> 00:00:07,713\nWelcome to the Course.\n\n2\n00:00:08,550 --> 00:00:10,320\nThis course has been designed to teach you\n\n

That is, the ";" symbol is not added! The result I expect:

1;
00:00:05,850 --> 00:00:07,713
Welcome to the Course.

2;
00:00:08,550 --> 00:00:10,320
This course has been designed to teach you

3;
00:00:10,320 --> 00:00:12,750
all about the,
...

What am I doing wrong?

【Question discussion】:

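Are there any spaces on the lines you are looking for?
The ^ and $ in your regex anchor it to the start and end of the string. That is, it only matches if your entire string consists of nothing but 1-3 digit characters. I think test = re.sub(r'(\d{1,3})', r'\1;', str_file_content) would do the right thing.
Use the multiline modifier as an argument: re.sub(r'^(\d{1,3})$', r'\1;', str_file_content, flags=re.M)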
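
To make the difference between these suggestions concrete, here is a minimal sketch that runs the three variants from the comments on a shortened sample. The sample text is an assumption modelled on the file in the question, and the variable names are illustrative only.

```python
import re

# Sample text shaped like the subtitle file in the question
# (an assumption about its exact content, shortened to two cues).
sample = (
    "1\n"
    "00:00:05,850 --> 00:00:07,713\n"
    "Welcome to the Course.\n"
    "\n"
    "2\n"
    "00:00:08,550 --> 00:00:10,320\n"
    "This course has been designed to teach you\n"
)

# Without a flag, ^ and $ anchor to the whole string, so nothing matches.
no_flag = re.sub(r'^(\d{1,3})$', r'\1;', sample)

# Without anchors, every run of 1-3 digits matches, timestamps included.
unanchored = re.sub(r'(\d{1,3})', r'\1;', sample)

# With re.M, ^ and $ also match at each line boundary, so only the
# digit-only lines are changed.
multiline = re.sub(r'^(\d{1,3})$', r'\1;', sample, flags=re.M)

print(no_flag == sample)          # the original call leaves the text unchanged
print(unanchored.splitlines()[1]) # the timestamp line is altered as well
print(multiline.splitlines()[:2]) # only the cue number gains a ';'
```

On this sample the unanchored pattern also inserts ";" inside the timestamps, so the anchored pattern with re.M appears to be the variant that produces the expected result.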

Tags: python regex

【Solution 1】:

You can use the MULTILINE flag to make ^ and $ behave the way you expect:

test = re.sub(r'^(\d{1,3})$', r'\1;', str_file_content, flags=re.MULTILINE)

【Discussion】:

【Solution 2】:

Here is a possible regex:

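# ^ with re.MULTILINE limits the match to digit-only lines, so digits that end a timestamp line (e.g. "713\n") are left alone
test = re.sub(r"^(\d+)\n", r"\1;\n", str_file_content, flags=re.MULTILINE)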
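
Since the file in the question carries a UTF-8 BOM, a round trip through the file may be worth sketching. The following is only an illustration under stated assumptions: the helper name add_semicolons is hypothetical, and a temporary file stands in for the real .srt file so that the example is self-contained.

```python
import os
import re
import tempfile


def add_semicolons(text):
    """Append ';' to every line that consists only of 1-3 digits."""
    return re.sub(r'^(\d{1,3})$', r'\1;', text, flags=re.MULTILINE)


# Hypothetical stand-in for the real .srt file: a temporary file written
# with a UTF-8 BOM.
sample = "1\n00:00:05,850 --> 00:00:07,713\nWelcome to the Course.\n"
fd, path = tempfile.mkstemp(suffix=".srt")
os.close(fd)
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(sample)

# 'utf-8-sig' strips the BOM on read, so ^ can match at the first character.
with open(path, "r", encoding="utf-8-sig") as f:
    content = f.read()

result = add_semicolons(content)

# Writing with 'utf-8-sig' puts the BOM back at the start of the file.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(result)

# Read the raw bytes to check the BOM, then clean up the temporary file.
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(result)
```

Reading with plain 'utf-8' instead would leave the BOM character in front of the first cue number, and that first line would then no longer consist of digits only.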

【Discussion】: