【Posted】: 2023-01-31 23:25:53
【Question】:
I have a text file encoded as UTF-8 with a BOM:
1
00:00:05,850 --> 00:00:07,713
Welcome to the Course.
2
00:00:08,550 --> 00:00:10,320
This course has been designed to teach you
3
00:00:10,320 --> 00:00:12,750
all about the,
...
I need to add a ";" at the end of each line that consists only of a number. I use this code to do it:
import re
with open("/content/file.srt", "r", encoding='utf-8-sig') as strfile:
    str_file_content = strfile.read()
print(str_file_content)
test = re.sub(r'^(\d{1,3})$', r'\1;', str_file_content)
test
Result:
1\n00:00:05,850 --> 00:00:07,713\nWelcome to the Course.\n\n2\n00:00:08,550 --> 00:00:10,320\nThis course has been designed to teach you\n\n
That is, the ";" was not added! The result I expected:
1;
00:00:05,850 --> 00:00:07,713
Welcome to the Course.
2;
00:00:08,550 --> 00:00:10,320
This course has been designed to teach you
3;
00:00:10,320 --> 00:00:12,750
all about the,
...
What am I doing wrong?
【Discussion】:
-
Are there any spaces on the lines you are looking for?
-
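The ^ and $ in the regular expression anchor it to the start and end of the string. That is, it matches only if your entire string consists of 1-3 digit characters. I think test = re.sub(r'(\d{1,3})', r'\1;', str_file_content) will do the right thing.
-
Use the multiline modifier as an argument: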
re.sub(r'^(\d{1,3})$', r'\1;', str_file_content, flags=re.M)
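To compare the suggestions in the comments above, here is a minimal sketch that runs them on an in-memory string. The `sample` variable is a hypothetical stand-in for the contents of file.srt, built from the first two entries shown in the question; nothing is read from disk.

```python
import re

# Hypothetical sample standing in for the contents of file.srt.
sample = (
    "1\n00:00:05,850 --> 00:00:07,713\nWelcome to the Course.\n\n"
    "2\n00:00:08,550 --> 00:00:10,320\nThis course has been designed to teach you\n\n"
)

pattern = r'^(\d{1,3})$'

# Without re.M, ^ only matches at the very start of the string and
# $ only at its very end, so nothing is replaced here.
unchanged = re.sub(pattern, r'\1;', sample)

# With re.M (re.MULTILINE), ^ and $ also match at the start and end
# of every line, so each number-only line gets its ";".
fixed = re.sub(pattern, r'\1;', sample, flags=re.M)

# Dropping the anchors matches every run of digits, including the
# ones inside the timestamps, which is probably not what is wanted.
unanchored = re.sub(r'(\d{1,3})', r'\1;', sample)

print(fixed)
```

On this sample, only the `re.M` variant changes the number-only lines and leaves the timestamps alone. The unanchored variant also inserts ";" after the digit groups in lines such as `00:00:05,850`.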