How do I get Python to write Swedish letters (åäö) into an HTML file? [duplicate]

Posted: 2020-07-10 08:57:25
Question:

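In my code I copy an HTML file into a string and then change everything except the plain text and comments to lowercase. The problem is that it also changes åäö into something VS Code doesn't recognize. As far as I can tell it's an encoding problem, but I can't find anything about it for Python 3, and the solutions I found for Python 2 don't work. Any help is appreciated, and if you know how to improve the code, please tell me.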
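
A likely explanation, assuming the files are saved as UTF-8 and Python's default encoding on the asker's machine is Windows-1252 (cp1252, common on Windows; the question confirms neither): the file is decoded with the wrong codec, and .lower() then changes some of the misdecoded characters, so the bytes written back are no longer valid UTF-8. A small in-memory simulation, using a made-up tag as input:

```python
# Hypothetical tag containing a Swedish letter, as stored in a UTF-8 file.
utf8_bytes = "<P TITLE='bär'>".encode("utf-8")

# Reading with the wrong codec (assumed cp1252): 'ä' (bytes C3 A4)
# is decoded as the two characters 'Ã¤'.
misread = utf8_bytes.decode("cp1252")  # "<P TITLE='bÃ¤r'>"

# Lowercasing the tag also lowercases 'Ã' to 'ã'.
lowered = misread.lower()  # "<p title='bã¤r'>"

# Writing back with the same wrong codec produces the bytes E3 A4 followed
# by 'r', which is not a valid UTF-8 sequence.
written = lowered.encode("cp1252")

try:
    written.decode("utf-8")
except UnicodeDecodeError as exc:
    print("Not valid UTF-8 any more:", exc)
```

In this scenario, text outside the tags comes through unchanged only because it is decoded and re-encoded with the same wrong codec; whatever .lower() touches does not.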

import re
import os


text_list = []

for root, dirs, files in os.walk("."):
    for filename in files:

        if (
            filename.endswith(".html")
        ):
            text_list.append(os.path.join(root, filename))

for file in text_list:

    file_content = open(f"{file}", "r+").read()

    if file.endswith(".html"):
        os.rename(file, file.replace(" ", "_").lower())
        code_strings = re.findall(r"<.+?>", file_content)
        for i, str in enumerate(code_strings):
            new_code_string = code_strings[i].lower()
            file_content = file_content.replace(code_strings[i], new_code_string)

    else:
        os.rename(file, file.replace(" ", "_").lower())
        file_content = file_content.lower()

    open(f"{file}", "r+").write(file_content)

Comments:

You should open the file with an explicit encoding, see stackoverflow.com/questions/147741/…
Welcome to SO! Could you also add the text to your question so we can check the behavior? It's definitely an encoding problem.
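Use e.g. open(file, 'r+', encoding='utf-8'). If you don't specify an encoding, Python defaults to your system encoding, which may differ from the one used in the file. Your system encoding is given by import locale; locale.getpreferredencoding(False).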
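A short sketch of that suggestion: print the default from locale.getpreferredencoding(False), then open the file with an explicit encoding='utf-8'. Since the question's script opens the same path twice in 'r+' mode, the sketch also reads and rewrites through a single handle using seek(0) and truncate(). The file name example.html is hypothetical, and the file lives in a temporary directory.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given
# (for example 'cp1252' on many Windows setups).
print("Default encoding:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.html")  # hypothetical file name

    # Create a UTF-8 test file containing Swedish letters.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<P>Hej åäö</P>")

    # Read and rewrite through one handle, always naming the encoding.
    with open(path, "r+", encoding="utf-8") as f:
        content = f.read()
        content = content.replace("<P>", "<p>").replace("</P>", "</p>")
        f.seek(0)         # go back to the start before writing
        f.write(content)
        f.truncate()      # drop leftovers if the new text is shorter

    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)  # <p>Hej åäö</p>
```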

Tags: python html encoding character-encoding

Solution 1:

Open your file with codecs and specify a Unicode encoding such as UTF-8. Example:

import codecs
codecs.open('your_filename_here', encoding='utf-8', mode='w+')
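
A minimal round trip with codecs.open, using a throwaway file test.html in a temporary directory (the name is only for illustration). Note that mode='w+' truncates an existing file, so it suits writing new content rather than editing a file you still need to read; the sketch writes with 'w' and reads back with 'r'. In Python 3 the built-in open() accepts the same encoding argument and is generally preferred.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")  # hypothetical file name

    # Write Swedish letters encoded as UTF-8.
    with codecs.open(path, encoding="utf-8", mode="w") as f:
        f.write("<p>Räksmörgås</p>")

    # Read them back with the same encoding.
    with codecs.open(path, encoding="utf-8", mode="r") as f:
        text = f.read()

    # The raw bytes on disk are UTF-8 ('ä' is stored as C3 A4).
    with open(path, "rb") as f:
        raw = f.read()

print(text)  # <p>Räksmörgås</p>
```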

Documentation: Python Unicode Docs
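
Applied to the question's script, a Python 3 version might look like the sketch below. It assumes the HTML files are saved as UTF-8 and uses the built-in open() with encoding='utf-8' instead of codecs. Other changes: files are closed with with, the built-in name str is no longer shadowed, and each file is renamed only after it has been written. The original reopens the old path after renaming, which fails with FileNotFoundError whenever the name actually changes. It also drops the unreachable else branch (every collected file already ends in .html), lowercases only the file name rather than the whole path, and, since the pattern <.+?> also matches one-line comments, skips matches starting with <!--. These are illustrative choices, not the answerer's code.

```python
import os
import re

# Matches one tag (or one-line comment) at a time, as in the original regex.
TAG_RE = re.compile(r"<.+?>")


def lower_tag(match):
    """Lowercase a matched tag but leave one-line HTML comments untouched."""
    tag = match.group(0)
    return tag if tag.startswith("<!--") else tag.lower()


def process(path):
    """Lowercase the tags in one UTF-8 file, then normalize its file name."""
    # newline="" keeps the file's original line endings on read and write.
    with open(path, encoding="utf-8", newline="") as f:
        content = f.read()

    content = TAG_RE.sub(lower_tag, content)

    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)

    # Rename only after writing, and only the file name, not the directories.
    folder, name = os.path.split(path)
    new_path = os.path.join(folder, name.replace(" ", "_").lower())
    if new_path != path:
        os.rename(path, new_path)
    return new_path


def main(top="."):
    # Collect all paths first so renaming does not interfere with os.walk.
    html_files = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(top)
        for name in files
        if name.endswith(".html")
    ]
    for path in html_files:
        process(path)
```

Call main() from the folder that contains the files, or pass another start directory. If a file is not actually UTF-8, open() will typically raise UnicodeDecodeError instead of silently corrupting it.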

Comments: