How to unit test functions that write files using Python's 'unittest': answers

【Question title】: How to do unit testing of functions writing files using Python's 'unittest'
【Posted】: 2011-04-25 23:06:30
【Question description】:

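I have a Python function that writes an output file to disk.

I want to write a unit test for it using Python's unittest module.

How should I assert that files are equal? If the file content differs from the expected content, I would like to get an error plus a list of the differences, as in the output of the Unix diff command.

Is there an official or recommended way of doing this?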

【Discussion】:

Tags: python unit-testing file

【Solution 1】:

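The simplest thing is to write the output file, then read its contents, read the contents of the golden (expected) file, and compare the two with plain string equality. If they are the same, delete the output file. If they differ, raise an assertion.

That way, when the tests are done, every failed test is represented by an output file, and you can use a third-party tool to diff it against the golden file (Beyond Compare is very good for this).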
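
The steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: write_report is a hypothetical function under test, and the golden file is created inside the test purely to keep the sketch self-contained (a real suite would normally keep it next to the tests).

```python
import os
import shutil
import tempfile
import unittest


def write_report(path):
    # Hypothetical function under test; stands in for your own writer.
    with open(path, "w") as f:
        f.write("line 0\nline 1\n")


class GoldenFileTest(unittest.TestCase):
    def test_output_matches_golden_file(self):
        workdir = tempfile.mkdtemp()
        golden_path = os.path.join(workdir, "expected.txt")
        output_path = os.path.join(workdir, "actual.txt")
        # Assumption: a real suite would keep the golden file under version
        # control; it is created here only to keep the sketch self-contained.
        with open(golden_path, "w") as f:
            f.write("line 0\nline 1\n")

        write_report(output_path)

        with open(output_path) as f:
            actual = f.read()
        with open(golden_path) as f:
            expected = f.read()

        # On failure the assertion raises before the cleanup below runs, so
        # the output file stays on disk for inspection with a diff tool.
        self.assertEqual(expected, actual, "output kept at " + output_path)
        shutil.rmtree(workdir)
```

Running it with python -m unittest leaves nothing behind when the test passes. A failing run keeps the working directory and names the output path in the failure message.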

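If you really want to provide your own diff output, remember that the Python standard library has the difflib module. The new unittest support in Python 3.1 includes an assertMultiLineEqual method that uses it to show diffs, similar to this: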

    def assertMultiLineEqual(self, first, second, msg=None):
        """Assert that two multi-line strings are equal.

        If they aren't, show a nice diff.

        """
        self.assertTrue(isinstance(first, str),
                'First argument is not a string')
        self.assertTrue(isinstance(second, str),
                'Second argument is not a string')

        if first != second:
            message = ''.join(difflib.ndiff(first.splitlines(True),
                                                second.splitlines(True)))
            if msg:
                message += " : " + msg
            self.fail("Multi-line strings are unequal:\n" + message)
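
As a small illustration of what that method reports, the sketch below compares two multi-line strings and inspects the failure message. The sample strings are made up for the example. In current Python 3 versions, assertEqual on two str values also goes through assertMultiLineEqual, so calling it explicitly is optional.

```python
import unittest


class DiffMessageTest(unittest.TestCase):
    def test_failure_message_contains_a_diff(self):
        expected = "line 0\nline 1\nline 2\n"
        actual = "line 0\nline one\nline 2\n"

        # The assertion is expected to fail; capture it to look at the text.
        with self.assertRaises(AssertionError) as caught:
            self.assertMultiLineEqual(expected, actual)

        message = str(caught.exception)
        # difflib.ndiff marks lines only in the first string with "- " and
        # lines only in the second string with "+ ".
        self.assertIn("- line 1", message)
        self.assertIn("+ line one", message)
```

In a file-based test you would read both files into strings first and pass those strings to the assertion.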

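【Discussion】:

No. Overall, the best approach is not to write to files at all, since that can be slow and error-prone (the prod env may be completely different from the test/CI env, e.g. Windows vs. OSX). Instead, mock the call to open with unittest.mock, as described in other answers on this page (see Enrico M's answer).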

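【Solution 2】:

You can separate the content generation from the file handling. That way, you can test that the content is correct without having to deal with temporary files and clean them up afterward.

If you write a generator method that yields each line of content, then you can have a file-handling method that opens a file and calls file.writelines() with the sequence of lines. The two methods can even be on the same class: test code calls the generator, and production code calls the file handler.

The following example shows all three ways to test. Usually, you would just choose one, depending on which methods are available on the class under test.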

import os
from io import StringIO
from unittest.case import TestCase


class Foo(object):
    def save_content(self, filename):
        with open(filename, 'w') as f:
            self.write_content(f)

    def write_content(self, f):
        f.writelines(self.generate_content())

    def generate_content(self):
        for i in range(3):
            yield u"line {}\n".format(i)


class FooTest(TestCase):
    def test_generate(self):
        expected_lines = ['line 0\n', 'line 1\n', 'line 2\n']
        foo = Foo()

        lines = list(foo.generate_content())

        self.assertEqual(expected_lines, lines)

    def test_write(self):
        expected_text = u"""\
line 0
line 1
line 2
"""
        f = StringIO()
        foo = Foo()

        foo.write_content(f)

        self.assertEqual(expected_text, f.getvalue())

    def test_save(self):
        expected_text = u"""\
line 0
line 1
line 2
"""
        foo = Foo()

        filename = 'foo_test.txt'
        try:
            foo.save_content(filename)

            with open(filename, 'r') as f:
                text = f.read()
        finally:
            os.remove(filename)

        self.assertEqual(expected_text, text)

【Discussion】:

Could you provide example code? It sounds interesting.
I added an example of all three approaches, @buhtz.

【Solution 3】:

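I prefer to have output functions explicitly accept a file handle (or file-like object), rather than accept a file name and open the file themselves. That way, I can pass a StringIO object to the output function in my unit test, then .read() the contents back from that StringIO object (after a .seek(0) call) and compare them with my expected output.

For example, we would transition code like this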

##File:lamb.py
import sys


def write_lamb(outfile_path):
    with open(outfile_path, 'w') as outfile:
        outfile.write("Mary had a little lamb.\n")


if __name__ == '__main__':
    write_lamb(sys.argv[1])



##File test_lamb.py
import os
import tempfile
import unittest

import lamb


class LambTests(unittest.TestCase):
    def test_lamb_output(self):
        # mkstemp() returns an open OS-level handle plus the path; close the
        # handle right away because write_lamb() reopens the file by path.
        fd, outfile_path = tempfile.mkstemp()
        os.close(fd)
        try:
            lamb.write_lamb(outfile_path)
            with open(outfile_path) as f:
                contents = f.read()
        finally:
            # NOTE: To retain the tempfile if the test fails, remove
            # the try-finally clauses
            os.remove(outfile_path)
        self.assertEqual(contents, "Mary had a little lamb.\n")

to code like this

##File:lamb.py
import sys


def write_lamb(outfile):
    outfile.write("Mary had a little lamb.\n")


if __name__ == '__main__':
    with open(sys.argv[1], 'w') as outfile:
        write_lamb(outfile)



##File test_lamb.py
import unittest
from io import StringIO

import lamb


class LambTests(unittest.TestCase):
    def test_lamb_output(self):
        outfile = StringIO()
        # NOTE: Alternatively, for Python 2.6+, you can use
        # tempfile.SpooledTemporaryFile, e.g.,
        #outfile = tempfile.SpooledTemporaryFile(10 ** 9)
        lamb.write_lamb(outfile)
        outfile.seek(0)
        content = outfile.read()
        self.assertEqual(content, "Mary had a little lamb.\n")

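This approach has the added benefit of making your output function more flexible: if, for instance, you decide you want to write to some other buffer instead of a file, it will accept any file-like object.

Note that using StringIO assumes the contents of the test output fit into main memory. For very large output, you can use a temporary file approach (e.g., tempfile.SpooledTemporaryFile).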
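
A minimal sketch of the SpooledTemporaryFile variant follows. The max_size of 1024 bytes is an arbitrary value chosen for illustration, and write_lamb is repeated here only so the block runs on its own.

```python
import tempfile
import unittest


def write_lamb(outfile):
    # Same shape as the refactored function: it accepts any file-like object.
    outfile.write("Mary had a little lamb.\n")


class SpooledLambTests(unittest.TestCase):
    def test_lamb_output(self):
        # Data is held in memory until it exceeds max_size bytes, then it is
        # rolled over to a real temporary file on disk.
        with tempfile.SpooledTemporaryFile(max_size=1024, mode="w+") as outfile:
            write_lamb(outfile)
            outfile.seek(0)
            content = outfile.read()
        self.assertEqual(content, "Mary had a little lamb.\n")
```

The test body is the same as with StringIO, so switching between the two is a one-line change.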

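【Discussion】:

This is better than writing a file to disk. If you are running lots of unit tests, IO to disk causes all kinds of problems, especially when trying to clean up. I had tests that wrote to disk, with tearDown deleting the written files. The tests would work fine one at a time, then fail when run all together. At least with Visual Studio and PyTools on a Win machine. Also, speed.
While this is a nice solution for testing separate functions, it is still troublesome when testing the actual interface your program provides (e.g. a CLI tool).
I get the error: TypeError: unicode argument expected, got 'str'
I came here because I am trying to write unit tests for walking and reading a partitioned parquet dataset file by file. This requires parsing the file paths to get key/value pairs, in order to assign the appropriate partition values to the (eventually) resulting pandas DataFrame. Writing to a buffer, while nice, leaves no way to parse partition values.
@PMende It sounds like you are working with an API that needs to interact with the actual file system. Unit tests are not always the appropriate level of testing. It is fine not to test every part of your code at the unit-test level; integration or system tests should also be used where appropriate. Try to keep those parts contained, though, and pass only simple values between the boundaries wherever possible. See youtube.com/watch?v=eOYal8elnZk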
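
For code that does need real paths, as in the partitioned-dataset comment above, one option is a throwaway directory from tempfile.TemporaryDirectory. The sketch below is an assumption-laden illustration: partition_values is a hypothetical helper, and the year=/month= directory layout is made up for the example.

```python
import os
import tempfile
import unittest


def partition_values(relative_path):
    # Hypothetical helper: collects key=value directory components of a path.
    values = {}
    for part in os.path.normpath(relative_path).split(os.sep):
        if "=" in part:
            key, value = part.split("=", 1)
            values[key] = value
    return values


class PartitionPathTest(unittest.TestCase):
    def test_values_parsed_from_real_paths(self):
        # The directory and everything in it is removed when the block exits.
        with tempfile.TemporaryDirectory() as root:
            data_dir = os.path.join(root, "year=2020", "month=01")
            os.makedirs(data_dir)
            data_file = os.path.join(data_dir, "part-0.txt")
            with open(data_file, "w") as f:
                f.write("payload\n")

            found = [
                os.path.join(dirpath, name)
                for dirpath, _dirs, names in os.walk(root)
                for name in names
            ]

            self.assertEqual(found, [data_file])
            # Parse the path relative to the root so the temporary directory
            # name itself never takes part in the parsing.
            self.assertEqual(
                partition_values(os.path.relpath(found[0], root)),
                {"year": "2020", "month": "01"},
            )
```

This touches the disk, so it sits closer to an integration test, but cleanup is handled by the context manager rather than by tearDown.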

【Solution 4】:

Based on the suggestions, I did the following.

import difflib
import unittest


class MyTestCase(unittest.TestCase):
    def assertFilesEqual(self, first, second, msg=None):
        # Context managers close both files even if a read fails.
        with open(first) as first_f:
            first_str = first_f.read()
        with open(second) as second_f:
            second_str = second_f.read()

        if first_str != second_str:
            first_lines = first_str.splitlines(True)
            second_lines = second_str.splitlines(True)
            delta = difflib.unified_diff(first_lines, second_lines, fromfile=first, tofile=second)
            message = ''.join(delta)

            if msg:
                message += " : " + msg

            self.fail("Multi-line strings are unequal:\n" + message)

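I created a subclass MyTestCase because I have lots of functions that need to read/write files, so I really need a reusable assert method. Now in my tests, I subclass MyTestCase instead of unittest.TestCase.

What do you think?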
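
To show how such a base class might be used, here is a self-contained sketch. The helper is restated in compact form so the block runs on its own, and ReportTest, _write, and the file names are hypothetical names introduced for illustration.

```python
import difflib
import os
import tempfile
import unittest


class MyTestCase(unittest.TestCase):
    def assertFilesEqual(self, first, second, msg=None):
        with open(first) as f:
            first_lines = f.readlines()
        with open(second) as f:
            second_lines = f.readlines()
        if first_lines != second_lines:
            delta = difflib.unified_diff(
                first_lines, second_lines, fromfile=first, tofile=second
            )
            message = "".join(delta)
            if msg:
                message += " : " + msg
            self.fail("Files are unequal:\n" + message)


class ReportTest(MyTestCase):
    def setUp(self):
        # A fresh directory per test, removed again after the test finishes.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)

    def _write(self, name, text):
        path = os.path.join(self.tmp.name, name)
        with open(path, "w") as f:
            f.write(text)
        return path

    def test_identical_files_pass(self):
        first = self._write("a.txt", "line 0\nline 1\n")
        second = self._write("b.txt", "line 0\nline 1\n")
        self.assertFilesEqual(first, second)

    def test_different_files_report_a_diff(self):
        first = self._write("a.txt", "line 0\nline 1\n")
        second = self._write("b.txt", "line 0\nline one\n")
        with self.assertRaises(AssertionError) as caught:
            self.assertFilesEqual(first, second)
        # unified_diff prefixes removed lines with "-" and added ones with "+".
        self.assertIn("-line 1", str(caught.exception))
        self.assertIn("+line one", str(caught.exception))
```

Test classes that inherit from MyTestCase get the assertion without any extra imports in each test module.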

【Discussion】:

See stackoverflow.com/questions/4617034/…

【Solution 5】:

import filecmp

then

self.assertTrue(filecmp.cmp(path1, path2))

【Discussion】:

By default this does a shallow comparison, which checks only file metadata (mtime, size, etc.). Please add shallow=False to your example.
Also, the results are cached.
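
Taking both comments into account, a content-based comparison might look like the sketch below. The file names and contents are made up for the example. filecmp.clear_cache() is called before each comparison so that an earlier cached result cannot be reused after a file has been rewritten.

```python
import filecmp
import os
import tempfile
import unittest


class FileCmpTest(unittest.TestCase):
    def test_files_compared_by_content(self):
        with tempfile.TemporaryDirectory() as tmp:
            path1 = os.path.join(tmp, "a.txt")
            path2 = os.path.join(tmp, "b.txt")
            for path in (path1, path2):
                with open(path, "w") as f:
                    f.write("same content\n")

            filecmp.clear_cache()
            # shallow=False makes filecmp read and compare the file contents.
            self.assertTrue(filecmp.cmp(path1, path2, shallow=False))

            # Same length as before, different bytes.
            with open(path2, "w") as f:
                f.write("some content\n")

            filecmp.clear_cache()
            self.assertFalse(filecmp.cmp(path1, path2, shallow=False))
```

Note that this only gives a pass/fail answer; it does not produce the diff the question asks for.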

【Solution 6】:

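I always try to avoid writing files to disk, even when it is a temporary folder dedicated to my tests: not actually touching the disk makes your tests much faster, especially if your code interacts with files a lot.

Suppose you have this "amazing" piece of software in a file called main.py: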

"""
main.py
"""

def write_to_file(text):
    with open("output.txt", "w") as h:
        h.write(text)

if __name__ == "__main__":
    write_to_file("Every great dream begins with a dreamer.")

To test the write_to_file method, you can write something like this in a file called test_main.py in the same folder:

"""
test_main.py
"""
from unittest.mock import patch, mock_open

import main


def test_do_stuff_with_file():
    open_mock = mock_open()
    with patch("main.open", open_mock, create=True):
        main.write_to_file("test-data")

    open_mock.assert_called_with("output.txt", "w")
    open_mock.return_value.write.assert_called_once_with("test-data")
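
Since the question asks about unittest specifically, the same idea can be written as a TestCase. This is a sketch under one assumption: write_to_file is defined in the same module so the block runs on its own, which is why it patches builtins.open rather than main.open.

```python
import unittest
from unittest.mock import mock_open, patch


def write_to_file(text):
    # Copy of the function from main.py, repeated to keep this runnable.
    with open("output.txt", "w") as h:
        h.write(text)


class WriteToFileTest(unittest.TestCase):
    def test_write_to_file(self):
        open_mock = mock_open()
        # While the patch is active, open() returns a mock file handle and
        # nothing is written to disk.
        with patch("builtins.open", open_mock):
            write_to_file("test-data")

        open_mock.assert_called_once_with("output.txt", "w")
        open_mock.return_value.write.assert_called_once_with("test-data")
```

When the function lives in a separate module, patching the name in that module (as in the main.open example) keeps the mock from affecting other code that calls open.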

【Discussion】: