Get human readable version of file size? Answers

【Question title】: Get human readable version of file size?
【Posted】: 2010-11-08 20:28:17
【Question】:

A function that returns a human-readable size from a size in bytes:

>>> human_readable(2048)
'2 kilobytes'
>>>

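How can this be done?

【Comments on the question】:

I think this falls under the heading of "too small a task to require a library". If you look at the source of hurry.filesize, it is a single function of about a dozen lines, and even that could be compacted.
The advantage of using a library is that it is usually tested (it contains tests that can be run if an edit introduces a bug). If you add the tests, it is no longer "a dozen lines of code" :-)
The amount of wheel reinvention in the Python community is crazy and ridiculous. Just ls -h /path/to/file.ext will do the job. Having said that, the accepted answer does a good job. Kudos.
2048 bytes = 2 kibibytes (not kilobytes).

Tags: python code-snippets filesize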

【Solution 1】:

Addressing the above "too small a task to require a library" issue with a straightforward implementation (using f-strings, so Python 3.6+):

def sizeof_fmt(num, suffix="B"):
    for unit in ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"]:
        if abs(num) < 1024.0:
            return f"{num:3.1f}{unit}{suffix}"
        num /= 1024.0
    return f"{num:.1f}Yi{suffix}"

Supports:

all currently known binary prefixes
negative and positive numbers
numbers larger than 1000 Yobibytes
arbitrary units (maybe you like to count in Gibibits!)

Example:

>>> sizeof_fmt(168963795964)
'157.4GiB'

Fred Cirera

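【Comments】:

There should be a space between the number and the unit. If you are outputting HTML or LaTeX, it should be a non-breaking space.
Just a thought, but for any (?) suffix other than B (i.e. for units other than bytes), wouldn't you want the factor to be 1000.0 rather than 1024.0?
If you want to increase the precision of the decimal part, change the 1 on lines 4 and 6 to whatever precision you want.
Cool! I like it a lot, and I converted it to Go: play.golang.org/p/68w_QCsE4F
It sure would be nice if all the iterations of this "too small a task" were captured and encapsulated into a library with tests.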
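
Two of the comments above ask for a space between the number and the unit and for adjustable precision. A minimal sketch that folds both into the function from the answer; the name sizeof_fmt_spaced and the parameters precision and sep are made up for illustration and are not part of the original answer:

```python
def sizeof_fmt_spaced(num, suffix="B", precision=1, sep=" "):
    """Variant of sizeof_fmt with a configurable separator and precision."""
    for unit in ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"]:
        if abs(num) < 1024.0:
            return f"{num:.{precision}f}{sep}{unit}{suffix}"
        num /= 1024.0
    # Anything that survives eight divisions is reported in Yi units.
    return f"{num:.{precision}f}{sep}Yi{suffix}"


print(sizeof_fmt_spaced(168963795964))               # 157.4 GiB
print(sizeof_fmt_spaced(168963795964, precision=2))  # 157.36 GiB
# For HTML or LaTeX output, a non-breaking space can be passed as the separator.
print(repr(sizeof_fmt_spaced(2048, sep="\u00a0")))
```

Whether a non-byte suffix should switch the factor to 1000, as one comment suggests, is left open here; the sketch keeps 1024 throughout.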

【Solution 2】:

A library that has all the functionality you seem to be looking for is humanize. humanize.naturalsize() appears to do everything you need.

【Comments】:

Some examples using the OP's data: humanize.naturalsize(2048) # => '2.0 kB', humanize.naturalsize(2048, binary=True) # => '2.0 KiB', humanize.naturalsize(2048, gnu=True) # => '2.0K'

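【Solution 3】:

The following works in Python 3.6+, is in my opinion the easiest answer to understand here, and lets you customize the number of decimal places used.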

def human_readable_size(size, decimal_places=2):
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']:
        if size < 1024.0 or unit == 'PiB':
            break
        size /= 1024.0
    return f"{size:.{decimal_places}f} {unit}"

【Comments】:

I like this one, but add the abs() function to the if condition to handle negative values.
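
The abs() suggestion from the comment could look like the sketch below. It is the function from the answer with only the comparison changed; treating negative sizes as meaningful (for example, a size difference) is an assumption of this sketch.

```python
def human_readable_size(size, decimal_places=2):
    for unit in ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]:
        # abs() so that negative values are scaled by magnitude as well
        if abs(size) < 1024.0 or unit == "PiB":
            break
        size /= 1024.0
    return f"{size:.{decimal_places}f} {unit}"


print(human_readable_size(-2048))    # -2.00 KiB
print(human_readable_size(2048, 1))  # 2.0 KiB
```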

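【Solution 4】:

Here is my version. It does not use a for loop. It has constant complexity, O(1), and is in theory more efficient than the answers here that use a for loop.

from math import log
# list() is needed on Python 3, where zip() returns an iterator without len() or indexing
unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'], [0, 0, 1, 2, 2, 2]))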
def sizeof_fmt(num):
    """Human friendly file size"""
    if num > 1:
        exponent = min(int(log(num, 1024)), len(unit_list) - 1)
        quotient = float(num) / 1024**exponent
        unit, num_decimals = unit_list[exponent]
        format_string = '{:.%sf} {}' % (num_decimals)
        return format_string.format(quotient, unit)
    if num == 0:
        return '0 bytes'
    if num == 1:
        return '1 byte'

To make it clearer what is going on, we can omit the code for the string formatting. Here are the lines that actually do the work:

exponent = int(log(num, 1024))
quotient = num / 1024**exponent
unit_list[exponent]
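
For integer byte counts, the same exponent can be found without floating-point logarithms. The sketch below swaps log() for int.bit_length(), which is a different technique from the answer's; the helper names binary_exponent and sizeof_fmt_int and the KiB-style unit labels are assumptions made for illustration.

```python
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]


def binary_exponent(n):
    """Return how many whole factors of 1024 fit into the non-negative int n."""
    # For n > 0, bit_length() - 1 equals floor(log2(n)); each unit spans 10 bits.
    return max(n.bit_length() - 1, 0) // 10


def sizeof_fmt_int(n):
    exponent = min(binary_exponent(n), len(UNITS) - 1)
    quotient = n / 1024 ** exponent
    return f"{quotient:.1f} {UNITS[exponent]}"


print(binary_exponent(1023), binary_exponent(1024))  # 0 1
print(sizeof_fmt_int(2048))                          # 2.0 KiB
```

Because only integer operations decide the unit, values that sit exactly on a power of 1024 cannot be pushed into the wrong unit by rounding in log().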

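【Comments】:

When you're talking about optimizing such short code, why not use if/elif/else? The last check, num == 1, is unnecessary unless you expect negative file sizes. Otherwise: nice work, I like this version.
My code could surely be more optimized. However, my point was to show that this task can be solved with constant complexity.
The for-loop answers are also O(1), because the for loops are bounded: their computation time does not scale with the size of the input (we don't have unbounded SI prefixes).
You should probably add a comma to the formatting, so that 1000 shows as 1,000 bytes.
Note that with Python 3, zip returns an iterator, so it needs to be wrapped with list(). unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'], [0, 0, 1, 2, 2, 2]))

【Solution 5】:

There's always got to be one of those guys. Well, today it's me. Here's a one-liner, or two lines if you count the function signature.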

def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
    """ Returns a human readable string representation of bytes """
    return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:])

>>> human_size(123)
'123 bytes'
>>> human_size(123456789)
'117MB'

If you need sizes bigger than an Exabyte, it's a little more gnarly:

def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
    return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:]) if units[1:] else f'{bytes>>10}ZB'

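【Comments】:

FYI, the output will always be rounded down.
Wouldn't it be better to assign the default list of units inside the method, to avoid using a list as a default argument? (And use units=None instead.)
@ImanolEizaguirre Best practice says it is a good idea to do as you suggested, so you don't inadvertently introduce bugs into a program. However, this function as written is safe because the units list is never manipulated. If it were manipulated, the changes would be permanent, and any subsequent function calls would receive the manipulated version of the list as the default argument for the units parameter.
For Python 3, if you want a decimal point, use this instead: ``` def human_size(fsize, units=[' bytes','KB','MB','GB', 'TB', 'PB', 'EB']): return "{:.2f}{}".format(float(fsize), units[0]) if fsize
@OmerTuchfeld +1 because the sizes produced this way are more accurate. Also, I take issue with whether the units should be called KiB, MiB, etc.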
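
The units=None suggestion from the comments can be sketched as follows. The module-level tuple _UNITS is a name chosen here for illustration; using an immutable tuple and stopping at the last unit are assumptions of this sketch, not part of the answer.

```python
_UNITS = (" bytes", "KB", "MB", "GB", "TB", "PB", "EB")


def human_size(nbytes, units=None):
    """Return a human readable string for nbytes, always rounding down."""
    units = _UNITS if units is None else units
    # Stop when the value fits, or when no larger unit is left to try.
    if nbytes < 1024 or len(units) == 1:
        return f"{nbytes}{units[0]}"
    return human_size(nbytes >> 10, units[1:])


print(human_size(123))        # 123 bytes
print(human_size(123456789))  # 117MB
```

As noted in the comments, the right shift discards the fractional part, so 123456789 bytes (about 117.7 MB) is reported as 117MB.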

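【Solution 6】:

I recently came up with a version that avoids loops, using log2 to determine the size order, which doubles as a shift and as an index into the suffix list: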

from math import log2

_suffixes = ['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']

def file_size(size):
    # determine binary order in steps of size 10 
    # (coerce to int, // still returns a float)
    order = int(log2(size) / 10) if size else 0
    # format file size
    # (.4g results in rounded numbers for exact matches and max 3 decimals, 
    # should never resort to exponent values)
    return '{:.4g} {}'.format(size / (1 << (order * 10)), _suffixes[order])

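It could well be considered unpythonic for its readability, though.

【Comments】:

While I like the log2 approach, you should handle size == 0!
You need to wrap either size or (1 << (order * 10)) in float() in the last line (for Python 2).

【Solution 7】:

If you have Django installed, you can also try filesizeformat: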

from django.template.defaultfilters import filesizeformat
filesizeformat(1073741824)

=>

"1.0 GB"

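【Comments】:

One downside of this for me is that it uses GB rather than GiB even though it divides by 1024.
Never heard of GiB before, and it looks silly. Does anyone actually use 10^3-based storage sizes for anything useful? Nobody says mebibyte; we all say megabyte.

【Solution 8】:

You should use humanize.

>>> import humanize
>>> humanize.naturalsize(1000000)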
'1.0 MB'
>>> humanize.naturalsize(1000000, binary=True)
'976.6 KiB'
>>> humanize.naturalsize(1000000, gnu=True)
'976.6K'

Reference:

https://pypi.org/project/humanize/

【Solution 9】:

One such library is hurry.filesize.

>>> from hurry.filesize import size, alternative
>>> size(1, system=alternative)
'1 byte'
>>> size(10, system=alternative)
'10 bytes'
>>> size(1024, system=alternative)
'1 KB'

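【Comments】:

However, this library is not very customizable. >>> from hurry.filesize import size >>> size(1031053) >>> size(3033053) '2M' I would expect it to show, e.g., '2.4M' or '2423K' .. instead of the blatantly approximated '2M'.
Also note that if you are dealing with dependency systems and the like, it is very easy to just grab the code from hurry.filesize and put it directly in your own code. It is about as short as the snippets people have provided here.
@SridharRatnakumar, to address the over-approximation problem somewhat intelligently, please see my mathematical hack. Can the approach be improved further?

【Solution 10】:

Using powers of 1000 or kibibytes would be more standards-friendly:

def sizeof_fmt(num, use_kibibyte=True):
    base, suffix = [(1000.,'B'),(1024.,'iB')][use_kibibyte]
    # list comprehension rather than map(): on Python 3, list + map object raises TypeError
    for x in ['B'] + [prefix + suffix for prefix in 'kMGTP']: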
        if -base < num < base:
            return "%3.1f %s" % (num, x)
        num /= base
    return "%3.1f %s" % (num, x)

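P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)

【Comments】:

"P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)" Why not? The code could be perfectly sound and the author just didn't consider the casing for kilo. It seems pretty silly to automatically dismiss any code based on your rule...

【Solution 11】:

The HumanFriendly project helps with this.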

import humanfriendly
humanfriendly.format_size(1024)

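The above code will give 1KB as the answer.
Examples can be found here.

【Comments】:

I think this and humanize below are the only real answers to the OP.

【Solution 12】:

This will do what you need in almost any situation, is customizable with optional arguments, and, as you can see, is pretty much self-documenting: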

from math import log
def pretty_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    pow,n=min(int(log(max(n*b**pow,1),b)),len(pre)-1),n*b**pow
    return "%%.%if %%s%%s"%abs(pow%(-pow-1))%(n/b**float(pow),pre[pow],u)

Example output:

>>> pretty_size(42)
'42 B'

>>> pretty_size(2015)
'2.0 KiB'

>>> pretty_size(987654321)
'941.9 MiB'

>>> pretty_size(9876543210)
'9.2 GiB'

>>> pretty_size(0.5,pow=1)
'512 B'

>>> pretty_size(0)
'0 B'

Advanced customizations:

>>> pretty_size(987654321,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'987.7 megabytes'

>>> pretty_size(9876543210,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'9.9 gigabytes'

This code is compatible with both Python 2 and Python 3. PEP8 compliance is an exercise for the reader. Remember, it's the output that's pretty.
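
For readers who would rather not decode the two-liner, here is an unpacked sketch that aims to produce the same results. The name pretty_size_readable is hypothetical, and the step-by-step structure is an interpretation of the compact code rather than the author's own expansion.

```python
from math import log


def pretty_size_readable(n, pow=0, b=1024, u="B",
                         pre=("",) + tuple(p + "i" for p in "KMGTPEZY")):
    # Scale the input first: pow=1 means n is given in units of b, and so on.
    n = n * b ** pow
    # Pick the largest prefix that fits, clamped to the prefixes available.
    order = min(int(log(max(n, 1), b)), len(pre) - 1)
    # The compact version's abs(pow % (-pow - 1)) is 0 for order 0 and 1 otherwise.
    decimals = 0 if order == 0 else 1
    return "%.*f %s%s" % (decimals, n / b ** order, pre[order], u)


print(pretty_size_readable(42))         # 42 B
print(pretty_size_readable(987654321))  # 941.9 MiB
```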

UPDATE:

If you need thousands commas, just apply the obvious extension:

def prettier_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    r,f=min(int(log(max(n*b**pow,1),b)),len(pre)-1),'{:,.%if} %s%s'
    return (f%(abs(r%(-r-1)),pre[r],u)).format(n*b**pow/b**float(r))

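For example:

>>> prettier_size(987654321098765432109876543210)
'816,968.5 YiB'

【Solution 13】:

Riffing on the snippet provided as an alternative to hurry.filesize(), here is a snippet that gives varying precision depending on the prefix used. It isn't as terse as some snippets, but I like the results.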

def human_size(size_bytes):
    """
    format a size in bytes into a 'human' file size, e.g. bytes, KB, MB, GB, TB, PB
    Note that bytes/KB will be reported in whole numbers but MB and above will have greater precision
    e.g. 1 byte, 43 bytes, 443 KB, 4.3 MB, 4.43 GB, etc
    """
    if size_bytes == 1:
        # because I really hate unnecessary plurals
        return "1 byte"

    suffixes_table = [('bytes',0),('KB',0),('MB',1),('GB',2),('TB',2), ('PB',2)]

    num = float(size_bytes)
    for suffix, precision in suffixes_table:
        if num < 1024.0:
            break
        num /= 1024.0

    if precision == 0:
        formatted_size = "%d" % num
    else:
        formatted_size = str(round(num, ndigits=precision))

    return "%s %s" % (formatted_size, suffix)

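【Solution 14】:

Drawing on all the previous answers, here is my take on it. It is an object that stores the file size in bytes as an integer. But when you try to print the object, you automatically get a human-readable version.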

class Filesize(object):
    """
    Container for a size in bytes with a human readable representation
    Use it like this::

        >>> size = Filesize(123123123)
        >>> print(size)
        117.4 MB
    """

    chunk = 1024
    units = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']
    precisions = [0, 0, 1, 2, 2, 2]

    def __init__(self, size):
        self.size = size

    def __int__(self):
        return self.size

    def __str__(self):
        if self.size == 0: return '0 bytes'
        from math import log
        unit = self.units[min(int(log(self.size, self.chunk)), len(self.units) - 1)]
        return self.format(unit)

    def format(self, unit):
        if unit not in self.units: raise Exception("Not a valid file size unit: %s" % unit)
        if self.size == 1 and unit == 'bytes': return '1 byte'
        exponent = self.units.index(unit)
        quotient = float(self.size) / self.chunk**exponent
        precision = self.precisions[exponent]
        format_string = '{:.%sf} {}' % (precision)
        return format_string.format(quotient, unit)

【Solution 15】:

Modern Django has its own template tag, filesizeformat:

Formats the value like a human-readable file size (i.e. '13 KB', '4.1 MB', '102 bytes', etc.).

For example:

{{ value|filesizeformat }}

If value is 123456789, the output would be 117.7 MB.

【Comments】:

from django.template.defaultfilters import filesizeformat; filesizeformat(1024*400) This is very useful, thanks!

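【Solution 16】:

I like the fixed precision of senderle's decimal version, so here is a sort of hybrid of that with joctee's answer above (did you know you could take logs with non-integer bases?):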

from math import log
def human_readable_bytes(x):
    # hybrid of https://stackoverflow.com/a/10171475/2595465
    #      with https://stackoverflow.com/a/5414105/2595465
    if x == 0: return '0'
    magnitude = int(log(abs(x),10.24))
    if magnitude > 16:
        format_str = '%iP'
        illion = 5  # petabytes: the return line divides by 1024 ** illion
    else:
        float_fmt = '%2.1f' if magnitude % 3 == 1 else '%1.2f'
        illion = (magnitude + 1) // 3
        format_str = float_fmt + ['', 'K', 'M', 'G', 'T', 'P'][illion]
    return (format_str % (x * 1.0 / (1024 ** illion))).lstrip('0')

【Solution 17】:

How about a simple 2-liner:

import math

def humanizeFileSize(filesize):
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%.3f%s" % (filesize/math.pow(1024,p), ['B','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

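Here is how it works:

Calculate log2(filesize).
Divide it by 10 to get the closest unit. (For example, if the size is 5000 bytes, the closest unit is KiB, so the answer should be X KiB.)
Return filesize/value_of_closest_unit along with the unit.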
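
The three steps can be traced for the 5000-byte example mentioned above. The intermediate variable names are chosen here for illustration only.

```python
import math

filesize = 5000

# Step 1: log2 of the size (about 12.29 for 5000).
log2_size = math.log(filesize, 2)

# Step 2: every unit spans 10 bits, so floor(log2 / 10) is the unit index.
p = int(math.floor(log2_size / 10))
unit = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB'][p]

# Step 3: divide by the value of that unit and attach the unit name.
value = filesize / math.pow(1024, p)
print("%.3f%s" % (value, unit))  # 4.883KiB
```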

However, it does not work if the file size is 0 or negative (because the log is undefined for 0 and for negative numbers). You can add extra checks for them:

def humanizeFileSize(filesize):
    filesize = abs(filesize)
    if (filesize==0):
        return "0 Bytes"
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%0.2f %s" % (filesize/math.pow(1024,p), ['Bytes','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

Examples:

>>> humanizeFileSize(538244835492574234)
'478.06 PiB'
>>> humanizeFileSize(-924372537)
'881.55 MiB'
>>> humanizeFileSize(0)
'0 Bytes'

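Note: there is a difference between KB and KiB. KB means 1000 bytes, whereas KiB means 1024 bytes. KB, MB and GB are all multiples of 1000, whereas KiB, MiB, GiB and so on are all multiples of 1024. More about it here

【Solution 18】:

What you find below is by no means the most performant or the shortest solution among those already posted. Instead, it focuses on one particular issue that many of the other answers miss.

Namely, the case when an input like 999_995 is given: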

Python 3.6.1 ...
...
>>> value = 999_995
>>> base = 1000
>>> math.log(value, base)
1.999999276174054

which, truncated to the nearest integer and applied back to the input, gives

>>> order = int(math.log(value, base))
>>> value/base**order
999.995

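This seems to be exactly what we would expect, until we need to control the output precision. That is when things start to get a bit difficult.

With the precision set to 2 digits we get: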

>>> round(value/base**order, 2)
1000 # K

instead of 1M.

How can we counter that?

Of course, we can check for it explicitly:

if round(value/base**order, 2) == base:
    order += 1
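
The explicit check above can be sketched as a complete function. This is a hedged illustration, not the answer's final method: the name abbreviate_checked is made up here, and the order is found with a loop instead of math.log so that the example does not depend on floating-point logarithms.

```python
def abbreviate_checked(value, base=1000, precision=2,
                       suffixes=("", "K", "M", "B", "T")):
    order_max = len(suffixes) - 1
    scaled = abs(value)
    order = 0
    # Find the order by repeated division (swapped in for int(log(value, base))).
    while scaled >= base and order < order_max:
        scaled /= base
        order += 1
    # The explicit check: if rounding pushes the value up to the base,
    # move to the next suffix before formatting.
    if round(scaled, precision) >= base and order < order_max:
        scaled /= base
        order += 1
    factored = round(scaled, precision)
    if value < 0:
        factored = -factored
    return f"{factored:g}{suffixes[order]}"


print(abbreviate_checked(999_994))  # 999.99K
print(abbreviate_checked(999_999))  # 1M
```

The check costs one extra round() call per invocation, which is the price of not knowing in advance which way the order will fall.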

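But can we do better? Can we get to know which way the order should be cut before we do the final step?

It turns out we can.

Assuming the 0.5 decimal rounding rule, the above if condition translates into the following, where order is the untruncated log(abs(value), base):

    order - int(order) >= log(base - 0.5/10**precision, base)

resulting in

from math import log

def abbreviate(value, base=1000, precision=2, suffixes=None):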
    if suffixes is None:
        suffixes = ['', 'K', 'M', 'B', 'T']

    if value == 0:
        return f'{0}{suffixes[0]}'

    order_max = len(suffixes) - 1
    order = log(abs(value), base)
    order_corr = order - int(order) >= log(base - 0.5/10**precision, base)
    order = min(int(order) + order_corr, order_max)

    factored = round(value/base**order, precision)

    return f'{factored:,g}{suffixes[order]}'

giving

>>> abbreviate(999_994)
'999.99K'
>>> abbreviate(999_995)
'1M'
>>> abbreviate(999_995, precision=3)
'999.995K'
>>> abbreviate(2042, base=1024)
'1.99K'
>>> abbreviate(2043, base=1024)
'2K'

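【Comments】:

That was a very nice read, and it was interesting to see your mathematical algorithm. Unfortunately, as you pointed out, it is slow. I previously solved this in a high-performance way in the following post: stackoverflow.com/a/63839503/8874388

【Solution 19】:

To get the file size in a human-readable form, I created this function: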

import os

def get_size(path):
    size = os.path.getsize(path)
    if size < 1024:
        return f"{size} bytes"
    elif size < 1024*1024:
        return f"{round(size/1024, 2)} KB"
    elif size < 1024*1024*1024:
        return f"{round(size/(1024*1024), 2)} MB"
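    elif size < 1024*1024*1024*1024:
        return f"{round(size/(1024*1024*1024), 2)} GB"
    else:
        # sizes of 1 TB and above would otherwise fall through and return None
        return f"{round(size/(1024**4), 2)} TB"

>>> get_size("a.txt")
'1.4 KB'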

【Solution 20】:

def human_readable_data_quantity(quantity, multiple=1024):
    if quantity == 0:
        quantity = +0
    SUFFIXES = ["B"] + [i + {1000: "B", 1024: "iB"}[multiple] for i in "KMGTPEZY"]
    for suffix in SUFFIXES:
        if quantity < multiple or suffix == SUFFIXES[-1]:
            if suffix == SUFFIXES[0]:
                return "%d%s" % (quantity, suffix)
            else:
                return "%.1f%s" % (quantity, suffix)
        else:
            quantity /= multiple

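【Solution 21】:

This functionality is available in Boltons, which is a very handy library to have for most projects.

>>> from boltons.strutils import bytes2human
>>> bytes2human(128991)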
'126K'
>>> bytes2human(100001221)
'95M'
>>> bytes2human(0, 2)
'0.00B'

【Solution 22】:

Here is an option using while:

def number_format(n):
   n2, n3 = n, 0
   while n2 >= 1e3:
      n2 /= 1e3
      n3 += 1
   return '%.3f' % n2 + ('', ' k', ' M', ' G')[n3]

s = number_format(9012345678)
print(s == '9.012 G')

https://docs.python.org/reference/compound_stmts.html#while
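
One thing to watch with a while loop is running past the end of the suffix tuple for very large inputs. A minimal sketch with a bound on the index; the units parameter and the decision to keep the largest suffix for oversized values are assumptions of this sketch:

```python
def number_format(n, units=("", " k", " M", " G")):
    index = 0
    # Stop dividing once the last available suffix has been reached.
    while n >= 1e3 and index < len(units) - 1:
        n /= 1e3
        index += 1
    return "%.3f" % n + units[index]


print(number_format(9012345678))  # 9.012 G
print(number_format(5e12))        # 5000.000 G
```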

【Solution 23】:

Based on Sridhar Ratnakumar's answer, updated to:

def formatSize(sizeInBytes, decimalNum=1, isUnitWithI=False, sizeUnitSeperator=""):
  """format size to human readable string"""
  # https://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2_and_ISO.2FIEC_80000
  # K=kilo, M=mega, G=giga, T=tera, P=peta, E=exa, Z=zetta, Y=yotta
  sizeUnitList = ['','K','M','G','T','P','E','Z']
  largestUnit = 'Y'

  if isUnitWithI:
    sizeUnitListWithI = []
    for curIdx, eachUnit in enumerate(sizeUnitList):
      unitWithI = eachUnit
      if curIdx >= 1:
        unitWithI += 'i'
      sizeUnitListWithI.append(unitWithI)

    # sizeUnitListWithI = ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']
    sizeUnitList = sizeUnitListWithI

    largestUnit += 'i'

  suffix = "B"
  decimalFormat = "." + str(decimalNum) + "f" # ".1f"
  finalFormat = "%" + decimalFormat + sizeUnitSeperator + "%s%s" # "%.1f%s%s"
  sizeNum = sizeInBytes
  for sizeUnit in sizeUnitList:
      if abs(sizeNum) < 1024.0:
        return finalFormat % (sizeNum, sizeUnit, suffix)
      sizeNum /= 1024.0
  return finalFormat % (sizeNum, largestUnit, suffix)

Example usage and output:

def testKb():
  kbSize = 3746
  kbStr = formatSize(kbSize)
  print("%s -> %s" % (kbSize, kbStr))

def testI():
  iSize = 87533
  iStr = formatSize(iSize, isUnitWithI=True)
  print("%s -> %s" % (iSize, iStr))

def testSeparator():
  seperatorSize = 98654
  seperatorStr = formatSize(seperatorSize, sizeUnitSeperator=" ")
  print("%s -> %s" % (seperatorSize, seperatorStr))

def testBytes():
  bytesSize = 352
  bytesStr = formatSize(bytesSize)
  print("%s -> %s" % (bytesSize, bytesStr))

def testMb():
  mbSize = 76383285
  mbStr = formatSize(mbSize, decimalNum=2)
  print("%s -> %s" % (mbSize, mbStr))

def testTb():
  tbSize = 763832854988542
  tbStr = formatSize(tbSize, decimalNum=2)
  print("%s -> %s" % (tbSize, tbStr))

def testPb():
  pbSize = 763832854988542665
  pbStr = formatSize(pbSize, decimalNum=4)
  print("%s -> %s" % (pbSize, pbStr))


def demoFormatSize():
  testKb()
  testI()
  testSeparator()
  testBytes()
  testMb()
  testTb()
  testPb()

  # 3746 -> 3.7KB
  # 87533 -> 85.5KiB
  # 98654 -> 96.3 KB
  # 352 -> 352.0B
  # 76383285 -> 72.84MB
  # 763832854988542 -> 694.70TB
  # 763832854988542665 -> 678.4199PB

【Solution 24】:

This solution might also appeal to you, depending on how your mind works:

from pathlib import Path    

def get_size(path = Path('.')):
    """ Gets file size, or total directory size """
    if path.is_file():
        size = path.stat().st_size
    elif path.is_dir():
        size = sum(file.stat().st_size for file in path.glob('*.*'))
    return size

def format_size(path, unit="MB"):
    """ Converts integers to common size units used in computing """
    bit_shift = {"B": 0,
            "kb": 7,
            "KB": 10,
            "mb": 17,
            "MB": 20,
            "gb": 27,
            "GB": 30,
            "TB": 40,}
    return "{:,.0f}".format(get_size(path) / float(1 << bit_shift[unit])) + " " + unit

# Tests and test results
>>> format_size(Path("d:\\media\\bags of fun.avi"))
'38 MB'
>>> format_size(Path("d:\\media\\bags of fun.avi"), "KB")
'38,763 KB'
>>> format_size(Path("d:\\media\\bags of fun.avi"), "kb")
'310,104 kb'
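
To see why the shift table works, here is a small worked example with an arbitrary size of 5 MiB. Reading the lowercase units as bits is an interpretation of the table (each lowercase shift is 3 less than its uppercase counterpart), not something the answer states.

```python
size_in_bytes = 5 * 1024 * 1024  # arbitrary example value

# 1 << 20 is 2**20, so dividing by it converts bytes to binary megabytes.
megabytes = size_in_bytes / float(1 << 20)

# A shift of 17 is 20 - 3: dividing by 2**17 is the same as multiplying
# by 8 (bits per byte) and then dividing by 2**20.
megabits = size_in_bytes / float(1 << 17)

print("{:,.0f} MB".format(megabytes))  # 5 MB
print("{:,.0f} mb".format(megabits))   # 40 mb
```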
