【Question title】: Parse currency into numbers in Python
【Posted】: 2016-06-01 23:02:43
【Question】:

I just learned from Format numbers as currency in Python that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For example,

from babel.numbers import format_currency

s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'

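What about the reverse, from currency to number, e.g. $123,456,789.00 --> 123456789? babel provides babel.numbers.parse_number to parse localized numbers, but I have not found anything like parse_currency. So what is the ideal way to parse a localized currency string into a number?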


I tried the approaches from Python: removing characters except digits from string:

# Way 1 (Python 2 only: string.maketrans does not exist in Python 3)
import string
all_chars = string.maketrans('', '')
nodigs = all_chars.translate(all_chars, string.digits)

s = '$123,456.79'
n = s.translate(all_chars, nodigs)    # '12345679', lost `.`

# Way 2
import re
n = re.sub(r"\D", "", s)              # '12345679'

Neither way preserves the decimal separator.


Remove all non-numeric characters from the string except . (see here):

import re

# Way 1:
s = '$123,456.79'
n = re.sub(r"[^0-9.]", "", s)   # '123456.79'

# Way 2:
non_decimal = re.compile(r'[^\d.]+')
s = '$123,456.79'
n = non_decimal.sub('', s)      # '123456.79'

This does handle the decimal separator.


But the solutions above do not work for input such as:

from babel.numbers import format_currency
s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
new_s = s.encode('utf-8') # 123 456,79 €

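As you can see, currency formats vary from locale to locale. What is the ideal way to parse a currency string into a number in a general way?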
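To make the failure concrete, here is a small check using only the standard library. It is a sketch: the fr_FR string quoted above is written out as a literal rather than produced by babel. It shows that the dot-preserving regex silently drops the comma, so 123456,79 comes out as 12345679:

```python
import re

# The fr_FR output quoted above, written as a literal:
# no-break spaces (\xa0) as group separators, a comma as the decimal mark.
s = '123\xa0456,79\xa0\u20ac'

non_decimal = re.compile(r'[^\d.]+')
n = non_decimal.sub('', s)

print(n)  # 12345679
```

The result is off by a factor of 100 and no error is raised, which is why a regex that hard-codes `.` as the decimal mark is not a general solution.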

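【Comments】:

  • Why were you downvoted for no apparent reason?
  • @leekaiinthesky, a currency string may contain , and .
  • @TigerhawkT3 It is not an exact duplicate, because . is still meaningful.
  • @sparkandshine Do you also want to pass in the locale? Or will you know which characters to strip (in which case a regex is enough)?
  • This is definitely not a duplicate; going from currency to a decimal number is much more complicated.

Tags: python numbers currency-formatting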


【Solution 1】:

Below is a general-purpose currency parser that does not depend on the babel library.

import numpy as np
import re

def currency_parser(cur_str):
    # Remove any non-numerical characters
    # except for ',' '.' or '-' (e.g. EUR)
    cur_str = re.sub("[^-0-9.,]", '', cur_str)
    # Remove any 000s separators (either , or .)
    cur_str = re.sub("[.,]", '', cur_str[:-3]) + cur_str[-3:]

    if '.' in list(cur_str[-3:]):
        num = float(cur_str)
    elif ',' in list(cur_str[-3:]):
        num = float(cur_str.replace(',', '.'))
    else:
        num = float(cur_str)

    return np.round(num, 2)

Here is a pytest script that tests the function:

import numpy as np
import pytest
import re


def currency_parser(cur_str):
    # Remove any non-numerical characters
    # except for ',' '.' or '-' (e.g. EUR)
    cur_str = re.sub("[^-0-9.,]", '', cur_str)
    # Remove any 000s separators (either , or .)
    cur_str = re.sub("[.,]", '', cur_str[:-3]) + cur_str[-3:]

    if '.' in list(cur_str[-3:]):
        num = float(cur_str)
    elif ',' in list(cur_str[-3:]):
        num = float(cur_str.replace(',', '.'))
    else:
        num = float(cur_str)

    return np.round(num, 2)


@pytest.mark.parametrize('currency_str, expected', [
    (
            '.3', 0.30
    ),
    (
            '1', 1.00
    ),
    (
            '1.3', 1.30
    ),
    (
            '43,324', 43324.00
    ),
    (
            '3,424', 3424.00
    ),
    (
            '-0.00', 0.00
    ),
    (
            'EUR433,432.53', 433432.53
    ),
    (
            '25.675,26 EUR', 25675.26
    ),
    (
            '2.447,93 EUR', 2447.93
    ),
    (
            '-540,89EUR', -540.89
    ),
    (
            '67.6 EUR', 67.60
    ),
    (
            '30.998,63 CHF', 30998.63
    ),
    (
            '0,00 CHF', 0.00
    ),
    (
            '159.750,00 DKK', 159750.00
    ),
    (
            '£ 2.237,85', 2237.85
    ),
    (
            '£ 2,237.85', 2237.85
    ),
    (
            '-1.876,85 SEK', -1876.85
    ),
    (
            '59294325.3', 59294325.30
    ),
    (
            '8,53 NOK', 8.53
    ),
    (
            '0,09 NOK', 0.09
    ),
    (
            '-.9 CZK', -0.9
    ),
    (
            '35.255,40 PLN', 35255.40
    ),
    (
            '-PLN123.456,78', -123456.78
    ),
    (
            'US$123.456,79', 123456.79
    ),
    (
            'PLN123.456,79', 123456.79
    ),
    (
            'IDR123.457', 123457
    ),
    (
            'JP¥123.457', 123457
    ),
    (
            '-JP\xc2\xa5123.457', -123457
    ),
    (
            'CN\xc2\xa5123.456,79', 123456.79
    ),
    (
            '-CN\xc2\xa5123.456,78', -123456.78
    ),
])
def test_currency_parse(currency_str, expected):
    assert currency_parser(currency_str) == expected
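
If exact decimal values are preferred over floats, the same heuristic can be written with the standard library's decimal module instead of numpy. This is only a sketch: the name currency_to_decimal is hypothetical and not part of the answer above, and it assumes the same rule as currency_parser, namely that any separator before the last three characters is a grouping mark.

```python
import re
from decimal import Decimal


def currency_to_decimal(cur_str):
    """Hypothetical Decimal-returning variant of currency_parser above."""
    # Keep only digits, separators and the minus sign.
    cleaned = re.sub(r"[^-0-9.,]", "", cur_str)
    # Separators before the last three characters can only be grouping marks.
    head, tail = cleaned[:-3], cleaned[-3:]
    cleaned = re.sub(r"[.,]", "", head) + tail
    # A separator that survives in the tail is treated as the decimal mark.
    cleaned = cleaned.replace(",", ".")
    return Decimal(cleaned).quantize(Decimal("0.01"))


print(currency_to_decimal('25.675,26 EUR'))  # 25675.26
print(currency_to_decimal('67.6 EUR'))       # 67.60
```

Like the original, this treats a separator followed by exactly three digits as a grouping mark, so an input such as '1.234' is read as 1234, not 1.234.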

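【Comments】:

    【Solution 2】:

    Using babel

    The babel documentation notes that the number parsing is not fully implemented yet, but a lot of work has gone into getting currency information into the library. You can use get_currency_name() and get_currency_symbol() to get the currency details, and all the other get_... functions to get the ordinary number details (decimal point, minus sign, etc.).

    Using that information, you can strip the currency details (name, symbol) and the grouping (e.g. , in the US) from a currency string. Then you change the decimal details to the ones used by the C locale (- for the minus sign, . for the decimal point).

    This results in the following code (I added an object to hold some of the data, which may come in handy in further processing):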

    import re, os
    from babel import numbers as n
    from babel.core import default_locale
    
    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value
    
    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)
    
    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]
    
    for v,c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value    :', c, ':', info.value)
        print('Extra info  :', info.name.encode('utf-8')
                             , info.symbol.encode('utf-8'))
    

    The output looks promising (in the US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value    : USD : 123456.79
    Extra info  : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value    : PLN : -123456.78
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value    : PLN : 123456.79
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value    : IDR : 123457
    Extra info  : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value    : JPY : 123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value    : JPY : -123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value    : CNY : 123456.79
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value    : CNY : -123456.78
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    

    And it still works in a different locale (Brazil is notable for using a comma as the decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py 
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value    : USD : 123456.79
    Extra info  : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value    : PLN : -123456.78
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value    : PLN : 123456.79
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value    : IDR : 123457
    Extra info  : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value    : JPY : 123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value    : JPY : -123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value    : CNY : 123456.79
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value    : CNY : -123456.78
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
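
The normalization steps that parse_currency performs can also be illustrated without babel, by passing the locale's symbols in explicitly. This is a minimal sketch under that assumption: normalize_amount and its parameters are hypothetical names, and in practice the symbols would come from babel's get_... functions as in the code above.

```python
from decimal import Decimal


def normalize_amount(text, symbol, group, decimal_mark, minus="-"):
    """Hypothetical helper: strip currency/grouping marks, then parse."""
    # Drop the pieces that carry no numeric information.
    for token in (symbol, group, "\xa0", " "):
        text = text.replace(token, "")
    # Map the locale's minus sign and decimal mark to the C-locale ones.
    text = text.replace(minus, "-").replace(decimal_mark, ".")
    return Decimal(text)


print(normalize_amount("US$123.456,79", "US$", ".", ","))        # 123456.79
print(normalize_amount("-CN\xa5123,456.78", "CN\xa5", ",", "."))  # -123456.78
print(normalize_amount("123\xa0456,79\xa0\u20ac", "\u20ac", "\xa0", ","))  # 123456.79
```

The grouping symbol is removed before the decimal mark is rewritten, so locales that use . for grouping and , for decimals are handled correctly.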
    

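    It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) themselves use different encodings. If you are working with currencies you are familiar with, that should not be a problem. But if you try unfamiliar currencies, you may run into trouble (I just learned that Poland uses iso-8859-2, not iso-8859-1).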
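
The iso-8859-2 remark can be checked with the standard library alone. The following sketch only illustrates how the Polish symbol behaves under the three encodings; it makes no claim about what babel does internally.

```python
symbol = 'z\u0142'  # "zł", the Polish zloty symbol from the output above

print(symbol.encode('utf-8'))       # b'z\xc5\x82'
print(symbol.encode('iso-8859-2'))  # b'z\xb3'

try:
    symbol.encode('iso-8859-1')
except UnicodeEncodeError:
    print('not representable in iso-8859-1')
```

The UTF-8 bytes match the b'z\xc5\x82' shown in the en_US output, while iso-8859-1 cannot represent the character at all.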

    【Comments】:
