Splitting letters from numbers within a string (answers)

【Question title】: Splitting letters from numbers within a string
【Posted】: 2013-03-12 10:19:42
【Question】:

I'm working with strings like this one: "125A12C15". I need to split them at the boundaries between letters and numbers, so this one should become ["125","A","12","C","15"].

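In Python, is there a more elegant way to do this than walking through the string position by position, checking whether each character is a letter or a digit, and concatenating accordingly? For example, a built-in function or module for this kind of thing?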

Thanks for any pointers!

【Comments】:

The following (SO) post answers your question exactly ;) stackoverflow.com/questions/3340081/… gr, M.

Tags: python string split

【Solution 1】:

Use itertools.groupby together with the str.isalpha method:

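Docstring:

groupby(iterable[, keyfunc]) -> create an iterator which returns (key, sub-iterator) grouped by each value of key(value).

Docstring:

S.isalpha() -> bool

Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.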

In [1]: from itertools import groupby

In [2]: s = "125A12C15"

In [3]: [''.join(g) for _, g in groupby(s, str.isalpha)]
Out[3]: ['125', 'A', '12', 'C', '15']
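
As a rough sketch, the groupby call above can be wrapped in a small helper. The name split_alnum is hypothetical and not part of itertools. Note that keying on str.isalpha only separates letters from non-letters, so digits, punctuation and spaces all end up in the same run:

```python
from itertools import groupby


def split_alnum(s):
    """Split s at every boundary between letters and non-letters.

    Hypothetical helper for illustration: groupby starts a new group
    each time str.isalpha changes value between adjacent characters.
    """
    return [''.join(group) for _, group in groupby(s, str.isalpha)]


print(split_alnum("125A12C15"))  # ['125', 'A', '12', 'C', '15']
print(split_alnum("AB12-3"))     # ['AB', '12-3']
```

If the input may contain characters other than letters and digits, a key function with more than two outcomes would be needed; that is outside what the question asks for.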

Alternatively, you could use re.findall or re.split from the regular expressions module:

In [4]: import re

In [5]: re.findall(r'\d+|\D+', s)
Out[5]: ['125', 'A', '12', 'C', '15']

In [6]: re.split(r'(\d+)', s)  # note that you may have to filter out the empty
                               # strings at the start/end if using re.split
Out[6]: ['', '125', 'A', '12', 'C', '15', '']

In [7]: re.split(r'(\D+)', s)
Out[7]: ['125', 'A', '12', 'C', '15']
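
The empty strings mentioned in the comment on In [6] can be dropped with a list comprehension. This is only a minimal sketch, and split_digits is a hypothetical name chosen for illustration:

```python
import re


def split_digits(s):
    """Split s into digit and non-digit runs using re.split.

    Hypothetical helper: the capturing group keeps the digit runs in
    the result, and the filter drops the empty strings that re.split
    emits when s starts or ends with a digit run.
    """
    return [part for part in re.split(r'(\d+)', s) if part]


print(split_digits("125A12C15"))  # ['125', 'A', '12', 'C', '15']
print(split_digits("A1B"))        # ['A', '1', 'B']
```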

As for performance, the regex approaches appear to be faster:

In [8]: %timeit re.findall(r'\d+|\D+', s*1000)
100 loops, best of 3: 2.15 ms per loop

In [9]: %timeit [''.join(g) for _, g in groupby(s*1000, str.isalpha)]
100 loops, best of 3: 8.5 ms per loop

In [10]: %timeit re.split(r'(\d+)', s*1000)
1000 loops, best of 3: 1.43 ms per loop
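
Outside IPython, a comparison along the same lines could be run with the standard timeit module. The script below is a sketch under a few assumptions: the candidates dictionary and its labels are made up for illustration, and the re.split variant includes the empty-string filter so that all three return the same list, which makes it slightly slower than the bare call timed above. Absolute numbers will depend on the machine and Python version.

```python
import re
import timeit
from itertools import groupby

s = "125A12C15" * 1000

# Candidate implementations, adapted from the session above.
candidates = {
    "re.findall": lambda: re.findall(r'\d+|\D+', s),
    "groupby": lambda: [''.join(g) for _, g in groupby(s, str.isalpha)],
    "re.split": lambda: [p for p in re.split(r'(\d+)', s) if p],
}

# All three should agree before their timings are compared.
results = [func() for func in candidates.values()]
assert all(r == results[0] for r in results)

for name, func in candidates.items():
    # Best of 3 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{name}: {best * 1e3:.2f} ms per call")
```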

【Discussion】: