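The given task is grammar-bound and, on top of that, constrained by regulated administrative procedures.
Machine learning would need a training dataset that is a superset large enough to meet the expected error rate (as bounded by Hoeffding's inequality), and for the low-level targets at stake here such a dataset is (almost) impossible to assemble for training today.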
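To put a number on that claim, here is a minimal Python sketch of the Hoeffding sample-size bound. It assumes i.i.d. samples and a single, fixed classifier, and the function name hoeffding_sample_size is hypothetical. From P(|observed error - true error| > epsilon) <= 2 exp(-2 epsilon^2 N), solving for N gives N >= ln(2/delta) / (2 epsilon^2).

```python
import math

def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest N with 2 * exp(-2 * epsilon**2 * N) <= delta.

    With N i.i.d. samples, Hoeffding's inequality then guarantees that the
    observed error rate of ONE fixed classifier lies within epsilon of its
    true error rate with probability at least 1 - delta.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

print(hoeffding_sample_size(0.01, 0.05))  # 18445
```

Halving epsilon roughly quadruples N, and a learner that picks among M candidate hypotheses needs ln(2M/delta) instead of ln(2/delta) (union bound), so the requirement only grows for a real learner.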
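Hence even regex tools are (almost) guessing, because the terminal part of an E.164 "address" is (almost) unmaintainable across the global address space.
Probabilistic ML learners may seem worth exploiting here, but again, they too guess, and deliberately so (while providing a working estimate of the confidence level reached for each such guess).
Why?
Because every phone number (here we disregard lexical irregularities and similar cosmetic details) must conform to a set of global regulations (under ITU-T jurisdiction), then, at a lower level, to a whole set of national regulations (multi-party governance), and finally there are two distinct E.164 "address" assignment procedures, which do not make the story any simpler.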
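As a minimal sketch of what a regex can and cannot do here, assuming Python's re module and the commonly used pattern \+[1-9]\d{1,14} (an illustrative convention, not an ITU-T-published expression): it checks only the top-level E.164 shape, i.e. a "+", a non-zero leading digit and at most 15 digits, and says nothing about whether an NNPA, a CSP or an assignee ever received the number. The helper name looks_like_e164 is hypothetical.

```python
import re

# Commonly used E.164 *syntax* pattern (illustrative, not normative):
# "+", a non-zero first digit, then 1..14 more digits (15 digits max).
E164_SYNTAX = re.compile(r"\+[1-9]\d{1,14}")

def looks_like_e164(raw: str) -> bool:
    """Hypothetical helper: True if `raw` has the E.164 shape.

    Only the global (ITU-T level) grammar is checked. Whether the country
    code, number range or single number was ever assigned by an NNPA or a
    CSP cannot be decided from the digits alone.
    """
    # Drop common cosmetic separators before matching.
    candidate = re.sub(r"[\s\-().]", "", raw)
    return E164_SYNTAX.fullmatch(candidate) is not None

print(looks_like_e164("+44 20 7946 0999"))   # True: syntactically valid
print(looks_like_e164("+0 20 7946 0999"))    # False: leading zero
print(looks_like_e164("+1234567890123456"))  # False: 16 digits
```

Note that "+44 20 7946 0999" passes even though RFC 4725 uses it only as an illustrative "drama" number: the pattern accepts the shape, not the assignment.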
IETF RFC 4725 in brief:
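Just to illustrate the [ ITU-T [, NNPA [, CSP [, <privateAdmin> ]]]] hierarchy of distributed rules that reach into E.164 digit-block analysis (an absolute grammar under distributed governance), right down to a single digit: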
RFC 4725 ENUM Validation Architecture November 2006
These two variants of E.164 number assignment are depicted in
Figure 2:
    +--------------------------------------------+
    | International Telecommunication Union (ITU)|
    +--------------------------------------------+
                          |
              Country codes (e.g., +44)
                          |
                          v
    +-------------------------------------------+
    | National Number Plan Administrator (NNPA) |------------+
    +-------------------------------------------+            |
                          |                                  |
                    Number Ranges                            |
              (e.g., +44 20 7946 xxxx)                       |
                          |                                  |
                          v                                  |
      +--------------------------------------+               |
      | Communication Service Provider (CSP) |               |
      +--------------------------------------+               |
                          |                                  |
                          |                           Single Numbers
                Either Single Numbers             (e.g., +44 909 8790879)
                  or Number Blocks                      (Variant 2)
     (e.g., +44 20 7946 0999, +44 20 7946 07xx)              |
                     (Variant 1)                             |
                          |                                  |
                          v                                  |
                     +----------+                            |
                     | Assignee |<---------------------------+
                     +----------+
Figure 2: E.164 Number Assignment
(Note: Numbers above are "drama" numbers and are shown for
illustrative purpose only. Assignment policies for similar "real"
numbers in country code +44 may differ.)
As the Assignee (subscriber) data associated with an E.164 number is
the primary source of number assignment information, the Number Assignment Entity (NAE) usually
holds the authoritative information required to confirm the
assignment.
A CSP that acts as NAE (indirect assignment) may therefore easily
assert the E.164 number assignment for its subscribers. In some
cases, such CSPs operate database(s) containing service information
on their subscribers' numbers.
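The two assignment variants in Figure 2 can be sketched as a prefix-delegation lookup. This is a minimal Python sketch under loudly stated assumptions: the DELEGATIONS table is hypothetical and hard-coded from the figure's "drama" numbers, whereas in reality this data is spread across NNPA and CSP databases, which is exactly why it cannot be inferred from the digits. The function names are hypothetical too. The NAE is taken to be the entity on the last link that hands a number to its assignee: the CSP in Variant 1 (indirect assignment) and the NNPA in Variant 2.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical delegation table built from the "drama" numbers in Figure 2.
# Key: digit prefix (no "+"); value: (delegating entity, receiving entity).
DELEGATIONS: Dict[str, Tuple[str, str]] = {
    "44": ("ITU", "NNPA"),                 # country code +44
    "44207946": ("NNPA", "CSP"),           # number range +44 20 7946 xxxx
    "4420794607": ("CSP", "Assignee"),     # number block +44 20 7946 07xx (Variant 1)
    "442079460999": ("CSP", "Assignee"),   # single number (Variant 1)
    "449098790879": ("NNPA", "Assignee"),  # single number (Variant 2)
}

def delegation_chain(e164: str) -> List[Tuple[str, str, str]]:
    """All (prefix, from, to) delegations covering the number, outermost first."""
    digits = re.sub(r"\D", "", e164)
    prefixes = sorted((p for p in DELEGATIONS if digits.startswith(p)), key=len)
    return [(p, *DELEGATIONS[p]) for p in prefixes]

def number_assignment_entity(e164: str) -> Optional[str]:
    """Entity that assigned the number to its assignee, or None if unknown."""
    chain = delegation_chain(e164)
    if chain and chain[-1][2] == "Assignee":
        return chain[-1][1]
    return None

print(number_assignment_entity("+44 20 7946 0712"))  # CSP (Variant 1, block)
print(number_assignment_entity("+44 909 8790879"))   # NNPA (Variant 2)
print(number_assignment_entity("+44 20 7946 0500"))  # None: range held by a CSP, no assignment recorded
```

The lookup itself is trivial; what a real validator lacks is the table, which would have to come from the authoritative records kept at each level of the hierarchy.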