How can I typespec for a single letter ASCII string (value 0-127)?

【Question title】: How can I typespec for a single letter ASCII string (value 0-127)?
【Posted】: 2018-06-17 14:44:57
【Question description】:

Similarly, how can I typespec for a "single" UTF8 character?

Within a type definition, I can use the generic "any string" or "any single byte":

@type tile :: String.t # matches any string
@type tile :: <<_::8>> # matches any single byte

But it seems I cannot match on the first bit being 0:

@type tile :: <<0::1, _::7>>

The case for a single UTF8 bit sequence would be:

@type tile :: <<0::1, _::7>> | 
              <<6::3, _::5, 2::2, _::6>> | 
              <<14::4, _::4, 2::2, _::6, 2::2, _::6>> |
              <<30::5, _::3, 2::2, _::6, 2::2, _::6, 2::2, _::6>>

(These bit patterns do match when used in pattern matching; for example,

<<14::4, _::4, 2::2, _::6, 2::2, _::6>> = "○"

succeeds.)
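As a small illustration of why the three-byte pattern matches, the payload bits can be captured and reassembled into the code point number. This is only a sketch; the variable names `a`, `b`, `c`, and `codepoint` are chosen here for illustration and do not appear in the question.

```elixir
# Destructure the three-byte UTF-8 form: 1110aaaa 10bbbbbb 10cccccc.
<<14::4, a::4, 2::2, b::6, 2::2, c::6>> = "○"

# Reassemble the 16 payload bits into the code point number.
<<codepoint::16>> = <<a::4, b::6, c::6>>

# Encoding the number again with the utf8 modifier gives back the original string.
true = <<codepoint::utf8>> == "○"
```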

But when used in typespecs, the compiler complains loudly:

== Compilation error in file lib/board.ex ==
** (ArgumentError) argument error
    (elixir) lib/kernel/typespec.ex:1000: Kernel.Typespec.typespec/3
    (elixir) lib/kernel/typespec.ex:1127: anonymous fn/4 in Kernel.Typespec.typespec/3
    (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir) lib/kernel/typespec.ex:1127: Kernel.Typespec.typespec/3
    (elixir) lib/kernel/typespec.ex:828: anonymous fn/4 in Kernel.Typespec.typespec/3
    (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir) lib/kernel/typespec.ex:828: Kernel.Typespec.typespec/3
    (elixir) lib/kernel/typespec.ex:470: Kernel.Typespec.translate_type/3

Is there a way to typespec for bit patterns like these?

【Question discussion】:

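I don't think it's possible. My best guess would be to specify a range, 0..127::8, but I don't think that will work.
Given what I see in the Dialyzer docs, the char() typespec seems to be the closest to what you want, but it still allows 0..255 (not just the 0..127 range).
@OnorioCatenacci Indeed. I really want to match a "single" UTF8 code point, which can be anywhere from 8 to 32 bits long with specific bit patterns, so char() won't do.
OK, so we probably still need more context on why a simple validation function using pattern matching (which we know works) doesn't serve your purpose. How would being able to typespec this UTF8 character help you with type checking? As I understand it, the Erlang and Elixir compilers do not catch type mismatches (apart from compilation errors like the one you posted), so I assume the type checking is for your own internal review.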
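For comparison, here is a sketch of what typespecs can express in this area: integer ranges, and bitstrings constrained only by size. For reference, the built-in `byte()` covers 0..255 and `char()` covers 0..0x10FFFF. The module, type, and function names below (`Board`, `ascii_char`, `one_byte`, `any_bytes`, `to_tile/1`) are hypothetical and chosen for illustration.

```elixir
defmodule Board do
  # Hypothetical names, for illustration only.

  # An integer range is a valid type, so ASCII can be typed as a number.
  @type ascii_char :: 0..127

  # Bitstring types may constrain size and unit, but not individual bit values.
  @type one_byte :: <<_::8>>
  @type any_bytes :: <<_::_*8>>

  # The 0..127 restriction on the binary itself is enforced at runtime by the guard.
  @spec to_tile(ascii_char) :: one_byte
  def to_tile(char) when char in 0..127, do: <<char>>
end
```

With this, `Board.to_tile(?a)` returns `"a"`, while a call with 200 raises a `FunctionClauseError` at runtime.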

Tags: erlang elixir dialyzer

【Solution 1】:

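You cannot typespec on binary patterns, only on the fact that the value is a binary (optionally with a size, as in <<_::8>>). Even if you could define such specs, I do not believe Dialyzer is sophisticated enough to find failures in such matches. You are left with implementing this behaviour at runtime using guards and pattern matching, for example: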

# One clause per UTF-8 sequence length (1 to 4 bytes).
def unicode?(<<0::size(1), _::size(7)>>), do: true
def unicode?(<<6::3, _::5, 2::2, _::6>>), do: true
def unicode?(<<14::4, _::4, 2::2, _::6, 2::2, _::6>>), do: true
def unicode?(<<30::5, _::3, 2::2, _::6, 2::2, _::6, 2::2, _::6>>), do: true
# Any other binary is not a single UTF-8 sequence.
def unicode?(str) when is_binary(str), do: false
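Elixir's binary syntax also has a built-in `utf8` segment modifier, which may be a simpler runtime check than spelling out the bit patterns by hand. The sketch below is not part of the original answer, and the names `Tile`, `ascii?/1`, and `codepoint?/1` are hypothetical.

```elixir
defmodule Tile do
  # Hypothetical module and function names, for illustration only.

  # True for a one-byte binary whose value is in the ASCII range.
  @spec ascii?(binary) :: boolean
  def ascii?(<<byte>>) when byte in 0..127, do: true
  def ascii?(bin) when is_binary(bin), do: false

  # True for a binary holding exactly one UTF-8 encoded code point.
  @spec codepoint?(binary) :: boolean
  def codepoint?(<<_::utf8>>), do: true
  def codepoint?(bin) when is_binary(bin), do: false
end
```

Note that the `utf8` modifier is stricter than the hand-written bit patterns: it rejects overlong encodings such as `<<0xC0, 0x80>>`, which the two-byte pattern above would accept, so the two checks are not equivalent.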

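Unfortunately, as far as I know there is no way to use bit patterns in guards. You can only match whole bytes using binary_part/3, and there is no function that does the same for bits. So the closest you can get is something like this (untested whether it works or even compiles, but it gives you a rough idea of what is possible):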

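# Guards cannot match bits, so compare one-byte slices taken with binary_part/3.
# Continuation bytes have the form 10xxxxxx.
defguardp is_valid_utf_part(bin, pos)
          when binary_part(bin, pos, 1) >= <<0b10000000>> and
                 binary_part(bin, pos, 1) <= <<0b10111111>>

# defguard accepts only plain variables and a single clause, so the
# four sequence lengths are combined with `or`.
defguard is_unicode(bin)
         when is_binary(bin) and
                ((byte_size(bin) == 1 and bin <= <<0b01111111>>) or
                   (byte_size(bin) == 2 and
                      binary_part(bin, 0, 1) >= <<0b11000000>> and
                      binary_part(bin, 0, 1) <= <<0b11011111>> and
                      is_valid_utf_part(bin, 1)) or
                   (byte_size(bin) == 3 and
                      binary_part(bin, 0, 1) >= <<0b11100000>> and
                      binary_part(bin, 0, 1) <= <<0b11101111>> and
                      is_valid_utf_part(bin, 1) and
                      is_valid_utf_part(bin, 2)) or
                   (byte_size(bin) == 4 and
                      binary_part(bin, 0, 1) >= <<0b11110000>> and
                      binary_part(bin, 0, 1) <= <<0b11110111>> and
                      is_valid_utf_part(bin, 1) and
                      is_valid_utf_part(bin, 2) and
                      is_valid_utf_part(bin, 3)))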

【Discussion】: