Comparing Unicode Characters in PHP: Answers

【Question title】: Comparing Unicode Characters in PHP
【Posted】: 2013-01-01 19:23:38
【Question】:

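I can't get two Unicode characters to compare as equal even though I believe they are identical. I suspect they are encoded differently somehow, but I don't know how to convert them to the same encoding.

The characters I want to compare come from the Myanmar Unicode block. I'm running WordPress on PHP 5 and trying to write a custom plugin that handles Myanmar Unicode. All of my files are encoded as UTF-8, but I don't know what WordPress does with the text.

Here is what I'm doing: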

function myFunction( $inputText ) {
    $outputText = '';
    $inputTextArray = str_split($inputText);
    foreach($inputTextArray as $char) {
        if ($char == "က") // U+1000, a character from the Myanmar Unicode block 
            $outputText .= $char;
    }
    return $outputText;
}
add_filter( 'the_content', 'myFunction');

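At this stage of debugging, the function should return only the က characters that appear in the content. Instead it returns an empty string, even though က is clearly present in the post content. If I change the character to any Latin character, the function works as expected.

So my question is: how do I encode these values ($char or "က") so that they compare equal when $char contains this character?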

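【Comments】:

Could you add $value = unpack('H*', $inputText); echo base_convert($value[1], 16, 2); echo $value; and post the output? It converts the string to binary.
Oops, fixed the conversion to binary. Taken from here: stackoverflow.com/questions/6382738/…
What PHP version are you using?
OK, I put just a single က in the post title, and it returns 111000011000000010000000Array.
PHP is 5.3.10-1ubuntu3.4.

Tags: php wordpress unicode utf-8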

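【Solution 1】:

str_split is not Unicode-aware. It splits a multi-byte character into its individual bytes, so no single element ever equals "က". Try the multi-byte string functions, or preg_split with the /u modifier: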

$inputTextArray = preg_split("//u", $inputText, -1, PREG_SPLIT_NO_EMPTY);

http://codepad.viper-7.com/ErFwcy
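
To make the difference concrete, here is a minimal sketch that contrasts the two splits. The character is written as escaped UTF-8 bytes so the example does not depend on how the source file is saved; the variable names are illustrative only. The three bytes E1 80 80 are the same bits as the binary string posted in the comments above.

```php
<?php
// U+1000 (က) encoded as UTF-8 is the three bytes E1 80 80.
$char = "\xE1\x80\x80";

// str_split works on bytes, so one character becomes three 1-byte strings.
$bytes = str_split($char);
echo count($bytes), "\n";        // 3
echo bin2hex($bytes[0]), "\n";   // e1

// With the /u modifier the subject is treated as UTF-8 and split per code point.
$chars = preg_split('//u', "a" . $char . "b", -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";        // 3
var_dump($chars[1] === $char);   // bool(true)
```

None of the one-byte pieces from str_split can equal the three-byte string, which is why the original loop never appended anything.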

The multi-byte function mb_substr_count also lets you shorten the code, like this:

function myFunction( $inputText ) {
    return str_repeat("က", mb_substr_count($inputText, "က"));
}
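
A variant of the same idea, as a sketch only: it assumes the mbstring extension is loaded and passes 'UTF-8' explicitly instead of relying on whatever mb_internal_encoding() happens to be on the server. The function name keepMyanmarKa is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: keeps only the occurrences of U+1000 from the input.
function keepMyanmarKa($inputText) {
    $needle = "\xE1\x80\x80"; // UTF-8 bytes for U+1000 (က)
    // Pass the encoding explicitly rather than relying on mb_internal_encoding().
    $count = mb_substr_count($inputText, $needle, 'UTF-8');
    return str_repeat($needle, $count);
}

$result = keepMyanmarKa("x" . "\xE1\x80\x80" . "y" . "\xE1\x80\x80");
echo bin2hex($result), "\n"; // e18080e18080
```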

Or with a regular expression:

preg_match_all("/က/u", $text, $match);
$output = implode("", $match[0]);
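
If you would rather not depend on the source file's encoding, the pattern can name the code point instead of embedding the literal character. This is a sketch, not part of the original answer; the sample $text is made up for illustration.

```php
<?php
$text = "a\xE1\x80\x80b\xE1\x80\x80"; // "aကbက" written as UTF-8 bytes

// \x{1000} names the code point directly; it requires the /u modifier.
// Single quotes keep PHP from interpreting the backslash escape itself.
preg_match_all('/\x{1000}/u', $text, $match);
$output = implode('', $match[0]);
echo bin2hex($output), "\n"; // e18080e18080
```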

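【Comments】:

Wow, that was it. Thank you very much.
It might be worth turning mbstring overloading on by default: php.net/manual/en/mbstring.overload.php
@Danack No, for many reasons, but most importantly because you lose the byte-level functions.
@BenSharon See the update; what you need is much simpler.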