Unicode characters not working in preg_match_all

【Question title】: Unicode characters not working in preg_match_all
【Posted】: 2017-08-04 16:50:13
【Question】:

I am trying to count the occurrences of a string in a file. The file, however, is full of sentences written in Unicode characters.

function probability($next, $now) {
    $text_file = file_get_contents("temp/train_set.txt");
    $ans = preg_match_all("/\b$now $next\b/i", $text_file);
    echo $ans . "<br>";
}

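The $text_file variable holds all the sentences, and the Unicode sentences print fine (I checked with echo).

$now and $next are two Unicode strings, for example $now="আমি" and $next="ভাত". The result is 0, even though both strings occur in my file.

Whenever I put two English strings in $now and $next, I get the actual count. The problem only appears when I put Unicode words in $now and $next. Perhaps my question should really be "how do I make preg_match_all support Bengali Unicode characters?"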

Feel free to ask me if anything is unclear.

Thanks

【Question comments】:

Tags: php string unicode preg-match-all

【Solution 1】:

Use the /u flag (Unicode), so that the pattern and the subject are treated as UTF-8 rather than as raw bytes:

$ans = preg_match_all("/\b$now $next\b/ui", $text_file);
//                              here __^
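
To see the effect of the flag in isolation, here is a minimal sketch that runs the same pattern with and without /u. The sample text in $text is made up for illustration and stands in for temp/train_set.txt; the preg_quote() calls are an addition that is not part of the original answer, included so that regex metacharacters in the search words cannot break the pattern.

```php
<?php
// Hypothetical sample text standing in for temp/train_set.txt.
$text = "আমি ভাত খাই। আমি ভাত রান্না করি।";

$now  = "আমি";
$next = "ভাত";

// Escape regex metacharacters in the search words (assumption: the words
// should be matched literally, as in the question).
$pair = preg_quote($now, '/') . ' ' . preg_quote($next, '/');

// Without /u, PCRE works on single bytes, so \b is evaluated against
// the individual bytes of the UTF-8 encoding.
$bytes = preg_match_all("/\b{$pair}\b/i", $text);

// With /u, pattern and subject are decoded as UTF-8 code points.
$utf8 = preg_match_all("/\b{$pair}\b/ui", $text);

echo "without /u: {$bytes}\n";
echo "with /u:    {$utf8}\n";
```

With this sample text the /u version should count both occurrences, while the byte-based version is expected to report none, matching what the question describes.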

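【Comments】:

Thanks, but now there is another problem with mixed words, such as the English words "START" and "END" together with the Bengali Unicode words "আমি" and "খাই". If I search for "আমি খাই" or "খাই END" -> OK. But "START আমি" -> not OK. I mean, if I put the English word at the beginning, it is not counted at all.
@NahidHossain: Remove the word boundaries \b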
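
The thread does not say why \b fails here. One likely explanation is that "আমি" ends in a combining vowel sign, which PCRE does not count as a word character, so no word boundary exists between it and the following space. Simply dropping \b, as suggested, also lets the pattern match inside longer words. As a hedged alternative, the sketch below replaces \b with lookarounds that treat letters, combining marks, digits and the underscore as word characters. The helper name count_pair and the sample text are hypothetical, not from the thread.

```php
<?php
// Hypothetical helper, not from the thread: counts "$now $next" pairs
// without relying on \b.
function count_pair(string $now, string $next, string $text): int
{
    // Match the two words literally, separated by one space.
    $pair = preg_quote($now, '/') . ' ' . preg_quote($next, '/');

    // Assumption: a "word" character is a letter, a combining mark,
    // a digit or an underscore.
    $word = '[\p{L}\p{M}\p{N}_]';

    // The pair must not be preceded or followed by a word character.
    $pattern = "/(?<!{$word}){$pair}(?!{$word})/ui";

    return (int) preg_match_all($pattern, $text);
}

// Made-up sample sentence mixing English and Bengali words.
$text = "START আমি খাই END";

echo count_pair("START", "আমি", $text), "\n"; // English word first
echo count_pair("আমি", "খাই", $text), "\n";   // Bengali only
echo count_pair("খাই", "END", $text), "\n";   // English word last
echo count_pair("START", "আম", $text), "\n";  // partial word, should not count
```

The first three calls should each find one match, and the last should find none because "আম" is only a prefix of "আমি" in the sample sentence.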