Extract first character of each word in MySQL using a RegEx: answers

【Question title】: Extract first character of each word in MySQL using a RegEx
【Posted】: 2014-02-07 01:52:45
【Question description】:

In my MySQL database I have a column of UTF-8 strings, and I want to use a RegEx to extract the first character of each word in them.

Suppose a RegEx that accepts only the following characters:

ਹਮਜਰਣਚਕਨਖਲਨ

and suppose the following string is given:

ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥

The only characters extracted would be:

ਹਰਚਨਲਨ

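I know that solving this requires the following steps:

Split the string into individual words (substrings), using the space as the delimiter
For each word, if its first letter matches one of the valid characters in the RegEx, extract that first letter (a substring of the substring)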
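
The two steps above can be sketched outside the database as follows. This is only a minimal Python illustration of the logic, not a MySQL solution; the names `ALLOWED` and `first_letters` are made up for the example, and the character set is the one listed in the question.

```python
# Hypothetical helper names; the character set is the one from the question.
ALLOWED = set("ਹਮਜਰਣਚਕਨਖਲਨ")


def first_letters(text, allowed=ALLOWED):
    """Return the first character of every space-separated word,
    keeping it only when it is one of the allowed characters."""
    # Step 1: split on spaces (empty pieces appear if spaces are doubled).
    words = text.split(" ")
    # Step 2: keep the first character of each word if it is allowed.
    return "".join(w[0] for w in words if w and w[0] in allowed)


sample = "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"
print(first_letters(sample))  # the last word starts with "॥", which is not allowed
```

For the sample string this prints ਹਰਚਨਲਨ, the result the question asks for.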

I have looked at all the similar questions and answers on SO, but so far none of them solves my problem.

【Question comments】:

Have you looked at this? stackoverflow.com/q/8313154/2812842
Yes, but that is not a very helpful answer.

Tags: mysql regex

【Solution 1】:

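I don't really know the MySQL regex syntax and its limitations (I have never used it), but you could prepend a leading space to the string and match something as simple as this: " ([ਮਜਰਣਚਕਨਖਲਨ]{1})"

If you then concatenate the matched groups, you get the string "ਰਚਨਲਨ" (only "ਹ" is not matched, because it is missing from the character set used in this pattern).

In C# it might look like this (working example):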

namespace TestRegex
{
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;

    class Program
    {
        static void Main(string[] args)
        {
            // leading space(to match first word too)
            // + sample string
            var sample = " ";
            sample +=  "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"; 

            // Regex pattern that matches a space and, if the next
            // character is in the set, captures it as "match group 1"
            var pattern = " ([ਮਜਰਣਚਕਨਖਲਨ]{1})";

            // select every "match group 1" from matches as array
            var result = from Match m in Regex.Matches(sample, pattern) 
                         select m.Groups[1];

            // concatenate array content into one string and
            // show it in message box to user, for example..
            MessageBox.Show(string.Concat(result)); 
        }
    }
}

In most non-query languages it looks almost the same. In PHP, for example, you would call preg_match_all and then, in a foreach loop, append "$match[i][1]" (every "match group 1") to the end of a single string.
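
As one more illustration of the same leading-space trick, here is a rough Python equivalent. It is a sketch only: `PATTERN` and `initials_by_regex` are names invented for this example, and the character set is copied from the pattern above, so "ਹ" is still missing from it. The redundant `{1}` quantifier is dropped.

```python
import re

# Same idea as the C# pattern: a space followed by one character from the set,
# captured as group 1. The set is the one used in this answer (no "ਹ").
PATTERN = re.compile(" ([ਮਜਰਣਚਕਨਖਲਨ])")


def initials_by_regex(text):
    """Prepend a space so the first word can match too, then join group 1 of every match."""
    # findall returns the captured group for each match when the pattern has one group.
    return "".join(PATTERN.findall(" " + text))


sample = "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"
print(initials_by_regex(sample))
```

With this character set the sample yields ਰਚਨਲਨ, the same string as described above.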

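Well... that is easy enough. But not in MySQL...

【Comments】:

Do you have a code example that does this?
Yes (added to the answer), but only in C#.
Thanks a lot for the C# version. I would also like to do this in PHP, but what I really want is a pure SQL solution.

【Solution 2】:

With the help of a programmer friend, I finally got this working. I pasted the following code directly into the SQL tab for the database in phpMyAdmin: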

delimiter $$
drop function if exists `initials`$$
CREATE FUNCTION `initials`(str text, expr text) RETURNS text CHARSET utf8
begin
    declare result text default '';
    declare buffer text default '';
    declare i int default 1;
    if(str is null) then
        return null;
    end if;
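    set buffer = trim(str);
    -- char_length counts characters; length would count bytes in UTF-8 text
    while i <= char_length(buffer) do
        if substr(buffer, i, 1) regexp expr then
            -- first matching character of a run: keep it
            set result = concat( result, substr( buffer, i, 1 ));
            set i = i + 1;
            -- skip the remaining matching characters of the run
            while i <= char_length( buffer ) and substr(buffer, i, 1) regexp expr do
                set i = i + 1;
            end while;
            -- skip the non-matching characters that follow the run
            while i <= char_length( buffer ) and substr(buffer, i, 1) not regexp expr do
                set i = i + 1;
            end while;
        else
            set i = i + 1;
        end if;
    end while;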
    return result;
end$$

drop function if exists `acronym`$$
CREATE FUNCTION `acronym`(str text) RETURNS text CHARSET utf8
begin
    declare result text default '';
    set result = initials( str, '[ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]' );
    return result;
end$$
delimiter ;

UPDATE scriptures SET search = acronym(scripture);

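Just to explain the last line:

scriptures is the table I want to update
search is a new, empty column I created in that table to store the results
scripture is an existing column in the scriptures table that holds all the strings I want to extract from
acronym is the function declared earlier, which checks the first letter of each word against the characters in the RegEx [ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]

So the last line of the code goes through every row of the scripture column, applies the acronym function to it, and stores the result in the new search column.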
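
The row-by-row effect of that UPDATE can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory database, so it is SQLite rather than MySQL, and the Python `acronym` function here is a simplified stand-in (first character of each whitespace-separated word, using the shorter character set from the question), not a port of the stored function above. Only the table and column names are taken from the answer.

```python
import sqlite3

# Assumption: the shorter character set from the question, for brevity.
ALLOWED = set("ਹਮਜਰਣਚਕਨਖਲਨ")


def acronym(text):
    """Simplified stand-in for the SQL function: first allowed letter of each word."""
    if text is None:
        return None
    return "".join(w[0] for w in text.split() if w[0] in ALLOWED)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call acronym(x).
conn.create_function("acronym", 1, acronym)
conn.execute("CREATE TABLE scriptures (scripture TEXT, search TEXT)")
conn.execute(
    "INSERT INTO scriptures (scripture) VALUES (?)",
    ("ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥",),
)
# Same statement as in the answer: applied once per row.
conn.execute("UPDATE scriptures SET search = acronym(scripture)")
rows = conn.execute("SELECT search FROM scriptures").fetchall()
print(rows)
```

After the UPDATE, the search column of the single sample row holds ਹਰਚਨਲਨ.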

Perfect! Exactly what I wanted :)

【Comments】: