【Question title】: How to split text to match double quotes plus trailing text to dot?
【Posted】: 2017-10-20 07:51:32
【Question】:

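How can I extract a sentence containing a double-quoted part when there is a dot inside the quotes that should not be treated as a split point?

An example document looks like this:

“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.

I want to get output like this: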

Array
(
    [0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
    [1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
 )

My code still explodes on every dot.

function sample($string)
{
    $data=array();
    $break=explode(".", $string);
    array_push($data, $break);

    print_r($data);
}

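I am still confused about how to split on the two delimiters, the double quotes and the dot, because the text inside the double quotes itself contains the dot delimiter.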
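To make the problem concrete, the following sketch runs explode() on the sample text. The variable names are illustrative only. It shows that the dot inside the second quotation is treated like any other dot, and that the final dot leaves an empty trailing element.

```php
<?php
// Sample text from the question; $text and $pieces are illustrative names.
$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// explode() knows nothing about quotes: it cuts at all four dots.
$pieces = explode('.', $text);

// Four dots produce five pieces; the last one is an empty string.
print_r($pieces);
```

The second quotation ends up spread over two elements, which is the behaviour the question wants to avoid.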

【Comments】:

    Tags: php regex unicode preg-split


    【Solution 1】:

    A perfect example for (*SKIP)(*FAIL):

    “[^“”]+”(*SKIP)(*FAIL)|\.\s*
    # looks for strings in curly double quotes
    # throws them away
    # otherwise matches a literal dot, optionally followed by whitespace
    


    PHP:
    $regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~';
    $parts = preg_split($regex, $your_string_here);
    

    This yields

    Array
    (
        [0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
        [1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
    )
    

    See a demo on regex101.com and a demo on ideone.com.
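The pattern above consumes the dots it splits on. If the dots should be kept, and the input also ends with `he said.`, one possible variation is to split on whitespace that follows a dot instead. This is a sketch, not part of the original answer: the lookbehind branch, the `u` modifier, the PREG_SPLIT_NO_EMPTY flag and the variable names are all assumptions added here.

```php
<?php
// Sample text from the question; $text, $pattern and $parts are illustrative names.
$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// Branch 1: match a whole “...” run, then discard it with (*SKIP)(*FAIL),
//           so nothing inside the quotes can ever be a split point.
// Branch 2: match whitespace that is preceded by a dot (the dot itself is kept).
$pattern = '~“[^“”]+”(*SKIP)(*FAIL)|(?<=\.)\s+~u';

$parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```

With this variation each resulting element keeps its final dot, and the space after the closing quote is not a split point because it is not preceded by a dot.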

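    【Discussion】:

    • Can you tell me what the character ~ means in your regex? I am trying to learn regex, but I could not find the ~ character in any regex reference. Or can you point me to a reference for learning regex characters? Thanks.
    • @Rachmad: These are delimiters; / or # are often used as well. They are required on both sides of the regex string.
    • Oh.. so if I change ~ to / there is no problem? @Jan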
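As a small illustration of the delimiter point discussed in these comments, the sketch below runs the same simple pattern with three different delimiters. The sample string and variable names are made up for this example.

```php
<?php
// A short made-up sample; the pattern body \.\s* is identical in all three calls.
$text = 'one. two. three';

$withTilde = preg_split('~\.\s*~', $text);
$withSlash = preg_split('/\.\s*/', $text);
$withHash  = preg_split('#\.\s*#', $text);

// All three calls return the same array.
var_dump($withTilde === $withSlash && $withSlash === $withHash);
```

The delimiter only marks where the pattern starts and ends. Whichever character is chosen has to be escaped if it also occurs inside the pattern, which is a common reason to pick ~ or # over /.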
    【Solution 2】:

    Here is a simpler pattern used with preg_split(), followed by preg_replace() to convert the left and right curly double quotes (Demo):

    $in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';
    
    $out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
    //$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
    
    $find = '/[“”]/u';  // unicode flag is essential
    $replace = '"';
    $out = preg_replace($find, $replace, $out);  // replace curly quotes with standard double quotes
    
    var_export($out);
    

    Output:

    array (
      0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
      1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
    )
    

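    preg_split() matches a space that is followed by “ (a left double quote).

    The preg_replace() step needs a pattern with the u modifier to make sure the left and right double quotes inside the character class are recognized. Using '/“|”/' means you can drop the u modifier, but it doubles the number of steps the regex engine has to perform (in this case, my character class takes only 189 steps, while the pipe version takes 372).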
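The effect of the u modifier can be checked with a short experiment. This is a sketch with made-up variable names and a made-up sample string; it assumes the source file is saved as UTF-8.

```php
<?php
// Made-up sample string; the file is assumed to be saved as UTF-8.
$quoted = '“hi”';

// With u, the class contains two characters: “ and ”.
$withU = preg_replace('/[“”]/u', '"', $quoted);

// Without u, the class is a set of single bytes, so each byte of a
// multi-byte curly quote is matched and replaced on its own.
$withoutU = preg_replace('/[“”]/', '"', $quoted);

// Alternation matches the full byte sequence of each quote, so it works without u.
$alternation = preg_replace('/“|”/', '"', $quoted);

var_dump($withU, $withoutU, $alternation);
```

Only the character class without u produces something other than the intended `"hi"`, because every byte of each curly quote is replaced separately.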

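    Also, regarding the choice between preg_split() and preg_match_all(): preg_split() was chosen because the goal is simply to split the string on the space that precedes a left double quote. If the goal were to omit substrings that are not adjacent to the delimiting space character, preg_match_all() would be the more practical choice.

    Despite my reasoning, if you still want to use preg_match_all(), my preg_split() line can be replaced with: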

    $out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
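To check that the two approaches agree on the sample input, a minimal comparison could look like this. The variable names `$viaSplit` and `$viaMatch` are illustrative, and the u modifier is added here as an assumption since the patterns contain multi-byte characters.

```php
<?php
// Sample input from the answer above.
$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// Split on a space that is immediately followed by a left double quote.
$viaSplit = preg_split('/ (?=“)/u', $in, 0, PREG_SPLIT_NO_EMPTY);

// Match from a left double quote up to the next " “" or the end of the string.
$viaMatch = preg_match_all('/“.+?(?= “|$)/u', $in, $m) ? $m[0] : null;

var_dump($viaSplit === $viaMatch);
```

Both calls return the same two-element array for this input; they would differ only if the text contained material that does not start with a left double quote.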
    

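    【Discussion】:

    • Perfect solution!
    • Also nice.. but how do we print the double quotes in PHP?
    • Oh.. I figured out my problem: just edit .htaccess and add the character-set directives AddDefaultCharset UTF-8 and AddCharset UTF-8 .php. Thanks as well, @mickmackusa.
    【Solution 3】:

    Alternatively:

    regex101 (16 steps)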

    “.[^”]+”(?:.[^“]+)?

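    • “.[^”]+” matches everything between “ and ”.
    • (?:.[^“]+)? optionally matches (hence the trailing ?) everything that follows up to, but not including, the next opening “. The ?: marks the group as non-capturing.

    PHP - PHPfiddle: click "Run - F9" [updated to replace “ and ” with "]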

    <?php
        $str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';
    
    if(preg_match_all('/“.[^”]+”(?:.[^“]+)?/',$str, $matches)){
        echo '<pre>';
        print_r(preg_replace('/“|”/', '"', $matches[0])); // replace curly quotes with straight double quotes
        echo '</pre>';
    }
    ?>
    

    Output:

    Array
    (
        [0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen. 
        [1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life."
    )
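The first element in the output above keeps the space that precedes the next opening quote. A possible follow-up, sketched here under the assumption that the input also ends with `he said.` as in the question, trims each match and swaps the quotes with str_replace(). The u modifier, the use of array_map() and the variable names are additions for this example, not part of the answer.

```php
<?php
// Sample text from the question, including the trailing "he said."
$str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$sentences = [];
if (preg_match_all('/“.[^”]+”(?:.[^“]+)?/u', $str, $matches)) {
    // Trim the trailing space of each match and replace curly quotes with straight ones.
    $sentences = array_map(
        fn (string $m): string => str_replace(['“', '”'], '"', trim($m)),
        $matches[0]
    );
}
print_r($sentences);
```

str_replace() works on plain byte sequences, so no regex modifier is needed for the quote replacement step.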
    
