Decode multiple XML tags inside a string using PHP: answers

【Question title】: Decode multiple xml tags inside using PHP
【Posted】: 2013-12-30 20:00:01
【Question】:

I'm looking for a "smart way" to decode multiple XML tags inside a string. I have the following function:

function b($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return $langxmlobj -> $lang;
        } else {
            return $params['data'];
        }
    }
}

Trying it out:

$params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
$params['lang'] = 'French';
$a = b($params);
print_r($a);

But this outputs:

Service DNS

I want it to output the content of every matching tag, so the result should be:

Service DNS - DNS Gratuit

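I'm pulling my hair out. Any quick help or pointers would be much appreciated.

Edit: refining the requirement.

It seems I wasn't clear enough, so let me give another example.

Suppose I have the following string as input: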

The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow 
because it makes him <French>Heureux</French><English>Happy</English> to know that it 
is the best <French>Endroit</French><English>Place</English> to find good people with
good <French>Réponses</French><English>Answers</English>.

If I run the function with 'French', it should return:

The Chat is very happy to stay on stackoverflow 
because it makes him Heureux to know that it 
is the best Endroit to find good people with
good Réponses.

And with 'English':

The Cat is very happy to stay on stackoverflow 
because it makes him Happy to know that it 
is the best Place to find good people with
good Answers.

I hope it's clearer now.

【Comments on the question】:

What is your PHP version? Your code outputs every tag for me ($a is a SimpleXMLElement object).

Tags: php xml-parsing

【Solution 1】:

Basically, I would first pick out each run of language elements, such as:

<French>Chat</French><English>Cat</English>

with this pattern:

"@(<($defLangs)>.*?</\\2>)+@i"

and then use a callback to extract the string for the requested language.

If you have PHP 5.3+:

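function transLang($str, $lang, $defLangs = 'French|English')
{
    // Match each run of adjacent language elements,
    // e.g. <French>Chat</French><English>Cat</English>
    return preg_replace_callback("@(<($defLangs)>.*?</\\2>)+@i",
        function ($matches) use ($lang) {
            // Within the run, pick the element for the requested language
            $tag = preg_quote($lang, '/');
            if (preg_match("/<$tag>(.*?)<\/$tag>/i", $matches[0], $langSec)) {
                return $langSec[1];
            }
            // Requested language missing from this run: leave the run untouched
            return $matches[0];
        }, $str);
}

// $str is the input text, e.g. the paragraph from the question
echo transLang($str, 'French'), "\n", transLang($str, 'English');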

If not, it's a bit more involved:

class LangHelper
{

    private $lang;

    function __construct($lang)
    {
        $this->lang = $lang;
    }

    public function callback($matches)
    {
        $lang = $this->lang;

        preg_match ( "/<$lang>(.*?)<\/$lang>/i", $matches [0], $subMatches );

        return $subMatches [1];
    }

}

function transLang($str, $lang, $defLangs = 'French|English')
{
    $langHelper = new LangHelper ( $lang );

    return preg_replace_callback ( "@(<($defLangs)>.*?</\\2>)+@i", 
            array (
                    $langHelper,
                    'callback' 
            ), $str );
}

echo transLang ( $str, 'French' ), "\n", transLang ( $str, 'English' );


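【Solution 2】:

If I understand correctly, you want to remove all the "language" tags but keep the content of the requested language.

The DOM is a tree of nodes. Tags are element nodes, and text is stored in text nodes. XPath lets you select nodes with expressions. So fetch all child nodes of the language elements you want to keep and copy them in front of their language node, then remove all language nodes. This works even if the language elements contain other element nodes.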

function replaceLanguageTags($fragment, $language) {
  $dom = new DOMDocument();
  $dom->loadXml(
    '<?xml version="1.0" encoding="UTF-8" ?><content>'.$fragment.'</content>'
  );
  // get an xpath object
  $xpath = new DOMXpath($dom);

  // fetch all nodes with the language you like to keep
  $nodes = $xpath->evaluate('//'.$language);
  foreach ($nodes as $node) {
    // copy all child nodes to just before the found node
    foreach ($node->childNodes as $childNode) {
      $node->parentNode->insertBefore($childNode->cloneNode(TRUE), $node);
    }
    // remove the found node
    $node->parentNode->removeChild($node);
  }

  // select all language nodes
  $tags = array('English', 'French');
  $nodes = $xpath->evaluate('//'.implode('|//', $tags));
  foreach ($nodes as $node) {
    // remove them
    $node->parentNode->removeChild($node);
  }

  $result = '';
  // we do not need the root node, so save all its children
  foreach ($dom->documentElement->childNodes as $node) {
    $result .= $dom->saveXml($node);
  }
  return $result;
}

$xml = <<<'XML'
The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow
because it makes him <French>Heureux</French><English>Happy</English> to know that it
is the best <French>Endroit</French><English>Place</English> to find good people with
good <French>Réponses</French><English>Answers</English>.
XML;

var_dump(replaceLanguageTags($xml, 'English'));
var_dump(replaceLanguageTags($xml, 'French'));

Output:

string(146) "The Cat is very happy to stay on stackoverflow
because it makes him Happy to know that it
is the best Place to find good people with
good Answers."
string(153) "The Chat is very happy to stay on stackoverflow
because it makes him Heureux to know that it
is the best Endroit to find good people with
good Réponses."
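
The claim that nested element nodes survive can be checked with a small sketch. The function below is a hypothetical compact variant written for illustration (the name `keepLanguage`, its `$tags` parameter, and the sample input are assumptions, not part of the answer); it moves the children instead of cloning them, but follows the same idea.

```php
<?php
// Hypothetical compact variant of the DOM approach, for illustration only.
function keepLanguage(string $fragment, string $language, array $tags = ['English', 'French']): string
{
    $dom = new DOMDocument();
    $dom->loadXML('<?xml version="1.0" encoding="UTF-8" ?><content>' . $fragment . '</content>');
    $xpath = new DOMXPath($dom);

    foreach ($tags as $tag) {
        // Copy the node list into a plain array before modifying the tree
        foreach (iterator_to_array($xpath->query('//' . $tag)) as $node) {
            if ($tag === $language) {
                // Move every child (text and nested elements) in front of the wrapper
                while ($node->firstChild) {
                    $node->parentNode->insertBefore($node->firstChild, $node);
                }
            }
            // Remove the language element itself (now empty, or an unwanted language)
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize the children of the artificial root element
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveXML($child);
    }
    return $result;
}

$demo = keepLanguage('A <French><b>Chat</b> noir</French><English><b>Cat</b></English>.', 'French');
echo $demo, "\n"; // A <b>Chat</b> noir.
```

The nested `<b>` element is kept because whole child nodes are relocated, not just their text.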


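【Solution 3】:

Which version of PHP are you using? I don't know what else could be different, but I copied and pasted your code and got the following output: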

SimpleXMLElement Object
(
    [0] => Service DNS
    [1] => DNS Gratuit
)

To be sure, here is the code I copied from above:

<?php

function b($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return $langxmlobj -> $lang;
        } else {
            return $params['data'];
        }
    }
}

$params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
$params['lang'] = 'French';
$a = b($params);
print_r($a);
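
One likely reason the two outputs differ is how the result is printed: `print_r` lists every matching element, while using the same object as a string yields only the first one. The following is a minimal sketch of that behaviour, assuming the SimpleXML extension is available; the variable names are illustrative.

```php
<?php
$xml = '<root><French>Service DNS</French><English>DNS Service</English> - '
     . '<French>DNS Gratuit</French><English>Free DNS</English></root>';
$root = simplexml_load_string($xml);

// Casting the element list to a string yields only the first <French> element
$first = (string) $root->French;

// Iterating over the same property visits every <French> element
$all = [];
foreach ($root->French as $element) {
    $all[] = (string) $element;
}

echo $first, "\n";               // Service DNS
echo implode(' | ', $all), "\n"; // Service DNS | DNS Gratuit
```

Note that the " - " separator sits between the elements, so it is not part of either element's string and is not recovered this way.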

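【Comments】:

No, that isn't any clearer. With your new string, your code produces a SimpleXMLElement object ( [0] => Chat [1] => Heureux [2] => Endroit [3] => Reponses ). Maybe you need a function other than print_r, but it's unclear what you are trying to achieve or what your current result is. If you want to output the paragraph from your question, don't use print_r; do this instead: The $a[0] is very happy to stay on stackoverflow because it makes him $a[1] to know that it is the best $a[2] to find good people with good $a[3].
No. I want the function to return the text translated into the right language; I don't want to go through arrays and indexes. Please ignore the final 'print_r' call; print $a should print the translated text.
As far as I know, there is no simple way to do what you want in PHP other than going through an array.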

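【Solution 4】:

Here is my suggestion. It should be fast and simple. You only need to strip the tags of the requested language, then remove every other tag along with its content.

The downside is that if you want to use any tags other than language tags, you have to make sure their opening tag differs from their closing tag; otherwise the final regular expression, which removes every element whose closing tag matches its opening tag, strips them as well. On the upside, this lets you add as many languages as you like without keeping a list of them. You only need to know the default language for when the requested one is missing (or just throw and catch an exception).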

function only_lang($lang, $text) {
    static $in_fallback = false;

    $result = str_replace("<$lang>", '', $text, $num_matches_open);
    $result = str_replace("</$lang>", '', $result, $num_matches_close);

    // Check if the text is malformed. Good place to throw an error
    if ($num_matches_open != $num_matches_close) {
        //throw new Exception('Opening and closing tags do not match', 1);

        return $text;
    }

    // Check if this language is present at all.
    // Otherwise fall back to the default language or throw an error
    if (!$num_matches_open) {
        //throw new Exception('No such language', 2);

        // Prevent infinite recursion if even the default language is missing
        if ($in_fallback) return $text;
        $in_fallback = true;
        $fallback = only_lang('English', $text);
        $in_fallback = false; // reset so later calls can fall back again
        return $fallback;
    }

    // Strip every other language element and return the result.
    // Lazy .*? plus the s modifier: elements sharing a line are removed one by one
    // instead of swallowing the text between them.
    return preg_replace('!<([^>]+)>.*?</\\1>!s', '', $result);
}
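
The quantifier in that final pattern matters when two language pairs share a line. A small sketch comparing a greedy `.*` with a lazy `.*?` on the one-line input from the question, after the `<French>` tags have already been stripped (the variable names are illustrative):

```php
<?php
// One-line input from the question, with the <French> tags already removed
$text = 'Service DNS<English>DNS Service</English> - DNS Gratuit<English>Free DNS</English>';

// Greedy: .* runs on to the last </English> on the line
$greedy = preg_replace('!<([^>]+)>.*</\\1>!', '', $text);

// Lazy: .*? stops at the first matching closing tag
$lazy = preg_replace('!<([^>]+)>.*?</\\1>!', '', $text);

var_dump($greedy); // string(11) "Service DNS"
var_dump($lazy);   // string(25) "Service DNS - DNS Gratuit"
```

With the greedy form, the text between the two `<English>` elements is removed along with them, which reproduces the truncated output the question complains about.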


【Solution 5】:

I've got a simple one using a regular expression. It is useful if the input contains only <lang>...</lang> tags.

function to_lang($lang="", $str="") {
  return strip_tags(preg_replace('~<(\w+(?<!'.$lang.'))>.*</\1>~Us',"",$str));
}

echo to_lang("English","The happy <French>Chat</French><English>Cat</English>");

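It removes every <tag>...</tag> that is not the one specified in $lang. If <tag-name> may contain spaces or special characters, e.g. <French-1>, replace \w with [^/>].

Explanation of the search pattern:

1.) <(\w+(?<!'.$lang.'))

< followed by one or more word characters that must not end with $lang (checked with a negative lookbehind); the tag name is captured.

2.) .* followed by anything (ungreedy because of modifier U; the dot also matches newlines because of modifier s)

3.) </\1> until the captured tag is closed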
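
The `[^/>]` variant mentioned above can be sketched as follows. The function name `to_lang_any` is hypothetical, and the `preg_quote` call is an addition that is not in the answer; it is there so that characters such as `-` in the language name are treated literally.

```php
<?php
// Hypothetical variant of to_lang(); the name and the preg_quote call are additions
function to_lang_any($lang, $str) {
    $keep = preg_quote($lang, '~');
    // [^/>]+ accepts tag names such as French-1;
    // the negative lookbehind skips the tag whose name ends with $lang
    $pattern = '~<([^/>]+(?<!' . $keep . '))>.*</\1>~Us';
    // Remove the other languages, then strip the remaining (kept) tags
    return strip_tags(preg_replace($pattern, '', $str));
}

$out = to_lang_any('English-1', 'The happy <French-1>Chat</French-1><English-1>Cat</English-1>');
echo $out, "\n"; // The happy Cat
```

Because the lookbehind only checks how the tag name ends, a tag such as `<OldEnglish>` would also be kept when `$lang` is `English`; that is acceptable for the closed set of language tags used here.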
