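I decided to write a simple parser to see how it would turn out. Since what I parse is not valid XML, I'll call it XMLIsh from now on.
The parser actually works quite well, and its performance is decent too: in the tests I ran on valid XML documents, it was only about 10 times slower than SimpleXMLElement, even though SimpleXMLElement is built into PHP and my function is plain PHP. As mentioned, this parser also works on "XMLIsh" documents. So as long as you don't need it to be extremely fast, this may be a valid solution.
In my case the documents are only parsed once in a while because the output is cached, so I think this will work for me.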
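To illustrate why caching makes the parser's speed a minor concern, here is a minimal sketch of a file-based cache. Everything in it is an assumption made for illustration: the helper name `cachedParse`, the `.cache` file suffix and the choice of `md5()` as the cache key are not taken from my actual setup. The parser is passed in as a callable, so the sketch doesn't depend on any particular parser.

```php
<?php
// Hypothetical helper, for illustration only.
// $parser is any callable that turns the source string into an array tree.
function cachedParse(string $source, callable $parser, string $cacheDir): array
{
    // Key the cache on the document contents, so an edited document is parsed again.
    $cacheFile = $cacheDir . DIRECTORY_SEPARATOR . md5($source) . '.cache';

    if (is_file($cacheFile)) {
        // allowed_classes => false: the cache should only ever contain arrays and strings.
        $cached = unserialize(file_get_contents($cacheFile), array('allowed_classes' => false));
        if (is_array($cached)) {
            return $cached; // Cache hit: the slow parser is skipped entirely.
        }
    }

    // Cache miss: parse once and store the result for later requests.
    $tree = $parser($source);
    file_put_contents($cacheFile, serialize($tree), LOCK_EX);
    return $tree;
}
```

With a wrapper like this, the parser would only run the first time a given document is seen, for example `cachedParse($template, function ($s) { return parseXMLish($s, 'mvc'); }, sys_get_temp_dir())`.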
Anyway, here is the code for the parser:
/**
 * Parses a string as an XMLIsh document. An XMLIsh document is very similar to XML, but only tags in one namespace are parsed.
 *
 * parseXMLish walks through the document and builds a tree while doing so.
 * Each element is represented as an array with the following content:
 * - index 0: an array whose first element (index 0) is the type (tag name) of the element. All following entries are its arguments, stored as name => value.
 * - index 1 (optional): an array with the content of this element. If the content is a plain string, this array has a single element, namely that string.
 * Text outside the namespaced tags is stored as plain strings between the elements.
 *
 * @param string &$string The XMLIsh string to be parsed. It is passed by reference and trimmed in place.
 * @param string $namespace The namespace which should be parsed.
 * @param int &$offset The starting point of parsing. Default = 0
 * @param string $openingTag The current opening tag. Don't set this manually; the function needs it to check whether a closing tag is valid.
 * @return array The parsed tree.
 * @throws Exception When a closing tag doesn't match the current opening tag.
 */
function parseXMLish(&$string, $namespace, &$offset = 0, $openingTag = "") {
    // Whitespace doesn't matter, so trim it.
    $string = trim($string);
    $result = array();
    // We need to find the elements in our namespace (e.g. mvc). These elements use XML syntax.
    // Opening, closing and self-closing tags are all found by this pattern.
    // preg_quote() keeps special characters in the namespace from being read as regex syntax.
    $tagPattern = '/<(\/)?' . preg_quote($namespace, '/') . ':(\w*)(.*?)(\/)?>/';
    while (preg_match($tagPattern, $string, $matches, PREG_OFFSET_CAPTURE, $offset)) {
        // Before our first mvc element, other text might have been found (e.g. HTML code).
        // This should be added to our result array first. Again, strip the whitespace.
        $preText = substr($string, $offset, $matches[0][1] - $offset);
        $trimmedPreText = trim($preText);
        // Compare against '' instead of using empty(), because empty("0") is true and would drop the text "0".
        if ($trimmedPreText !== '') {
            $result[] = $trimmedPreText;
        }
        // We could have found two types of tags: closing tags and opening (including self-closing) tags.
        // We need to distinguish between those two.
        if ($matches[1][0] == '') {
            // This is an opening tag, so we add a new element to the result array.
            // The name of the tag goes into the element first.
            $result[][0][0] = $matches[2][0];
            // Tags can also have arguments. We find them here and store them in the result array.
            // An argument looks like name="value" or name='value'; the value may contain spaces or "=".
            preg_match_all('/(\w+)=(["\'])(.*?)\2/', $matches[0][0], $arguments, PREG_SET_ORDER);
            foreach ($arguments as $argument) {
                // $argument[1] is the name, $argument[3] is the value without its quotes.
                $result[count($result) - 1][0][$argument[1]] = $argument[3];
            }
            // We need to recalculate our offset: move it just past the tag we matched.
            $offset += strlen($preText) + strlen($matches[0][0]);
            // Now we have to fill our element with content.
            // This is only necessary for a regular opening tag, not for a self-closing tag.
            // Reset $content on every pass, so a self-closing tag never inherits the content of an earlier sibling.
            $content = array();
            if (!(isset($matches[4]) && $matches[4][0] == "/")) {
                $content = parseXMLish($string, $namespace, $offset, $matches[2][0]);
            }
            // Only add content when there is any.
            if (!empty($content)) {
                $result[count($result) - 1][] = $content;
            }
        } else {
            // This is a closing tag. We only have to update the offset and then go one level up,
            // that is: return what we have so far to the previous level.
            // Note: the closing tag closes the element of the previous level, not of the current level.
            if ($matches[2][0] != $openingTag) {
                throw new Exception("Closing tag doesn't match the opening tag. Opening tag: $openingTag. Closing tag: {$matches[2][0]}");
            }
            $offset += strlen($preText) + strlen($matches[0][0]);
            return $result;
        }
    }
    // If we have any text left after our last element, we should add that to the array too.
    // Trim it, just like the text found before a tag.
    $postText = trim(substr($string, $offset));
    if ($postText !== '') {
        $result[] = $postText;
    }
    // We're done!
    return $result;
}
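To make the array layout from the docblock more concrete, here is a small sketch that walks such a tree and prints one line per node. The helper `describeXMLishTree` is hypothetical and not part of the parser. The `$tree` below is written out by hand, following the docblock's description, as the tree I would expect for the document in the comment when parsed with the namespace `mvc`; it is not captured parser output.

```php
<?php
// Hypothetical helper, for illustration only: turns a parsed tree
// into indented lines so the nesting is easy to see.
function describeXMLishTree(array $nodes, int $depth = 0): array
{
    $lines = array();
    $indent = str_repeat('  ', $depth);
    foreach ($nodes as $node) {
        if (!is_array($node)) {
            // Plain text (for example HTML) found between the namespaced tags.
            $lines[] = $indent . 'text: ' . $node;
            continue;
        }
        // Index 0 holds the tag name at key 0, followed by name => value arguments.
        $header = $node[0];
        $name = $header[0];
        unset($header[0]);
        $arguments = array();
        foreach ($header as $key => $value) {
            $arguments[] = $key . '=' . $value;
        }
        $lines[] = rtrim($indent . 'element: ' . $name . ' ' . implode(' ', $arguments));
        // Index 1 is optional and holds the child nodes.
        if (isset($node[1])) {
            $lines = array_merge($lines, describeXMLishTree($node[1], $depth + 1));
        }
    }
    return $lines;
}

// Hand-written tree for:
// <p>Hello</p><mvc:list id="menu"><mvc:item href="/home">Home</mvc:item><mvc:sep/></mvc:list>
$tree = array(
    '<p>Hello</p>',
    array(
        array('list', 'id' => 'menu'),
        array(
            array(array('item', 'href' => '/home'), array('Home')),
            array(array('sep')),
        ),
    ),
);

echo implode("\n", describeXMLishTree($tree)), "\n";
// Prints:
// text: <p>Hello</p>
// element: list id=menu
//   element: item href=/home
//     text: Home
//   element: sep
```

Note that the self-closing `mvc:sep` element has no index 1 at all, while `mvc:item` carries its text as a one-element content array.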