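I haven't tested this code, but I think this non-regex idea might work better for you. Essentially, you split the string on spaces and then parse each piece. Because every piece is classified by its first character, the order of the parts doesn't matter.
It's a little tricky because the content and the project name can span multiple pieces, but I think my code should handle that. It also assumes that each tweet has only one hashtag, user, project and priority. If there can be multiple hashtags, for example, just store them in an array instead of a string. Finally, it doesn't have any error handling to detect or prevent weird input.
Here's my untested code: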
$data = array(
    'hash' => '',
    'user' => '',
    'priority' => '',
    'project' => '',
    'content' => ''
);
$parsingProjectName = false;
foreach(explode(' ', $tweet) as $piece)
{
    switch(substr($piece, 0, 1))
    {
        case '#':
            $data['hash'] = substr($piece, 1);
            break;
        case '@':
            $data['user'] = substr($piece, 1);
            break;
        case '!':
            $data['priority'] = substr($piece, 1);
            break;
        case '[':
            // Check if the project name is longer than 1 word
            // (substr with -1 returns the last character of the piece)
            if(substr($piece, -1) == ']')
            {
                $data['project'] = substr($piece, 1, -1);
            }
            else
            {
                // There will be more to parse in the next piece(s)
                $parsingProjectName = true;
                $data['project'] = substr($piece, 1) . ' ';
            }
            break;
        default:
            if($parsingProjectName)
            {
                // Are we at the end yet?
                if(substr($piece, -1) == ']')
                {
                    // Yes we are; keep the whole word, drop only the closing ']'
                    $data['project'] .= substr($piece, 0, -1);
                    $parsingProjectName = false;
                }
                else
                {
                    // Nope, there is more; keep the whole word
                    $data['project'] .= $piece . ' ';
                }
            }
            else
            {
                // We aren't in the middle of parsing the project name, and this piece doesn't start with one of the special chars, so assume it is content
                $data['content'] .= $piece . ' ';
            }
    }
}
// There may be an extra space on the end; remove it
$data['content'] = rtrim($data['content']);
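
If you do need more than one hashtag per tweet, the array variation I mentioned could look roughly like this. It's only a sketch: the function name `collectHashtags` and the sample tweet are made up for illustration, and it handles hashtags only, so you'd fold the same idea into the `'#'` case of the switch above.

```php
<?php
// Hypothetical helper (not part of the code above): collects every
// '#'-prefixed piece of a tweet into an array instead of one string.
function collectHashtags($tweet)
{
    $hashes = array();
    foreach (explode(' ', $tweet) as $piece)
    {
        // Require at least one character after the '#'
        if (strlen($piece) > 1 && substr($piece, 0, 1) == '#')
        {
            // Append rather than overwrite, so earlier tags are kept
            $hashes[] = substr($piece, 1);
        }
    }
    return $hashes;
}

print_r(collectHashtags('#bug #urgent @alice fix the login page'));
```

For that sample tweet the function returns `array('bug', 'urgent')`. The same append-instead-of-assign change would work for users or priorities if you need several of those too.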