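Yes, strtok is hopelessly broken, even in a simple single-threaded program, and I will demonstrate the failure with some sample code:
Let us begin with a simple text-analyzer function that uses strtok to gather statistics about sentences of text.
Used the way it is later in this answer, this code leads to undefined behavior.
In this example, a sentence is a set of words separated by spaces, commas, semicolons, and periods.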
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Example:
// int words, longest;
// GetSentenceStats("There were a king with a large jaw and a queen with a plain face, on the throne of England.", &words, &longest);
// will report there are 20 words, and the longest word has 7 characters ("England").
void GetSentenceStats(const char* sentence, int* pWordCount, int* pMaxWordLen)
{
    const char* delims = " ,;."; // In a sentence, words are separated by spaces, commas, semicolons or periods.
    char* input = strdup(sentence); // Make a local copy of the sentence, to be modified without affecting the caller.
    *pWordCount = 0; // Initialize the outputs to zero
    *pMaxWordLen = 0;
    char* word = strtok(input, delims);
    while (word)
    {
        (*pWordCount)++;
        *pMaxWordLen = MAX(*pMaxWordLen, (int)strlen(word));
        word = strtok(NULL, delims);
    }
    free(input);
}
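This simple function works. No bugs so far.
Now let's extend our library with a function that gathers statistics on paragraphs of text.
A paragraph is a set of sentences separated by exclamation marks, question marks, and periods.
It will return the number of sentences in the paragraph and the number of words in the longest sentence.
And, perhaps most importantly, it will use the earlier function GetSentenceStats to help.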
void GetParagraphStats(const char* paragraph, int* pSentenceCount, int* pMaxWords)
{
    const char* delims = ".!?"; // Sentences in a paragraph are separated by periods, question marks, and exclamation marks.
    char* input = strdup(paragraph); // Make a local copy of the paragraph, to be modified without affecting the caller.
    *pSentenceCount = 0;
    *pMaxWords = 0;
    char* sentence = strtok(input, delims);
    while (sentence)
    {
        (*pSentenceCount)++;
        int wordCount;
        int longestWord;
        GetSentenceStats(sentence, &wordCount, &longestWord); // Runs its own strtok loop, replacing strtok's hidden state.
        *pMaxWords = MAX(*pMaxWords, wordCount);
        sentence = strtok(NULL, delims); // BUG: this continues from GetSentenceStats' freed buffer and returns garbage data.
    }
    free(input);
}
This function also looks very simple and straightforward.
But it does not work, as this sample program demonstrates.
int main(void)
{
    int cnt;
    int len;
    // First demonstrate that the GetSentenceStats function works properly:
    const char* sentence = "There were a king with a large jaw and a queen with a plain face, on the throne of England.";
    GetSentenceStats(sentence, &cnt, &len);
    printf("Word Count: %d\nLongest Word: %d\n", cnt, len);
    // Correct Answer:
    // Word Count: 20
    // Longest Word: 7 ("England")
    printf("\n\nAt this point, expected output is 20/7.\nEverything is working fine\n\n");
    char paragraph[] = "It was the best of times!" // Literary purists will note I have changed Dickens's original text to make a better example
                       "It was the worst of times?"
                       "It was the age of wisdom."
                       "It was the age of foolishness."
                       "We were all going direct to Heaven!";
    int sentenceCount;
    int maxWords;
    GetParagraphStats(paragraph, &sentenceCount, &maxWords);
    printf("Sentence Count: %d\nLongest Sentence: %d\n", sentenceCount, maxWords);
    // Correct Answer:
    // Sentence Count: 5
    // Longest Sentence: 7 ("We were all going direct to Heaven")
    printf("\n\nAt the end, expected output is 5/7.\nBut Actual Output is Undefined Behavior! Strtok is hopelessly broken\n");
    getchar(); // Wait for Enter before exiting (standard replacement for the non-portable _getch).
    return 0;
}
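Every call to strtok is entirely correct, and each loop works on its own separate data.
But the result is undefined behavior!
Why does this happen?
When GetParagraphStats is called, it begins a strtok loop to get the sentences.
On the first sentence, it calls GetSentenceStats. GetSentenceStats also runs a strtok loop, which overwrites all the state established by GetParagraphStats.
When GetSentenceStats returns, the caller (GetParagraphStats) calls strtok(NULL) again to get the next sentence.
But strtok treats this as a call to continue the previous operation, and it continues tokenizing memory that has now been freed!
The result is the dreaded undefined behavior.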
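The shared hidden state can be observed without straying into undefined behavior. The following is a minimal sketch, not part of the code above: the helper name ContinueAfterNestedStrtok is hypothetical, and it assumes both buffers are writable and stay alive for the whole call, so the outcome is well defined.

```c
#include <assert.h>
#include <string.h>

/* Starts tokenizing `outer`, lets an unrelated strtok call run on `inner`,
   then asks strtok to "continue". Returns whatever that continuation yields.
   Hypothetical helper for illustration only. */
const char* ContinueAfterNestedStrtok(char* outer, char* inner)
{
    assert(outer != NULL && inner != NULL);
    strtok(outer, ",");        /* strtok now remembers a position inside `outer` */
    strtok(inner, " ");        /* that position is replaced by one inside `inner` */
    return strtok(NULL, ",");  /* resumes in `inner`, not in `outer` */
}
```

With outer holding "a,b,c" and inner holding "x y", the function returns "y" rather than "b". In the paragraph example the inner buffer has already been freed by that point, which is what turns the mix-up into undefined behavior.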
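When is it safe to use strtok?
Even in a single-threaded environment, strtok can only be used safely when the programmer/architect is certain of two conditions: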
1. The function using strtok never calls any function that may also use strtok; otherwise its own tokenizing is interrupted, as happened to GetParagraphStats above.
2. The function using strtok is never called by a function that is in the middle of its own strtok loop; otherwise it interrupts its caller, as GetSentenceStats did above.
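In a multi-threaded environment, using strtok is even less practical, because the programmer must ensure not only that there is a single use of strtok on the current thread, but also that no other thread is using strtok at the same time.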
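One common way around both problems, on systems that provide it, is strtok_r, which keeps its position in a caller-supplied pointer instead of hidden shared storage. The sketch below assumes a POSIX environment (strtok_r is POSIX, not ISO C), and the names GetSentenceStatsR and GetParagraphStatsR are hypothetical variants of the functions above, not part of the original code.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX system providing strtok_r and strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts words and the longest word length; tokenizer state is local to the call. */
void GetSentenceStatsR(const char* sentence, int* pWordCount, int* pMaxWordLen)
{
    assert(sentence != NULL && pWordCount != NULL && pMaxWordLen != NULL);
    const char* delims = " ,;.";
    char* state = NULL;              /* position for this loop only */
    *pWordCount = 0;
    *pMaxWordLen = 0;
    char* input = strdup(sentence);  /* private, writable copy */
    if (input == NULL)
        return;                      /* out of memory: report zero words */
    for (char* word = strtok_r(input, delims, &state); word != NULL;
         word = strtok_r(NULL, delims, &state))
    {
        int len = (int)strlen(word);
        (*pWordCount)++;
        if (len > *pMaxWordLen)
            *pMaxWordLen = len;
    }
    free(input);
}

/* Counts sentences and the word count of the longest one. The nested call
   cannot disturb this loop, because each loop owns its `state` pointer. */
void GetParagraphStatsR(const char* paragraph, int* pSentenceCount, int* pMaxWords)
{
    assert(paragraph != NULL && pSentenceCount != NULL && pMaxWords != NULL);
    const char* delims = ".!?";
    char* state = NULL;
    *pSentenceCount = 0;
    *pMaxWords = 0;
    char* input = strdup(paragraph);
    if (input == NULL)
        return;
    for (char* s = strtok_r(input, delims, &state); s != NULL;
         s = strtok_r(NULL, delims, &state))
    {
        int wordCount;
        int longestWord;
        (*pSentenceCount)++;
        GetSentenceStatsR(s, &wordCount, &longestWord);
        if (wordCount > *pMaxWords)
            *pMaxWords = wordCount;
    }
    free(input);
}
```

Called on the Dickens paragraph from the sample program, GetParagraphStatsR reports 5 sentences and a longest sentence of 7 words. Because no state is shared between calls, the same design also avoids the cross-thread interference described above, provided each thread works on its own buffers.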