String not properly being emptied and assigned when dealing with strcpy(string, "") - Answers

【Question title】: String not properly being emptied and assigned when dealing with strcpy(string, "")
【Posted】: 2018-01-22 19:51:47
【Question description】:

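EDIT: I did try changing the line arr_of_strings[arr_index_count] = first_word; to strcpy(arr_of_strings[arr_index_count], first_word); but that gives a segmentation fault after printing Word is: This

EDIT 2: I am trying to do this without strtok, since I think it will be a good way to learn about C strings.

I'm trying to teach myself C. I decided to write a function that takes a string and places each word of the string into its own element of an array. Here is my code:

Assume #define MAX_LENGTH 80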

// char *string_one[unknown_size];

// first_word will represent each word in the sentence
char first_word[MAX_LENGTH + 1] = "";

// this is the array I will store each word in
char *arr_of_strings[MAX_LENGTH];

int index_count = 0;
int arr_index_count = 0;

char sentence[] = "This is a sentence.";

for (int i = 0; i<MAX_LENGTH; i++) {
    printf("Dealing with char: %c\n", sentence[i]); 

    if (sentence[i] == '\0') {
        // end of sentence
        break;
    } else if (sentence[i] ==  ' ') {
        // this signifies the end of a word
        printf("Word is: %s\n", first_word);
        arr_of_strings[arr_index_count] = first_word;
        // after putting the word in the string, make the word empty again
        strcpy(first_word, "");
        // verify that it is empty
        printf("First word is now: %s\n", first_word);

        index_count = 0;
        arr_index_count++;
    } else {
        // not the start of a new string... so keep appending the letter to first_word
        printf("Letter to put in first_word is: %c\n", sentence[i]);
        first_word[index_count] = sentence[i];
        index_count++;
    }
}

printf("-----------------\n");
for (int j = 0; j<=arr_index_count; j++) {
    printf("%s\n", arr_of_strings[j]);
}

What gets printed is:

Dealing with char: T
Letter to put in first_word is: T
Dealing with char: h
Letter to put in first_word is: h
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: This
First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis
First word is now: 
Dealing with char: a
Letter to put in first_word is: a
Dealing with char:  
Word is: asis
First word is now: 
Dealing with char: s
Letter to put in first_word is: s
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: t
Letter to put in first_word is: t
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: c
Letter to put in first_word is: c
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: .
Letter to put in first_word is: .
Dealing with char: 
-----------------
sentence.
sentence.
sentence.

If we look here:

First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis

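How come, when the word is empty and we put i and s into it, the word is now isis? (The same goes for asis.)
And how come the word sentence. is printed 3 times? My algorithm is clearly flawed, but if anything, shouldn't sentence. be printed 4 times (once for each word in the sentence: This is a sentence.)?

Also, I'm just learning C, so if there are any other ways to improve the algorithm, please let me know.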

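【Question discussion】:

arr_of_strings is an array of char pointers, and you point all of them at the same char array, first_word.
...exactly. And you never write a null terminator, so "This" gets overwritten by "is" -> "isis", and so on.
I recommend learning from a book; trial and error does not work well for C.
@user2719875 With strcpy(first_word, "") you write only a single zero, at first_word[0]. When you do the "manual copy", you do not write a zero.
The "manual copy" is first_word[index_count] = sentence[i];. (But the worst problem is still the single char array.)

Tags: c string function c-strings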

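【Solution 1】:

arr_of_strings is just an array of char pointers, and you point every element at the same array, first_word. In addition, you never write the null terminator that C strings require.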

Here is an approach that may help you; it uses strtok:

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    char * pch;
    // Splits 'data' to tokens, separated by a whitespace
    pch = strtok (data," ");
    while (pch != NULL)
    {
        // Copy a word to the correct row of 'matrix'
        strcpy(matrix[counter++], pch);
        //printf ("%s\n",pch);
        pch = strtok (NULL, " ");
    }
    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "New to the C programming language";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

New
to
the
C
programming
language

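【Discussion】:

Ah, I was actually trying to do this without strtok, thinking it would be a good way to get the hang of C strings. Sorry, I should have mentioned that in the post. I will still go over your strtok code, since I will probably learn from it too, so thanks in advance.

【Solution 2】: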

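1) This happens because you do not add a '\0' to the end of the word before printing it. After your program reaches the first space, first_word looks like {'T', 'h', 'i', 's', '\0', '\0', ...} and prints fine. Calling strcpy(first_word, "") changes it to {'\0', 'h', 'i', 's', '\0', ...}, and reading in the next word, "is", then overwrites the first two characters, giving {'i', 's', 'i', 's', '\0', ...}. So first_word is now the string "isis", as the output shows. This can be fixed by simply adding first_word[index_count] = '\0' before printing the string.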

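2.1) The array contains the same string at every index because arr_of_strings is an array of char pointers that all end up pointing at the same buffer, first_word, which holds the last word of the sentence by the time the loop ends. There are a couple of ways to fix this. One is to make arr_of_strings a two-dimensional array, such as char arr_of_strings[MAX_STRINGS][MAX_LENGTH], and then add each string to it with strcpy(arr_of_strings[arr_index_count], first_word).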

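2.2) Finally, "sentence." is printed only three times because you check only for a space to mark the end of a word. "sentence." is followed by the null terminator '\0', so it is never added to the array of words, and the output has no "Word is: sentence." line either.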

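【Discussion】:

Thanks for the explanation. When you say "by adding first_word[index_count] = '\0' before printing the string", do you mean before this line: printf("First word is now: %s\n", first_word);? So suppose first_word is currently This and we reach ' '; wouldn't that just change the ' ' to \0? Then the next line makes it \0his\0\0\0\0. Then with 'a' it becomes a\0s\0\0, and the printed word is a? EDIT: I did add the line you mentioned, and now it prints Word is: ThisWord is: isisWord is: as
OK, regarding 2.1): why is char arr_of_strings[MAX_STRINGS][MAX_LENGTH] needed, and why doesn't char *arr_of_strings[MAX_LENGTH] (the original way) work? As I understand it, each element points to a string, right? So doesn't strcpy(arr_of_strings[arr_index_count], first_word) make the element of char *arr_of_strings[MAX_LENGTH] point to the contents of the string first_word?
Oh, and lastly, how is char arr_of_strings[MAX_STRINGS][MAX_LENGTH] read/written? Is it "an array of MAX_STRINGS elements, where each element is a char, and each char has a maximum length of MAX_LENGTH"? So [ [char with length MAX_LENGTH], [char with length MAX_LENGTH] etc.]? In that case, shouldn't the inner array be char *? So something like char *arr_of_strings[MAX_STRINGS][MAX_LENGTH]?

【Solution 3】: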

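Building on my strtok-free answer, I wrote some code that uses an array of N char pointers instead of a hard-coded 2D matrix.

char matrix[N][LEN] is a 2D array that can store up to N strings, where each string can hold at most LEN - 1 characters plus the null terminator. char *ptr_arr[N] is an array of N char pointers. So it can also store up to N strings, but the length of each string is not fixed by the declaration.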

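The pointer approach lets us allocate exactly as much memory as each string needs, which saves some space. With a hard-coded 2D array, you use the same amount of memory for every string; so if you assume that a string can have length 20, you reserve a block of size 20 regardless of the string you store, which may be much shorter than 20 or, even worse, much longer. In the latter case you need to truncate the string, or, if the code is not written carefully, you invoke Undefined Behavior by going out of bounds of the array that stores the string.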

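With the pointer approach we don't need to worry about that and can allocate as much space as we need for each string, but, as always, there is a trade-off. We get to save some space, but we need to allocate memory dynamically (and deallocate it when we are done; C has no garbage collector, unlike Java, for example). Dynamic allocation is a powerful tool, but it costs more development time.

So, in my example, we will follow the same logic as before (regarding how to find the words in the string, etc.), but we will be careful about how we store the words in the matrix.

Once a word is found and stored in the temporary array word, we can find its exact length with strlen(). We will dynamically allocate exactly as much space as the word's length, plus 1 for the null terminator, which all C strings should have (since <string.h> relies on it to find the end of a string).

So, to store the first word, "Alexander", we need to do:

ptr_arr[0] = malloc(sizeof(char) * (9 + 1));

where 9 is the result of strlen("Alexander"). Notice that we ask for a memory block whose size equals the size of char times 10. The size of char is 1, so in this case it changes nothing, but in general you should include it (since you may want other data types, or even structs, etc.).

We make the first pointer of the array point to the memory block we just allocated dynamically. This block now belongs to us, so we are allowed to store data in it (in our case, the word). We do that with strcpy().

Then we move on to print the words.

At this point we are done; in Python, for example, you would have finished writing your program. But since we allocated memory dynamically, we need to free() it! This is a common mistake: forgetting to free the memory you asked for.

We do that by freeing every pointer that points to memory returned by malloc(). So if we called malloc() 10 times, free() should be called 10 times as well; otherwise, memory is leaked.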

Without further ado, here is the full code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define N 100

int fill(char* ptr_arr[N], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word, assuming max length will be 50
    char word[50];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data'
    while(*data != '\0')
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
            // copy that character to word, and increment 'i'. Move to the next character of 'data'.
            word[i++] = *data++;
        // Null-terminate 'word'. 'i' is already at the value we desire, from the line above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Dynamically allocate space for a word of length `strlen(word)`
        // plus 1 for the null terminator. Assign that memory chunk to the
        // pointer positioned at `ptr_arr[counter]`.
        ptr_arr[counter] = malloc(sizeof(char) * (strlen(word) + 1));
        // Now, `ptr_arr[counter]` points to a memory block, that will
        // store the current word.

        // Copy the word to the counter-th row of the ptr_arr, and increment the counter
        strcpy(ptr_arr[counter++], word);
    }

    return counter;
}

void print(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

void free_matrix(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       free(matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store each word of 'data' in 'matrix', an array of 'N' char pointers
    char *matrix[N];
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    free_matrix(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great

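【Discussion】:

【Solution 4】:

I am trying to do this without strtok, since I think it will be a good way to learn about C strings.

Yes, that's the spirit!

I already explained some problems with your code in my previous answer, so now I will post a strtok-free solution, which will definitely help you understand what is going on with strings. Basic pointer arithmetic will be used.

Pro-tip: take a piece of paper, draw the arrays (data and matrix), keep track of the values of their counters, and run the program on that paper.

Code: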

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word
    char word[LEN];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data'
    while(*data != '\0')
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
            // copy that character to word, and increment 'i'. Move to the next character of 'data'.
            word[i++] = *data++;
        // Null-terminate 'word'. 'i' is already at the value we desire, from the line above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Copy the word to the counter-th row of the matrix, and increment the counter
        strcpy(matrix[counter++], word);
    }

    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great

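The gist of the code lies in the function fill(), which takes data and:

1) Finds a word.
2) Stores that word, character by character, in an array called word.
3) Copies that word to matrix.

The tricky part is finding the word. You need to iterate over the string and stop when you meet a whitespace, which indicates that every character read in that iteration is actually a letter of the word.

However, you need to be careful when searching for the last word of the string, since when you get to that point you will not meet a whitespace. For that reason, you should also watch for reaching the end of the string; in other words, the null terminator.

When you do, copy the last word into the matrix and you are done, but make sure to update the pointer correctly (this is where the paper idea I gave you will help your understanding).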

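【Discussion】:

OK, thanks. Going over this now. How is char matrix[N][LEN] read/written? Is it "an array of N chars, where each char points to another array of LEN chars"? So it would be written [ [char array of length LEN], [char array of length LEN], [char array of length LEN], ... N ]? If so, isn't that the same as char *matrix[N] (which is "an array of N strings", i.e. [ [string], [string], [string], .. N])?
char matrix[N][LEN] is a 2D array that can store up to N strings, where each string can have a maximum length of LEN. char *matrix[N] is an array of N char pointers. So it can store up to N strings, but the length of each string is not defined. Hope that helps @user2719875, you're welcome! =) Would you like me to modify the example to use char *matrix[N]?
Yes, please! Could you also keep your current example that uses char matrix[N][LEN]? I think it will be helpful to future readers as well (I come from a Python background, so dealing with strings is really different). (My question about "so it can store up to N strings, but the length of each string is not defined" is: if anything, shouldn't that be a good thing? If we don't know how long each word is... i.e. if the user enters a sentence with a word longer than 20, then our code with char *matrix[N] would still work, rather than cutting each word down to length 20.)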