Read a text file into an array in plain C: answers

【Question title】: Read a text file into an array in plain C
【Posted】: 2011-12-27 18:09:06
【Question description】:

Is there a way to read a text file into a one-dimensional array in plain C? Here's what I tried (I'm writing hangman):

int main() {
    printf("Welcome to hangman!");

    char buffer[81];
    FILE *dictionary;
    int random_num;
    int i;
    char word_array[80368];

    srand ( time(NULL) );

    random_num = rand() % 80368 + 1;
    dictionary = fopen("dictionary.txt", "r");

    while (fgets(buffer, 80, dictionary) != NULL){
        printf(buffer); //just to make sure the code worked;
        for (i = 1; i < 80368; i++) {
            word_array[i] = *buffer;
        }
    }

    printf("%s, \n", word_array[random_num]);
    return 0;
}

What's wrong here?

【Question discussion】:

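word_array should be an array of char *, not of char. The buffer also needs to be dynamically allocated: every fgets overwrites your single buffer, so all the word_array assignments would end up the same. You should look up some example C text-processing code.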
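Following that advice, a minimal sketch under stated assumptions: word_array is an array of char *, and each line is copied into its own heap allocation so the next fgets cannot overwrite it. The helpers copy_string and read_words are hypothetical names; copy_string stands in for strdup, which is POSIX and only became standard C in C23.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable stand-in for strdup. Returns NULL if malloc fails. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;            /* +1 for the terminating '\0' */
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Read up to `max` lines from `in` into `word_array`. Each slot gets its own
 * heap copy, so reusing `buffer` for the next line cannot change it.
 * Lines longer than 80 characters come back from fgets in pieces and are
 * stored as separate entries. Returns the number of words stored. */
size_t read_words(FILE *in, char **word_array, size_t max)
{
    char buffer[81];
    size_t n = 0;

    while (n < max && fgets(buffer, sizeof buffer, in) != NULL) {
        buffer[strcspn(buffer, "\r\n")] = '\0';   /* drop the line ending */
        word_array[n] = copy_string(buffer);
        if (word_array[n] == NULL)
            break;                                 /* out of memory */
        n++;
    }
    return n;
}
```

In the question's program this would be called as read_words(dictionary, word_array, 80368) with char *word_array[80368], after checking that fopen did not return NULL; each stored word is later released with free.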

Tags: c file gcc io stdio

【Solution 1】:

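Try changing a few things.

First, you're storing a single character. word_array[i] = *buffer; copies one character (the first character of the line in buffer) into each and every single-character slot of word_array.

Second, your array holds 80K characters, not 80K words. Even if 80K is the length of your dictionary file in characters, that loop can't put the whole file in there.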
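The first point can be checked on a small scale. The sketch below reproduces the question's inner loop in a hypothetical function, fill_like_question, over a 5-slot array instead of 80368 slots: *buffer is simply buffer[0], so every slot from index 1 onward receives the same letter, and slot 0 is never written because the loop starts at 1.

```c
#include <stddef.h>

/* Same shape as the question's inner loop, but with a caller-chosen size.
 * `*buffer` means `buffer[0]`: one char, not the whole word. */
void fill_like_question(char *slots, size_t nslots, const char *buffer)
{
    for (size_t i = 1; i < nslots; i++) {
        slots[i] = *buffer;      /* identical to slots[i] = buffer[0] */
    }
}
```

Since the question runs this once per line, word_array ends up holding only the first letter of the last line read, repeated.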

I'm assuming your dictionary file has 80,368 words. That's about 400,000 words fewer than /usr/share/dict/words on my workstation, but it sounds reasonable for hangman…

If you deliberately want a one-dimensional array for some reason, you'll have to do one of three things:

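Assume you're on a mainframe, and use 80 characters for every word:

  static char word_array[80368 * 80];  /* about 6.4 MB: static, because that is too big for most stacks */

  memcpy (&(word_array[80 * i]), buffer, 80);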

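Create a parallel array of indices pointing to the start of each line in one huge buffer:

   int last_char = 0;
   int word_start[80368];                /* byte offsets, so int rather than char* */
   static char word_array[80368 * 80];
   for ( … i++ ) {
       size_t len = strlen(buffer) + 1;  /* +1 also copies the terminating '\0' */
       memcpy (&word_array[last_char], buffer, len);
       word_start[i] = last_char;
       last_char += len;
   }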

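Switch to using an array of pointers to char, one word per slot:

  char* word_array[80368];

  for (int i = 0; i < 80368; i++) {
       if (fgets (buffer, 80, dictionary) == NULL)
           break;                        /* the file had fewer lines */
       word_array[i] = strdup (buffer);  /* strdup is POSIX; standard C only since C23 */
  }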
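The second option (one packed buffer plus an index of where each word starts) is the easiest to get subtly wrong, so here is a self-contained sketch of it. To keep it runnable on its own, it takes the words from an array of strings rather than from fgets; the function name pack_words and its parameters are hypothetical. Each word is stored with its terminating '\0', so &pool[word_start[k]] is directly usable as a C string.

```c
#include <stddef.h>
#include <string.h>

/* Pack NUL-terminated words back to back in `pool` and record the byte
 * offset where each one starts in `word_start`. Stops early rather than
 * overflowing either array. Returns the number of words stored. */
size_t pack_words(const char *const *words, size_t nwords,
                  char *pool, size_t pool_size,
                  size_t *word_start, size_t max_words)
{
    size_t last_char = 0, count = 0;

    for (size_t i = 0; i < nwords && count < max_words; i++) {
        size_t len = strlen(words[i]) + 1;      /* include the '\0' */
        if (last_char + len > pool_size)
            break;                              /* pool is full */
        memcpy(&pool[last_char], words[i], len);
        word_start[count++] = last_char;
        last_char += len;
    }
    return count;
}
```

With "ant", "bee", "cat" the offsets come out as 0, 4 and 8, because each 3-letter word takes 4 bytes including its terminator. Unlike fixed 80-byte slots, the pool only needs the total length of the words.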

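I recommend the last one; otherwise you either have to guess the maximum size or waste a lot of RAM while reading. (If your average word is about 4-5 characters long, as in English, you waste about 75 bytes per word on average.)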
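Worked through with the numbers from this thread (80,368 words, 80-byte slots, and a 4-letter word plus its terminating \0), the fixed-width layout costs:

```latex
\begin{aligned}
\text{fixed slots: } & 80368 \times 80 = 6\,429\,440 \text{ bytes} \\
\text{unused bytes in a slot holding a 4-letter word: } & 80 - (4 + 1) = 75 \\
\text{unused bytes if every word were that short: } & 80368 \times 75 = 6\,027\,600
\end{aligned}
```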

I'd also recommend allocating word_array dynamically:

   int max_word = 80368;
   char** word_array = malloc (max_word * sizeof (char*));

...which lets you read more safely if the size of your dictionary changes:

   int i = 0;
   while (1) {
        /* If we've filled the preset word list size, increase it. */
        if ( i >= max_word ) {
            max_word *= 1.2; /* tunable arbitrary value */
            word_array = realloc (word_array, max_word * sizeof(char*));
        }
        /* Try to read a line, and… */
        char* e = fgets (buffer, 80, dictionary);
        if (NULL == e) { /* end of file */
            /* free any unused space */
            word_array = realloc (word_array, i * sizeof(char*));
            /* exit the otherwise-infinite loop */
            break;
        } else {
            /* remove any \r and/or \n end-of-line chars */
            for (char *s = &(buffer[0]); s < &(buffer[80]); ++s) {
               if ('\r' == *s || '\n' == *s || '\0' == *s) {
                  *s = '\0'; break;
               }
            }
            /* store a copy of the word, only, and increment the counter.
             * Note that `strdup` will only copy up to the end-of-string \0,
             * so you will only allocate enough memory for actual word
             * lengths, terminal \0's, and the array of pointers itself. */
            *(word_array + i++) = strdup (buffer);
        }
    }
    /* when we reach here, word_array holds exactly i words */
    int pick = rand () % i;  /* i, not max_word: only i slots are filled */
    printf ("random word #%d: %s\n", pick, *(word_array + pick));

Sorry, I posted this in a hurry, so I haven't tested the above. Caveat emptor.
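For comparison, here is a tested sketch of the same growable approach packaged as functions. It makes a few swaps that are my own choices rather than the answer's: it doubles the capacity instead of multiplying by 1.2, it grows through a temporary pointer so a failed realloc does not lose the words already read, and it uses malloc plus memcpy instead of strdup. The names load_all_words, pick_random and INITIAL_CAP are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_CAP 4   /* arbitrary starting size; doubled whenever it fills */

/* Read every line of `in` into a growing array of heap strings.
 * On return *count holds the number of words; the result is NULL if
 * nothing could be read. The caller frees each word, then the array. */
char **load_all_words(FILE *in, size_t *count)
{
    size_t cap = INITIAL_CAP, n = 0;
    char **words = malloc(cap * sizeof *words);
    char buffer[81];

    *count = 0;
    if (words == NULL)
        return NULL;

    while (fgets(buffer, sizeof buffer, in) != NULL) {
        if (n == cap) {
            /* Grow through a temporary: if realloc fails, `words` is
             * still valid and we keep what has been read so far. */
            char **bigger = realloc(words, 2 * cap * sizeof *words);
            if (bigger == NULL)
                break;
            words = bigger;
            cap *= 2;
        }
        buffer[strcspn(buffer, "\r\n")] = '\0';   /* drop the line ending */
        size_t len = strlen(buffer) + 1;
        words[n] = malloc(len);
        if (words[n] == NULL)
            break;
        memcpy(words[n], buffer, len);
        n++;
    }

    if (n == 0) {                     /* empty file or early failure */
        free(words);
        return NULL;
    }
    /* Trim the unused slots; keep the bigger block if shrinking fails. */
    char **fitted = realloc(words, n * sizeof *words);
    if (fitted != NULL)
        words = fitted;
    *count = n;
    return words;
}

/* Choose among the `count` filled slots only (rand() % count has a
 * slight modulo bias, which is fine for hangman). */
char *pick_random(char **words, size_t count)
{
    assert(count > 0);
    return words[(size_t)rand() % count];
}
```

Typical use: call srand(time(NULL)) once at startup, then words = load_all_words(dictionary, &count) and pick_random(words, count). One caveat for a dictionary of this size: RAND_MAX is only guaranteed to be at least 32767, so on platforms where it is that small, rand() % 80368 can never reach words past index 32767.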

【Discussion】:

【Solution 2】:

This part is wrong:

while (fgets(buffer, 80, dictionary) != NULL){
    printf(buffer); //just to make sure the code worked;
    for (i = 1; i < 80368; i++) {
        word_array[i] = *buffer;
    }
}

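For every line, the inner loop runs 80,367 times and each time copies the same character, *buffer (that is, buffer[0]), even though buffer only holds 81 characters. Change it to:

int j;
i = 0;
while (fgets(buffer, 80, dictionary) != NULL){
    printf("%s", buffer); //just to make sure the code worked; "%s" keeps a '%' in a word harmless
    for (j = 0; j < 80 && i < 80368; j++) {  // stop before running off the end of word_array
        word_array[i++] = buffer[j];
    }
}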

【Discussion】:

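FYI, you missed the semicolon on line 1 ;-)
@AhmadGaffoor It isn't for anything; I just couldn't remember whether fgets guarantees a 0-terminated string and got paranoid.
printf("%s, \n", word_array[random_num]);
That line doesn't print the corresponding array value. It prints (null). What am I doing wrong? Is it the identifier?
...this copies 80 characters from buffer (needed or not) into a contiguous part of word_array... but his random-pick code will choose a character offset to jump to in the array…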
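Both of the last two comments come down to the same thing. word_array[random_num] is a single char, but %s expects a char * pointing at the first character of a string, so that printf call is undefined behavior; the (null) output is plausibly glibc printing a pointer argument that happened to be read as NULL. (Separately, rand() % 80368 + 1 can be 80368, one past the last valid index.) In Solution 2's layout every line occupies exactly 80 bytes, so a random word is found by choosing a line number and scaling it by 80, then passing that pointer to %s. A minimal sketch, where random_line and LINE_CHUNK are hypothetical names and used is the final value of i from that loop:

```c
#include <stddef.h>
#include <stdlib.h>

#define LINE_CHUNK 80   /* Solution 2's loop copies exactly 80 bytes per line */

/* Return a pointer to the start of a randomly chosen line in Solution 2's
 * flat array, or NULL if it holds no complete line. Choose a line number,
 * not a character offset, then scale it to a byte offset. Seed rand()
 * once with srand() before calling. */
const char *random_line(const char *word_array, size_t used)
{
    size_t lines = used / LINE_CHUNK;
    if (lines == 0)
        return NULL;
    return &word_array[LINE_CHUNK * ((size_t)rand() % lines)];
}
```

The result can then be printed with printf("%s", w) after checking w != NULL; each chunk still ends in the '\n' and '\0' that fgets stored.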