How to load multiple "clones" of a structure from a FILE in C?

【Question title】: How to load multiple "clones" of structure from FILE? C
【Posted】: 2018-09-05 16:59:37
【Question】:

I'd like to learn how to load multiple structures (many students: name, surname, index, address...) from a text file that looks like this:

Achilles, 9999
Hector, 9998
Menelaos, 9997
... and so on

The structure could be:

struct student_t {
    char *name;
    int index;
};

My attempt (it doesn't work; I'm not even sure fgets+sscanf is a reasonable choice):

int numStudents=3; //to simplify... I'd need a function to count num of lines, I imagine
int x, y=1000, err_code=1;

FILE *pfile = fopen("file.txt", "r");
if(pfile==0) {return 2;}

STUDENT* students = malloc(numStudents * sizeof *students);

char buffer[1024];
char *ptr[numStudents];
for (x = 0; x < numStudents; x++){ //loop for each student
    students[x].name=malloc(100); //allocation of each *name field 
    fgets(buffer, 100, pfile); //reads 1 line containing data of 1 student, to buffer
    if(x==0) *ptr[x] = strtok(buffer, ",");//cuts buffer into tokens: ptr[x] for *name
    else *ptr[x] = strtok(NULL, ","); //cuts next part of buffer
    sscanf(ptr[x], "%19s", students[x].name); //loads the token to struct field
    *ptr[y] = strtok(NULL, ","); //cuts next part of the buffer
    students[y].index = (int)strtol(ptr[y], NULL, 10); //loads int token to struct field
    *buffer='\0';//resets buffer to the beginning for the next line from x++ fgets...
    y++;//the idea with y=1000 is that I need another pointer to each struct field right?
}

for (x = 0; x < numStudents; x++)
    printf("first name: %s, index: %d\n",students[x].name, students[x].index);

return students;

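Then a printf to see what was loaded. (This is a simplification of my real structure, which has 6 fields.) I know a nice way to load 1 student from user input... (How to scanf commas, but with commas not assigned to a structure? C) But to load several, I came up with this idea, and I'm not sure whether it is too clumsy to work or just badly written.

Later I'll try to sort the students by name, and maybe even try a realloc'd buffer that grows as new students are loaded into it... and then sort what has been loaded... But I suppose I first need to load the data from the file into a buffer, and then from the buffer into the structures, before I can sort it?...

Thanks a lot for any help!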

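【Comments】:

Read the documentation of the C I/O functions.
students[x].name=(char*)malloc(sizeof(char*)); allocates only the size of one pointer, which is not enough for a string.
In sscanf, &students[x].name is wrong; it should be students[x].name.
strtok only cuts buffer into tokens; it does not allocate memory.
How else could I name what I want to do, so that I can google it? I spend at least 20 hours on a problem before I ask a question. What I know how to do is: read struct fields from stdin, and read from a binary file where I know the field lengths, but all of that is for 1 student.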
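
To make the points in these comments concrete, here is a minimal sketch of splitting one line with strtok and copying the name into memory sized for the whole string. The helper name student_from_line and its return convention are assumptions made for illustration; they are not part of the question or the answers.

```c
#include <stdlib.h>
#include <string.h>

struct student_t {
    char *name;
    int index;
};

/* Hypothetical helper: splits a line such as "Achilles, 9999\n" in place.
   Returns 0 on success, -1 on a malformed line or a failed allocation. */
int student_from_line(char *line, struct student_t *out)
{
    /* strtok only cuts the buffer into tokens; it allocates nothing,
       so plain char * variables are enough to hold the results. */
    char *name_tok = strtok(line, ",");
    char *index_tok = strtok(NULL, ",\n");
    if (name_tok == NULL || index_tok == NULL)
        return -1;

    /* Room for every character of the name plus the terminating '\0',
       not just sizeof(char *). */
    out->name = malloc(strlen(name_tok) + 1);
    if (out->name == NULL)
        return -1;
    strcpy(out->name, name_tok);

    /* strtol skips the blank that follows the comma.
       No error checking here to keep the sketch short. */
    out->index = (int)strtol(index_tok, NULL, 10);
    return 0;
}
```

Inside the reading loop this would be called once per fgets result, for example student_from_line(buffer, &students[x]), and each name would later be released with free.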

Tags: c file structure

【Solution 1】:

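C is a bit harsh. Below I use GNU getline, which may not be portable, so you may end up implementing it yourself. For simplicity I use stdin as the input FILE *.
The program reads the list of students into the students array. It then sorts the students by index and then by name, printing the list each time.
Your code is a bit disorganized. Try writing a separate function that loads a single student. You don't need char ptr[students]; a single char *ptr is enough for strtok. strtok is a bit confusing, so I prefer to just call strchr several times. I use memcpy to copy the name out of the line, remembering to null-terminate it.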
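
The suggestion to move the per-student work into its own function could be sketched roughly as follows. The name load_student and the 0 / -1 return convention are assumptions for illustration; the program below keeps everything inside main instead.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct student_s {
    char *name;
    int index;
};

/* Hypothetical helper: fills *out from one line such as "Hector, 9998\n".
   Returns 0 on success and -1 on a malformed line or a failed allocation.
   On success the caller owns out->name and must free() it. */
int load_student(const char *line, struct student_s *out)
{
    /* the name is everything before the first comma */
    const char *comma = strchr(line, ',');
    if (comma == NULL)
        return -1;

    /* convert the text after the comma; strtol skips leading blanks */
    errno = 0;
    char *end;
    long value = strtol(comma + 1, &end, 10);
    if (end == comma + 1 || errno != 0 || value < INT_MIN || value > INT_MAX)
        return -1;
    /* nothing but the end of the line may follow the number */
    if (*end != '\n' && *end != '\0')
        return -1;

    /* copy the name and null-terminate it */
    size_t namelen = (size_t)(comma - line);
    char *name = malloc(namelen + 1);
    if (name == NULL)
        return -1;
    memcpy(name, line, namelen);
    name[namelen] = '\0';

    out->name = name;
    out->index = (int)value;
    return 0;
}
```

With a helper like this, the reading loop shrinks to growing the array and calling load_student(line, &students[students_cnt]) for each line.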

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

struct student_s {
    char *name;
    int index;
};

static int students_name_cmp(const void *a, const void *b)
{
    const struct student_s *s1 = a;
    const struct student_s *s2 = b;
    return strcmp(s1->name, s2->name);
}

static int students_index_cmp(const void *a, const void *b)
{
    const struct student_s *s1 = a;
    const struct student_s *s2 = b;
    // compare without subtracting, so extreme values cannot overflow int
    return (s1->index > s2->index) - (s1->index < s2->index);
}

int main()
{
    struct student_s *students = NULL;
    size_t students_cnt = 0;
    FILE *fp = stdin;
    ssize_t read; // getline returns ssize_t, -1 on end of input or error
    char *line = NULL;
    size_t len = 0;

    // for each line
    while ((read = getline(&line, &len, fp)) != -1) {

        // resize students!
        students = realloc(students, (students_cnt + 1) * sizeof(*students));
        // handle errors
        if (students == NULL) {
            fprintf(stderr, "ERROR allocating students!\n");
            exit(-1);
        }

        // find the comma in the line
        const char * const commapos = strchr(line, ',');
        if (commapos == NULL) {
            fprintf(stderr, "ERROR file is badly formatted!\n");
            exit(-1);
        }
        // the name runs from the start of the line to the comma; namelen also counts the null terminator
        const size_t namelen = (commapos - line) + 1;
        // allocate memory for the name, copy it and null-terminate it
        students[students_cnt].name = malloc(namelen * sizeof(char));
        // handle errors
        if (students[students_cnt].name == NULL) {
             fprintf(stderr, "ERROR allocating students name!\n");
             exit(-1);
        }
        memcpy(students[students_cnt].name, line, namelen - 1);
        // the last valid index of the buffer is namelen - 1
        students[students_cnt].name[namelen - 1] = '\0';

        // convert the string after the comma to the number
        // strtol (sadly) discards whitespaces before it, but in this case it's lucky
        // we can start after the comma
        errno = 0;
        char *endptr;
        const long int tmp = strtol(&line[namelen], &endptr, 10);
        // handle strtol errors
        if (errno) {
            fprintf(stderr, "ERROR converting student index into number\n");
            exit(-1);
        }
        // handle out of range values; INT_MIN/INT_MAX are used because index is an int, the right limits depend on the application
        if (tmp < INT_MIN || tmp > INT_MAX) {
            fprintf(stderr, "ERROR index number is out of allowed range\n");
            exit(-1);
        }
        students[students_cnt].index = tmp;

        // handle the case when the line contains anything more than a string and a number
        if (*endptr != '\n' && *endptr != '\0') {
            fprintf(stderr, "ERROR there are some rubbish characters after the index!\n");
            exit(-1);
        }

        // finally, increment students count
        students_cnt++;
    }
    if (line) {
        free(line);
    }

    // sort by index
    qsort(students, students_cnt, sizeof(*students), students_index_cmp);

    // print students out sorted by index
    printf("Students sorted by index:\n");
    for (size_t i = 0; i < students_cnt; ++i) {
        printf("student[%zu] = '%s', %d\n", i, students[i].name, students[i].index);
    }

    // now we have students. We can sort them.
    qsort(students, students_cnt, sizeof(*students), students_name_cmp);

    // print students out sorted by name
    printf("Students sorted by name:\n");
    for (size_t i = 0; i < students_cnt; ++i) {
        printf("student[%zu] = '%s', %d\n", i, students[i].name, students[i].index);
    }

    // free students, lucky them!
    for (size_t i = 0; i < students_cnt; ++i) {
        free(students[i].name);
    }
    free(students);

    return 0;
}

For the following input on stdin:

Achilles, 9999
Hector, 9998
Menelaos, 9997

The program outputs:

Students sorted by index:
student[0] = 'Menelaos', 9997
student[1] = 'Hector', 9998
student[2] = 'Achilles', 9999
Students sorted by name:
student[0] = 'Achilles', 9999
student[1] = 'Hector', 9998
student[2] = 'Menelaos', 9997

A test version is available here on onlinegdb.
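
The program above grows the students array by one element per line. A variant that doubles the capacity whenever the array is full could look like the sketch below. The type student_list, the function student_list_push and the starting capacity of 8 are assumptions for illustration, not part of the answer.

```c
#include <stdlib.h>

struct student_s {
    char *name;
    int index;
};

/* Hypothetical growable array: count slots in use out of capacity. */
struct student_list {
    struct student_s *items;
    size_t count;
    size_t capacity;
};

/* Returns a pointer to a new, uninitialised slot at the end of the list,
   or NULL if the allocation fails (the list is then left unchanged). */
struct student_s *student_list_push(struct student_list *list)
{
    if (list->count == list->capacity) {
        /* start with 8 slots, then double each time the array is full */
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        /* keep the old pointer until realloc is known to have succeeded */
        struct student_s *tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return NULL;
        list->items = tmp;
        list->capacity = new_cap;
    }
    return &list->items[list->count++];
}
```

A list initialised as struct student_list list = {0}; works with this function because realloc(NULL, n) behaves like malloc(n). Assigning the result to a temporary first also avoids losing the old block when realloc fails.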

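【Comments】:

A few notes: (1) realloc(students, (students_cnt + 1).. reallocating by 1 on every iteration is quite inefficient. It is better to allocate an initial block of students (say alloc=2, 4 or 8;), then when student_cnt == alloc, realloc to 2 x alloc and update alloc *= 2; (2) if (commapos == NULL) don't exit, just skip that line. (3) Why memcpy? You could strcpy (students[students_cnt].name, commapos + 1); and then remove the '\n' (either way, it's up to you). (4) Compare &line[namelen] == endptr && tmp == 0 to fully validate the strtol conversion. (Good effort :)
@David Thanks! (1) Good idea for optimization. These days, though, I no longer worry much about malloc overhead. (2) Well, as part of error handling I decided not to skip the line. (3) memcpy is faster: I know the string length, so there is no need to check every character for zero. (4) Thanks! I am quite lost as to when strtol reports an error, so I only check errno and hope for an implementation that sets EINVAL ;) I feel I should use (errno == ERANGE && (tmp == LONG_MAX || tmp == LONG_MIN)) || (errno != 0 && tmp == 0) || endptr == &line[namelen] to fully check for conversion errors ;) But hell, I only check errno!
Remember that strtol needs 2 validations: (1) check whether any digits were converted. If no digits are found, strtol returns 0 and ptr == endptr (and errno is not set). (2) Checking errno covers all the other cases (underflow/overflow, etc.). Whether parsing with strchr and strtol is better or worse than sscanf is a toss-up. Either way the result is the same, but sscanf (into a fixed buf for name) with validation that 2 conversions took place, followed by malloc and strcpy (or strdup), may be a bit shorter :)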
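
The two strtol validations described in this last comment could be wrapped up roughly like this. The helper name parse_int and its parameters are assumptions for illustration; the narrowing to int mirrors the INT_MIN/INT_MAX check in the answer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the start of text to an int.
   Returns 0 on success and stores the value in *out; returns -1 if no
   digits were found or the value does not fit. If rest is not NULL, it
   receives a pointer to the first character after the number. */
int parse_int(const char *text, int *out, char **rest)
{
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);

    /* validation 1: no digits converted, strtol returned 0 and end == text */
    if (end == text)
        return -1;
    /* validation 2: the value overflowed or underflowed long */
    if (errno == ERANGE)
        return -1;
    /* extra step for this program: the field is an int, not a long */
    if (value < INT_MIN || value > INT_MAX)
        return -1;

    *out = (int)value;
    if (rest != NULL)
        *rest = end;
    return 0;
}
```

The caller can then inspect *rest to decide whether trailing characters such as '\n' are acceptable, as the program in the answer does with endptr.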