Use fscanf to read strings and empty lines: answers

【Question title】: Use fscanf to read strings and empty lines
【Posted】: 2018-03-21 06:06:50
【Question description】:

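I have a text file containing keywords and integers, and I have access to the file stream to parse it.

I can parse it with while( fscanf(stream, "%s", word) != -1 ), which gives me every word and int in the file, but my problem is that I cannot detect an empty line "\n", and I need to detect those. I can see that \n is a character and so is not picked up by %s. What can I do to modify the fscanf call so that it also gets the EOL character?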
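To make the behavior described in the question concrete, here is a minimal sketch. The helper name count_words and the 64-byte word buffer are assumptions for illustration, not part of the original post. It shows that "%s" skips all leading whitespace, newlines included, so a blank line never produces a read of its own.

```c
#include <stdio.h>

/* Hypothetical helper: count the tokens that "%s" yields from fp.
 * "%s" skips ALL leading whitespace, newlines included, so blank
 * lines never show up as separate reads. */
size_t count_words (FILE *fp)
{
    char word[64];
    size_t n = 0;

    /* bounded read; compare against 1 (one conversion) rather than -1 */
    while (fscanf (fp, "%63s", word) == 1)
        n++;

    return n;
}
```

For an input of "key 1", a blank line, then "other 2", this counts four tokens and gives no hint that a blank line was crossed.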

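【Question comments】:

%s scans characters until it hits whitespace, whether that is a space, a tab, a newline, etc. On an empty line I would expect word to be "empty" (containing only a null terminator), unless it simply skips the newline and reads the next word in the file. You may have to resort to %c and read the file character by character instead of word by word. This is an area where C++ is generally better, since it has facilities for reading a file line by line.
In C, you can read line by line with fgets() or fscanf("%[^\n]\n"). See Traverse FILE line by line using fscanf
So there is no easy way to do what I want?
@penu Using fgets() is the easiest way.
@RemyLebeau fscanf(f, "%[^\n]\n", word ) is not sufficient to read a line that holds only a '\n'. The '\n' will remain in f and word will be left unchanged.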
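The last comment can be checked with a small sketch. The helper name read_line_scanset and the 63-character field width are assumptions made for this illustration; the caller is assumed to pass a buffer of at least 64 bytes.

```c
#include <stdio.h>

/* Hypothetical helper: try to read one line with "%[^\n]\n".
 * buf must have room for at least 64 bytes.
 * Returns fscanf's result: 1 if a non-empty line was stored in buf,
 * 0 on a matching failure (the next character was '\n'),
 * EOF at end of input. */
int read_line_scanset (FILE *fp, char *buf)
{
    return fscanf (fp, "%63[^\n]\n", buf);
}
```

When the stream starts with a blank line, the call returns 0, buf keeps its previous contents, and the '\n' is still the next character in the stream. A loop that only tests for EOF would therefore spin on that newline forever.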

Tags: c scanf

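【Solution 1】:

You can do what you want with fscanf, but it takes far more checks and validations than using a proper line-oriented input function such as fgets.

Detecting an empty line with fgets (or POSIX getline) takes nothing special beyond reading a normal line. For example, to read a line of text with fgets, you simply provide a buffer of sufficient size and make a single call, which reads up to and including the '\n' into buf: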

while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */

To check whether the line was an empty line, you simply check whether the first character in buf is the '\n' character, e.g.

    if (*buf == '\n')
        /* handle blank line */

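Or, in the normal case, you will remove the trailing '\n' by getting the length and overwriting the '\n' with the nul-terminating character. In that case, you can simply check whether the length is 0 (after the removal), e.g.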

    size_t len = strlen (buf);          /* get buf length */
    if (len && buf[len-1] == '\n')      /* check last char is '\n' */
        buf[--len] = 0;                 /* overwrite with nul-character */

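(note: if the last character is not '\n', either the line was longer than the buffer and characters in the line remain unread, to be read by the next call to fgets, or you have reached the end of the file and the final line has a non-POSIX line ending, i.e. no terminating '\n')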
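The trimming step above can also be written with strcspn, a common alternative idiom rather than the approach this answer uses. The helper name trim_newline is hypothetical; the sketch reports whether a newline was actually present, which distinguishes a complete line from the two cases in the note.

```c
#include <string.h>

/* Hypothetical helper: strip a trailing '\n' from buf in place.
 * Returns 1 if a newline was found and removed (a complete line),
 * 0 if not (line longer than the buffer, or last line without '\n'). */
int trim_newline (char *buf)
{
    size_t len = strcspn (buf, "\n");   /* index of first '\n' or of the nul */
    int had_newline = buf[len] == '\n';

    buf[len] = 0;                       /* harmless if buf[len] is already nul */
    return had_newline;
}
```

After the call, an empty line is simply the case where the function returned 1 and *buf is 0.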

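Putting it all together, an example that uses fgets to identify empty lines, and that still prints the complete line even when it exceeds the buffer length, could look like this: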

#include <stdio.h>
#include <string.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    size_t n = 1;
    char buf[BUFSZ] = "";
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */
        size_t len = strlen (buf);          /* get buf length */
        if (len && buf[len-1] == '\n')      /* check last char is '\n' */
            buf[--len] = 0;                 /* overwrite with nul-character */
        else {   /* line too long or non-POSIX file end, handle as required */
            printf ("line[%2zu] : %s\n", n, buf);
            continue;
        }   /* output line (or "empty" if line was empty) */
        printf ("line[%2zu] : %s\n", n++, len ? buf : "empty");
    }
    if (fp != stdin) fclose (fp);           /* close file if not stdin */

    return 0;
}

Example Input File

$ cat ../dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.

Example Use/Output

$ ./bin/fgetsblankln ../dat/captnjack2.txt
line[ 1] : This is a tale
line[ 2] : empty
line[ 3] : Of Captain Jack Sparrow
line[ 4] : empty
line[ 5] : A Pirate So Brave
line[ 6] : empty
line[ 7] : On the Seven Seas.
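The answer also mentions POSIX getline as an option. A minimal sketch of the same blank-line check with getline might look like the following. It assumes a POSIX system, and the helper name count_blank_lines is hypothetical. Because getline allocates and grows the buffer itself, the line-too-long case does not arise.

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: count blank lines in fp using POSIX getline(). */
size_t count_blank_lines (FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0, blanks = 0;
    ssize_t nchr;

    while ((nchr = getline (&line, &cap, fp)) != -1)
        if (nchr == 1 && line[0] == '\n')   /* only a newline was read */
            blanks++;

    free (line);            /* release the buffer getline allocated */
    return blanks;
}
```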

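So why does everyone recommend fgets?

Well, let's look at doing the same thing with fscanf, and I'll let you be the judge. To begin with, fscanf does not read or include the trailing '\n' with the "%s" format specifier (by default) or when using the character class "%[^\n]" (because '\n' is explicitly excluded). So you cannot read both (1) a line with characters and (2) a line without characters using the same format string. You either read characters and fscanf succeeds, or you read no characters and hit a matching failure.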

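So, as mentioned in the comments, you must pre-check with fgetc (or getc) whether the next character in the input buffer is a '\n' character, and if it is not, put it back into the input buffer with ungetc.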
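The pre-check can be isolated in a small function. This is only a sketch; the name consume_if_newline and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: check whether the next character in fp is '\n'.
 * Returns 1 if a '\n' was read and consumed (blank line),
 *         0 if another character is next (it is pushed back),
 *         EOF at end of input or if ungetc fails. */
int consume_if_newline (FILE *fp)
{
    int c = fgetc (fp);

    if (c == EOF)
        return EOF;
    if (c == '\n')
        return 1;
    return ungetc (c, fp) == EOF ? EOF : 0;
}
```

A return of 0 means the stream is positioned exactly where it was before the call, so a following fscanf with "%[^\n]" is guaranteed at least one character to match.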

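Adding further to your fscanf task, you must independently validate every check, put-back and read at each step along the way. That results in a large number of checks to handle all cases and to provide all the validations needed to avoid undefined behavior.

As part of those checks, you need to limit the number of characters read to one less than the number of characters in the buffer, while also capturing the next character to determine whether the line was too long to fit. Additional checks are needed to handle (without failing) a file with a non-POSIX line ending on the final line, something fgets handles without any problem.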
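The width limit and the captured next character can be seen in isolation in the sketch below. The names LINESZ and read_bounded are hypothetical, and the buffer is deliberately tiny so that truncation is easy to trigger. Note that the field width in the format string is a literal and has to be kept equal to LINESZ - 1 by hand.

```c
#include <stdio.h>

#define LINESZ 8    /* deliberately tiny so truncation is easy to trigger */

/* Hypothetical helper: read one NON-EMPTY line into buf (LINESZ bytes).
 * Returns  1 complete line read (terminated by '\n' or by EOF),
 *          0 line too long for buf (one character past the buffer has
 *            already been consumed into nl, the rest is still unread),
 *        EOF nothing could be read (end of input or an empty line). */
int read_bounded (FILE *fp, char *buf)
{
    char nl = 0;
    /* field width 7 == LINESZ - 1, leaving room for the nul */
    int r = fscanf (fp, "%7[^\n]%c", buf, &nl);

    if (r == 2)
        return nl == '\n';  /* a non-newline here means the line did not fit */
    if (r == 1)
        return 1;           /* text followed directly by EOF */
    return EOF;             /* r is 0 (empty line) or EOF */
}
```

With the input lines "short" and "0123456789", the first call stores "short" and returns 1, and the second stores "0123456" and returns 0 because the character captured after the field is '7' rather than '\n'.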

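Below is an implementation similar to the fgets code above. Go through it and understand why each step is necessary and what each validation protects against. You may be able to rearrange it slightly, but it has been pared down to nearly the bare minimum. After going through it, it should become clear why fgets is the preferred way to handle the empty-line check (and line-oriented input in general).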

#include <stdio.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    int c = 0, r = 0;
    size_t n = 1;
    char buf[BUFSZ] = "", nl = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    for (;;) {  /* loop until EOF */
        if ((c = fgetc (fp)) == '\n')   /* check next char is '\n' */
            *buf = 0;                   /* make buf empty-string */
        else {
            if (c == EOF)               /* check if EOF */
                break;
            if (ungetc (c, fp) == EOF) {    /* ungetc/validate */
                fprintf (stderr, "error: ungetc failed.\n");
                break;
            }
            /* read line into buf and '\n' into nl, handle failure */
            if ((r = fscanf (fp, "%4095[^\n]%c", buf, &nl)) != 2) {
                if (r == EOF) {         /* EOF (input failure) */
                    break;
                } /* check next char, if not EOF, non-POSIX eol */
                else if ((c = fgetc (fp)) != EOF) {
                    if (ungetc (c, fp) == EOF) {    /* unget it */
                        fprintf (stderr, "error: ungetc failed.\n");
                        break;
                    } /* read line again handling non-POSIX eol */
                    if (fscanf (fp, "%4095[^\n]", buf) != 1) {
                        fprintf (stderr, "error: fscanf failed.\n");
                        break;
                    }
                }
            } /* good fscanf, validate nl = '\n' or line too long */
            else if (nl != '\n') {
                fprintf (stderr, "error: line %zu too long.\n", n);
                break;
            }
        } /* output line (or "empty" for empty line) */
        printf ("line[%2zu] : %s\n", n++, *buf ? buf : "empty");
    }

    if (fp != stdin) fclose (fp);     /* close file if not stdin */

    return 0;
}

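The use/output is the same as above. Look things over and let me know if you have any further questions.

【Comments】:

Sure, glad to help. These are the nitty-gritty details of file I/O that always trip up new C programmers. Most tutorials cover only some of the issues involved, and even if you sit down and read man scanf from start to finish, examples that compare and contrast still help.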