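This essentially trivial modification of your code deals with several problems in it:
- You shouldn't use `feof()` like that; `while (!feof(file))` is always wrong.
- You shouldn't read data that was not part of the string just read.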
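To illustrate the first point, here is a minimal sketch contrasting the two loop styles. The function names `count_reads_fgets` and `count_reads_feof` are made up for this illustration, and both assume every line is shorter than the 256-byte buffer, so that one read corresponds to one line.

```c
#include <stdio.h>

/* Correct style: test the return value of fgets() itself.
   The loop stops as soon as a read fails, so every counted
   iteration corresponds to data that was really read. */
static long count_reads_fgets(FILE *fp)
{
    char buf[256];
    long n = 0;

    while (fgets(buf, sizeof(buf), fp) != NULL)
        n++;
    return n;
}

/* Anti-pattern: feof() only becomes true AFTER a read has failed,
   so the body runs one extra time with stale data in buf. */
static long count_reads_feof(FILE *fp)
{
    char buf[256];
    long n = 0;

    while (!feof(fp))
    {
        char *p = fgets(buf, sizeof(buf), fp);
        (void)p;            /* result deliberately ignored: this is the bug */
        n++;
    }
    return n;
}
```

On a file containing exactly two lines, the first function returns 2 and the second returns 3, because `feof()` reports end-of-file only after a read has already failed.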
I've also refactored your code so that the function takes a file name, opens the file, counts, closes it, and reports what it found.
#include <stdio.h>
#include <string.h>

// Revised interface - process a given file name, reporting
static void commentChars(char const *file)
{
    char str[256];
    FILE *fp;
    long commentCount = 0;

    if (!(fp = fopen(file, "r")))
    {
        fprintf(stderr, "Error! File %s not found\n", file);
        return;
    }

    /* Loop on the result of fgets() rather than testing feof() */
    while (fgets(str, sizeof(str), fp) != 0)
    {
        size_t len = strlen(str);
        /* Stop one short of the end so str[i + 1] stays inside the string */
        for (size_t i = 0; i + 1 < len; i++)
        {
            if (str[i] == '/' && str[i + 1] == '/')
            {
                /* Count what follows the two slashes, newline included */
                commentCount += (long)(len - i - 2);
                break;
            }
        }
    }
    fclose(fp);
    printf("%s: Number of characters contained in comments: %ld\n", file, commentCount);
}

int main(int argc, char **argv)
{
    if (argc == 1)
        commentChars("/dev/stdin");
    else
    {
        for (int i = 1; i < argc; i++)
            commentChars(argv[i]);
    }
    return 0;
}
When run on its own source code (ccc.c), it produces:
ccc.c: Number of characters contained in comments: 58
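The comment isn't complete (oops), but it serves to show what is going on. The count includes the newline that `fgets()` keeps as part of the comment, but not the `//` introducer.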
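As a quick check on the figure of 58, assuming the only `//` sequence in ccc.c is the one-line comment above the function: that line is 59 characters long, the newline makes 60, and dropping the two slashes leaves 58. The hypothetical helper below (`line_comment_chars` is not part of the program above) makes the per-line arithmetic explicit.

```c
#include <string.h>

/* Hypothetical helper: number of characters that follow the first
   "//" on one line (newline included if present), or 0 if the line
   has no "//" at all. */
static long line_comment_chars(const char *line)
{
    const char *p = strstr(line, "//");

    return (p != NULL) ? (long)strlen(p + 2) : 0;
}
```

Passing it the comment line from the program, with its trailing newline, gives 58.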
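Dealing with `/*` comments is harder. You need to spot a slash followed by a star, and then read up to the next star-slash character pair. This is probably easier to do with character-by-character input than with line-by-line input. At the very least, you need to be able to interleave character analysis with line input.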
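A character-by-character scan can be sketched as a small state machine. This is only a sketch under stated assumptions: `count_comment_chars` and its state names are made up for illustration, the comment delimiters themselves are not counted, and string literals, character constants, and backslash-newline splices are all ignored, so it is nowhere near a full comment stripper.

```c
#include <stdio.h>

/* Sketch: count characters inside both comment styles, reading one
   character at a time. Delimiters are not counted.
   Assumption: strings, character constants and backslash-newline
   splices are NOT handled. */
static long count_comment_chars(FILE *fp)
{
    enum { CODE, SLASH, LINE, BLOCK, BLOCK_STAR } state = CODE;
    long count = 0;
    int c;

    while ((c = getc(fp)) != EOF)
    {
        switch (state)
        {
        case CODE:                  /* ordinary source text */
            if (c == '/')
                state = SLASH;
            break;
        case SLASH:                 /* seen one slash outside a comment */
            if (c == '/')
                state = LINE;
            else if (c == '*')
                state = BLOCK;
            else
                state = CODE;
            break;
        case LINE:                  /* inside a double-slash comment */
            count++;                /* the newline is counted too */
            if (c == '\n')
                state = CODE;
            break;
        case BLOCK:                 /* inside a slash-star comment */
            if (c == '*')
                state = BLOCK_STAR; /* might be the start of the end marker */
            else
                count++;
            break;
        case BLOCK_STAR:            /* previous character was a star */
            if (c == '/')
                state = CODE;       /* end marker: neither character counted */
            else if (c == '*')
                count++;            /* earlier star was content; this one is pending */
            else
            {
                count += 2;         /* the star and this character were content */
                state = BLOCK;
            }
            break;
        }
    }
    return count;
}
```

The function takes an already opened stream, so it could be called from a wrapper that opens the file by name in the same way as `commentChars()`.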
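When you're ready for it, you can try this torture test on your program. It is what I use to check my comment stripper, SCC (which does not handle trigraphs, by conscious decision; if the source contains trigraphs, I have a trigraph remover that I run on the source first).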
/*
@(#)File: $RCSfile: scc.test,v $
@(#)Version: $Revision: 1.7 $
@(#)Last changed: $Date: 2013/09/09 14:06:33 $
@(#)Purpose: Test file for program SCC
@(#)Author: J Leffler
*/
/*TABSTOP=4*/
// -- C++ comment
/*
Multiline C-style comment
#ifndef lint
static const char sccs[] = "@(#)$Id: scc.test,v 1.7 2013/09/09 14:06:33 jleffler Exp $";
#endif
*/
/*
Multi-line C-style comment
with embedded /* in line %C% which should generate a warning
if scc is run with the -w option
Two comment starts /* embedded /* in line %C% should generate one warning
*/
/* Comment */ Non-comment /* Comment Again */ Non-Comment Again /*
Comment again on the next line */
// A C++ comment with a C-style comment marker /* in the middle
This is plain text under C++ (C99) commenting - but comment body otherwise
// A C++ comment with a C-style comment end marker */ in the middle
The following C-style comment end marker should generate a warning
if scc is run with the -w option
*/
Two of these */ generate */ one warning
It is possible to have both warnings on a single line.
Eg:
*/ /* /* */ */
SCC has been trained to handle 'q' single quotes in most of
the aberrant forms that can be used. '\0', '\\', '\'', '\\
n' (a valid variant on '\n'), because the backslash followed
by newline is elided by the token scanning code in CPP before
any other processing occurs.
This is a legitimate equivalent to '\n' too: '\
\n', again because the backslash/newline processing occurs early.
The non-portable 'ab', '/*', '*/', '//' forms are handled OK too.
The following quote should generate a warning from SCC; a
compiler would not accept it. '
\n'
" */ /* SCC has been trained to know about strings /* */ */"!
"\"Double quotes embedded in strings, \\\" too\'!"
"And \
newlines in them"
"And escaped double quotes at the end of a string\""
aa '\\
n' OK
aa "\""
aa "\
\n"
This is followed by C++/C99 comment number 1.
// C++/C99 comment with \
continuation character \
on three source lines (this should not be seen with the -C flag)
The C++/C99 comment number 1 has finished.
This is followed by C++/C99 comment number 2.
/\
/\
C++/C99 comment (this should not be seen with the -C flag)
The C++/C99 comment number 2 has finished.
This is followed by regular C comment number 1.
/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.
/\
\/ This is not a C++/C99 comment!
This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.
/\
\* This is not a C or C++ comment!
This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.
This is followed by regular C comment number 3.
/\
\
\
\
* C comment */
The regular C comment number 3 has finished.
Note that \u1234 and \U0010FFF0 are legitimate Unicode characters
(officially universal character names) that could appear in an
id\u0065ntifier, a '\u0065' character constant, or in a "char\u0061cter\
string". Since these are mapped long after comments are eliminated,
they cannot affect the interpretation of /* comments */. In particular,
none of \u0002A. \U0000002A, \u002F and \U0000002F ever constitute part
of a comment delimiter ('*' or '/').
More double quoted string stuff:
if (logtable_out)
{
sprintf(logtable_out,
"insert into %s (bld_id, err_operation, err_expected, err_sql_stmt, err_sql_state)"
" values (\"%s\", \"%s\", \"%s\", \"", str_logtable, blade, operation, expected);
/* watch out for embedded double quotes. */
}
/* Non-terminated C-style comment at the end of the file