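Segmentation fault with sscanf on big number - Answers

【Question Title】: Segmentation fault with sscanf on big number
【Posted】: 2017-08-16 20:40:04
【Question】:

I'm new to C and I'm trying to read a large file (>30m lines) line by line and store some of the values from each line into an array. The input file has the format: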

1. inode    100660 uid  66322 gid  66068 bytes       5848 blks        128
2. inode    100662 uid  66492 gid  66076 bytes        159 blks          0
3. inode    100647 uid  66419 gid  66068 bytes        235 blks          0
4. inode 100663302 uid  66199 gid  66068 bytes        131 blks          0
5. inode 100663311 uid  66199 gid  66068 bytes        134 blks          0

This is my code:

void loadArrayFromFile(char * filename) {
    long bytesArray[380000000];
    FILE * myfile;
    myfile = fopen(filename, "r");
    char line[1024];
    char inodeText[10];
    long int inode = 0;
    int mybytes = 0;

    if(myfile == NULL) {
        printf("No file found \n");
        exit(EXIT_FAILURE);
    }

    while(fgets(line, sizeof(line), myfile)) {
        int x = (sscanf(line, "%s %ld %*s %*d %*s %*d %*[bytes] %d %*[^\n]", inodeText, &inode, &mybytes));
        if(x > 1) {
            bytesArray[inode] = mybytes;
        }
    }
}

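This code works fine for the first 3 lines, but when it reaches line 4 I get a Segmentation Fault (core dumped) error. I suspect it has something to do with the inode value being too large to store in an int, even though the maximum value an int can store is 2147483647. Can anyone help me figure out the problem?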

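【Question Discussion】:

The inodeText buffer is only 10 bytes long. Is that enough?
The longest word I can see is bytes, I think.
Where is bytesArray defined?
Are the 1. and 2. etc. in the input example part of the scanned text? If so, the second format specification should be %s.
@dbush bytesArray is defined outside the scope of this method. I just added it to the code block for readability. The line numbers are not part of the file; I only added them to make the lines easier to refer to. inodeText is only used to hold the string "inode", so yes, the buffer is big enough.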
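
On the buffer-size point raised in these comments: a bare %s has no length limit, so the 10-byte buffer is only safe as long as every line really starts with a short word. A minimal sketch of a width-limited variant follows. The helper name parse_line and the use of long for the byte count are assumptions made for illustration, not part of the question's code.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): parse one line of the
 * listing. Returns 1 if the inode number and byte count were both read,
 * 0 otherwise. */
int parse_line(const char *line, long *inode, long *nbytes)
{
    char label[10];   /* holds the leading word, normally "inode" */

    /* %9s stores at most 9 characters plus the terminating '\0',
     * so label cannot overflow even on malformed input. */
    int n = sscanf(line, "%9s %ld %*s %*d %*s %*d %*s %ld",
                   label, inode, nbytes);
    return n == 3;
}
```

If the first word is longer than 9 characters, the leftover characters fail to match %ld, sscanf returns early, and the helper reports failure instead of writing past the buffer.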

Tags: c file scanf

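【Solution 1】:

You're using the inode number as an index into bytesArray. You didn't show how big this array is, but I'm betting it's much smaller than 100663302. So you're writing past the end of the array. This invokes undefined behavior.

Rather than using the inode number as an index, use a struct containing the inode number and the file size, and use an array of those structs along with a count of the elements in the array.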

struct entry {
    long inode;    // long, to match the value read with %ld
    int nbytes;
};

struct entry entryArray[10];   // assuming there are no more than 10 lines in the file
int arrayLen = 0;              // number of entries filled so far

...

while(fgets(line, sizeof(line), myfile)) {
    int x = (sscanf(line, "%s %ld %*s %*d %*s %*d %*[bytes] %d %*[^\n]", inodeText, &inode, &mybytes));
    if(x > 1) {
        // append at the next free slot instead of indexing by inode number
        entryArray[arrayLen].inode = inode;
        entryArray[arrayLen].nbytes = mybytes;
        arrayLen++;
    }
}
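
Since the question mentions a file with more than 30 million lines, a fixed 10-element array will not be enough in practice. The following is a minimal sketch, not the answerer's code, of the same idea with a heap-allocated array that grows on demand. The function name load_entries, the initial capacity of 1024, and the simplified format string are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

struct entry {
    long inode;
    long nbytes;
};

/* Hypothetical helper: reads every matching line from fp into a heap
 * array that doubles in capacity when full. Returns 0 on success and
 * stores the array in *out (caller frees it) and the element count in
 * *count. Returns -1 if memory allocation fails. */
int load_entries(FILE *fp, struct entry **out, size_t *count)
{
    struct entry *arr = NULL;
    size_t len = 0, cap = 0;
    char line[1024];

    while (fgets(line, sizeof line, fp)) {
        long inode, nbytes;

        /* Suppressed fields (%*s, %*d) are parsed but not stored. */
        if (sscanf(line, "%*s %ld %*s %*d %*s %*d %*s %ld",
                   &inode, &nbytes) != 2)
            continue;                       /* skip non-matching lines */

        if (len == cap) {                   /* array full: grow it */
            size_t ncap = cap ? cap * 2 : 1024;
            struct entry *tmp = realloc(arr, ncap * sizeof *tmp);
            if (tmp == NULL) {
                free(arr);
                return -1;
            }
            arr = tmp;
            cap = ncap;
        }
        arr[len].inode = inode;
        arr[len].nbytes = nbytes;
        len++;
    }
    *out = arr;
    *count = len;
    return 0;
}
```

Allocating with realloc also keeps the large array off the stack. Looking up a particular inode then requires a search over the array (or sorting it first), which is the trade-off for not indexing by inode number directly.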

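【Discussion】:

How did I miss that. Such a simple fix. Thanks!
I'm wondering why we can pass more specifiers than variables, and I can't find it documented anywhere. Could you explain in a few lines or post some links?
But in the code, long bytesArray[380000000]; should fit (unless the stack blows up first :))
@FilipKočica You can't: "If there are insufficient arguments for the format, the behavior is undefined." However, * suppresses assignment.
@Jean-François Fabre The inode on line 4 is 100m, but bytesArray was only initialized for 38m values, hence the segmentation fault!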
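
To illustrate the point about * made in the comments above, here is a small sketch showing that a suppressed conversion consumes input but takes no argument and is not counted in the return value. The function name scan_two_of_three is hypothetical and exists only for this demonstration.

```c
#include <stdio.h>

/* Hypothetical demo: the format has three conversion specifications,
 * but the middle one is suppressed with '*', so only two pointer
 * arguments are required. */
int scan_two_of_three(const char *text, int *first, int *third)
{
    /* %*d parses an integer and discards it: nothing is stored,
     * no argument is consumed, and it is not counted in the result. */
    return sscanf(text, "%d %*d %d", first, third);
}
```

Called with the text "10 20 30", this stores 10 and 30 and returns 2, which is why the question's format string can contain more specifiers than there are variables.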