【Question title】: Unfixable memory leak
【Posted】: 2016-10-10 12:33:38
【Question】:

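I apologize for the long code snippet up front, but I've spent a long time searching here, and nothing I've seen so far helps me solve this problem. I've asked on the course forum, gotten help from a TA, and taken advice from friends, but nothing has gotten to the root of my problem.

In this program, I'm using a trie to build a spell checker. There is a lot in my code that needs fixing, but the memory leak is the only thing I really need help with.

The thing is, I'm fairly sure I'm allocating the right amount of space for my nodes, and I think Valgrind confirms this, since only 2 blocks (out of 365,371 allocations) are never freed.

Anyway, I'll post the whole code (in case anyone needs the full context), but I think the relevant parts are the load function and the clear function, where I allocate and free the memory, respectively.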

/**
 * Implements a dictionary's functionality.
*/
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "dictionary.h"

// number of characters we are using (a-z and ')
#define LETTERS 27

// max guaranteed number of nonnegative char values that exist
#define CHARVALUES 128

// create node structure for trie
typedef struct node
{
    struct node *children[LETTERS];
    bool is_word;
}
node;

// create root node for trie
node *root;

// stores the size of our dictionary
unsigned int dict_size = 0;

/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char *word)
{
    // keeps track of where we are; starts with root for each new word
    node *current_node = root;

    while (*word != '\0')
    {

        // indices: 'a' -> 0, ..., 'z' -> 25, '\'' -> 26
        int index = (tolower(*word) - 'a') % CHARVALUES;
        if (index >= LETTERS - 1)
        {
            // by assumption, the char must be '\'' if not '\n' or a letter
            index = LETTERS - 1;
        }

        // if the node we need to go to is NULL, the word is not here
        if (current_node->children[index] == NULL)
        {
            return false;
        }

        // go to the next logical node, and look at the next letter of the word
        current_node = current_node->children[index];
        word++;
    }
    return current_node->is_word;
}

/**
 * Loads dictionary into memory. Returns true if successful else false.
 */
bool load(const char *dictionary)
{

    FILE *inptr = fopen(dictionary, "r");
    if (inptr == NULL)
    {
        return false;
    }

    // allocate memory for the root node
    root = malloc(sizeof(node));

    // store first letter (by assumption, it must be a lowercase letter)
    char letter = fgetc(inptr);

    // stores indices corresponding to letters
    int index = 0;

    /**
     * we can assume that there is at least one word; we will execute the loop
     * and assign letter a new value at the end. at the end of each loop, due
     * to the inside loop, letter will be a newline; we know the EOF in the
     * dictionary follows a newline, so the loop will terminate appropriately
     */
    do
    {
        // keeps track of where we are; starts with root for each new word
        node *current_node = root; 

        // this loop will only execute if our character is a letter or '\''
        while (letter != '\n')
        {
            // indices: 'a' -> 0, ..., 'z' -> 25, '\'' -> 26
            index = (letter - 'a') % CHARVALUES;
            if (index >= LETTERS - 1)
            {
                // by assumption, the char must be '\'' if not '\n' or a letter
                index = LETTERS - 1;
            }

            // allocate memory for a node if we have not done so already
            if (current_node->children[index] == NULL)
            {
                current_node->children[index] = malloc(sizeof(node));

                // if we cannot allocate the memory, unload and return false
                if (current_node->children[index] == NULL)
                {
                    unload();
                    return false;
                }

            }

            // go to the appropriate node for the next letter in our word
            current_node = current_node->children[index];

            // get the next letter
            letter = fgetc(inptr);
        }

        // after each linefeed, our current node represents a dictionary word
        current_node->is_word = true;
        dict_size++;

        // get the next letter
        letter = fgetc(inptr);
    }
    while (letter != EOF);

    fclose(inptr);

    // if we haven't returned false yet, then loading the trie must have worked
    return true;
}

/**
 * Returns number of words in dictionary if loaded else 0 if not yet loaded.
 */
unsigned int size(void)
{
    return dict_size;
}

void clear(node *head)
{
    for (int i = 0; i < LETTERS; i++)
    {
        if (head->children[i] != NULL)
        {
            clear(head->children[i]);
        }
    }
    free(head);
}

/**
 * Unloads dictionary from memory. Returns true if successful else false.
 */
bool unload(void)
{
    clear(root);
    return true;
}

The relevant Valgrind output is as follows:

==18981== HEAP SUMMARY:
==18981==     in use at exit: 448 bytes in 2 blocks
==18981==   total heap usage: 365,371 allocs, 365,369 frees, 81,843,792 bytes allocated
==18981== 
==18981== 448 (224 direct, 224 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
==18981==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981==    by 0x4011B0: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:40)
==18981== 
==18981== LEAK SUMMARY:
==18981==    definitely lost: 224 bytes in 1 blocks
==18981==    indirectly lost: 224 bytes in 1 blocks
==18981==      possibly lost: 0 bytes in 0 blocks
==18981==    still reachable: 0 bytes in 0 blocks
==18981==         suppressed: 0 bytes in 0 blocks
==18981== 1 errors in context 3 of 11:
==18981== 
==18981== 
==18981== Invalid read of size 8
==18981==    at 0x40120C: load (dictionary.c:123)
==18981==    by 0x4008CD: main (speller.c:41)
==18981==  Address 0xb3fde70 is 16 bytes before a block of size 224 alloc'd
==18981==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981==    by 0x4011CB: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:41)
==18981== 
==18981== 
==18981== 1 errors in context 4 of 11:
==18981== Invalid read of size 8
==18981==    at 0x4011E0: load (dictionary.c:114)
==18981==    by 0x4008CD: main (speller.c:41)
==18981==  Address 0xb3fde70 is 16 bytes before a block of size 224 alloc'd
==18981==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981==    by 0x4011CB: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:41)
==18981== 
==18981== 
==18981== 1 errors in context 5 of 11:
==18981== Invalid write of size 8
==18981==    at 0x4011D4: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:41)
==18981==  Address 0xb3fde70 is 16 bytes before a block of size 224 alloc'd
==18981==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981==    by 0x4011CB: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:41)
==18981== 
==18981== 
==18981== 1 errors in context 6 of 11:
==18981== Invalid read of size 8
==18981==    at 0x4011B2: load (dictionary.c:109)
==18981==    by 0x4008CD: main (speller.c:41)
==18981==  Address 0xb3fde70 is 16 bytes before a block of size 224 alloc'd
==18981==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981==    by 0x4011CB: load (dictionary.c:111)
==18981==    by 0x4008CD: main (speller.c:41)

So, my interpretation of this output is that in the following block of code:

        if (current_node->children[index] == NULL)
        {
            current_node->children[index] = malloc(sizeof(node));

            // if we cannot allocate the memory, unload and return false
            if (current_node->children[index] == NULL)
            {
                unload();
                return false;
            }

        }

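the malloc statement (which is in fact line dictionary.c:111) is executed twice, so that the memory allocated the first time is never freed. (Is that correct?) Now, this makes me think the real problem lies in my clear function, i.e. it is poorly written and doesn't clear every node of my trie.

However, I've been staring at the code for hours and I really can't see what's wrong with it. (I'm sure plenty is; I'm just not very good at this.)

Any help with this would be greatly appreciated.

As a side note: several people (not course staff) have told me I should initialize all the pointers in the children array to NULL, but course staff told me directly that this is optional, and I've tested it both ways with the same results. I know it's probably a portability thing even if it technically "works" this way, but please understand that it is not the solution I'm looking for, because I know there is some other root cause (i.e. one that would keep it from working on any device at all...)

Again, if you can help in any way with what's wrong with my logic, I'd really appreciate it. I've been trying to figure this out for hours to no avail.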

【Comments】:

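  • Intentional. As I mentioned, I tested the program with and without pointer initialization (for the root array and all child arrays), with the same results.
  • Actually, we're ranked on run time, so technically, as long as it compiles and "works" without leaking memory, we're encouraged to cut anything unnecessary, so I cut it. I'll admit that's a "wrong" approach to development in general, though.
  • Yes, that is exactly the question, from before I started thinking it was unfixable
  • Get it right before worrying about speed. Remember the basic rules of optimization: (1) Don't do it! (2) (For experts only) Don't do it yet.
  • You have a blatant out-of-bounds read in your code, and your question is about a leak?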

Tags: c memory memory-management memory-leaks trie


【Solution 1】:
root = malloc(sizeof(node));

This gives you a chunk of uninitialized memory.

if (current_node->children[index] == NULL)

This assumes the memory has been initialized, when in fact it contains garbage.

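You need to initialize the contents of root before using it, or use calloc to set them all to zero. The same applies to every child node allocated inside load.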
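
A minimal sketch of the calloc option, assuming the node layout from the question. The helper names new_node and get_or_add_child are hypothetical and do not appear in the original code. calloc zero-fills the block, which on mainstream platforms reads back as NULL pointers and a false flag; if you want to rely only on what the C standard guarantees, assign NULL to each child in a loop instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define LETTERS 27

typedef struct node
{
    struct node *children[LETTERS];
    bool is_word;
}
node;

// hypothetical helper (not in the original code): returns a zero-filled node,
// or NULL if the allocation fails
node *new_node(void)
{
    // calloc zero-fills the block; on mainstream platforms that reads back as
    // NULL child pointers and is_word == false
    return calloc(1, sizeof(node));
}

// hypothetical helper: returns the child at index, creating it on first use;
// returns NULL if the allocation fails
node *get_or_add_child(node *parent, int index)
{
    assert(parent != NULL && index >= 0 && index < LETTERS);
    if (parent->children[index] == NULL)
    {
        parent->children[index] = new_node();
    }
    return parent->children[index];
}
```

With this in place, the `== NULL` test in load is comparing against a value that was actually written, rather than whatever happened to be in the heap.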

【Comments】:

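  • As I said, I've tested the code with and without initializing those pointers to NULL. I've also tested with and without calloc. It doesn't help with anything except portability, and this kind of allocation is actually discouraged, because it isn't necessary and noticeably increases run time. It is not the source of my memory errors (at least on the machine where the code is compiled, which is what matters most), given that I get the exact same errors when I take these precautions.
  • @mattstone The contents of malloc:ed memory can be any garbage. This isn't a portability issue; you always have to initialize it. If your program happens to run without doing so, that is pure (bad) luck.
  • I'm not sure why, but we were told directly that this isn't needed. To be fair, we submit the code online, and it is compiled and tested for correctness elsewhere, so maybe they've dealt with it? I don't know why they do it this way. I agree it's not smart. But the results are consistent. It may also have to do with the fact that I allocate all the memory and then free all of it, without any complicated interaction once the program is running.
  • @mattstone Whoever told you that is incompetent. This is very basic stuff. Formally, it is specified by the C11 standard, 7.22.3.4: "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate."
  • @mattstone Also, you should double-check the value of int index = (tolower(*word) - 'a') % CHARVALUES; to make sure it isn't going out of bounds.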
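
To illustrate the last comment: in C the % operator keeps the sign of its left operand, so a character that sorts before 'a' produces a negative index that the `index >= LETTERS - 1` test never catches. The sketch below assumes an ASCII character set; naive_index mirrors the arithmetic in the question's load, while char_to_index is a hypothetical replacement (not from the original code) that returns -1 for anything it does not recognise.

```c
#include <assert.h>
#include <ctype.h>

#define LETTERS 27
#define CHARVALUES 128

// mirrors the index arithmetic in the question's load(), for comparison only
int naive_index(char c)
{
    int index = (c - 'a') % CHARVALUES;
    if (index >= LETTERS - 1)
    {
        index = LETTERS - 1;
    }
    return index; // negative for '\'' and for uppercase letters in ASCII
}

// hypothetical replacement: 0..25 for letters (case-folded), 26 for '\'',
// -1 for every other character
int char_to_index(char c)
{
    int index = -1;
    // cast first: passing a negative char value to tolower is undefined
    int lower = tolower((unsigned char) c);
    if (c == '\'')
    {
        index = LETTERS - 1;
    }
    else if (lower >= 'a' && lower <= 'z')
    {
        index = lower - 'a';
    }
    // postcondition: either "unsupported" or a valid slot in children[]
    assert(index >= -1 && index < LETTERS);
    return index;
}
```

In ASCII, naive_index('\'') evaluates to -58 and naive_index('A') to -32, so children[index] is read and written well before the start of the array, which is consistent with the invalid reads and writes Valgrind reports inside load.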
【Solution 2】:

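After swapping the two malloc() statements for calloc() (as others have suggested; this eliminated your otherwise numerous valgrind errors) and adding a small sample dictionary and the following minimal main():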

int main() {

    load("dict.txt");

    printf("Checked: %i\n", check("hello"));
    printf("Checked: %i\n", check("sdfsdf"));

    unload();

    return 0;
}

...your code runs much more cleanly, with no memory leaks:

==636== HEAP SUMMARY:
==636==     in use at exit: 0 bytes in 0 blocks
==636==   total heap usage: 15 allocs, 15 frees, 42,688 bytes allocated
==636==
==636== All heap blocks were freed -- no leaks are possible
==636==
==636== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 8)

One obvious leak occurs when you return false from load(): the file pointer is never closed on that path.
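
One way to close that gap is sketched below. This is a rough rewrite under stated assumptions, not a drop-in replacement: the helper fail is hypothetical, characters other than a-z, the apostrophe and newline are treated as a load failure (my choice, not something the original code does), and letter is declared as int so that EOF stays distinguishable from every character value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define LETTERS 27

typedef struct node
{
    struct node *children[LETTERS];
    bool is_word;
}
node;

node *root = NULL;
unsigned int dict_size = 0;

// frees a whole subtree; accepts NULL so a partly built trie is safe to clear
void clear(node *head)
{
    if (head == NULL)
    {
        return;
    }
    for (int i = 0; i < LETTERS; i++)
    {
        clear(head->children[i]);
    }
    free(head);
}

bool unload(void)
{
    clear(root);
    root = NULL;
    dict_size = 0;
    return true;
}

// hypothetical helper: releases the file and the partial trie on a failure path
static bool fail(FILE *inptr)
{
    assert(inptr != NULL);
    fclose(inptr);
    unload();
    return false;
}

bool load(const char *dictionary)
{
    FILE *inptr = fopen(dictionary, "r");
    if (inptr == NULL)
    {
        return false;
    }
    root = calloc(1, sizeof(node));
    if (root == NULL)
    {
        return fail(inptr);
    }

    node *current_node = root;
    int letter; // int rather than char, so EOF stays distinct from every character
    do
    {
        letter = fgetc(inptr);
        if (letter == '\n' || letter == EOF)
        {
            // end of a line: mark the word unless the line was empty or a repeat
            if (current_node != root && !current_node->is_word)
            {
                current_node->is_word = true;
                dict_size++;
            }
            current_node = root;
            continue;
        }
        // 'a'..'z' -> 0..25, '\'' -> 26, anything else -> -1 (assumes ASCII)
        int index = (letter == '\'') ? LETTERS - 1
                  : (letter >= 'a' && letter <= 'z') ? letter - 'a' : -1;
        if (index >= 0 && current_node->children[index] == NULL)
        {
            current_node->children[index] = calloc(1, sizeof(node));
        }
        if (index < 0 || current_node->children[index] == NULL)
        {
            return fail(inptr); // unexpected character or allocation failure
        }
        current_node = current_node->children[index];
    }
    while (letter != EOF);

    fclose(inptr);
    return true;
}
```

Every exit after a successful fopen goes through either fail or the final fclose, so the file handle cannot be left open, and unload resets root so a second call does not free the same nodes twice.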

Edit: Valgrind starts throwing all sorts of errors (again) once you introduce uppercase words into the dictionary. So focus your debugging effort there.
