【Question title】: C - Sorting strings in a file, but with limited memory usage
【Posted】: 2021-02-18 11:16:50
【Question】:

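I am writing a program that generates character arrays and appends them to a file, so the file is effectively an array of character arrays. The generating part works: the file is created and saved correctly. However, one of the tasks says I should sort the records in the file using insertion sort, once with library functions and once with system calls. The sort key is the value of the first byte of each record. In addition, I may hold only two records in memory at a time.

So far I have this code, which generates recordsAmount records of size bufferSize: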

void generate(char *path, int recordsAmount, int bufferSize)
{
    int i, j;

    int start = open(path,O_CREAT| O_TRUNC | O_WRONLY | S_IRUSR | S_IWUSR);
    for(i = 0; i < recordsAmount; i++){
        char buffer[bufferSize];
        for(j = 0; j < bufferSize; j++){
            buffer[j] = 'A' + (rand() % 26);
            write(start, &buffer[j], sizeof(buffer[j]));
        }
        write(start, "\n", 1);
    }
}

Then I have the sort function that uses system calls:

void sysSort(char *path, int recordsAmount, int bufferSize)
{
    int file = open(path,O_RDWR);
    if((file = open(path, O_RDWR)) == NULL){
        printf("Error while opening  a file");
    }
    int i, j;
    for(i = 1; i < recordsAmount; i++)
    {
        unsigned char bufferOne[bufferSize+1];
        unsigned char bufferTwo[bufferSize+1];
        lseek(file, sizeof(bufferOne)*i, SEEK_SET);
        read(file, &bufferOne, sizeof(bufferOne));

        j = i - 1;

        lseek(file, sizeof(bufferTwo)*j, SEEK_SET);
        read(file, &bufferTwo, sizeof(bufferTwo));
        while((unsigned char) bufferTwo[0] > (unsigned char) bufferOne[0]){
            lseek(file, sizeof(bufferTwo)* (j+1), SEEK_SET);
            write(file, &bufferTwo, sizeof(bufferTwo));
            j--;
            lseek(file, sizeof(bufferTwo) * j, SEEK_SET);
            read(file, &bufferTwo, sizeof(bufferTwo));
        }
        lseek(file, sizeof(bufferOne) * (j+1), SEEK_SET);
        write(file, &bufferOne, sizeof(bufferOne));
    }
}

The version that uses library functions is very similar:

void libSort(char *path, int recordsAmount, int bufferSize)
{
    FILE *file = fopen(path, "r+");
    int i, j;
    for(i = 1; i < recordsAmount; i++)
    {
        char bufferOne[bufferSize+1];
        char bufferTwo[bufferSize+1];
        fseek(file, i* sizeof(bufferOne),SEEK_SET);
        fread(bufferOne, sizeof(char),(size_t) bufferSize ,file);
        j = i-1;

        fseek(file, sizeof(bufferTwo), SEEK_SET);
        fread(bufferTwo, sizeof(unsigned char), (size_t) bufferSize, file);
        while((unsigned char) bufferTwo[0] > (unsigned char) bufferOne[0]){
            fseek(file, sizeof(bufferTwo)* (j+1), SEEK_SET);
            fwrite(&bufferTwo, sizeof(unsigned char), (size_t) bufferSize, file);
            j--;
            fseek(file, sizeof(bufferTwo) * i, SEEK_SET);
            fread(&bufferTwo, sizeof(unsigned char), (size_t) bufferSize, file);
        }
        fseek(file, sizeof(bufferOne) * (j+1), SEEK_SET );
        fwrite(&bufferOne, sizeof(unsigned char), (size_t) bufferSize, file);
    }
}

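However, when I use sysSort nothing happens and the file stays unchanged. When I try libSort, I get a segmentation fault. I'm not sure what causes this. Any ideas? Any help would be appreciated.

Edit: following LSerni's suggestion, I ran the sort under valgrind. I tried sorting the file ZZZ, which has 10 lines of 10 characters, with both the system and the library functions. For the system version: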

==803== Memcheck, a memory error detector
==803== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==803== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==803== Command: ./program2 sort ZZZ 10 10 sys
==803==
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Conditional jump or move depends on uninitialised value(s)
==803==    at 0x109DB6: sysSort (in /mnt/c/Users/Czaro/ubuntu/program2)
==803==    by 0x1095A7: main (in /mnt/c/Users/Czaro/ubuntu/program2)
==803==
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall read()
==803== Warning: invalid file descriptor -1 in syscall write()
==803==
==803== HEAP SUMMARY:
==803==     in use at exit: 0 bytes in 0 blocks
==803==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==803==
==803== All heap blocks were freed -- no leaks are possible
==803==
==803== Use --track-origins=yes to see where uninitialised values come from
==803== For lists of detected and suppressed errors, rerun with: -s
==803== ERROR SUMMARY: 9 errors from 1 contexts (suppressed: 0 from 0)

For the library version:

==804== Memcheck, a memory error detector
==804== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==804== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==804== Command: ./program2 sort ZZZ 10 10 lib
==804==
==804== Invalid read of size 4
==804==    at 0x48E24A7: fseek (fseek.c:35)
==804==    by 0x10A014: libSort (in /mnt/c/Users/Czaro/ubuntu/program2)
==804==    by 0x1095F6: main (in /mnt/c/Users/Czaro/ubuntu/program2)
==804==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==804==
==804==
==804== Process terminating with default action of signal 11 (SIGSEGV)
==804==  Access not within mapped region at address 0x0
==804==    at 0x48E24A7: fseek (fseek.c:35)
==804==    by 0x10A014: libSort (in /mnt/c/Users/Czaro/ubuntu/program2)
==804==    by 0x1095F6: main (in /mnt/c/Users/Czaro/ubuntu/program2)
==804==  If you believe this happened as a result of a stack
==804==  overflow in your program's main thread (unlikely but
==804==  possible), you can try to increase the size of the
==804==  main thread stack using the --main-stacksize= flag.
==804==  The main thread stack size used in this run was 8388608.
==804==
==804== HEAP SUMMARY:
==804==     in use at exit: 0 bytes in 0 blocks
==804==   total heap usage: 1 allocs, 1 frees, 472 bytes allocated
==804==
==804== All heap blocks were freed -- no leaks are possible
==804==
==804== For lists of detected and suppressed errors, rerun with: -s
==804== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

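【Comments】:

  • You could run the program under valgrind to check where it does something it shouldn't.
  • I did, and pasted what I got into the post. I had never used valgrind before...
  • if((file = open(path, O_RDWR)) == NULL){ You are confusing file pointers with file descriptors here. open() returns an int, not a pointer.
  • Also, sysSort() opens the file twice (or at least tries to) and closes it zero times. Each call leaks up to two open file descriptions.
  • libSort(), on the other hand, does not check whether opening the file succeeded. The sysSort() results suggest that the open does not succeed, which would explain why you get a segfault in libSort() (the variable file is NULL, I think you will find).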
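The points raised in these comments can be sketched as follows. This is a minimal illustration, not the asker's code: the helper names first_byte_sys and first_byte_lib are hypothetical, and each one only reads the first byte of a file. The point is the pattern: open the file once, compare the result of open() against -1 and the result of fopen() against NULL, and close what was opened.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* System-call version (hypothetical helper).
   open() yields an int descriptor and returns -1 on failure, never NULL. */
int first_byte_sys(const char *path, unsigned char *out)
{
    int fd = open(path, O_RDONLY);   /* opened exactly once */
    if (fd == -1) {
        perror("open");
        return -1;
    }
    ssize_t n = read(fd, out, 1);
    close(fd);                       /* release the descriptor on every path */
    return n == 1 ? 0 : -1;
}

/* Library version (hypothetical helper).
   fopen() yields a FILE pointer and returns NULL on failure. */
int first_byte_lib(const char *path, unsigned char *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    size_t n = fread(out, 1, 1, fp);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

Both helpers return 0 on success and -1 on any failure, so a caller can stop before touching an invalid descriptor or a null FILE pointer.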

Tags: c file sorting


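【Answer 1】:

The problem with both sysSort() and libSort() appears to be that they do not open the file successfully. open() returns -1 (not NULL) in that event, which matches all the warnings you get in that case. Since sysSort() fails to open the file, libSort() presumably fails to open it too, but that function does not test whether the open succeeded. If it failed, libSort()'s use of the resulting null pointer produces undefined behavior, and a segfault is an entirely plausible manifestation.

My guess at the root cause is that the file's permissions were not set correctly when it was created. That could result from the incorrect open() call in generate():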

    int start = open(path,O_CREAT| O_TRUNC | O_WRONLY | S_IRUSR | S_IWUSR);

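When you include O_CREAT in the open flags, you must pass an additional argument that specifies the initial mode to use if a new file is created. Instead, you are ORing the mode bits into the flags, which is incorrect. The following fixes that particular problem, and I suspect it will allow both sort functions to open the file afterwards: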

    int start = open(path, O_CREAT| O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
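As a rough way to confirm the effect of the corrected call, the sketch below creates a file with the mode passed as the third argument and then inspects the resulting permission bits with stat(). The function names create_owner_rw and owner_can_read_write are hypothetical, introduced only for this illustration. Note that the mode applies only when the file is newly created, and that the process umask can clear bits from it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file for writing.
   The permission bits go in the third argument, not in the flags. */
int create_owner_rw(const char *path)
{
    return open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
}

/* Hypothetical helper: 1 if the owner may read and write the file,
   0 if either bit is missing, -1 if stat() fails. */
int owner_can_read_write(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        return -1;
    }
    return (st.st_mode & S_IRUSR) && (st.st_mode & S_IWUSR);
}
```

If a file left over from the earlier, incorrect call still exists, it keeps its old permissions, so deleting it before regenerating is a reasonable first step.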

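【Comments】:

  • I changed the flags to O_CREAT| O_TRUNC | O_RDWR but it didn't change anything. The functions now open the file; however, when I have 5 lines of 5 letters, libSort generates 100 more lines with one record, and sysSort doesn't produce any output.
  • I'm sorry your program doesn't work fully yet, @krysznys, but it seems clear that the question you asked has been resolved, since libSort() is now able to modify the file.
  • Thanks for the help, I'll mark this as the accepted answer! :D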
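The remaining misbehavior mentioned in the first comment is not addressed by the answer. Looking at the posted code, libSort() seeks to sizeof(bufferTwo) without multiplying by j before its first read, seeks with i instead of j inside the loop, and neither sort stops when j drops below 0. Those may contribute to the extra lines. The following is a minimal sketch of a file-based insertion sort that avoids these issues, assuming each record is bufferSize bytes followed by a newline, as generate() writes them. The names sort_records, read_record and write_record are hypothetical, and this is not the asker's final solution.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read record number idx (len bytes) into buf; 0 on success, -1 otherwise. */
static int read_record(int fd, unsigned char *buf, size_t len, long idx)
{
    if (lseek(fd, (off_t)idx * (off_t)len, SEEK_SET) == (off_t)-1) return -1;
    return read(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Write buf as record number idx; 0 on success, -1 otherwise. */
static int write_record(int fd, const unsigned char *buf, size_t len, long idx)
{
    if (lseek(fd, (off_t)idx * (off_t)len, SEEK_SET) == (off_t)-1) return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Insertion sort keyed on the first byte of each record.
   Only two record buffers (key and other) are held in memory. */
int sort_records(const char *path, long count, size_t payload)
{
    size_t len = payload + 1;            /* payload plus trailing '\n' */
    unsigned char *key = malloc(len);
    unsigned char *other = malloc(len);
    int fd = open(path, O_RDWR);
    if (fd == -1) perror("open");
    int rc = (key && other && fd != -1) ? 0 : -1;

    for (long i = 1; rc == 0 && i < count; i++) {
        if (read_record(fd, key, len, i) != 0) { rc = -1; break; }
        long j = i - 1;
        while (j >= 0) {                 /* never seek before record 0 */
            if (read_record(fd, other, len, j) != 0) { rc = -1; break; }
            if (other[0] <= key[0]) break;
            /* shift the larger record one slot to the right */
            if (write_record(fd, other, len, j + 1) != 0) { rc = -1; break; }
            j--;
        }
        if (rc == 0 && write_record(fd, key, len, j + 1) != 0) rc = -1;
    }

    if (fd != -1) close(fd);
    free(key);
    free(other);
    return rc;
}
```

With the asker's test file, the call would look like sort_records("ZZZ", 10, 10). A library-function variant could follow the same structure with fseek, fread and fwrite, reading and writing bufferSize + 1 bytes so the stride and the transfer size agree.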