Strings and Pointers: Answers

【Question title】: Strings & Pointers
【Posted】: 2011-03-16 04:33:05
【Question description】:

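I have a question about strings and pointers. Please explain using C/C++ programs only.

There is a file containing one word per line. I know the number of words in the file. Please explain, with the help of a small piece of code, how to store these words in RAM efficiently.

Is fscanf(fp,"%s",word) followed by strcpy the only way to store the words in RAM, or is there a more efficient algorithm or approach?

Thanks.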

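【Question discussion】:

RAM? Programs usually concern themselves with the stack or the heap.
Do you want C or C++? They are two different languages. You tagged the question C, so I assume you are not really interested in a C++ answer.

Tags: c algorithm data-structures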

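【Solution 1】:

Probably the most efficient way is to read the whole file into memory in one block (using fread). Then allocate an array of pointers, one per word. Then walk through the file in memory, changing each \n character to \0 and storing in the array a pointer to the character after each \0.

It is efficient because it performs only one I/O operation and two memory allocations, and it loops over the characters in the file twice (once to copy them into the buffer, and once more to break them into separate strings). The algorithm you describe (fscanf and strcpy) performs many I/O operations, allocates memory for every word, and loops over the characters at least three times (once to read them into a buffer, once to find the length to allocate, and once to copy them from the buffer into the allocated memory).

Here is a simple version with no error checking: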

#include <stdio.h>
#include <stdlib.h>

char* buffer; // pointer to memory that will store the file
char** words; // pointer to memory that will store the word pointers

// pass in FILE, length of file, and number of words
void readfile(FILE *file, int len, int wordcnt)
{
    // allocate memory for the whole file
    buffer = (char*) malloc(sizeof(char) * len);
    // read in the file as a single block (len bytes, matching the allocation)
    fread(buffer, 1, len, file);

    // allocate memory for the word list
    words = (char**) malloc(sizeof(char*) * wordcnt);
    int found = 1, // flag indicating if we found a word
                   // (starts at 1 because the file begins with a word)
        curword = 0; // index of current word in the word list

    // create a pointer to the beginning of the buffer
    // and advance it until we hit the end of the buffer
    for (char* ptr = buffer; ptr < buffer + len; ptr++)
    {
        // if ptr points to the beginning of a word, add it to our list
        if (found)
            words[curword++] = ptr;
        // see if the current char in the buffer is a newline
        found = *ptr == '\n';
        // if we just found a newline, convert it to a NUL
        if (found)
            *ptr = '\0';
    }
}

Here is a slightly simpler version using strtok:

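#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* buffer;
char** words;

void readfile(FILE *file, int len, int wordcnt)
{
    // one extra byte so the terminating NUL fits inside the allocation
    buffer = (char*) malloc(sizeof(char) * (len + 1));
    fread(buffer, 1, len, file);
    buffer[len] = '\0';

    words = (char**) malloc(sizeof(char*) * wordcnt);
    int curword = 0;
    // strtok overwrites each '\n' with NUL and skips empty lines
    char* ptr = strtok(buffer, "\n");
    while (ptr != NULL)
    {
        words[curword++] = ptr;
        ptr = strtok(NULL, "\n");
    }
}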

Note that both examples above assume that the last word in the file ends with a newline!
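Both versions above take the file length and word count as parameters and rely on that trailing newline. As a rough sketch of how those assumptions might be relaxed, the function below measures the file with fseek/ftell, counts the lines itself, and reserves one extra byte so that a final word without a newline is still terminated. The names WordList and load_words are hypothetical, and the sketch assumes the stream is seekable and opened in binary mode.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container: the file contents plus one pointer per word. */
typedef struct {
    char  *buffer;  /* whole file, each '\n' replaced by '\0' */
    char **words;   /* words[i] points into buffer */
    size_t count;   /* number of words stored */
} WordList;

/* Loads every line of fp into out. Returns 0 on success, -1 on failure.
   Assumption: fp is seekable and opened in binary mode. */
int load_words(FILE *fp, WordList *out)
{
    long len;
    size_t n, i, lines = 0, cur = 0;
    int at_start = 1;  /* the file begins with a word */

    out->buffer = NULL;
    out->words = NULL;
    out->count = 0;

    /* measure the file, then go back to the beginning */
    if (fseek(fp, 0, SEEK_END) != 0 || (len = ftell(fp)) < 0)
        return -1;
    rewind(fp);

    /* one extra byte terminates a last word that has no newline */
    out->buffer = malloc((size_t)len + 1);
    if (out->buffer == NULL)
        return -1;
    n = fread(out->buffer, 1, (size_t)len, fp);
    out->buffer[n] = '\0';

    /* one line per newline, plus one if the last line is unterminated */
    for (i = 0; i < n; i++)
        if (out->buffer[i] == '\n')
            lines++;
    if (n > 0 && out->buffer[n - 1] != '\n')
        lines++;

    out->words = malloc((lines ? lines : 1) * sizeof *out->words);
    if (out->words == NULL) {
        free(out->buffer);
        out->buffer = NULL;
        return -1;
    }

    /* same splitting pass as the first example */
    for (i = 0; i < n; i++) {
        if (at_start)
            out->words[cur++] = &out->buffer[i];
        at_start = (out->buffer[i] == '\n');
        if (at_start)
            out->buffer[i] = '\0';
    }
    out->count = cur;
    return 0;
}
```

The caller frees words and buffer once it is done with the list. The extra counting pass costs one more loop over the characters, so it only makes sense when the word count is not already known.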

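【Discussion】:

Keep in mind that reading a file byte by byte or block by block makes little difference to performance, because the operating system reads in blocks anyway. So your approach just adds programming work without much gain.
This is a nice solution, but it may be inefficient on low-end systems when it has to allocate 256MB or so of memory.
@AbiusX: The number of I/O calls can have a considerable effect on the run time of a simple program like this.
@malfy: If the word list is 256MB, no algorithm can avoid allocating 256MB to hold it, and mine stores it with as little overhead as possible (short of using compression). The only difference is that mine needs the address space to be contiguous, but that is not a problem on any modern CPU with virtual memory.
@AGeek: I have commented the first example for you.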

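【Solution 2】:

You can read the whole file into a block of memory, then walk through the block replacing each '\r' or '\n' with 0. To find the words, you then only need to search the block for characters that immediately follow one or more 0s. This is about as space-efficient as you will get. If you also want fast access, you can allocate another block of pointers and set each pointer to the start of a string. That is still more efficient than a block of pointers each pointing to a separately allocated string.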
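A minimal sketch of that idea in C, working on a block that has already been read into memory. The function name split_lines is hypothetical, and the sketch assumes the caller allocated at least len + 1 bytes and that the file contains no NUL bytes of its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Splits buf in place: every '\r' or '\n' becomes 0, and the returned
   array holds one pointer per word. Assumption: buf has room for
   len + 1 bytes, so the last word can be terminated.
   Returns NULL (and *count == 0) if the pointer block cannot be allocated. */
char **split_lines(char *buf, size_t len, size_t *count)
{
    size_t i, n = 0, cur = 0;
    char **ptrs;

    buf[len] = '\0';

    /* pass 1: turn line endings into terminators */
    for (i = 0; i < len; i++)
        if (buf[i] == '\r' || buf[i] == '\n')
            buf[i] = '\0';

    /* pass 2: a word starts at a non-zero char at offset 0 or right after a 0 */
    for (i = 0; i < len; i++)
        if (buf[i] != '\0' && (i == 0 || buf[i - 1] == '\0'))
            n++;

    ptrs = malloc((n ? n : 1) * sizeof *ptrs);
    if (ptrs == NULL) {
        *count = 0;
        return NULL;
    }

    /* pass 3: record the start of each word */
    for (i = 0; i < len; i++)
        if (buf[i] != '\0' && (i == 0 || buf[i - 1] == '\0'))
            ptrs[cur++] = &buf[i];

    *count = n;
    return ptrs;
}
```

Because runs of zeros are skipped, Windows-style \r\n endings and blank lines produce no empty entries. If the word count is known in advance, as in the question, pass 2 can be dropped.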

【Discussion】:

【Solution 3】:

If you want your strings to consume no extra unused bytes, do the following:

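// C++ (new[] is not valid C); assumes fp is an open FILE* and that no word exceeds 127 characters
char **array = new char*[COUNT_OF_WORDS];
char word[128];

for (int i = 0; i < COUNT_OF_WORDS && fscanf(fp, "%127s", word) == 1; i++)
{
    int len = strlen(word);
    array[i] = new char[len + 1];   // exactly len + 1 bytes, nothing wasted
    strcpy(array[i], word);
}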

【Discussion】:

Which C compiler are you using that supports new char *[COUNT_OF_WORDS]?
"Please explain using C/C++ programs only"
@Chris: Probably a C++ compiler!

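【Solution 4】:

Why strcpy? Just fscanf directly into the destination memory.

【Discussion】:

The problem is that you do not know how much to allocate for the destination memory. The OP is probably thinking of using fscanf to read into a buffer, finding the length of the string, allocating just enough memory for that string, and then using strcpy to copy it into the newly allocated memory.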
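For reference, the read/measure/allocate/copy sequence described in that comment might look roughly like this in plain C. The function name read_words_copy and the 127-character limit are assumptions made for the sketch; a longer word would be split across two entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 127  /* assumed upper bound on word length */

/* Reads up to max_words whitespace-separated words from fp, storing each
   in its own exactly-sized allocation. Returns the number of words stored. */
size_t read_words_copy(FILE *fp, char **words, size_t max_words)
{
    char tmp[WORD_MAX + 1];  /* scratch buffer that fscanf reads into */
    size_t n = 0;

    /* the width in "%127s" must match WORD_MAX so tmp cannot overflow */
    while (n < max_words && fscanf(fp, "%127s", tmp) == 1) {
        size_t len = strlen(tmp);      /* find the length */
        words[n] = malloc(len + 1);    /* allocate just enough */
        if (words[n] == NULL)
            break;
        strcpy(words[n], tmp);         /* copy out of the scratch buffer */
        n++;
    }
    return n;
}
```

Each word costs one allocation and one extra copy, which is the overhead the single-buffer approach in Solution 1 avoids.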

【Solution 5】:

Since you say "Please explain using C/C++ programs only", this is easy with a vector of strings, std::vector<std::string>:

#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::string word;
    std::vector<std::string> readWords;  // A vector to hold the words read.

    std::ifstream myfile("fileToRead.txt");
    if (myfile.is_open())
    {
        // Loop on getline itself: looping on myfile.good() would push
        // a spurious empty string after the last line.
        while (std::getline(myfile, word))  // reads one line (one word) into word
        {
            readWords.push_back(word);  // each word read is copied into the vector
        }
        myfile.close();
    }
    return 0;
}

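All the words read are copied into the vector readWords, and you can iterate over it to see what they actually are.

【Discussion】:

【Solution 6】:

Here is a quick and dirty way, with no error checking, using static memory and fgets.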

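#include <stdio.h>

#define MAX_NUM_WORDS   10
#define MAX_LEN 128

/* Reads up to MAX_NUM_WORDS lines into words; returns how many were read. */
int get_words(const char *p_file, char words[][MAX_LEN+1])
{
  FILE *f;
  int n = 0;

  f = fopen(p_file, "r");
  while (n < MAX_NUM_WORDS && fgets(words[n], MAX_LEN+1, f))
    n++;

  fclose(f);
  return n;
}

int main(void)
{
  char word_array[MAX_NUM_WORDS][MAX_LEN+1];
  int i, n;

  n = get_words("words.txt", word_array);

  /* fgets keeps the trailing newline, so none is added here */
  for (i = 0; i < n; i++)
    printf("Word: %s", word_array[i]);
  return 0;
}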

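【Discussion】:

I am assuming that the maximum length of each word is also known. If it is not, we cannot use static memory and must allocate memory dynamically.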
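As a sketch of the dynamic case, a line of unknown length can be read into a buffer that doubles whenever it fills up. The function name read_line and the starting capacity of 16 are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from fp. Returns a malloc'd string without
   the trailing newline, or NULL at end of file or on allocation failure.
   The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;   /* arbitrary starting capacity */
    char *line = malloc(cap);
    int c;

    if (line == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        /* keep one byte free for the terminator; double when full */
        if (len + 1 == cap) {
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
        line[len++] = (char)c;
    }

    /* nothing read and no newline seen: end of file */
    if (c == EOF && len == 0) {
        free(line);
        return NULL;
    }
    line[len] = '\0';
    return line;
}
```

Calling read_line in a loop until it returns NULL, and storing each result in an array of char pointers, gives a word list with no fixed limit on word length, at the price of one allocation per word.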