How to change the words in an array to lowercase and sort the unique words alphabetically: answers

【Question title】: How can I change the words in the array to lowercase and sort the unique words alphabetically
【Posted】: 2021-12-25 21:59:58
【Question description】:

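I created a program that has to sort words and search for the unique words. It should also count how many times each of these words occurs in the list. The list of unique words and the frequencies can be stored in dynamic arrays. The program saves the index list (the list of unique words), together with the frequency of occurrence, in a data file whose name the user is prompted for. I cannot use vectors or any existing data structure such as a list class.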

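Here is the assignment: You must develop a solution (software) that meets the following specifications:
- The program prompts the user for the name of the input text file in which the text is stored. The program must print an error message if an error occurs while opening the file. The program must read the words and store them in an array of strings. Punctuation characters must be ignored. All alphabetic characters must be converted to lowercase to eliminate case sensitivity.
- The program must sort the words and search for the unique words. It should also count the number of occurrences of these words in the list. The list of unique words and the frequencies can be stored in dynamic arrays.
- The program saves the index list (the list of unique words), together with the frequency of occurrence, in a data file whose name the user is prompted for. The program must print a confirmation message on the output screen once the data has been stored in the file.
- The program must print the index list on the output screen.

The following are some notes about your solution:
- The program must be designed in a modular manner. Multiple reusable functions are to be implemented to solve the problem (e.g. a function that searches for a string, a function that sorts the elements of an array, a function that writes the index to the output file, a function that returns the next word from the input file, etc.).
- All non-alphabetic characters must be treated as delimiters that separate the words of the text file.
- The size of the index (the total number of unique words) is unknown at compile time. Dynamic memory allocation must be used to resize the index list at run time, as needed.
- Static array dimensions should be given as symbolic constants. Such definitions should be used in declaring arrays.
- When passing a one-dimensional array to a function, pass the dimension as an argument. When passing a two-dimensional array to a function, pass the row dimension as an argument. In the case of two-dimensional arrays, a symbolic constant must be used as the column dimension in the parameter declaration of the function definition.
- You are not allowed to use any existing data structure (such as a list class) in your solution. Instead, you should create the index as a dynamic array of strings (or a dynamic two-dimensional array of characters).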

Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <iomanip>


#define SIZE 100


void findUnique();

using namespace std;

int main()
{
    string array[SIZE];
    int loop = 0;
    string line;
    string letter;
    ifstream file1;
    file1.open("readText.txt");
    if (file1.fail())
    {
        cerr << "error opening the file" << endl;
        exit(-1);
    }
    else if (file1.is_open()) //if the file is open
    {
        while (!file1.eof()) //while the end of file is NOT reached
        {
            file1 >> line;
            getline(file1, line); //get one line from the file

            line.erase(std::remove_if(line.begin(), line.end(), ispunct), line.end());
            array[loop] = line;
            cout << array[loop] << endl; //and output it
            loop++;

        }
        
    }

    findUnique();
    return (0);
    
}

void findUnique()
{

    string filename;
    cout << "Enter the name of the file" << endl;
    cin >> filename;
    ifstream file;
    file.open(filename);
    if (!file)
    {
        cout << "Error: Failed to open the file.";
    }
    else
    {
        string stringContents;
        int stringSize = 0;

        // find the number of words in the file
        while (file >> stringContents)
            stringSize++;

        // close and open the file to start from the beginning of the file
        file.close();
        file.open(filename);


        string* mainContents = new string[stringSize];   // dynamic array for strings found
        int* frequency = new int[stringSize];           // dynamic array for frequency
        int uniqueFound = 0;                            // no unique string found

        for (int i = 0; i < stringSize && (file >> stringContents); i++)
        {
            //remove trailing punctuations 
            while (stringContents.size() && ispunct(stringContents.back()))
                stringContents.pop_back();

            // process string found 
            bool found = false;
            for (int j = 0; j < uniqueFound; j++)
                if (mainContents[j] == stringContents) {  // if string already exist
                    frequency[j] ++;     // increment frequency 
                    found = true;
                }
            if (!found) {   // if string not found, add it !  
                mainContents[uniqueFound] = stringContents;
                frequency[uniqueFound++] = 1;   // and increment number of found
            }
        }
        // display results
        cout << "Word" << setw(20) << "Frequency\n";
        for (int i = 0; i < uniqueFound; i++)
        {
            cout << mainContents[i] << "\t\t" << frequency[i] << endl;
            ofstream file2;
            file2.open("writeText.txt");
            file2 << mainContents[i] << "\t\t" << frequency[i] << endl;
        }
    }
    
}

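【Question comments】:

Your program is overly complicated. It can easily be simplified using std::map.
Also, please include a sample input and its expected output in the question.
When you have a complex problem, split it into smaller and simpler problems. Keep doing that until no problem can be split any further. Then implement them one by one, preferably using classes and functions. Adding new things then also becomes easier, such as lowercasing all strings (which is a simple call to std::transform) or sorting the array.
For the sorting, having two separate arrays makes it harder. If you had an array of structures or pairs (of a string and an integer), it would be simple. Or, if you had a std::map as suggested, the contents would already be sorted by key.
If the text is Unicode, there is no simple solution.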
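
The two suggestions in the comments above (lowercasing with std::transform, and sorting one array of structures instead of two parallel arrays) could be sketched roughly as follows. This is only an illustration: the names WordCount, toLower and sortByWord are hypothetical, and whether std::sort and std::string are permitted by the assignment is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record type: one unique word together with its frequency.
struct WordCount {
    std::string word;
    int count;
};

// Lowercase a word in place. The cast to unsigned char avoids undefined
// behaviour in std::tolower for negative char values.
void toLower(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}

// Sort `size` records alphabetically by word. Because word and count live
// in the same record, each count travels with its word.
void sortByWord(WordCount* entries, std::size_t size) {
    assert(entries != nullptr || size == 0);
    std::sort(entries, entries + size,
              [](const WordCount& a, const WordCount& b) { return a.word < b.word; });
}
```

With two parallel arrays, every swap during sorting would have to be repeated on the frequency array; with a record type a single swap is enough.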

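Tags: c++ arrays

【Solution 1】:

You can simplify your program using std::map as shown below. The program below finds the unique words in the input.txt file and keeps track of the count corresponding to each word.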

#include <iostream>
#include <map>
#include <sstream>
#include <fstream>
#include <string>
int main() {
    
    //this map maps each word in the file to their respective count
    std::map<std::string, int> stringCount;
    std::string word, line;
    
    std::ifstream inputFile("input.txt");
    if(inputFile)
    {
        while(std::getline(inputFile, line))//go line by line
        {
            std::istringstream ss(line);
            while(ss >> word)//go word by word 
            {
                //increment the count 
                stringCount[word]++;
            }
        }
    }
    else 
    {
        std::cout<<"File cannot be opened"<<std::endl;
    }
    
    inputFile.close();
    
    std::cout<<"Total number of unique words are:"<<stringCount.size()<<std::endl;
    for(const std::pair<const std::string, int>& pairElement: stringCount)
    {
        std::cout<<pairElement.first<<" : "<<pairElement.second<<std::endl;
      
    }
    return 0;
}

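The output of the program can be seen here.

The above program prints the number of unique words in the file along with the count (frequency) corresponding to each unique word.

【Comments】:

I am not allowed to use any existing data structure, such as a list class.
@Bee You did not mention this important detail in your original question. Also, why are you not allowed to use std::map?
Oh yes, that is the only way my code will work. Otherwise, I really don't know how to do it.
Please see my answer below, which meets your requirements. It uses only plain code. . .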

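【Solution 2】:

I will show you a solution that does fully dynamic memory management for everything.

Even for strings.

So, I will not use any library functions (apart from stream I/O), nor any C++ containers. Just plain code. That requires many helper functions. . .

Dynamic memory management is done in C++ with new. The problem is always that we may not know in advance what size to allocate.

But this can be handled with an assumed initial array size. If we find that this size is not sufficient, we allocate new, bigger memory (usually twice the previous size) and copy all elements from the previously used memory to the new memory. Then we delete the old memory and assign the newly allocated memory to the old pointer.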
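
As a minimal sketch of the grow-by-doubling step just described, here applied to an array of int for brevity. The function name appendWithGrowth and its parameters are hypothetical and not taken from the code further down:

```cpp
#include <cassert>

// Append `value` to a dynamic int array. If the array is full, allocate a new
// block of twice the capacity, copy the old elements, free the old block and
// let the caller's pointer refer to the new block.
void appendWithGrowth(int*& data, unsigned int& size, unsigned int& capacity, int value) {
    assert(data != nullptr && capacity > 0);
    if (size >= capacity) {
        unsigned int newCapacity = capacity * 2;
        int* temp = new int[newCapacity];
        for (unsigned int k = 0; k < size; ++k)
            temp[k] = data[k];
        delete[] data;      // release the old memory
        data = temp;        // reassign the pointer to the new memory
        capacity = newCapacity;
    }
    data[size++] = value;
}
```

The caller starts with something like `int* data = new int[2];` with size 0 and capacity 2, and remains responsible for the final `delete[] data;`.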

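This results in many, many lines of repetitive code. That is why std::string and std::vector were invented.

But. Using this outdated approach is your requirement. Nowadays, using new and delete, raw pointers for owned memory, and C-style arrays is even strongly discouraged. In real life, you should never do this.

So, the basic functionality. We start with reading words from a file. We will use a loop with 2 states. We either wait for the beginning of a word or for the end of a word. Which of the two applies depends on the character that we read from the file. Once we have found the beginning of a word, we copy it character by character into a new string.

If we are at the end of a word, we store the word just read in our array of words and wait for the beginning of the next word.

We do this until the end of the file.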
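
The two-state loop can be seen in isolation in the following sketch, which only counts the words of a NUL-terminated text instead of storing them. The names isLetter and countWordsInText are made up for this illustration:

```cpp
#include <cassert>

// Only plain ASCII letters count as word characters in this sketch.
bool isLetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Two states: either we wait for the beginning of a word or for its end.
// Every transition "begin of word found" counts one word.
unsigned int countWordsInText(const char* text) {
    assert(text != nullptr);
    bool waitForBeginOfWord = true;
    unsigned int words = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (waitForBeginOfWord) {
            if (isLetter(*p)) {          // first letter of a new word
                waitForBeginOfWord = false;
                ++words;
            }
        }
        else if (!isLetter(*p)) {        // separator: the word has ended
            waitForBeginOfWord = true;
        }
    }
    return words;
}
```

Because every non-letter is a separator, a text such as "It's" is split into the two words "It" and "s".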

For sorting, we implement the standard bubble sort algorithm. We will sort only the pointers to the strings, not the strings themselves.
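
A compact sketch of sorting only the pointers could look like this. It uses std::strcmp for brevity, which is an assumption (the full solution below uses its own comparison helper), and the name bubbleSortPointers is hypothetical:

```cpp
#include <cassert>
#include <cstring>

// Bubble sort on an array of pointers to C strings. Only the pointers are
// swapped; the characters of the strings are never moved.
void bubbleSortPointers(const char** words, unsigned int n) {
    assert(words != nullptr || n == 0);
    bool swapped = true;
    while (swapped) {                       // repeat until a pass makes no swap
        swapped = false;
        // "j + 1 < n" instead of "j < n - 1": no unsigned wrap-around for n == 0
        for (unsigned int j = 0; j + 1 < n; ++j) {
            if (std::strcmp(words[j], words[j + 1]) > 0) {
                const char* temp = words[j];
                words[j] = words[j + 1];
                words[j + 1] = temp;
                swapped = true;
            }
        }
    }
}
```

Swapping pointers costs the same no matter how long the words are, which is the reason for not copying the strings themselves.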

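To get the unique words, we use the sorted array.

In a loop, we check whether the current word is the same as the next word in the array. If they are equal, we skip it, count the duplicate, and look at the next word. If they are not equal, we store the last word in a new result array. This one will be unique.

We also store the counter.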
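
A small sketch of this unique-and-count pass over an already sorted array follows. It writes into output arrays that the caller has allocated with room for n entries; the name uniqueAndCount and this calling convention are assumptions made for the illustration, and it compares each word with the previously stored unique word rather than with its successor:

```cpp
#include <cassert>
#include <cstring>

// `sorted` holds n words in alphabetical order, so equal words are adjacent.
// Each distinct word is stored once in uniqueOut, with its frequency at the
// same index in countOut. Returns the number of unique words.
unsigned int uniqueAndCount(const char* const* sorted, unsigned int n,
                            const char** uniqueOut, unsigned int* countOut) {
    assert(n == 0 || (sorted != nullptr && uniqueOut != nullptr && countOut != nullptr));
    unsigned int uniqueCount = 0;
    for (unsigned int k = 0; k < n; ++k) {
        if (uniqueCount > 0 && std::strcmp(sorted[k], uniqueOut[uniqueCount - 1]) == 0) {
            // Same word as the last stored one: just count the duplicate
            ++countOut[uniqueCount - 1];
        }
        else {
            // A new word starts here: store it with an initial count of 1
            uniqueOut[uniqueCount] = sorted[k];
            countOut[uniqueCount] = 1;
            ++uniqueCount;
        }
    }
    return uniqueCount;
}
```

Comparing against the previously stored word means the last group of equal words needs no special handling after the loop.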

So. There are many possible solutions. Please see one of them below. It is an incredible 350 lines of code and does not have much to do with C++. Anyway:

#include <iostream>
#include <fstream>

// Some abbreviations
typedef char* String;
typedef String *StringArray;
typedef unsigned int *CounterArray;

// Convert a character to a lower case character
char lowerCase(char c) {
    if (c >= 'A' and c <= 'Z')
        c += ('a' - 'A');
    return c;
}
// Check if the character is considered to be a part of a word. 
// If you want also numbers and underscores, then uncomment the commented part
bool isWordCharacter(char c) {
    return (c >= 'A' and c <= 'Z') or ((c >= 'a' and c <= 'z')) /*or (c >= '0' and c <= '9') or (c == '_')*/;
}

// Simple comparison of Strings, like C-library function
int stringCompare(String s1, String s2)
{
    // Check, as long as it is equal
    while ((*s1 != '\0' and *s2 != '\0') and *s1 == *s2) {
        s1++; s2++;
    }
    // compare the mismatching character and return the result
    return (*s1 == *s2) ? 0 : (*s1 > *s2) ? 1 : -1;
}

// Get the length of a string. Like C-library function
unsigned int stringLength(String string) {
    unsigned int result = 0;
    while (*string++) ++result;
    return result;
}

// Duplicate a string
String createAndCopy(String string) {
    unsigned int length = stringLength(string)+1;
    String newString = new char[length];
    for (unsigned int k = 0; k < length; ++k)
        newString[k] = string[k];
    return newString;
}

// Get a filename from the user
String getFileName() {
    // Initial estimated size of string
    unsigned int stringSize = 32;
    String fileName = new char[stringSize+1]{};

    // Read until '\n'
    char c;
    unsigned int index = 0;
    while (std::cin.get(c) and c!='\n') {

        // Check, if we have enough space
        if (index >= stringSize) {

            // No, create char array with double the space than before
            stringSize *= 2;
            String temp = new char[stringSize+1] {};
            // Copy all data from string to temp
            for (unsigned int k = 0; k < index; ++k)
                temp[k] = fileName[k];

            // Delete old string
            delete[] fileName;
            // And reassign new one
            fileName = temp;
        }
        // Store the character 
        fileName[index++] = c;
    }
    // And the terminating 0
    fileName[index] = '\0';

    // Try to open the file
    std::ifstream ifs(fileName);

    // Check, if it could be opened
    if (not ifs) {
        // Could not be opened. Show error message   
        std::cerr << "\n\n*** Error: could not open file: '" << fileName << "'\n\n";

        // Delete allocated memory
        delete[] fileName;
        // Indicate a bad result 
        fileName = nullptr;
    }
    return fileName;
}

// Read all words from a stream into a dynamic array
unsigned int readWordsFromStreamToArray(std::ifstream* is, StringArray* stringArray) {

    const unsigned int InitialStringArraySize = 16u;
    const unsigned int InitialStringSize = 32u;

    // Define array of strings with initial array size and allocate memory
    unsigned int stringArraySize = InitialStringArraySize;
    unsigned int indexInStringArray = 0;
    *stringArray = new String[stringArraySize];

    // We have a local string for which we will later allocate memory
    unsigned int stringSize = InitialStringSize;
    unsigned int indexInString = 0;
    String string{};

    // We have 2 states. Either we wait for the beginning of a word or for the end of a word
    bool waitForBeginOfWord{true};

    // As long as we are in the condition to read characters
    bool readCharactersOK{ true };
    while (readCharactersOK) {
        
        // Read a character and check, if this was ok
        char c; is->get(c);
        readCharactersOK = (bool)(*is);

        // We are in one of 2 states. Wait for begin of word or wait for end of word
        if (waitForBeginOfWord) {
            // As long, as we do not find the begin of a new word
            if (readCharactersOK and isWordCharacter(c)) {
                // Go to state "wait for end of word"
                waitForBeginOfWord = false;
                // Now, we have a character from a word. Create a new string
                string = new char[stringSize + 1]{};
                // And start again with index 0
                indexInString = 0;
            }
        }

        // Are we in state wait for end of word?
        if (not waitForBeginOfWord) {
            // Do we have a valid character?
            if (readCharactersOK and isWordCharacter(c)) {

                // Now, we have a letter that belongs to a word
                // We want to add this now to our string, but need to check if it is big enough
                if (indexInString >= stringSize) {

                    // string is bigger than expected, allocate more memory. Double than before
                    stringSize *= 2;

                    String temp = new char[stringSize + 1];

                    // Copy old string to new temp string
                    for (unsigned k = 0; k < indexInString; ++k)
                        temp[k] = string[k];

                    // Free the memory of the old string
                    delete[] string;

                    // And make the temp string to our current string
                    string = temp;
                }
                string[indexInString++] = lowerCase(c);
            }
            else {
                // Now we are either at end of file or we have read a non-word character
                // Now, a word is read. Terminate string with a 0
                string[indexInString] = '\0';

                // We want to add the word to the string array.
                // First check, if there is still enough space
                if (indexInStringArray >= stringArraySize) {

                    // We need more memory
                    stringArraySize *= 2;

                    // Create a bigger array
                    StringArray temp = new String[stringArraySize];

                    // Copy all strings from the old array to this temporary array
                    for (unsigned int k = 0; k < indexInStringArray; ++k)
                        temp[k] = (*stringArray)[k];

                    // Delete old memory
                    delete[] (*stringArray);

                    // And assign newly created memory
                    *stringArray = temp;
                }
                // Store next word in array
                (*stringArray)[indexInStringArray++] = string;

                // Next time, we need to wait for the begin of a word again.
                waitForBeginOfWord = true;
            }
        }
    } // Return number of words
    return indexInStringArray;
}
// Standard bubble sort algorithm. Only pointers will be exchanged
void bubbleSort(StringArray* stringArray, unsigned int numberOfWords) {

    // Check whether we still need to sort
    bool sorted = false; 

    // Abbreviation
    StringArray ptr = *stringArray;

    // As long as we need to sort
    while (!sorted) // repeat until no more swaps
    {
        sorted = true; // Assume everything sorted
        // "j + 1 < numberOfWords" avoids an unsigned wrap-around for an empty array
        for (unsigned int j = 0; j + 1 < numberOfWords; j++) 
        {
            if (stringCompare(*(ptr + j),*(ptr + j + 1)) > 0) 
            {
                // Swap 2 pointers
                String temp = *(ptr + j);
                *(ptr + j) = *(ptr + j + 1);
                *(ptr + j + 1) = temp;
                // we swapped, so keep sorting
                sorted = false; 
            }
        }
    }
}

// Get unique words from a sorted array of words
unsigned int makeUuniqueAndCount(StringArray* stringArray, unsigned int numberOfWords, StringArray* uniqueStringArray, CounterArray* counterArray) {

    // first allocate memory for the resulting array
    *uniqueStringArray = new String[numberOfWords]{};
    *counterArray = new unsigned int[numberOfWords] {};

    // Indices for the resulting arrays
    unsigned int uniqueStringArrayIndex = 0;

    // Here we count the frequency of the words. A word always exists at least once
    unsigned int wordCounter = 1;

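    // For all words in the original array, compare each word with its successor.
    // "k + 1 < numberOfWords" also works for an empty array (no unsigned wrap-around)
    for (unsigned int k = 0; k + 1 < numberOfWords; ++k) {

        // List is sorted. So, 2 identical words would follow. Check this
        if (stringCompare((*stringArray)[k], (*stringArray)[k + 1]) != 0) {

            // Different word found. Duplicate search for this word is over
            // Create a new string and copy old word to new array
            String s = createAndCopy((*stringArray)[k]);
            (*uniqueStringArray)[uniqueStringArrayIndex] = s;

            // Store the word counter for this word
            (*counterArray)[uniqueStringArrayIndex] = wordCounter;

            // We start now to count from the beginning
            wordCounter = 1;
            ++uniqueStringArrayIndex;
        }
        else {
            // Duplicate word found, increase word counter
            ++wordCounter;
        }
    }
    // The last group of equal words has no successor and is not stored yet. Do this now
    if (numberOfWords > 0) {
        (*uniqueStringArray)[uniqueStringArrayIndex] = createAndCopy((*stringArray)[numberOfWords - 1]);
        (*counterArray)[uniqueStringArrayIndex] = wordCounter;
        ++uniqueStringArrayIndex;
    }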
    // And now, the original allocated array for the duplicate words are too big.
    // Allocate real size and recopy.
    if (uniqueStringArrayIndex != numberOfWords) {

        // Get new, exact fitting temp array
        StringArray temp1 = new String[uniqueStringArrayIndex];
        // Copy all words into temp
        for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
            temp1[k] = (*uniqueStringArray)[k];
        // delete old content
        delete[](*uniqueStringArray);
        // And reassign
        (*uniqueStringArray) = temp1;

        // Get new, exact fitting temp array
        CounterArray temp2 = new unsigned int[uniqueStringArrayIndex];
        // Copy all counters into temp
        for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
            temp2[k] = (*counterArray)[k];
        // delete old content
        delete[](*counterArray);
        // And reassign
        (*counterArray) = temp2;
    }
    return uniqueStringArrayIndex;
}

int main() {

    // Get a file name
    String fileName = getFileName();

    // If that worked and we got a valid file name
    if (fileName) {

        // Try to open the file
        std::ifstream ifs(fileName);

        // If that worked
        if (ifs) {
          
            // Define our arrays
            StringArray stringArray{};
            unsigned int numberOfWords = readWordsFromStreamToArray(&ifs, &stringArray);

            // Show result
            std::cout << "\n\nRaw word list:--------------------------------------------------\n";
            for (unsigned int i = 0; i < numberOfWords; ++i) {
                std::cout << i + 1 << '\t' << stringArray[i] << '\n';
            }

            // Sort
            bubbleSort(&stringArray, numberOfWords);
            std::cout << "\n\nSorted word list:--------------------------------------------------\n";
            // Show result
            for (unsigned int i = 0; i < numberOfWords; ++i) {
                std::cout << i + 1 << '\t' << stringArray[i] << '\n';
            }

            // Getting unique strings and count
            StringArray uniqueStringArray{};
            CounterArray counterArray{};
            unsigned int numberOfUniqes = makeUuniqueAndCount(&stringArray, numberOfWords, &uniqueStringArray, &counterArray);
            // Show result
            std::cout << "\n\nUnique word list and count:--------------------------------------------------\n";
            for (unsigned int i = 0; i < numberOfUniqes; ++i) {
                std::cout << i + 1 << '\t' << uniqueStringArray[i] << "\t --> " << counterArray[i] << '\n';
            }

            // Delete all dynamically allocated memory
            for (unsigned int k = 0; k < numberOfWords; ++k) {
                delete[] stringArray[k];
            }
            for (unsigned int k = 0; k < numberOfUniqes; ++k) {
                delete[] uniqueStringArray[k];
            }
            delete[] stringArray;
            delete[] uniqueStringArray;
            delete[] counterArray;
        }
        delete[] fileName;
    }
}

In C++, you could do it like this:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <map>
#include <regex>
#include <algorithm>
#include <cctype>

const std::regex re{ R"(\w+)" };

int main() {
   
    // Tell user what to do: Input a file name
    std::cout << "Please enter a filename:\n";
    // Read the filename
    if (std::string fileName{}; std::getline(std::cin, fileName)) {

        // Open the file, and check, if it could be opened
        if (std::ifstream inputFileStream{ fileName }; inputFileStream) {

            // Read the complete file into a string
            std::string data(std::istreambuf_iterator<char>(inputFileStream), {});

            // Make everything lower case (unsigned char avoids undefined behaviour in std::tolower)
            std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c) {return (char)std::tolower(c); });

            // Get all words from the string
            std::vector<std::string> words(std::sregex_token_iterator(data.begin(), data.end(), re), {});

            // Define a counter for the words
            std::map<std::string, size_t> counter{};

            // Show them and count them
            std::cout << "\n\n\nWord list:\n\n";
            for (const std::string& word : words) {
                std::cout << word << '\n';
                counter[word] ++;
            }

            // Show sorted unique list with counts
            std::cout << "\n\n\nUnique counted word list:\n\n";
            for (const auto& [word, count] : counter)
                std::cout << word << "\t --> " << count << '\n';
        }
        else std::cerr << "\n\n*** Error. Could not open file '" << fileName << "'\n\n";
    }
}

【Comments】: