C++ - checking a string for all values in an array (answers)

【Question title】: C++ - checking a string for all values in an array
【Posted】: 2020-07-24 23:24:10
【Question】:

I have some parsed text from the Vision API, and I'm filtering it with keywords like this:

    if (finalTextRaw.find("File") != finalTextRaw.npos)
    {
        LogMsg("Found Menubar");
    }

For example, if the keyword "File" is found anywhere in the string finalTextRaw, the condition is met and a log message is printed.

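This approach is very reliable, but I've just been writing a bunch of if-else-if statements like this, which is inefficient, and as I find more words that need filtering I'd rather have something more efficient. So instead, I now read a string from a config file and parse that string into an array: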

    string filterWords = GetApp()->GetFilter();
    std::replace(filterWords.begin(), filterWords.end(), ',', ' ');  // replace ',' with ' '
    vector<string> array;  // the filter words are strings, so store strings (not ints)
    stringstream ss(filterWords);
    string temp;
    while (ss >> temp)
        array.push_back(temp);  // create an array of filtered words
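Replacing commas with spaces and reading with >> splits an entry such as "Save As" into two words. A minimal sketch of an alternative, assuming GetFilter() returns a comma-separated list: split on ',' directly with std::getline and trim the whitespace around each entry. The name parseFilterWords is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated filter string such as
// "File, Edit,Save As" into {"File", "Edit", "Save As"}.
// Whitespace around each entry is trimmed; empty entries are skipped.
std::vector<std::string> parseFilterWords(const std::string& filterWords)
{
    std::vector<std::string> words;
    std::istringstream ss(filterWords);
    std::string item;
    while (std::getline(ss, item, ','))
    {
        const auto first = item.find_first_not_of(" \t\r\n");
        if (first == std::string::npos)
            continue;  // entry was empty or all whitespace
        const auto last = item.find_last_not_of(" \t\r\n");
        words.push_back(item.substr(first, last - first + 1));
    }
    return words;
}
```

With this, `array` could be filled as `auto array = parseFilterWords(GetApp()->GetFilter());`, and multi-word entries stay intact.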

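I'd like to have just one if statement that checks the string against the array, instead of many of them checking the string against each keyword that I have to specify manually in the code. Something like this: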

    if (finalTextRaw.find(array) != finalTextRaw.npos)
    {
        LogMsg("Found filtered word");
    }

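Of course that syntax doesn't work, and it is surely more complicated than that, but hopefully you get the idea: if any word from my array appears anywhere in the string, the string should be ignored and a log message printed instead.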

Any ideas how I could design such a function? I'm guessing it needs some kind of loop.

【Question comments】:

Do you know how many filter words there are? And how many words are in the raw text?
Have a look at en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
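The Aho-Corasick algorithm suggested above builds one automaton from all filter words and then scans the text a single time, no matter how many keywords there are. A compact sketch, assuming case-sensitive, byte-wise matching; the class name KeywordMatcher is hypothetical and this is not a production-ready implementation.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

// Minimal Aho-Corasick automaton (sketch): case-sensitive, byte-wise matching.
// Build once from the filter words, then scan each text in a single pass.
class KeywordMatcher
{
public:
    explicit KeywordMatcher(const std::vector<std::string>& patterns)
    {
        nodes_.emplace_back();  // node 0 is the root
        for (const auto& p : patterns)
            insert(p);
        buildLinks();
    }

    // True if any pattern occurs as a substring of text.
    bool containsAny(const std::string& text) const
    {
        int state = 0;
        for (unsigned char c : text)
        {
            state = nodes_[state].next[c];
            if (nodes_[state].terminal)
                return true;
        }
        return false;
    }

private:
    struct Node
    {
        std::array<int, 256> next{};  // transitions; 0 means "back to root"
        int fail = 0;                 // node for the longest proper suffix in the trie
        bool terminal = false;        // a pattern ends here (directly or via fail)
    };

    std::vector<Node> nodes_;

    // Adds one pattern to the trie.
    void insert(const std::string& pattern)
    {
        if (pattern.empty())
            return;  // skip empty entries; they would match every text
        int state = 0;
        for (unsigned char c : pattern)
        {
            if (nodes_[state].next[c] == 0)
            {
                nodes_[state].next[c] = static_cast<int>(nodes_.size());
                nodes_.emplace_back();
            }
            state = nodes_[state].next[c];
        }
        nodes_[state].terminal = true;
    }

    // Breadth-first pass: set failure links and fill in missing transitions,
    // so the scan in containsAny never has to backtrack.
    void buildLinks()
    {
        std::queue<int> bfs;
        for (int c = 0; c < 256; ++c)
            if (nodes_[0].next[c] != 0)
                bfs.push(nodes_[0].next[c]);  // depth-1 nodes fail to the root

        while (!bfs.empty())
        {
            const int u = bfs.front();
            bfs.pop();
            const int f = nodes_[u].fail;
            if (nodes_[f].terminal)
                nodes_[u].terminal = true;  // a pattern ends at a suffix of u
            for (int c = 0; c < 256; ++c)
            {
                const int v = nodes_[u].next[c];
                if (v != 0)
                {
                    nodes_[v].fail = nodes_[f].next[c];
                    bfs.push(v);
                }
                else
                {
                    nodes_[u].next[c] = nodes_[f].next[c];
                }
            }
        }
    }
};
```

Usage would be along the lines of building `KeywordMatcher matcher(array);` once and then calling `matcher.containsAny(finalTextRaw)` for every parsed text.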

Tags: c++ find

【Answer 1】:

Borrowing from Thomas's answer, a range-based for loop gives a concise solution:

    for (const auto &word : words)
    {
        if (finalTextRaw.find(word) != std::string::npos)
        {
            // word is found.
            // do stuff here or call a function.
            break;  // stop the loop.
        }
    }
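To get the single if statement the question asks for, the same loop can be wrapped in std::any_of. A minimal sketch; containsAnyWord is a hypothetical helper name, and it assumes the filter words are held in a std::vector<std::string>.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if any filter word occurs as a substring of text.
// std::any_of stops at the first match, like the break in the loop above.
// Note: an empty string in words matches every text, since find("") returns 0.
bool containsAnyWord(const std::string& text, const std::vector<std::string>& words)
{
    return std::any_of(words.begin(), words.end(),
                       [&text](const std::string& word) {
                           return text.find(word) != std::string::npos;
                       });
}
```

The check then becomes one condition: `if (containsAnyWord(finalTextRaw, array)) { LogMsg("Found filtered word"); }`.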

【Comments】:

【Answer 2】:

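As Thomas pointed out, the most efficient approach is to split both texts into lists of words and then use std::set_intersection to find the words that occur in both lists. A std::vector works as long as it is sorted. You end up with O(n*log(n)) (n = the larger word count) instead of O(n*m). Keep in mind that this compares whole words only, whereas find also matches substrings.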

Split a sentence into words:

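    #include <algorithm>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <string_view>
    #include <vector>

    auto split(std::string_view sentence) {
        auto result = std::vector<std::string>{};
        // string_view::data() is not guaranteed to be null-terminated,
        // so copy the view into a std::string before building the stream.
        auto stream = std::istringstream{std::string{sentence}};

        std::copy(std::istream_iterator<std::string>(stream),
                  std::istream_iterator<std::string>(), std::back_inserter(result));

        return result;
    }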

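Find the words that are present in both lists. std::set_intersection only works on sorted ranges (such as a std::set or a manually sorted vector), which is why both vectors are sorted first.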

    auto intersect(std::vector<std::string> a, std::vector<std::string> b) {
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());

        auto result = std::vector<std::string>{};
        std::set_intersection(std::move_iterator{a.begin()},
                              std::move_iterator{a.end()},
                              b.cbegin(), b.cend(),
                              std::back_inserter(result));

        return result;
    }

Example usage:

    int main() {
        const auto result = intersect(split("hello my name is mister raw"),
                                      split("this is the final raw text"));

        for (const auto& word : result) {
            // do something with word
        }
    }

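Note that this makes sense when you are dealing with a large or unbounded number of words. If you know the limits, you may prefer one of the simpler solutions from the other answers.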
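Applied to the question's yes/no filtering decision, the split-and-intersect idea might look like the self-contained sketch below. containsFilteredWord is a hypothetical name; it assumes words are separated by whitespace and compared case-sensitively, so punctuation stays attached (for example "File," is not the same word as "File").

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if any filter word appears as a whole,
// whitespace-separated word in text. Unlike std::string::find,
// "File" does not match "Filed" here.
bool containsFilteredWord(const std::string& text, std::vector<std::string> filters)
{
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::copy(std::istream_iterator<std::string>(stream),
              std::istream_iterator<std::string>(), std::back_inserter(words));

    // std::set_intersection requires both ranges to be sorted.
    std::sort(words.begin(), words.end());
    std::sort(filters.begin(), filters.end());

    std::vector<std::string> common;
    std::set_intersection(words.begin(), words.end(),
                          filters.begin(), filters.end(),
                          std::back_inserter(common));
    return !common.empty();
}
```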

【Comments】:

【Answer 3】:

You can use a basic brute-force loop:

    const std::size_t quantity_words = array.size();
    for (std::size_t i = 0; i < quantity_words; ++i)
    {
        const std::string& word = array[i];  // array must hold std::string elements
        if (finalTextRaw.find(word) != std::string::npos)
        {
            // word is found.
            // do stuff here or call a function.
            break;  // stop the loop.
        }
    }

The loop above takes each word in array and searches for it in finalTextRaw.

There are better ways using some of the std algorithms; I'll leave those to other answers.

Edit 1: Maps and associations
The code above bothers me because it makes too many passes over the finalTextRaw string.

Here's another idea:

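1. Create a std::set of the words in finalTextRaw.
2. For each word in array, check whether it exists in the set. This reduces the amount of searching (it works like searching a tree).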
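The two steps above could be sketched as follows, assuming the words in finalTextRaw are separated by whitespace and compared case-sensitively; anyFilterWordInText is a hypothetical name. The text is read once to build the set, and each filter word is then an O(log n) lookup.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper following the two steps:
// 1. put every whitespace-separated word of text into a std::set (one pass);
// 2. look each filter word up in that set (O(log n) per lookup).
bool anyFilterWordInText(const std::string& text, const std::vector<std::string>& filters)
{
    std::set<std::string> textWords;
    std::istringstream stream(text);
    for (std::string w; stream >> w;)
        textWords.insert(w);

    for (const auto& f : filters)
        if (textWords.count(f) != 0)
            return true;
    return false;
}
```

Because this is a whole-word lookup, a filter word like "Vie" does not match "View", which differs from the find-based loop.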

You should also look into creating a set of the words in array and finding the intersection of the two sets.
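Since std::set keeps its elements ordered, the intersection of two sets can be computed with std::set_intersection without any extra sorting. A minimal sketch; commonWords is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper: the words present in both sets. std::set iterates in
// sorted order, so std::set_intersection can be applied directly.
std::set<std::string> commonWords(const std::set<std::string>& a,
                                  const std::set<std::string>& b)
{
    std::set<std::string> result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.end()));
    return result;
}
```

A non-empty result means at least one filter word occurs in the text.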

【Comments】:

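Ah, I seem to be getting Error C2440 'initializing': cannot convert from '_Ty' to 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>'. I think in this case it means my array isn't actually defined? I guess I can't be sure that my first code block generates the array correctly in the first place.
@Chopin We'd need to see your actual code to answer that.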