【Question title】: Given a word and a text, return the count of the occurrences of anagrams of the word in the text [duplicate]
【Posted】: 2013-10-10 12:05:49
【Question】:

For example, the word is for and the text is forxxorfxdofr. Anagrams of for are ofr, orf, fro, and so on, so the answer for this particular example is 3.

Here is what I came up with.

#include<iostream>
#include<cstring>

using namespace std;

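int countAnagram (const char *pattern, const char *text)
{
    int patternLength = strlen(pattern);
    int textLength = strlen(text);

    // a text shorter than the pattern cannot contain an anagram of it
    if (textLength < patternLength)
        return 0;

    // dp1: character counts of the pattern, dp2: counts of the current window
    int dp1[256] = {0}, dp2[256] = {0}, i, j;

    for (i = 0; i < patternLength; i++)
    {
        // cast so that chars with negative values cannot index out of bounds
        dp1[(unsigned char)pattern[i]]++;
        dp2[(unsigned char)text[i]]++;
    }

    int found = 0, temp = 0;

    // compare the counts of the first window
    for (i = 0; i < 256; i++)
    {
        if (dp1[i]!=dp2[i])
        {
            temp = 1;
            break;
        }
    }

    if (temp == 0)
        found++;

    // slide the window one character at a time
    for (i = 0; i < textLength - patternLength; i++)
    {
        temp = 0;
        dp2[(unsigned char)text[i]]--;                 // char leaving the window
        dp2[(unsigned char)text[i+patternLength]]++;   // char entering the window
        for (j = 0; j < 256; j++)
        {
            if (dp1[j]!=dp2[j])
            {
                temp = 1;
                break;
            }
        }
        if (temp == 0)
            found++;
    }
    return found;
}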


int main()
{
    char pattern[] = "for";
    char text[] = "ofrghofrof";

    cout << countAnagram(pattern, text);

}

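Is there a faster algorithm for the problem above?

【Comments】:

Are overlapping anagrams allowed? If the text is frof, is the answer 1 or 2?
In that case the answer is 2.
I may be missing the point, but why don't you just search the string for the anagrams of "for"? Your logic seems complicated.
I'm not sure which algorithm you are proposing. But your algorithm seems to have time complexity O(NM), where N = length of the text and M = length of the pattern.
Search the web for "c++ anagram efficient"

Tags: c++ algorithm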

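【Solution 1】:

Most of the time will be spent searching, so to make the algorithm more time-efficient, the goal is to reduce the amount of searching or to optimize the search.

Method 1: A table of search start positions.

Create a vector of lists, with one vector slot per letter of the alphabet. This can be optimized for space later.

Each slot will contain a list of indices into the text.

Example text: forxxorfxdofr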

Slot  List  
'f'    0 --> 7 --> 11  
'o'    1 --> 5 --> 10  
'r'    2 --> 6 --> 12

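For each word, look up its letter in the vector to get the list of text indices. For each index in the list, compare the text starting at that position with the word.

So for the table above and the word "ofr", the first comparison takes place at index 1, the second at index 5, and the last at index 10.

You can eliminate indices too close to the end of the text (index + word length > text length).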
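
A minimal C++ sketch of the table idea follows. The answer does not say how each candidate position is compared, so the sorted-window comparison here is an assumption, and the name `countAnagramsByIndex` is hypothetical. Rather than enumerating every anagram, the sketch takes each distinct letter of the word as a possible first letter and tests only the start positions listed for it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): counts the windows of `text`
// that are anagrams of `word`, visiting only the start positions found
// through a per-character index table.
int countAnagramsByIndex(const std::string& word, const std::string& text)
{
    if (word.empty() || text.size() < word.size())
        return 0;

    // positions[c] lists, in increasing order, every index where byte c occurs.
    std::array<std::vector<std::size_t>, 256> positions;
    for (std::size_t i = 0; i < text.size(); ++i)
        positions[static_cast<unsigned char>(text[i])].push_back(i);

    // Sorted copy of the word, used as the comparison key (an assumption).
    std::string sortedWord = word;
    std::sort(sortedWord.begin(), sortedWord.end());

    // An anagram can only start with one of the word's distinct letters.
    std::string starts = sortedWord;
    starts.erase(std::unique(starts.begin(), starts.end()), starts.end());

    int found = 0;
    for (char c : starts) {
        for (std::size_t start : positions[static_cast<unsigned char>(c)]) {
            // Drop positions too close to the end of the text.
            if (start + word.size() > text.size())
                break;
            std::string window = text.substr(start, word.size());
            std::sort(window.begin(), window.end());
            if (window == sortedWord)
                ++found;
        }
    }
    return found;
}
```

With the example text, `countAnagramsByIndex("for", "forxxorfxdofr")` tests only the starts listed in the table and skips indices 11 and 12, which are too close to the end.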

【Comments】:

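【Solution 2】:

If the pattern to be anagrammed is so short that the best way to search it is simply to scan it, then this algorithm is fairly efficient. To allow longer patterns, the scans represented here by the 'for jj' and 'for mm' loops could be replaced by a more sophisticated search technique.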

// sLine -- string to be searched
// sWord -- pattern to be anagrammed
// (in this pseudo-language, the index of the first character in a string is 0)
// iAnagrams -- count of anagrams found

iLineLim = length(sLine)-1
iWordLim = length(sWord)-1

// we need a 'deleted' marker char that will never appear in the input strings
chNil = chr(0)

iAnagrams = 0 // well we haven't found any yet have we
// examine every posn in sLine where an anagram could possibly start
for ii from 0 to iLineLim-iWordLim do {
  chK = sLine[ii]
  // does the char at this position in sLine also appear in sWord
  for jj from 0 to iWordLim do {
    if sWord[jj]=chK then { 
      // yes -- we have a candidate starting posn in sLine

      // is there an anagram of sWord at this position in sLine
      sCopy = sWord // make a temp copy that we will delete one char at a time
      sCopy[jj] = chNil // delete the char we already found in sLine
      // the rest of the anagram would have to be in the next iWordLim positions
      cc = true // a one-char word is already fully matched at this point
      for kk from ii+1 to ii+iWordLim do {
        chK = sLine[kk]
        cc = false
        for mm from 0 to iWordLim do { // look for anagram char
          if sCopy[mm]=chK then { // found one
            cc = true
            sCopy[mm] = chNil // delete it from copy
            break // out of 'for mm'
          }
        }
        if not cc then break // out of 'for kk' -- no anagram char here
      }
      if cc then { iAnagrams = iAnagrams+1 }

      break // out of 'for jj'
    }
  }
}

-Al.
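
For readers who want to try the pseudocode above, here is one possible C++ rendering, offered as a sketch rather than as the answerer's code. The function name `countAnagramsByDeletion` is hypothetical. It folds the 'for jj' check into the same loop as the remaining characters, replaces the 'for mm' scan with `std::string::find`, and assumes, as the pseudocode does, that the marker character `'\0'` never appears in either input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C++ rendering of the pseudocode (names are illustrative).
// Assumes neither string contains '\0', which serves as the 'deleted' marker.
int countAnagramsByDeletion(const std::string& line, const std::string& word)
{
    if (word.empty() || line.size() < word.size())
        return 0;

    const char nil = '\0';
    int anagrams = 0;

    // Examine every position in `line` where an anagram could start.
    for (std::size_t start = 0; start + word.size() <= line.size(); ++start) {
        std::string remaining = word;  // chars of the word not yet matched
        bool matched = true;
        for (std::size_t k = start; k < start + word.size(); ++k) {
            // Look for the current text char among the unmatched chars.
            std::size_t pos = remaining.find(line[k]);
            if (pos == std::string::npos) {
                matched = false;  // no anagram char here
                break;
            }
            remaining[pos] = nil;  // delete it from the copy
        }
        if (matched)
            ++anagrams;
    }
    return anagrams;
}
```

Each window costs up to M scans of an M-character copy, so the sketch is only attractive for short patterns, as the answer notes.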

【Comments】:

-1. This question is tagged c++, yet this is definitely not C++.

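【Solution 3】:

You can use the commutativity of multiplication together with the uniqueness of prime factorization. This relies on my previous answer here

Create a mapping from each character to a list of primes (as small as possible). For example a-->2, b-->3, c-->5, and so on. This can be kept in a simple array.

Now, convert the given word into the product of the primes matching each of its characters. This result will be equal to the same product computed for any anagram of that word.

Now scan the text, and at any given step keep the product of the primes matching the last L characters (where L is the length of the word). So every time you advance, you do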

mul = mul * char2prime(text[i]) / char2prime(text[i-L])

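Whenever this product equals the product of your word, increment the overall counter, and you're done.

Note that this method works well for short words, but the product of primes will overflow a 64b var pretty quickly (at around 9-10 letters), so you'll have to use a big-number math library to support longer words.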
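
The rolling product can be sketched in C++ as below. This is an illustration under stated assumptions, not the answerer's code: the names `charToPrime` and `countAnagramsByPrimes` are hypothetical, both strings are assumed to contain only lowercase 'a'..'z', and the word is assumed to be short enough (at most 9 letters) that the product fits in 64 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: maps 'a'..'z' to the first 26 primes.
// Assumes the input is a lowercase ASCII letter.
std::uint64_t charToPrime(char c)
{
    static const std::uint64_t primes[26] = {
        2,  3,  5,  7,  11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};
    return primes[c - 'a'];
}

// Assumes lowercase input and a word of at most 9 letters (no overflow check).
int countAnagramsByPrimes(const std::string& word, const std::string& text)
{
    const std::size_t L = word.size();
    if (L == 0 || text.size() < L)
        return 0;

    // Products for the word and for the first window of the text.
    std::uint64_t target = 1, mul = 1;
    for (std::size_t i = 0; i < L; ++i) {
        target *= charToPrime(word[i]);
        mul *= charToPrime(text[i]);
    }

    int count = (mul == target) ? 1 : 0;
    for (std::size_t i = L; i < text.size(); ++i) {
        // Divide out the char leaving the window before multiplying in the
        // new one; the division is exact, so the intermediate stays small.
        mul = mul / charToPrime(text[i - L]) * charToPrime(text[i]);
        if (mul == target)
            ++count;
    }
    return count;
}
```

Each step costs one division and one multiplication regardless of alphabet size, which is the complexity advantage the answer is pointing at.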

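【Comments】:

Why not use 2 for every letter in the search word and 3 otherwise? That way you can have more letters in the string (up to log_3(2^31-1) letters). A more efficient (and almost unlimited) approach would be to just use an array storing the count of each letter in the last L characters.
@justhalf, that doesn't ensure unique identification. If the search word is for, you would also hit ffr or rro and so on. As for using an array - that's possible, though at the cost of extra space, plus comparing the array values on every iteration - I think this solution should have better complexity, and in any case - it's different :)
Ah, I forgot. About the array, since the array size is only 26 (or 52), I don't think it matters much. =)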
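
The count-array idea from these comments can be sketched as follows. The `nonZero` counter is not from the thread. It is one common way to avoid comparing the whole array on every step, which is the cost the reply worries about. The name `countAnagramsByCounts` is hypothetical, and the table has 256 slots instead of 26 so that any byte value is accepted.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical sketch: sliding window over per-character count differences.
int countAnagramsByCounts(const std::string& word, const std::string& text)
{
    const std::size_t L = word.size();
    if (L == 0 || text.size() < L)
        return 0;

    // diff[c] = (occurrences of c in the window) - (occurrences of c in word)
    std::array<int, 256> diff{};
    int nonZero = 0;  // number of slots where diff[c] != 0

    // Changes one slot by +1 or -1 and keeps nonZero in sync.
    auto adjust = [&](char ch, int delta) {
        int& slot = diff[static_cast<unsigned char>(ch)];
        if (slot == 0)
            ++nonZero;
        slot += delta;
        if (slot == 0)
            --nonZero;
    };

    for (std::size_t i = 0; i < L; ++i) {
        adjust(word[i], -1);
        adjust(text[i], +1);
    }

    // The window is an anagram exactly when every difference is zero.
    int count = (nonZero == 0) ? 1 : 0;
    for (std::size_t i = L; i < text.size(); ++i) {
        adjust(text[i], +1);      // char entering the window
        adjust(text[i - L], -1);  // char leaving the window
        if (nonZero == 0)
            ++count;
    }
    return count;
}
```

Under these assumptions each step does a constant amount of work, so the whole scan is linear in the text length and has no limit on the word length.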