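Answers: counting occurrences of a substring in a string

【Question title】: find the count of substring in string
【Posted】: 2012-02-21 13:48:27
【Question description】:

I have to find the count of a substring in a string using the C language. I'm using the function strstr, but it only finds the first occurrence.

My idea for the algorithm is to keep searching the string while strstr does not return NULL, and to take a substring of the main string on each loop iteration. My question is: how do I do that?

【Question discussion】:

【Solution 1】:

You could do something like this: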

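int count = 0;
const char *tmp = myString;

/* strstr() matches "" at every position, so skip the loop for an empty
   needle; otherwise tmp++ would step past the terminating '\0'. */
if (*string2find != '\0')
{
    while ((tmp = strstr(tmp, string2find)) != NULL)
    {
        count++;
        tmp++;   /* resume one character later, so overlapping matches count */
    }
}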

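That is, when you get a result, start searching again from the next position in the string.

strstr() does not only work from the beginning of a string; it can start from any position.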
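The loop above can be wrapped in a function so that it compiles on its own. This is only a sketch: the name count_overlapping and the decision to return 0 for an empty needle are assumptions made here, not part of the original answer.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of needle in haystack, overlapping matches included.
 * Returns 0 for an empty needle (a choice made for this sketch). */
size_t count_overlapping(const char *haystack, const char *needle)
{
    size_t count = 0;

    /* strstr() matches "" everywhere, which would walk p past the end. */
    if (*needle == '\0')
        return 0;

    /* Each hit restarts the search one character after the match start. */
    for (const char *p = haystack; (p = strstr(p, needle)) != NULL; p++)
        count++;

    return count;
}
```

Called as count_overlapping("zzzz", "zz"), this counts the matches starting at offsets 0, 1 and 2.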

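【Discussion】:

If they need to be distinct substrings, you might consider count+=strlen(string2find)
Edit: I added a guard against the problem with string2find=""
@Dave, beware of the infinite loop with ""
@Dave and future readers, I believe you meant tmp += strlen(string2find). In your example, you would increase the occurrence count by the length of the string!
If you look for "zz" in "zzzz", it returns 3, and (using tmp++) I believe that is the correct answer. If you did something like tmp += strlen(string2find) instead, it would return only 2.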

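【Solution 2】:

Should the parts of the string that have already been matched be used again or not?

For example, what is the expected answer when searching for oo in foooo: 2 or 3?

If the latter (we allow substrings to overlap, and the answer is three), then what Joachim Isaksson suggested is the right code.

If we search for distinct substrings (the answer should be two), then see the code below (and the online example here):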

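const char *str = "This is a simple string";
const char *what = "is";

size_t what_len = strlen(what);
int count = 0;

const char *where = str;

if (what_len)   /* an empty pattern would never advance `where` */
    while ((where = strstr(where, what))) {
        where += what_len;   /* skip the whole match: no overlaps */
        count++;
    }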
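For reuse, the same non-overlapping search might be packaged as a function. The name count_non_overlapping and the return value of 0 for an empty pattern are assumptions of this sketch rather than anything stated in the answer.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of needle in haystack without reusing characters:
 * after each hit the search resumes just past the end of the match.
 * Returns 0 for an empty needle (a choice made for this sketch). */
size_t count_non_overlapping(const char *haystack, const char *needle)
{
    size_t needle_len = strlen(needle);
    size_t count = 0;

    if (needle_len == 0)
        return 0;

    for (const char *p = haystack; (p = strstr(p, needle)) != NULL; p += needle_len)
        count++;

    return count;
}
```

With this version, searching for oo in foooo yields 2, because the second match can only start after the first one ends.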

【Discussion】:

【Solution 3】:

Using KMP, you can do it in O(n):

#define LEN 1024    /* assumed buffer size; LEN was not defined in the original */

int fail[LEN+1];
char s[LEN];        /* the pattern; fill it in before calling getfail() */

void getfail(void)
{
    //f[i+1]= max({j|s[i-j+1,i]=s[0,j-1],j!=i+1})
    //the correctness can be proved by induction
    for(int i=0,j=fail[0]=-1;s[i];i++)
    {
        while(j>=0&&s[j]!=s[i]) j=fail[j];
        fail[i+1]=++j;
        if (s[i+1]==s[fail[i+1]]) fail[i+1]=fail[fail[i+1]];//optimizing fail[]
    }
}

int kmp(const char *t)// s is the pattern, t is the text; call getfail() first
{
    int cnt=0;
    for(int i=0,j=0;t[i];i++)
    {
        while(j>=0&&t[i]!=s[j]) j=fail[j];
        if (!s[++j])    // reached the end of the pattern: full match
        {
            j=fail[j];  // fall back so overlapping matches are counted too
            cnt++;
        }
    }
    return cnt;// how many times s appeared in t.
}
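For comparison, here is a self-contained KMP counter that avoids the global buffers. It uses the plain (unoptimized) border table rather than the optimized fail[] array above, so it is a variant of the technique, not the answer's exact code. The name kmp_count, the heap-allocated table, and returning 0 for an empty pattern or a failed allocation are all assumptions of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts occurrences of pat in text (overlaps included) with KMP.
 * Returns 0 for an empty pattern or if the table cannot be allocated. */
size_t kmp_count(const char *text, const char *pat)
{
    size_t m = strlen(pat);
    if (m == 0)
        return 0;

    /* border[i] = length of the longest proper prefix of pat[0..i]
     * that is also a suffix of pat[0..i]. */
    size_t *border = malloc(m * sizeof *border);
    if (border == NULL)
        return 0;

    border[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = border[k - 1];
        if (pat[i] == pat[k])
            k++;
        border[i] = k;
    }

    size_t count = 0;
    for (size_t i = 0, k = 0; text[i] != '\0'; i++) {
        while (k > 0 && text[i] != pat[k])
            k = border[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {            /* full match ending at text[i] */
            count++;
            k = border[m - 1];   /* fall back; permits overlapping matches */
        }
    }

    free(border);
    return count;
}
```

Each text character is examined once and the fallback steps are amortized, which is where the linear running time comes from. A caller that needs to distinguish allocation failure from "no matches" would want a different return convention.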

【Discussion】:

【Solution 4】:

The result may differ depending on whether you allow overlaps:

// gcc -std=c99
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static int
count_substr(const char *str, const char* substr, bool overlap) {
  if (strlen(substr) == 0) return -1; // forbid empty substr

  int count = 0;
  int increment = overlap ? 1 : strlen(substr);
  for (char* s = (char*)str; (s = strstr(s, substr)); s += increment)
    ++count;
  return count;
}

int main() {
  char *substrs[] = {"a", "aa", "aaa", "b", "", NULL };
  for (char** s = substrs; *s != NULL; ++s)
    printf("'%s' ->  %d, no overlap: %d\n", *s, count_substr("aaaaa", *s, true),
       count_substr("aaaaa", *s, false));
}

Output

'a' ->  5, no overlap: 5
'aa' ->  4, no overlap: 2
'aaa' ->  3, no overlap: 1
'b' ->  0, no overlap: 0
'' ->  -1, no overlap: -1

【Discussion】:

【Solution 5】:

Assuming s and substr are non-null and non-empty:

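/* #times substr appears in s, no overlaps */
int nappear(const char *s, const char *substr)
{
    int n = 0;
    const char *p = s;

    size_t lenSubstr = strlen(substr);

    while (*p) {
        /* strncmp stops at the terminating '\0', so unlike memcmp it never
           reads past the end of s when fewer than lenSubstr chars remain */
        if (strncmp(p, substr, lenSubstr) == 0) {
            ++n;
            p += lenSubstr;
        } else 
            ++p;
    }
    return n;
}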

【Discussion】:

【Solution 6】:

/* 
 * C Program To Count the Occurrence of a Substring in String 
 */
#include <stdio.h>
#include <string.h>

char str[100], sub[100];
int count = 0, count1 = 0;

void main()
{
    int i, j, l, l1, l2;

    printf("\nEnter a string : ");
    scanf("%[^\n]s", str);

    l1 = strlen(str);

    printf("\nEnter a substring : ");
    scanf(" %[^\n]s", sub);

    l2 = strlen(sub);

    for (i = 0; i < l1;)
    {
        j = 0;
        count = 0;
        while ((str[i] == sub[j]))
        {
            count++;
            i++;
            j++;
        }
        if (count == l2)
        {
            count1++;                                   
            count = 0;
        }
        else
            i++;
    }    
    printf("%s occurs %d times in %s", sub, count1, str);
}

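【Discussion】:

Don't use global variables for no reason. void main is wrong; it should be int main. "%[^\n]s" doesn't do what you think it does; the s is not part of the % directive, so it requires a literal s in the input. You don't specify an upper bound for the input; that is a potential buffer overflow. Always check the return value of scanf if you must use it. Don't use scanf for user input. strlen returns size_t, not int. The while condition has redundant parentheses; while not an error in itself, this silences the warning gcc would give you if you misspelled == as =.
The while loop doesn't check for the end of the string and can run off the end of both str and sub if all characters match. j and count are always set together; they are effectively the same variable. Your algorithm is completely broken: it doesn't find, for example, "ab" in "aab".
In general, avoid posting code-only answers. A description of the algorithm, or an explanation of how your answer differs from the others, would help.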
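One way the points raised in these comments might be addressed is sketched below: no globals, bounded line input via fgets instead of scanf, size_t for lengths, and a character-by-character comparison that checks its bounds and always retries from the next start position. The helper names read_line and count_occurrences are hypothetical, and counting non-overlapping matches is an assumption about what the original program intended.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (at most size - 1 characters) and strips the
 * trailing newline. Returns 1 on success, 0 on end of file or error. */
int read_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Counts non-overlapping occurrences of sub in str by direct comparison.
 * Returns 0 when sub is empty or longer than str. */
size_t count_occurrences(const char *str, const char *sub)
{
    size_t str_len = strlen(str);
    size_t sub_len = strlen(sub);
    size_t count = 0;

    if (sub_len == 0 || sub_len > str_len)
        return 0;

    /* Only start positions where sub still fits are tried. */
    for (size_t i = 0; i + sub_len <= str_len; ) {
        size_t j = 0;
        while (j < sub_len && str[i + j] == sub[j])
            j++;
        if (j == sub_len) {
            count++;
            i += sub_len;   /* skip past the match */
        } else {
            i++;            /* a partial match does not consume input */
        }
    }
    return count;
}
```

A main function could declare two local char arrays, fill them with read_line(str, sizeof str, stdin) and read_line(sub, sizeof sub, stdin), check both return values, and print the result of count_occurrences(str, sub) with the %zu format. Unlike the program above, this finds "ab" in "aab", because a failed partial match advances the start position by only one character.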