All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

 

Original problem: https://oj.leetcode.com/problems/repeated-dna-sequences/

 

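Straightforward method (TLE, Time Limit Exceeded)

Algorithm analysis

Match the strings directly. Build a next array that stores, for each letter in the string, the position at which the same letter next occurs. During the scan, the next array supplies the candidate positions where a match could start.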
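As a rough illustration of the next array, here is a minimal sketch. The helper name `buildNext` is an assumption made for this example, and `std::string::npos` plays the role of the -1 entries used in the cases that follow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: next[i] is the index of the next occurrence of s[i]
// after position i, or std::string::npos if the letter does not occur again.
std::vector<std::size_t> buildNext(const std::string& s) {
    std::vector<std::size_t> next(s.size(), std::string::npos);
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        // Search for the same letter, starting one position to the right.
        next[pos] = s.find(s[pos], pos + 1);
    }
    return next;
}
```

For "ACGTACGT" the first four entries come out as 4, 5, 6, 7 and the remaining four as `npos`.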

 

To keep the examples short, consider strings of length 4.

 

case1:

src A C G T A C G T

next [4] [5] [6] [7] [-1] [-1] [-1] [-1]

 

To match the string ACGT, it is then enough to compare the 3 letters after position 0 with the 3 letters after position next[0].

 

case2:

src A C G T A A C G T

next [4] [6] [7] [8] [5] [-1] [-1] [-1] [-1]

 

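When the letter A has several successors, every one of them has to be tried: if the match at next[0] fails, the match at next[next[0]] must be tried as well.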
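The walk along the chain of successors could look roughly like the sketch below. The function name `firstRepeat` and the length parameter `k` are assumptions for illustration (with `k` at least 1); the problem itself uses a fixed length of 10.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the first position after `start` at which the
// k-letter substring beginning at `start` occurs again, or npos if there is none.
std::size_t firstRepeat(const std::string& s, std::size_t start, std::size_t k) {
    const std::size_t npos = std::string::npos;
    if (start + k > s.size()) return npos;  // no k-letter substring starts here

    // Successor table: next[i] is the next index holding the same letter as s[i].
    std::vector<std::size_t> next(s.size(), npos);
    for (std::size_t i = 0; i < s.size(); ++i) next[i] = s.find(s[i], i + 1);

    // Follow next[start], next[next[start]], ... until a candidate matches.
    for (std::size_t cand = next[start]; cand != npos; cand = next[cand]) {
        if (cand + k > s.size()) break;  // later candidates cannot fit k letters
        if (s.compare(start, k, s, cand, k) == 0) return cand;
    }
    return npos;
}
```

With the string from case 2, "ACGTAACGT" and k = 4, the candidate at position 4 ("AACG") fails and the next candidate, position 5, matches.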

 

case3:

src A A A A A A

next [1] [2] [3] [4] [5] [-1]

 

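Repeats: once the match at next[0] succeeds, the length-4 string starting at next[0] is already known to repeat and does not need to be matched again when the scan reaches that position. Setting next[next[0]] to -1 is one way to record this, but it also cuts the successor chain for other start positions whose chain passes through next[0], so they can miss later matches; a separate flag per position is safer. Either way, this only reduces duplicate matches and does not eliminate them, so a set is still needed to store the successful matches and remove duplicates.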
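To see why a set is still needed, the sketch below collects matches for case 3 both as a raw list and through a `std::set`. It uses a plain double loop rather than the next array, and the names `Hits` and `collectRepeats` are assumptions made for this example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical container: every reported match, and the same matches deduplicated.
struct Hits {
    std::vector<std::string> raw;
    std::set<std::string> unique;
};

// For each start position, report its k-letter substring if it occurs again later.
Hits collectRepeats(const std::string& s, std::size_t k) {
    Hits h;
    if (k == 0 || s.size() < k) return h;
    for (std::size_t i = 0; i + k <= s.size(); ++i) {
        for (std::size_t j = i + 1; j + k <= s.size(); ++j) {
            if (s.compare(i, k, s, j, k) == 0) {
                h.raw.push_back(s.substr(i, k));
                h.unique.insert(s.substr(i, k));
                break;  // one later occurrence is enough for this start
            }
        }
    }
    return h;
}
```

For "AAAAAA" with k = 4, start positions 0 and 1 both report "AAAA", so the raw list holds two entries while the set holds one.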

 

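Time complexity

Building the next array is bounded by O(n^2) (with only four letters, each search stops at the next occurrence of the same letter, so the total work is actually linear in n). The scan itself is O(n^2) in the worst case, so the total time complexity is O(n^2).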

 

Implementation

#include <cstddef>
#include <set>
#include <string>
#include <vector>

class Solution {
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s);
};

std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
    const std::size_t len = 10;
    const std::size_t n = s.length();

    // A string of 10 letters or fewer cannot contain a repeated 10-letter substring.
    if (n <= len) {
        return {};
    }

    // next[pos]: index of the next occurrence of s[pos] after pos, or npos.
    std::vector<std::size_t> next(n);
    for (std::size_t pos = 0; pos < n; ++pos) {
        next[pos] = s.find(s[pos], pos + 1);
    }

    // done[pos]: the substring starting at pos is already known to repeat.
    // A separate flag is used instead of overwriting next[], which would cut
    // the successor chain for other start positions.
    std::vector<bool> done(n, false);
    std::set<std::string> found;  // removes any remaining duplicates

    for (std::size_t pos = 0; pos + len <= n; ++pos) {
        if (done[pos]) {
            continue;
        }
        // Try every later position that starts with the same letter.
        for (std::size_t nextPos = next[pos];
             nextPos != std::string::npos && nextPos + len <= n;
             nextPos = next[nextPos]) {
            if (s.compare(pos, len, s, nextPos, len) == 0) {
                found.insert(s.substr(pos, len));
                done[nextPos] = true;
                break;
            }
        }
    }

    return std::vector<std::string>(found.begin(), found.end());
}