All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Original problem: https://oj.leetcode.com/problems/repeated-dna-sequences/
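As a point of reference for the expected output, here is a minimal counting sketch. It is not the method analysed in this post: it simply counts every 10-letter window in a std::map. The function name `repeatedByCounting` and its `len` parameter are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical reference helper: count every window of length `len`
// and report the windows seen more than once, in sorted order.
std::vector<std::string> repeatedByCounting(const std::string& s,
                                            std::size_t len = 10) {
    std::map<std::string, int> seen;
    for (std::size_t pos = 0; pos + len <= s.length(); ++pos) {
        ++seen[s.substr(pos, len)];
    }

    std::vector<std::string> result;
    for (const auto& entry : seen) {
        if (entry.second > 1) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

For the input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns "AAAAACCCCC" and "CCCCCAAAAA".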
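Straightforward method (TLE: Time Limit Exceeded)
Algorithm analysis
Match strings directly. Build a next array that stores, for each character of the string, the position of its next occurrence; when scanning from a position, the candidates to compare against are the positions reached through the next array.
To keep things simple, consider strings of length 4.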
case1:
src A C G T A C G T
next [4] [5] [6] [7] [-1] [-1] [-1] [-1]
Matching the string ACGT then only requires comparing the 3 characters after position 0 with the 3 characters after next[0].
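The next array of case 1 can be reproduced with a short sketch. The helper name `buildNext` is hypothetical, and -1 stands for "no later occurrence", as in the tables here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: next[i] is the index of the next occurrence of
// s[i] after position i, or -1 if the character does not occur again.
std::vector<int> buildNext(const std::string& s) {
    std::vector<int> next(s.length(), -1);
    for (std::size_t i = 0; i < s.length(); ++i) {
        const std::size_t found = s.find(s[i], i + 1);
        if (found != std::string::npos) {
            next[i] = static_cast<int>(found);
        }
    }
    return next;
}
```

Calling `buildNext("ACGTACGT")` yields 4 5 6 7 -1 -1 -1 -1, which matches the table of case 1.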
case2:
src A C G T A A C G T
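next [4] [6] [7] [8] [5] [-1] [-1] [-1] [-1]
The character A has several successors, so all of them have to be tried: when the match at next[0] fails, the match at next[next[0]] must be tried as well.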
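Following the chain of successors can be sketched as below. `buildNext` and `firstLaterMatch` are hypothetical helpers, `len` is the sequence length (4 in these cases), and a return value of -1 means that no later copy was found.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: next[i] is the next occurrence of s[i], or -1.
std::vector<int> buildNext(const std::string& s) {
    std::vector<int> next(s.length(), -1);
    for (std::size_t i = 0; i < s.length(); ++i) {
        const std::size_t found = s.find(s[i], i + 1);
        if (found != std::string::npos) {
            next[i] = static_cast<int>(found);
        }
    }
    return next;
}

// Walk the chain next[pos], next[next[pos]], ... and return the first
// later position whose `len` characters equal those starting at pos.
int firstLaterMatch(const std::string& s, const std::vector<int>& next,
                    std::size_t pos, std::size_t len) {
    if (len == 0 || pos + len > s.length()) {
        return -1;
    }
    for (int cand = next[pos]; cand != -1; cand = next[cand]) {
        const std::size_t c = static_cast<std::size_t>(cand);
        if (c + len > s.length()) {
            break;  // candidates only move right, so none can fit any more
        }
        // The first characters are equal by construction; compare the rest.
        if (s.compare(pos + 1, len - 1, s, c + 1, len - 1) == 0) {
            return cand;
        }
    }
    return -1;
}
```

For "ACGTAACGT" and position 0, the candidate at index 4 ("AACG") fails and the candidate at index 5 ("ACGT") succeeds, so the call returns 5.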
case3:
src A A A A A A
next [1] [2] [3] [4] [5] [-1]
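With repeated characters, once the match at next[0] succeeds, the position next[0] can be marked as already matched: the length-4 string starting there is known to be a repeat and need not be used as a starting point again. The mark should be kept separate from the chain. Setting next[next[0]] to -1 would also cut the chain for every other start position that passes through next[0], and later repeats could be missed. This only reduces duplicate matches and does not eliminate them, so a set is still needed to store the successful matches and remove duplicates.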
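A small sketch of how the pruning and the set fit together, for an arbitrary length `len`. It assumes a separate `done` flag for the pruning rather than overwriting entries of next, since overwriting would also shorten the chain seen by other start positions. The function name `repeatedSubstrings` is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: all substrings of length `len` that occur more
// than once in s, found by walking the next chains.
std::set<std::string> repeatedSubstrings(const std::string& s,
                                         std::size_t len) {
    const std::size_t n = s.length();
    std::set<std::string> result;  // the set removes duplicate matches
    if (len == 0 || n <= len) {
        return result;
    }

    // next[i]: next occurrence of s[i], or npos if there is none.
    std::vector<std::size_t> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        next[i] = s.find(s[i], i + 1);
    }

    // done[i]: the string starting at i is already known to be a repeat.
    std::vector<bool> done(n, false);
    for (std::size_t pos = 0; pos + len <= n; ++pos) {
        if (done[pos]) {
            continue;  // pruned: an earlier position already matched here
        }
        for (std::size_t cand = next[pos];
             cand != std::string::npos && cand + len <= n;
             cand = next[cand]) {
            if (s.compare(pos + 1, len - 1, s, cand + 1, len - 1) == 0) {
                result.insert(s.substr(pos, len));
                done[cand] = true;
                break;  // one later copy is enough for this position
            }
        }
    }
    return result;
}
```

For case 3 ("AAAAAA", length 4), position 0 matches at position 1, position 1 is then skipped, and the result is the single string "AAAA".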
Time complexity
Building the next array costs O(n^2) and the traversal costs O(n^2), so the total time complexity is O(n^2).
Code implementation
#include <cstddef>
#include <set>
#include <string>
#include <vector>

class Solution {
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s);
};

std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
    const std::size_t len = 10;
    std::vector<std::string> rel;

    // A string of 10 or fewer characters cannot contain a repeat.
    if (s.length() <= len) {
        return rel;
    }

    const std::size_t n = s.length();

    // next[pos]: position of the next occurrence of s[pos], or npos if none.
    // A local vector replaces the raw member pointer, which was deleted
    // uninitialized on the early return and leaked on repeated calls.
    std::vector<std::size_t> next(n);
    for (std::size_t pos = 0; pos < n; ++pos) {
        next[pos] = s.find(s[pos], pos + 1);
    }

    // done[pos]: the sequence starting at pos is already known to be a repeat.
    // A separate flag keeps the next chain intact for other start positions;
    // overwriting next[nextPos] with npos would cut their chains and lose matches.
    std::vector<bool> done(n, false);
    std::set<std::string> tmpRel;  // removes duplicate results

    for (std::size_t pos = 0; pos + len <= n; ++pos) {
        if (done[pos]) {
            continue;
        }
        for (std::size_t nextPos = next[pos];
             nextPos != std::string::npos && nextPos + len <= n;
             nextPos = next[nextPos]) {
            // The first characters already match; compare the remaining 9.
            if (s.compare(pos + 1, len - 1, s, nextPos + 1, len - 1) == 0) {
                tmpRel.insert(s.substr(pos, len));
                done[nextPos] = true;
                break;
            }
        }
    }

    rel.assign(tmpRel.begin(), tmpRel.end());
    return rel;
}