187. Repeated DNA Sequences
by Botao Xiao
Question
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Thinking:
- Method: bruto force, TLE, O(N ^ 2)
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<>();
if(s == null || s.length() <= 10) return result;
int len = s.length();
LABEL:
for(int i = 0; i <= len - 11; i++){
String cmp = s.substring(i, i + 10);
if(result.contains(cmp)) continue;
for(int j = i + 1; j <= len - 10; j++){
String cmp1 = s.substring(j, j + 10);
if(cmp.equals(cmp1)){
result.add(cmp);
continue LABEL;
}
}
}
return result;
}
}
- Method 2: 滑动窗口, HashSet, O(N)
- 将所有的可能都放在一个哈希集合(seen)中。
- 如果后面出现的子串已经出现在seen中,则加入repeat中。
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Set<String> seen = new HashSet<>(), repeat = new HashSet<>();
int len = s.length();
for(int i = 0; i <= len - 10; i++){
String sub = s.substring(i, i + 10);
if(!seen.add(sub)) repeat.add(sub);
}
return new ArrayList<String>(repeat);
}
}
二刷
- 最开始没有想到用两个Set,于是用了Map来记录出现的序列。
class Solution { public List<String> findRepeatedDnaSequences(String s) { List<String> result = new LinkedList<>(); Map<String, Integer> map = new HashMap<>(); if(s == null || s.length() < 10) return result; int len = s.length(); for(int i = 0; i <= len - 10; i++){ String sub = s.substring(i, i + 10); if(!map.containsKey(sub)){ map.put(sub, 1); }else if(map.get(sub) == 1){ result.add(sub); map.put(sub, map.get(sub) + 1); }else continue; } return result; } }
Subscribe via RSS