KMP

Introduction

KMP算法的关键是利用匹配失败后的信息,尽量减少模式串与主串的匹配次数以达到快速匹配的目的。具体实现就是实现一个next()函数,函数本身包含了模式串的局部匹配信息。时间复杂度O(m+n)。

理解

  1. next数组的含义:表示了匹配串的最长前缀的位置
    • abcjkdabc的最长相同前后缀是abc,这意味着在最后一个字符处出现了不匹配可以直接从j处开始新的匹配。
    • 这样就能避免很多的重复匹配。

实现

package ca.mcmaster.chapter.five.string;
public class Kmp {
    // next array represents the index of pattern when current failure exists.
	private static int[] getNext(String pattern){
		char[] arr = pattern.toCharArray();
		int len = pattern.length();
		int[] next = new int[len];  // Create a next array used to hold all indexes when current character is failed.
		next[0] = -1;   // For the first character, we define -1 since if first character is wrong, then we have no where to go.
		for(int i = 1, j = -1; i < len; i++){ 
		// i represents the index of character in pattern string. next[i] saves the value when i index isn't correct, the cursor won't go back to original position of pattern string but go to next[i] position of pattern and check from that position.
			while(j > -1 && arr[j + 1] != arr[i]){ 
			// j > -1 means we have reached the the begining of pattern.
			// arr[j + 1] != arr[i] means current character doesn't match index at i, we will try to go back to the index saved in next.
				j = next[j];
			}
			if(arr[i] == arr[j + 1]) j++;   // Check the reason of breaking the while loop, current is because we find same pattern.
			next[i] = j;    // Save the pattern.
		}
		return next;
	}
	public static int kmp(String text, String pattern){
		int[] next = getNext(pattern);
		char[] tArr = text.toCharArray();
		char[] pArr = pattern.toCharArray();
		int tLen = text.length();
		int pLen = pattern.length();
		int j = -1;
		for(int i = 0; i < tLen; i++){
			while(j > -1 && tArr[i] != pArr[j + 1]){
				j = next[j];
			}
			if(tArr[i] == pArr[j + 1]) j++;
			if(j == pLen - 1) return i - pLen + 1;
		}
		return -1;
	}
	public static void main(String[] args) {
		String a = "ababaca";
		System.out.println(kmp("bacbababadababacambabacaddababacasdsd", a));
	}
}