Programming, Algorithm and Technical Tips: A More Efficient Algorithm for Longest Increasing Subsequence

Monday, November 14, 2011

A More Efficient Algorithm for Longest Increasing Subsequence

Problem:
Given an array of numbers or a string of length \(n\), find a longest increasing subsequence.
In my previous post, I discussed the dynamic programming approach with time complexity \(O(n^2)\). For this specific problem, we can have a more efficient algorithm which is also based on dynamic programming. Please see Wikipedia for reference.

The algorithm uses only arrays and binary searching. It processes the sequence elements in order, maintaining the longest increasing subsequence found so far. Denote the sequence values as \(X[1]\), \(X[2]\), etc. Then, after processing \(X[i]\), the algorithm will have stored values in two arrays:

\(M[j]\) — stores the position \(k\) of the smallest value \(X[k]\) such that there is an increasing subsequence of length \(j\) ending at \(X[k]\) on the range \(k ≤ i\) (note we have \(j ≤ k ≤ i\) here).

\(P[k]\) — stores the position of the predecessor of \(X[k]\) in the longest increasing subsequence ending at \(X[k]\).

In addition the algorithm stores a variable \(L\) representing the length of the longest increasing subsequence found so far.

Note that, at any point in the algorithm, the sequence \[X[M[1]], X[M[2]], ..., X[M[L]]\] is nondecreasing. For, if there is an increasing subsequence of length \(i\) ending at \(X[M[i]]\), then there is also a subsequence of length \(i-1\) ending at a smaller value: namely the one ending at \(X[P[M[i]]]\). Thus, we may do binary searches in this sequence in logarithmic time.

The algorithm, then, proceeds as follows.

L = 0
for i = 1, 2, ... n:
   binary search for the largest positive j ≤ L 
     such that X[M[j]] < X[i] (or set j = 0 if no such value exists)
   P[i] = M[j]
   if j == L or X[i] < X[M[j+1]]:
      M[j+1] = i
      L = max(L, j+1)

The result of this is the length of the longest sequence in \(L\). The actual longest sequence can be found by backtracking through the \(P\) array: the last item of the longest sequence is in \(X[M[L]]\), the second-to-last item is in \(X[P[M[L]]]\), etc. Thus, the sequence has the form \[..., X[P[P[M[L]]]], X[P[M[L]]], X[M[L]].\]
Because the algorithm performs a single binary search per sequence element, its total time can be expressed using Big \(O\) notation as \(O(n\log n)\).

References: