Friday, November 4, 2011

Remove Duplicate Characters

Problem:
Given a string, remove the duplicate characters in the string.

Assumption:
All the characters are from ASCII set and in the range 0~255. Also, the character null character does not appear in the string.

Requirement:
An extra copy of the array is not allowed.

Similar to last post about approaches to deciding unique characters or not, this can be done in place or with an array of fixed size.

Note:
Although C++ algorithm library has unique function, which may seem appropriate for this problem. However, that unique function just removes the consecutive duplicates.

We may also sort the string first if we can remove any duplicates, not necessarily keeping the first one in appearance order. In this way, we may get to time complexity \(O(n\log n)\). Also, this method would destroy the order of unique characters. This solution is not shown in this post.

Solution 1:
For each character in the string and unique, detect all duplicates in the string afterwards. Mark those duplicates and remove them in the end. Suppose the string is given as a char array, otherwise we may need convert a String to StringBuffer, StringBuilder, or CharArray.

Solution 2:
For each character in the string, first determine if it's duplicate. If yes, simply skip it; if not, keep it in the string.

Solution 3:
An extra array of fixed size can be used to record which characters have appeared. This can reduce the time complexity from \(O(n^2)\) to \(O(n)\). Also, we can have boolean array or bitset approaches. Each of them can be applied to the above 2 schemes and we can have 4 different algorithms. Here, I am showing boolean array with Scheme 1 in this solution, and bitset with Scheme 2 for next solution.

Solution 4:
Use bitset with the above Scheme 2 to remove the duplicates.

No comments:

Post a Comment