Answers: removing duplicate characters from an array

【Question title】: Removing duplicate character from array
【Posted】: 2011-03-24 19:29:48
【Question】:

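While reading Gayle Laakmann's book Cracking the Coding Interview, I came across this question:

Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

And this code: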

public static void removeDuplicates(char[] str) {
    if (str == null) {
        return;
    }
    int len = str.length;
    if (len < 2) {
        return;
    }

    int tail = 1;

    for (int i = 1; i < len; ++i) {
        int j;
        for (j = 0; j < tail; ++j) {
            if (str[i] == str[j]) {
                break;
            }
        }
        if (j == tail) {
            str[tail] = str[i];
            ++tail;
        }
    }
    str[tail] = 0;
}

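It is supposed to remove duplicate characters from the array, but I don't understand what the algorithm is doing by replacing the same character again and again. I thought I was the only one who felt the algorithm doesn't work, but when I actually ran the code it gave me the wrong output. Is this a serious error in the book, or have I not understood the problem?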
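To see the failure concretely, here is a small harness that runs the book's loop unchanged on two inputs (the class name BookVersionDemo is just for illustration). In Java, str[tail] = 0 does not shorten the array; it only stores the char '\0', so characters after tail survive. And when the input has no duplicates, tail ends up equal to str.length, so that same write is out of bounds.

```java
import java.util.Arrays;

class BookVersionDemo {
    // The book's algorithm, copied unchanged.
    static void removeDuplicates(char[] str) {
        if (str == null) {
            return;
        }
        int len = str.length;
        if (len < 2) {
            return;
        }
        int tail = 1;
        for (int i = 1; i < len; ++i) {
            int j;
            for (j = 0; j < tail; ++j) {
                if (str[i] == str[j]) {
                    break;
                }
            }
            if (j == tail) {
                str[tail] = str[i];
                ++tail;
            }
        }
        // In C this would terminate the string; in Java it just stores '\0'.
        str[tail] = 0;
    }

    public static void main(String[] args) {
        char[] dup = "aabb".toCharArray();
        removeDuplicates(dup);
        // Contains 'a', 'b', '\0' and then a leftover 'b' from the input.
        System.out.println(Arrays.toString(dup));

        char[] unique = "abc".toCharArray();
        try {
            removeDuplicates(unique);
        } catch (ArrayIndexOutOfBoundsException e) {
            // No duplicates: tail == str.length, so str[tail] is out of bounds.
            System.out.println("no duplicates -> " + e);
        }
    }
}
```

So the compaction itself is fine; what breaks is the C-style terminator, which Java arrays do not understand.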

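【Question comments】:

Tags: java arrays algorithm

【Solution 1】:

The algorithm seems to work, but it does not clear the remaining characters. Change the code to the following and it works. Note: replace: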

str[tail] = 0;

with:

    for(; tail < len;tail++){
        str[tail] = 0;
    }

public static void removeDuplicates(char[] str) {
    if (str == null) {
        return;
    }
    int len = str.length;
    if (len < 2) {
        return;
    }

    int tail = 1;

    for (int i = 1; i < len; ++i) {
        int j;
        for (j = 0; j < tail; ++j) {
            if (str[i] == str[j]) {
                break;
            }
        }

        if (j == tail) {
            str[tail] = str[i];
            ++tail;
        }

    }
    for (; tail < len; tail++) {
        str[tail] = 0;
    }

}

【Comments】:

For char[] str = {'a','a'}; it gives [a, ]

【Solution 2】:

A solution using a bit vector.

Time: O(n), where n = length of the string

Space: O(1)
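The C code below relies on the terminating '\0' of a C string. As a sketch of the same bit-vector idea in Java, here is a port that loops over str.length and returns tail (the number of unique characters) instead of writing a terminator; that return value is an adaptation, not part of the original answer. Like the C version, it assumes every character is a lowercase letter 'a' to 'z', so each letter maps to one bit of an int. The class name BitVectorDedup is hypothetical.

```java
class BitVectorDedup {
    // Assumes only 'a'..'z': each letter maps to bit (c - 'a') of an int.
    // Returns n; the unique characters then occupy str[0..n).
    static int removeDuplicates(char[] str) {
        int checker = 0; // bit k is set once ('a' + k) has been seen
        int tail = 0;    // next write position for a unique character
        for (int i = 0; i < str.length; i++) {
            int bit = 1 << (str[i] - 'a');
            if ((checker & bit) == 0) {
                // First occurrence: keep it and mark it as seen.
                str[tail++] = str[i];
                checker |= bit;
            }
        }
        return tail;
    }

    public static void main(String[] args) {
        char[] s = "banana".toCharArray();
        int n = removeDuplicates(s);
        System.out.println(new String(s, 0, n)); // prints "ban"
    }
}
```

The single pass is what gives O(n) time; the int used as a bit set is the O(1) space, which only works because the alphabet is assumed to have at most 32 symbols.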

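/* Assumes str contains only lowercase letters 'a'..'z', so each letter
   maps to one of the low 26 bits of checker. */
void removeduplicatas(char str[]){
    int i = 0, tail = 0;
    int checker = 0;   /* bit k is set once ('a' + k) has been seen */
    int bitvalue, value;
    while(str[i]){
        value = str[i] - 'a';
        bitvalue = 1 << value;
        if((checker & bitvalue) == 0 ){
            /* first occurrence: keep it and mark it as seen */
            str[tail++] = str[i];
            checker |= bitvalue;
        }
        i++;
    }
    str[tail] = '\0';  /* terminate the compacted string */
}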

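【Comments】:

【Solution 3】:

In Java, arrays have a fixed size, so the called function cannot change the size of the input array when it finds duplicates. Your function only writes 0 at the start index of the sub-array that holds the duplicates. So when you print the array contents in the calling function, the element that was set to 0 does not show up, but the elements after it (if any) are printed.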

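YoK's answer sets every element after the unique characters to 0, so when you print the array in the calling function the duplicates are not printed. But remember that the size of the array is still unchanged.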
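Because the array keeps its length, a caller that wants only the unique characters has to stop at the first '\0' itself. A minimal sketch, assuming the input never contains '\0' as a real character; the helper name uniquePrefix is hypothetical.

```java
class ZeroPadded {
    // Returns the characters before the first '\0', i.e. the unique prefix
    // left behind by a version that zero-fills the rest of the array.
    static String uniquePrefix(char[] str) {
        int n = 0;
        while (n < str.length && str[n] != '\0') {
            n++;
        }
        return new String(str, 0, n);
    }

    public static void main(String[] args) {
        // What the zero-filling version leaves behind for the input "aabb".
        char[] result = {'a', 'b', '\0', '\0'};
        // prints "ab (array length still 4)"
        System.out.println(uniquePrefix(result) + " (array length still " + result.length + ")");
    }
}
```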

Alternatively, you can return the size of the sub-array that holds the unique characters; in your case, that is tail.
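A sketch of that alternative: the same nested-loop algorithm as the question, but it returns tail instead of writing a terminator, so it also no longer goes out of bounds when there are no duplicates. Returning 0 for a null array is an assumption made here, and the class name ReturnLength is hypothetical.

```java
class ReturnLength {
    // Compacts the unique characters into str[0..tail) and returns tail.
    static int removeDuplicates(char[] str) {
        if (str == null) {
            return 0; // assumption: treat null as an empty string
        }
        if (str.length < 2) {
            return str.length; // 0 or 1 chars cannot contain duplicates
        }
        int tail = 1;
        for (int i = 1; i < str.length; ++i) {
            int j;
            // Is str[i] already among the unique chars str[0..tail)?
            for (j = 0; j < tail; ++j) {
                if (str[i] == str[j]) {
                    break;
                }
            }
            if (j == tail) {
                str[tail] = str[i];
                ++tail;
            }
        }
        return tail;
    }

    public static void main(String[] args) {
        char[] s = "aabbc".toCharArray();
        int n = removeDuplicates(s);
        System.out.println(new String(s, 0, n)); // prints "abc"
    }
}
```

The caller then uses only the first n elements, for example via new String(s, 0, n), and ignores whatever is left past that index.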

Another alternative is to pass the input as a StringBuffer and make the changes in place:

public static void removeDuplicates(StringBuffer str) {
    int len = str.length();

    // If the string has fewer than 2 characters, it can't have duplicates.
    if (len < 2) {
        return;
    }

    // The first character will never be a duplicate.
    // tail is the index where the next unique character goes.
    int tail = 1;

    // Iterate from the 2nd character.
    for (int i = 1; i < len; ++i) {
        int j;

        // Is the char at index i already in the list of unique chars?
        for (j = 0; j < tail; ++j) {
            if (str.charAt(i) == str.charAt(j)) {
                break;
            }
        }

        // If not, add it to the list of unique chars.
        if (j == tail) {
            str.setCharAt(tail, str.charAt(i));

            // Increment tail, as we just added a new element.
            ++tail;
        }
    }
    // At this point the characters in [0, tail) are unique;
    // any duplicates are in [tail, str.length()),
    // so truncate the buffer to length tail.
    str.setLength(tail);
}
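StringBuilder offers the same charAt, setCharAt and setLength methods as StringBuffer without the synchronization, so the same idea can be written against it. A compact sketch, with a hypothetical class name, that starts tail at 0 so no special case for short strings is needed:

```java
class StringBuilderDedup {
    // Same in-place algorithm as the StringBuffer answer, using StringBuilder.
    static void removeDuplicates(StringBuilder str) {
        int tail = 0; // str[0..tail) holds the unique characters found so far
        for (int i = 0; i < str.length(); ++i) {
            char c = str.charAt(i);
            boolean seen = false;
            for (int j = 0; j < tail && !seen; ++j) {
                seen = str.charAt(j) == c;
            }
            if (!seen) {
                str.setCharAt(tail++, c);
            }
        }
        str.setLength(tail); // drop everything after the unique prefix
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("mississippi");
        removeDuplicates(sb);
        System.out.println(sb); // prints "misp"
    }
}
```

setCharAt never changes the length, so reading str.length() in the loop condition is safe; only the final setLength shrinks the builder.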


【Comments】:

【Solution 4】:

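Here is a solution in C++ that uses recursion to walk through each character of the string and applies the bit-vector method above, with a fixed-width integer as the bit set. You need to make sure the fixed-width integer has more bits than the k distinct characters you need to check. Note that this version only checks whether all characters are unique; it does not remove duplicates.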
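The recursion below just carries checker forward from one character to the next, so it amounts to a loop. As a sketch, the same check written iteratively in Java, assuming lowercase input 'a' to 'z' (26 letters fit in the 32 bits of an int); the class and method names are hypothetical. Like the C++ version, it only reports whether a duplicate exists.

```java
class UniqueCheck {
    // Returns true if no lowercase letter occurs twice in s.
    static boolean hasAllUniqueChars(String s) {
        int checker = 0; // bit k is set once ('a' + k) has been seen
        for (int i = 0; i < s.length(); i++) {
            int bit = 1 << (s.charAt(i) - 'a');
            if ((checker & bit) != 0) {
                return false; // this letter was already seen
            }
            checker |= bit;
        }
        return true;
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "abcdef";
        System.out.println(hasAllUniqueChars(input)
                ? "all characters are unique"
                : "there are duplicate characters");
    }
}
```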

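#include <cstdint>
#include <iostream>

// Returns true if no letter occurs twice in string[index..].
// checker is a bit vector: bit k is set once ('a' + k) has been seen.
bool CheckUniqueChars(const char *string, uint32_t index, uint32_t checker) {
    char character = string[index];

    if (character == '\0') {
        return true;  // reached the end without finding a repeat
    }

    int value = character - 'a';  // assumes 'a' <= character <= 'z'
    if ((checker & (1u << value)) != 0) {
        return false;  // this letter was already seen
    }
    checker |= (1u << value);
    return CheckUniqueChars(string, index + 1, checker);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "usage: pass the string to check as the first argument" << std::endl;
        return 1;
    }

    const char *string = argv[1];
    uint32_t idx = 0, checker = 0;

    if (CheckUniqueChars(string, idx, checker)) {
        std::cout << "all characters are unique" << std::endl;
    } else {
        std::cout << "there are duplicate characters" << std::endl;
    }

    return 0;
}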

【Comments】:

【Solution 5】:

I modified the code given by YoK to avoid using

for(; tail < len;tail++){
       str[tail] = 0;
}

Instead, we can write the zeros inside the main loop itself.

public static void removeDuplicates(char[] str){
    if (str == null) {
        return;
    }
    int len = str.length;
    if (len < 2) {
        return;
    }

    int tail = 1;

    for(int i=1;i<len;++i){
        int j;
        for(j=0;j<tail;++j){
            if(str[i] == str[j]) break;
        }
        if(j==tail){
            str[tail] = str[i];
            if(i!=tail)str[i]=0;
            ++tail;
        }else{
            str[i]=0;
        }

    }
}

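【Comments】:

The algorithm given in that book really doesn't work. Other answers, such as YoK's, have already corrected it. I improved YoK's answer to avoid the extra for loop and edited my answer accordingly. Thanks.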