Capture duplicates from array without using any inbuilt functions: answers

【Question title】: Capture duplicates from array without using any inbuilt functions
【Posted】: 2021-03-02 05:17:46
【Question description】:

My task is to write my own implementation that removes duplicate objects from an array. The array is not sorted.

For example, I have this array of objects

ItemsList[] objects = {
                new ItemsList("ob1"),
                new ItemsList("ob2"),
                new ItemsList("ob2"),
                new ItemsList("ob1"),
                new ItemsList("ob3")
        };

"ob1" stands for itemId

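My goal is to get a result array like ["ob1", "ob2", "ob3"], but I get a NullPointerException when I try to find the objects that are not duplicated and add them to the array.

Note: I cannot use Set, HashSet, ArrayList, Arrays.copyOf, sort, etc., or any other tools such as iterators.

So far I have done this: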

public String[] removeDuplicates(ItemsList[] objects) {
        String[] noDubs = new String[objects.length];
        int len = objects.length;
        int pos = 0;

        for (int i = 0; i < len; i++) {
            for (int j = i + 1; j < len; j++) {
                if (objects[i].getItemId().equals(objects[j].getItemId())) {
                    noDubs[pos] = objects[i].getItemId();
                    pos++;
                }
                else {
                    //NullPointerException given
                    if (!objects[i].getItemId().equals(objects[j].getItemId()) && !objects[i].getItemId().contains(noDubs[i])) {
                        noDubs[pos] = objects[i].getItemId();
                        pos++;
                    }
                }
            }

        }
        String[] result = new String[pos];
        for(int k = 0; k < pos; k++) {
            result[k] = noDubs[k];
        }
        return result;
    }

getItemId is a method of the ItemsList class

【Question discussion】:

Tags: java arrays

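【Solution 1】:

Here is another option. It copies each entry into a buffer the first time it is found, then copies the buffer into an array of the correct size.

There is an advantage to looking for duplicates in the buffer rather than in the original array: if there are many duplicates, searching the buffer takes fewer checks than searching the original array.

I pulled the loop that checks whether an item is already in the buffer out into a separate function. This avoids nested for loops, which makes the code easier to read.

I think this overall approach also reduces the number of variables that have to be tracked, which helps readability as well.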

private static ItemsList[] removeDuplicates(ItemsList[] arr) {
    ItemsList[] buffer = new ItemsList[arr.length];
    int bufferLength = 0;
    
    for (ItemsList candidate : arr) {
        if (!isInBuffer(candidate, buffer, bufferLength)) {
            buffer[bufferLength] = candidate;
            bufferLength++;
        }
    }   
    
    ItemsList[] result = new ItemsList[bufferLength];
    for (int i = 0; i < bufferLength; i++) {
        result[i] = buffer[i];
    }
    return result;
}

// Needs import java.util.Objects; ItemsList must override equals() for duplicates to be detected.
private static boolean isInBuffer(ItemsList candidate, ItemsList[] buffer, int bufferLength) {
    for(int i = 0; i < bufferLength; i++) {
        if (Objects.equals(candidate, buffer[i])) {
            return true;
        }
    }
    return false;
}
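
To try the buffer approach end to end, here is a minimal self-contained sketch. It rests on a few assumptions that are not in the answer above: the `ItemsList` class shown is only a guessed stand-in for the one in the question, the class name `BufferDedupDemo` and the method name `uniqueIds` are made up for illustration, the input contains no null elements, and the result is returned as `String[]` of ids because that is what the question asks for. Equality is compared by hand instead of through `Objects.equals`, in case that counts as a disallowed helper.

```java
// Assumed stand-in for the ItemsList class from the question.
class ItemsList {
    private final String itemId;

    ItemsList(String itemId) {
        this.itemId = itemId;
    }

    String getItemId() {
        return itemId;
    }

    // Without this override, equals() compares references and no duplicate is ever found.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ItemsList)) return false;
        ItemsList that = (ItemsList) o;
        return itemId == null ? that.itemId == null : itemId.equals(that.itemId);
    }

    @Override
    public int hashCode() {
        return itemId == null ? 0 : itemId.hashCode();
    }
}

class BufferDedupDemo {
    // Same buffer idea as above, but returning the ids instead of the objects.
    static String[] uniqueIds(ItemsList[] arr) {
        ItemsList[] buffer = new ItemsList[arr.length];
        int bufferLength = 0;
        for (ItemsList candidate : arr) {
            if (!isInBuffer(candidate, buffer, bufferLength)) {
                buffer[bufferLength++] = candidate;
            }
        }
        // Copy only the filled part of the buffer into a right-sized array.
        String[] result = new String[bufferLength];
        for (int i = 0; i < bufferLength; i++) {
            result[i] = buffer[i].getItemId();
        }
        return result;
    }

    // Assumes candidate is not null.
    static boolean isInBuffer(ItemsList candidate, ItemsList[] buffer, int bufferLength) {
        for (int i = 0; i < bufferLength; i++) {
            if (candidate.equals(buffer[i])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ItemsList[] objects = {
            new ItemsList("ob1"), new ItemsList("ob2"), new ItemsList("ob2"),
            new ItemsList("ob1"), new ItemsList("ob3")
        };
        // Prints ob1, ob2, ob3, one per line.
        for (String id : uniqueIds(objects)) {
            System.out.println(id);
        }
    }
}
```

Because the first occurrence of each id goes into the buffer, the result keeps the order in which ids first appear in the input.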

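【Discussion】:

【Solution 2】:

Here is one implementation. It marks an item as duplicated if it equals one of the elements after it. Items that are not marked are then appended to the output array, which has to grow by one each time.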

    public String[] removeDuplicates(ItemList[] items) {
        String[] output = {};
        for (int i = 0; i < items.length; i++) {
            boolean isDuplicated = false;
            ItemList current = items[i];            
            if (current == null)
                throw new RuntimeException("item can not be null");
            for (int j = i + 1; j < items.length; j++) {
                if (current.equals(items[j])) {
                    isDuplicated = true;
                    break;
                }
            }
            if (!isDuplicated) {
                String[] temp = new String[output.length + 1];
                System.arraycopy(output, 0, temp, 0, output.length);
                temp[output.length] = current.getItemId();
                output = temp;
            }
        }
        return output;
    }

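You can also collect the ids seen so far in a temporary array, skip every item whose id is already there, and finally copy the collected ids into the output array, like this: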

    public String[] removeDuplicates2(ItemList[] items) {
        String[] temp = new String[items.length];
        int inTemp = 0;
        for (ItemList item : items) {
            boolean isDuplicated = false;
            if (item == null)
                throw new RuntimeException("item can not be null");
            for (int i = 0; i < inTemp; i++) {
                if (item.getItemId().equals(temp[i])) {
                    isDuplicated = true;
                    break;
                }
            }
            if (!isDuplicated) temp[inTemp++] = item.getItemId();
        }
        String[] output = new String[inTemp];
        System.arraycopy(temp, 0, output, 0, inTemp);
        return output;
    }

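But note that the first solution is faster

【Discussion】:

If you don't want to use the System.arraycopy statement, replace it with another loop.
Is there any other way I can do this without arraycopy? :)
Yes, replace this line of code: System.arraycopy(output, 0, temp, 0, output.length); with for (int j = 0; j < output.length; j++) temp[j] = output[j];. But I would not recommend it, because of stackoverflow.com/questions/18638743/…
Oh, thank you. I just wasn't sure whether I was allowed to use it, which is why I asked for another way :D
That's understandable if this is an educational exercise. Don't do this in production! ;)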
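
As a sketch of what the comments describe, here is removeDuplicates2 with the System.arraycopy call replaced by a hand-written loop. The helper name `copyPrefix`, the class name `LoopCopyDemo`, and the small `ItemList` class are assumptions added only so the example compiles on its own. It also assumes every item has a non-null id.

```java
// Assumed minimal shape of ItemList, for illustration only.
class ItemList {
    private final String itemId;

    ItemList(String itemId) {
        this.itemId = itemId;
    }

    String getItemId() {
        return itemId;
    }
}

class LoopCopyDemo {
    // Hypothetical helper: copies the first `count` entries of src into a new array by hand.
    static String[] copyPrefix(String[] src, int count) {
        String[] dst = new String[count];
        for (int i = 0; i < count; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    static String[] removeDuplicates2(ItemList[] items) {
        String[] temp = new String[items.length];
        int inTemp = 0;
        for (ItemList item : items) {
            if (item == null) {
                throw new RuntimeException("item can not be null");
            }
            boolean isDuplicated = false;
            // Only the first inTemp slots of temp are filled, so only those are checked.
            for (int i = 0; i < inTemp; i++) {
                if (item.getItemId().equals(temp[i])) {
                    isDuplicated = true;
                    break;
                }
            }
            if (!isDuplicated) {
                temp[inTemp++] = item.getItemId();
            }
        }
        // Plain loop instead of System.arraycopy(temp, 0, output, 0, inTemp).
        return copyPrefix(temp, inTemp);
    }

    public static void main(String[] args) {
        ItemList[] items = {
            new ItemList("ob1"), new ItemList("ob2"), new ItemList("ob2"),
            new ItemList("ob1"), new ItemList("ob3")
        };
        // Prints ob1, ob2, ob3, one per line.
        for (String id : removeDuplicates2(items)) {
            System.out.println(id);
        }
    }
}
```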

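【Solution 3】:

It is better to use an intermediate boolean array to track duplicates and to determine the length of the result array. This helps detect multiple duplicates when the same element may occur more than twice.

In addition, we need to make sure that the equals method (and possibly hashCode) is properly overridden in ItemList: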

// class ItemList
public boolean equals(Object o) {
    if (null == o || !(o instanceof ItemList)) {
        return false;
    }
    if (o == this) return true;
    ItemList that = (ItemList) o;
    return Objects.equals(this.itemId, that.itemId);
}

public int hashCode() {
    return Objects.hash(this.itemId);
}

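When checking ItemList instances for equality, it is better to use Objects.equals, which handles null values instead of throwing a NullPointerException. As a result, repeated null entries in the input items are filtered out as well.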

public static String[] removeDuplicates(ItemList[] items) {
    final int n = items.length;
    if (n < 1) {
        return new String[0];
    }
    boolean[] dups = new boolean[n];
    int dupCount = 0;

    for (int i = 0; i < n; i++) {
        ItemList current = items[i];
        for (int j = i + 1; j < n; j++) {
            if (dups[j]) {
                continue;
            }
            if (Objects.equals(current, items[j])) {
                dups[j] = true;
                dupCount++;
            }
        }
    }
    String[] output = new String[n - dupCount];
    for (int i = 0, j = 0; i < n; i++) {
        if (!dups[i]) {
            output[j++] = null == items[i] ? "<NULL>" : items[i].getItemId();
        }
    }
    // info message
    System.out.printf("Found and removed %d duplicate value%s%n", dupCount, dupCount != 1 ? "s" : "");
    return output;
}

Test:

ItemList[] items = {
    null, new ItemList("ob1"), new ItemList("ob2"), new ItemList("ob2"), new ItemList("ob1"),
    new ItemList("ob3"), null, new ItemList("ob3"), new ItemList(null), new ItemList("ob5"),
    new ItemList("ob2"), new ItemList(null), new ItemList("ob4"), new ItemList("ob5"), null,
};

System.out.println(Arrays.toString(removeDuplicates(items)));

// compare removal of duplicates to using set
System.out.println("\nUsing Set");
Set<ItemList> set = new LinkedHashSet<>(Arrays.asList(items));  
System.out.println(set);

Output:

Found and removed 8 duplicate values
[<NULL>, ob1, ob2, ob3, null, ob5, ob4]

Using Set
[null, {id=ob1}, {id=ob2}, {id=ob3}, {id=null}, {id=ob5}, {id=ob4}]
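
If Objects.equals counts as one of the built-in helpers the exercise rules out, the null-safe comparison it performs can be written by hand. The sketch below is only an illustration: the class name `NullSafeEqualsDemo` and the method name `nullSafeEquals` are made up here. It could stand in for the Objects.equals calls in the code above.

```java
class NullSafeEqualsDemo {
    // Hand-written equivalent of a null-safe equality check.
    static boolean nullSafeEquals(Object a, Object b) {
        if (a == b) {
            return true;   // same reference, or both null
        }
        if (a == null) {
            return false;  // only a is null, so they differ
        }
        return a.equals(b); // a is non-null; a well-behaved equals(null) returns false
    }

    public static void main(String[] args) {
        System.out.println(nullSafeEquals(null, null));                // true
        System.out.println(nullSafeEquals(null, "ob1"));               // false
        System.out.println(nullSafeEquals("ob1", null));               // false
        System.out.println(nullSafeEquals("ob1", new String("ob1")));  // true
    }
}
```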

【Discussion】: