【Question title】: .NET collection for fast insert/delete
【Posted】: 2009-02-23 00:33:53
【Question】:

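I need to maintain a roster of connected clients that are very short-lived and frequently come and go. Because of the potential number of clients, I need a collection that supports fast insert/delete. Any suggestions?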

【Comments】:

    Tags: c# .net collections


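    【Answer 1】:

    Consider the hash-based collections for this, e.g. HashSet, Dictionary, or HashTable. They provide constant-time performance (on average) for adding and removing elements.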
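As a rough illustration of the hash-based approach, here is a minimal sketch in C++ (the language used for the code later in this thread), where `std::unordered_set` stands in for .NET's `HashSet<T>`. The `Roster` class and its method names are hypothetical and only serve the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical roster of connected clients keyed by a client id.
// std::unordered_set is the C++ counterpart of a hash set: insert,
// erase and lookup are O(1) on average (O(n) in the worst case).
class Roster {
public:
    // Returns true if the client was newly added, false if already present.
    bool connect(const std::string& clientId) {
        return clients_.insert(clientId).second;
    }

    // Returns true if the client was present and has been removed.
    bool disconnect(const std::string& clientId) {
        return clients_.erase(clientId) == 1;
    }

    bool isConnected(const std::string& clientId) const {
        return clients_.count(clientId) != 0;
    }

    std::size_t size() const { return clients_.size(); }

private:
    std::unordered_set<std::string> clients_;
};
```

Calling `connect("alice")` twice returns `true` and then `false`, which makes it easy to detect duplicate connections without a separate lookup.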

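    More information from the .NET Framework Developer's Guide:

    【Comments】:

    • Are you sure? I believe adding to a regular List is much faster than adding to a Dictionary or HashSet. Deletion should be a different story.
    【Answer 2】: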

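    The C5 Generic Collection Library

    The best implementations I have found for C# and C++ are these. For C#/CLI:

    It is well researched, has extensive unit tests, and since February they have also implemented the generic interfaces in .Net, which makes working with the collections a lot easier. They were featured on Channel9, and they have done extensive performance testing on the collections.

    If you are working with data structures anyway, these researchers have a red-black tree implementation in their library, similar to what you find if you fire up Lutz's Reflector and have a look at the internals of System.Data :p. Insert complexity: O(log(n)).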
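To make the O(log(n)) trade-off concrete, here is a small C++ sketch. It does not use C5; it relies on `std::set`, which is usually implemented as a red-black tree. The function name `sortedAfterChurn` and the client names are made up for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// std::set is typically a red-black tree, so insert, erase and find
// are O(log n), and iteration visits the keys in sorted order.
std::vector<std::string> sortedAfterChurn() {
    std::set<std::string> clients;
    clients.insert("carol");
    clients.insert("alice");
    clients.insert("bob");
    clients.erase("carol");   // O(log n) removal
    clients.insert("dave");
    // Copy out the keys; they come back sorted without an extra sort step.
    return std::vector<std::string>(clients.begin(), clients.end());
}
```

Compared with a hash set you pay a logarithmic cost per operation, but you get ordered traversal and range queries in return.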

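    Lock-free C++ collections

    Then, if you can allow for some C++ interop, you absolutely need speed, and you want as little overhead as possible, these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent ADT library.

    I have read some of the code, and it is clearly the work of someone well versed in how these things fit together. VC++ can do native C++ interop without annoying boundaries. http://www.swig.org/ can otherwise help you wrap C++ interfaces for consumption in .Net, or you can do it yourself through P/Invoke.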
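If you go the P/Invoke route, the native side usually needs a flat C-style surface. The following is only a sketch under assumptions: the `roster_*` function names are hypothetical, a plain `std::unordered_set` stands in for a real lock-free container, and the platform-specific export decoration (for example `__declspec(dllexport)` on Windows) is left out.

```cpp
#include <string>
#include <unordered_set>

typedef std::unordered_set<std::string> ClientSet;

// Hypothetical flat C API over a C++ container. Functions with C linkage
// and plain C parameter types are the kind of surface that P/Invoke can
// bind to once they are exported from a shared library.
extern "C" {

// Creates a roster and returns an opaque handle to it.
void* roster_create() {
    return new ClientSet();
}

// Destroys a roster previously returned by roster_create.
void roster_destroy(void* roster) {
    delete static_cast<ClientSet*>(roster);
}

// Returns 1 if the id was added, 0 if it was already present.
int roster_add(void* roster, const char* id) {
    return static_cast<ClientSet*>(roster)->insert(std::string(id)).second ? 1 : 0;
}

// Returns 1 if the id was removed, 0 if it was not present.
int roster_remove(void* roster, const char* id) {
    return static_cast<ClientSet*>(roster)->erase(std::string(id)) == 1 ? 1 : 0;
}

// Returns 1 if the id is present, 0 otherwise.
int roster_contains(void* roster, const char* id) {
    return static_cast<ClientSet*>(roster)->count(std::string(id)) != 0 ? 1 : 0;
}

}  // extern "C"
```

A production wrapper would also need to keep C++ exceptions (such as `std::bad_alloc`) from crossing the boundary; that is omitted here for brevity.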

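    Microsoft's take

    They have written tutorials, this one implementing a rather unpolished skip-list in C#, and discussing other types of data structures. (There is a better SkipList at CodeProject, which is very polished and implements the interfaces in a well-behaved manner.) They also have a few data structures bundled with .Net, namely HashTable/Dictionary<,> and HashSet. Of course there is the "ResizeArray"/List type as well, together with a stack and a queue, but they are all "linear" on search.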
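For readers who have not met a skip list before, here is a minimal C++ sketch of the idea. It is not the tutorial's or CodeProject's implementation: it stores plain `int` keys, caps towers at a hypothetical `kMaxLevel` of 16, and promotes a node to the next level with probability 1/2.

```cpp
#include <cstdlib>
#include <vector>

// Minimal skip list of ints. Each node carries a tower of forward
// pointers; a node reaches each further level with probability 1/2,
// which gives O(log n) expected search, insert and erase.
class SkipList {
    static const int kMaxLevel = 16;

    struct Node {
        int key;
        std::vector<Node*> next;  // next[i] is the successor on level i
        Node(int k, int levels) : key(k), next(levels, nullptr) {}
    };

    Node head_;  // sentinel that is present on every level

    static int randomLevel() {
        int lvl = 1;
        while (lvl < kMaxLevel && (std::rand() & 1)) ++lvl;
        return lvl;
    }

    // Fills update[i] with the last node on level i whose key is < key.
    void findPredecessors(int key, Node* update[]) {
        Node* x = &head_;
        for (int i = kMaxLevel - 1; i >= 0; --i) {
            while (x->next[i] && x->next[i]->key < key) x = x->next[i];
            update[i] = x;
        }
    }

public:
    SkipList() : head_(0, kMaxLevel) {}
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    ~SkipList() {
        Node* x = head_.next[0];
        while (x) { Node* n = x->next[0]; delete x; x = n; }
    }

    bool contains(int key) {
        Node* update[kMaxLevel];
        findPredecessors(key, update);
        Node* x = update[0]->next[0];
        return x && x->key == key;
    }

    // Returns false if the key was already present.
    bool insert(int key) {
        Node* update[kMaxLevel];
        findPredecessors(key, update);
        Node* x = update[0]->next[0];
        if (x && x->key == key) return false;
        int lvl = randomLevel();
        Node* n = new Node(key, lvl);
        for (int i = 0; i < lvl; ++i) {
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
        return true;
    }

    // Returns false if the key was not present.
    bool erase(int key) {
        Node* update[kMaxLevel];
        findPredecessors(key, update);
        Node* x = update[0]->next[0];
        if (!x || x->key != key) return false;
        for (int i = 0; i < static_cast<int>(x->next.size()); ++i)
            update[i]->next[i] = x->next[i];
        delete x;
        return true;
    }
};
```

For simplicity every search starts at the top level of the sentinel; real implementations track the highest level in use to skip empty levels.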

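    Google's perf-tools

    If you wish to speed up the time it takes to allocate memory, you can use Google's perf-tools. They are available at Google Code and contain a very interesting multi-threaded malloc implementation (TCMalloc), which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.

    Improving response times with memoization

    You can also use memoization on functions to improve performance through caching, which is interesting if you are using e.g. F#. F# also allows C++ interop, so you are OK there.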
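Memoization itself is language-independent. As a minimal sketch, here it is in C++ rather than F#; the `MemoFib` class is a made-up example that caches Fibonacci numbers in a hash map so that each value is computed only once.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci: each result is computed once and afterwards
// served from the cache, turning exponential recursion into O(n).
class MemoFib {
public:
    std::uint64_t operator()(unsigned n) {
        if (n < 2) return n;
        auto it = cache_.find(n);
        if (it != cache_.end()) return it->second;  // cache hit
        std::uint64_t value = (*this)(n - 1) + (*this)(n - 2);
        cache_[n] = value;                          // remember for next time
        return value;
    }

    std::size_t cachedEntries() const { return cache_.size(); }

private:
    std::unordered_map<unsigned, std::uint64_t> cache_;
};
```

The same pattern applies to any pure function whose arguments can serve as a hash key; the cost is the memory held by the cache.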

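    O(k)

    There is also the possibility of doing something on your own using the research that has been done on Bloom filters, which allow O(k) lookup complexity, where k is a constant that depends on the number of hash functions you have implemented. Google's BigTable makes use of them. A Bloom filter tells you either that an element is definitely not in the set or that it probably is, with a small chance of a false positive (see the graph on Wikipedia: it approaches P(wrong_key) -> 0.01 as the size is about 10000 elements, but you can work around this by implementing further hash functions or reducing the set).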
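The false-positive probability can be estimated with the standard approximation (1 - e^(-k*n/m))^k for m bits, n inserted keys, and k hash functions. The helper names below are hypothetical; this is a sketch of the textbook formulas, not something taken from the answer's own code.

```cpp
#include <cmath>

// Approximate false-positive probability of a Bloom filter with
// m bits, n inserted keys and k hash functions: (1 - e^(-k*n/m))^k.
double bloomFalsePositiveRate(double m, double n, double k) {
    return std::pow(1.0 - std::exp(-k * n / m), k);
}

// Bits needed for n keys at a target false-positive rate p, assuming
// the optimal number of hash functions: m = -n * ln(p) / (ln 2)^2.
double bloomBitsNeeded(double n, double p) {
    const double ln2 = std::log(2.0);
    return -n * std::log(p) / (ln2 * ln2);
}
```

With k = 2 hash functions and m = 2n / ln 2 bits, the estimate works out to 0.25; lowering it requires more bits per key, more hash functions, or both.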

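    I have not searched for a .Net implementation of this, but since the hash calculations are independent, you can use MS's performance team's implementation of Tasks to speed that up.

    "My" take: randomize to reach average O(log n)

    As it happens, I just did a course involving data structures. In this case we used C++, but it is very easy to translate to C#. We built three different data structures: a Bloom filter, a skip list, and a random binary search tree.

    See the code and analysis after the last paragraph.

    Hardware-based "collections"

    Finally, to make my answer "complete": if you truly need speed, you can use something like routing tables or content-addressable memory. This allows you, in principle, a very fast O(1) "hash"-to-value lookup of data.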

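    Random binary search tree / Bloom filter C++ code

    I would really appreciate feedback if you find bugs in the code, or just pointers on how I could do it better (or make better use of templates). Note that the Bloom filter is not like it would be in real life: normally you do not have to be able to delete from it, and then it is much more space efficient than the hack I did to allow delete to be tested.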
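For comparison with the deletable variant in the listing that follows, a conventional Bloom filter without delete needs only one bit per position. The class below is a hypothetical minimal sketch: it uses the same two string hashes (djb2 and sdbm) but maps them with a plain modulo, and it assumes the filter is created with at least one bit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Plain Bloom filter: one bit per position, k = 2 hash functions,
// no deletion and therefore no per-bit use counters.
class SimpleBloom {
public:
    // bits must be greater than zero.
    explicit SimpleBloom(std::size_t bits) : bits_(bits, false) {}

    void add(const std::string& key) {
        bits_[djb2(key) % bits_.size()] = true;
        bits_[sdbm(key) % bits_.size()] = true;
    }

    // true: the key is possibly present; false: it is definitely absent.
    bool mightContain(const std::string& key) const {
        return bits_[djb2(key) % bits_.size()] &&
               bits_[sdbm(key) % bits_.size()];
    }

private:
    static unsigned long djb2(const std::string& s) {
        unsigned long hash = 5381;
        for (std::size_t i = 0; i < s.size(); ++i)
            hash = hash * 33 + static_cast<unsigned char>(s[i]);
        return hash;
    }

    static unsigned long sdbm(const std::string& s) {
        unsigned long hash = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            hash = static_cast<unsigned char>(s[i]) + (hash << 6) + (hash << 16) - hash;
        return hash;
    }

    std::vector<bool> bits_;
};
```

Because bits are only ever set, a key that was added can never be reported as absent; the price is that individual keys cannot be removed.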

    DataStructure.h

    #ifndef DATASTRUCTURE_H_
    #define DATASTRUCTURE_H_
    
    class DataStructure
    {
    public:
        DataStructure() {countAdd=0; countDelete=0;countFind=0;}
        virtual ~DataStructure() {}
    
        void resetCountAdd() {countAdd=0;}
        void resetCountFind() {countFind=0;}
        void resetCountDelete() {countDelete=0;}
    
        unsigned int getCountAdd(){return countAdd;}
        unsigned int getCountDelete(){return countDelete;}
        unsigned int getCountFind(){return countFind;}
    
    protected:
        unsigned int countAdd;
        unsigned int countDelete;
        unsigned int countFind;
    };
    
    #endif /*DATASTRUCTURE_H_*/
    

    Key.h

    #ifndef KEY_H_
    #define KEY_H_
    
    #include <string>
    using namespace std;
    
    const int keyLength = 128;
    
    class Key : public string
    {
    public:
        Key():string(keyLength, ' ') {}
        Key(const char in[]): string(in){}
        Key(const string& in): string(in){}
    
        bool operator<(const string& other);
        bool operator>(const string& other);
        bool operator==(const string& other);
    
        virtual ~Key() {}
    };
    
    #endif /*KEY_H_*/
    

    Key.cpp

    #include "Key.h"
    
    bool Key::operator<(const string& other)
    {
        return compare(other) < 0;
    };
    
    bool Key::operator>(const string& other)
    {
        return compare(other) > 0;
    };
    
    bool Key::operator==(const string& other)
    {
        return compare(other) == 0;
    }
    

    BloomFilter.h

    #ifndef BLOOMFILTER_H_
    #define BLOOMFILTER_H_
    
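    #include <iostream>
    #include <assert.h>
    #include <stdio.h>   // printf, used by dump()
    #include <vector>
    #include <math.h>
    #include "Key.h"
    #include "DataStructure.h"
    
    // Each pocket deliberately uses 32 bits, even where unsigned long is wider.
    // POSIX <limits.h> may already define LONG_BIT, so drop that definition first.
    #undef LONG_BIT
    #define LONG_BIT 32
    // 1UL rather than 1: shifting a signed int into bit 31 overflows.
    #define bitmask(val) (1UL << (LONG_BIT - ((val) % LONG_BIT) - 1))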
    
    // TODO: Implement RW-locking on the reads/writes to the bitmap.
    
    class BloomFilter : public DataStructure
    {
    public:
        BloomFilter(){}
        BloomFilter(unsigned long length){init(length);}
        virtual ~BloomFilter(){}
    
        void init(unsigned long length);
        void dump();
    
        void add(const Key& key);
        void del(const Key& key);
    
        /**
         * Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
         */
        bool testExist(const Key& key, bool v = false);
    
    private:
        unsigned long hash1(const Key& key);
        unsigned long hash2(const Key& key);
        bool exist(const Key& key);
        void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
        void getCountIndicies(const int i1, const unsigned long h1,
            const int i2, const unsigned long h2, int& i1_c, int& i2_c);
    
        vector<unsigned long> m_tickBook;
        vector<unsigned int> m_useCounts;
        unsigned long m_length; // number of bits in the bloom filter
        unsigned long m_pockets; //the number of pockets
    
        static const unsigned long m_pocketSize; //bits in each pocket
    };
    
    #endif /*BLOOMFILTER_H_*/
    

    BloomFilter.cpp

    #include "BloomFilter.h"
    
    const unsigned long BloomFilter::m_pocketSize = LONG_BIT;
    
    void BloomFilter::init(unsigned long length)
    {
        //m_length = length;
        m_length = (unsigned long)((2.0*length)/log(2))+1;
        m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
        m_tickBook.resize(m_pockets);
    
        // my own (allocate nr bits possible to store in the other vector)
        m_useCounts.resize(m_pockets * m_pocketSize);
    
        unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
        for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
    }
    
    unsigned long BloomFilter::hash1(const Key& key)
    {
        unsigned long hash = 5381;
        unsigned int i=0; for (i=0; i< key.length(); i++){
            hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
        }
    
        double d_hash = (double) hash;
    
        d_hash *= (0.5*(sqrt(5)-1));
        d_hash -= floor(d_hash);
        d_hash *= (double)m_length;
    
        return (unsigned long)floor(d_hash);
    }
    
    unsigned long BloomFilter::hash2(const Key& key)
    {
        unsigned long hash = 0;
        unsigned int i=0; for (i=0; i< key.length(); i++){
            hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
        }
        double d_hash = (double) hash;
    
        d_hash *= (0.5*(sqrt(5)-1));
        d_hash -= floor(d_hash);
        d_hash *= (double)m_length;
    
        return (unsigned long)floor(d_hash);
    }
    
    bool BloomFilter::testExist(const Key& key, bool v){
        if(exist(key)) {
            if(v) cout<<"Key "<< key<<" is in the set"<<endl;
            return true;
        }else {
            if(v) cout<<"Key "<< key<<" is not in the set"<<endl;
            return false;
        }
    }
    
    void BloomFilter::dump()
    {
        cout<<m_pockets<<" Pockets: ";
    
        // Print each pocket in hex.
        unsigned long i; for(i=0; i< m_pockets; i++) printf("%lx ", m_tickBook[i]);
        cout<<endl;
    }
    
    void BloomFilter::add(const Key& key)
    {
        unsigned long h1, h2;
        int i1, i2;
        int i1_c, i2_c;
    
        // tested!
    
        getHashAndIndicies(h1, h2, i1, i2, key);
        getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
    
        m_tickBook[i1] = m_tickBook[i1] | bitmask(h1);
        m_tickBook[i2] = m_tickBook[i2] | bitmask(h2);
    
        m_useCounts[i1_c] = m_useCounts[i1_c] + 1;
        m_useCounts[i2_c] = m_useCounts[i2_c] + 1;
    
        countAdd++;
    }
    
    void BloomFilter::del(const Key& key)
    {
        unsigned long h1, h2;
        int i1, i2;
        int i1_c, i2_c;
    
        if (!exist(key)) throw "You can't delete keys which are not in the bloom filter!";
    
        // First we need the indicies into m_tickBook and the
        // hashes.
        getHashAndIndicies(h1, h2, i1, i2, key);
    
        // The index of the counter is the index into the bitvector
        // times the number of bits per vector item plus the offset into
        // that same vector item.
        getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
    
        // We need to update the value in the bitvector in order to
        // delete the key.
        m_useCounts[i1_c] = (m_useCounts[i1_c] == 1 ? 0 : m_useCounts[i1_c] - 1);
        m_useCounts[i2_c] = (m_useCounts[i2_c] == 1 ? 0 : m_useCounts[i2_c] - 1);
    
        // Now, if we depleted the count for a specific bit, then set it to
        // zero, by anding the complete unsigned long with the notted bitmask
        // of the hash value
        if (m_useCounts[i1_c] == 0)
            m_tickBook[i1] = m_tickBook[i1] & ~(bitmask(h1));
        if (m_useCounts[i2_c] == 0)
            m_tickBook[i2] = m_tickBook[i2] & ~(bitmask(h2));
    
        countDelete++;
    }
    
    bool BloomFilter::exist(const Key& key)
    {
        unsigned long h1, h2;
        int i1, i2;
    
        countFind++;
    
        getHashAndIndicies(h1, h2, i1, i2, key);
    
        return  ((m_tickBook[i1] & bitmask(h1)) > 0) &&
                ((m_tickBook[i2] & bitmask(h2)) > 0);
    }
    
    /*
     * Gets the values of the indicies for two hashes and places them in
     * the passed parameters. The index is into m_tickBook.
     */
    void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
        int& i2, const Key& key)
    {
        h1 = hash1(key);
        h2 = hash2(key);
        i1 = (int)(h1 / m_pocketSize); // divide first, then narrow to int
        i2 = (int)(h2 / m_pocketSize);
    }
    
    /*
     * Gets the values of the indicies into the count vector, which keeps
     * track of how many times a specific bit-position has been used.
     */
    void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
        const int i2, const unsigned long h2, int& i1_c, int& i2_c)
    {
        i1_c = i1*m_pocketSize + h1%m_pocketSize;
        i2_c = i2*m_pocketSize + h2%m_pocketSize;
    }
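The hash functions in the listing above combine a string hash with multiplicative (golden-ratio) scaling. Here is a self-contained sketch of that step, with one deliberate difference that is my assumption rather than the author's code: the hash is truncated to 32 bits, because a 64-bit value converted to `double` can be too large to leave any fractional part after the multiplication. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// djb2 string hash (hash * 33 + c), kept to 32 bits so that the
// floating-point step below retains a meaningful fractional part.
std::uint32_t djb2(const std::string& key) {
    std::uint32_t hash = 5381;
    for (unsigned char c : key) hash = hash * 33u + c;
    return hash;
}

// Multiplicative hashing: take the fractional part of hash * A, with
// A = (sqrt(5) - 1) / 2, and scale it to an index in [0, buckets).
std::uint32_t scaleToRange(std::uint32_t hash, std::uint32_t buckets) {
    const double A = 0.5 * (std::sqrt(5.0) - 1.0);
    double product = hash * A;
    double fraction = product - std::floor(product);
    return static_cast<std::uint32_t>(fraction * buckets);
}
```

`scaleToRange(djb2(key), m)` then yields a bit position for a filter of m bits, playing the role of `hash1` in the listing.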
    

    RBST.h

    #ifndef RBST_H_
    #define RBST_H_
    
    #include <iostream>
    #include <assert.h>
    #include <stdio.h>   // printf
    #include <stdlib.h>  // srand, rand
    #include <time.h>    // time
    #include <vector>
    #include <math.h>
    #include "Key.h"
    #include "DataStructure.h"
    
    #define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str);
    
    using namespace std;
    
    class RBSTNode;
    class RBSTNode: public Key
    {
    public:
        RBSTNode(const Key& key):Key(key)
        {
            m_left =NULL;
            m_right = NULL;
            m_size = 1U; // the size of one node is 1.
        }
        virtual ~RBSTNode(){}
    
        string setKey(const Key& key){return Key(key);}
    
        RBSTNode* left(){return m_left; }
        RBSTNode* right(){return m_right;}
    
        RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
        RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }
    
    #ifdef DEBUG
        ostream& print(ostream& out)
        {
            out << "Key(" << *this << ", m_size: " << m_size << ")";
            return out;
        }
    #endif
    
        unsigned int size() { return m_size; }
    
        void setSize(unsigned int val)
        {
    #ifdef DEBUG
            this->print(cout);
            cout << "::setSize(" << val << ") called." << endl;
    #endif
    
            if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
            m_size = val;
        }
    
        void incSize() {
    #ifdef DEBUG
            this->print(cout);
            cout << "::incSize() called" << endl;
    #endif
    
            m_size++;
        }
    
        void decrSize()
        {
    #ifdef DEBUG
            this->print(cout);
            cout << "::decrSize() called" << endl;
    #endif
    
            if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
            m_size--;
        }
    
    #ifdef DEBUG
        unsigned int size(RBSTNode* x);
    #endif
    
    private:
        RBSTNode(){}
        RBSTNode* m_left;
        RBSTNode* m_right;
        unsigned int m_size;
    };
    
    class RBST : public DataStructure
    {
    public:
        RBST() {
            m_size = 0;
            m_head = NULL;
            srand(time(0));
        };
    
        virtual ~RBST() {};
    
        /**
         * Tries to add key into the tree and will return
         *      true  for a new item added
         *      false if the key already is in the tree.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool add(const Key& key, bool v=false);
    
        /**
         * Same semantics as other add function, but takes a string,
         * but diff name, because that'll cause an ambiguity because of inheritance.
         */
        bool addString(const string& key);
    
        /**
         * Deletes a key from the tree if that key is in the tree.
         * Will return
         *      true  for success and
         *      false for failure.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool del(const Key& key, bool v=false);
    
        /**
         * Tries to find the key in the tree and will return
         *      true if the key is in the tree and
         *      false if the key is not.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool find(const Key& key, bool v = false);
    
        unsigned int count() { return m_size; }
    
    #ifdef DEBUG
        int dump(char sep = ' ');
        int dump(RBSTNode* target, char sep);
        unsigned int size(RBSTNode* x);
    #endif
    
    private:
        RBSTNode* randomAdd(RBSTNode* target, const Key& key);
        RBSTNode* addRoot(RBSTNode* target, const Key& key);
        RBSTNode* rightRotate(RBSTNode* target);
        RBSTNode* leftRotate(RBSTNode* target);
    
        RBSTNode* del(RBSTNode* target, const Key& key);
        RBSTNode* join(RBSTNode* left, RBSTNode* right);
    
        RBSTNode* find(RBSTNode* target, const Key& key);
    
        RBSTNode* m_head;
        unsigned int m_size;
    };
    
    #endif /*RBST_H_*/
    

    RBST.cpp

    #include "RBST.h"
    
    bool RBST::add(const Key& key, bool v){
        // Note: randomAdd() does not check for an existing key, so duplicates
        // are inserted and the "already in the tree" branch is never taken.
        unsigned int oldSize = m_size;
        m_head = randomAdd(m_head, key);
        if (m_size > oldSize){
            if(v) cout<<"Node "<<key<< " is added into the tree."<<endl;
            return true;
        }else {
            if(v) cout<<"Node "<<key<< " is already in the tree."<<endl;
            return false;
        }
    }
    
    bool RBST::addString(const string& key) {
        return add(Key(key), false);
    }
    
    bool RBST::del(const Key& key, bool v){
        unsigned oldSize= m_size;
        m_head = del(m_head, key);
        if (m_size < oldSize) {
            if(v) cout<<"Node "<<key<< " is deleted from the tree."<<endl;
            return true;
        }
        else {
            if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
            return false;
        }
    };
    
    bool RBST::find(const Key& key, bool v){
        RBSTNode* ret = find(m_head, key);
        if (ret == NULL){
            if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
            return false;
        }else {
            if(v) cout<<"Node "<<key<< " is in the tree."<<endl;
            return true;
        }
    };
    
    #ifdef DEBUG
    int RBST::dump(char sep){
        int ret = dump(m_head, sep);
        cout<<"SIZE: " <<ret<<endl;
        return ret;
    };
    
    int RBST::dump(RBSTNode* target, char sep){
        if (target == NULL) return 0;
        int ret = dump(target->left(), sep);
        cout<< *target<<sep;
        ret ++;
        ret += dump(target->right(), sep);
        return ret;
    };
    #endif
    
    /**
     * Rotates the tree around target, so that target's left
     * is the new root of the tree/subtree and updates the subtree sizes.
     *
     *(target)  b               (l) a
     *         / \      right      / \
     *        a   ?     ---->     ?   b
     *       / \                     / \
     *      ?   x                   x   ?
     *
     */
    RBSTNode* RBST::rightRotate(RBSTNode* target) // private
    {
        if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
        if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";
    
    #ifdef DEBUG
        cout    <<"Right-rotating b-node ";
        target->print(cout);
        cout    << " for a-node ";
        target->left()->print(cout);
        cout    << "." << endl;
    #endif
    
        RBSTNode* l = target->left();
        int as0 = l->size();
    
        // re-order the sizes
        l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
        target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.
    
        // swap b's left (for a)
        target->setLeft(l->right());
    
        // and a's right (for b's left)
        l->setRight(target);
    
    #ifdef DEBUG
        cout    << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
    #endif
    
        // return the new root, a.
        return l;
    };
    
    /**
     * Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
     */
    RBSTNode* RBST::leftRotate(RBSTNode* target)
    {
        if (target == NULL) throw "Invariant failure, target is null";
        if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";
    
    #ifdef DEBUG
        cout    <<"Left-rotating a-node ";
        target->print(cout);
        cout    << " for b-node ";
        target->right()->print(cout);
        cout    << "." << endl;
    #endif
    
        RBSTNode* r = target->right();
        int bs0 = r->size();
    
        // re-order the sizes
        r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
        target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));
    
        // swap a's right (for b's left)
        target->setRight(r->left());
    
        // swap b's left (for a)
        r->setLeft(target);
    
    #ifdef DEBUG
        cout    << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
    #endif
    
        return r;
    };
    
    //
    /**
     * Adds a key to the tree and returns the new root of the tree.
     * If the key already exists doesn't add anything.
     * Increments m_size if the key didn't already exist and hence was added.
     *
     * This function is not called from public methods, it's a helper function.
     */
    RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
    {
        countAdd++;
    
        if (target == NULL) return new RBSTNode(key);
    
    #ifdef DEBUG
        cout << "addRoot(";
        cout.flush();
        target->print(cout) << "," << key << ") called." << endl;
    #endif
    
        if (*target < key)
        {
            target->setRight( addRoot(target->right(), key) );
            target->incSize(); // Should I?
            RBSTNode* res = leftRotate(target);
    #ifdef DEBUG
            if (target->size() != size(target))
                BUG("in addRoot 1");
    #endif
            return res;
        }
    
        target->setLeft( addRoot(target->left(), key) );
        target->incSize(); // Should I?
        RBSTNode* res = rightRotate(target);
    #ifdef DEBUG
        if (target->size() != size(target))
            BUG("in addRoot 2");
    #endif
        return res;
    };
    
    /**
     * This function is called from the public add(key) function,
     * and returns the new root node.
     */
    RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
    {
        countAdd++;
    
        if (target == NULL)
        {
            m_size++;
            return new RBSTNode(key);
        }
    
    #ifdef DEBUG
        cout << "randomAdd(";
        target->print(cout) << ", \"" << key << "\") called." << endl;
    #endif
    
        int r = (rand() % target->size()) + 1;
    
        // here is where we add the target as root!
        if (r == 1)
        {
            m_size++;   // TODO: Need to lock.
            return addRoot(target, key);
        }
    
    #ifdef DEBUG
        printf("randomAdd recursion part, ");
    #endif
    
        // otherwise, continue recursing!
        if (*target <= key)
        {
    #ifdef DEBUG
        printf("target <= key\n");
    #endif
            target->setRight( randomAdd(target->right(), key) );
            target->incSize(); // TODO: Need to lock.
    #ifdef DEBUG
            if (target->right()->size() != size(target->right()))
                BUG("in randomAdd 1");
    #endif
        }
        else
        {
    #ifdef DEBUG
        printf("target > key\n");
    #endif
            target->setLeft( randomAdd(target->left(), key) );
            target->incSize(); // TODO: Need to lock.
    #ifdef DEBUG
            if (target->left()->size() != size(target->left()))
                BUG("in randomAdd 2");
    #endif
        }
    
    #ifdef DEBUG
        printf("randomAdd return part\n");
    #endif
    
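        // No m_size++ here: the recursion above has already counted the new
        // node once (at the NULL leaf or at the addRoot call). Incrementing
        // again on every level would inflate count().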
        return target;
    };
    
    /////////////////////////////////////////////////////////////
    /////////////////////  DEL FUNCTIONS ////////////////////////
    /////////////////////////////////////////////////////////////
    
    /**
     * Deletes a node with the passed key.
     * Returns the root node.
     * Decrements m_size if something was deleted.
     */
    RBSTNode* RBST::del(RBSTNode* target, const Key& key)
    {
        countDelete++;
    
        if (target == NULL) return NULL;
    
    #ifdef DEBUG
        cout << "del(";
        target->print(cout) << ", \"" << key << "\") called." << endl;
    #endif
    
        RBSTNode* ret = NULL;
    
        // found the node to delete
        if (*target == key)
        {
            ret = join(target->left(), target->right());
    
            m_size--;
            delete target;
    
            return ret; // return the newly built joined subtree!
        }
    
        // store a temporary size before recursive deletion.
        unsigned int size = m_size;
    
        if (*target < key)  target->setRight( del(target->right(), key) );
        else                target->setLeft( del(target->left(), key) );
    
        // if the previous recursion changed the size, we need to decrement the size of this target too.
        if (m_size < size) target->decrSize();
    
    #ifdef DEBUG
        if (RBST::size(target) != target->size())
            BUG("in del");
    #endif
    
        return target;
    };
    
    /**
     * Joins the two subtrees represented by left and right
     * by randomly choosing which to make the root, weighted on the
     * size of the sub-tree.
     */
    RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
    {
        if (left == NULL) return right;
        if (right == NULL) return left;
    
    #ifdef DEBUG
        cout << "join(";
        left->print(cout);
        cout << ",";
        right->print(cout) << ") called." << endl;
    #endif
    
        // Find the chance that we use the left tree, based on its size over the total tree size.
        // 3 s.d. randomness :-p e.g. 60.3% chance.
        bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));
    
        RBSTNode* subtree = NULL;
    
        if (useLeft)
        {
            subtree = join(left->right(), right);
    
            left->setRight(subtree)
                ->setSize((left->left() == NULL ? 0 : left->left()->size())
                            + subtree->size() + 1 );
    
    #ifdef DEBUG
            if (size(left) != left->size())
                BUG("in join 1");
    #endif
    
            return left;
        }
    
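        // Every key in left is smaller than every key in right's left subtree,
        // so left must stay the first argument to keep the search-tree order.
        subtree = join(left, right->left());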
    
        right->setLeft(subtree)
             ->setSize((right->right() == NULL ? 0 : right->right()->size())
                        + subtree->size() + 1);
    
    #ifdef DEBUG
        if (size(right) != right->size())
            BUG("in join 2");
    #endif
    
        return right;
    };
    
    /////////////////////////////////////////////////////////////
    /////////////////////  FIND FUNCTIONS ///////////////////////
    /////////////////////////////////////////////////////////////
    
    /**
     * Tries to find the key in the tree starting
     * search from target.
     *
     * Returns NULL if it was not found.
     */
    RBSTNode* RBST::find(RBSTNode* target, const Key& key)
    {
        countFind++; // Could use private method only counting the first call.
        if (target == NULL) return NULL; // not found.
        if (*target == key) return target; // found (does string override ==?)
        if (*target < key) return find(target->right(), key); // search for gt to the right.
        return find(target->left(), key); // search for lt to the left.
    };
    
    #ifdef DEBUG
    
    unsigned int RBST::size(RBSTNode* x)
    {
        if (x == NULL) return 0;
        return 1 + size(x->left()) + size(x->right());
    }
    
    #endif
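The core of the randomized tree is easier to see without the instrumentation. The following self-contained sketch uses a hypothetical `Node` struct with `int` keys instead of `RBSTNode`, recomputes subtree sizes after each rotation, and uses the textbook rule of inserting at the root of an n-node subtree with probability 1/(n+1), which differs slightly from the `rand() % size` test in the listing.

```cpp
#include <cstdlib>

// Stand-in for RBSTNode: a key plus the size of the subtree rooted here.
struct Node {
    int key;
    Node* left;
    Node* right;
    unsigned size;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr), size(1) {}
};

unsigned sizeOf(const Node* n) { return n ? n->size : 0; }

// Recomputes a node's size from its children.
void update(Node* n) { n->size = 1 + sizeOf(n->left) + sizeOf(n->right); }

// Right rotation: the left child becomes the root of this subtree.
Node* rotateRight(Node* b) {
    Node* a = b->left;
    b->left = a->right;
    a->right = b;
    update(b);  // b is now the lower node, so fix it first
    update(a);
    return a;
}

// Left rotation: the right child becomes the root of this subtree.
Node* rotateLeft(Node* a) {
    Node* b = a->right;
    a->right = b->left;
    b->left = a;
    update(a);
    update(b);
    return b;
}

// Inserts key and rotates it up to the root of this subtree.
Node* insertAtRoot(Node* t, int key) {
    if (!t) return new Node(key);
    if (key < t->key) {
        t->left = insertAtRoot(t->left, key);
        return rotateRight(t);
    }
    t->right = insertAtRoot(t->right, key);
    return rotateLeft(t);
}

// Randomized insert: the new key becomes the root of an n-node subtree
// with probability 1/(n+1), which keeps the expected depth O(log n).
Node* insert(Node* t, int key) {
    if (!t) return new Node(key);
    if (std::rand() % (t->size + 1) == 0) return insertAtRoot(t, key);
    if (key < t->key) t->left = insert(t->left, key);
    else              t->right = insert(t->right, key);
    update(t);
    return t;
}

// Checks search-tree order (lo < key < hi) and that every stored size
// matches the real size of its subtree.
bool isValid(const Node* t, int lo, int hi) {
    if (!t) return true;
    if (t->key <= lo || t->key >= hi) return false;
    if (t->size != 1 + sizeOf(t->left) + sizeOf(t->right)) return false;
    return isValid(t->left, lo, t->key) && isValid(t->right, t->key, hi);
}
```

Recomputing sizes from the children after each structural change avoids the incremental bookkeeping that makes the original size updates hard to verify. Node cleanup is omitted to keep the sketch short.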
    

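    I will again spare you the SkipList, because good SkipList implementations can already be found via the links, and my version was not much different.

    The graphs generated from the test file are as follows:

    Graph showing the time taken to add new items for the BloomFilter, RBST, and SkipList. graph http://haf.se/content/dl/addtimer.png

    Graph showing the time taken to find items for the BloomFilter, RBST, and SkipList. graph http://haf.se/content/dl/findtimer.png

    Graph showing the time taken to delete items for the BloomFilter, RBST, and SkipList. graph http://haf.se/content/dl/deltimer.png

    As you can see, the random binary search tree was a lot better than the SkipList. The Bloom filter lives up to its O(k).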
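The original test file is not shown, so the following is only a sketch of how such timings could be collected, assuming a set-like container with `insert`, `count`, and `erase`. The function name `timeChurn` is hypothetical, and the numbers it produces depend entirely on the machine and compiler settings.

```cpp
#include <chrono>
#include <set>
#include <unordered_set>

// Runs n inserts, n lookups and n erases against any set-like container
// and returns the elapsed wall-clock time in microseconds.
template <typename Set>
long long timeChurn(Set& s, int n) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) s.insert(i);
    int found = 0;
    for (int i = 0; i < n; ++i) found += static_cast<int>(s.count(i));
    for (int i = 0; i < n; ++i) s.erase(i);
    auto stop = std::chrono::steady_clock::now();
    (void)found;  // only here so the lookups have an observable use
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling it once with a `std::set<int>` and once with a `std::unordered_set<int>` gives a first, rough comparison of a tree-based and a hash-based container under insert/delete churn.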

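    【Comments】:

    • Do you know why C5 is not part of the standard .NET collections?
    • It was created after .Net was. It comes from a third party, and the .Net collections work for most purposes.
    • Henrik, can you tell me whether you are accessing/retrieving items from the Bloom filter? Or does the second graph just test for existence?
    • It just tests for non-existence. If you get a true back, the item (probably) exists. If you get a false back, it definitely does not exist. As you can see, the time is low, around 1.5 ms.
    • Wow, great answer! Usually the best answer to a "fast insert/delete?" question is just "use the standard Dictionary<Key,Value>" or "use the standard HashSet<T>". But since we are talking about third-party libraries anyway: if the built-in collections lack something you need, consider whether one of the alternative collection types in LoycCore might meet your needs, e.g. BDictionary<Key,Value> or BMultiMap<Key,Value>.
    【Answer 3】: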

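    Well, how much do you need to query it? A linked list has fast insert/delete (at any position), but is not as quick to search as (for example) a dictionary or sorted list. Alternatively, use a straight list with a bit/value pair in each cell, where the bit means "still has a value". Just re-use logically empty cells before appending. Deletion just clears the cell.

    For reference types, "null" would do here. For value types, Nullable<T>.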
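The flag-per-cell idea can be sketched as follows in C++. The `SlotTable` class is hypothetical, and the extra list of free indices is my own addition so that finding a cleared cell does not require a scan; the answer itself only describes re-using empty cells.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical slot table: each cell is a (used flag, value) pair.
// Removing clears the flag; adding re-uses a cleared cell before appending.
class SlotTable {
public:
    // Stores value and returns the index of the cell it landed in.
    std::size_t add(const std::string& value) {
        if (!free_.empty()) {
            std::size_t i = free_.back();  // re-use a logically empty cell
            free_.pop_back();
            cells_[i].used = true;
            cells_[i].value = value;
            return i;
        }
        cells_.push_back(Cell{true, value});
        return cells_.size() - 1;
    }

    // Clears the cell; returns false if it was empty or out of range.
    bool remove(std::size_t i) {
        if (i >= cells_.size() || !cells_[i].used) return false;
        cells_[i].used = false;
        cells_[i].value.clear();
        free_.push_back(i);
        return true;
    }

    bool isUsed(std::size_t i) const {
        return i < cells_.size() && cells_[i].used;
    }

    std::size_t capacity() const { return cells_.size(); }

private:
    struct Cell { bool used; std::string value; };
    std::vector<Cell> cells_;
    std::vector<std::size_t> free_;  // indices of cleared cells
};
```

Insert and delete are O(1) as long as the caller keeps the returned index; searching by value is still linear, which is the trade-off the answer points out.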

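    【Comments】:

      【Answer 4】:

      You can use a Hashtable or a strongly typed Dictionary. The client class may override GetHashCode to provide faster hash code generation, or, if you use Hashtable, you can optionally supply an IHashCodeProvider (since marked obsolete in favor of IEqualityComparer).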

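      【Comments】:

        【Answer 5】:

        How do you need to find the clients? Is a tuple/dictionary necessary? You will most likely find a solution to your problem in Jeffrey Richter's Power Collections library, which has lists, trees, and most data structures you can think of.

        【Comments】:

          【Answer 6】:

          I was very impressed by the Channel9 interview with Peter Sestoft:

          channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/

          He is a professor at the IT University of Copenhagen who helped create the C5 Generic Collection Library:

          www.itu.dk/research/c5/

          This may be overkill, or it may be just the fast collection you are looking for...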

          HTH,

          -Mike

          【Comments】:

          • Yes! Beat you by 30 seconds! :D