[Question title]: Fastest way to sort two vectors (key/values) at the same time?
[Posted]: 2014-01-21 07:10:10
[Question]:

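For the purposes of a supercomputing simulation, I have a structure containing two large (billions of elements) std::vectors: a std::vector of "keys" (64-bit integers) and a std::vector of "values". I cannot use std::map because, in the simulations I am considering, vectors are much better optimized than std::map. I also cannot use a vector of pairs, because separate vectors enable some optimizations and better cache efficiency. And I cannot use any extra memory.

So, given these constraints, what is the most efficient way to sort both vectors by increasing key? (Template metaprogramming and crazy compile-time tricks are welcome.)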

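[Question discussion]:

  • Could you give more details: how are these vectors filled? How often do you need to sort them? Does "cannot use any extra memory" mean no extra memory at all, so everything must happen in place, or just very limited memory?
  • Friendly advice: it may be easier to get useful answers if you stop saying that everything you do is for supercomputing. Not only is that claim hard to take seriously, it is also hard for anyone who has not worked with supercomputers to know what the right answer should be. I have never used a supercomputer myself, but my understanding is that to get great performance out of one you have to write code for that particular architecture, and that does not seem to be what you are doing. I may be wrong, I'm not sure, but either way you should avoid putting that sentence in.
  • Another friendly suggestion: do more research before asking. If you had, you would have found this quasi-solution via the linked duplicate question.

Tags: c++ sorting c++11 vector stl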


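[Solution 1]:

Two ideas come to mind:

  • Take a quicksort implementation and apply it to the "key" vector, but modify the code so that every time it swaps two elements of the key vector, it performs the same swap on the value vector.

  • Or, perhaps more in the spirit of C++, write a custom "wrapper" iterator that iterates over both vectors at once (returning a std::pair when dereferenced). Maybe Boost has one? You could then combine it with std::sort and a custom comparison function that only looks at the "keys".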
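A minimal sketch of the first idea, assuming both vectors have the same length and that the key type supports `<`. The names `lockstep_sort`, `lockstep_quicksort`, `lockstep_swap` and `median_of_three` are hypothetical. Two choices go beyond textbook quicksort: a three-way partition keeps runs of equal keys from causing quadratic behavior, and recursing only into the smaller partition bounds the call stack at O(log n) frames. Apart from that stack, nothing is allocated. The worst case on adversarial input is still quadratic, which typical std::sort implementations avoid with introsort.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Swap positions i and j in both vectors so keys and values stay aligned.
template <typename Key, typename Value>
void lockstep_swap(std::vector<Key>& keys, std::vector<Value>& values,
                   std::size_t i, std::size_t j) {
    using std::swap;
    swap(keys[i], keys[j]);
    swap(values[i], values[j]);
}

// Median of three key values, used as the pivot.
template <typename Key>
Key median_of_three(Key a, Key b, Key c) {
    if (b < a) std::swap(a, b);      // now a <= b
    if (c < b) b = std::max(a, c);   // if c < b, the median is max(a, c)
    return b;
}

// Quicksort of the index range [lo, hi) of `keys`, mirroring every swap on `values`.
template <typename Key, typename Value>
void lockstep_quicksort(std::vector<Key>& keys, std::vector<Value>& values,
                        std::size_t lo, std::size_t hi) {
    while (hi - lo > 1) {
        const Key pivot = median_of_three(keys[lo], keys[lo + (hi - lo) / 2], keys[hi - 1]);
        // Three-way partition: [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
        std::size_t lt = lo, i = lo, gt = hi;
        while (i < gt) {
            if (keys[i] < pivot) {
                lockstep_swap(keys, values, lt++, i++);
            } else if (pivot < keys[i]) {
                lockstep_swap(keys, values, i, --gt);
            } else {
                ++i;
            }
        }
        // Recurse into the smaller side, loop on the larger one: O(log n) stack depth.
        if (lt - lo < hi - gt) {
            lockstep_quicksort(keys, values, lo, lt);
            lo = gt;
        } else {
            lockstep_quicksort(keys, values, gt, hi);
            hi = lt;
        }
    }
}

template <typename Key, typename Value>
void lockstep_sort(std::vector<Key>& keys, std::vector<Value>& values) {
    if (keys.size() != values.size())
        throw std::invalid_argument("keys and values differ in size");
    lockstep_quicksort(keys, values, 0, keys.size());
}
```

Usage: `lockstep_sort(keys, values);` with, for example, `std::vector<std::int64_t> keys` and any `std::vector<Value> values` of the same length.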

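Edit:

I used the first suggestion to solve a similar problem in the past, back when I was a C programmer. For obvious reasons it is far from ideal, but it may be the quickest way to get things working.

I have not tried such a wrapper iterator with std::sort, but TemplateRex says in the comments that it won't work, and I'm happy to defer to him on that.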

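[Discussion]:

  • +1 for the second suggestion (wrapper iterator). That is the proper C++ way.
  • Boost has a zip_iterator, but it is unclear whether it would work in this case (whether it is a RandomAccessIterator).
  • @MatthieuM. From the docs: "The iterator_category member of zip_iterator is convertible to the minimum of the traversal categories of the iterator types in the IteratorTuple argument. For example, if the zip_iterator holds only vector iterators, then iterator_category is convertible to boost::random_access_traversal_tag."
  • @MatthieuM. zip_iterators are not suitable for sorting: stackoverflow.com/a/9343991/819272
  • Sorry, I have to downvote this in its current form. First, modifying quicksort source code is very brittle. Moreover, without a working example I won't accept the claim that it is easy to write a wrapper iterator that works with std::sort. E.g., if *WrapIt yields a pair of references, there is no guarantee that this works, because std::sort is not required to use swap or iter_swap. This was discussed at length on the Boost mailing lists, and AFAIK there is no easy solution.
[Solution 2]: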

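I think the problem can be split into two independent parts:

  1. How to build an efficient iterator for a virtual map
  2. Which sorting algorithm to use

Iterator

The main problem in implementing the iterator is how to return a key/value pair without making unnecessary copies. We can do this by using different types for value_type and reference. My implementation is below.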

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <boost/iterator/iterator_facade.hpp>

// Presents two parallel containers as one random-access range of (key, value) pairs.
template <typename Keys, typename Values>
class virtual_map
{
public:
    typedef typename Keys::value_type key_type;
    typedef typename Values::value_type mapped_type;
    typedef std::pair<key_type, mapped_type> value_type;               // owning copy
    typedef std::pair<key_type&, mapped_type&> proxy;                  // refers into the containers
    typedef std::pair<const key_type&, const mapped_type&> const_proxy;

    class iterator :
        public boost::iterator_facade < iterator, value_type, boost::random_access_traversal_tag, proxy >
    {
        friend class boost::iterator_core_access;

        // The facade is a dependent base class, so its member types are not
        // found by unqualified lookup and must be named explicitly.
        typedef boost::iterator_facade<iterator, value_type,
            boost::random_access_traversal_tag, proxy> facade_type;

    public:
        typedef typename facade_type::difference_type difference_type;
        typedef typename facade_type::reference reference;

        iterator(virtual_map *map_, std::size_t offset_) :
            map(map_),
            offset(offset_)
        {}

    private:
        bool equal(const iterator &other) const
        {
            assert(this->map == other.map);
            return this->offset == other.offset;
        }

        void increment() { ++offset; }
        void decrement() { --offset; }

        void advance(difference_type n) { offset += n; }

        // Builds the proxy on the fly: no key/value copies are made.
        reference dereference() const { return reference(map->keys[offset], map->values[offset]); }

        difference_type distance_to(const iterator &other_) const
        {
            return static_cast<difference_type>(other_.offset) - static_cast<difference_type>(this->offset);
        }

    private:
        virtual_map *map;
        std::size_t offset;
    };

public:
    virtual_map(Keys &keys_, Values &values_) :
        keys(keys_),
        values(values_)
    {
        if(keys_.size() != values_.size())
            throw std::runtime_error("different size");
    }

    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(this, keys.size()); }

protected:
    Keys &keys;
    Values &values;
};

Usage example:

int main(int argc, char* const argv[]) 
{
    std::vector<int> keys_ = { 17, 2, 13, 4, 51, 78, 49, 37, 1 };
    std::vector<std::string> values_ = { "17", "2", "13", "4", "51", "78", "49", "37", "1" };

    typedef virtual_map<std::vector<int>, std::vector<std::string>> map;

    map map_(keys_, values_);

    std::sort(std::begin(map_), std::end(map_), [](map::const_proxy left_, map::const_proxy right_)
    {
        return left_.first < right_.first;
    });

    return 0;
}

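Sorting algorithm

Without more details, it is hard to say which approach is better. What memory constraints do you have? Can you use concurrency?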
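When the keys are 64-bit integers, as in the question, a comparison sort is not the only option. One candidate that needs only constant extra memory, not proposed in this answer, is American flag sort: an in-place MSD radix sort, sketched here so that the values are permuted in lockstep with the keys. It assumes `std::int64_t` keys and flips the sign bit so negative keys sort first. `radix_sort_pairs`, `flag_sort`, `key_byte` and `swap_both` are hypothetical names, and the cutoff of 32 elements for switching to insertion sort is an untuned guess. Extra memory is two 256-entry tables per recursion level, with at most eight levels (one per key byte).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Byte of a signed 64-bit key at bit offset `shift`, with the sign bit
// flipped so that negative keys order before non-negative ones.
inline unsigned key_byte(std::int64_t key, unsigned shift) {
    const std::uint64_t u = static_cast<std::uint64_t>(key) ^ (std::uint64_t{1} << 63);
    return static_cast<unsigned>((u >> shift) & 0xFFu);
}

template <typename Value>
void swap_both(std::vector<std::int64_t>& keys, std::vector<Value>& values,
               std::size_t i, std::size_t j) {
    using std::swap;
    swap(keys[i], keys[j]);
    swap(values[i], values[j]);
}

// American flag sort of [lo, hi) on the byte at `shift`, then on lower bytes.
template <typename Value>
void flag_sort(std::vector<std::int64_t>& keys, std::vector<Value>& values,
               std::size_t lo, std::size_t hi, unsigned shift) {
    if (hi - lo < 32) {  // small ranges: lockstep insertion sort on the full key
        for (std::size_t i = lo + 1; i < hi; ++i)
            for (std::size_t j = i; j > lo && keys[j] < keys[j - 1]; --j)
                swap_both(keys, values, j, j - 1);
        return;
    }
    // 1. Histogram of the current byte (stored in `end` for now).
    std::array<std::size_t, 256> next{}, end{};
    for (std::size_t i = lo; i < hi; ++i) ++end[key_byte(keys[i], shift)];
    // 2. Turn counts into bucket boundaries: bucket b is [next[b], end[b]).
    std::size_t pos = lo;
    for (std::size_t b = 0; b < 256; ++b) {
        next[b] = pos;
        pos += end[b];
        end[b] = pos;
    }
    // 3. Permute in place: send each element to the next free slot of its bucket.
    for (std::size_t b = 0; b < 256; ++b) {
        while (next[b] < end[b]) {
            const unsigned d = key_byte(keys[next[b]], shift);
            if (d == b) ++next[b];
            else swap_both(keys, values, next[b], next[d]++);
        }
    }
    // 4. Recurse into each bucket on the next lower byte.
    if (shift == 0) return;
    std::size_t start = lo;
    for (std::size_t b = 0; b < 256; ++b) {
        flag_sort(keys, values, start, end[b], shift - 8);
        start = end[b];
    }
}

template <typename Value>
void radix_sort_pairs(std::vector<std::int64_t>& keys, std::vector<Value>& values) {
    if (keys.size() != values.size())
        throw std::invalid_argument("keys and values differ in size");
    flag_sort(keys, values, 0, keys.size(), 56);
}
```

Each level is a counting pass plus a permutation pass, so the total work is roughly proportional to n times the number of key bytes that actually differ. Unlike quicksort, the result is not stable, which should not matter for unique simulation keys.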

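[Discussion]:

  • Sir, you should have asked for clarification first instead of posting an incomplete answer.
[Solution 3]:

There are some problems:

  • Iterating over two sequences at once requires a pair that represents references to the sequence elements; the pair itself is not a reference. Hence algorithms that rely on real references will not work.
  • Performance degrades (the sequences are only loosely coupled).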
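The first problem can be made concrete with `std::pair<int&, int&>` standing in for the proxy. A generic algorithm may exchange two elements through a temporary (`tmp = *a; *a = *b; *b = tmp;`); whether a given std::sort does so is up to the implementation. If the temporary has the proxy's own type, it copies references instead of data. The minimal illustration below (`Proxy` and `three_step_swap` are hypothetical names) contrasts that with an owning `std::pair<int, int>` temporary, which is the role ElementPair's copy constructor plays in the implementation that follows.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A "reference" into two parallel vectors: (key&, value&).
using Proxy = std::pair<int&, int&>;

// The three-step exchange a generic algorithm may perform on *a and *b:
//     Tmp tmp = *a;  *a = *b;  *b = tmp;
// Assigning to a Proxy writes through its references into the vectors.
template <typename Tmp>
void three_step_swap(std::vector<int>& keys, std::vector<int>& values,
                     std::size_t i, std::size_t j) {
    Proxy a(keys[i], values[i]);
    Proxy b(keys[j], values[j]);
    Tmp tmp = a;  // Tmp = Proxy copies the references; Tmp = std::pair<int, int> copies the data
    a = b;        // overwrites element i with element j
    b = tmp;      // with Tmp = Proxy this reads element i, which was already overwritten
}
```

Starting from keys {1, 2} and values {10, 20}, `three_step_swap<Proxy>(keys, values, 0, 1)` leaves keys {2, 2} and values {20, 20}, so the first element is lost. `three_step_swap<std::pair<int, int>>(keys, values, 0, 1)` produces keys {2, 1} and values {20, 10}, a correct exchange.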

An implementation using a pair of references and std::sort:

// Copyright (c) 2014 Dieter Lucking. Distributed under the Boost
// software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

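#include <algorithm>
#include <chrono>
#include <cstddef>      // std::ptrdiff_t
#include <iostream>
#include <iterator>     // std::random_access_iterator_tag
#include <memory>
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage, std::is_reference, std::is_base_of
#include <utility>      // std::move, std::swap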

// None
// ============================================================================

/// A void type
struct None {

    None()
    {}

    /// Explicit conversion to None.
    template <typename T>
    explicit None(const T&)
    {}

    template <typename T>
    None& operator = (const T&) {
        return *this;
    }

    /// Never null.
    None* operator & () const;
};

extern None& none();
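// std::addressof bypasses the overloaded operator&; a plain &none() would call
// this very operator again and recurse forever.
inline None* None::operator & () const { return std::addressof(none()); }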

None& none() {
    static None result;
    return result;
}


// IteratorAdaptorTraits
// ============================================================================

namespace Detail {

    // IteratorAdaptorTraits
    // =====================

    template <typename Iterator, typename ReturnType, bool IsReference>
    struct IteratorAdaptorTraits;

    // No reference
    // ============

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, false>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef ReturnType value_type;
        typedef None reference;
        typedef None pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &none();
        }
    };

    // Reference
    // =========

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, true>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef typename std::remove_reference<ReturnType>::type value_type;
        typedef ReturnType reference;
        typedef value_type* pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &accessor.value(iterator);
        }
    };
} // namespace Detail


// RandomAccessIteratorAdaptor
// ============================================================================

/// An adaptor around a random access iterator.
/// \ATTENTION The adaptor will not fulfill the standard iterator requirements
///            if the accessor does not support references: in that case, the
///            reference and pointer types are None.
template <typename Iterator, typename Accessor>
class RandomAccessIteratorAdaptor
{
    // Types
    // =====

    private:
    static_assert(
        ! std::is_base_of<None, Accessor>::value,
        "None as accessor.");

    static_assert(
        ! std::is_base_of<None, typename Accessor::return_type>::value,
        "None as return type.");

    typedef typename Detail::IteratorAdaptorTraits<
        Iterator,
        typename Accessor::return_type,
        std::is_reference<typename Accessor::return_type>::value
    > Traits;

    public:
    typedef typename Traits::iterator_type iterator_type;
    typedef Accessor accessor_type;
    typedef typename std::random_access_iterator_tag iterator_category;
    typedef typename std::ptrdiff_t difference_type;
    typedef typename Traits::return_type return_type;
    typedef typename Traits::value_type value_type;
    typedef typename Traits::reference reference;
    typedef typename Traits::pointer pointer;

    typedef typename accessor_type::base_type accessor_base_type;
    typedef RandomAccessIteratorAdaptor<iterator_type, accessor_base_type> base_type;

    // Tag
    // ===

    public:
    struct RandomAccessIteratorAdaptorTag {};

    // Construction
    // ============

    public:
    explicit RandomAccessIteratorAdaptor(
        iterator_type iterator, const accessor_type& accessor = accessor_type())
    :   m_iterator(iterator), m_accessor(accessor)
    {}

    template <typename IteratorType, typename AccessorType>
    explicit RandomAccessIteratorAdaptor(const RandomAccessIteratorAdaptor<
        IteratorType, AccessorType>& other)
    :   m_iterator(other.iterator()), m_accessor(other.accessor())
    {}

    // Element Access
    // ==============

    public:
    /// The underlying accessor.
    const accessor_type& accessor() const { return m_accessor; }
    /// The underlying iterator.
    const iterator_type& iterator() const { return m_iterator; }
    /// The underlying iterator.
    iterator_type& iterator() { return m_iterator; }
    /// The underlying iterator.
    operator iterator_type () const { return m_iterator; }

    /// The base adaptor.
    base_type base() const {
        return base_type(m_iterator, m_accessor.base());
    }

    // Iterator
    // ========

    public:
    return_type operator * () const {
        return Traits::iterator_value(m_accessor, m_iterator);
    }
    pointer operator -> () const {
        return Traits::iterator_pointer(m_accessor, m_iterator);
    }

    RandomAccessIteratorAdaptor increment() const {
        return ++RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor increment_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator += n;
        return tmp;
    }

    RandomAccessIteratorAdaptor decrement() const {
        return --RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor decrement_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator -= n;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator ++ () {
        ++m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator ++ (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        ++m_iterator;
        return tmp;

    }
    RandomAccessIteratorAdaptor& operator += (difference_type n) {
        m_iterator += n;
        return *this;
    }

    RandomAccessIteratorAdaptor& operator -- () {
        --m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator -- (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        --m_iterator;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator -= (difference_type n) {
        m_iterator -= n;
        return *this;
    }


    bool equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator == other.m_iterator;
    }
    bool less(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator < other.m_iterator;
    }
    bool less_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator <= other.m_iterator;
    }
    bool greater(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator > other.m_iterator;
    }
    bool greater_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator >= other.m_iterator;
    }

    private:
    iterator_type m_iterator;
    accessor_type m_accessor;
};


template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator + (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.increment_n(n);
}

template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.decrement_n(n);
}

template <typename Iterator, typename Accessor>
inline typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type
operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.iterator() - b.iterator();
}

template <typename Iterator, typename Accessor>
inline bool operator == (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator != (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return ! a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less_equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater_equal(b);
}


// ElementPair
// ============================================================================

/// A pair of references which can mutate to a pair of values.
/// \NOTE If the key is one or two the pair is less comparable
///       regarding the first or second element. 
template <typename First, typename Second, unsigned Key = 0>
class ElementPair
{
    // Types
    // =====

    public:
    typedef First first_type;
    typedef Second second_type;

    // Construction
    // ============

    public:
    /// Reference
    /// \POSTCONDITION reference() returns true
    ElementPair(first_type& first, second_type& second)
    :   m_first(&first), m_second(&second)
    {}

    /// Copy construction
    /// \POSTCONDITION reference() returns false
    ElementPair(const ElementPair& other)
    :   m_first(new(m_first_storage) first_type(*other.m_first)),
        m_second(new(&m_second_storage) second_type(*other.m_second))
    {}

    /// Move construction
    /// \POSTCONDITION reference() returns false
    ElementPair(ElementPair&& other)
    :   m_first(new(m_first_storage) first_type(std::move(*other.m_first))),
        m_second(new(m_second_storage) second_type(std::move(*other.m_second)))
    {}

    ~ElementPair() {
        if( ! reference()) {
            reinterpret_cast<first_type*>(m_first_storage)->~first_type();
            reinterpret_cast<second_type*>(m_second_storage)->~second_type();
        }
    }

    // Assignment
    // ==========

    public:
    /// Swap content.
    void swap(ElementPair& other) {
        std::swap(*m_first, *other.m_first);
        std::swap(*m_second, *other.m_second);
    }

    /// Assign content.
    ElementPair& operator = (const ElementPair& other) {
        if(&other != this) {
            *m_first = *other.m_first;
            *m_second = *other.m_second;
        }
        return *this;
    }

    /// Assign content.
    ElementPair& operator = (ElementPair&& other) {
        if(&other != this) {
            *m_first = std::move(*other.m_first);
            *m_second = std::move(*other.m_second);
        }
        return *this;
    }

    // Element Access
    // ==============

    public:
    /// True if the pair holds references to external elements.
    bool reference() {
        return (m_first != reinterpret_cast<first_type*>(m_first_storage));
    }
    const first_type& first() const { return *m_first; }
    first_type& first() { return *m_first; }

    const second_type& second() const { return *m_second; }
    second_type& second() { return *m_second; }

    private:
    first_type* m_first;
    typename std::aligned_storage<
        sizeof(first_type),
        std::alignment_of<first_type>::value>::type
        m_first_storage[1];

    second_type* m_second;
    typename std::aligned_storage<
        sizeof(second_type),
        std::alignment_of<second_type>::value>::type
        m_second_storage[1];
};

// Compare
// =======

template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 1>& a,
    const ElementPair<First, Second, 1>& b)
{
    return (a.first() < b.first());
}


template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 2>& a,
    const ElementPair<First, Second, 2>& b)
{
    return (a.second() < b.second());
}

// Swap
// ====

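// Declared next to ElementPair (not in namespace std, where adding overloads
// is not allowed) so that argument-dependent lookup finds them.
template <typename First, typename Second, unsigned Key>
inline void swap(
    ElementPair<First, Second, Key>& a,
    ElementPair<First, Second, Key>& b)
{
    a.swap(b);
}

// Dereferencing the adaptor yields a temporary ElementPair, and
// std::iter_swap may call swap(*a, *b) on such temporaries.
template <typename First, typename Second, unsigned Key>
inline void swap(
    ElementPair<First, Second, Key>&& a,
    ElementPair<First, Second, Key>&& b)
{
    a.swap(b);
}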

// SequencePairAccessor
// ============================================================================

template <typename FirstSequence, typename SecondSequence, unsigned Keys = 0>
class SequencePairAccessor
{
    // Types
    // =====

    public:
    typedef FirstSequence first_sequence_type;
    typedef SecondSequence second_sequence_type;
    typedef typename first_sequence_type::size_type size_type;
    typedef typename first_sequence_type::value_type first_type;
    typedef typename second_sequence_type::value_type second_type;
    typedef typename first_sequence_type::iterator iterator;

    typedef None base_type;
    typedef ElementPair<first_type, second_type, Keys> return_type;

    // Construction
    // ============

    public:
    SequencePairAccessor(first_sequence_type& first, second_sequence_type& second)
    :   m_first_sequence(&first), m_second_sequence(&second)
    {}

    // Element Access
    // ==============

    public:
    base_type base() const { return base_type();    }
    return_type value(iterator pos) const {
        return return_type(*pos, (*m_second_sequence)[pos - m_first_sequence->begin()]);
    }

    // Data
    // ====

    private:
    first_sequence_type* m_first_sequence;
    second_sequence_type* m_second_sequence;
};

On my system, the test shows a slowdown by a factor of 1.5 for const char* and 3.4 for std::string, compared with a single vector holding std::pairs.

// Test
// ============================================================================

#include <random>   // std::mt19937 for std::shuffle
#include <vector>

#define SAMPLE_SIZE 1e1
#define VALUE_TYPE const char*

int main() {
    const unsigned samples = SAMPLE_SIZE;

    typedef int key_type;
    typedef VALUE_TYPE value_type;
    typedef std::vector<key_type> key_sequence_type;
    typedef std::vector<value_type> value_sequence_type;

    typedef SequencePairAccessor<key_sequence_type, value_sequence_type, 1> accessor_type;
    typedef RandomAccessIteratorAdaptor<
        key_sequence_type::iterator,
        accessor_type>
        iterator_adaptor_type;

    key_sequence_type keys;
    value_sequence_type values;
    keys.reserve(samples);
    values.reserve(samples);
    const char* words[] = { "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
    for(unsigned i = 0; i < samples; ++i) {
        key_type k = i % 10;
        keys.push_back(k);
        values.push_back(words[k]);
    }

    accessor_type accessor(keys, values);
    // std::random_shuffle was removed in C++17; std::shuffle takes an explicit engine.
    std::mt19937 engine(42);  // fixed seed so runs are reproducible
    std::shuffle(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor),
        engine
    );

    if(samples <= 10) {
        std::cout << "\nRandom:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    typedef std::pair<key_type, value_type> pair_type;
    std::vector<pair_type> ref;
    for(const auto& k: keys) {
        ref.push_back(pair_type(k, words[k]));
    }

    struct Less {
        bool operator () (const pair_type& a, const pair_type& b) const {
            return a.first < b.first;
        }
    };
    // steady_clock is monotonic, and duration<double> converts ticks to
    // seconds regardless of the clock's tick period.
    auto ref_start = std::chrono::steady_clock::now();
    std::sort(ref.begin(), ref.end(), Less());
    auto ref_end = std::chrono::steady_clock::now();
    double ref_elapsed = std::chrono::duration<double>(ref_end - ref_start).count();

    auto start = std::chrono::steady_clock::now();
    std::sort(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor)
    );
    auto end = std::chrono::steady_clock::now();
    double elapsed = std::chrono::duration<double>(end - start).count();

    if(samples <= 10) {
        std::cout << "\nSorted:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    std::cout << "\nDuration sorting " << double(samples) << " samples:\n"
              <<   "========\n"
              << " One Vector: " << ref_elapsed << '\n'
              << "Two Vectors: " << elapsed << '\n'
              << "     Factor: " << elapsed/ref_elapsed << '\n'
              << '\n';
}

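(Adjust SAMPLE_SIZE and VALUE_TYPE as needed.)

My conclusion is that a sorted view of the unsorted data sequences may be more appropriate (but that violates the question's requirements).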
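A sorted view can be sketched as an index permutation: the key and value vectors stay untouched, and the data is visited in key order through the indices. `sorted_order` is a hypothetical helper. It needs one `std::size_t` per element (8 bytes each on typical 64-bit platforms, so several gigabytes for billions of elements), which is exactly why it conflicts with the question's no-extra-memory rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of `keys` in increasing key order, leaving both the
// key and the value vectors untouched. Needs one std::size_t per element.
template <typename Key>
std::vector<std::size_t> sorted_order(const std::vector<Key>& keys) {
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::sort(order.begin(), order.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return order;
}
```

Usage, with `process` standing in for whatever the simulation does per element: `for (std::size_t i : sorted_order(keys)) process(keys[i], values[i]);`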
