C++11: reinterpreting array of structs as array of struct's member

【Question title】: C++11: reinterpreting array of structs as array of struct's member
【Posted】: 2016-12-04 18:32:39
【Question description】:

Consider the following type:

struct S
{
    char v;
};

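Given an array of const S, is it possible, in a standards-conforming way, to reinterpret it as an array of const char whose elements correspond to the value of member v of each element of the original array, and vice versa? For example: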

const S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
const char* a2 = reinterpret_cast< const char* >(a1);

for (int i = 0; i < 4; ++i)
    std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';

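Is the code above portable, and would it print true true true true? If not, is there any other way to achieve this?

Obviously, it is possible to create a new array and initialize it with member v of each element of the original array, but the whole idea is to avoid creating a new array.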

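【Comments】:

The question boils down to whether a struct containing a char is required to have no special alignment.
@SamVarshavchik: If its alignment is > 1, then it must have padding, because if its size were 1 the second item in an array would be misaligned. So padding is the real issue. Once the padding question is settled, alignment no longer matters.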

Tags: c++ c++11 reinterpret-cast

【Solution 1】:

Simply put, no: a struct may have padding, and that completely breaks any reinterpretation of the array.
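
To make the padding argument concrete, here is a minimal sketch. `Padded` and `byte_offset_of_element` are hypothetical names that do not appear in the question; `alignas(4)` is used only to force the kind of padding that an implementation is formally permitted (though unlikely) to add to S on its own.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the question.
struct S
{
    char v;
};

// Hypothetical struct: alignas(4) forces padding of the kind the
// standard allows an implementation to add to S.
struct alignas(4) Padded
{
    char v;
};

// sizeof is always a multiple of alignof, so Padded cannot have size 1.
static_assert( alignof( Padded ) == 4, "alignment was requested explicitly" );
static_assert( sizeof( Padded ) % alignof( Padded ) == 0, "size is a multiple of alignment" );
static_assert( sizeof( Padded ) >= 4, "at least 3 bytes of padding after v" );

// Byte offset of element i from the start of an array of T.
template< class T >
constexpr std::size_t byte_offset_of_element( std::size_t i )
{
    return i * sizeof( T );
}

// Element 1 of a Padded array does not start at byte 1, so indexing a
// char pointer with 1 would land on padding rather than on a member v.
static_assert( byte_offset_of_element<Padded>( 1 ) != 1, "stride differs from 1" );
```

With such a layout, `a2[i]` in the question's loop would read byte i of the array, while `a1[i].v` lives at byte `i * sizeof(S)`.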

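【Comments】:

What if the padding is explicitly removed from it?
@набиячлэвэли: That requires compiler-specific extensions; the general rules then no longer apply.
@набиячлэвэли You would still be violating the aliasing rules. A lot of real-world code that breaks the aliasing rules and worked fine for years started failing once compilers began making smarter optimizations. Assuming it will work is a terrible idea.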

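【Solution 2】:

Formally, the struct may have padding that makes its size greater than 1.

That is, formally you can't reinterpret_cast and have fully portable code, except¹ for an array of only one item.

In practice, however, someone asked some years ago whether there is now any compiler that by default gives sizeof(T) > 1 for struct T{ char x; };. I have yet to see an example. So in practice, just static_assert that the size is 1, and don't worry at all that the static_assert might fail on some system.

I.e.,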

S const a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
static_assert( sizeof( S ) == 1, "!" );

char const* const a2 = reinterpret_cast<char const*>( a1 );

for( int i = 0; i < 4; ++i )
{
    assert( a1[i].v == a2[i] );
}

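Since it is possible to interpret the C++14 and later standards in a way where the indexing has undefined behavior, based on a peculiar interpretation of "array" as referring to some original array, one might instead write this code in a more awkward and verbose but guaranteed valid way: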

// I do not recommend this, but it's one way to avoid problems with some compiler that's
// based on an unreasonable, impractical interpretation of the C++14 standard.
#include <assert.h>
#include <new>

auto main() -> int
{
    struct S
    {
        char v;
    };

    int const compiler_specific_overhead    = 0;    // Redefine per compiler.
    // With value 0 for the overhead the internal workings here, what happens
    // in the machine code, is the same as /without/ this verbose work-around
    // for one impractical interpretation of the standard.
    int const n = 4;
    static_assert( sizeof( S ) == 1, "!" );
    char storage[n + compiler_specific_overhead]; 
    S* const a1 = ::new( storage ) S[n];
    assert( (void*)a1 == storage + compiler_specific_overhead );

    for( int i = 0; i < n; ++i ) { a1[i].v = "a42"[i]; }    //  Whatever

    // Here a2 points to items of the original `char` array, hence no indexing
    // UB even with impractical interpretation of the C++14 standard.
    // Note that the indexing-UB-free code from this point, is exactly the same
    // source code as the first code example that some claim has indexing UB.
    char const* const a2 = reinterpret_cast<char const*>( a1 );

    for( int i = 0; i < n; ++i )
    {
        assert( a1[i].v == a2[i] );
    }
}

Notes:

¹ The standard guarantees that there is no padding at the start of a struct.
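
The single-item case that footnote ¹ refers to can be sketched as follows, assuming S is a standard-layout struct (it is, as defined in the question). `first_member_via_cast` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct S
{
    char v;
};

// Both checks hold for a standard-layout struct whose first member is v.
static_assert( std::is_standard_layout<S>::value, "S must be standard-layout" );
static_assert( offsetof( S, v ) == 0, "no padding before the first member" );

// Reads the member through a pointer to the whole object. No pointer
// arithmetic is involved, so the indexing concern does not arise here.
inline char first_member_via_cast( S const& s )
{
    return *reinterpret_cast<char const*>( &s );
}
```

This covers one object only; it says nothing about stepping from one array element to the next, which is where the size and pointer-arithmetic questions come in.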

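【Comments】:

I think the pointer arithmetic implied in a2[i] causes undefined behavior.
@T.C. I don't know about the indexing, but as noted it is already formally UB. One can't make it more UB, just as one can't be just a little pregnant. :) In practice, however, the only concern is that some compiler might use it to sabotage-"optimize" the code...
"Not portable" does not mean UB. @T.C. is correct that the UB comes from the pointer arithmetic rather than from the use of reinterpret_cast; see [expr.add] p6.
This looks like a recipe for producing code that fails subtly because of optimizations based on aliasing assumptions. So the static_assert is not enough. You would have to verify that the compiler, every compiler, and every change of options or version, does not break the code.
@DavidSchwartz: You are right, but your statement is misleading, because (1) a compiler's ability to compile ordinary code is verified by testing the code with the compiler one uses, which one does anyway, i.e. there is nothing extra to do, and (2) if a compiler (say, g++) cannot compile this sensibly, that is a good reason to drop that compiler. Compilers are our tools, not our masters. Use the tools that make you more productive, and drop those that make the work harder.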

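【Solution 3】:

The pointer arithmetic in a2[i] is undefined; see C++14 5.7 [expr.add] p7:

For addition or subtraction, if the expressions P or Q have type "pointer to cv T", where T and the array element type are not similar (4.5), the behavior is undefined. [Note: In particular, a pointer to a base class cannot be used for pointer arithmetic when the array contains objects of a derived class type. -end note]

Because of this rule, even if there is no padding and the sizes match, type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap. The pointer arithmetic is only valid if a2 really is an array of char, not just something with the same size and alignment, and if it really is an array of char it must be a separate object from the array of S.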

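【Comments】:

That quote is taken out of context, and it evidently isn't in C++11.
I found it in C++14. It can be interpreted in at least two ways: one that is reasonable and serves a purpose, and one that is unreasonable and serves none. It is worth noting that C++14 is the first version of the standard to include parts of very low quality; this unclear wording is one example.
The wording comes from DR 1504, which has DR status and therefore resolves a defect in C++11.
Re "type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap": no, that is only so if you unreasonably interpret "array" as the original array rather than the array at hand. That interpretation could (possibly) be adopted by a perverse compiler, though.
A placement new-expression begins the lifetime of an object. reinterpret_cast does not.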

【Solution 4】:

If the source data is constant, I think I would lean toward a compile-time conversion:

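// Requires C++14 (std::index_sequence, deduced return types).
#include <iostream>
#include <array>
#include <cstddef>   // std::size_t
#include <utility>   // std::index_sequence, std::make_index_sequence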

struct S
{
    char v;
};

namespace detail {
    template<std::size_t...Is>
    constexpr auto to_cstring(const S* p, std::index_sequence<Is...>)
    {
        return std::array<char, sizeof...(Is)> {
            p[Is].v...
        };
    }
}

template<std::size_t N>
constexpr auto to_cstring(const S (&arr)[N])
{
    return detail::to_cstring(arr, std::make_index_sequence<N>());
}

int main()
{
    const /*expr if you wish*/ S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };

    const /*expr if you wish*/ auto a2 = to_cstring(a1);


    for (int i = 0; i < 4; ++i)
        std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';
}

Output:

true true true true

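Even when the data is not constexpr, gcc and clang are pretty good at constant-folding complex sequences like this.

【Comments】:

Well, that's copying. What about arrays with millions of items, or more?
@Cheersandhth.-Alf As always, it depends... How many times are we doing it? Can the compiler elide the copy? Is it a candidate for constant folding? And so on. In a tight loop with mutable data, maybe not. But in many cases, even where a copy is written, it won't actually happen after the optimization passes. Today's compilers are pretty good.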
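
For the large-array case raised in the comments above, one non-copying option is a thin view that indexes the S array normally and then reads the member. This is only a sketch: `MemberView` is a hypothetical class, not something proposed in the thread, and it does not help if a real `const char*` is required by some API.

```cpp
#include <cassert>
#include <cstddef>

struct S
{
    char v;
};

// Hypothetical helper: presents an array of S as a read-only sequence of
// char, without copying and without reinterpret_cast.
class MemberView
{
public:
    template< std::size_t N >
    explicit MemberView( S const (&arr)[N] ) : data_( arr ), size_( N ) {}

    std::size_t size() const { return size_; }

    // Ordinary indexing into the S array, followed by member access.
    char const& operator[]( std::size_t i ) const
    {
        assert( i < size_ );
        return data_[i].v;
    }

private:
    S const*    data_;
    std::size_t size_;
};
```

With `MemberView a2( a1 );`, the expression `a2[i]` refers to the very same object as `a1[i].v`, so padding and aliasing are not an issue; the cost is that the elements are not exposed as a contiguous char buffer.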