【Game Engine Architecture 5】
1、Memory Ordering Semantics
These mysterious and vexing problems can only occur on a multicore machine with a multilevel cache.
A cache coherency protocol is a communication mechanism that permits cores to share data between their local L1 caches. Most CPUs use either the MESI or MOESI protocol.
2、The MESI Protocol
• Modified. This cache line has been modified (written to) locally.
• Exclusive. The main RAM memory block corresponding to this cache line exists only in this core’s L1 cache—no other core has a copy of it.
• Shared. The main RAM memory block corresponding to this cache line exists in more than one core’s L1 cache, and all cores have an identical copy of it.
• Invalid. This cache line no longer contains valid data—the next read will need to obtain the line either from another core’s L1 cache, or from main RAM.
The MOESI protocol adds another state named Owned, which allows cores to share modified data without writing it back to main RAM first.
Under the MESI protocol, all cores’ L1 caches are connected via a special bus called the interconnect bus (ICB).
2.1、How MESI Can Go Wrong
As with compiler optimizations and CPU out-of-order execution optimizations, MESI optimizations are carefully crafted so as to be undetectable by a single thread.
Under certain circumstances, optimizations in the MESI protocol can cause the new value of g_ready to become visible to other cores within the cache coherency domain before the updated value of g_data becomes visible. As a result, a thread on another core can see the flag already set, proceed early, and read a stale value of g_data.
3、Memory Fences / memory barriers
A memory fence prevents the memory effects of a read or write instruction from passing other reads and/or writes across the fence.
There are four ways in which one instruction can pass another:
1. A read can pass another read,
2. a read can pass a write,
3. a write can pass another write, or
4. a write can pass a read.
We could imagine a CPU that provides twelve distinct fence instructions—a bidirectional, forward, and reverse variant of each of the four basic fence types listed above.
All fence instructions have two very useful side-effects:
1)They serve as compiler barriers, and
2)they prevent the CPU’s out-of-order logic from reordering instructions across the fence.
4、Acquire and Release Semantics
Memory ordering semantics are really properties of read or write instructions:
• Release semantics. This semantic guarantees that a write to shared memory can never be passed by any other read or write that precedes it in program order.
Reads and writes that precede the write stay before it; reads and writes that follow the write are unconstrained.
• Acquire semantics. This semantic guarantees that a read from shared memory can never be passed by any other read or write that occurs after it in program order.
Reads and writes that follow the read stay after it; reads and writes that precede the read are unconstrained.
• Full fence semantics. This semantic combines acquire and release semantics: no read or write can cross the fence in either direction.
5、When to Use Acquire and Release Semantics
A write-release is most often used in a producer scenario, in which a thread performs two consecutive writes (e.g., writing to g_data and then g_ready) and we need to ensure that all other threads will see the two writes in the correct order. We can enforce this ordering by making the second of these two writes a write-release.
A read-acquire is typically used in a consumer scenario—in which a thread performs two consecutive reads in which the second is conditional on the first (e.g., only reading g_data after a read of the flag g_ready comes back true). We enforce this ordering by making sure that the first read is a read-acquire.
int32_t g_data = 0;
int32_t g_ready = 0;

void ProducerThread() // running on Core 1
{
    g_data = 42;
    // make the write to g_ready into a write-release
    // by placing a release fence *before* it
    RELEASE_FENCE();
    g_ready = 1;
}

void ConsumerThread() // running on Core 2
{
    // make the read of g_ready into a read-acquire
    // by placing an acquire fence *after* it
    while (!g_ready)
        PAUSE();
    ACQUIRE_FENCE();
    // we can now read g_data safely...
    ASSERT(g_data == 42);
}