On 5/30/22 21:17, Jin-oh Kang wrote:
The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D.
The following image (stolen and adapted from the Linux kernel documentation) may be illustrative:
CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a);
That said, the *_ONCE macros would be quite helpful for Wine too. Ideally, however, we should just use the stdatomic.h primitives whenever possible (old GCC has bugs in them though), shimming in our own implementation on older compilers.
We've been preferring to use win32 atomic definitions instead.
I believe win32 guarantees that all access to suitably aligned types is atomic (not targeting the Alpha has its advantages), so there's no need for READ_ONCE or WRITE_ONCE. On the other hand, ReadNoFence() and WriteNoFence() also exist in recent winnt.h. They lack documentation, of course, and so do ReadAcquire() and WriteRelease()...
You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads.
The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().
The problem is that the Wine codebase doesn't have one (yet). Otherwise an acquire/release barrier pair would be much more efficient than a full barrier (at least on weak-memory-order architectures).
We don't have any way to encode store-release and write-acquire, but they would be easy enough to add that I think it wouldn't delay this patch too much.