This MR fixes 16-bit PFD_DRAW_TO_BITMAP rendering, as tested with SimGolf and Civilization 3. The fix isn't very performant, since it involves a lot of back-and-forth between the CPU and GPU; this is most noticeable when moving staff around in SimGolf, where the frame rate drops to around two or three fps. On the plus side it does work, which is a straight upgrade over current Wine.
A couple of notes:
* The third commit touches several different subsystems because I had to add a constant that's visible from both opengl32 and win32u. I wasn't sure how to write this down in the commit message.
* There's a bug in LLVMpipe where a 32-bit pbuffer causes a crash. I opened an issue for it here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13890. I needed a 32-bit pbuffer to make this work, so I added a check that effectively disables 16-bit bitmap rendering on LLVMpipe (see the sketch just after this list).
* I also had to use a workaround to make blitting & OpenGL rendering work together. I wrote about it pretty extensively below.
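The LLVMpipe check mentioned above boils down to something like this (illustrative only; the helper name and where it gets called from are made up, not the actual patch):

```c
/* Illustrative sketch: skip 16-bit bitmap rendering when running on LLVMpipe,
 * since the 32-bit pbuffer it would need currently crashes the driver
 * (see the Mesa issue linked above). */
#include <string.h>
#include <GL/gl.h>

static int is_llvmpipe(void)
{
    const char *renderer = (const char *)glGetString( GL_RENDERER );
    return renderer && strstr( renderer, "llvmpipe" ) != NULL;
}
```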
## Combining blitting & OpenGL rendering
One difficult aspect of bitmap rendering is that it allows programs to combine OpenGL drawing with direct blitting onto the bitmap memory. That's hard to imitate with the pbuffer-based implementation, because there's no way to replicate memory edits on the bitmap to the GPU-based pbuffer (no API calls are involved in the memory edit). Both games depend on this, so I used the following workaround: each time after copying the pbuffer to the bitmap, we clear the pbuffer to transparent, and instead of overwriting the bitmap we blend the pbuffer's pixels onto it. That way we effectively overlay the most recent OpenGL drawing on top of the bitmap.
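To make the mechanics concrete, here's a minimal sketch of the blend-and-clear step (not the actual MR code; the helper names and the assumption of an RGB565 bitmap with an RGBA pbuffer are mine, and the row-order difference between GL and DIBs is ignored):

```c
/* Sketch of the blend-and-clear workaround; names and layout are illustrative. */
#include <GL/gl.h>
#include <stdint.h>
#include <stdlib.h>

static void blend_rgba_onto_rgb565( const uint8_t *src, uint16_t *dst, size_t count )
{
    for (size_t i = 0; i < count; i++)
    {
        unsigned int a = src[4 * i + 3];
        if (!a) continue;  /* fully transparent: keep the blitted bitmap pixel */

        /* expand the existing 565 pixel to 8-bit components */
        unsigned int dr = (dst[i] >> 11) << 3;
        unsigned int dg = ((dst[i] >> 5) & 0x3f) << 2;
        unsigned int db = (dst[i] & 0x1f) << 3;

        /* blend the pbuffer pixel over the bitmap pixel */
        unsigned int r = (src[4 * i + 0] * a + dr * (255 - a)) / 255;
        unsigned int g = (src[4 * i + 1] * a + dg * (255 - a)) / 255;
        unsigned int b = (src[4 * i + 2] * a + db * (255 - a)) / 255;

        dst[i] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }
}

static void flush_pbuffer_to_bitmap( uint16_t *bitmap, unsigned int width, unsigned int height )
{
    uint8_t *rgba = malloc( (size_t)width * height * 4 );

    /* grab the latest OpenGL drawing from the pbuffer and overlay it */
    glReadPixels( 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, rgba );
    blend_rgba_onto_rgb565( rgba, bitmap, (size_t)width * height );
    free( rgba );

    /* reset the pbuffer to fully transparent so the next flush only
     * overlays drawing that happened after this one */
    glClearColor( 0.0f, 0.0f, 0.0f, 0.0f );
    glClear( GL_COLOR_BUFFER_BIT );
}
```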
I'm fairly confident the same approach would also help with 32-bit and 24-bit bitmap rendering, but I don't know of any programs that use those modes, so I wasn't able to check. I've left 32-bit and 24-bit bitmaps as they were.
While the workaround works well for the games, I did have to edit another test because it used glReadPixels on 16-bit bitmaps: glReadPixels reads from the transparent-background pbuffer instead of the bitmap, which gives the wrong result. I'm not aware of any consumer programs that use bitmap rendering together with glReadPixels (it doesn't make much sense when you could read the pixels from the bitmap directly), but it would be possible to add a `wrap_glReadPixels` that reads from the bitmap in the case of a 16-bit memory DC.
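If that ever turns out to be needed, here's a rough sketch of what such a `wrap_glReadPixels` could look like (the `memory_dc` struct and `get_current_memory_dc` lookup are hypothetical, not existing Wine code; only GL_RGBA/GL_UNSIGNED_BYTE is handled and the GL-vs-DIB row order is ignored):

```c
/* Hypothetical sketch of a wrap_glReadPixels fallback: for a 16-bit memory DC,
 * read the DIB bits instead of asking GL, since the pbuffer only contains the
 * latest overlay on a transparent background. */
#include <GL/gl.h>
#include <stddef.h>
#include <stdint.h>

struct memory_dc          /* hypothetical bookkeeping, not an existing Wine struct */
{
    int bpp;              /* bits per pixel of the selected DIB */
    unsigned int width;   /* DIB width in pixels */
    const uint16_t *bits; /* DIB bits for the 16-bit case */
};

extern struct memory_dc *get_current_memory_dc(void);  /* hypothetical lookup */

static void wrap_glReadPixels( GLint x, GLint y, GLsizei width, GLsizei height,
                               GLenum format, GLenum type, void *pixels )
{
    struct memory_dc *dc = get_current_memory_dc();

    if (dc && dc->bpp == 16 && format == GL_RGBA && type == GL_UNSIGNED_BYTE)
    {
        uint8_t *out = pixels;
        for (GLsizei row = 0; row < height; row++)
        {
            const uint16_t *src = dc->bits + (size_t)(y + row) * dc->width + x;
            for (GLsizei col = 0; col < width; col++)
            {
                uint16_t p = src[col];
                *out++ = (uint8_t)((p >> 11) << 3);          /* R */
                *out++ = (uint8_t)(((p >> 5) & 0x3f) << 2);  /* G */
                *out++ = (uint8_t)((p & 0x1f) << 3);         /* B */
                *out++ = 0xff;                               /* A */
            }
        }
        return;
    }
    glReadPixels( x, y, width, height, format, type, pixels );
}
```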
This isn't essential reading, but I wanted to record it: I also tried another workaround to combine blitting and OpenGL. The idea was to overwrite the pbuffer with the bitmap contents just before OpenGL draws to the pbuffer, to keep them in sync. It probably would have worked if it were possible to prepend something to the OpenGL command stream during a glFinish/glFlush call, so that a glDrawPixels could run before all of the program's rendering operations, but unfortunately commands can only be appended. I tried adding a wrapper to glBegin and glClear that checks whether they are the first render operation after a glFlush or glFinish, but that didn't work at all. My guess is that Civ 3 and SimGolf still make direct bitmap edits after the first call to glBegin, so by the time the program reaches glFinish/glFlush, the pixels uploaded by the glDrawPixels call are already stale (though I didn't verify this). Anyhow, that approach was pretty fruitless.
--
v2: win32u: Add support for 16-bit bitmaps to set_dc_pixel_format and flush_memory_dc.
opengl32: Return a fake pixel format when requesting 16-bit bitmap rendering.
opengl32/tests: Extend 16-bit bitmap rendering test.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8969
This is a much cleaner result (written by hand) that we can reuse where needed later (_e.g._ `GetNumaProcessorNode`, which also crashes with [a similar MAV](https://gist.github.com/wasertech/f894ce8d6250e72a01a861c0e4eb6064) on multi-node systems).
It took me a while to figure out where everything should go, and I'm not sure I got it right. Let me know.
I thought I was going to need `FileNumaNodeInformation`, but it turns out it's not really needed. I can remove it if you want.
This is what I see when I try to read the node count:
```log
0024:fixme:ntdll:init_numa_info node affinity; using node 0.
GetNumaHighestNodeNumber: 1
```
I only see this fixme if something accesses `FILE_NUMA_NODE_INFORMATION`. I would really prefer this behavior for our compatibility layer on any multi-node system.
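For reference, the kernelbase side ends up with roughly this shape (a rough sketch rather than the exact MR code; the struct layout is my assumption about the data returned for `SystemNumaProcessorMap`, with only the leading field spelled out):

```c
/* Rough sketch: GetNumaHighestNodeNumber asks ntdll for the NUMA processor
 * map and reports the highest node number from it. Everything past the
 * leading field is padded out here and is an assumption on my part. */
#include <windows.h>
#include <winternl.h>

typedef struct
{
    ULONG     HighestNodeNumber;
    ULONG     Reserved;
    ULONGLONG Pad[64 * 2];   /* per-node affinity / memory data, unused here */
} NUMA_PROCESSOR_MAP;

BOOL WINAPI GetNumaHighestNodeNumber( PULONG highest_node )
{
    NUMA_PROCESSOR_MAP map;
    ULONG len;
    NTSTATUS status = NtQuerySystemInformation( SystemNumaProcessorMap, &map, sizeof(map), &len );

    if (status)
    {
        SetLastError( RtlNtStatusToDosError( status ) );
        return FALSE;
    }
    *highest_node = map.HighestNodeNumber;
    return TRUE;
}
```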
A big thanks to @besentv and @zfigura for their invaluable feedback on this.
--
v4: ntdll: Only show fixme about affinity on multi-node systems.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8995
--
v3: kernelbase: Update GetNumaHighestNodeNumber to use SystemNumaProcessorMap.
ntdll: Switch from SystemNumaProximityNodeInformation to SystemNumaProcessorMap and fix overwritten return values.
ntdll: Add fixme for node affinity.
kernelbase: Refactor GetNumaHighestNodeNumber to use SystemNumaProximityNodeInformation.
ntdll: Initialize information file management for NUMA nodes.
kernelbase: Implement test for `GetNumaHighestNodeNumber`.
kernelbase: Implement `GetNumaHighestNodeNumber` to report accurate node count.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8995
I'm afraid the macOS workarounds are still needed for 10.15 Catalina and earlier (Wine still officially supports 10.12 and later, and CrossOver supports Catalina). I think they can be slightly simplified, though; in my tests the "fallback 2" case is never needed.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/8983#note_116304