This merge request implements several NUMA functions previously stubbed in kernel32 and kernelbase, adds a basic NUMA node discovery/topology layer, and extends the associated tests. It also improves tracing in SetThreadGroupAffinity.
## Context / Motivation
Some Windows applications (game engines, middleware, runtimes) query the NUMA API to adapt memory allocation or thread distribution. Without an implementation, these calls returned errors (ERROR_CALL_NOT_IMPLEMENTED) or unhelpful values, which could degrade those programs' internal heuristics. This first implementation provides:
- A logical topology derived from GetLogicalProcessorInformation.
- A reasonable approximation of available memory per node.
- Consistent processor masks for the present nodes.
It prepares for future optimizations (targeted memory allocation, better scheduling strategies) without modifying the existing behavior of generic allocations.
## Main Changes
- `kernel32/process.c`:
- Implementation of GetNumaNodeProcessorMask, GetNumaAvailableMemoryNode/Ex, GetNumaProcessorNode/Ex, GetNumaProximityNode.
- Parameter validation and consistent error propagation (ERROR_INVALID_PARAMETER).
- `kernelbase/memory.c`:
- New NUMA infrastructure (topology cache, lazy initialization, dedicated critical section); see the sketch after this list.
- Topology reading via GetLogicalProcessorInformation.
- Runtime options via environment variables:
- WINE_NUMA_FORCE_SINGLE: Force a single logical node.
- WINE_NUMA_CONTIG: Remap masks to produce contiguous blocks.
- Implementations of GetNumaHighestNodeNumber, GetNumaNodeProcessorMaskEx, GetNumaProximityNodeEx.
- Robust fallback: if no NUMA info → single node.
- `kernelbase/thread.c`:
- Added detailed traces in SetThreadGroupAffinity (removed the redundant DECLSPEC_HOTPATCH here).
- Tests (`dlls/kernel32/tests/process.c`):
- Added a new test, test_NumaBasic, covering:
- GetNumaHighestNodeNumber
- GetNumaNodeProcessorMaskEx (nodes 0 and 1)
- GetNumaProximityNodeEx
- Tolerant behavior: accepts `ERROR_INVALID_FUNCTION` / `ERROR_INVALID_PARAMETER` depending on the platform.
- Added the `WINE_DEFAULT_DEBUG_CHANNEL(numa)` debug channel for the subsystem.
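To make the lazy-initialization and fallback flow concrete, here is a minimal sketch. All identifiers (numa_cs, numa_node_mask, numa_init_topology, ...) are illustrative, the env-var check only tests for presence, and the MR's actual code may be structured differently:

```c
/* Illustrative sketch only; identifiers are hypothetical. */
static CRITICAL_SECTION numa_cs;          /* assumed initialized elsewhere */
static BOOL numa_initialized;
static ULONG numa_node_count;
static ULONG_PTR numa_node_mask[64];      /* bounded to 64 nodes */

static void numa_init_topology(void)
{
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = NULL;
    DWORD size = 0, i, count;

    EnterCriticalSection( &numa_cs );
    if (numa_initialized) goto done;

    /* WINE_NUMA_FORCE_SINGLE: collapse everything onto one logical node
       (presence check only, for brevity) */
    if (GetEnvironmentVariableW( L"WINE_NUMA_FORCE_SINGLE", NULL, 0 ))
    {
        numa_node_count = 1;
        numa_node_mask[0] = ~(ULONG_PTR)0;
        goto done;
    }

    GetLogicalProcessorInformation( NULL, &size );
    if (size && (info = HeapAlloc( GetProcessHeap(), 0, size )) &&
        GetLogicalProcessorInformation( info, &size ))
    {
        count = size / sizeof(*info);
        for (i = 0; i < count; i++)
        {
            if (info[i].Relationship != RelationNumaNode) continue;
            if (info[i].NumaNode.NodeNumber >= 64) continue;
            numa_node_mask[info[i].NumaNode.NodeNumber] |= info[i].ProcessorMask;
            if (info[i].NumaNode.NodeNumber >= numa_node_count)
                numa_node_count = info[i].NumaNode.NodeNumber + 1;
        }
    }
    if (info) HeapFree( GetProcessHeap(), 0, info );

    if (!numa_node_count)  /* no NUMA info -> conservative single-node fallback */
    {
        numa_node_count = 1;
        numa_node_mask[0] = ~(ULONG_PTR)0;
    }
done:
    numa_initialized = TRUE;
    LeaveCriticalSection( &numa_cs );
}
```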
## Assumptions / Limitations
- Support for a single processor group (Group = 0) for now.
- Memory approximation: equal division of available physical memory across nodes (improvable later with per-node internal counters); see the short example after this list.
- Proximity = node (simplistic direct mapping).
- No impact yet on VirtualAlloc / Heap allocation by node.
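For instance, under this approximation the per-node figure could be derived along these lines (illustrative only; the MR's actual code path may differ):

```c
/* Equal-split approximation: every node reports the same share of
 * available physical memory. */
MEMORYSTATUSEX status = { sizeof(status) };
ULONG highest = 0;
ULONGLONG per_node = 0;

if (GlobalMemoryStatusEx( &status ) && GetNumaHighestNodeNumber( &highest ))
    per_node = status.ullAvailPhys / (highest + 1);
```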
## Security / Concurrency
- Initialization protected by dedicated critical section (numa_cs).
- Thread-safe lazy read.
- Table bounded to 64 nodes (historical Windows limit).
## Compatibility Impact
- Improves compatibility with software probing the NUMA API.
- Low risk of regression: previously failed paths now return TRUE with consistent data.
- In case of topology collection failure → single-node fallback (conservative behavior).
## Validation / Tests
- New test_NumaBasic added and integrated into the process suite.
- Traces on the numa channel allow diagnosing topology detection.
- Invalid parameters tested (NULL pointers, nodes out of range); a brief illustration follows this list.
- Works in environments without real NUMA via fallback.
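For illustration, the out-of-range check could look roughly like this (a hypothetical excerpt in the spirit of test_NumaBasic; the committed test may differ):

```c
/* Hypothetical excerpt; p-prefixed pointers are assumed to be resolved
 * with GetProcAddress() as usual in these tests. */
ULONG highest = ~0u;
GROUP_AFFINITY affinity;
BOOL ret;

ret = pGetNumaHighestNodeNumber( &highest );
ok( ret, "GetNumaHighestNodeNumber failed, error %lu\n", GetLastError() );

SetLastError( 0xdeadbeef );
ret = pGetNumaNodeProcessorMaskEx( 0xffff, &affinity );  /* node out of range */
ok( !ret, "expected failure for out-of-range node\n" );
ok( GetLastError() == ERROR_INVALID_PARAMETER || GetLastError() == ERROR_INVALID_FUNCTION,
    "unexpected error %lu\n", GetLastError() );
```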
## Environment Variables (quick documentation)
- WINE_NUMA_FORCE_SINGLE=1: Forces a single node (mask covering all CPUs).
- WINE_NUMA_CONTIG=1: Remaps each node's mask to a compact, contiguous block of bits (useful if the topology returns sparse masks).
## Potential Next Steps (not included)
- Implement true memory tracking per node (via allocation hooks).
- Multi-group support (PROCESSOR_GROUP_INFO).
- Improved VirtualAllocExNuma / First-touch implementation.
- More accurate proximity-to-node mapping on complex NUMA platforms.
- Dedicated tests for the environment variables.
## Potential Risks / Regressions
- Applications that relied on the API being absent may slightly change their strategy (low risk).
- Masks remapped with WINE_NUMA_CONTIG could surprise a profiling tool (the option is opt-in).
- Memory approximation too coarse for very fine-grained heuristics (no functional regression expected).
## Request for Review
- Verify logging conventions and TRACE_(numa) usage.
- Verify the relevance of removing DECLSPEC_HOTPATCH on SetThreadGroupAffinity (alignment with local conventions).
- Opinion on error granularity (ERROR_INVALID_PARAMETER vs. ERROR_INVALID_FUNCTION) for closer matching of Windows behavior.
Once the kernel can handle those functions directly (e.g. in a NUMA module), we could use this implementation as a fallback when the kernel doesn't support NUMA natively (i.e. when the module cannot be loaded).
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/8970
Giving this one final shot. If this is not deemed acceptable, I will abandon the effort and just move on to other things, no hard feelings.
This is IMO a cleaner, more conservative/less invasive change.
What is fixed:
- Bug #56381, "TYPE c:\windows\winhelp.exe >foo", i.e. binary mode operation. I would probably consider this the main reason for this change. I'm trying to get the compiler mentioned in the bug report working.
- Ctrl-Z termination of TYPE output to the console.
- "TYPE con >foo", with Ctrl-Z handling, functionally equivalent to "COPY con foo".
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=56381
--
v5: cmd: Fix TYPE behavior (now uses WCMD_copy_loop).
cmd: Refactor WCMD_copy_loop out of WCMD_ManualCopy, and stop copy loop at EOF for /a mode.
cmd/tests: Add test to check for TYPE truncation in binary mode.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8920
This MR fixes 16-bit PFD_DRAW_TO_BITMAP rendering, as tested with SimGolf and Civilization 3. The fix isn't super performant since it involves a lot of back-and-forth between CPU and GPU, which is mostly noticeable when moving staff around in SimGolf (it slows to around two or three fps), but on the plus side it does work, which is a straight upgrade over the current version of Wine.
Couple of notes:
* The third commit touches several different subsystems because I had to add a constant that's visible from both opengl32 and win32u. I wasn't sure how to write this down in the commit message.
* There's a bug in LLVMpipe where a 32-bit pbuffer will cause it to crash. I opened an issue for it here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13890. I needed a 32-bit pbuffer to make this work, so I added a special check that effectively disables 16-bit bitmap rendering for LLVMpipe.
* I also had to use a workaround to make blitting & OpenGL rendering work together. I wrote about it pretty extensively below.
## Combining blitting & OpenGL rendering
Something that's difficult about bitmap rendering is that it allows programs to combine OpenGL drawing with direct blitting onto bitmap memory. That's difficult to imitate with the pbuffer-based implementation because there's no way to replicate memory edits on the bitmap to the GPU-based pbuffer (since there are no API calls involved in the memory edit). This had to work to make the games actually work, so I used this workaround: each time after flushing the pbuffer to the bitmap, we clear the pbuffer to transparent. Also, instead of overwriting the bitmap, we blend the pbuffer's pixels onto it. That way we're effectively overlaying the most recent OpenGL drawings on top of the bitmap.
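Roughly, the per-flush sequence looks like this (a sketch only; blend_onto_bitmap() is a hypothetical CPU-side helper, not a function from this MR):

```c
/* hypothetical helper: CPU-side alpha blend of RGBA pixels onto the bitmap */
extern void blend_onto_bitmap( void *bitmap_bits, const void *pixels, int width, int height );

static void flush_pbuffer_to_bitmap( void *bitmap_bits, void *pixels, int width, int height )
{
    /* read the latest GL drawing back from the pbuffer */
    glReadPixels( 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels );

    /* overlay it onto the bitmap with alpha blending instead of a plain
       overwrite, so earlier direct bitmap edits survive */
    blend_onto_bitmap( bitmap_bits, pixels, width, height );

    /* clear the pbuffer to transparent so the next flush only carries
       drawing done since this one */
    glClearColor( 0.0f, 0.0f, 0.0f, 0.0f );
    glClear( GL_COLOR_BUFFER_BIT );
}
```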
I'm pretty confident that that approach would also help with 32-bit and 24-bit bitmap rendering, but I don't know of any programs that use 32-bit or 24-bit bitmap rendering, so I wasn't able to check. I've just left 32-bit and 24-bit bitmaps as they were.
While the workaround works really well for the games, I did have to edit another test because it used glReadPixels on 16-bit bitmaps. glReadPixels will grab pixels from the transparent-background pbuffer instead of the bitmap, which gives the wrong result. I'm not under the impression that any consumer programs use bitmap rendering together with glReadPixels (it doesn't make as much sense when you could just directly read the pixels from the bitmap), but it's possible to add a wrap_glReadPixels that grabs pixels from the bitmap in case of a 16-bit memory DC.
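Such a wrapper could look something like this (purely hypothetical and not part of this MR; is_16bit_memory_dc() and copy_from_bitmap() are placeholders):

```c
/* hypothetical helpers, not part of this MR */
extern BOOL is_16bit_memory_dc( void );
extern void copy_from_bitmap( GLint x, GLint y, GLsizei width, GLsizei height,
                              GLenum format, GLenum type, void *pixels );

static void wrap_glReadPixels( GLint x, GLint y, GLsizei width, GLsizei height,
                               GLenum format, GLenum type, void *pixels )
{
    if (is_16bit_memory_dc())
        /* satisfy the read from the bitmap bits, which hold the composed image */
        copy_from_bitmap( x, y, width, height, format, type, pixels );
    else
        /* normal path: read from the pbuffer */
        glReadPixels( x, y, width, height, format, type, pixels );
}
```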
It's not important to read, but I wanted to write it down for the record: I also tried another workaround to make combining blitting & OpenGL work. The thought was to overwrite the pbuffer with the bitmap just before OpenGL draws to the pbuffer, to keep them in sync. It probably would have worked out if it were possible to add something to the front of the OpenGL command buffer in a glFinish/glFlush call (then I could have inserted glDrawPixels before all of the program's rendering operations), but unfortunately it's only possible to append to the end of the command buffer. I tried adding a wrapper to glBegin and glClear that checks whether they were the first render operation after a glFlush or glFinish. Unfortunately that didn't work at all. I'm guessing that Civ 3 and SimGolf still do direct bitmap edits after the first call to glBegin, so by the time the program gets to glFinish/glFlush the pixels that were uploaded in the glDrawPixels command are already stale (although I didn't really check). Anyhow, that approach was pretty fruitless.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/8969
--
v10: maintainers: Add a section for Windows.Devices.Enumeration.
windows.devices.enumeration: Support parsing AQS filters in IDeviceInformationStatics::FindAllAsyncAqsFilter.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8890
Add support for D3DFMT_CxV8U8, which is required for the game Biorage.
Also add support for loading DDS files with header flag values of `DDS_PF_FOURCC` alongside other flag values.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/8966