http://bugs.winehq.org/show_bug.cgi?id=59027
Bug ID: 59027
Summary: "Rebased" NTSync is broken: massive performance regression
Product: Wine
Version: 10.19
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: P2
Component: -unknown
Assignee: wine-bugs@list.winehq.org
Reporter: virtuousfox@gmail.com
Distribution: ---
The last time I had good performance in Wine was with wine-staging-1.10 and the original NTSync patch (MR 7226). But after it was "rebased" into smaller chunks and officially adopted, it is as if performance is even worse than before it existed (possibly due to losing esync too). Is it still not fully merged, leaving it broken in a half-finished state?
This is evident in the biggest offender I've found: https://bugs.winehq.org/show_bug.cgi?id=54693 - Freedom Planet 2 (and its demo) is back to 20 fps (it should have no problem reaching 200 even with CPU-only rendering, such as vulkan:llvmpipe). I also see that in Dishonored 2 the fps is often stuck at around 20-40 (previously: 50-75) while the GPU is underloaded at 50-75% and the 12-core CPU sits below 10%. At least it's not eating up 70% of all CPU cores like it did before (or was that only an esync thing?).
But /dev/ntsync has 666 permissions and I don't see any obvious errors or warnings. Perhaps it's being silently ignored altogether, or there is some other massive regression.
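Permissions alone don't show whether the device is actually being used. As a rough sketch (not an official Wine diagnostic), the two helpers below check that a device node is a usable character device and whether a given process holds an open descriptor on it via Linux's /proc. The function names are made up for illustration, and which Wine process (if any) keeps /dev/ntsync open is an assumption to verify on your own system.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of Wine.

# Report whether a device node exists as a character device and is
# readable and writable by the current user.
check_device() {
    dev=$1
    if [ ! -e "$dev" ]; then
        echo "missing"
    elif [ ! -c "$dev" ]; then
        echo "not-a-char-device"
    elif [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "usable"
    else
        echo "no-access"
    fi
}

# Succeed if process $1 has an open file descriptor pointing at path $2.
# Relies on the Linux /proc/<pid>/fd symlinks.
pid_has_open() {
    pid=$1
    target=$2
    for fd in /proc/"$pid"/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `check_device /dev/ntsync` should print `usable` on a working setup, and `pid_has_open <pid> /dev/ntsync` can be pointed at the PID of a running Wine process while the game is up.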
Tested recently with dxvk + app-emulation/vkd3d-proton using
  DXVK_HUD="devinfo,fps,frametimes,submissions,drawcalls,pipelines,memory,gpuload,api,scale=1.2"
but Wine's native rendering with Mesa's overlay should show the same, as of the last time I checked. The Mesa overlay can be enabled via:
  VK_INSTANCE_LAYERS="VK_LAYER_MESA_overlay"
  VK_LOADER_LAYERS_ENABLE+=",VK_LAYER_MESA_overlay"
  VK_LAYER_MESA_OVERLAY_CONFIG="fps_sampling_period=80,width=480,position=top-left,submit,draw,pipeline_graphics,vert_invocations,geom_invocations,clip_invocations,frag_invocations,tess_eval_invocations,compute_invocations"
If everything works well, either your fps will be capped at the maximum, or you should see CPU/GPU compute load or RAM/VRAM usage near 100%, indicating the bottleneck. Otherwise, the system is underutilized because something is badly timed. Here the timing is particularly bad.
--- Comment #1 from FoX virtuousfox@gmail.com ---
After trying to figure this out for months I've just stumbled on a massive breakthrough: it appears that all sync methods in both wine and proton are severely crippled by threading - the more cores they get, the worse they perform but they always try to get all cores.
In a place where I get 24-26 fps with NTSync on current wine-staging, I've tried:
1) WINE_CPU_TOPOLOGY=2 wine-proton FP2.exe
2) taskset -c 2-3 wine FP2.exe
3) WINE_CPU_TOPOLOGY=4 wine-proton FP2.exe
4) taskset -c 2-5 wine FP2.exe
The results are astonishing: 1) 110-120 fps; 2) 55-65 fps; 3) 50-60 fps; 4) 24-26 fps.
This means that 2 threads (1 core) was the sweet spot, even though that single core was maxed out. I have 12 cores, so you can imagine how bad it is by default. Ironically, Proton aced the test in the end, although it started with the worst results by default: 9 fps with default sync and <5 fps for esync & fsync.
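Whether CPUs 2-3 really are two threads of one core depends on how the kernel numbers SMT siblings, which varies between machines. A small sketch for checking this, assuming the standard Linux sysfs topology files; the function name and the SYSFS_CPU_ROOT override are illustrative only (the override exists so the function can be exercised without real hardware data):

```shell
#!/bin/sh
# Print the SMT sibling list for each logical CPU number given.
# SYSFS_CPU_ROOT is a made-up override for testing; by default the
# real sysfs location is used.
siblings() {
    root=${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}
    for cpu in "$@"; do
        f="$root/cpu$cpu/topology/thread_siblings_list"
        if [ -r "$f" ]; then
            printf 'cpu%s: %s\n' "$cpu" "$(cat "$f")"
        else
            printf 'cpu%s: unknown\n' "$cpu"
        fi
    done
}
```

Running `siblings 2 3 4 5` shows, for each CPU, which logical CPUs share its physical core. If CPU 2 and CPU 3 do not list each other, then `taskset -c 2-3` pinned the game to two separate cores rather than one.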
However, limiting all Wine processes and the apps themselves is a bad workaround in general. At the very least, there should be a way to limit only the sync processes. Even pinning the entire sync machinery onto a single thread by default does not seem like a bad idea.
Zeb Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z.figura12@gmail.com
--- Comment #2 from Zeb Figura z.figura12@gmail.com ---
(In reply to FoX from comment #1)
> After trying to figure this out for months I've just stumbled on a massive breakthrough: it appears that all sync methods in both wine and proton are severely crippled by threading - the more cores they get, the worse they perform but they always try to get all cores.
>
> In place where I get 24-26 fps with NTsync on current wine-staging, I've tried:
> - WINE_CPU_TOPOLOGY=2 wine-proton FP2.exe
> - taskset -c 2-3 wine FP2.exe
> - WINE_CPU_TOPOLOGY=4 wine-proton FP2.exe
> - taskset -c 2-5 wine FP2.exe

This doesn't make any sense. Sync methods don't, by themselves, "try to get all cores". Applications might, but that shouldn't make ntsync worse.
I also can't reproduce these results. I have a fairly high-powered computer, and with ntsync I reach even the highest FPS limit available (288 FPS). But without ntsync, performance gets worse, and if I limit it to 2 cores with taskset, performance gets worse still. That's more or less what I'd expect.
Can you please test with unmodified upstream non-staging wine, in a fresh prefix, without any external components including dxvk?
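One possible way to set up such a test run, sketched under assumptions: the wrapper below points WINEPREFIX at a new prefix directory and strips the overlay, HUD and topology variables mentioned earlier in this report, plus WINEDLLOVERRIDES, before launching a command. The function name `run_clean` and the example prefix path are hypothetical. It does not by itself guarantee that the `wine` binary on PATH is an unmodified upstream build.

```shell
#!/bin/sh
# Run a command with WINEPREFIX set to the given directory and with
# the tuning/overlay variables from this report removed from the
# environment. "run_clean" is an illustrative name, not a Wine tool.
run_clean() {
    prefix=$1
    shift
    env -u WINEDLLOVERRIDES -u WINE_CPU_TOPOLOGY -u DXVK_HUD \
        -u VK_INSTANCE_LAYERS -u VK_LOADER_LAYERS_ENABLE \
        -u VK_LAYER_MESA_OVERLAY_CONFIG \
        WINEPREFIX="$prefix" "$@"
}
```

For example, `run_clean "$HOME/.wine-bug59027" wine FP2.exe` would start the game in a prefix that Wine creates on first use, so no DXVK DLLs installed into an older prefix get picked up.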