This is a step in the direction of continuously polling the queue fd on the server side, removing the need for ntsync to notify wineserver before / after waiting on the queue.
For this my plan is to make win32u also notify wineserver to update the queue access time before waiting (through set_queue_mask), if it's been long enough. Then decide that a queue is hung based on whether it is signaled but access time is more than 5s ago (ie: neither get_message nor set_queue_mask are called regularly even though queue is signaled).
On the server side, the queue fd would stop being polling once it gets ready, and win32u would notify wineserver after all the driver events have been processed, to start polling it again, as a separate request from the wait.
-- v4: win32u: Check the queue access time from the shared memory. server: Use monotonic_time as queue access time base. win32u: Move the message queue access time to shared memory. server: Move hooks_count to the end of the queue_shm_t struct. server: Return early if there's no queue in queue requests.
From: Rémi Bernon rbernon@codeweavers.com
--- server/queue.c | 52 +++++++++++++++++++++++--------------------------- 1 file changed, 24 insertions(+), 28 deletions(-)
diff --git a/server/queue.c b/server/queue.c index b6cff69f38f..b62e557aa18 100644 --- a/server/queue.c +++ b/server/queue.c @@ -3155,24 +3155,22 @@ DECL_HANDLER(set_queue_fd) DECL_HANDLER(set_queue_mask) { struct msg_queue *queue = get_current_queue(); + queue_shm_t *queue_shm;
- if (queue) - { - queue_shm_t *queue_shm = queue->shared; - - SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) - { - shared->wake_mask = req->wake_mask; - shared->changed_mask = req->changed_mask; - } - SHARED_WRITE_END; - - reply->wake_bits = queue_shm->wake_bits; - reply->changed_bits = queue_shm->changed_bits; + if (!queue) return; + queue_shm = queue->shared;
- if (!get_queue_status( queue )) reset_queue_sync( queue ); - else signal_queue_sync( queue ); + SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) + { + shared->wake_mask = req->wake_mask; + shared->changed_mask = req->changed_mask; + reply->wake_bits = shared->wake_bits; + reply->changed_bits = shared->changed_bits; } + SHARED_WRITE_END; + + if (!get_queue_status( queue )) reset_queue_sync( queue ); + else signal_queue_sync( queue ); }
@@ -3180,22 +3178,20 @@ DECL_HANDLER(set_queue_mask) DECL_HANDLER(get_queue_status) { struct msg_queue *queue = current->queue; - if (queue) - { - queue_shm_t *queue_shm = queue->shared; - - reply->wake_bits = queue_shm->wake_bits; - reply->changed_bits = queue_shm->changed_bits; + queue_shm_t *queue_shm;
- SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) - { - shared->changed_bits &= ~req->clear_bits; - } - SHARED_WRITE_END; + if (!queue) return; + queue_shm = queue->shared;
- if (!get_queue_status( queue )) reset_queue_sync( queue ); + SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) + { + reply->wake_bits = shared->wake_bits; + reply->changed_bits = shared->changed_bits; + shared->changed_bits &= ~req->clear_bits; } - else reply->wake_bits = reply->changed_bits = 0; + SHARED_WRITE_END; + + if (!get_queue_status( queue )) reset_queue_sync( queue ); }
From: Rémi Bernon rbernon@codeweavers.com
Avoid field alignment issues as NB_HOOKS is odd. --- dlls/win32u/hook.c | 4 +--- server/hook.c | 4 ---- server/protocol.def | 4 +++- 3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/dlls/win32u/hook.c b/dlls/win32u/hook.c index b1a154c6fb7..b0f49e84b65 100644 --- a/dlls/win32u/hook.c +++ b/dlls/win32u/hook.c @@ -31,9 +31,7 @@
WINE_DEFAULT_DEBUG_CHANNEL(hook);
-#define WH_WINEVENT (WH_MAXHOOK+1) - -static const char * const hook_names[WH_WINEVENT - WH_MINHOOK + 1] = +static const char * const hook_names[NB_HOOKS] = { "WH_MSGFILTER", "WH_JOURNALRECORD", diff --git a/server/hook.c b/server/hook.c index e53ba4514bf..fe3c0459651 100644 --- a/server/hook.c +++ b/server/hook.c @@ -58,10 +58,6 @@ struct hook data_size_t module_size; };
-#define WH_WINEVENT (WH_MAXHOOK+1) - -#define NB_HOOKS (WH_WINEVENT-WH_MINHOOK+1) - struct hook_table { struct object obj; /* object header */ diff --git a/server/protocol.def b/server/protocol.def index 0c8be12314a..d5f86070c76 100644 --- a/server/protocol.def +++ b/server/protocol.def @@ -978,6 +978,8 @@ struct obj_locator };
#define MAX_ATOM_LEN 255 +#define WH_WINEVENT (WH_MAXHOOK + 1) +#define NB_HOOKS (WH_WINEVENT - WH_MINHOOK + 1)
struct shared_cursor { @@ -998,11 +1000,11 @@ typedef volatile struct
typedef volatile struct { - int hooks_count[WH_MAX - WH_MIN + 2]; /* active hooks count */ unsigned int wake_mask; /* wakeup mask */ unsigned int wake_bits; /* wakeup bits */ unsigned int changed_mask; /* changed wakeup mask */ unsigned int changed_bits; /* changed wakeup bits */ + int hooks_count[NB_HOOKS]; /* active hooks count */ } queue_shm_t;
typedef volatile struct
From: Rémi Bernon rbernon@codeweavers.com
--- server/protocol.def | 1 + server/queue.c | 11 +++++++---- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/server/protocol.def b/server/protocol.def index d5f86070c76..79d507e193b 100644 --- a/server/protocol.def +++ b/server/protocol.def @@ -1000,6 +1000,7 @@ typedef volatile struct
typedef volatile struct { + timeout_t access_time; /* last time the queue was accessed */ unsigned int wake_mask; /* wakeup mask */ unsigned int wake_bits; /* wakeup bits */ unsigned int changed_mask; /* changed wakeup mask */ diff --git a/server/queue.c b/server/queue.c index b62e557aa18..1942a0d6d34 100644 --- a/server/queue.c +++ b/server/queue.c @@ -133,7 +133,6 @@ struct msg_queue struct timeout_user *timeout; /* timeout for next timer to expire */ struct thread_input *input; /* thread input descriptor */ struct hook_table *hooks; /* hook table */ - timeout_t last_get_msg; /* time of last get message call */ int keystate_lock; /* owns an input keystate lock */ int waiting; /* is thread waiting on queue */ queue_shm_t *shared; /* queue in session shared memory */ @@ -319,7 +318,6 @@ static struct msg_queue *create_msg_queue( struct thread *thread, struct thread_ queue->timeout = NULL; queue->input = (struct thread_input *)grab_object( input ); queue->hooks = NULL; - queue->last_get_msg = current_time; queue->keystate_lock = 0; queue->waiting = 0; list_init( &queue->send_result ); @@ -334,6 +332,7 @@ static struct msg_queue *create_msg_queue( struct thread *thread, struct thread_ SHARED_WRITE_BEGIN( queue->shared, queue_shm_t ) { memset( (void *)shared->hooks_count, 0, sizeof(shared->hooks_count) ); + shared->access_time = current_time; shared->wake_mask = 0; shared->wake_bits = 0; shared->changed_mask = 0; @@ -1284,7 +1283,7 @@ static void cleanup_results( struct msg_queue *queue ) /* check if the thread owning the queue is hung (not checking for messages) */ static int is_queue_hung( struct msg_queue *queue ) { - if (current_time - queue->last_get_msg <= 5 * TICKS_PER_SEC) + if (current_time - queue->shared->access_time <= 5 * TICKS_PER_SEC) return 0; /* less than 5 seconds since last get message -> not hung */ return !queue->waiting; } @@ -3361,7 +3360,11 @@ DECL_HANDLER(get_message) return; }
- queue->last_get_msg = current_time; + SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) + { + shared->access_time = current_time; + } + SHARED_WRITE_END;
/* first check for sent messages */ if ((ptr = list_head( &queue->msg_list[SEND_MESSAGE] )))
From: Rémi Bernon rbernon@codeweavers.com
--- server/queue.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/server/queue.c b/server/queue.c index 1942a0d6d34..3f06c15d006 100644 --- a/server/queue.c +++ b/server/queue.c @@ -332,7 +332,7 @@ static struct msg_queue *create_msg_queue( struct thread *thread, struct thread_ SHARED_WRITE_BEGIN( queue->shared, queue_shm_t ) { memset( (void *)shared->hooks_count, 0, sizeof(shared->hooks_count) ); - shared->access_time = current_time; + shared->access_time = monotonic_time; shared->wake_mask = 0; shared->wake_bits = 0; shared->changed_mask = 0; @@ -1283,7 +1283,7 @@ static void cleanup_results( struct msg_queue *queue ) /* check if the thread owning the queue is hung (not checking for messages) */ static int is_queue_hung( struct msg_queue *queue ) { - if (current_time - queue->shared->access_time <= 5 * TICKS_PER_SEC) + if (monotonic_time - queue->shared->access_time <= 5 * TICKS_PER_SEC) return 0; /* less than 5 seconds since last get message -> not hung */ return !queue->waiting; } @@ -3362,7 +3362,7 @@ DECL_HANDLER(get_message)
SHARED_WRITE_BEGIN( queue_shm, queue_shm_t ) { - shared->access_time = current_time; + shared->access_time = monotonic_time; } SHARED_WRITE_END;
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/win32u/message.c | 42 ++++++++++++++++++++++++++++++++---- dlls/win32u/ntuser_private.h | 1 - 2 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/dlls/win32u/message.c b/dlls/win32u/message.c index 3a3f79a2ee8..d3ef83785bb 100644 --- a/dlls/win32u/message.c +++ b/dlls/win32u/message.c @@ -27,6 +27,8 @@ #include <assert.h> #include "ntstatus.h" #define WIN32_NO_STATUS +#include "winternl.h" +#include "ddk/wdm.h" #include "win32u_private.h" #include "ntuser_private.h" #include "winnls.h" @@ -40,6 +42,40 @@ WINE_DEFAULT_DEBUG_CHANNEL(msg); WINE_DECLARE_DEBUG_CHANNEL(key); WINE_DECLARE_DEBUG_CHANNEL(relay);
+static const struct _KUSER_SHARED_DATA *user_shared_data = (struct _KUSER_SHARED_DATA *)0x7ffe0000; + +static LONG atomic_load_long( const volatile LONG *ptr ) +{ +#if defined(__i386__) || defined(__x86_64__) + return *ptr; +#else + return __atomic_load_n( ptr, __ATOMIC_SEQ_CST ); +#endif +} + +static ULONG atomic_load_ulong( const volatile ULONG *ptr ) +{ +#if defined(__i386__) || defined(__x86_64__) + return *ptr; +#else + return __atomic_load_n( ptr, __ATOMIC_SEQ_CST ); +#endif +} + +static UINT64 get_tick_count(void) +{ + ULONG high, low; + + do + { + high = atomic_load_long( &user_shared_data->TickCount.High1Time ); + low = atomic_load_ulong( &user_shared_data->TickCount.LowPart ); + } + while (high != atomic_load_long( &user_shared_data->TickCount.High2Time )); + /* note: we ignore TickCountMultiplier */ + return (UINT64)high << 32 | low; +} + #define MAX_WINPROC_RECURSION 64
#define WM_NCMOUSEFIRST WM_NCMOUSEMOVE @@ -2767,7 +2803,7 @@ static BOOL check_queue_bits( UINT wake_mask, UINT changed_mask, UINT signal_bit { *wake_bits = queue_shm->wake_bits; *changed_bits = queue_shm->changed_bits; - skip = TRUE; + skip = get_tick_count() - (UINT64)queue_shm->access_time / 10000 < 3000; /* avoid hung queue */ } }
@@ -2822,8 +2858,7 @@ int peek_message( MSG *msg, const struct peek_message_filter *filter ) thread_info->client_info.msg_source = prev_source; wake_mask = filter->mask & (QS_SENDMESSAGE | QS_SMRESULT);
- if (NtGetTickCount() - thread_info->last_getmsg_time < 3000 && /* avoid hung queue */ - check_queue_bits( wake_mask, filter->mask, wake_mask | signal_bits, filter->mask | clear_bits, + if (check_queue_bits( wake_mask, filter->mask, wake_mask | signal_bits, filter->mask | clear_bits, &wake_bits, &changed_bits )) res = STATUS_PENDING; else SERVER_START_REQ( get_message ) @@ -2837,7 +2872,6 @@ int peek_message( MSG *msg, const struct peek_message_filter *filter ) req->wake_mask = wake_mask; req->changed_mask = filter->mask; wine_server_set_reply( req, buffer, buffer_size ); - thread_info->last_getmsg_time = NtGetTickCount(); if (!(res = wine_server_call( req ))) { size = wine_server_reply_size( reply ); diff --git a/dlls/win32u/ntuser_private.h b/dlls/win32u/ntuser_private.h index 643a01c6fba..2de11465f87 100644 --- a/dlls/win32u/ntuser_private.h +++ b/dlls/win32u/ntuser_private.h @@ -102,7 +102,6 @@ struct user_thread_info { struct ntuser_thread_info client_info; /* Data shared with client */ HANDLE server_queue; /* Handle to server-side queue */ - DWORD last_getmsg_time; /* Get/PeekMessage last request time */ LONGLONG last_driver_time; /* Get/PeekMessage driver event time */ WORD hook_call_depth; /* Number of recursively called hook procs */ WORD hook_unicode; /* Is current hook unicode? */
v4: Use atomic loads when reading from USD, we can't assume x86 on the unix side. (and NtGetTickCount() is arguably incorrect BTW)
This is a step in the direction of continuously polling the queue fd on the server side, removing the need for ntsync to notify wineserver before / after waiting on the queue.
Do you have performance data that shows an improvement in applications?
I don't know if this makes the code simpler or better in other ways—great if so—but if not, and there's no instance where avoiding that server call is known to be a bottleneck, I wouldn't be inclined to do it personally.
In any case I hope this isn't going to be considered necessary for ntsync to be upstreamed.
Do you have performance data that shows an improvement in applications?
I don't.
I don't know if this makes the code simpler or better in other ways—great if so—but if not, and there's no instance where avoiding that server call is known to be a bottleneck, I wouldn't be inclined to do it personally.
It removes all need for message queue handling in ntdll, and it doesn't have to know about message queues at all. Also removes the extra server requests, including when waiting only on the queue. On the server side it also gets rid of the awkward dual server + inproc queue sync.
It also brings improvements in win32u, with better handling of driver events, and eventually even internal messages. Right know we blindly poll for events, and that could be done only when necessary.
So I think it makes it better.
In any case I hope this isn't going to be considered necessary for ntsync to be upstreamed.
Well I don't know that it is, the other MR has been stalled for a while without much feedback and usually that indicates that something isn't quite right and needs to be tried differently. I can only take guesses and the queue handling in ntdll looks like a possible reason.
So I think it makes it better.
Great, no need for any comment from me then.
In any case I hope this isn't going to be considered necessary for ntsync to be upstreamed.
Well I don't know that it is, or even what is necessary. The other MR has been stalled for a while without much feedback and usually that indicates that something isn't quite right and needs to be tried differently. I can only take guesses and the queue handling in ntdll looks like a possible reason.
Ah yes, if it's necessary in that sense so be it. It would be nice if we could at least have some confirmation of what part of a patch set is distasteful, if any part is...