I sent the following patch recently:
commit ee6856d874d687c4504914e61bcde3e6b8823bca Author: Vincent Povirk madewokherd@gmail.com Date: Fri Oct 23 13:57:42 2009 -0500
ole32: Don't use IEnumSTATSTG to search for elements of storages.
We use it to do a linear search of a binary tree, which is overkill. Replace it with a simple binary search.
It was accepted and caused this bug: http://bugs.winehq.org/show_bug.cgi?id=20477
It actually turned out that the mistake was in a previous patch, but it didn't matter for reading files until we started to depend on the tree to be valid. Nonetheless, I've done a test to make sure we can really rely on the binary trees in storage files to be valid.
I wrote two programs: one that reads the directory entries of a storage file and prints some information to stdout (useful for debugging), and one that swaps the left and right child of every directory entry, corrupting the trees in the file.
It turns out that corrupting an msi file in this way makes it fail on both windows and on wine (although it crashes on wine; msi's error handling code is broken). Running the program again (thus swapping the children back to their original, correct position) makes it work.
However, on an older version of Wine without my change, the corrupted msi file loads fine.
This means that Windows does depend on the tree in a storage file to be correct, and Wine's behavior is now closer to Windows, even though it now fails in some cases when it used to succeed.
(I think that current Wine without the fix for bug 20477 will load only the corrupted files, but I didn't test this.)
I've attached my test programs so they're in the public record.
On 27.10.2009 22:50, Vincent Povirk wrote:
I sent the following patch recently:
commit ee6856d874d687c4504914e61bcde3e6b8823bca Author: Vincent Povirk madewokherd@gmail.com Date: Fri Oct 23 13:57:42 2009 -0500
ole32: Don't use IEnumSTATSTG to search for elements of storages. We use it to do a linear search of a binary tree, which is overkill. Replace it with a simple binary search.
It was accepted and caused this bug: http://bugs.winehq.org/show_bug.cgi?id=20477
It actually turned out that the mistake was in a previous patch, but it didn't matter for reading files until we started to depend on the tree to be valid. Nonetheless, I've done a test to make sure we can really rely on the binary trees in storage files to be valid.
[...]
This means that Windows does depend on the tree in a storage file to be correct, and Wine's behavior is now closer to Windows, even though it now fails in some cases when it used to succeed.
If I've not mistaken, that uses lstricmp internally for comparing keys.
Note, that wine lstr[i]cmp and ms windows lstr[i]cmp uses different sort order (see last [disabled] test in dlls/kernel32/tests/locale.c).
So binary search using wine lstricmp on presored array (used windows lstricmp) will certainly fail on some files.
PS looking at code one more time, current code seems already depends on "lstricmp working same" [in updatePropertyChain()], so already broken. It's pure luck that it is not failed already.
Disclaimer: I'm not expert in wine code, looked at code very quickly, and may be misread something :-\
If I've not mistaken, that uses lstricmp internally for comparing keys.
It uses lstrcmpiW, which according to MSDN can behave differently based on locale, but I don't think the Wine version does.
The comparison is in the propertyNameCmp function.
I think that the MS storage code probably does something else.
PS looking at code one more time, current code seems already depends on "lstricmp working same" [in updatePropertyChain()], so already broken. It's pure luck that it is not failed already.
Well, yes, now it just depends on it even more (we'll fail reading files too instead of just writing invalid ones).
I've checked the compound file spec from MS and the one from OpenOffice. Neither of them specify the comparison beyond "character by character
I think what saves us is that programs rarely expose storage and stream names directly to users, so the set of characters we have to deal with in practice is limited.
Still, we really should use a comparison function that's at least guaranteed to be consistent.
On 28.10.2009 20:30, Vincent Povirk wrote:
If I've not mistaken, that uses lstricmp internally for comparing keys.
It uses lstrcmpiW, which according to MSDN can behave differently based on locale, but I don't think the Wine version does.
Note, that, it is not only difference in >0x7f "nls" chars (somewhat "expected"), there are difference in ascii range too (see that test from test/locale.c). [I'm saying about difference between wine vs. windows; I'm not sure if windows lstricmpW /really/ work differently in different locales]
[...]
I think what saves us is that programs rarely expose storage and stream names directly to users, so the set of characters we have to deal with in practice is limited.
I've looked at some .msi files - stream names contain problematic characters: '.' vs. '_' (mismatches between wine and windows; there are more such chars), '_' vs. '0' (mismatches between strcasecmp and strcoll|lstricmp).
So, while "problematic" files maybe very rare, but not impossible.
Still, we really should use a comparison function that's at least guaranteed to be consistent.