On 26.09.2012 12:17, Nozomi Kodama wrote:
+ ta =0.28209479f * a[0]+ 0.14567312f * a[8]+ 0.12615663f * a[6];
+ tb =0.28209479f * b[0]+ 0.14567312f * b[8]+ 0.12615663f * b[6];
+ out[13] += ta * b[13] + tb * a[13];
<> + t = a[13] * b[13];
+ out[0] +=0.28209479f *t;
Please use spaces with care. The preferred way is "x = 123 * a[0] + 456 * b[234];". Mostly one space is enough; only use two or more if you want to indent something, which is mostly not the case when calculating a value. Even then, please be consistent across your patch.
oops!!! I will take care of this one.
+ D3DXSHMultiply4(c, a, b);
What happens if you use something like D3DXSHMultiply4(c, c, c)? Is that allowed?
Tests on native show that D3DXSHMultiply gives what I implemented. We cannot reuse an input variable as the output (we don't obtain what we would logically expect).
Also, is there a reason why it uses slightly different values than e.g. D3DXSHMultiply3? Why don't we use defines? A problem might be to find good names... Maybe the Multiply functions could share the same base? To me it looks like they do share a lot of calculations, and if the order doesn't matter (see comment above) then it may be possible to rearrange the calculation.
ta = 0.28209479f * a[0] - 0.12615663f * a[6] - 0.21850969f * a[8];
vs
ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
I do the computations with computer algebra software. Unfortunately I did not use the same program for D3DXSHMultiply4 and D3DXSHMultiply3 (Xcas vs Maple), and obtaining the exact values looks to be beyond the capabilities of these programs.
Even worse, since the integrands oscillate, the programs' internal algorithms are not robust when computing the integrals. That is why, for a given number of significant digits, we obtain slightly different values from the two programs.
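To gauge how much the last-digit difference between the two sets of constants matters in single precision, the following sketch compares one pair of literals from the lines quoted above with the spacing of adjacent floats in their range. The function names are hypothetical and exist only for this illustration; the sketch assumes IEEE 754 single-precision `float` with round-to-nearest conversion of literals.

```c
#include <assert.h>
#include <float.h>

/* Absolute difference between the two literals once stored as float. */
float literal_gap(void)
{
    const float lit_a = 0.12615663f;
    const float lit_b = 0.12615662f;

    return lit_a >= lit_b ? lit_a - lit_b : lit_b - lit_a;
}

/* Spacing between adjacent floats in [0.125, 0.25), where both literals lie.
 * FLT_EPSILON is the spacing at 1.0, so it is scaled by 2^-3 here. */
float float_spacing(void)
{
    return FLT_EPSILON * 0.125f;
}

/* The decimal literals differ by 1e-8, which is less than one float spacing
 * (about 1.5e-8), so the stored values are either identical or adjacent. */
void check_literals(void)
{
    assert(literal_gap() <= float_spacing());
}
```

Calling `check_literals()` should pass under the stated assumptions. In other words, the two literals can differ by at most one unit in the last place, although even a one-bit difference could still show up in an exact comparison against native results.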
Adding defines would add tons of variables for no gain. But I can simplify the function slightly by calling D3DXSHMultiply3.
I will give it a try. I worry about efficiency.
Best regards
Nozomi
Cheers Rico
On 26.09.2012 19:09, Nozomi Kodama wrote:
- D3DXSHMultiply4(c, a, b);
What happens if you use something like D3DXSHMultiply4(c, c, c)? Is that allowed?
Tests on native show that D3DXSHMultiply gives what I implemented. We cannot reuse an input variable as the output (we don't obtain what we would logically expect).
I gave it a try, and your implementation seems to be correct. I tried:
D3DXSHMultiply4(c, a, c);
D3DXSHMultiply4(c, c, b);
// used c[i] = ((float)i) / 20; to not give always -nan
D3DXSHMultiply4(c, c, c);
They look pretty much the same for wine and native. So why not add a test for this? Sure, you don't get what you would logically expect, but we need to reproduce the results, and that does not always have something to do with logic :-) . It is a little bit contrary to the part below, because when you change the order and reuse code, the calculation for these corner cases might go wrong (I'm not sure, I haven't tried it).
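A minimal sketch of why passing an input as the output changes the result, and why reordering the internal calculation could change these corner cases. `toy_multiply` is a hypothetical two-coefficient stand-in invented for this illustration, not the real D3DXSHMultiply4; it only shares the property that `out` is written while `a` and `b` are still being read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in, NOT the real D3DXSHMultiply4: out[0] is written
 * before a[0] is read again, so aliasing out with a changes out[1]. */
float *toy_multiply(float *out, const float *a, const float *b)
{
    assert(out && a && b);

    out[0] = a[0] * b[0] + a[1] * b[1];
    out[1] = a[0] * b[1] + a[1] * b[0]; /* a[0] may already be overwritten */
    return out;
}

/* Returns 1 if the in-place call differs from the call with a separate output. */
int aliasing_changes_result(void)
{
    const float a[2] = {1.0f, 2.0f};
    const float b[2] = {3.0f, 4.0f};
    float separate[2];
    float inplace[2];

    toy_multiply(separate, a, b);        /* {11, 10} */

    memcpy(inplace, a, sizeof(inplace));
    toy_multiply(inplace, inplace, b);   /* {11, 50}: out[1] used the new inplace[0] */

    return memcmp(separate, inplace, sizeof(separate)) != 0;
}
```

With these inputs the two results differ in `out[1]`, and swapping the two assignments inside `toy_multiply` would change which value the in-place call produces. A test for the aliased calls would pin down the statement order of the implementation as well as the coefficients.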
Also, is there a reason why it uses slightly different values than e.g. D3DXSHMultiply3? Why don't we use defines? A problem might be to find good names... Maybe the Multiply functions could share the same base? To me it looks like they do share a lot of calculations, and if the order doesn't matter (see comment above) then it may be possible to rearrange the calculation.
ta = 0.28209479f * a[0] - 0.12615663f * a[6] - 0.21850969f * a[8];
vs
ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
I do the computations with computer algebra software. Unfortunately I did not use the same program for D3DXSHMultiply4 and D3DXSHMultiply3 (Xcas vs Maple), and obtaining the exact values looks to be beyond the capabilities of these programs.
Even worse, since the integrands oscillate, the programs' internal algorithms are not robust when computing the integrals. That is why, for a given number of significant digits, we obtain slightly different values from the two programs.
Adding defines would add tons of variables for no gain. But I can simplify the function slightly by calling D3DXSHMultiply3.
Ok, no problem. See above. I'd say compatibility before simplicity. It was just a thought, because there was no test or comment in the code that would have made my suggestion a stupid idea, but the test above makes this idea obsolete.
Cheers Rico