2009/12/22 Stefan Dösinger stefan@codeweavers.com:
+    conv = ((FVF & WINED3DFVF_POSITION_MASK) == WINED3DFVF_XYZRHW ) || (FVF & (WINED3DFVF_DIFFUSE | WINED3DFVF_SPECULAR));
     hr = buffer_init(object, This, Size, Usage, WINED3DFMT_VERTEXDATA,
-            Pool, GL_ARRAY_BUFFER_ARB, NULL, parent, parent_ops);
+            Pool, GL_ARRAY_BUFFER_ARB, NULL, parent, parent_ops, conv);
This looks questionable: we use the FVF to determine that the buffer is going to need conversion, but don't pass that FVF to the buffer itself? Shouldn't this just use the existing code in buffer.c to determine when we need conversion in the first place, and just drop the VBO when the overhead becomes too large? Note that if we have EXT_vertex_array_bgra, we don't need conversion for the color data in the first place.
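Henri's point about EXT_vertex_array_bgra can be made concrete with a small sketch of the decision. It assumes the FVF bit values from d3d9types.h (wined3d's WINED3DFVF_* constants are assumed to match), takes the XYZRHW rule straight from the patch line, and uses a hypothetical helper name, fvf_needs_conversion(). It is not the existing logic in buffer.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FVF bits with the values from d3d9types.h; wined3d's WINED3DFVF_* constants
 * are assumed to use the same values. */
#define FVF_POSITION_MASK 0x400Eu
#define FVF_XYZ           0x0002u
#define FVF_XYZRHW        0x0004u
#define FVF_DIFFUSE       0x0040u
#define FVF_SPECULAR      0x0080u

/* Hypothetical helper: does a vertex buffer with this FVF need CPU-side
 * conversion before GL can use it? */
static bool fvf_needs_conversion(uint32_t fvf, bool have_vertex_array_bgra)
{
    /* Pre-transformed positions: rule copied from the patch. */
    if ((fvf & FVF_POSITION_MASK) == FVF_XYZRHW)
        return true;

    /* D3DCOLOR data is stored as BGRA bytes; with EXT_vertex_array_bgra GL
     * can read it directly, so only convert when the extension is missing. */
    if (!have_vertex_array_bgra && (fvf & (FVF_DIFFUSE | FVF_SPECULAR)))
        return true;

    return false;
}
```

For example, an FVF of XYZ | DIFFUSE (0x0042) needs conversion only when the extension is missing, while an XYZRHW buffer is converted either way.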
On Tue, Dec 22, 2009 at 6:44 PM, Henri Verbeet hverbeet@gmail.com wrote:
This looks questionable: we use the FVF to determine that the buffer is going to need conversion, but don't pass that FVF to the buffer itself? Shouldn't this just use the existing code in buffer.c to determine when we need conversion in the first place, and just drop the VBO when the overhead becomes too large? Note that if we have EXT_vertex_array_bgra, we don't need conversion for the color data in the first place.
The code notes that when conversion is needed, no VBO is created, because converting in VBO memory, combined with uploading and drawStridedFast, is slower than drawStridedSlow. The buffer object extensions discourage performing many operations on buffer memory because it is typically uncached. Have you tried performing the conversion in a normal memory buffer and compared the performance to doing the same in VBO memory? It might make sense to do the conversion in a normal memory buffer and memcpy the contents to a VBO. That way you would still profit from the asynchronous uploads to the GPU and from the conversion itself.
Roderick
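Roderick's suggestion (convert in ordinary cached memory, then copy the finished data into the buffer object in one pass) might look like the sketch below. The color swizzle assumes a little-endian host and GL color arrays of GL_UNSIGNED_BYTE RGBA data; `dst` stands in for mapped VBO memory, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* D3DCOLOR is 0xAARRGGBB, i.e. bytes B,G,R,A in little-endian memory; GL's
 * unsigned-byte RGBA color arrays expect R,G,B,A, so R and B are swapped. */
static uint32_t d3dcolor_to_rgba(uint32_t c)
{
    return (c & 0xff00ff00u) | ((c & 0x000000ffu) << 16) | ((c >> 16) & 0x000000ffu);
}

/* Convert `count` colors in ordinary heap memory, then copy the result into
 * `dst` with a single memcpy. `dst` stands in for mapped buffer object memory;
 * the real wined3d upload path is not modeled here. Returns 0 on success. */
static int convert_then_copy(void *dst, const uint32_t *src, size_t count)
{
    uint32_t *staging;
    size_t i;

    assert(dst != NULL && src != NULL);
    if (!count)
        return 0;
    if (!(staging = malloc(count * sizeof(*staging))))
        return -1;

    /* All read-modify-write work happens in cached memory... */
    for (i = 0; i < count; ++i)
        staging[i] = d3dcolor_to_rgba(src[i]);

    /* ...and the (possibly uncached) destination only sees one sequential write. */
    memcpy(dst, staging, count * sizeof(*staging));
    free(staging);
    return 0;
}
```

As Stefan's reply points out, wined3d's PreLoad path already converts in HeapAlloc'ed memory and uploads the final data, so the open question is mostly the cost of reconverting the whole buffer.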
On 22.12.2009 at 19:23, Roderick Colenbrander wrote:
On Tue, Dec 22, 2009 at 6:44 PM, Henri Verbeet hverbeet@gmail.com wrote:
This looks questionable: we use the FVF to determine that the buffer is going to need conversion, but don't pass that FVF to the buffer itself? Shouldn't this just use the existing code in buffer.c to determine when we need conversion in the first place, and just drop the VBO when the overhead becomes too large? Note that if we have EXT_vertex_array_bgra, we don't need conversion for the color data in the first place.
This is for d3d7. d3d7's vertex buffer Lock method doesn't have a proper parameter to specify the locked range. Some apps (3DMark 2000, Max Payne) use vertex buffers in the d3d9 D3DUSAGE_DYNAMIC fashion: they put some vertices in the buffer, draw, then put new vertices in it. Since we always reconvert the whole buffer (the app gives no range hints), this ends up slowing things down a lot.
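To see why the missing range parameter hurts, here is a hypothetical dirty-range tracker for a buffer that is converted at draw time. A ranged lock (loosely modeled on d3d9's OffsetToLock/SizeToLock, where 0/0 means the whole buffer) dirties only what was written, while a d3d7-style lock has to dirty everything. Merging into a single bounding range is a simplification, and none of these names exist in wined3d.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping: half-open [start, end), empty when start == end. */
struct dirty_range
{
    size_t start, end;
};

/* Ranged lock; size == 0 means "lock everything". Overlapping or disjoint
 * locks are merged into one bounding range for simplicity. */
static void mark_dirty(struct dirty_range *r, size_t buffer_size, size_t offset, size_t size)
{
    size_t end;

    if (!size)
    {
        offset = 0;
        end = buffer_size;
    }
    else
    {
        end = offset + size;
    }
    assert(end <= buffer_size);

    if (r->start == r->end)  /* nothing dirty yet */
    {
        r->start = offset;
        r->end = end;
        return;
    }
    if (offset < r->start)
        r->start = offset;
    if (end > r->end)
        r->end = end;
}

/* d3d7's Lock has no range parameters, so every lock dirties the whole buffer. */
static void mark_dirty_d3d7(struct dirty_range *r, size_t buffer_size)
{
    mark_dirty(r, buffer_size, 0, 0);
}

/* Bytes that must be reconverted before the next draw; resets the range. */
static size_t take_dirty_bytes(struct dirty_range *r)
{
    size_t n = r->end - r->start;

    r->start = r->end = 0;
    return n;
}
```

An app that writes three 32-byte vertices per lock dirties 96 bytes with a ranged lock, but forces all 65536 bytes of a 64 KiB buffer to be reconverted with a d3d7 lock.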
Using code similar to the buffer drop on conversion type changes sounds tempting, but we need a heuristic for that. In the d3d7 case the conversion description never changes (the buffer declaration is static), but we could count full buffer conversions versus buffer uses, and drop the buffer if there are, e.g., fewer than 3 or 4 draws per full conversion.
This code indeed fails to take EXT_vertex_array_bgra into account.
The code notes that when conversion is needed, no VBO is created, because converting in VBO memory, combined with uploading and drawStridedFast, is slower than drawStridedSlow. The buffer object extensions discourage performing many operations on buffer memory because it is typically uncached. Have you tried performing the conversion in a normal memory buffer and compared the performance to doing the same in VBO memory?
We convert in HeapAlloc'ed memory and upload the final data. PreLoad doesn't operate on glMap()ed memory.
2009/12/23 Stefan Dösinger stefan@codeweavers.com:
This is for d3d7. d3d7's vertex buffer Lock method doesn't have a proper parameter to specify the locked range. Some apps (3DMark 2000, Max Payne) use vertex buffers in the d3d9 D3DUSAGE_DYNAMIC fashion: they put some vertices in the buffer, draw, then put new vertices in it. Since we always reconvert the whole buffer (the app gives no range hints), this ends up slowing things down a lot.
Using code similar to the buffer drop on conversion type changes sounds tempting, but we need a heuristic for that. In the d3d7 case the conversion description never changes (the buffer declaration is static), but we could count full buffer conversions versus buffer uses, and drop the buffer if there are, e.g., fewer than 3 or 4 draws per full conversion.
Perhaps we should pass a usage flag indicating that the client library always does full locks, to help us decide earlier that we need to drop the VBO. Just detecting full locks when they happen probably works as well, though.
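Both options fit in a few lines. In this sketch, HYPOTHETICAL_USAGE_FULL_LOCKS is an invented creation flag (not an existing WINED3DUSAGE value) that lets a buffer needing conversion skip the VBO up front; without it, the buffer counts full locks and drops the VBO on the second one, treating the initial fill as the first. The limit of 2 is arbitrary, and this version ignores draw counts entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented flag a client library such as ddraw could pass at creation time. */
#define HYPOTHETICAL_USAGE_FULL_LOCKS 0x1u

struct hypothetical_buffer
{
    unsigned int usage;
    size_t size;
    bool needs_conversion;
    bool has_vbo;
    unsigned int full_locks;
};

/* Option 1: with the flag, a buffer that needs conversion never gets a VBO. */
static void buffer_create(struct hypothetical_buffer *b, unsigned int usage,
        size_t size, bool needs_conversion)
{
    b->usage = usage;
    b->size = size;
    b->needs_conversion = needs_conversion;
    b->full_locks = 0;
    b->has_vbo = !(needs_conversion && (usage & HYPOTHETICAL_USAGE_FULL_LOCKS));
}

/* Option 2: detect full locks as they happen; size == 0 means "whole buffer". */
static void buffer_lock(struct hypothetical_buffer *b, size_t offset, size_t size)
{
    bool full = offset == 0 && (size == 0 || size == b->size);

    assert(offset + size <= b->size);
    /* The first full lock is usually the initial fill; drop on the second. */
    if (full && b->needs_conversion && b->has_vbo && ++b->full_locks >= 2)
        b->has_vbo = false;
}
```

The flag avoids ever paying for the VBO in the d3d7 case, while detection needs no interface change but only reacts after the app has already done a full relock.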