felix
The functions in src/render/SDL_yuv_mmx.c contain the following inline assembly snippet:
/* tap dance to workaround the inability to use %%ebx at will... */
/* move one thing to the stack... */
"pushl $0\n" /* save a slot on the stack. */
"pushl %%ebx\n" /* save %%ebx. */
"movl %0, %%ebx\n" /* put the thing in ebx. */
"movl %%ebx,4(%%esp)\n" /* put the thing in the stack slot. */
"popl %%ebx\n" /* get back %%ebx (the PIC register). */
Here's how it ended up in a binary on my old laptop:
0xb5c17dbd <ColorRGBDitherYV12MMX1X+93>: push $0x0
0xb5c17dbf <ColorRGBDitherYV12MMX1X+95>: push %ebx
0xb5c17dc0 <ColorRGBDitherYV12MMX1X+96>: mov 0xc(%esp),%ebx
0xb5c17dc4 <ColorRGBDitherYV12MMX1X+100>: mov %ebx,0x4(%esp)
0xb5c17dc8 <ColorRGBDitherYV12MMX1X+104>: pop %ebx
Apparently the compiler, oblivious to the fact that the assembly snippet manipulates the %esp register, decided to refer to the operand via that same register instead of via %ebp (I believe -fomit-frame-pointer enables this). This causes %ebx to be loaded with the wrong value, which later leads to a null pointer dereference.
Recent GCC can use the %ebx register normally: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47602#c16>. There is even an explicit constraint "b" for allocating it.
This allows me to reduce the set of formats supported by the renderers to the most optimal set, for a nice speed boost.
--HG--
rename : src/video/SDL_yuv_mmx.c => src/render/SDL_yuv_mmx.c
rename : src/video/SDL_yuv_sw.c => src/render/SDL_yuv_sw.c
rename : src/video/SDL_yuv_sw_c.h => src/render/SDL_yuv_sw_c.h
rename : src/video/mmx.h => src/render/mmx.h