Re: [wlug] movsl alignment

14 May 2006

...
...
So short answer:
Align memory.  It's faster that way on Intel machines.
Good post! TY that helps a lot :)
Just to iron out some thoughts: how useful is 8-byte alignment on modern
CPUs, in this case a P4? For context, in the Linux kernel code the movsl
alignment mask is set for 8-byte alignment (arch/i386/cpu/intel.c).
Right, after reading the source I think I've figured that out.  There is
an optimised Intel version of the copy code which unrolls a short part
of the copy (the first 64 bytes, I think) before using rep movsl for the
rest.  If the copy doesn't meet the alignment or minimum-length
requirements, it uses rep movsl without the unrolled copy first.
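
A rough C sketch of the dispatch described above may help.  It assumes
an 8-byte mask (7) and a 64-byte unrolled prefix, uses plain C loops as
a stand-in for rep movsl, and requires both pointers to be 8-byte
aligned.  The names sketch_copy, copy_words and copy_unrolled_64 are
hypothetical, and the kernel's real condition may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values from the discussion: 8-byte alignment mask, 64-byte prefix. */
#define ALIGN_MASK   7u
#define UNROLL_BYTES 64u

/* Stand-in for "rep movsl": copy 4-byte words, then any leftover bytes. */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n >= 4) {
        memcpy(dst, src, 4);            /* one 32-bit word, like a single movsl */
        dst += 4; src += 4; n -= 4;
    }
    while (n--)
        *dst++ = *src++;
}

/* Stand-in for the unrolled prefix: 64 bytes as eight explicit 8-byte moves. */
static void copy_unrolled_64(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst +  0, src +  0, 8);
    memcpy(dst +  8, src +  8, 8);
    memcpy(dst + 16, src + 16, 8);
    memcpy(dst + 24, src + 24, 8);
    memcpy(dst + 32, src + 32, 8);
    memcpy(dst + 40, src + 40, 8);
    memcpy(dst + 48, src + 48, 8);
    memcpy(dst + 56, src + 56, 8);
}

/* Returns 1 if the unrolled prefix was used, 0 if only word copies ran. */
int sketch_copy(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    /* Both pointers must have their low three bits clear. */
    int aligned = (((uintptr_t)d | (uintptr_t)s) & ALIGN_MASK) == 0;

    if (aligned && n >= UNROLL_BYTES) {
        copy_unrolled_64(d, s);
        copy_words(d + UNROLL_BYTES, s + UNROLL_BYTES, n - UNROLL_BYTES);
        return 1;
    }
    copy_words(d, s, n);                /* misaligned or too short */
    return 0;
}
```

sketch_copy() returns 1 when it took the unrolled path and 0 otherwise,
which makes it easy to see which branch a given pair of buffers hits.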

I'm not sure why it needs to be aligned to 8 bytes; from my
understanding, 4-byte alignment should be sufficient, since movsl moves
4 bytes at a time.  It could be that on a 64-bit machine it's copying
8 bytes, not 4.
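
The 4-versus-8 question comes down to which low address bits the mask
tests.  A minimal C sketch, where is_aligned is a hypothetical helper and
align is assumed to be a power of two:

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is aligned to `align` when the bits selected by align - 1
 * are all zero: a mask of 7 means 8-byte alignment, 3 means 4-byte. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For example, is_aligned(0x1004, 4) is true but is_aligned(0x1004, 8) is
false, so a buffer at that address would pass a 4-byte check yet fail
an 8-byte mask of 7.
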
...
Without actually performing some low-level benchmarks, is it likely that
larger or smaller alignment values would reduce overhead for the CPU?
Perhaps in special cases, such as audio?
Generally, larger alignment is faster.  Also, as you get closer to the
hardware, the alignment requirements get stricter.  Early DMA
controllers, for instance, had to be "page" aligned (16 bytes IIRC).  I
have no idea if this is still true today.
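
In userspace, getting a buffer with a stricter alignment is
straightforward with C11's aligned_alloc.  The sketch below
(alloc_aligned is a hypothetical wrapper) rounds the size up because C11
requires it to be a multiple of the alignment; it does not deal with DMA
constraints, which are a driver/kernel matter.

```c
#include <stdlib.h>

/* Allocate at least `size` bytes aligned to `align` (a power of two).
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * so round it up first.  Release the result with free(). */
void *alloc_aligned(size_t size, size_t align)
{
    size_t rounded = (size + align - 1) & ~(align - 1);
    return aligned_alloc(align, rounded);
}
```

For instance, alloc_aligned(100, 16) returns a 16-byte aligned block of
112 bytes, and alloc_aligned(100, 4096) a block aligned to 4096 bytes,
the usual x86 page size.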

The downside of large alignment is that it uses more memory, and if you
end up having to touch that alignment padding, you end up wasting
resources.  The trick is trading off this "wasted" space against the
speedup you get from the alignment.
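
The trade-off is easy to put numbers on.  A minimal C sketch, with
hypothetical helpers align_up and padding_waste, assuming power-of-two
alignments:

```c
#include <stddef.h>

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Bytes of padding added when an object of size n is padded to align. */
static size_t padding_waste(size_t n, size_t align)
{
    return align_up(n, align) - n;
}
```

A 13-byte record padded to 8 bytes occupies 16 (3 bytes wasted), but
padded to 64 bytes it wastes 51; across 1000 records that is 3000 versus
51000 bytes of padding.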

What exactly are you trying to discover here?  Are you trying to figure
out how to deglitch some audio?

Perry Lorier