
Glenn Enright wrote:
On Monday 15 May 2006 9:58 am, Perry Lorier wrote:
Generally larger alignment is faster. Also as you start getting closer to the hardware you start getting stricter requirements for alignment. Early DMA controllers for instance had to be "page" aligned (16 bytes IIRC). I have no idea if this is still true today.
The downside of large alignment is that it uses more memory, and if you end up having to touch that alignment padding then you end up wasting resources. The trick is trading off this "wasted" space against the speedup you get from the alignment.
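To put a rough number on that trade-off, here is a minimal C11 sketch. The 64-byte line size and the struct and function names are assumptions made for illustration; nothing here comes from the kernel source. It only shows how much padding you pay for when each element is forced onto its own aligned slot.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed alignment target for illustration; real hardware varies. */
#define CACHE_LINE 64

/* Naturally aligned: only as big as the data it holds. */
struct counter_plain {
    long value;
};

/* Forced alignment: the compiler rounds the struct size up to CACHE_LINE. */
struct counter_padded {
    alignas(CACHE_LINE) long value;
};

/* Extra bytes spent purely on padding for an array of n counters. */
size_t padding_overhead(size_t n)
{
    return n * (sizeof(struct counter_padded) - sizeof(struct counter_plain));
}
```

With these assumptions every padded counter occupies a full 64 bytes, so an array of them trades most of its memory for the guarantee that no two counters share a line. Whether that is a win is exactly the kind of thing profiling has to answer.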
Right, pretty much matches what I was thinking. I will look at DMA stuff next. I should probably start doing some profiling to see what the objective results are, rather than subjective ;-p. I might look into the ck series kernels to see the differences.
Unfortunately most of my assembly language coding was done in the dark ages in the 16-bit DOS days. (Segmentation, 640k and TSRs, oh my!), and I've really not bothered about what's happening under the hood since then. Goodness knows how APIC/IOMMU/ACPI work[1] :)
My understanding is that P4 chips have quite bad latency for some common I/O instructions, which is where AMD makes ground. There's more on this here: http://answers.google.com/answers/threadview?id=321522
Intel chips have always been slow at various things, they've always just done better with clock cycles. A memorable example of this was the LOOP opcode on Pentium chips. It was very slow on Intel machines, so programs used it as a timing delay. On AMD chips it was extremely fast (1 cycle IIRC), so those timing loops effectively became no-ops, causing lots of programs to fail in amusing and/or spectacular fashion.
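The failure mode is easy to sketch in C. This is only an illustration under assumptions: the function names are made up, and it uses POSIX `clock_gettime` in user space rather than anything those old programs actually did. The first delay depends entirely on how fast the CPU gets through the loop; the second asks a clock how much time has passed, so it does not shrink when the loop gets faster.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Fragile: the wall-clock duration depends on how quickly the CPU
 * executes the loop, so a faster chip means a shorter delay. */
void delay_by_counting(volatile unsigned long iterations)
{
    while (iterations--)
        ;
}

/* Current monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Sturdier: busy-wait until the clock says at least ns have elapsed. */
void delay_by_clock(uint64_t ns)
{
    uint64_t start = now_ns();
    while (now_ns() - start < ns)
        ;
}
```

For example, `delay_by_clock(2000000)` waits at least 2 ms on any machine, while `delay_by_counting(1000000)` waits however long a million iterations happen to take on that particular CPU.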
I traced the movsl mask back to L1_CACHE_BYTES, which makes sense.
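For anyone following along, a mask derived from a cache line size usually works as below. This is a generic sketch, not the kernel's code: `LINE_BYTES` is a hypothetical stand-in for `L1_CACHE_BYTES` with an assumed value of 64, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for L1_CACHE_BYTES; must be a power of two. */
#define LINE_BYTES 64u
#define LINE_MASK  ((uintptr_t)LINE_BYTES - 1)

/* True if addr sits exactly on a line boundary. */
int is_line_aligned(uintptr_t addr)
{
    return (addr & LINE_MASK) == 0;
}

/* Round addr down to the start of its line. */
uintptr_t align_down(uintptr_t addr)
{
    return addr & ~LINE_MASK;
}

/* Round addr up to the next line boundary (no change if already aligned). */
uintptr_t align_up(uintptr_t addr)
{
    return (addr + LINE_MASK) & ~LINE_MASK;
}

/* True if a and b have the same offset within a line, i.e. a copy
 * between them can keep source and destination aligned together. */
int same_line_offset(uintptr_t a, uintptr_t b)
{
    return ((a ^ b) & LINE_MASK) == 0;
}
```

The XOR test in `same_line_offset` is the usual trick for comparing the low bits of two addresses in one step: if the masked XOR is zero, both pointers are misaligned by the same amount.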
What exactly are you trying to discover here? Are you trying to figure out how to deglitch some audio?
Not really, just using that as an example (although AC97 drivers do still have a bad I/O-related bug).
Only one? Miracle!
I'm hacking around in the i386 arch trying to learn a bit more about how I/O is handled and increasing my knowledge of system programming at the same time. Been attempting to absorb the Intel docs on this. Kinda hobby type thing.
Ah, I remember the good ol' days of doing this myself :) Although documentation wasn't quite as free flowing as it is these days.
So far, recent test builds have shown about a 5% decrease in core code size (subtle bugs aside) using gcc 3.4.6, just by manually optimizing kernel code for the P4 2.6 (stepping 9) which I'm running. Building with '-march=pentium4' has worked nicely so far.
Nice. I guess an important lesson here is that when compiling a kernel, you should compile it for your CPU; it'll be spiffier!
Also the MB (Abit IS7) appears to have really good I/O subsystems, which fascinates me :). Learning what kernel devs do when things break has been fun. I realise that newer 64-bit offerings do many things differently, but this is what I have to play with for now.
Yeah, I've not looked that closely at the 64-bit stuff other than the more, bigger, registers.
---
[1] I realise that implicit in this statement I assume they do *actually* work.