Linus Torvalds Calls Blogger's Linux Scheduler Tests 'Pure Garbage'

6 Jan 2020

'On Wednesday, Phoronix cited a blog post by C++ game developer Malte
Skarupke claiming his spinlock experiments had discovered that the Linux
kernel had a scheduler issue affecting developers bringing games to
Linux for Google Stadia.

Linus Torvalds has now responded:

The whole post seems to be just wrong, and is measuring something
completely different than what the author thinks and claims it is
measuring.

First off, spinlocks can only be used if you actually know you're not
being scheduled while using them. But the blog post author seems to be
implementing his own spinlocks in user space with no regard for
whether the lock user might be scheduled or not. And the code used for
the claimed "lock not held" timing is complete garbage.

It basically reads the time before releasing the lock, and then it
reads it after acquiring the lock again, and claims that the time
difference is the time when no lock was held. Which is just inane and
pointless and completely wrong...

[T]he code in question is pure garbage. You can't do spinlocks like
that. Or rather, you very much can do them like that, and when you do
that you are measuring random latencies and getting nonsensical
values, because what you are measuring is "I have a lot of busywork,
where all the processes are CPU-bound, and I'm measuring random points
of how long the scheduler kept the process in place".

And then you write a blog-post blaming others, not understanding that
it's your incorrect code that is garbage, and is giving random garbage
values...

You might even see issues like "when I run this as a foreground UI
process, I get different numbers than when I run it in the background
as a batch process". Cool interesting numbers, aren't they?

No, they aren't cool and interesting at all, you've just created a
particularly bad random number generator...

[Y]ou should never ever think that you're clever enough to write your
own locking routines.. Because the likelihood is that you aren't (and
by that "you" I very much include myself -- we've tweaked all the
in-kernel locking over decades, and gone through the simple
test-and-set to ticket locks to cacheline-efficient queuing locks, and
even people who know what they are doing tend to get it wrong several
times).

There's a reason why you can find decades of academic papers on
locking. Really. It's hard.

"It really means a lot to me that Linus responded," the blogger wrote
later, "even if the response is negative." They replied to Torvalds'
1,500-word post on the same mailing list -- and this time received a
1,900-word response arguing "you did locking fundamentally wrong..."

The fact is, doing your own locking is hard. You need to really
understand the issues, and you need to not over-simplify your model of
the world to the point where it isn't actually describing reality any
more...

Dealing with reality is hard. It sometimes means that you need to make
your mental model for how locking needs to work a lot more
complicated...'

-- source: https://linux.slashdot.org/story/20/01/06/012251
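
For anyone who wants to see the measurement pattern Torvalds objects to,
here is a minimal C++ sketch of it as described in the quote above: read
the clock just before releasing the lock, read it again just after the
next acquire, and call the difference "time when no lock was held". This
is not the blogger's actual code. NaiveSpinlock, GapResult and
run_gap_measurement are hypothetical names made up for illustration, and
the test-and-set lock is deliberately simplistic.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Naive user-space test-and-set spinlock (illustrative only; hypothetical name).
struct NaiveSpinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag.clear(std::memory_order_release); }
};

struct GapResult {
    long long counter;          // total increments performed under the lock
    std::int64_t worst_gap_ns;  // largest "lock not held" value the pattern reports
};

// Runs num_threads workers that each take the lock `iterations` times and
// record the gap between the previous holder's pre-release timestamp and
// the current holder's post-acquire timestamp.
GapResult run_gap_measurement(int num_threads, int iterations) {
    using Clock = std::chrono::steady_clock;
    NaiveSpinlock lock;
    long long counter = 0;          // all four variables are protected by `lock`
    bool have_stamp = false;
    Clock::time_point stamp;
    std::int64_t worst_gap_ns = 0;

    auto worker = [&] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            // Timestamp taken AFTER acquiring the lock.
            Clock::time_point now = Clock::now();
            if (have_stamp) {
                auto gap = std::chrono::duration_cast<std::chrono::nanoseconds>(
                               now - stamp).count();
                if (gap > worst_gap_ns) worst_gap_ns = static_cast<std::int64_t>(gap);
            }
            ++counter;
            // Timestamp taken BEFORE releasing the lock.
            stamp = Clock::now();
            have_stamp = true;
            // If the scheduler preempts this thread right here, the lock is
            // still held, yet the clock has already been read. The whole
            // time slice spent off-CPU is then counted as "lock not held".
            lock.unlock();
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return GapResult{counter, worst_gap_ns};
}
```

Calling run_gap_measurement(2, 2000) gives a counter of 4000, since the
lock does provide mutual exclusion. The worst_gap_ns value, however,
varies from run to run and with system load, because it mixes genuine
hand-off time with whatever the scheduler happened to do between the two
clock reads. That is the sense in which the numbers are described above
as random latencies.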

Cheers, Peter
-- 
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/
