Gotta Go Fast hosts "Algorithms for Modern Hardware", "an upcoming high performance computing book [...] by Sergey Slotin". I found it while looking for material on optimising matrix multiplication, since mathematical software is so much faster than just writing the formula into a program. The book is incomplete, but as far as I can tell that only means some sections haven't been written yet. The existing sections are well fleshed out and definitely useful.

While I'm at it I'll also link "Is Parallel Programming Hard, And, If So, What Can You Do About It?" by Paul E. McKenney. It's very readable; I'd almost call it light reading. It never feels like anything is made more complicated than it has to be, which is a welcome relief compared to some other material. If you've ever wondered "why couldn't the author just have said that?" after digging a simple idea out of some complicated prose, you'll be happy to receive the simple idea directly for once.