Optimization
Over the past two days I worked through the optimization videos from the Handmade Hero series.
There was a lot of theory.
I hoped the theory would come in handy at my day job as well.
Fortunately, I think some small bits will.
We’ll have to see tomorrow if that is really the case 😛
In the videos Casey converted the math to SIMD operations bit by bit. I already understood SIMD well enough that I wanted to skip the small steps, so during the video I tried to convert everything at once. Some parts were actually hard to get right, though, and that’s where the videos came to the rescue. In the end I managed to watch all the videos on this topic, or at least the ones about optimizing the software renderer.
Unfortunately for me, the game was now broken. There were no compiler errors, so it seemed I had done it correctly, but the game told me otherwise. After I got back from soccer training I found the issue: once again a stupid typo. I already had a hunch it had something to do with the Y coordinates of the pixels, and that was exactly where it went wrong. One variable that should have used the Y variant was using the X variant instead. It was an easy fix and therefore an easy performance gain: the cost went from 350 cycles per pixel originally to only 35.
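For anyone wondering where a number like "cycles per pixel" comes from, here is a minimal sketch of how it could be measured. It assumes GCC or Clang on an x86 CPU, where `__rdtsc` reads the CPU's timestamp counter. The names `fill_pixels` and `measure_cycles_per_pixel` are made up for this example, and the loop is only a stand-in for the real renderer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc; assumes GCC or Clang on x86/x86-64 */

/* Hypothetical stand-in for the renderer's inner loop: writes one value per pixel. */
static void fill_pixels(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        pixels[i] = (uint32_t)i;
    }
}

/* Reads the timestamp counter before and after the work,
   then divides the elapsed cycles by the number of pixels processed. */
static double measure_cycles_per_pixel(uint32_t *pixels, size_t count)
{
    assert(count > 0);

    uint64_t start = __rdtsc();
    fill_pixels(pixels, count);
    uint64_t end = __rdtsc();

    return (double)(end - start) / (double)count;
}
```

The result is noisy from run to run, so it is more useful for comparing a before and an after than as an absolute number.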
That’s huge! And Casey says it can be improved even further, as we haven’t aligned the memory yet, and there are always other techniques you can use. This was just basic flattening of the algorithm, which means no function calls: the code is placed directly into the algorithm. Together with SIMD instructions you get this amazing result.
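To make the idea of flattening plus SIMD a bit more concrete, here is a small sketch. It is not the renderer code from the series, just a simple blend between two float arrays that I made up for illustration. It assumes an x86 CPU with SSE, and `lerp`, `blend_scalar` and `blend_sse` are hypothetical names. The SIMD version uses unaligned loads and stores, since the memory isn't aligned yet.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; assumes an x86/x86-64 target */

/* Small helper of the kind that gets called once per value. */
static float lerp(float a, float t, float b)
{
    return a + t * (b - a);
}

/* Straightforward version: one function call and one value per iteration. */
static void blend_scalar(float *dest, const float *a, const float *b,
                         float t, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dest[i] = lerp(a[i], t, b[i]);
    }
}

/* Flattened SIMD version: the lerp math is written directly in the loop
   and handles four values per iteration. For simplicity this sketch
   requires count to be a multiple of 4. */
static void blend_sse(float *dest, const float *a, const float *b,
                      float t, size_t count)
{
    assert(count % 4 == 0);

    __m128 t4 = _mm_set1_ps(t); /* t copied into all four lanes */
    for (size_t i = 0; i < count; i += 4) {
        __m128 a4 = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
        __m128 b4 = _mm_loadu_ps(b + i);
        __m128 result = _mm_add_ps(a4, _mm_mul_ps(t4, _mm_sub_ps(b4, a4)));
        _mm_storeu_ps(dest + i, result); /* unaligned store of 4 floats */
    }
}
```

Both functions compute the same thing. The second one just does it four values at a time and without the call to `lerp`, which is roughly what flattening means here.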
This shows again why I think SIMD should be taught to more developers. Not enough programmers are using SIMD. Heck, I think the lead programmer at my day job doesn’t even know how to use it, even though he definitely has more than 10 years of experience.
The last time I checked, you couldn’t easily use SIMD instructions in C#. I will definitely look at it again and hopefully find an easy and solid way to use SIMD intrinsics, as most CPUs we write games for these days have SIMD instructions built in. We just aren’t using them enough as Unity/C# game devs.