rentzsch.com: tales from the red shed

May You Live in Interesting `time`

Notes

While researching an article, I came up with some “interesting” software performance numbers. There seems to be a performance blog streak going on, so I’ll jump into the fray with my numbers.

The program is purposely simple: multiply two arrays of 64-bit integers into a third array. 50 million of ‘em. I wanted to quantify how much of a performance boost you get from native 64-bit registers. I ran it on my Quad, but for kicks also ran it on my 2.0 GHz MacBook Pro:

All numbers are in seconds. Each one is the average of the user figure from running `repeat 10 time mayYouLiveInInterestingTime` under zsh (is there anything like `repeat` under bash? The man page makes my eyes bleed).
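For reference, the kind of program described above can be sketched in C roughly as follows. This is not the original source: the function names `multiply_arrays` and `run_multiply`, the fill pattern, and the checksum are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Element-wise product: out[i] = a[i] * b[i] for n 64-bit integers. */
void multiply_arrays(const uint64_t *a, const uint64_t *b,
                     uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/*
 * Hypothetical driver (not from the original program): allocates three
 * n-element arrays, fills the inputs with a[i] = i and b[i] = i + 1,
 * multiplies them, and stores the sum of the products in *checksum so
 * the compiler cannot discard the work.
 * Returns 0 on success, -1 if an allocation fails.
 */
int run_multiply(size_t n, uint64_t *checksum)
{
    assert(checksum != NULL);

    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    uint64_t *out = malloc(n * sizeof *out);
    if (a == NULL || b == NULL || out == NULL) {
        free(a);
        free(b);
        free(out);
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        a[i] = (uint64_t)i;
        b[i] = (uint64_t)i + 1;
    }

    multiply_arrays(a, b, out, n);

    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += out[i];
    }
    *checksum = sum;

    free(a);
    free(b);
    free(out);
    return 0;
}
```

Calling `run_multiply(50000000, &sum)` from a `main` would mirror the 50-million-element run. At that size each array occupies 400 MB, about 1.2 GB across all three, so some of the time reported may go to the kernel handing out fresh pages rather than to the multiplies themselves. That is a guess about this sketch, not a measurement of the original program.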

As expected, the full-width 64-bit integer registers on the G5 helped it handily win the contest. But that was a ppc64 binary. When I force the Quad to use standard-width 32-bit integer registers (ppc), it slows down into a dead heat with the MacBook Pro. The MBP was very slightly faster, but I was actively using the Quad (iTunes, email, NNW, etc.) while the MBP wasn’t doing anything else, so I call it a draw.

But! That was `time`’s user output: the amount of time spent on the “user” side of the process. The system output tells a different story:

Whoa! The MacBook Pro spent almost a whole second less on the “system” side of the process versus the Quad ppc64. Worse, the ppc version — by far the more common runtime variant on PowerMacs — was about a second and a half slower still.

When you add up user+system time spent, the tables turn and the MacBook Pro edges out the Quad on its home turf:

At first glance, the Quad keeps up with the MacBook Pro when running common 32-bit code and takes it to the curb when allowed to run exotic 64-bit native code. On the system side, however, it’s much slower.

I haven’t looked any deeper than the system and user numbers handed to me by time. In fact, I’m not even sure where those numbers come from. Worse, time’s total number never adds up — it’s always around 30 to 50 milliseconds greater than the sum of its own user+system. Perhaps it’s process bring-up/tear-down that doesn’t get factored in? I don’t know.
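On where those numbers come from: on Unix-like systems the user and system figures generally come from the kernel's per-process CPU accounting, which a program can read for itself with `getrusage(2)`, while elapsed time comes from a separate wall clock. The sketch below takes all three readings. The `time_snapshot` type and `take_snapshot` function are hypothetical names, and this is not a claim about how zsh's `time` is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* One reading of the process's CPU accounting plus the wall clock,
 * all expressed in seconds. */
typedef struct {
    double user; /* CPU time spent in user mode */
    double sys;  /* CPU time spent in the kernel on the process's behalf */
    double wall; /* wall-clock time since the epoch */
} time_snapshot;

static double to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fills *snap with the current readings. Returns 0 on success, -1 on error. */
int take_snapshot(time_snapshot *snap)
{
    struct rusage usage;
    struct timeval now;

    assert(snap != NULL);

    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    if (gettimeofday(&now, NULL) != 0) {
        return -1;
    }

    snap->user = to_seconds(usage.ru_utime);
    snap->sys = to_seconds(usage.ru_stime);
    snap->wall = to_seconds(now);
    return 0;
}
```

Taking one snapshot before the work and one after, then subtracting field by field, gives user, system, and elapsed figures comparable to what `time` prints. Because the wall clock keeps running whenever the process is not on a CPU (waiting on a page fault, waiting to be scheduled, being launched), the elapsed difference can exceed user+system, which would be consistent with the gap described above.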

Update: Grady Haynes writes:

Worse, time’s total number never adds up — it’s always around 30 to 50 milliseconds greater than the sum of its own user+system. Perhaps it’s process bring-up/tear-down that doesn’t get factored in? I don’t know.

From man time (which sounds way more kinky than it is):

BUGS
The granularity of seconds on microprocessors is crude and can result in times being reported for CPU usage which are too large by a second.

Damn these newfangled microprocessors and their seconds of low granularity!

Aha, thanks for the pointer to the Fine Man page, Grady. Still, this doesn’t excuse your most “recent” blog posting. Gaaack, I hope that doesn’t run through my head.

Saturday, March 18, 2006
12:00 AM