If you are fortunate enough to have a spare DAC on board, you can write the current task number to the DAC on every context switch. Not only do you get % utilization (the fraction of time the trace sits at each task's level), but you also get a very handy real-time execution graph on your scope. Granted, this lengthens the context switch (by ten instructions for me), but the visibility you gain into your system usually makes it worth the cost.
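A minimal sketch of the idea in C, not my actual port code (mine is done in the assembly context switch). The 12-bit DAC width, the 16-task limit, and the names `fake_dac_data_reg`, `task_number_to_dac_code` and `trace_task_to_dac` are all assumptions made up for illustration; on real hardware the register would be a volatile pointer to your DAC's data register.

```c
#include <stdint.h>

/* Assumptions for this sketch: a 12-bit DAC and at most 16 task numbers
 * (0..15). Adjust both for your hardware and task count. */
#define DAC_FULL_SCALE   4095u
#define MAX_TASK_NUMBER  15u

/* Stand-in for the memory-mapped DAC data register so the sketch builds
 * on a host machine. Hypothetical name, not a real peripheral. */
volatile uint16_t fake_dac_data_reg;

/* Spread task numbers evenly over the DAC range so each task shows up
 * as its own voltage step on the scope. Out-of-range numbers are clamped. */
uint16_t task_number_to_dac_code(uint32_t task_number)
{
    if (task_number > MAX_TASK_NUMBER) {
        task_number = MAX_TASK_NUMBER;
    }
    return (uint16_t)((task_number * DAC_FULL_SCALE) / MAX_TASK_NUMBER);
}

/* Call this from the context switch once the incoming task is known. */
void trace_task_to_dac(uint32_t task_number)
{
    fake_dac_data_reg = task_number_to_dac_code(task_number);
}
```

If you would rather not touch the port's assembly, I believe the `traceTASK_SWITCHED_IN()` hook macro is another place a call like this could go, since `pxCurrentTCB` already points at the incoming task there. A lookup table instead of the multiply/divide would keep the added cost down.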
P.S. I modify the "tskTCB" structure so that "uxTCBNumber" comes directly after "pxTopOfStack". That way my context switch code can easily grab the task number from a fixed offset right after reading the stack pointer.
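To show why the member order matters, here is a rough sketch using a cut-down, hypothetical TCB (`demo_tcb_t` and `task_number_from_tcb` are made-up names; the real "tskTCB" has many more members, and "uxTCBNumber" is only present when `configUSE_TRACE_FACILITY` is 1). I use `uintptr_t` for the task number here so the layout comes out the same on a host build; the real member type differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down TCB for illustration only. */
typedef struct demo_tcb {
    volatile uintptr_t *pxTopOfStack; /* must stay first: port code relies on it */
    uintptr_t uxTCBNumber;            /* moved up to sit right after the stack pointer */
    char pcTaskName[16];              /* everything else follows */
} demo_tcb_t;

/* Fail the build if someone reorders the struct and breaks the fixed offset. */
static_assert(offsetof(demo_tcb_t, pxTopOfStack) == 0,
              "pxTopOfStack must be the first member");
static_assert(offsetof(demo_tcb_t, uxTCBNumber) == sizeof(void *),
              "uxTCBNumber must directly follow pxTopOfStack");

/* What the context switch effectively does: with only the TCB address in
 * hand, load the task number from one pointer-width past the start. */
uintptr_t task_number_from_tcb(const void *tcb)
{
    uintptr_t number;
    memcpy(&number, (const unsigned char *)tcb + sizeof(void *), sizeof number);
    return number;
}
```

In assembly this boils down to one extra load at a small constant offset from the register that already holds the TCB pointer, which is why the reordering keeps the added cost low.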