traceTASK_SWITCHED_OUT() should be called first. When traceTASK_SWITCHED_OUT() is called, pxCurrentTCB points to the TCB of the task that was running prior to the context switch. When traceTASK_SWITCHED_IN() is called, pxCurrentTCB points to the TCB of the task that has been selected to run next.
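As a rough illustration of that ordering, here is a minimal host-side sketch. It is not FreeRTOS code: DemoTCB, demo_clock, switched_in_at and demo_context_switch() are made-up stand-ins, and only the two macro names and pxCurrentTCB come from the kernel. It assumes a free-running counter is available to timestamp each switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TCB; the real kernel structure is different. */
typedef struct {
    const char *name;
    uint32_t    run_time;   /* accumulated execution time, in counter units */
} DemoTCB;

static DemoTCB *pxCurrentTCB;    /* mimics the kernel variable of the same name */
static uint32_t demo_clock;      /* hypothetical free-running counter */
static uint32_t switched_in_at;  /* counter value at the last switch in */

/* Trace hooks: charge the elapsed time to the outgoing task, then
 * restart the measurement for the incoming task. */
#define traceTASK_SWITCHED_OUT() \
    (pxCurrentTCB->run_time += demo_clock - switched_in_at)
#define traceTASK_SWITCHED_IN() \
    (switched_in_at = demo_clock)

/* Mimics the order of events during a context switch. */
static void demo_context_switch(DemoTCB *next)
{
    assert(pxCurrentTCB != NULL && next != NULL);
    traceTASK_SWITCHED_OUT();   /* pxCurrentTCB is still the outgoing task */
    pxCurrentTCB = next;        /* the scheduler selects the next task */
    traceTASK_SWITCHED_IN();    /* pxCurrentTCB is now the incoming task */
}
```

In this sketch the first task is installed by assigning pxCurrentTCB directly, so switched_in_at keeps its initial value of 0 until the first switch. The first slice is therefore only timed correctly if the counter also starts at 0 when the scheduler starts.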
But the very first task will not be preceded by a SWITCHED_IN call. I think it would be good to get a call before the first task is started as well; that way I can also measure the execution time of the first run of the first task.
The scheduler is started from the port layer - that is - the code that is not common to every microcontroller. It would not be easy to go through all 31 ports and update the code to make a call to the SWITCHED_IN macro.
You know which task is running first, because the SWITCHED_OUT macro that gets called first will tell you. The only information you are missing is the exact time the first task executed for. It would be easier to get this by adding a new macro to vTaskStartScheduler(). The new macro could then be used to start the run time stats counter - if it was already running before the scheduler started. Please add a feature request in the SourceForge feature request tracker if you would like that added.
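To make the suggestion concrete, here is a host-side sketch of what such a macro might look like. Everything in it is an assumption for illustration: traceDEMO_SCHEDULER_START() is an invented name, not an existing FreeRTOS macro, and DemoTCB, demo_clock, slice_start and the demo_ functions are stand-ins for the kernel and a hardware counter. The idea is that the hook lives in common code, so no port needs to change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TCB; the real kernel structure is different. */
typedef struct {
    const char *name;
    uint32_t    run_time;   /* accumulated execution time, in counter units */
} DemoTCB;

static DemoTCB *pxCurrentTCB;  /* mimics the kernel variable of the same name */
static uint32_t demo_clock;    /* hypothetical counter, may already be running */
static uint32_t slice_start;   /* counter value when the current slice began */

/* Invented macro name: records when the first task is about to start. */
#define traceDEMO_SCHEDULER_START()  (slice_start = demo_clock)
#define traceTASK_SWITCHED_IN()      (slice_start = demo_clock)
#define traceTASK_SWITCHED_OUT() \
    (pxCurrentTCB->run_time += demo_clock - slice_start)

/* Stand-in for the common part of scheduler start-up. */
static void demo_start_scheduler(DemoTCB *first)
{
    assert(first != NULL);
    pxCurrentTCB = first;
    traceDEMO_SCHEDULER_START();  /* called once, from common code */
    /* a real port layer would start the first task at this point */
}

/* Mimics the order of events during a context switch. */
static void demo_context_switch(DemoTCB *next)
{
    assert(pxCurrentTCB != NULL && next != NULL);
    traceTASK_SWITCHED_OUT();   /* charges the slice to the outgoing task */
    pxCurrentTCB = next;
    traceTASK_SWITCHED_IN();    /* begins timing the incoming task */
}
```

With the start-up hook in place, the first slice of the first task is measured from the moment the scheduler starts, even if the counter was already counting beforehand.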