The problem is that the hardware just doesn’t support that level of abstraction. Tightly Coupled Memory (TCM) doesn’t fit a model in which you don’t care which processor runs a task, since a TCM is typically private to a single core. A processor designed for that sort of operation would replace the TCM with a large cache, probably with two levels, and perhaps a larger global memory.
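
To make the point concrete, here is a minimal sketch in C, not taken from any real part or SDK. It assumes a hypothetical two-core device where each core has its own TCM, modelled as a plain array; `tcm`, `task_t`, and `run_task` are made-up names for illustration. A task whose data lives in one core's TCM simply cannot be dispatched to the other core.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 2   /* assumption: a two-core part */
#define TCM_WORDS 64  /* assumption: tiny TCM, for illustration only */

/* Hypothetical model: one private TCM per core. On real hardware a core
 * can usually reach only its own TCM at full speed, if at all. */
static int tcm[NUM_CORES][TCM_WORDS];

typedef struct {
    int    home_core; /* core whose TCM holds the task's data */
    size_t offset;    /* start of the data within that TCM */
    size_t length;    /* number of words the task reads */
} task_t;

/* Sums the task's data. Returns 0 on success, or -1 if the task is
 * dispatched to a core that does not own its data, or if the data
 * range does not fit inside the TCM. */
int run_task(const task_t *t, int core, int *sum_out)
{
    assert(t != NULL && sum_out != NULL);

    if (core < 0 || core >= NUM_CORES || core != t->home_core)
        return -1; /* data is in another core's TCM */
    if (t->offset > TCM_WORDS || t->length > TCM_WORDS - t->offset)
        return -1; /* range falls outside the TCM */

    int sum = 0;
    for (size_t i = 0; i < t->length; i++)
        sum += tcm[core][t->offset + i];

    *sum_out = sum;
    return 0;
}
```

With a shared cache hierarchy in place of the TCM, the `home_core` check would disappear, and a scheduler could place the task on either core.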