Constant I2C reads inside a task cause the program to get into an unknown state

MasterSil · June 24, 2021, 4:10am

I am doing I2C reads inside a task, and at some point the program seems to get into a weird/unknown state.

Below is roughly what I have. I’m using nRF driver APIs under the hood.

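```cpp
// FreeRTOS and nRF5 SDK headers. Tmp, Uart and NotificationManager are
// my own classes; their headers are not shown here.
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "app_error.h"

SystemTask::SystemTask(Uart &pUart, Tmp &tmpSensor, NotificationManager &notificationManager, QueueHandle_t &systemQueue)
    : uart(pUart), _tmpSensor(tmpSensor), _notificationManager(notificationManager), systemTaskQueue(systemQueue)
{
    // Create the task. The stack depth (256) is in words, not bytes;
    // priority 0 is the same priority as the idle task.
    if (xTaskCreate(SystemTask::process, "PROCESS", 256, this, 0, &taskHandle) != pdPASS)  // TODO - think about stack size!
    {
        APP_ERROR_HANDLER(NRF_ERROR_NO_MEM);
    }
}

// Static trampoline: FreeRTOS hands 'this' back through the void* parameter.
void SystemTask::process(void *instance)
{
    auto pInstance = static_cast<SystemTask*>(instance);
    pInstance->mainThread();
}

void SystemTask::mainThread() {
    _tmpSensor.xferData();
    while (true) {
        value = _tmpSensor.read();   // i2c read with IRQ enabled
        // ... (rest of the loop not shown)
    }
}
```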

Here’s the call stack. Is there a better way to go about debugging this?

RAc · June 24, 2021, 5:21am

There is no such thing as an “unknown state.” Either your code has caused an exception, so your pc points somewhere into a fault handler; or you are stuck in an infinite loop; or all your tasks are suspended or blocked. In any case, you first look at the pc value and determine which code it belongs to.
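For the exception case, a common Cortex-M technique is to have the HardFault handler pass the automatically stacked registers to a C function, so the faulting PC is read from the exception frame rather than from wherever the debugger happens to stop. A minimal sketch, assuming a Cortex-M3/M4 part (e.g. an nRF52) and the CMSIS vector name `HardFault_Handler` used by the usual startup files; `ExceptionFrame`, `decodeFrame` and `hardFaultHandlerC` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Registers the Cortex-M core pushes on exception entry, lowest address first.
// (With lazy FPU stacking the frame is larger, but these eight come first.)
struct ExceptionFrame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

// Decode the stacked frame. 'sp' is the MSP or PSP value that was active
// when the fault was taken.
inline ExceptionFrame decodeFrame(const uint32_t* sp) {
    assert(sp != nullptr);
    return ExceptionFrame{sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]};
}

// Hypothetical C-level handler: copies PC/LR into volatile locals so they
// can be inspected in the debugger after the core parks here.
extern "C" void hardFaultHandlerC(const uint32_t* sp) {
    const ExceptionFrame frame = decodeFrame(sp);
    volatile uint32_t faultPc = frame.pc;  // faulting instruction (later for imprecise bus faults)
    volatile uint32_t faultLr = frame.lr;  // return address of the interrupted function
    (void)faultPc;
    (void)faultLr;
#if defined(__ARM_ARCH)
    for (;;) {
        __asm volatile("nop");  // park here; read faultPc/faultLr in the debugger
    }
#endif
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
// Cortex-M3/M4: bit 2 of EXC_RETURN (in LR) says whether the faulting code was
// on the main (MSP) or process (PSP) stack. FreeRTOS tasks run on the PSP.
extern "C" __attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile(
        "tst lr, #4          \n"
        "ite eq              \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b hardFaultHandlerC \n");
}
#endif
```

The value in `faultPc` is then the address to look up in the disassembly or the map file; `faultLr` often identifies the caller when the PC lands in library or driver code.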

MasterSil · June 25, 2021, 5:00am

Right, and I reckon 0x00000a60 is the PC value, but how would you track down what it corresponds to? I’m using the Segger IDE, and I entered this address in the memory search bar, but it doesn’t really help.

RAc · June 25, 2021, 6:23am

Use the disassembly window.

MasterSil · June 25, 2021, 6:38am

It takes me to some load instruction in the assembly.

hs2 · June 25, 2021, 6:43am

Is it reproducible, or does it happen sporadically?
Is it possible that, e.g., the sensor object gets corrupted/deleted?
Are you sure that the (rather small) task stack size of 256 is really sufficient?
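On the stack question: the depth passed to `xTaskCreate()` is in words, so 256 is 1 KB on a 32-bit Cortex-M, and plenty of free RAM overall says nothing about how close a single task has come to its own limit. When stack checking or the high-water-mark API is enabled, FreeRTOS fills each task stack with a known byte (0xA5) at creation; `uxTaskGetStackHighWaterMark()` (with `INCLUDE_uxTaskGetStackHighWaterMark` set to 1) reports how much of that fill is still intact, and `configCHECK_FOR_STACK_OVERFLOW` together with a `vApplicationStackOverflowHook()` checks for overflow on each context switch. The host-side sketch below only illustrates the painting idea on a plain array; `paintStack`, `highWaterMarkBytes` and `simulateUsage` are hypothetical helpers, not FreeRTOS code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill byte FreeRTOS uses for task stacks (tskSTACK_FILL_BYTE).
constexpr uint8_t kFillByte = 0xA5;

// Paint the whole stack region with the fill byte before the "task" runs.
void paintStack(uint8_t* stack, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        stack[i] = kFillByte;
    }
}

// On Cortex-M the stack grows downward (from stack + size toward stack),
// so bytes at the low end that still hold the fill value were never used.
std::size_t highWaterMarkBytes(const uint8_t* stack, std::size_t size) {
    assert(stack != nullptr);
    std::size_t untouched = 0;
    while (untouched < size && stack[untouched] == kFillByte) {
        ++untouched;
    }
    return untouched;
}

// Simulate a task whose deepest call chain pushed 'depth' bytes.
void simulateUsage(uint8_t* stack, std::size_t size, std::size_t depth) {
    for (std::size_t i = 0; i < depth && i < size; ++i) {
        stack[size - 1 - i] = 0x00;  // any value other than the fill byte
    }
}
```

A result near zero from `uxTaskGetStackHighWaterMark(taskHandle)` on the real target would point at the 256-word stack being too small. The scan can over-report slightly if live data happens to equal the fill byte.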

MasterSil · June 25, 2021, 7:03am

Now I see a slightly different call stack.

It does seem to happen every time: if not right away, then a few seconds later.

How would the object get corrupted at a random instant? Note that the object is created at global scope in main and a reference to it is stored inside SystemTask, so I doubt it’s getting corrupted.

For now, I can still see that a fair amount of memory hasn’t been used.

hs2 · June 25, 2021, 7:23am

It seems something bad happens when using/reading the sensor(?). Perhaps there’s a problem with the underlying driver. The higher-level task/object setup as you described seems to be OK.
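If the read is interrupt-driven, one pitfall worth ruling out (an assumption, since the body of `read()` isn’t shown) is using the receive buffer, or starting the next transfer, before the driver’s completion event has fired. In FreeRTOS the usual shape is for the driver’s event handler to give a semaphore or task notification and for the task to block on it with a timeout. The host-side sketch below stands in for that, with a `std::thread` playing the interrupt and a condition variable as the signal; `TransferDone`, `signal()` and `waitFor()` are hypothetical names:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Binary "transfer complete" flag: set by the event handler, consumed by the task.
class TransferDone {
public:
    // Called from the (simulated) interrupt/event handler.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
    }

    // Called from the task: block until signalled or until the timeout expires.
    // Returns false on timeout so the caller can treat a stuck bus as an error.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool ok = cv_.wait_for(lock, timeout, [this] { return done_; });
        done_ = false;  // consume the event so the next transfer waits again
        return ok;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

On the target the same shape maps onto `vTaskNotifyGiveFromISR()` in the driver’s event handler and `ulTaskNotifyTake(pdTRUE, timeout)` in the task loop. On Cortex-M ports, an interrupt that calls a `...FromISR()` function must run at a priority at or below `configMAX_SYSCALL_INTERRUPT_PRIORITY` (numerically equal or greater); getting this wrong is a well-known cause of delayed, hard-to-place kernel corruption.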

RAc · June 25, 2021, 7:34am

It’s not “random,” but rather “nondeterministic.” That’s the very nature of concurrent programming, and the daily work of all of us.

Assume that some task overruns its stack, corrupting memory that “belongs to someone else.” The problem only shows up once this “someone else” attempts to do something with that memory. That may happen fairly soon, but there may also be multiple interrupts serviced in the meantime, higher-priority tasks scheduled, and so on. It is very typical for crashes or other problems to manifest themselves in different ways and places depending on the concurrent (unpredictable) sequence of events.
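One way to make such corruption show up closer to where it happens, rather than where it finally crashes, is to place guard words around data you suspect and check them at each point of use. A minimal sketch, assuming a hypothetical `Guarded<T>` wrapper and an arbitrary magic value; it only detects writes that hit the guards, not every possible corruption of the payload:

```cpp
#include <cassert>
#include <cstdint>

// Wraps an object between two guard words. An overrun from a neighbouring
// buffer or stack will usually clobber a guard along with the payload.
template <typename T>
struct Guarded {
    static constexpr uint32_t kMagic = 0xDEADBEEFu;  // arbitrary marker value

    uint32_t head = kMagic;
    T value{};
    uint32_t tail = kMagic;

    // True while neither guard word has been overwritten.
    bool intact() const { return head == kMagic && tail == kMagic; }

    // Access the payload; in a debug build, stops at the first use after corruption.
    T& get() {
        assert(intact() && "guard word overwritten");
        return value;
    }
};
```

Checking `intact()` at the top of each loop iteration narrows the window between the stray write and its detection, which is usually the hard part in the scenario described above.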

If you’ve been doing RTOS work for a few years, you’ll develop a fairly reliable gut feeling for the usual suspects behind the root cause, instead of trying to analyze the crash itself.