Hello community,
Sometimes the wrong value gets pushed to the Lua stack.
The value is in register R3 and gets pushed via lua_pushnumber(), but afterwards the wrong value shows up in memory.
I’m working on an older project.
It’s running on a Xilinx Zynq 7000 (Cortex-A9), built with version 2017.1 of the Xilinx SDK (GCC).
The RTOS is version 9.0.
There are around 15 tasks running.
Most of them are waiting for a message from the bus in the corresponding queue.
While the Lua task is actively running, a task serving a high-performance bus can preempt it, and after control returns to Lua, Lua writes the wrong value. This is most noticeable when the floor() function tries to floor a value: an integer just gets changed to 0, but a float raises an error because “the value given is not a number”.
I changed the config and excluded all defines which are just for statistics and the like.
I activated FPU for all tasks.
I compared the current demo with my version and came across one line that differs in the implementation of vApplicationIRQHandler(), namely:
/* Re-enable interrupts. */
__asm ( "cpsie i" );
But even after adding this nothing changed.
I painted the stack boundaries and monitored them, and increased the heap and stack sizes of the whole project and of the RTOS tasks, to no avail.
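For anyone wanting to reproduce the stack check, this is roughly what the painting amounts to. It is a minimal host-side sketch, not the project's actual code: the fill byte and the function names are assumptions, and it assumes a descending stack so that the untouched bytes sit at the low-address end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: any byte value unlikely to occur in real stack data works. */
#define STACK_FILL_BYTE 0xA5u

/* Paint the whole stack region before the task starts running. */
void paint_stack(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/*
 * Count the bytes at the low-address end that still hold the fill pattern.
 * With a descending stack this is the headroom that was never used.
 */
size_t stack_headroom(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    assert(stack != NULL);
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A headroom that never reaches 0 for any task makes a plain stack overflow unlikely, which matches what I saw. FreeRTOS also offers uxTaskGetStackHighWaterMark() for the same purpose if it is enabled in the config.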
The interrupts only use RTOS functions ending in FromISR and no memmove()/memcpy() calls.
The tasks use queues but no malloc(); some of them use memmove().
The priority of the task which interrupts Lua is 5 and the priority of Lua is 1.
Changing the priorities or wrapping the Lua execution in a critical section did help, but putting the Lua script above the task which handles the bus messages doesn’t seem right to me, since the bus task has a harder real-time requirement than Lua.
Plastering the Lua library with critical sections seems stupid as well.
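A narrower variant would be to protect only the copy of the shared bus value instead of the whole Lua call. The sketch below is an assumption about how that could look, not code from the project: the array, its size and the function name are hypothetical, and the two macros are host-side stand-ins for the RTOS critical-section macros (taskENTER_CRITICAL()/taskEXIT_CRITICAL() in FreeRTOS).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds on a host. On the target these
 * would be replaced by the RTOS critical-section macros. */
int crit_depth = 0;    /* current nesting depth */
int crit_entries = 0;  /* how often a critical section was entered */
#define ENTER_CRITICAL() do { crit_depth++; crit_entries++; } while (0)
#define EXIT_CRITICAL()  do { crit_depth--; } while (0)

#define BUS_VALUE_COUNT 8                     /* hypothetical array size */
volatile double bus_values[BUS_VALUE_COUNT];  /* hypothetical shared array */

/* Copy one shared value while preemption is blocked, then work on the copy. */
double read_bus_value_snapshot(size_t index)
{
    double copy;

    assert(index < BUS_VALUE_COUNT);
    ENTER_CRITICAL();
    copy = bus_values[index];
    EXIT_CRITICAL();
    return copy;
}
```

This keeps the blocked window to a few instructions, so the bus task's latency is barely affected. If the error still occurs with only the copy protected, the shared array is probably not the cause and the corruption happens later, while the value is held in registers.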
With an oscilloscope I could see that it’s always the same sequence:
The Lua task gets switched in after some other task (either the bus task or even the IDLE task).
Lua reads the value in the array which holds the bus value.
The high-performance bus task kicks in for a couple of µs (mostly checking the queue and seeing that the message is not of importance).
Execution returns to Lua, and Lua pushes a wrong value to its stack, then either crashes or sends 0.
Sometimes the system runs for 45 minutes without an error, sometimes only for 1 minute.
My guess would be that something messes up the pop (restore) of the correct register values, since R3 sometimes holds the value 0606060 from R6 afterwards.
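To spot this kind of value quickly in a register dump, a small helper can flag words that look like a register's initial fill value. This is only a sketch under an assumption worth checking in the port's stack initialisation code: that a new task's R1 to R12 are seeded with 0x01010101 to 0x12121212, so that the value I saw corresponds to 0x06060606 for R6. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: the port seeds R1..R12 of a new task with 0x01010101 ... 0x12121212,
 * so each byte reads as the register number when printed in hex.
 * Returns the register number, or -1 if the word is not such a fill value.
 */
int initial_fill_register(uint32_t word)
{
    uint32_t byte = word & 0xFFu;
    int reg;

    /* All four bytes must be identical. */
    if (word != byte * 0x01010101u) {
        return -1;
    }
    /* Both nibbles must be decimal digits, e.g. 0x12 reads as 12. */
    if ((byte >> 4) > 9u || (byte & 0xFu) > 9u) {
        return -1;
    }
    reg = (int)((byte >> 4) * 10u + (byte & 0xFu));
    return (reg >= 1 && reg <= 12) ? reg : -1;
}
```

If R3 really ends up with R6's initial fill value, that would point at a context frame being restored from the wrong offset or from a frame that was never properly saved, rather than at Lua itself.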
Currently I can force the error by setting a breakpoint on the following two lines of math_floor() in mathlib.c and resuming with play instead of stepping; stopping one line above does not trigger the error:
lua_Number d = l_mathop(floor)(luaL_checknumber(L, 1));
pushnumint(L, d);
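One way to narrow this down would be a tripwire around these two lines: capture the raw bit pattern of d in memory right after floor() returns and compare it again just before pushnumint(). The helpers below are a hypothetical sketch, assuming lua_Number is a 64-bit double; they are not part of Lua.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a double without changing the value. */
uint64_t double_bits(double value)
{
    uint64_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical tripwire: nonzero if the value still matches the saved bits. */
int double_unchanged(uint64_t bits_before, double value_after)
{
    return bits_before == double_bits(value_after);
}
```

Storing the saved bits in a volatile variable and breaking when double_unchanged() returns 0 would show whether d changes while it lives in a register across the preemption, and what bit pattern it changes to.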
Has anyone experienced something like this before, or could you point me to where to look or share how you would debug something like this?