Lost semaphore on SAM4N

tomwbarclay wrote on Thursday, March 19, 2015:

Hi, I need ideas on how to further investigate a suspected FreeRTOS failure. The problem is that a semaphore is given from an ISR but is not caught by the background thread. A code snippet follows.

// init the control block
rblk	= reg_block;
success = TRUE;
state	= CMX7164_SEQ_INT0;
 
#ifdef DEBUG_SDR_THRU_CBUS
debug_blip_debug_line(DEBUG_LINE5);
#endif

// set the SPI and CMX7164 interrupt call back routines
spi_io_set_interrupt_callback_function(sdr_cbus_wr_thru_txfr_callback);
cmx7164_set_interrupt_callback_function(sdr_cbus_interrupt_callback);
timer_io_set_hw_alarm_time(SDR_CBUS_TIMER_CHANNEL, tout_us, sdr_io_cbus_wr_timeout);

// create a software interrupt as if the SPI txfr has happened
// this will start the entire sequence
NVIC->STIR = SPI_IRQn;

// wait till the program sequence completed event arrives
event_wait(txfr_cbus_wr_event);

#ifdef DEBUG_SDR_THRU_CBUS
debug_blip_debug_line(DEBUG_LINE6);
#endif
 
// reset the Timer,  SPI and CMX7164 interrupt call back routines
timer_io_clear_hw_alarm(SDR_CBUS_TIMER_CHANNEL);
spi_io_set_interrupt_callback_function(NULL);
cmx7164_set_interrupt_callback_function(NULL);

The code enters at the top, sets up some interrupt callbacks, and kicks off the interrupt processes with the NVIC-> write. (The interrupt processes take from 150 to 500 µs.) It then waits on a semaphore given by the final interrupt process (event_wait is my FreeRTOS wrapper for xSemaphoreTake).

Mostly this works… the interrupt-driven state machine does its thing, and once it's finished it gives the semaphore, which is caught by the event wait so the thread can continue and exit the routine.

I can break the execution just before the ISR throws the semaphore and then I can view the semaphore state in FreeRTOS viewer. This shows its value is 0 (not yet thrown) and it has a task waiting on it. Just as you would expect.

When it goes wrong, the hardware timeout timer fires first. This gives the semaphore, but when I look at the FreeRTOS viewer it shows that there is NO waiting task. Hmm!

To verify this I added the debug_blip… lines. These toggle io pins so I can capture real events on my analyzer.

When it works, the entry toggle (DEBUG5) is first, the ISR-completed toggle (not shown here) is next, and the exit toggle (DEBUG6) is last. No timeout toggle shows … all as anticipated.

When it goes wrong the entry toggle is first, the ISR completion toggle is next, followed by a long delay to the timer timeout toggle. No exit toggle is seen.

My conclusion (so far) is that either
a) The thread never executes the event_wait() … but I can see all the ISR traffic on my analyser so I know that it must have executed the line before (NVIC->).
b) event_wait() (wrapped xSemaphoreTake) somehow lost my request.
c) Another process was executed in between the NVIC-> line and the event_wait() line.

The processor is a SAM4N @ 100MHz. I am using Atmel Studio 6.2. I am running about 20 to 50 transactions a second, and the fault occurs after between 1 and 30 seconds of execution time.

Also another independent thread stops at the same point in time … I can see no connection between them. Plus all of the other threads, as far as I can see, run unperturbed, so the whole RTOS has not crashed or gone crazy.

FYI … setting configUSE_PORT_OPTIMISED_TASK_SELECTION in FreeRTOSConfig.h fails to compile cleanly.

Any suggestions welcomed … I am running out of ideas.

rtel wrote on Thursday, March 19, 2015:

Have you set the priority of the interrupts so they are at or below the priority specified by configMAX_SYSCALL_INTERRUPT_PRIORITY (which means a number the same or higher than the priority specified by that constant)? See http://www.freertos.org/RTOS-Cortex-M3-M4.html - your description would be a classic symptom of this being wrong.

Which version of FreeRTOS are you using (relevant to the next question)?

Do you have configASSERT() defined?

Regards.

tomwbarclay wrote on Friday, March 20, 2015:

Hi… I have gathered all the interrupt priority settings into a single conf header file to make sure that I don’t break the Cortex-M rules and I can see at a glance the relative priorities … see snippet below

// FreeRTOSConfig.h sets this value to
// #define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY	10

// UART interrupts
#define SERIAL_UART_INTERRUPT_PRIORITY		11

// USART
#define SERIAL_USART_INTERRUPT_PRIORITY		12

// SPI interrupts
#define SPI_INTERRUPT_PRIORITY				10

// USART-SPI interrupts
#define USART_SPI_INTERRUPT_PRIORITY		13

// ADC interrupt priorities
#define ADC_INTERRUPT_PRIORITY				14

// PIOs
#define PIOA_INTERRUPT_PRIORITY				13
#define PIOB_INTERRUPT_PRIORITY				13
#define PIOC_INTERRUPT_PRIORITY				13

// TC 
#define TIMER_COUNTER_INTERRUPT_PRIORITY	12

// TWI
#define TWI_INTERRUPT_PRIORITY				15

//RTT
#define RTT_INTERRUPT_PRIORITY				14
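
As a sanity check on the list above, the Cortex-M rule can be written as a single comparison. This is a minimal sketch rather than code from the project: irq_may_use_from_isr_api is a hypothetical helper, and the threshold simply repeats the value quoted from FreeRTOSConfig.h. Note that on Cortex-M a numerically larger value means a logically lower priority.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value quoted from FreeRTOSConfig.h in this post (unshifted library form). */
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 10u

/* Hypothetical helper, not part of FreeRTOS or ASF.
 * An ISR may call the ...FromISR() API only if its priority is at or
 * below the max syscall priority, i.e. its number is the same or higher. */
static bool irq_may_use_from_isr_api(uint32_t irq_priority)
{
    return irq_priority >= configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With the values listed, every entry passes; SPI at 10 sits exactly on the limit. On target, the same comparison could be made against the value read back with the CMSIS function NVIC_GetPriority(), which would also show whether the priorities are actually being programmed.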

The FreeRTOS version is 7.3.0 (as supported by Atmel ASF 3.19.0)

Following is code snippet from FreeRTOSConfig.h

/* Normal assert() semantics without relying on the provision of an assert.h
header file. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ) __asm volatile( "NOP" ); }

rtel wrote on Friday, March 20, 2015:

The priorities you list do look ok - assuming they are actually being
set in the code. To double check I would recommend dropping the latest
FreeRTOS files (v8.2.0 files from the FreeRTOS/source directory, and in
particular the FreeRTOS/source/portable/gcc/ARM_CM3 directory) into your
project as V7.3.0 does not contain the asserts that will catch an
interrupt priority assignment issue.

You should be able to simply overwrite your old FreeRTOS files with the
new ones and still end up with a compiling project - but naturally do
take a backup of your existing project before doing so.

Regards.

tomwbarclay wrote on Friday, March 20, 2015:

Thanks for info … will install latest RTOS… though it’ll take me a little bit of time.

FYI …
I replaced the event-wait code with a classic pre-RTOS wait on a boolean that is set by the interrupt. That works fine … > 1 hour run time and so far no failures…

If nothing else it weighs in the same direction as the other evidence.
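
The boolean wait described above might look something like the following. This is only a sketch under assumptions: the flag name, the callback name, and the poll-count timeout are all hypothetical, since the actual patch was not posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; volatile because it is written from interrupt context. */
static volatile bool txfr_done = false;

/* Would be called from the final ISR callback instead of giving the semaphore. */
static void txfr_done_isr_callback(void)
{
    txfr_done = true;
}

/* Busy-wait until the flag is set or max_polls iterations have elapsed.
 * Returns true if the flag was seen (and clears it), false on timeout. */
static bool wait_for_txfr_done(uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (txfr_done) {
            txfr_done = false;
            return true;
        }
    }
    return false;
}
```

Because this path makes no RTOS calls from the ISR, it would not be affected by an interrupt priority or API misuse problem, which is why it is a useful comparison.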

tomwbarclay wrote on Saturday, March 21, 2015:

Have now ported the application to FreeRTOS 8.2.0. After a few naming niggles I hit the following line of code after starting the scheduler.

configASSERT( ( portNVIC_INT_CTRL_REG & portVECTACTIVE_MASK ) 

in function

 void vPortEnterCritical( void )

The comment says that this is caused by using a non-FromISR method in an interrupt routine. So far I can't find it.

Can there be another explanation?
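
For context, the assert is testing whether the core is currently executing an exception handler. On ARMv7-M the VECTACTIVE field in the low bits of the Interrupt Control and State Register holds the active exception number, and it reads zero in thread mode. The sketch below shows the idea on a plain integer so it can be tried off-target; the function name and mask constant are my own assumptions, and the port's portVECTACTIVE_MASK may be defined differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: VECTACTIVE occupies ICSR bits [8:0] on ARMv7-M. */
#define SKETCH_VECTACTIVE_MASK 0x1FFu

/* Hypothetical helper: takes a value read from the ICSR and reports
 * whether an exception handler is active (VECTACTIVE is non-zero). */
static bool icsr_shows_handler_mode(uint32_t icsr_value)
{
    return (icsr_value & SKETCH_VECTACTIVE_MASK) != 0u;
}
```

A task calling a queue or semaphore function should see zero here, so a non-zero value at the assert means the call really is being reached from handler context, whatever the source code appears to say.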

rtel wrote on Saturday, March 21, 2015:

When you are stopped on that line, have a look at the call stack in the
debugger to see what is calling the taskENTER_CRITICAL() macro, and where
it is being called from. Even if this is not the root cause of your
issue it would be best to fix it.

Regards.

tomwbarclay wrote on Monday, March 23, 2015:

The code stops on the first call to the xQueueReceive() function in the highest priority task. It stops there immediately after the scheduler is started. I have checked the FreeRTOSConfig.h file against the one in the SAM4S demo program (the most similar processor in the 8.2.0 demo pack) and can see no important difference. The error suggests that the call to xQueueReceive() is being made from an interrupt routine without the use of the FromISR variant, but that is not the case. It's in the for(;;) loop of a task.

Any help gratefully received.

rtel wrote on Monday, March 23, 2015:

Can you please post a screen shot that has both the call stack and
system registers visible in the debugger window when the error occurs so
we can take a look - thanks.

tomwbarclay wrote on Monday, March 23, 2015:

Hi, attached is PDF with screen dumps of what I believe are the relevant windows.

rtel wrote on Monday, March 23, 2015:

The memory window shows the highest priority bit of the VECTACTIVE bits
in the Interrupt Control and State Register being set (that is the ‘1’
of the ‘41’) - which would seem to indicate you are in an interrupt.

The call stack goes back as far as ulPortSetInterruptMask() - but what
called that?

Regards.

tomwbarclay wrote on Monday, March 23, 2015:

Hi, more grist to the mill … In the search for an answer I have compared my project with the FreeRTOS 8.2.0 demo program for the SAM4S and found that the definition of configCPU_CLOCK_HZ was different, so I modified my FreeRTOSConfig.h file to match. See snippet below …

// modified to match SAM4S demo on FreeRTOS8.2.0
extern uint32_t SystemCoreClock;

// set the RTOS version as rev8 and above changed names, refs etc etc 
#define configFreeRTOSversion			8

#define configUSE_PREEMPTION			1
#define configUSE_IDLE_HOOK				1
#define configUSE_TICK_HOOK				0
// modified to match SAM4S demo on FreeRTOS8.2.0
//#define configCPU_CLOCK_HZ				( sysclk_get_cpu_hz() )
#define configCPU_CLOCK_HZ				(SystemCoreClock )

Apart from getting multiple warnings as below

Warning	1	redundant redeclaration of 'SystemCoreClock' [-Wredundant-decls]	C:\localworkspace\SAM4N\SAM4N_SDR\src\config\FreeRTOSConfig.h	69	17	SAM4N_SDR

The code compiles OK. (I don’t get these warnings in the demo project, and as far as I can see the project compiler warning flags are identical.)

The main point is that when I execute the code it behaves completely differently … not sure how such a simple change could cause this. Anyway …

I am now getting a halt when a non-ISR call to wait on a queue is made from an ISR callback function. Just as one would expect.

I have also removed all the

portENTER_CRITICAL/portEXIT_CRITICAL

statements from my ISRs after reading that they should not be placed there, as it causes ISR nesting. (FYI … it's been OK from FreeRTOS 5 to 7.3.0.) Perhaps you could confirm this is correct.

Will report back after I chase down the issues I have outlined above.

rtel wrote on Monday, March 23, 2015:

Was SystemCoreClock ever being set to a sensible value?

Critical sections in interrupts should use

portSET_INTERRUPT_MASK_FROM_ISR()

and

portCLEAR_INTERRUPT_MASK_FROM_ISR()

Search for these macros in /FreeRTOS/source/queue.c to see how they are
used.

Regards.
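
The pattern those two macros follow is save, mask, work, restore. The sketch below is a host-side illustration only: the two macros are redefined here as stand-ins that operate on a simulated BASEPRI variable so the block can run off-target, and the 0xA0 mask value, the shared counter, and the callback name are all hypothetical. On target the real macros come from the port's portmacro.h.

```c
#include <stdint.h>

/* Stand-ins (assumptions) for the port layer so this runs on a host. */
static uint32_t sim_basepri = 0u;
#define SIM_MAX_SYSCALL_MASK 0xA0u /* hypothetical shifted priority value */

static uint32_t sim_set_interrupt_mask_from_isr(void)
{
    uint32_t previous = sim_basepri;    /* remember the mask on entry */
    sim_basepri = SIM_MAX_SYSCALL_MASK; /* block API-using interrupts */
    return previous;
}

static void sim_clear_interrupt_mask_from_isr(uint32_t previous)
{
    sim_basepri = previous;             /* restore, do not blindly unmask */
}

#define portSET_INTERRUPT_MASK_FROM_ISR()    sim_set_interrupt_mask_from_isr()
#define portCLEAR_INTERRUPT_MASK_FROM_ISR(x) sim_clear_interrupt_mask_from_isr(x)

static volatile uint32_t shared_counter = 0u; /* hypothetical shared data */

/* Shape of a critical section inside an ISR or ISR callback. */
static void example_isr_callback(void)
{
    uint32_t saved_mask = portSET_INTERRUPT_MASK_FROM_ISR();
    shared_counter++;                   /* protected access */
    portCLEAR_INTERRUPT_MASK_FROM_ISR(saved_mask);
}
```

The saved value matters because interrupts can nest: restoring the previous mask, rather than clearing it outright, leaves an outer critical section intact.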

tomwbarclay wrote on Monday, March 23, 2015:

Hi, I have no idea what the SystemCoreClock was being set to … I have gone past that stage.

I have however pulled out all the ‘illegal’ cases where non ISR calls were being made in ISR situations.

I have also completed the port from FreeRTOS 7.3.0 to 8.2.0. I still have a few annoying warnings about redeclaring the port.c methods
xPortPendSVHandler, xPortSysTickHandler, vPortSVCHandler
but it all seems to hang together.

Now back to the original issue … lost semaphore…

I have run the code patched with the Boolean wait to check it's all still functional, then replaced the patch with the original semaphore wait and send code. Now it seems to work OK. I have only run it for 10 minutes, but it looks like it's fixed. Before, it used to crash somewhere between 1 second and 1 minute.

The use of configASSERT in the low-level scheduler functions seems pretty powerful, especially when used with the call stack view. It pulled out all the funnies in a couple of hours of editing and testing.

Thanks for all your help.