Yield from ISR and Tick Interrupt Collision?

amurdoch wrote on Monday, January 30, 2012:

I’m using FreeRTOS V7.1.0 on an MSP430F5528.

My problem is that my application will run perfectly for a while and then mysteriously jump to random memory or reset altogether.  I’ve checked thoroughly for buffer overrun and stack overflow and haven’t seen any issues.  Although I admit I only mostly understand how the FreeRTOS stack usage works.  I also rewrote my entire program, stripping away FreeRTOS and it worked perfectly without it, so I’m convinced that my bug is FreeRTOS related.

I’ve noticed that when I comment out “portYIELD_FROM_ISR( xHigherPriorityTaskWoken );” at the end of my interrupt my application no longer crashes.  It simply context switches the next time the tick interrupt fires, but I’d rather not have to run the tick interrupt that often.

I’ve also noticed that when I leave “portYIELD_FROM_ISR( xHigherPriorityTaskWoken );” uncommented but change my tick rate to 1Hz my application no longer crashes.  But running the tick interrupt at 1Hz is absurd, my tasks would never take turns.

I’m running a custom application that periodically samples from the ADC and processes that data. Here are the general details:
I multiple interrupts running:
    the tick interrupt on the TIMER_B0_VECTOR
my sample initiation interrupt on the TIMER_B1_VECTOR
and a sample complete interrupt on the ADC12_VECTOR - this stores data in a buffer and sends a buffer pointer to the main sampling task using a FreeRTOS Queue
I have two tasks running:
the main sample processing task, wakes when a data buffer (sent by the ADC interrupt) is received
a second task that writes to external memory, uses a FreeRTOS semaphore for syncronization
I have two MCLK rates:
1MHz sourced from XT1 - connected to an external 32KHz crystal - multiplied using the FLL
20MHz sourced from XT2
I switch to 20MHz every time the main sample processing task begins processing.  I switch back to 1MHz in the IDLE task.
I go to low power mode 3 in the IDLE task

Any thoughts?  What are some easy ways to hung up when lots of interrupts are firing?  When using queues within ISRs? When going to low power mode?  I’m quite stumped at the moment.

Here is my code with the actual application specific code removed:

/// SD Write Task
portTASK_FUNCTION( vSDWriteTask, pvParameters ) {
static UINT8* blocks[BQ_LEN];
	/* The parameters are not used. */
	( void ) pvParameters;
	vPortFree(pvParameters);
	
	for( ;; ) {
		// wait to write
		xSemaphoreTake(mainSDWriteSemphr, portMAX_DELAY);
		GREEN_ON();
		// turn on high speed clock
		vHal_TurnOnXT2();
		
		UINT8 full_blocks;
		// read full blocks from sdfifo
		if (xSemaphoreTake(mainDataFifoMutex, 1000) == pdTRUE) {
			full_blocks = sbroDataFifo_readFullBlocks(&sdfifo, blocks);
			xSemaphoreGive(mainDataFifoMutex);
		} else {
			xTaskHandle current_task = xTaskGetCurrentTaskHandle();
			vApplicationError(GENERIC_ERROR, "SDFifo Inaccessable!\n");
		}
		
		// write to sd
		...
		...
		
		TotalBlocksWritten += full_blocks;
		// free blocks in sdfifo
		if (xSemaphoreTake(mainDataFifoMutex, 1000) == pdTRUE) {
			sbroDataFifo_freeBlocks(&sdfifo, full_blocks);
			xSemaphoreGive(mainDataFifoMutex);
		} else {
			xTaskHandle current_task = xTaskGetCurrentTaskHandle();
			vApplicationError(GENERIC_ERROR, "SDFifo Inaccessable!\n");
		}
	}
}
/// Sample Processing Task
portTASK_FUNCTION(vSampleProcessingTask, pvParameters)
{
	/* The parameters are not used. */
	( void ) pvParameters;
	vPortFree(pvParameters);
	// ------ start sampling mainState machine ------
	TB0CCR1 = TB0R + LOSAMPLECOUNTS;
	TB0CCTL1 = CCIE;
	
	// ------ setup adc ------
	vHal_SetupADC12ForSequentialSampling();
	for( ;; ) {
		event = E_NO_EVENT;
		if(xQueueReceive(EventQueue, &event, 1000) == pdTRUE) {
			// turn on high freq clock
			vHal_TurnOnXT2();
			
			switch(event) {
				case ...:
					...
					break;
			}
			
			// check for sd write
			if (xSemaphoreTake(mainDataFifoMutex, 1000) == pdTRUE) {
				// check from sd write
				if (sdfifo.full_blocks >= BLOCKS_TO_WRITE) {
					xSemaphoreGive(mainSDWriteSemphr);
				}
				xSemaphoreGive(mainDataFifoMutex);
			} else {
				vApplicationError(GENERIC_ERROR, "data fifo timeout!\n");
			}
		} else {
			vApplicationError(GENERIC_ERROR, "event queue timeout!\n");
		}			
	}	
}
void main() {
	// app specific stuff
	...
	// ------ FreeRTOS specific ------
	
	// create queues
	EventQueue = xQueueCreate(10, 1);
	
	// create mutex semaphore for sdfifo
	mainDataFifoMutex = xSemaphoreCreateMutex();
	// create semaphore for triggering the sd write task
	vSemaphoreCreateBinary(mainSDWriteSemphr);
	// take semaphore so sd write task blocks immediately (because there isn't any data to write);
	xSemaphoreTake(mainSDWriteSemphr, 1);
	
	// create tasks
	xTaskCreate( vSampleProcessingTask, ( signed char * ) "SamplePro", configMINIMAL_STACK_SIZE*3, NULL, mainSAMPLE_PROC_PRIORITY, &mainSampProcTask);
	xTaskCreate( vSDWriteTask, ( signed char * ) "SDWrite", configMINIMAL_STACK_SIZE*2, NULL, mainSD_WRITE_PRIORITY, &mainSDWriteTask);
	
	/* Start the scheduler. */
	vTaskStartScheduler();
}
void vApplicationTickHook( void )
{
const unsigned short usACLK_Frequency_Hz = 32768;
	TB0CCR0 += usACLK_Frequency_Hz / configTICK_RATE_HZ;
	
}
const unsigned short usACLK_Frequency_Hz = 32768;
	/* Ensure the timer is stopped. */
	TB0CTL = 0;
	/* Run the timer from the ACLK. */
	TB0CTL = TBSSEL_1;
	/* Clear everything to start with. */
	TB0CTL |= TBCLR;
	/* Set the compare match value according to the tick rate we want. */
	TB0CCR0 = usACLK_Frequency_Hz / configTICK_RATE_HZ;
	/* Enable the interrupts. */
	TB0CCTL0 = CCIE;
	/* Start up clean. */
	TB0CTL |= TBCLR;
	/* Continuous mode. */
	TB0CTL |= MC_2;
}
void vApplicationIdleHook( void )
{
	/* Called on each iteration of the idle task.  In this case the idle task
	just enters a low power mode. */
	TURN_OFF_XT2();
	__bis_SR_register( LPM3_bits + GIE );
	__no_operation();
}
/// ADC12 interrupt routine, process data once the A/D has sampled all 8 channels
#pragma vector = ADC12_VECTOR
static __interrupt void ADC12ISR (void)
{
	// RED_ON();
	
	// Clear interupts and disable ADC.
	ADC12IFG = 0;			// clear ADC12 interrupt
	ADC12CTL0 = 0;			//stop conversion
	ADC12CTL0 = 0;
	
	// get data
	// ...
	
	portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;
	if (option_1) {
		// save data to buffer
		...
		...
		
		// send for processing every 100 times
		if (finished) {
			// some processing
			...
			event = 0
			xQueueSendFromISR(EventQueue, &event, &xHigherPriorityTaskWoken);
			if (done) {
				// decision
				if (decision) {
					event = 3;
					xQueueSendFromISR(EventQueue, &event, &xHigherPriorityTaskWoken);
				}
			}
			__bic_SR_register_on_exit(LPM3_bits);
		}
	} else {
		// save data to buffer
		...
		// send for processing every 50 times
		if (finished) {
			// some processing
			...
		
			UINT8 event = 1;
			xQueueSendFromISR(EventQueue, &event, &xHigherPriorityTaskWoken);
			if (done) {
				xQueueSendFromISR(EventQueue, &event, &xHigherPriorityTaskWoken);
				if (change rate) {
					event = 0;
					xQueueSendFromISR(EventQueue, &event, &xHigherPriorityTaskWoken);
				}
			}
			// wake up main from sleep
			__bic_SR_register_on_exit(LPM3_bits);
		}
	}
	// THIS MUST BE THE LAST THING DONE IN THE ISR.
	portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}
#pragma vector=TIMER0_B1_VECTOR
static __interrupt void Timer_B1(void) 
{	
	switch(__even_in_range(TB0IV, 16)) {
		case 0: break;  // No Interrupt
		case 2: 
			// TB0CCR1 - sampling
			// initiate sample
			...
			break;
		case 4:  // TB0CCR2 
		  	break; 
		case 6:
			// TB0CCR3
			break;
		case 8: // TB0CCR4 
			break;
		case 10: // TB0CCR5
			break;
		case 12: // TB0CCR6 
			break;
		case 14: // TB0CCR7
			break;
		case 16:  break;  // TB0CTL TBIFG, timer overflow
	}
}

rtel wrote on Tuesday, January 31, 2012:

There is not anything immediately that jumps out at me in your code.  What is your tick hook function doing?  It might not be relevant to your problem but it looks like it is doing a divide on each tick interrupt, which might not be a good idea - but with understanding what it is doing I’m not sure.

The context switch on the MSP430 is quite simple, so it is interesting that everything runs ok when you don’t call portYIELD_FROM_ISR().  That macro resolves to a function call, which will increase the stack usage.  I know you have mentioned you have checked the stack, but I would still think that was a good candidate for the cause.  Do you have the stack checking switch on? 

Regards.

amurdoch wrote on Tuesday, January 31, 2012:

My tick interrupt hook is actually setting the timer for the next tick interrupt.  I’m doing it this way so that  I can use Timer B in continuous mode (counts all the way up to 2**16 wraps around and keeps going).  Since that divide was a const / a macro I expected the compiler to compute it at compile time.  I was wrong it was in fact dividing it every time.  I fixed that but I’m having the same problem.

Yes I do have stack checking switched on.  But I’ll look closer at all my stacks.

richard_damon wrote on Tuesday, January 31, 2012:

My first thought of this sort of crashes is that they often relate to an interrupt priority or nesting issue. I don’t know if the MSP ports are subject to that. What can happen is you very occasionally get in interrupt that occurs during a critical section maniplating the FreeRTOS structures and the interrupt also manipulates those same structure leading to corruption and crashing.

One other thing that I see that may or may not be a problem, is that vSDTaskWrite ups the CPU speed and then possibly waits on a semaphore. If the idle task might run during that block, then it will drop the CPU speed, and that might cause problems. It may be that you know that this take will not block, or if it blocks some other task will be running and not blocked, but it seems to be a bit fragile here. Personally, If I needed to do this type of clock changes, I would have a function called when a task needed the higher speed, and another function called when it no longer needed it. These functions would have a critical section, and inside that increase or decrease a count of high speed requests, and if needed the appropriate speed shift. For safety, the idle task might have a small critical section that check if the CPU speed matches the request counter, fixing and logging the problem.

amurdoch wrote on Tuesday, January 31, 2012:

Ok, I am definitely not having a stack overflow problem.  I checked all four of my stacks, looked at the 0xA5 bytes:
IDLE stack usage: 70 / 440 bytes
SamplePro stack usage: 156 / 880 bytes
SDWrite stack usage: 92 / 440 bytes
main stack: 100ish / 256 bytes (is it true that the application never returns here unless the scheduler is ended)

Here is my FreeRTOSConfig.h, maybe it will shed some light on things:

#define configUSE_PREEMPTION			1
#define configUSE_IDLE_HOOK				1
#define configUSE_TICK_HOOK				1
#define configCPU_CLOCK_HZ				( 20000000UL )	
#define configTICK_RATE_HZ				( ( portTickType ) 100 )
#define configMAX_PRIORITIES			( ( unsigned portBASE_TYPE ) 5 )
#define configTOTAL_HEAP_SIZE			( ( size_t ) ( 4 * 512 ) )
#define configMAX_TASK_NAME_LEN			( 10 )
#define configUSE_TRACE_FACILITY		0
#define configUSE_16_BIT_TICKS			0
#define configIDLE_SHOULD_YIELD			1
#define configUSE_MUTEXES				1
#define configQUEUE_REGISTRY_SIZE		5
#define configGENERATE_RUN_TIME_STATS	0
#define configCHECK_FOR_STACK_OVERFLOW	2
#define configUSE_RECURSIVE_MUTEXES		0
#define configUSE_MALLOC_FAILED_HOOK	1
#define configUSE_APPLICATION_TASK_TAG	0
#if __DATA_MODEL__ == __DATA_MODEL_SMALL__
	#define configMINIMAL_STACK_SIZE		( ( unsigned short ) 220 )
#else
	#define configMINIMAL_STACK_SIZE		( ( unsigned short ) 80 )
#endif
/* Co-routine definitions. */
#define configUSE_CO_ROUTINES 		0
#define configMAX_CO_ROUTINE_PRIORITIES ( 2 )
/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */
#define INCLUDE_vTaskPrioritySet		1
#define INCLUDE_uxTaskPriorityGet		1
#define INCLUDE_vTaskDelete				0
#define INCLUDE_vTaskCleanUpResources	0
#define INCLUDE_vTaskSuspend			1
#define INCLUDE_vTaskDelayUntil			1
#define INCLUDE_vTaskDelay				1
#define INCLUDE_pcTaskGetTaskName		1
#define INCLUDE_xTaskGetCurrentTaskHandle	1
/* The MSP430X port uses a callback function to configure its tick interrupt.
This allows the application to choose the tick interrupt source.
configTICK_VECTOR must also be set in FreeRTOSConfig.h to the correct
interrupt vector for the chosen tick interrupt source.  This implementation of
vApplicationSetupTimerInterrupt() generates the tick from timer B0, so in this
case configTICK__VECTOR is set to TIMERB0_VECTOR. */
#define configTICK_VECTOR				TIMER0_B0_VECTOR
/* Prevent the following definitions being included when FreeRTOSConfig.h
is included from an asm file. */
#ifdef __ICC430__
	extern void vConfigureTimerForRunTimeStats( void );
	extern volatile unsigned long ulStatsOverflowCount;
#endif /* __ICCARM__ */
/* Configure a 16 bit timer to generate the time base for the run time stats.
The timer is configured to interrupt each time it overflows so a count of
overflows can be kept - that way a 32 bit time value can be constructed from
the timers current count value and the number of overflows. */
#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() vConfigureTimerForRunTimeStats()
	
/* Construct a 32 bit time value for use as the run time stats time base.  This
comes from the current value of a 16 bit timer combined with the number of times
the timer has overflowed. */
#define portALT_GET_RUN_TIME_COUNTER_VALUE( ulCountValue )						\
	{																			\
		/* Stop the counter counting temporarily. */							\
		TA1CTL &= ~MC__CONTINOUS;												\
																				\
		/* Check to see if any counter overflow interrupts are pending. */		\
		if( ( TA1CTL & TAIFG ) != 0 )											\
		{																		\
			/* An overflow has occurred but not yet been processed. */			\
			ulStatsOverflowCount++;												\
																				\
			/* Clear the interrupt. */											\
			TA1CTL &= ~TAIFG;													\
		}																		\
																				\
		/* Generate a 32 bit counter value by combinging the current peripheral	\
		counter value with the number of overflows. */							\
		ulCountValue = ( ulStatsOverflowCount << 16UL );						\
		ulCountValue |= ( unsigned long ) TA1R;									\
		TA1CTL |= MC__CONTINOUS;												\
	}
#endif /* FREERTOS_CONFIG_H */

amurdoch wrote on Tuesday, January 31, 2012:

richard_damon
My first thought of this sort of crashes is that they often relate to an interrupt priority or nesting issue.

This kind of thing seems plausible.

Here is what I’m seeing:
    My app is consistently getting lost in an assembly function called _CmpGe64fCc
_CmpGe64fCc is geographically located directly after prvUnlockQueue in memory
the call stack says its currently in prvUnlockQueue called from xQueueReceive in my Sample Processing Task
I traced some events by recording time at various points and it looks like
tick interrupt happens
app is in idle task
adc interrupts and the adc isr calls portYIELD
the sample processing task immediately receives on its queue, loops, and calls xQueueReceive again (where it gets stuck)
a tick interrupt is about to happend again but can’t because interrupts are disabled and we’re stuck in _CmpGe64fCc
not only are we stuck with interrupts disabled but the MCU is in low power mode

Its like the stack gets messed up right before it tries to context switch.

If I add an explicit portYIELD() to my idle task so that it yields every time it wakes up the problem seems to go away, mysterious, this shouldn’t be the real answer though

amurdoch wrote on Tuesday, January 31, 2012:

Scratch that an explicit portYIELD() doesn’t change anything.  It just took a little longer for things to hang up.

amurdoch wrote on Tuesday, January 31, 2012:

Ok, I may have found a FreeRTOS MSP430X port bug.  In portmacro.h there is a macro portENTER_CRITICAL.  This macro calls __disable_interrupt.  __disable_interrupt is simply a compiler macro for the dint assembly instruction.  The TI MSP430x5xx user manual says this:

“If any code sequence needs to be protected from interruption, DINT should be executed at
least one instruction before the beginning of the uninterruptible sequence, or it should be
followed by a NOP instruction.”

The portENTER_CRITICAL macro does not have a nop after the dint.  This is funny because portext.s43 does use a nop after dint but in portmacro.h it seems forgotten. 

I’m not sure if this will fix my problem but I think it is a bug nonetheless.  It may indeed have fixed my problem, we’ll see.

rtel wrote on Tuesday, January 31, 2012:

Note that portDISABLE_INTERRUPTS() is not intended to be a public API macro.  taskENTER_CRITICAL() should be used (although sometimes it gets through that portENTER_CRITICAL() is used, but the two are the same).

I’m not sure that the NOP would have an effect.  As you say, in the asm code it is there, but the portDISABLE_INTERRUTPS() macro is used only when starting the scheduler, and when entering a critical section (I think ?).

In the critical sectino case, the next thing in the code that is an increment on the nesting count variable - and that then is the only variable that could possible be effected.  However, I’m not sure it would be possible for two tasks to access that variable at the same time as the task entering the critical section would not even be running if another task was already in a critical section.  The critical (excuse the pun) case would be where a task called DINT, then started to access the nesting variable in the next clock slot while simultaneously an interrupt was taken.  If the interrupt was a tick interrupt, it would be possible that another task started to run before the task originally entering the critical section finished reading the nesting count variable.  I would have to think about that!

Regards.

amurdoch wrote on Thursday, February 02, 2012:

Well, after a couple of days I’m pretty sure that the missing nop was in fact my bug.  I haven’t seen any issues since I added it in.  I did notice while I was having problems that a tick interrupt was scheduled at the exact instant that my program got lost.  I set a breakpoint at the mystery memory location within the _CmpGe64fCc function so that it would break almost immediately as it crashed, and a tick interrupt was schedule at that same moment. 

I haven’t gone through the tick interrupt to see exactly why it might be crashing, but I’m pretty sure that the missing nop is to blame for my problem.

rtel wrote on Thursday, February 02, 2012:

I’m pleased to hear you have your system working satisfactory now.  I’m not sure why a nop in that particular position would make a difference, having not thought through the scenarios since my last post, but I have added them in anyway for good measure.  They will be in the next release.

Regards.