UART crashes everything on PIC32

enrad wrote on Saturday, May 06, 2017:

The processor is a PIC32MX695F512L. The Dev-ID reads 54300053 (from Real-ICE), so it should be silicon rev 5.
I am using OPEN/FREE-RTOS

I have an ISR servicing UART1.
The UART is connected to a MAX3535, an isolated RS485 transceiver, in full-duplex mode (RS485 4-wire).
My ISR detects if there is no transmission on the bus; if a transmission is incorrect, the data is discarded and the buffers are flushed.
If either of these conditions persists for more than 600 s, I reset the UART.
Now this is where the problems start:
If the RS485 bus is disconnected and unterminated, I will obviously get a lot of random data into the UART.
My ISR will detect this as a fault and will reset the UART after 600 s.
When the reset code is executed, it seems that data memory is trashed, and the RTOS seems to crash, with only the communication task still running.

if( last_message_time_cnt > global_inst.AQL_com_absence_time ) { // time in ms
    //taskENTER_CRITICAL();
    //AQL_bus_RX_disable();
    //AQL_bus_TX_disable();
    last_message_time_cnt = 100000; // stop counting
    if(global_data.AutonomDrift_u16 == 0) {
        // Snapshot the bus state for later debugging
        AQLB_Debug_Tx_Pointer_u16 = AQLB_Tx_Pointer_u16;
        AQLB_Debug_Rx_Pointer_u16 = AQLB_Rx_Pointer_u16;
        AQLB_Debug_Tx_Count_u16 = AQLB_Tx_Count_u16;
        AQLB_Debug_Rx_Count_u16 = AQLB_Rx_Count_u16;
        AQLB_Debug_fel_task_u32 = AQLB_fel_task_u32;
        AQLB_Debug_fel_int_u32 = AQLB_fel_int_u32;
        AQLB_Debug_Rx_int_u16 = INTGetEnable(INT_U1RX);
        AQLB_Debug_Tx_int_u16 = INTGetEnable(INT_U1TX);
        for(i = 0; i < 20; i++) {
            AQLB_Debug_Tx_Buffer_arr_u8[i] = AQLB_Tx_Buffer_arr_u8[i];
        }
        for(i = 0; i < 24; i++) {
            AQLB_Debug_Rx_Buffer_arr_u8[i] = AQLB_Rx_Buffer_arr_u8[i];
        }
    }

    global_data.AutonomDrift_u16 = AUTONOM; // 1
    AQL_bus_Restart(); // KTL 20140728 Reset the UART if we get errors.
    //taskEXIT_CRITICAL();
}

In the above code snippet, no crashes occur if either of the following lines is enabled:
taskENTER_CRITICAL();
or
AQL_bus_TX_disable();

If only the line AQL_bus_RX_disable(); is enabled, it will crash.
Obviously the code is working now, since I currently disable both the transmitter and the receiver before restarting the UART.
But I would like to know why, and what is happening, since I cannot see any logical reason why it crashes.

void AQL_bus_TX_disable(void) {
    INTEnable(INT_U1TX, INT_DISABLED);
    U1STACLR = _U1STA_UTXEN_MASK;
    IFS0CLR = _IFS0_U1TXIF_MASK | _IFS0_U1EIF_MASK;
    PORTSetPinsDigitalOut(RS485_AQL_TX_ENABLE_PORT, RS485_AQL_TX_ENABLE_MASK);
    PORTClearBits(RS485_AQL_TX_ENABLE_PORT, RS485_AQL_TX_ENABLE_MASK);
}
//------------------------------------------------------------------------------------------------------------------------------------------

void AQL_bus_RX_disable(void) {
    INTEnable(INT_U1RX, INT_DISABLED);
    U1STACLR = _U1STA_URXEN_MASK;
    IFS0CLR = _IFS0_U1RXIF_MASK | _IFS0_U1EIF_MASK;
}

void AQL_bus_Restart(void)
{
    U1STACLR = _U1STA_URXEN_MASK;

    UARTConfigure(RS485_AQL_UART_PORT, UART_ENABLE_PINS_TX_RX_ONLY);

    UARTSetLineControl(RS485_AQL_UART_PORT, UART_DATA_SIZE_9_BITS | UART_PARITY_NONE | UART_STOP_BITS_1);
    UARTSetDataRate(RS485_AQL_UART_PORT, 40000000, AQL_bus_baud_arr[global_inst.AQL_BAUD_u16]);
    UARTSetFifoMode(RS485_AQL_UART_PORT, UART_INTERRUPT_ON_TX_NOT_FULL | UART_INTERRUPT_ON_RX_NOT_EMPTY);
    UARTEnable(RS485_AQL_UART_PORT, UART_ENABLE_FLAGS(UART_PERIPHERAL | UART_RX | UART_TX));

    SetPriorityIntU1(UART_INT_PR2);
    SetSubPriorityIntU1(UART_INT_SUB_PR1);

    AQL_bus_RX_enable();
    AQL_bus_TX_disable();

    AQLB_Tx_Count_u16 = 0;
    AQLB_Rx_Count_u16 = 0;

    AQLB_Rx_Pointer_u16 = 0;
    AQLB_Stat_u16 = AQLB_SMB_READY;
    AQLB_Rx_CRC_u16 = 0;
    AQLB_Do_Global_Save_u16 = 0;
}

rtel wrote on Saturday, May 06, 2017:

I am using OPEN/FREE-RTOS

If you have an OpenRTOS license please use WITTENSTEIN’s ticketed
support service.

enrad wrote on Saturday, May 06, 2017:

Currently FreeRTOS, since we are still in the prototype stage.

rtel wrote on Saturday, May 06, 2017:

I don’t know what is going on inside the functions, but can I summarise it as:

  1. Interrupt service routines access buffers.
  2. The buffers are also accessed when the UART is reset from a task.
  3. If the UART interrupts are disabled before the task accesses the
    buffers then everything is ok.

If so then I would guess that, in the case you are not disabling
interrupts, the ISR is attempting to access a buffer that is in an
inconsistent state due to the buffer being accessed simultaneously from
non-ISR code.

Or am I missing the point?
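
The three points above can be sketched as follows. This is a host-side simulation only, not the poster's code: `rx_irq_enabled`, `rx_index`, `rx_buffer` and `fake_rx_isr()` are hypothetical stand-ins for the real interrupt-enable bit, receive state and UART RX handler. It shows the ordering that seems to matter: lock the ISR out, change the shared state, then let the ISR back in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds on a host PC. On the target
 * these would be the real interrupt-enable bit and the receive state. */
static volatile bool     rx_irq_enabled = true;
static volatile uint16_t rx_index = 0;
static volatile uint8_t  rx_buffer[24];

/* Stand-in for the UART RX ISR: stores one byte if the interrupt is enabled. */
static void fake_rx_isr(uint8_t byte)
{
    if (!rx_irq_enabled) {
        return;                      /* masked: hardware would not vector here */
    }
    if (rx_index < sizeof rx_buffer) {
        rx_buffer[rx_index] = byte;
        rx_index++;
    }
}

/* Reset the receive state from task context with the ISR locked out. */
static void reset_rx_state(void)
{
    rx_irq_enabled = false;          /* 1. stop the ISR touching the state    */
    rx_index = 0;                    /* 2. now safe to modify the shared data */
    fake_rx_isr(0xAA);               /*    simulated noise byte arrives here  */
    assert(rx_index == 0);           /*    ...and is ignored while masked     */
    rx_irq_enabled = true;           /* 3. re-enable once state is consistent */
}
```

On the target, step 1 would presumably be the `INTEnable(INT_U1RX, INT_DISABLED)` / `INTEnable(INT_U1TX, INT_DISABLED)` calls already shown in the thread, or a `taskENTER_CRITICAL()` / `taskEXIT_CRITICAL()` pair around the reset.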

enrad wrote on Saturday, May 06, 2017:

In all honesty, I haven’t got a clue what is actually happening. The only thing I see is that everything stops working, and the com-task is the only thing running.
The task itself looks like this; this is the code that actually crashes the RTOS.
When it is modified as above, no crashes occur (as far as I can see).
When the crash occurs, the ISR is not running, and neither is anything else except the task below.

void ISRhandleTask(void *pvParameters){
//	int k, i, n, s, num;
	uint16 flushRX, time_100_u16, i;
	uint32 aql_old_time_u32, aql_seconds_diff_u32;
	
	aql_old_time_u32 = 0;
	time_100_u16 = 0;
	
	while(1) {
//		Print_adio("ISRHANDLE");
		if(AQLB_Do_Global_Save_u16) {
			AQLB_Do_Global_Save_u16 = 0;
	//		Global_Save();		
		}
	
		if(AQLB_Stat_u16 == AQLB_SMB_SEND) {
			// send a modbus command				
			AQLB_Stat_u16 = AQLB_SMB_SENDING;	
		}
		
		// if modbus command is returned
		//	if(......( {
		// 	set AQLB_Stat_u16 = AQLB_SMB_READY;		
		//	}

		// if the AQL slave has received a message
		// ktl autonomous mode 20140321
		if (!(global_data.AutonomDrift_u16 == AUTONOM_MAN)) {
			if(message_flag == 1) {
				last_message_time_cnt = 0;
				message_flag = 0;
				global_data.AutonomDrift_u16 = NOT_AUTONOM;	
				global_data.AQL_aktiv_u16 = 1; // contacted by master
			}	
		}
		if(time_100_u16 > 200 ) {
			time_100_u16 = 0;
			
			aql_seconds_diff_u32 = RTC_Get_seconds_diff(&aql_old_time_u32);
			
			last_message_time_cnt += aql_seconds_diff_u32;

			// if AQL communication has been absent for a certain time (about 10 min)
			// if( last_message_time_cnt > ( 10 )) {	
			if( last_message_time_cnt > global_inst.AQL_com_absence_time ) { // time in ms
				last_message_time_cnt = 100000; // stop counting
				if(global_data.AutonomDrift_u16 == 0) {
					AQLB_Debug_Tx_Pointer_u16 	= AQLB_Tx_Pointer_u16;
					AQLB_Debug_Rx_Pointer_u16 	= AQLB_Rx_Pointer_u16;
					AQLB_Debug_Tx_Count_u16 	= AQLB_Tx_Count_u16;
					AQLB_Debug_Rx_Count_u16 	= AQLB_Rx_Count_u16;
					AQLB_Debug_fel_task_u32 	= AQLB_fel_task_u32;
					AQLB_Debug_fel_int_u32 		= AQLB_fel_int_u32;
					AQLB_Debug_Rx_int_u16 		= INTGetEnable(INT_U1RX);
					AQLB_Debug_Tx_int_u16 		= INTGetEnable(INT_U1TX);
					for(i = 0; i < 20; i++) {
						AQLB_Debug_Tx_Buffer_arr_u8[i] = AQLB_Tx_Buffer_arr_u8[i];
					}	
					for(i = 0; i < 24; i++) {
						AQLB_Debug_Rx_Buffer_arr_u8[i] = AQLB_Rx_Buffer_arr_u8[i];
					}	
				}	
				
				global_data.AutonomDrift_u16 = AUTONOM;  // 1
				AQL_bus_Restart();		// KTL 20140728 Reset the UART if we get errors.
			}				
		}	
		

		if ( U1STA & AQLB_COM_ERRORS ) {     
			if (U1STAbits.FERR)	{
					global_data.AQL_Bus_stat.FRERR_u32++;
			}
			if (U1STAbits.OERR)	{
					global_data.AQL_Bus_stat.OWERR_u32++;
			}			
			if (U1STAbits.PERR)	{
					global_data.AQL_Bus_stat.PAERR_u32++;
			}
			//	Print232("OERR 241");
			//	Print232var3("AQL_Bus slave error: ",global_data.AQL_Bus_stat.FRERR_u32 
			//		,global_data.AQL_Bus_stat.OWERR_u32 ,global_data.AQL_Bus_stat.PAERR_u32 );
			
			// Drain the receive FIFO with eight dummy reads
			for(i = 0; i < 8; i++) {
				flushRX = U1RXREG;   // Flush register
			}
			// Print_adio_var("AQLB_Rx_Pointer_u16 ",AQLB_Rx_Pointer_u16);
			AQLB_Rx_Pointer_u16 = 0;
			AQLB_fel_task_u32++;

			U1STACLR = _U1STA_OERR_MASK | _U1STA_FERR_MASK | _U1STA_PERR_MASK;
		}

		delay_time = 5; // 2ms
		time_100_u16 +=delay_time;				
		vTaskDelay(delay_time);
	}	
}
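
A side observation on the timeout branch in the task above, offered as a sketch rather than a diagnosis: setting `last_message_time_cnt = 100000` does not appear to stop anything if `global_inst.AQL_com_absence_time` is below 100000, because the comparison is still true on the next pass and `AQL_bus_Restart()` would be called again roughly every 200 ticks for as long as the bus stays silent. A one-shot latch avoids that. The type and function names below (`bus_watchdog_t`, `watchdog_tick()`, `watchdog_message_seen()`) are hypothetical and not from the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-shot bus-silence watchdog. */
typedef struct {
    uint32_t silent_time;    /* time since the last valid message            */
    uint32_t limit;          /* absence threshold, same unit as silent_time  */
    bool     restarted;      /* true once a restart was requested            */
} bus_watchdog_t;

/* Call when a valid message arrives: re-arm the watchdog. */
static void watchdog_message_seen(bus_watchdog_t *w)
{
    assert(w != NULL);
    w->silent_time = 0;
    w->restarted   = false;
}

/* Call periodically with the elapsed time. Returns true exactly once per
 * silent period, at the moment the UART restart should be performed. */
static bool watchdog_tick(bus_watchdog_t *w, uint32_t elapsed)
{
    assert(w != NULL);
    if (w->restarted) {
        return false;        /* already handled: really stop counting */
    }
    w->silent_time += elapsed;
    if (w->silent_time > w->limit) {
        w->restarted = true;
        return true;
    }
    return false;
}
```

In the task, the restart call would then sit behind `if (watchdog_tick(&wd, aql_seconds_diff_u32))`, and `watchdog_message_seen(&wd)` would replace the `last_message_time_cnt = 0` assignment.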

rtel wrote on Sunday, May 07, 2017:

I’m not sure it is unexpected that resetting something without first
disabling its interrupts would cause a problem. Really you want to stop
the peripheral operating before resetting it, especially if you are
accessing the same data from the interrupt handler and a task.

Can you explain what you mean by that being the only task that is running?
Is it the highest priority task in the system? In which case if it is
stuck in a loop it will prevent lower priority tasks from running.

How are you determining that this is the only task that is still running?

Do you have configASSERT() defined to something that will sit in a loop?

If that really is the only task that is running, and you pause the
debugger, what is the task doing?

enrad wrote on Sunday, May 07, 2017:

All tasks have the same priority.
The other tasks do visual things, like updating the display and blinking some LEDs, amongst other things.
The ISR is also supposed to blink some LEDs.
Since none of that is happening, I would assume that no other tasks are running.
When I pause the debugger and single-step/step over, the only thing that happens is that the code switches between the com-task and vTaskDelay.
The only configASSERT is the default.
For debugging I have temporarily enabled “vApplicationStackOverflowHook()”, but it never gets there.

Yes, I know that one should stop the peripheral before changing it, obviously a miss on my side; however, I would not expect it to have such dramatic effects.
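
For reference, the two diagnostics discussed here are both switched on in FreeRTOSConfig.h. The fragment below is a sketch of a typical setup, not the poster's actual configuration. `vApplicationStackOverflowHook()` is only called by the kernel when `configCHECK_FOR_STACK_OVERFLOW` is set to 1 or 2, so a hook that "never gets there" is only meaningful once that setting has been confirmed.

```c
/* FreeRTOSConfig.h (fragment) */

/* Halt with interrupts disabled when a kernel assertion fails, so that
 * pausing the debugger lands on the failing check. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* 0 = no checking, 1 = check the stack pointer at each context switch,
 * 2 = additionally check a known fill pattern at the end of the stack. */
#define configCHECK_FOR_STACK_OVERFLOW 2
```

Neither check is guaranteed to catch every overflow or every memory corruption, so a clean result narrows the search rather than ruling these causes out.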

rtel wrote on Sunday, May 07, 2017:

If vTaskDelay() is blocking for the correct time it would seem the OS is
still running. Is it possible the interrupt is continuously being
re-entered, so most processing time is taken up by repeated execution of
the handler?
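
One way to test this hypothesis directly, rather than inferring it from LEDs, is to count handler entries and look at the rate from a task or in the debugger. The sketch below is hypothetical instrumentation that builds on a host PC; `isr_entry_hook()` would be called as the first line of the real UART handler, and the names and the threshold are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter, incremented on every entry to the UART handler. */
static volatile uint32_t isr_entry_count = 0;

static void isr_entry_hook(void)
{
    isr_entry_count++;
}

/* State kept by the task that monitors the interrupt rate. */
typedef struct {
    uint32_t last_count;     /* counter value seen at the previous check */
} isr_rate_monitor_t;

/* Call once per monitoring period. Returns true if the handler was entered
 * more than max_per_period times since the previous call. */
static bool isr_storm_detected(isr_rate_monitor_t *m, uint32_t max_per_period)
{
    assert(m != NULL);
    uint32_t now   = isr_entry_count;
    uint32_t delta = now - m->last_count;   /* unsigned math survives wrap-around */
    m->last_count  = now;
    return delta > max_per_period;
}
```

If the counter stays flat while the system appears hung, the handler really is not running; if it climbs far faster than the baud rate allows, the interrupt is being re-entered.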

enrad wrote on Sunday, May 07, 2017:

Well, I thought so too, but the handler is not running (no LEDs blinking).
Yes, I think that the OS is still running, but the other tasks are destroyed, for some reason.