Odd problem with queues

I have a queue which is behaving oddly.

it is a standard dynamic queue, 32 byte entities.

when limited to 20 entities, and fed fed 22, the system deadends at the following in lists.c

		/* *** NOTE *********************************************************** 		If you find your application is crashing here then likely causes are 		listed below.  In addition see https://www.freertos.org/FAQHelp.html for 		more tips, and ensure configASSERT() is defined! 		https://www.freertos.org/a00110.html#configASSERT  			1) Stack overflow - 			   see https://www.freertos.org/Stacks-and-stack-overflow-checking.html 			2) Incorrect interrupt priority assignment, especially on Cortex-M 			   parts where numerically high priority values denote low actual 			   interrupt priorities, which can seem counter intuitive.  See 			   https://www.freertos.org/RTOS-Cortex-M3-M4.html and the definition 			   of configMAX_SYSCALL_INTERRUPT_PRIORITY on 			   https://www.freertos.org/a00110.html 			3) Calling an API function from within a critical section or when 			   the scheduler is suspended, or calling an API function that does 			   not end in "FromISR" from an interrupt. 			4) Using a queue or semaphore before it has been initialised or 			   before the scheduler has been started (are interrupts firing 			   before vTaskStartScheduler() has been called?). 		**********************************************************************/  		for( pxIterator = ( ListItem_t * ) &( pxList->xListEnd ); pxIterator->pxNext->xItemValue <= xValueOfInsertion; pxIterator = pxIterator->pxNext ) /*lint !e826 !e740 !e9087 The mini list structure is used as the list end to save RAM.  This is checked and valid. *//*lint !e440 The iterator moves to a different value, not xValueOfInsertion. */ 		{ 			/* There is nothing to do here, just iterating to the wanted 			insertion position. */ 		} 	} 

Now just looking at it, there are other tasks set up and are running.

What puzzles me, is if I write 22 items to the queue, Tracalyzer shows that 2 writes are blocked, and the monitor says “two tasks are blocked writing to the queue” and I deadend at the above code. The iterator says that there is only one item in the list.

I’d understand that, but ask how…. (I see where the code deadends if there’s only one item). But if I expand the queue capacity to more than the 22 items I write, the whole thing works just fine without deadending that queue. And in the blocked case, it never tries to read the queue nor starts the task that reads it. Program never gets that far.

Any ideas? The program is complex enough to counterindicate big examples. The version is 10.3.1 running on an ST L562 processor.

Thanks

as the comment says, probably something else corrupting memory near your queue. Very typical.

Assuming that you have already defined configASSERT and stack overflow checking, the next step would be to start commenting out parts of your application to figure out the problematic part. Also, check all ISRs to ensure that you are not calling any non-FromISR API in an ISR.

ConfigASSERT is defined as far as I understand it. StackOverflow checking is enabled, and has not been triggered.

I’ve tried both static and dynamic allocation, and with both user heap allocation and normal heap allocation.

Trying to write more than the queue’s size results in the queue’s software crashing, and not blocking as it should.

An interesting point is that when the item size for the queue is less than the item trying to be written, there’s no error. I suspect that the program only writes X amount of data regardless of what the source size happens to be.

I did check again, and for whatever reason, the the program doesn’t handle situations where the writing ought to be blocking. The tasks are blocked, but the next iterator section of code just hangs. Increasing the queue size so that this behavior is not triggered works fine.

curious……….

Can you share your queue creation, send and receive code?

If write size is variable, may be you need different type of serialization, a stream buffer instead of queue, which element size is strictly fixed.

I have the compile time options of both static and dynamic allocation. Static is in a dedicated section of expansion memory, dynamic is (currently, due to compile time overruns) also in a separate part of memory (user heap defined as 1).

Definition of sizes

// *********************************************************************************************************
// *************************** PCA9634 QUEUE PARAMETERS ****************************************************
// *********************************************************************************************************

#define _DEFAULT_PCA9634_QUEUE_LENGTH 		30			// PCA9634 QUEUE
#define _DEFAULT_PCA9634_ITEM_SIZE			sizeof (struct I2C_PACKET_data)			// STANDARD PACKET SIZE
#define _DEFAULT_PCA9634_STACK_SIZE			14000		// task stack size 7000

This is the definition of the static buffers.

struct PCA9634_static
{
//	// BUSY
//	StaticSemaphore_t 				SemaphoreBuffer;

	// QUEUE TO TASK
	uint8_t							QueueStorageArea[(_DEFAULT_PCA9634_QUEUE_LENGTH) * sizeof (union PACKET_DATA)];
	StaticQueue_t	  				QueueBuffer;

	// TASK STACK
	StackType_t						stack[_DEFAULT_PCA9634_STACK_SIZE];
//	StackType_t*					stackptr = (uint8_t*)&stack[0];
    StaticTask_t 					TaskBuffer;
};



struct STATIC_MAP
{
	#ifdef _USART
		struct USART_static					USART_STATIC[_MAX_USART];
	#endif

	#ifdef _I2C
		struct I2C_static					I2C_STATIC[_MAX_I2C];
	#endif

	#ifdef _XHEAP
		struct XHEAP_static					XHEAP_STATIC;
	#endif

	#ifdef _PCA9634
		struct PCA9634_static				PCA9634_STATIC[_MAX_PCA9634];
	#endif

	#ifdef _SYSTEM_LED
		struct LED_static					LED_STATIC[_SYSTEM_LED];
	#endif

};

This change let me have static allocation mixed with dynamic in terms of code generation, normally the commented lines are in.


//#ifdef _FREERTOS_STATIC
		struct STATIC_MAP __attribute__ ((section (".freertos_static_data"))) static_map;
//#endif


This is the create routine. It’s been patched to force the queue to be static, and the task to be dynamic. That’s for debugging, but really, didn’t help.


	bool PCA9634::create (uint8_t which, int Interface, int Address)
	{
		chip_address = Address;
		chip_interface = Interface;

		if (which < 0) return false;
		if (which >= _MAX_PCA9634) return false;

		// do not try to initialize hardware on another board
		if ((chip_address == 0) || (chip_interface == 0)) return true;
//		#ifdef _FREERTOS_STATIC
			PCA9634_queue = xQueueCreateStatic(
				_DEFAULT_PCA9634_QUEUE_LENGTH,												// queue length
				_DEFAULT_PCA9634_ITEM_SIZE,											// item size
				static_map.PCA9634_STATIC[which].QueueStorageArea,		// pointer to queue storage byte array
				&static_map.PCA9634_STATIC[which].QueueBuffer);						// pointer to system queue variable
//			#else
//				PCA9634_queue = xQueueCreate(_DEFAULT_PCA9634_QUEUE_LENGTH,sizeof(union SYSTEM_PACKET_type));
//			#endif


		if (PCA9634_queue == nullptr) ERROR1(__FILE__, __LINE__, (char*)"PCA9634 QUEUE CREATE ERROR");

		vQueueAddToRegistry(PCA9634_queue, "PCA9634_QUEUE");

		#ifdef _FREERTOS_STATIC1
				task_handle = xTaskCreateStatic(
				(TaskFunction_t) &PCA9634_TASK,
				"PCA9634_Task",
				_DEFAULT_PCA9634_STACK_SIZE,
				(void*) this,
				_PCA_TASK_PRIORITY,
				static_map.PCA9634_STATIC[(int8_t)which].stack,
				&static_map.PCA9634_STATIC[(int8_t)which].TaskBuffer
				);
		#else
			xTaskCreate(
				(TaskFunction_t) &PCA9634_TASK,
				"PCA9634_Task",
				_DEFAULT_PCA9634_STACK_SIZE,
				(void*) this,
				_PCA_TASK_PRIORITY,
				&task_handle);
		#endif


		if (task_handle == nullptr) ERROR1(__FILE__, __LINE__, (char*)"PCA9634 TASK CREATE ERROR");

		return true;
	}


Here’s the send routine, which results in an 18 byte packet. I’ll explain that at the end.


	uint64_t SYSTEM_PCA9634::COMMAND
	(
		enum LED_command_type		command,
		uint8_t						which_led,
		uint8_t						argument[8],
		uint8_t						delay,			// 100 ms intervals
		union AT24X_ROM_DATA_type*	result_data
	)


	{
	uint64_t					error_code = 0;
	// code to format for packet based execution
	// chip can be local packet controlled or remote packet controlled

	uint32_t									i;
	// set up TX packet header address (format packet does rest of header)
	// force local network operations
	reply->TX_packet.R.MSG.MSSG.header.address = int (LOCAL_NETWORK) + ((subnet >> 4)  & 0xF00) + (node_address & 0xFF);

	// at_data_type is an overlay on data_packet.bin
	// build generic I2C interface packet, works for most I2C chips
	// zero out I2C packet (not chip specific) data
	// data is of form command/interface/address/chip/bytecount followed by possible chip specific data bytes
	// reply is chip's field

	for (i = 0; i < sizeof(reply->TX_packet.R.packet_data.BIN); i++) reply->TX_packet.R.packet_data.BIN[i] = 0;

	// build PCA9634 overlay for I2C specific data
	// other data will be overlaid on chip specific area
	// this is unused if no packets

	// I2C generic data
	reply->TX_packet.R.packet_data.I2C.interface = chip_interface;							// of chip
	reply->TX_packet.R.packet_data.I2C.address = chip_address;								// of chip
	reply->TX_packet.R.packet_data.I2C.chip = I2C_PCA9634;									// chip type
	reply->TX_packet.R.packet_data.I2C.byte_count = sizeof(union AT24X_ROM_DATA_type);		// of chip

	// PCA chip specific
	reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.command = command;									// duplicated?
	reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.data.which_led = which_led;
	for (i = 0; i < 8; i++)
	{
		reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.data.value[i] = argument[i];
	}
	reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.index = LED_NO_TYPE;		// may be wrong
	reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.subsystem = SUBSYSTEM_PCA9634;
	reply->TX_packet.R.packet_data.I2C.IC_DATA.PCA.data.delay = delay;
	switch (net)
	{
		// code to format for direct execution
		// take program data from calling routines

		case NO_NETWORK:						// No network (for direct local non-packet control where possible)
		{
			// this switch must enumerate all implemented chip commands, which propagate directly to the parent class (chip driver)
			// no packets are used, no queues are used, reply is not yet set up for response
			#ifdef _PERCEPIO_EVENT
				xTracePrint(Trace_Channel, "PCA9634 WRITE START");
			#endif

			// 32 bytes
			xQueueSend(PCA9634_queue, &reply->TX_packet.R.packet_data.I2C,portMAX_DELAY );
//			xQueueReceive(me->PCA9634_queue, &received_packet, portMAX_DELAY );


and finally, the receive task:



void PCA9634_TASK( void const * argument)
{
	PCA9634*						me;
	union PACKET_DATA 				received_packet;
	struct reply_data_type			reply;
	enum LED_command_type			command;
	uint8_t							which_led;
	uint8_t							leds[8];
	union AT24X_ROM_DATA_type*		result_data;
	int								i;
	uint64_t						error_code;
	uint16_t						delay_time;

	me = (PCA9634*) argument;
	#ifdef _STATISTICS
		statistics_tag(__FILE__, __LINE__, (char*) "Before PCA9634 task");
	#endif

	while (1)
	{
		if (uxQueueMessagesWaiting(me->PCA9634_queue) != 0)
		{
			xQueueReceive(me->PCA9634_queue, &received_packet, 0 );

			// message format is meant for I2C
			delay_time = 100 * received_packet.I2C.IC_DATA.PCA.data.delay;
			result_data = &received_packet.I2C.IC_DATA.AT.rom_data;
			command = (enum LED_command_type)received_packet.I2C.IC_DATA.PCA.command;
			which_led = received_packet.I2C.IC_DATA.PCA.data.which_led;
			for (i = 0; i < 8; i++) leds[i] = received_packet.I2C.IC_DATA.PCA.data.value[i];
			result_data->bytes[0] = received_packet.I2C.IC_DATA.PCA.data.value[0];

			error_code = me->direct_execute
			(
				which_led,
				command,
				&leds[0],
				&result_data->bytes[0]);

			reply.result = error_code;
			vTaskDelay(delay_time);
		}
		else
		{
			vTaskDelay(20);
		}
	}
}




HOW IT WORKS:

The “device”, a PCA9634 LED controller, is created with an assignment of either “NO_NETWORK, LOCAL_NETWORK, NRF_NETWORK…etc.

This assignment controls the transport layer. The device is created at the system driver level, where a command is sent in various ways. NO_NETWORK bypasses the transport layer and talks to the local driver. LOCAL_NETWORK puts the packet on the main queue, where it is removed and routed to the local driver (works wonderfully for debugging), NRF_NETWORK sends the packet over the NRF mesh network.

There’s a separate packet task to read the main queue and send the packet where it needs to go, local or otherwise. This task then puts a fraction of the packet (all that is needed for the chip) on a local queue. The local queue (the queue in question) reads from the PCA9634 queue, executes the command, then delays a designated amount. Without that delay, everything happens at once and you never see what the chip does other than a blink…

The design advantage with the whole system is that from the system standpoint, any device can be remote controlled on any board. All you need is the low level support drivers on the destination board. The user does not need to concern himself with how it happens, the transport layer handles that.

The local (no network) option gives you the fastest response at the cost of disallowing the network options.

The queue behaves identically in either static or dynamic allocation modes.

in list.c, it hangs at this statement.

		/* *** NOTE ***********************************************************
		If you find your application is crashing here then likely causes are
		listed below.  In addition see https://www.freertos.org/FAQHelp.html for
		more tips, and ensure configASSERT() is defined!
		https://www.freertos.org/a00110.html#configASSERT

			1) Stack overflow -
			   see https://www.freertos.org/Stacks-and-stack-overflow-checking.html
			2) Incorrect interrupt priority assignment, especially on Cortex-M
			   parts where numerically high priority values denote low actual
			   interrupt priorities, which can seem counter intuitive.  See
			   https://www.freertos.org/RTOS-Cortex-M3-M4.html and the definition
			   of configMAX_SYSCALL_INTERRUPT_PRIORITY on
			   https://www.freertos.org/a00110.html
			3) Calling an API function from within a critical section or when
			   the scheduler is suspended, or calling an API function that does
			   not end in "FromISR" from an interrupt.
			4) Using a queue or semaphore before it has been initialised or
			   before the scheduler has been started (are interrupts firing
			   before vTaskStartScheduler() has been called?).
		**********************************************************************/

		for( pxIterator = ( ListItem_t * ) &( pxList->xListEnd ); pxIterator->pxNext->xItemValue <= xValueOfInsertion; pxIterator = pxIterator->pxNext ) /*lint !e826 !e740 !e9087 The mini list structure is used as the list end to save RAM.  This is checked and valid. *//*lint !e440 The iterator moves to a different value, not xValueOfInsertion. */
		{
			/* There is nothing to do here, just iterating to the wanted
			insertion position. */
		}
	}

Stack overflow does not seem to be a problem, and stack overflow hook is enabled.

I’ve worked with interrupt priorities, and I’m not convinced that this is a problem in this case.

There aren’t any critical sections, and these are normal tasks, so no FROM_ISR problems that I can see.

This is done well after the OS is running.

Hope this helps

While this is a single producer/single consumer situation, the packet size is fixed, so a standard queue is fine, unless there’s another advantage.

What I meant is that if I set the maximum queue entries to 20, and then write 24 items to the queue, it crashes (hangs) in the situation described in list.c. Both queue beginning and queue end are identical, so the execution apparently can’t proceed.

It almost looks as if the queue pointers wrap and end up with the same value.

  xQueueReceive(me->PCA9634_queue, &received_packet, 0);

This will always copy x bytes to the destination. x is determined by how you created the queue which happens here:

  	PCA9634_queue = xQueueCreateStatic(
  		_DEFAULT_PCA9634_QUEUE_LENGTH,												// queue length
  		_DEFAULT_PCA9634_ITEM_SIZE,											// item size
  		static_map.PCA9634_STATIC[which].QueueStorageArea,		// pointer to queue storage byte array
  		&static_map.PCA9634_STATIC[which].QueueBuffer);						// pointer to system queue variable

// #else
// PCA9634_queue = xQueueCreate(_DEFAULT_PCA9634_QUEUE_LENGTH,sizeof(union SYSTEM_PACKET_type));
// #endif

thus, the receive routine writes either sizeof(union SYSTEM_PACKET_type) or _DEFAULT_PCA9634_ITEM_SIZE to to your local variable received_packet which has size sizeof(union PACKET_DATA). If either of the two is larger in size than sizeof(union PACKET_DATA), you will overwrite your receiver array. Did you ensure that this does not happen?

Thanks, I did change the code slightly:

#define _DEFAULT_PCA9634_QUEUE_LENGTH 		20			// PCA9634 QUEUE
#define _DEFAULT_PCA9634_ITEM_SIZE			sizeof (struct I2C_PACKET_data)			// STANDARD PACKET SIZE
#define _DEFAULT_PCA9634_STACK_SIZE			14000		// task stack size 7000

Note that the queue item size is now a struct and not a union.

void PCA9634_TASK( void const * argument)
{
	PCA9634*						me;
	struct I2C_PACKET_data			received_packet;
	struct reply_data_type			reply;
	enum LED_command_type			command;
	uint8_t							which_led;
	uint8_t							leds[8];
	union AT24X_ROM_DATA_type*		result_data;
	int								i;
	uint64_t						error_code;
	uint16_t						delay_time;

	me = (PCA9634*) argument;
	#ifdef _STATISTICS
		statistics_tag(__FILE__, __LINE__, (char*) "Before PCA9634 task");
	#endif

	while (1)
	{
		if (uxQueueMessagesWaiting(me->PCA9634_queue) != 0)
		{
			xQueueReceive(me->PCA9634_queue, &received_packet, 0 );

			// message format is meant for I2C
			delay_time = 100 * received_packet.IC_DATA.PCA.data.delay;
			result_data = &received_packet.IC_DATA.AT.rom_data;
			command = (enum LED_command_type)received_packet.IC_DATA.PCA.command;
			which_led = received_packet.IC_DATA.PCA.data.which_led;
			for (i = 0; i < 8; i++) leds[i] = received_packet.IC_DATA.PCA.data.value[i];
			result_data->bytes[0] = received_packet.IC_DATA.PCA.data.value[0];

			error_code = me->direct_execute
			(
				which_led,
				command,
				&leds[0],
				&result_data->bytes[0]);

			reply.result = error_code;
			vTaskDelay(delay_time);
		}
		else
		{
			vTaskDelay(20);
		}
	}
}


and the received packet is now a struct, not a union.
Unfortunately, this seems to make no difference.

in List.c, it still hangs with

		for( pxIterator = ( ListItem_t * ) &( pxList->xListEnd ); pxIterator->pxNext->xItemValue <= xValueOfInsertion; pxIterator = pxIterator->pxNext ) /*lint !e826 !e740 !e9087 The mini list structure is used as the list end to save RAM.  This is checked and valid. *//*lint !e440 The iterator moves to a different value, not xValueOfInsertion. */
		{
			/* There is nothing to do here, just iterating to the wanted
			insertion position. */
		}

and pxNext and pxPrevious are identical values.

Think that this didn’t happen as far as I can see.

Thanks, though

So rom_data is an inline union defined somewhere ? (For me there are too much unions in your code :wink: ) and the address of operator can be safely used to assign the pointer to result_data and writing to the bytes array element 0 ?

BTW what”s the reason for

I should also add that I completely eliminated the read task (as above) and the program still hangs. Works fine with 24 items written to a 30 item queue, hangs with 24 items written to a 20 item queue. Definitely on the write side of it.

This is a packet oriented system. There’s a header (well defined), and a payload. You’re looking at stuff defined in the payload. I suppose I could have cast the thing as binary, and just sent that, but getting a payload and going through the structure to find what’s what was easier in debugging.

result data is a copy of the incoming packet with the data field replaced by results. The incoming packet contains routing data. To reply, simply exchange to and from fields and put it on the main queue. It will automatically be returned by the transport layer.

This follows the structure of a generic “send to I2C chip” and “possibly get something back”. So the packet data has, in part, the I2C interface and the address of the chip., the specific chip data contains commands and data for the chip itself, specific to the chip. The reply is just an overlay on binary of integers, floats, etc, whatever the desired response is. Easier that way.

As far as the difference in structure (messageswaiting), it’s a bit easier to debug, and I’ve had instances in the past where the scheduler wouldn’t let anything else run. It really shouldn’t make a difference, but it’s slightly more flexible if I need to play with things.

Thanks

so do you see the hang both with static and dynamic allocation or just one of them? If both, does the hang manifest itself differently?

Both static and dynamic behave identically. Since I’m not even trying to read the queue (code is commented out), I’m blaming the queue writing process. As I mentioned, hangs at the code segment given with beginning and end markers the same.

I’m beginning to wonder if it’s a software flaw, where perhaps (possible) that the list software has been corrupted. I can replace the source and include files for list (and queue) if needed.

Thanks

who provided the port for your target? A candidate for what you see is a faulty implementation of the critical section which is typically part of the port, as listed in the comment you quoted earlier. Or a faulty configuration of interrupt priorities.

Does your problem happen predictably or intermittent?

With the queue length set short, it’s very reliably wrong. This is ST-Micro’s own code.

It seems a recent happening, but then again, I’m trying to adjust parameters and dealing with static vs. dynamic allocation. Anything could be wrong, although since the same thing happens whether or not the queue is static or dynamic suggests something else.

Not sure how interrupt priorities play here, since I’m writing the queue and not reading it at all. The queue reading task is commented out completely. I’ll play with them, though.

I’ll see if another section of the code, which is similar, has the same problems.

for example, if the PendSVc interrupt does not have the lowest interrupt priority, there can be attempts to restore what is not a valid task context. The behavior you see does not exactly hint at that, but misconfigurations can yield many different error scenarios. As the comment in the code suggests, it may also be a manifestation of an interrupt with pri > MAX_SYSCALL attempting to call a FreeRTOS API function.

Have you already shown us your freertosconfig.h file?

FreeRTOSConfig.h (8.8 KB)

Here’s the file.

Right now, other than what FreeRTOS does, there are no active interrupts, (SPI, I2C, USART, etc running.

I think that the whole interrupt system is turned off when we got the endless loop for the queue. That would explain why the program never exits that loop. However, since there’s no trap for next and previous being the same, that seems to fall into the philosophy of “of course it’s an error, that’s why we let the program hang.” It isn’t my philosophy, but I do have access to the hardware, so when that happens, I can at least light up a light. Requires only a processor pin for me.

My question, perhaps rhetorical, is that under what circumstance will this happen? It can’t be initialization, since that kind of error, by design, wouldn’t happen. Since this happens when the queue capacity is exceeded, I’d suspect something there.

Thanks

Life has gotten more interesting.

situations:

  1. full system:
    1. user program generates PCA9634 commands, full sized packets.
    2. packets placed on transport queue (many entries, full packet size).
    3. packet task reads main queue, directs packet according to type.
    4. packet for PCA9634 is examined and put on PCA9634 queue
    5. Queue is read, and data controls PCA9634
  2. Partial system:
    1. User program generates PCA9634 commands, partial packets generated
    2. partial packets placed in PCA9634 queue
    3. Queue is read, and data controls PCA9634.

Now here’s where things get interesting. Test situations are about 24 patterns controlling two tricolor LEDS and one white LED.

  1. In a partial system,
    1. with the queue sized to hold all items without blocking, the queue system works as advertised.
    2. With the queue sized to force blocking, lists.c hangs on the PCA9634 queue
  2. With a full system,
    1. all queues sized to avoid blocking, everything works.
    2. With the PCA9634 queue sized to force blocking, and the packet queue sized to avoid blocking, the program works.
    3. With the PCA9634 sized to avoid blocking, and the packet queue sized to force blocking, the main packet queue hangs at list.

The reason I never had this as a problem is that the main packet queue was never full.

I am puzzled.