Odd problem with queues

Feels like a memory corruption to me. Can you share the complete code as a zip or put it in a GitHub repo somewhere?

it is still unclear what the relationship between that and

is. In the dynamic setup, your queue is configured to hold x items of sizeof(SYSTEM_PACKET_type) but you copy to an identifier of size I2C_PACKET_data . Again, did you ensure that those sizes match? Typically it is good practice to restrict queue operations to members of the same data type.

Edit: Insertion into queues where source and target sizes do not match is normally not a problem, at least not in terms of memory overwrite potential; in the worst case, the source is truncated (which may lead to follow up issues on the application level) or stuffed with random data. The bigger potential problem is with receiving from such a queue if the destination is smaller than the queue element size because then memory in the target vicinity may get overwritten.

1 Like

I would, but it’s way too complicated (the system) to reproduce easily. I’ll see if I can reduce the problem a bit.

Please see the next message for an idea of the design strategy.

We’re talking two queues here.

The main queue takes 64 byte elements, of which the first 32 bytes are constructed and then sent over any network by the transport layer. The remainder of those 32 bytes are not sent, but are bits broken out, helper data, and the like.

Almost everything feeds the main queue, networking and user routines, etc. The idea is once the packet is constructed (contains routing data and a data segment), the transport layer sees that it goes where it needs to go. The transport layer is fed by a reader task.

If addresses match to local, then the whole packet is processed. The PCA9634 is an LED control chip which needs very little data once you look at it, obviously, routing data is not needed by the actual chip control routines.

The data area of the packet is a number of unions, allowing decoding of the data in the raw form without having to cast into a particular format. Once you have the entire large packet, you can check the data by looking at the unions.

The problem with the PCA9634 is that proper use requires writing to the chip, then waiting enough time to notice the lights, then getting the next command. It’s intended to be a background function. That’s the need for the PCA9634 queue, since I don’t want to block the main queue when it has data.

But the PCA9634 queue does not need all the information in the main queue when doing a write, so the I2C data part is just the much smaller amount of data needed to write to the chip to control an LED.

union SYSTEM_PACKET_type
{
		struct   PACKET_RECORD 										R;
		uint8_t														U8[sizeof(struct PACKET_RECORD)];
		uint16_t													U16[sizeof(struct PACKET_RECORD)/2];
		uint32_t													U32[sizeof(struct PACKET_RECORD)/4];
};

where packet record is defined:

truct  PACKET_RECORD
{
	// part of packet sent (32 bytes maximum)
	union MESSAGE_type			MSG;

	// added data only local, not sent over network
	union PACKET_DATA			packet_data;				// overlay		 ( VARIABLE BYTES, NOT SENT)
	void*						driver;						// if driver == nullptr then we're not talking to a chip
	int							line_number;				// line number tag
	enum task_tag_type			task_tag;					// diagnostic
//	int16_t						dummy[2];					// dummy/garbage for overruns
	void*						reply_ptr;					// reply pointer
};


Then the definition of MSG:

// SEND MESSAGE_type TO NRF CHIP (bin[32])
union MESSAGE_type
{
	struct MESSAGE				MSSG;
	uint8_t						bin[32];
};


and what really gets sent:

#pragma pack(1)

	// BUILD MESSAGE BY PROGRAM, 32 bytes which is all that is sent as an individual packet
	struct MESSAGE
	{
		// part of packet sent (32 bytes maximum)
		struct packet_header_type	header;				// packet header (12 BYTES), routing information
		union  PACKET_DATA			payload;			// maximum 18 bytes
	};

now looking only at the payload:

You can see how the packet becomes multi-purpose at the point

// used to be a floating overlay, is now part of packet structure
// this is payload
union  PACKET_DATA
{
	// packed structures
	struct LED_PACKET_data			LED;				// non I2C, pin mapped LED
	struct I2C_PACKET_data			I2C;				// all I2C chips
	struct SYS_PACKET_data			SYS;				// system command data area
	union  MAP_PACKET_data			MAP;				// map data when sent
	struct DISPLAY_ALIAS_data		DIS;				// display alias data

	uint8_t							BIN[PACKET_DATA_SIZE];
};

and in this case, the I2C data is what we want (LED is for hardwired LEDS and may be obsolete), SYS involves mesh networking, etc.

so that is:

#pragma pack(1)
		// total size 18 bytes maximum
		struct I2C_PACKET_data
		{
				// effective I2C header
				// GENERIC FOR ANY I2C CHIP
				uint8_t						interface; 					// 1..4, typically (local)
				uint8_t						address;					// 0..127 typically of device to talk to (local)
				enum chip_type				chip; 						// enumerated chip type (driver to call)
				uint8_t 					byte_count;					// data byte count (not including I2C header), used for bytes_to_send

				// SPECIFIC TO CHIP DATA
				// effective I2C data, may or may not be used
				union  IC_DATA_TYPE			IC_DATA;					// based on chip type (includes command) (14 bytes maximum)
		};

Note on above, I2C is a bus structure, and the chip can have at least 4, so we need to know which interface, the address on that bus, the chip type (splits out into specific drivers) and so on. The IC_DATA is chip specific, but that’s handled in the execute routine.

So two queues, different capacities, and different size entries. If they are mixed (and should not be), it’s an error.

Shouldn’t be having a problem with different queue element sizes, certainly not intended.

The structure is complicated by the NRF24L01 RF mesh networking chip (another large section of code) having a limited packet payload size of only 32 bytes. For any mesh networking situation, you need to know who sent it, where it goes next, and where it’s finally going to be. Just for fun (odd definition of fun), the header structure is here:

#pragma pack(1)
	struct  packet_header_type
	{
		uint16_t					address;			// address on interface (@1)(I2C, SPI, IEEE488, SERIAL?, ESP, NRF)
		uint16_t					hop;				// address on interface (@1)(I2C, SPI, IEEE488, SERIAL?, ESP, NRF)
		uint16_t					from;				// address on interface (@1)(I2C, SPI, IEEE488, SERIAL?, ESP, NRF)
		uint16_t					checksum;			// for entire packet, adds to zero (0x0000)
		uint16_t					offset;				// offset into RAM receive buffer (@7}
		int16_t						sequence1;			// sequence number for matching reply to request, combined with flag upper 3 bits
		enum PACKET_subsystem_type	subsystem;			// subsystem
		enum PACKET_command_type	packet_command;		// message type (see system/user types
	};
#pragma pack()

which allows assembling blocks of data from packets, and redirecting packet data to other queues/systems when at the destination. Your implementation may vary.

To expand the system, all you need to do is define another item in PACKET_DATA, for instance, if you added a CAN interface, just define a CAN_PACKET_data type.

thanks, but nothing of that is of interest for the problem theory at hand…

when you create a queue of, say, 4 elements of 10 bytes each, then each write to the queue will copy 10 bytes from the source to the queue memory. Likewise, when you read (receive) from the queue, the receive funtion will copy 10 bytes from the queue head position to the destination. If the destination is an array of, say, only 8 bytes, the receive funtion will overwrite 2 bytes past the array.

So the question is not what the union does but simply what the number of bytes in both structures is and thus whether there is potential for overwrite at read.

or accessing a struct member (for example writing to it) in a union in a union .. not properly initialized (by sender) due to simple programming bug.

As far as I can tell, they’re matched.

What puzzles me is that the secondary queue and primary queues work differently in different situations. I think there’s something in that part to check.

I’d agree, except that this is on writing, and the queue write is strictly binary and data agnostic.

I have a suspicion that I need to check.

very suspicious subclause… :wink:

If you have a debugger, the safe way to be sure is this:

  • trace into the xQueueCreate function and remember the size passed in (the queue will also have the item size as one of its local members on which you can check what the queue works with internally).
  • Right before you do your xQueueReceive() call, do something like that:
    volatile unsigned aDum = sizeof(I2C_Packet_data);
    and inspect the variable.

I would agree with Hartmut that the coding style is sort of obfuscating and error prone, there may be hidden problems in those nested and convoluted data structures.

As @RAc suggested, use the sizeof on both the sent and received items to ensure that they match the queue item size.

Another way could be to add some guard areas in your structures/unions and initialize it with a known pattern. Any change in any of the guard area would indicate a memory corruption which you can then catch using data breakpoint. Alternatively, you could comment parts of your application and see if you can narrow down the problematic area.

A quick advice:

In the method create() you’ve declared the variable ‘which’ as unsigned (uint8_t), however later on you compare it against below zero, which will never happen.

1 Like

Ah, that’s quite true. However:

if (which < 0) return false;
		if (which >= _MAX_PCA9634) return false;

that should catch the problem since uint8_t, if fed from a negative number, will be 0x80 or greater depending on the value.

Good catch, I will go and fix that.

It happens a bit, and the compiler never seems to complain about this.

Harvey