MQTT stops working after receiving a large payload in FreeRTOS

I am using a RIVERDI 7-inch display with an STM32H7 microcontroller. I have also enabled LwIP and developed an MQTT client, which works correctly. Now I need to read a large JSON string in order to extract some data from it. For this, I went to the mqtt_opts.h file to increase the incoming payload buffer size, MQTT_VAR_HEADER_BUFFER_LEN, and changed it from 350 to 1271, because that is the size I need based on the recommendation in that file:

/**
 * Number of bytes in receive buffer, must be at least the size of the longest incoming topic + 8
 * If one wants to avoid fragmented incoming publish, set length to max incoming topic length + max payload length + 8
 */
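
The sizing rule in that comment is plain arithmetic. A minimal C sketch of it, with a compile-time check that a configured length covers the worst case. MAX_TOPIC_LEN is a hypothetical value: with the numbers in this post, 1271 - 1215 - 8 leaves room for topics of up to 48 bytes.

```c
#include <stddef.h>

/* Fixed overhead from the MQTT_VAR_HEADER_BUFFER_LEN comment in mqtt_opts.h. */
#define MQTT_RX_OVERHEAD 8u

/* Hypothetical limits for this application. */
#define MAX_TOPIC_LEN   48u    /* longest subscribed topic (assumed) */
#define MAX_PAYLOAD_LEN 1215u  /* payload length from the post */
#define CONFIGURED_LEN  1271u  /* value the post sets MQTT_VAR_HEADER_BUFFER_LEN to */

/* Buffer length needed for an incoming publish to arrive unfragmented:
 * longest topic + longest payload + 8. */
static size_t mqtt_rx_buffer_len(size_t max_topic_len, size_t max_payload_len)
{
    return max_topic_len + max_payload_len + MQTT_RX_OVERHEAD;
}

/* Build-time guard: fails to compile if the configured length is too small. */
_Static_assert(MAX_TOPIC_LEN + MAX_PAYLOAD_LEN + MQTT_RX_OVERHEAD <= CONFIGURED_LEN,
               "buffer too small for unfragmented publishes");
```

If any subscribed topic were longer than 48 bytes, the static assertion would fail the build instead of letting the publish arrive in fragments.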

My payload length is 1215. With this change, the function I created to extract the data is entered, but after its first line executes, the program crashes and stops at the following line of code. I cannot understand why, because I don’t use my queue when this operation occurs:

/* Check this really is a semaphore, in which case the item size will be
   0. */
configASSERT( pxQueue->uxItemSize == 0 );

The function that extracts the data uses the jsmn.h library, and I have tested it and it works correctly. However, when the buffer size in mqtt_opts.h is set to 350, it doesn’t crash, but it can’t read the whole message, only part of it. Am I missing something? Do I also need to increase another parameter for this to work? What should I check in debug mode to figure it out?

I am subscribing to 6 different topics. Could this cause a problem with the task’s stack size? I have set it to 9000*4. Could handling that many subscriptions be too much for a single FreeRTOS task?

I have placed the LwIP sections in the linker script like this:

.lwip_sec (NOLOAD) :
{
  . = ABSOLUTE(0x30000000);
  *(.RxDecripSection)

  . = ABSOLUTE(0x30000200);
  *(.TxDecripSection)

  . = ABSOLUTE(0x30000400);
  *(.RxPool)
} >RAM_D2

The DMA descriptors start at address 0x30000000 and together occupy 608 bytes, i.e. 0x200 + 4×24. The heap size is set to 14 KB and the heap pointer is set to address 0x30004000. I have also configured this in the MPU region. My ETH_RX_BUFFER_SIZE is 1536.
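
One way to settle the open question about the memory map is to write the layout down as address ranges and test them for overlaps. A minimal sketch under stated assumptions: the 24-byte descriptor size and the count of 4 Tx descriptors follow the 0x200 + 4×24 arithmetic above, 4 Rx descriptors is an assumption, the number of Rx buffers in .RxPool is a hypothetical parameter, and each pool entry is treated as exactly ETH_RX_BUFFER_SIZE bytes, which underestimates it if the pool element carries extra bookkeeping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One linker-placed region in RAM_D2: [start, start + size). */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t size;
} region_t;

#define DESC_SIZE          24u          /* per the 4x24 arithmetic in the post */
#define RX_DESC_CNT        4u           /* assumption */
#define TX_DESC_CNT        4u           /* from the post's 4x24 */
#define ETH_RX_BUFFER_SIZE 1536u        /* from the post */
#define HEAP_START         0x30004000u  /* from the post */
#define HEAP_SIZE          (14u * 1024u)

/* Returns true if any two regions overlap and reports the first pair found. */
static bool regions_overlap(const region_t *r, size_t n, size_t *a, size_t *b)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            uint32_t i_end = r[i].start + r[i].size;
            uint32_t j_end = r[j].start + r[j].size;
            if (r[i].start < j_end && r[j].start < i_end) {
                if (a) *a = i;
                if (b) *b = j;
                return true;
            }
        }
    }
    return false;
}

/* Hypothetical helper: the layout from the linker script for a given
 * number of Rx buffers in .RxPool. */
static size_t build_layout(region_t *out, uint32_t rx_buffer_cnt)
{
    out[0] = (region_t){ "RxDecripSection", 0x30000000u, RX_DESC_CNT * DESC_SIZE };
    out[1] = (region_t){ "TxDecripSection", 0x30000200u, TX_DESC_CNT * DESC_SIZE };
    out[2] = (region_t){ "RxPool", 0x30000400u, rx_buffer_cnt * ETH_RX_BUFFER_SIZE };
    out[3] = (region_t){ "lwIP heap", HEAP_START, HEAP_SIZE };
    return 4;
}
```

With these assumptions, ten 1536-byte buffers exactly fill the 0x3C00 bytes between 0x30000400 and the heap at 0x30004000; an eleventh would run into the heap.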

I don’t know what else to do. I would appreciate any response! Thank you in advance!

I’m going to guess that the larger payload is overwriting the queue data structure, which triggers the assert(). Our examples use our own TCP/IP and JSON libraries, so I’m afraid I’m not an expert on either jsmn or lwIP’s MQTT implementation and can’t advise where to look there. If my theory is right, placing a data watchpoint that stops the debugger when pxQueue->uxItemSize gets overwritten would be a good way of finding the cause.

Isn’t this related to the question you asked here before?

LwIP mqtt fail in freeRTOS after a while & large payloads cant be stored after a subscribe - Kernel - FreeRTOS Community Forums

There is still the open question about the size of your Rx descriptor memory section. That may be the root cause of both issues (if they are different issues in the first place).

@rtel I am not very familiar with the debugger. If that is the cause of the corruption, what could I do about it?

Richard has already suggested using a data breakpoint to catch the memory corruption.

Where did you get this file from? If it is from LWIP, you are likely to get a better response on the LWIP forums.

@rtel Indeed, this is what is happening: it overwrites the queue structure. While it runs normally, the queue structure has uxLength equal to 1 and uxMessagesWaiting either 0 or 1. When it receives the large payload, these change to 1763844140 and 2099396898 respectively, which I guess is not normal.
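
The two values are a clue in themselves. Read as little-endian bytes, 1763844140 (0x6922202C) is the byte sequence 2C 20 22 69, i.e. the text ', "i', and 2099396898 (0x7D224122) is 22 41 22 7D, i.e. '"A"}'. That looks like fragments of a JSON payload written over the queue. A small helper for decoding such words (the function name is just for illustration):

```c
#include <stdint.h>

/* Writes the four bytes of `word` to out[0..3] in little-endian order (the
 * Cortex-M7 in the STM32H7 stores words this way) and NUL-terminates. */
static void word_to_le_ascii(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((word >> (8 * i)) & 0xFFu);
    }
    out[4] = '\0';
}
```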

@aggarg It is code generated by CubeMX when I added LwIP to my microcontroller project.

That strongly suggests memory corruption. One thing you can do is add a new member to the queue data structure that should not change normally. Then put a data breakpoint on that new member to break when it changes. That will catch the memory corruption right when it happens.

@aggarg The thing is that I am not changing the contents of any member of the queue. Let me explain what I am doing. I have a structure that contains 7 char variables. I have also created a FreeRTOS queue to send data to the UI (the Model.cpp class); it has room for 5 messages, each the size of my structure (QueueHandle = osMessageQueueNew(5, sizeof(buf_t), &Queue_attributes);). When I receive the payload, what I described above happens. I capture the payload like this:

char buf3[len+1];
strncpy(buf3, (char *)data, len);
buf3[len]='\0';

Just to make sure I understand: are you recommending that I add a new member to the queue structure, or to my own structure?

Next to whatever is getting corrupted. Let’s say you have a variable x that is getting corrupted:

uint32_t x;

Memory corruption is generally not limited to one word. So we declare another variable right after it; usually, the linker places it next to the variable we are interested in (in this case, x):

uint32_t x;
uint32_t trap;

The application code does not use trap and therefore, its value should not change in the normal course of execution. You can then put a data breakpoint to break when trap value changes. This enables you to catch the memory corruption right when it happens.
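
The same idea can also be checked in software. A minimal sketch, as a variant of the suggestion above: instead of relying on the linker to place trap next to the buffer, both live in one byte array, so the trap word is guaranteed to follow the buffer directly. The names, the 16-byte size, and the 0xDEADBEEF pattern are arbitrary choices for illustration. On the target, a data breakpoint on the trap word still catches the corrupting write at the moment it happens; guard_intact() only tells you afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN    16u          /* size of the protected buffer (illustrative) */
#define TRAP_VALUE 0xDEADBEEFu  /* arbitrary pattern the program never writes */

/* Buffer plus trailing trap word in a single byte array, so the trap is
 * guaranteed to sit directly after the buffer in memory. */
typedef struct {
    uint8_t bytes[BUF_LEN + sizeof(uint32_t)];
} guarded_buf_t;

static void guard_init(guarded_buf_t *g)
{
    const uint32_t trap = TRAP_VALUE;
    memset(g->bytes, 0, BUF_LEN);
    memcpy(&g->bytes[BUF_LEN], &trap, sizeof trap);
}

/* True while nothing has written past the end of the buffer. */
static bool guard_intact(const guarded_buf_t *g)
{
    uint32_t trap;
    memcpy(&trap, &g->bytes[BUF_LEN], sizeof trap);
    return trap == TRAP_VALUE;
}

/* Stands in for an unchecked copy such as memcpy(buf, data, len) where len
 * may exceed BUF_LEN. For this demo, len must not exceed sizeof g->bytes. */
static void unchecked_copy(guarded_buf_t *g, const uint8_t *src, size_t len)
{
    memcpy(g->bytes, src, len);
}
```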

@aggarg I understand now. First of all, thank you for being willing to help me. While searching, I noticed that two lines of my code may be causing that corruption. The callback function that receives the payload is static void mqtt_incoming_data_cb(void *arg, const u8_t *data, u16_t len, u8_t flags), where the payload is data. I want to extract a number from that payload, so I was converting the u8_t data to char like this:

char buf2[len+1];
memcpy(buf2, (char *)data, len);
buf2[len]='\0';

However, when I don’t use that code and don’t send the payload for extraction, it appears to run correctly and the queue structure is not overwritten. So my question is: could this conversion really be wrong enough to damage the structure, and if so, is there a way to keep using it? For instance, can I extract data directly from the u8_t payload?

Your buf variable appears to be located on the stack, so if the problem goes away when you don’t use it, that is an indication of a stack overflow.
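
A sketch of one alternative to the stack buffer, under stated assumptions: the payload is copied into a static buffer (in .bss, not on the lwIP thread’s stack) with an explicit bound. It also accumulates fragments, because when MQTT_VAR_HEADER_BUFFER_LEN is smaller than topic + payload + 8, lwIP delivers one publish over several incoming-data callbacks, and only the last one carries MQTT_DATA_FLAG_LAST; that would explain getting only part of the message with the 350-byte setting. PAYLOAD_FLAG_LAST stands in for lwIP’s MQTT_DATA_FLAG_LAST, on_payload_fragment() is a hypothetical helper you would call from mqtt_incoming_data_cb(), and the static state assumes the callback only ever runs in the single lwIP TCP/IP thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: same value as lwIP's MQTT_DATA_FLAG_LAST, set on the final
 * fragment of an incoming publish. Use the lwIP constant in real code. */
#define PAYLOAD_FLAG_LAST 1u

/* Largest payload accepted (1215 bytes in the post). */
#define MAX_PAYLOAD_LEN 1215u

/* Static storage: lives in .bss, not on the lwIP TCP/IP thread's stack. */
static char   s_payload[MAX_PAYLOAD_LEN + 1u];
static size_t s_payload_len;
static bool   s_overflowed;

/* Hypothetical helper, fed with the callback's data, len and flags. On the
 * last fragment it NUL-terminates s_payload, stores the number of buffered
 * bytes in *out_len, and returns true if the whole payload fitted. */
static bool on_payload_fragment(const uint8_t *data, uint16_t len,
                                uint8_t flags, size_t *out_len)
{
    if (!s_overflowed) {
        if (s_payload_len + len <= MAX_PAYLOAD_LEN) {
            memcpy(&s_payload[s_payload_len], data, len);
            s_payload_len += len;
        } else {
            s_overflowed = true;   /* drop this message rather than overrun */
        }
    }

    if ((flags & PAYLOAD_FLAG_LAST) == 0u) {
        return false;              /* more fragments of this publish follow */
    }

    const bool complete = !s_overflowed;
    s_payload[s_payload_len] = '\0';
    *out_len = s_payload_len;
    s_payload_len = 0u;            /* reset for the next publish */
    s_overflowed = false;
    return complete;
}
```

Once it returns true, s_payload and the returned length can be handed to jsmn_parse(), which takes an explicit length and so does not depend on the terminator. The parse itself still runs on the lwIP thread’s stack, so that stack must be large enough for it.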


I looked at the LWIP code, and it seems that the callback above is called from the LWIP TCP/IP thread. You need to increase that thread’s stack size, as you are most likely overflowing its stack by creating the large buffer on it. Try increasing the value of TCPIP_THREAD_STACKSIZE.
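
For reference, the setting is a plain macro in lwipopts.h. The value below is a hypothetical starting point, not a recommendation; whether it is counted in bytes or in stack words depends on the sys_arch port, so check how sys_thread_new() uses it in your project. Because the MQTT callbacks run on this thread, its stack has to hold mqtt_incoming_data_cb()’s locals, including any len + 1 byte buffer.

```c
/* lwipopts.h */
/* Stack for the lwIP TCP/IP thread, which also runs the MQTT callbacks.
 * Hypothetical value; units (bytes or words) depend on the sys_arch port. */
#define TCPIP_THREAD_STACKSIZE 4096
```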


@aggarg That seems to be the problem: I increased it, and it now sends data to the screen successfully. Thank you very much, and all the best to you!

Glad that it worked for you!

Do you have configCHECK_FOR_STACK_OVERFLOW set to 2?
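
The relevant FreeRTOSConfig.h settings, as a sketch. With configCHECK_FOR_STACK_OVERFLOW set to 1 or 2, the application must also define vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName), which the kernel calls when it detects an overflow. INCLUDE_uxTaskGetStackHighWaterMark is optional; it enables uxTaskGetStackHighWaterMark(), which reports the minimum free stack space a task has had, in words, and is a direct way to see how close the TCP/IP thread gets to its limit.

```c
/* FreeRTOSConfig.h */
#define configCHECK_FOR_STACK_OVERFLOW       2  /* method 2: fill-pattern check */
#define INCLUDE_uxTaskGetStackHighWaterMark  1  /* optional: measure stack headroom */
```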

No, it was set to 1 by default. I have now changed it to check.

@rtel could you please explain why I should set this parameter to 2?

Thank you.

Stack overflows are one of the most common causes of runtime crashes and other issues in FreeRTOS. It would have spared you and all of us a lot of time and brain power if the stack overflow had been pinpointed earlier. The runtime check @rtel mentioned would probably have helped locating the problem faster.
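
As for why 2: as the FreeRTOS documentation describes it, method 1 only checks that the stack pointer is within the task’s stack when the task is switched out, so an overflow that happened deeper in a call chain and unwound before the switch (for example a large local buffer in a callback that has since returned) goes unnoticed. Method 2 fills each task’s stack with 0xA5 when the task is created and checks that the last 16 bytes still hold that pattern, so the damage is still visible at the next switch. The sketch below simulates that pattern check on an ordinary array; it is not kernel code, and neither method is guaranteed to catch every overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* byte FreeRTOS writes into a new task's stack */
#define GUARD_BYTES     16u    /* method 2 inspects the last 16 bytes */

/* Simulated downward-growing stack: index 0 is the lowest address, i.e. the
 * end the stack grows toward. size must be at least GUARD_BYTES. */
static void stack_fill(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Simulates a function using `depth` bytes below the top of the stack (for
 * example a large local buffer) and then returning. The bytes stay changed
 * even though the "stack pointer" is back at the top. Requires depth <= size. */
static void stack_use(uint8_t *stack, size_t size, size_t depth)
{
    memset(stack + (size - depth), 0x00, depth);
}

/* Method-2-style check: true while the lowest GUARD_BYTES bytes still hold
 * the fill pattern. */
static bool stack_guard_intact(const uint8_t *stack)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (stack[i] != STACK_FILL_BYTE) {
            return false;
        }
    }
    return true;
}
```

A method-1-style check at the same point would only compare the current stack pointer with the stack bounds, and after stack_use() returns the pointer is back in range, so it would see nothing wrong.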