Problem with xTaskNotifyGive not unblocking task


I am trying to use the Task Notify functions for deferred interrupt processing in lieu of semaphores, but am not having much success.

I have a task (printTask) that extracts characters in blocks of up to 512 from a circular buffer, puts them in a DMA buffer, and then writes them out to a UART using HAL_UART_Transmit_DMA_rjg(&huart1, …

I have an interrupt handler that resets the uart and signals to the printTask function that the transfer is complete.

The printTask is blocked using: ulTaskNotifyTake(pdTRUE, portMAX_DELAY);

However the task does not unblock when either the vTaskNotifyGiveFromISR(printTaskHandle,
or the xTaskNotifyGive(printTaskHandle) statements are given in the ISR and printfdma functions respectively.

I put the statement: ulTaskNotifyTake(pdTRUE, 0) on line 62; if I didn’t have this the task never entered the blocked state and fell straight through.

Here is the code:

/* Static FreeRTOS Variables */
TaskHandle_t 	printTaskHandle;
StackType_t 	printTaskStack[STACK_SIZE_START_WIFI_TASK];
StaticTask_t 	printTaskBuffer;

void initPrintfUart1(void)

  /* Start the printing thread */

/* Put Chars in Circular Buffer */
int printfdma(const char *format, ...)
	int nResult = 0;
	int k;
  va_list args;


		/* Kick off the uart */
  	/* Circular buffer full - post message TODO */
  return nResult;

static bool startPrintTask(void)
	printTaskHandle = xTaskCreateStatic(printTask, "printTask",

	if(printTaskHandle == NULL)
	return PRINTF_OK;

static void printTask(void * argument)
	static int k = 0;
	char chr[1];

	/* call the thread notification to set the counter to zero so it blocks */
	ulTaskNotifyTake(pdTRUE, 0);

    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
		while(!circular_buf_char_empty(&circBuffTxHandle) && k < DMA_BUFF_TX_SIZE)

		if( HAL_UART_Transmit_DMA_rjg(&huart1,  (unsigned char*)&uart1DMABuffTx, k) != HAL_OK)
			/* TODO Hal error handling */

/* Uart 1 TX DMA Interrupt Handler. */

void DMA1_Stream7_IRQHandler(void)
  BaseType_t xHigherPriorityTaskWoken = pdFALSE;




What am I doing wrong?

How did you verify that the task did not unblock? Also, the sequence seems a bit odd - don’t you need to differentiate the 2 notifications:

  1. Wait for the notification that there is data to be sent to UART.
  2. Start DMA.
  3. Wait for another/different notification that the transfer is complete.

I verified that the task did not unblock by putting a breakpoint after the blocking call; execution never reached it. I called printfdma, which should unblock the task so that characters get output to the UART, but that didn’t happen.

The sequence of operation is:

  1. Start the print task - it will run and then block on the ulTaskNotifyTake(pdTRUE, portMAX_DELAY) statement.

  2. Call printfdma: this puts data in the circular buffer and then calls xTaskNotifyGive(printTaskHandle) to release the block.

  3. The print task should then print out the characters. When the transmission is complete the UART interrupt routine uses:


This should unblock the task again.

I need to add some code to only unblock the task from the interrupt if there are still characters in the circular buffer. I haven’t done that yet as I was trying to understand how the task notify system works. I read the manual and looked at the example for xTaskNotifyGive on p. 117.
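To make the counting behaviour concrete, here is a minimal host-side toy model of how the notification value responds to give and take. It is only a sketch: NotifyModel, model_give and model_take are hypothetical names, not FreeRTOS APIs, and the model ignores blocking and scheduling entirely. It only mirrors the documented rule that a give increments the value and that ulTaskNotifyTake returns the value as it was before being cleared (pdTRUE) or decremented (pdFALSE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side toy model only: hypothetical names, NOT FreeRTOS APIs. */
typedef struct {
    uint32_t value;   /* stands in for the task's notification value */
} NotifyModel;

/* Models xTaskNotifyGive() / vTaskNotifyGiveFromISR(): increment the value. */
static void model_give(NotifyModel *n)
{
    assert(n != NULL);
    n->value++;
}

/* Models the return value and exit action of ulTaskNotifyTake().
 * Returns the value seen before it was cleared or decremented.
 * A return of 0 is the case where the real call would block or time out. */
static uint32_t model_take(NotifyModel *n, bool clear_on_exit)
{
    assert(n != NULL);
    uint32_t before = n->value;
    if (before != 0) {
        n->value = clear_on_exit ? 0 : before - 1;
    }
    return before;
}
```

With clear_on_exit set (the pdTRUE case used in printTask), three gives followed by one take return 3 and leave the value at 0, so several printfdma calls made before the task runs produce a single wake-up. Under that assumption the task has to drain everything in the circular buffer each time it wakes.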

I think I might use a FreeRTOS queue to do the transactions in lieu of using my own circular buffer and task notify. However, I would like to use task notify instead of binary semaphores in other deferred interrupt scenarios, as the task notify system is faster and you don’t have to create semaphore objects.

Best regards

Did the task execute at all?

That seems like notify is not working at all for you. Can you remove the DMA stuff and see if you are able to unblock a task from another task? If not, please share the code snippet so that I can give it a try.

Hi Gaurav,

The printfdma function is called from other, higher priority tasks and contains the xTaskNotifyGive(printTaskHandle) call, which should unblock the task.

I will attach the full code module to this post.

The purpose of the code is to provide a non-blocking printf driver for debugging purposes. I have been having a lot of trouble with the blocking printf causing hard faults. I have been using Georges Menie’s printf-stdarg, which is a cut-down version of the standard C printf and sprintf code. It only does integer numbers, not floats or doubles, and is supposed to be a lot faster. I reckon it would be, but I’ve not tested this.

I have re-coded the module to use a static FreeRTOS queue in lieu of my own circular buffer. I would have used a FreeRTOS queue in the first place, but I didn’t realize that there are now static queues until I started browsing the latest manual. I don’t much like dynamic objects because they use the heap, which is a complete mystery to me. All of my FreeRTOS objects in this project must persist, i.e. they must never be destroyed, so static objects aren’t a problem. I also have plenty of memory. My board uses an STM32H7B3I processor with 1 MByte of internal RAM and 15 MByte of external.

I would really like to get task notify working. I currently am using binary semaphores for all of my deferred interrupt handling. They work perfectly, but it’s like using a sledge hammer to crack a nut and the current project I’m working on has speed as its priority.

Best Regards
Rob (4.7 KB)

The code you shared contains your circular buffer and UART and platform code - I removed everything and just kept only necessary parts to test FreeRTOS task notifications. (1.5 KB)

Please try this code. You will need to:

  1. Call initPrintfUart1(); from your main.
  2. Create another task which periodically calls printfdma.

If everything is working as expected, the static variable k in the printTask will keep incrementing. I have verified it and it works for me. Here are my code snippets for main function and additional test task:

osThreadId_t defaultTaskHandle;
const osThreadAttr_t defaultTask_attributes = {
  .name = "defaultTask",
  .stack_size = 128 * 4,
  .priority = (osPriority_t) osPriorityNormal,

int main(void)

   defaultTaskHandle = osThreadNew(StartDefaultTask, NULL, &defaultTask_attributes);

  /* Start scheduler */

  while (1)

void StartDefaultTask(void *argument)

Please try this and let me know if the static variable k in the printTask is incrementing for you.


Thanks, I’ll give it a whirl.

It is a fairly safe bet that this approach will make things worse rather than better. Debug streaming via printf() is not a good idea in the first place, for various reasons that have been discussed here many times.

Using a stripped printf() engine is an improvement (most likely your hard faults were a result of crashing stacks; the stock implementation of printf is a first-class stack hog), but it doesn’t solve the many problems inherent in the printf() streaming approach.

Your approach of making the printf() calls asynchronous brings many more problems down the road. I promise you’ll sink many hours into trying to get that to work.

Best of luck anyways!

Hi GA,

I looked at your code and it prompted me to look at my default task (monitorTask). I found that I had a second loop within the main task loop that was hammering the processor. As the monitorTask had a higher priority than the print task, the print task wasn’t getting a chance to run. If I reversed the priorities it worked.
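The starvation described above can be illustrated with a small host-side model of fixed-priority preemptive scheduling. This is a sketch under stated assumptions, not FreeRTOS code: ModelTask, pick_ready and simulate_print_runs are hypothetical names, the priorities 3 and 1 are made up, and one loop iteration stands in for one scheduling decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model of fixed-priority preemptive scheduling. */
typedef struct {
    int  priority;   /* larger number = higher priority, as in FreeRTOS */
    bool ready;      /* false while the task is blocked */
    int  runs;       /* ticks during which this task ran */
} ModelTask;

/* Pick the highest-priority ready task, or NULL if none is ready. */
static ModelTask *pick_ready(ModelTask *tasks, size_t count)
{
    ModelTask *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].ready && (best == NULL || tasks[i].priority > best->priority)) {
            best = &tasks[i];
        }
    }
    return best;
}

/* Run `ticks` ticks. tasks[0] models the monitor task: if monitor_blocks is
 * true it is blocked on every odd tick (as if it called a delay), otherwise
 * it spins and is always ready. Returns the ticks the print task received. */
static int simulate_print_runs(bool monitor_blocks, int ticks)
{
    ModelTask tasks[2] = {
        { .priority = 3, .ready = true, .runs = 0 },  /* monitor task (assumed priority) */
        { .priority = 1, .ready = true, .runs = 0 },  /* print task, already notified */
    };
    for (int t = 0; t < ticks; t++) {
        tasks[0].ready = !(monitor_blocks && (t % 2 == 1));
        ModelTask *running = pick_ready(tasks, 2);
        assert(running != NULL);
        running->runs++;
    }
    return tasks[1].runs;
}
```

In this model a spinning monitor task leaves the notified print task with zero ticks, while a monitor task that blocks on alternate ticks lets it run in the gaps. The notification was delivered in both cases; the difference is only whether the lower-priority task ever gets the processor.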

I got rid of the silly loop from my monitorTask and et voila! It works fine.

In the end I only used task notify for the printfdma function to notify the task when it’s blocked.
To coordinate with the interrupt I use a delay loop on UART busy. So after sending the UART TX command I wait for it to become ready using the delay loop rather than a semaphore, because the timing isn’t critical. Generally it will only go through the loop once.

I have attached a copy of the final code in case anyone else is interested.

Thanks for your help.


Updated zip - includes small mods to stdarg-alternative to provide an sprintf that takes an argument struct, rather than variable args.

PrintfUART1 DMA (11.3 KB)

Glad that it worked for you.


I have bolted my printfdma into my project and it works fine. Printing out the date and time, which used to cause a hard fault, now works.

I have kept the alternative standard printf that uses the uart in single character mode for my hard fault handler as it is impossible for the DMA version to run when a hard fault occurs.

I am not sure why printfdma using a custom intermediate circular buffer or for that matter a FreeRTOS queue should create big problems if written correctly, and where the task priorities and interrupt priorities are likewise correctly set.

I too thought that the stacks might be the problem when using the alternative single character printf, but all my task stacks are huge (2048 words) and static, and I was only printing “UTC 2022-03-04T13:03:45”, which is only 23 characters. I checked my stacks by filling them with 0xA5A5A5A5 and then looking to see where the used region ended, and they were OK.
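The fill-pattern check described above can be sketched as a small scan over the stack array. This is an illustration only: stack_unused_words is a hypothetical helper, and it assumes a descending stack (as on ARM Cortex-M), so the untouched words are at the low-address end of the array. FreeRTOS also offers uxTaskGetStackHighWaterMark() for the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_WORD 0xA5A5A5A5u

/* Count how many words at the low-address end still hold the fill pattern.
 * Assumes a descending stack, so untouched words sit at the start of the
 * array. A stored value that happens to equal the pattern is counted as
 * unused, so the result is an estimate. */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL_WORD) {
        unused++;
    }
    return unused;
}
```

A result close to zero would suggest the stack is nearly exhausted. A large result, as reported here, points away from stack overflow as the cause of the fault.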

I debated whether to continue debugging further to try to find the cause, but looking at disassembly and debugging through register contents did not seem to me to be a profitable course of action. I don’t know the instruction set of ARM processors very well and I’m no expert on debugging hard faults. And there is nothing worse than trying to sort out a recalcitrant function in the disassembly with no C source code.

I have always wanted to have a crack at a streaming DMA printf so I did, and it seems to work OK.

I have had lots of problems with the gcc standard C libraries. For example, memset also caused hard faults. I googled memset after it continually crashed the machine and found I wasn’t Robinson Crusoe. Quite a few people said don’t use it. I took their advice and wrote my own, which ended my problems. I don’t bother using registers to speed things up; if you enable “Ofast” for your function with a pragma, the compiler does it all for you.

I would appreciate it if you could point me to some of the articles on streaming printf so I can check whether my one has hidden traps and issues.

Best regards


One of the problems with asynchronous printf is that it won’t work. Consider the following code:

unsigned char a_PBuf[x];


You will have to rely on printfasync finishing its output before the calling function returns; otherwise the buffer on the stack goes out of scope and becomes invalid, and garbage is printed. Try it with a longer string and a low baud rate. At a high enough baud rate you can expect it to work most of the time, in particular with small strings, but it’s not guaranteed. Thus, the net gain over a synchronous printf is zip.

Of course you can modify your printf such that it copies the string to some other globally accessible place before kicking off DMA xmit, but it’s easy to see that there will be follow up problems.
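A minimal sketch of that copy-first idea, for illustration only: the text is formatted into a local buffer and the bytes are copied into a statically allocated ring before the call returns, so nothing that a later transfer reads lives on the caller's stack. Ring, ring_printf, ring_read and the sizes are hypothetical, and the sketch has no locking, which a real multi-task version would need.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 256u   /* hypothetical capacity */

typedef struct {
    char   data[RING_SIZE];
    size_t head;    /* next write index */
    size_t tail;    /* next read index */
    size_t count;   /* bytes currently stored */
} Ring;

/* Format into a local buffer, then copy the bytes into the ring before
 * returning. Returns the number of bytes queued, or 0 if the message does
 * not fit. Not thread-safe: callers must serialize access themselves. */
static int ring_printf(Ring *r, const char *format, ...)
{
    char local[128];   /* valid only for the duration of this call */
    va_list args;
    assert(r != NULL && format != NULL);

    va_start(args, format);
    int len = vsnprintf(local, sizeof local, format, args);
    va_end(args);

    if (len < 0) {
        return 0;
    }
    if ((size_t)len >= sizeof local) {
        len = (int)sizeof local - 1;          /* output was truncated */
    }
    if ((size_t)len > RING_SIZE - r->count) {
        return 0;                             /* all-or-nothing policy */
    }
    for (int i = 0; i < len; i++) {
        r->data[r->head] = local[i];
        r->head = (r->head + 1u) % RING_SIZE;
    }
    r->count += (size_t)len;
    return len;
}

/* Drain up to max bytes, as a print task might before starting a transfer. */
static size_t ring_read(Ring *r, char *out, size_t max)
{
    assert(r != NULL && out != NULL);
    size_t n = 0;
    while (n < max && r->count > 0) {
        out[n++] = r->data[r->tail];
        r->tail = (r->tail + 1u) % RING_SIZE;
        r->count--;
    }
    return n;
}
```

The follow-up problems mentioned above still apply: the ring can fill up, so some drop-or-block policy is needed, and concurrent callers need a critical section or mutex around the copy.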

The more general problem with debug printf (regardless of whether sync or async) is of course the Heisenberg effect: your debug code will change the runtime behavior of your system so drastically that a good number of release problems (in particular those related to timing) will not show anymore in your debug version; conversely, you will spend many hours trying to pinpoint issues introduced BY the debug engine itself (you have been there). Also, the footprint of your code will bloat tremendously. I’ve seen systems where Flash memory that would easily have sufficed to accommodate the target’s functionality ran out because of literally hundreds of printf("Encountered an unexpected packet at time %d:%d:%d:%d") messages.

I won’t complain because I make my money from companies that ask me to find problems in their code. I can’t remember how many of those problems (implying revenue for me) stemmed from well-intended debugging code.

And yes, ARM code can be hard to decipher, in particular when optimization is in effect (the differences in footprint and performance between optimization levels can be close to astronomical). I personally prefer learning intricacies like those to trying to get debug crutches to work reliably…

But that’s not meant to be derogatory or anything. There just isn’t a perfect solution in our job; you always abandon one easy or seemingly straightforward solution in favor of another requirement. Isn’t that what engineering is all about?

Best of luck, again!


Revised printfdma with recursive mutex control of the printfdma function and a couple of fixes.
PrintfUART1 DMA (11.2 KB)

Hi Rob,

I briefly looked over your code. I believe that there is a potential serialization problem. All tasks calling printfdma are serialized against each other via a mutex, which is right. However, assume 10 task priorities. Assume next that a task A with priority 7 requests an output, claiming the mutex. Now assume that the printfdma routine suspends A, allowing C with priority 1 to run. It in turn tries to output something and gets suspended on the mutex. Now task B at priority 5 gets woken up, say by an IRQ handler, tries to output something, and gets suspended on the mutex…

Finally A finishes, releases the mutex… and the scheduler will allow B to get the mutex which means that in the end, you will see outputs in the order A, B, C which is NOT the timing order the outputs occurred in, I would expect A, C, B!
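The hand-over in this scenario can be modelled in a few lines. This is a host-side sketch only: Waiter and next_owner are hypothetical names, and the selection rule (highest priority first, longest-waiting among equal priorities) is the behaviour assumed here for tasks blocked on the same mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of tasks blocked on a mutex. */
typedef struct {
    char name;       /* 'A', 'B', 'C' as in the scenario above */
    int  priority;   /* larger number = higher priority */
    int  arrival;    /* order in which the task started waiting */
} Waiter;

/* Choose the waiter that gets the mutex next: highest priority first,
 * earliest arrival among equal priorities. Returns NULL if nobody waits. */
static const Waiter *next_owner(const Waiter *w, size_t count)
{
    assert(w != NULL);
    const Waiter *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (best == NULL
            || w[i].priority > best->priority
            || (w[i].priority == best->priority && w[i].arrival < best->arrival)) {
            best = &w[i];
        }
    }
    return best;
}
```

With C (priority 1) waiting first and B (priority 5) waiting second, the model hands the mutex to B, which reproduces the A, B, C output order described above even though C asked first.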

Is that what you want, or am I missing something?

Hi RAc,

Yes you are correct. Mis-ordering can occur, however printfdma is only for debugging one or two tasks at a time, mostly just the one, or for tasks that don’t run concurrently. I use compiler directives to turn them on or off in the various places in the code.

The problem I had was that I was debugging code for an Inventek WiFi modem using SPI. The alternative printf seemed to be upsetting the timing and causing hard faults, i.e. with it turned off the hard faults went away and the system worked.

I did the printfdma version and that seemed to fix the issue.

It turned out that the root cause of the problem wasn’t printf; it was me. I hadn’t initialized some variables, which resulted in memory access faults. I found this out when I restarted from power-up rather than from the debugger.

Anyway if you have any ideas on a fix let me know. I could have used FreeRTOS queues rather than my own circular buffer. Unlike my circular buffer you can add entries to FreeRTOS queues to either the head or tail. Along with other things this might get around the ordering problem.

Printfdma seems to work very well and it is very fast; adding a typical message to the circular buffer only takes 5 to 15 µs, compared with 3 or 4 milliseconds for our old friend printf running on a UART at 115200 baud. This was for a string 26 characters long with 6 numerical formats. I measured it by toggling an I/O pin and timing it on my scope.
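As a rough cross-check of the blocking figure, the time a UART needs to shift out a message follows directly from the baud rate. The helper below is a hypothetical sketch that assumes 8N1 framing, i.e. 10 bits on the wire per character; the framing actually used is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to shift `chars` characters out of a UART.
 * bits_per_char is 10 for 8N1 framing (1 start + 8 data + 1 stop).
 * The result is rounded down. */
static uint32_t uart_tx_time_us(uint32_t chars, uint32_t bits_per_char, uint32_t baud)
{
    assert(baud != 0);
    uint64_t bits = (uint64_t)chars * bits_per_char;
    return (uint32_t)((bits * 1000000u) / baud);
}
```

Under that assumption, 26 characters at 115200 baud take about 2.26 ms on the wire. If the 26 characters refers to the format string, the expanded output with six numbers would be longer, which would be consistent with the measured 3 to 4 ms.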

Best regards