Weird behavior on RISC-V QEMU 'virt' ports

Hello everyone! Hope you’re all doing well.

Lately I’ve been trying to develop real-time software on RISC-V (taking the official FreeRTOS QEMU demo ports as a starting point), but some very weird issues got in the way and made me decide to ask for help on the related forums.

TL;DR

FreeRTOS forums won’t allow me to paste the repository link here, but it basically goes like this: the program executes fine when a certain code snippet is encapsulated within a function, but “crashes” (i.e. hangs) when the same snippet is placed directly in the main code:

for(int i=0; i < NUMBER_OF_ITEMS; i++){
    createAndPushItem(i);

    // the function above does the exact same thing as the commented code below
    // yet, the commented code does not work and will crash the program. but why??
        
    // int index = priorities[i];                                               
    // void *value = (void *) getValue(i + 1);
    // LinkedListItem_t *item = createItem(index, value);                          
    // if(item){
    //     push(item, &list);
    // }
}

The scope shouldn’t matter at all here, since there is no local variable being used or anything like that. Also, for simplicity the sample code doesn’t even use FreeRTOS’ tasks or scheduler, just the pvPortMalloc/vPortFree functions.

Context

It all started with me deciding to use Robin Kase’s ESFree library to add the EDF scheduler to the project, but the code didn’t work out of the box on either of the ‘virt’ ports. It ran OK on the ARM ports for QEMU, though, so I figured it must be some incompatibility issue on the RISC-V side. While investigating, I found the problem seemed to be related to the linked list API, as the List_t uxNumberOfItems variable just seemed to go crazy at a certain point of the code that wasn’t even supposed to interact with it.

I then decided to create my own library for managing linked lists as a workaround and created a whole new sample project to test it, but that attempt also went sideways with a new bug: the code would execute just fine when a certain snippet was placed inside a dedicated function, but not when it was placed directly in the main code. That is what I describe in the TL;DR section above.

This doesn’t really seem to me like a problem within FreeRTOS since, as I said, there are no schedulers or tasks being used. The only resources provided by FreeRTOS which I actually use in the sample project are pvPortMalloc and vPortFree, but the problem happens anyway with regular malloc (as shown below). I can’t really cross FreeRTOS out of the equation, though, since the whole port setup (e.g. libraries, assembly files and makefile) has been provided by it.

Workarounds

These were my attempts at solving the problem so far:

  1. Tried using the other FreeRTOS RISC-V port meant for QEMU (RISC-V-Qemu-virt-GCC), but the problem persists.
  2. Tried building the project without compiler optimization (using -O0 instead of -Os in the makefile). The output is even worse, with the execution hanging even before printing the list items for the first time.
  3. Tested with other heap files provided by FreeRTOS (heap_1, heap_3 and heap_4). No success whatsoever.
  4. Tested with regular malloc() and free() from stdlib.h. Output shows same behavior as compiling without optimization (i.e. prints nothing on the terminal).

I suppose the next logical step would be to create my own RISC-V port for other QEMU-supported boards such as the SiFive HiFive Unleashed, but I am not very confident about this mission since I have little experience with developing ports for emulators, and there don’t seem to be many good guides about it online. I am familiar with many programming languages, but FreeRTOS/QEMU/compiler development and debugging are definitely out of my league. That is why I’m here asking for help from the clever ones.

I’ll be posting this issue on the RISC-V toolchain and QEMU forums as well. Hopefully someone can help me out on this quest.
Thanks in advance.

Might be an optimization issue related to your counter variable. You could try building your inlined code with optimizations disabled and see if you still see the problem then.

You should now be able to post the link. Please share it.

This is the repository with the sample project if anyone’s interested (hopefully the forum will allow me to include it here). It contains a more detailed description and instructions on how to install, build and execute.

You mean changing -Os for -O0 on the compiler flags and keeping the inline code around instead of the function call? If so, I’ve already done that (and it sadly didn’t work).

I also tried forcing the function inline with __attribute__((always_inline)) and the result was, unsurprisingly, the same as placing the snippet directly (i.e. still not working).

ok, sorry, hadn’t seen that in your first post.

Probably not directly related to your problem, but your getValue() function does not look right. You try to free a null ptr if the alloc fails.

Also, you assume that all of your allocations return success, never check the return values. Is there an implicit check somewhere?
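As a minimal sketch of the kind of check being asked about here: the names `ExampleItem_t`, `exampleGetValue` and `exampleCreateItem` are hypothetical stand-ins (the real `LinkedListItem_t`, `getValue` and `createItem` are in the repository and may look different), and standard `malloc`/`free` are used in place of `pvPortMalloc`/`vPortFree`. Every allocation result is tested before it is used, so a failure shows up as a NULL return instead of a later crash.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical item layout; the real LinkedListItem_t may differ. */
typedef struct ExampleItem {
    int index;
    void *value;
    struct ExampleItem *next;
} ExampleItem_t;

/* Allocates an int holding n. Returns NULL if the allocation fails. */
static void *exampleGetValue(int n)
{
    int *value = malloc(sizeof *value);
    if (value == NULL) {
        return NULL; /* nothing was allocated, so nothing to free */
    }
    *value = n;
    return value;
}

/* Builds an item around value. Returns NULL if value is NULL or if the
 * item itself cannot be allocated; the caller keeps ownership of value. */
static ExampleItem_t *exampleCreateItem(int index, void *value)
{
    if (value == NULL) {
        return NULL;
    }
    ExampleItem_t *item = malloc(sizeof *item);
    if (item == NULL) {
        return NULL;
    }
    item->index = index;
    item->value = value;
    item->next = NULL;
    return item;
}
```

With this shape, the existing `if(item)` guard before `push` also covers a failed value allocation, because a NULL value never turns into an item.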

Finally, how large is your startup stack? sprintf() is a notorious stack hog. Of course it would be counterintuitive if a smaller stack frame blew the stack and a larger one succeeded, but you never know…

No problem! I tried to be as straightforward as possible but may have written too much information to digest at first contact anyway.

I understand the possible flaw and might change this code, but I’m not sure it is related to the described problem anyway. The pvPortMalloc() return values are checked after the for loop, when all items on the list are printed (that includes their addresses). The execution hangs after the first print round.

I didn’t know about that. Is there a way to check the startup stack size? Either way, I didn’t mention this above because I thought it was irrelevant, but a) I also tried running the code using regular printf instead of the sprintf + vSendString originally used by the port (I don’t really know why the demo port chooses to emulate an ns16550 UART when printf is available, but there you go) and saw no difference, b) the sample code I posted is not really that big, so I fail to see how this would be a major concern, and c) this exact code works as expected on the ARM ports meant for QEMU.

It should be detectable right from the linker map file.

One easy thing to do would be to manually fill the stack with a signature before doing anything else and then check how much of the signature is left. This is a lot like how the built-in stack overflow checking is done for tasks.
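The signature idea can be sketched roughly as follows. This is only an illustration on a plain byte array: the fill value `0xA5` and the function names are assumptions, and on the real target the region would be the `.stack` section from the linker script, painted early in startup without overwriting the part of the stack already in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* arbitrary signature value (assumption) */

/* Paint the whole region with the signature before it is used. */
static void paintStack(uint8_t *base, size_t size)
{
    memset(base, STACK_FILL_BYTE, size);
}

/* The stack grows downward on RISC-V, so untouched bytes sit at the low
 * end of the region. Count how many signature bytes survive, starting
 * from the lowest address. A result of 0 suggests the stack was used up
 * completely and may have overflowed. */
static size_t countUntouched(const uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return untouched;
}
```

After the program has run (or hung), calling `countUntouched` on the region, or simply inspecting that memory in GDB, shows how close the stack came to its lower limit.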

What happens if you stub out the printf/sprintf altogether?

You mean just removing them from the code? How am I supposed to see any output then?

Anyway, although this might show a possible way of avoiding the problem, I’d rather find an actual solution for it instead :( I’m pretty sure the behavior I described is not expected under any circumstance (RTOS or not). I’ve posted this same issue on the QEMU and RISC-V toolchain forums as well, in the hope that I’ll find answers and help elsewhere.

One new thing to add is that I ran the same code on a Seeed XIAO ESP32-C3, which is RISC-V and based off FreeRTOS (although modified by Espressif) and it worked as intended. That leads me to think the issue might be more related to the compiler or the emulator.

The idea is to positively pinpoint or exclude printf as the source of your problem.

Ok, so I have a new finding that resonates with your theory to some extent.

I have already posted a more comprehensive explanation on the RISC-V toolchain’s GitHub, but in short I ran an assembly-level analysis after learning how to use GDB, and found that the inlined code hangs right after creating and printing the items for the first time because printi - a subroutine of both printf and sprintf - corrupts the pointers to the linked list’s head and tail after executing. More precisely, the tail gets corrupted first, at assembly instruction [printi+98]: sb t3,-1(a5), whereas the head follows a few loop cycles later at that exact same instruction.

So it looks like the first forEach executes entirely because it only reads the head pointer once, at the start. But when the sort function follows, it tries to access the head pointer again - and with a corrupted value, a segmentation fault happens and the handler raises the register values as an illegal instruction. Or at least that’s my understanding.

With that said, I suppose the problem is related to the standard C printi function. If I run the code printing only strings - i.e.

printf("item: whatever");

instead of

printf("item: %d", itemValue);

the linked list pointers are never corrupted. It still may not explain the first problem I reported of encapsulated vs. inline code, but it is a problem that deserves some attention anyway. That apparently settles the toolchain as the guilty party, unless FreeRTOS makes some changes to the printf API that I’m not aware of.

So have you checked on the stack size?

As @RAc already said, you should check that you are not overflowing your stack which is resulting in memory corruption.

The FreeRTOSDemo.map file shows __stack_size = 0x15e. This value is the same regardless of:

  • compiler optimization values (tested: -Os and -O0)
  • code structure (tested: inlined code and encapsulated function)
  • demo files included (tested: leaving and removing all full_blinky demo files from the makefile).

I suppose the latter wouldn’t make a difference because the project in question is already as simple as it gets (hence uses nothing of these demo files). But for the first two I admit I was expecting some sort of change.

In case it’s useful, the FreeRTOSDemo.map file also shows the following with relation to the stack. Again, it’s the same content across the different settings above:

.stack          0x800945fc      0x162
                0x80094600                        . = ALIGN (0x10)
 *fill*         0x800945fc        0x4 
                0x8009475e                        . = (. + __stack_size)
 *fill*         0x80094600      0x15e 
                0x8009475e                        _stack_top = .

One side note: I hadn’t noticed before, but while searching for stack configuration references on the files I stumbled upon these flags on the Makefile:

LDFLAGS += -nostartfiles -Xlinker --gc-sections -Wl,-Map,$(OUTPUT_DIR)/RTOSDemo.map \
           -T./fake_rom.ld -march=rv32imac -mabi=ilp32 -mcmodel=medlow -Xlinker \
           --defsym=__stack_size=350 -Wl,--start-group -Wl,--end-group

--defsym=__stack_size=350 likely explains why the values won’t change (350 = 0x15e). Could it have been responsible for the problem all this time? I tried doubling the value to 700 and suddenly all the problems seemed to be gone. The program worked well with or without compiler optimization, and with both the inlined and encapsulated variants.
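For anyone who wants to double-check the numbers in the map excerpt, the arithmetic can be reproduced with a small sketch. The helper names `alignUp` and `stackTop` are made up for illustration; they only mirror the two steps visible in the map (`ALIGN (0x10)` followed by `. + __stack_size`) and are not taken from the port's linker script.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t alignUp(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Mirror the two steps shown in the map excerpt:
 * align the section start to 0x10, then advance by the stack size. */
static uint32_t stackTop(uint32_t sectionStart, uint32_t stackSize)
{
    return alignUp(sectionStart, 0x10u) + stackSize;
}
```

With the section start `0x800945fc` and a size of 350, `stackTop` gives `0x8009475e`, matching `_stack_top` in the map; the 4 alignment bytes plus `0x15e` account for the `0x162` section length. With 700, and assuming the section start stays the same, the result would be `0x800948bc`.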

Again, this is easy to verify. Fill a section near the top of your stack with a signature and once your system is crashed, check to see whether the signature is still there.

It looks like the static stack size definition was the problem indeed. Increasing the stack size to a greater value (such as 700 in my case) solved the issue for good. Thanks for your help.

Thank you for reporting back!