Relocation and binary angst

A few weeks back I was testing out the flash controller for an NXP chip I had lying around and ran into unexpected issues. On chips with on-board flash memory it’s typical for the chip’s ROM to include functions that user code can call to modify flash. On bigger systems you would tend to have an actual external controller for access to disk and memory space, but on smaller things like a Cortex-M it is generally just included in the chip itself. In the case of this chip, the ROM entry point is hardcoded at 0x1FFF1FF1 in memory and it is possible to call it via a function pointer like so:

void (*flash_cmd)(uint32_t *input, uint32_t *output) = (void *)0x1FFF1FF1;

This should have been a simple case, but the following code would often result in a hardfault:

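uint32_t flash_read_partid(void)
{
    uint32_t cmd[5], ret[4];
    __disable_irq();
    cmd[0] = 54;            /* ROM command code for reading the part ID */
    flash_cmd(cmd, ret);    /* jump into ROM through the pointer */
    __enable_irq();
    return 0;               /* the result in ret is ignored in this test */
}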

At the time it didn’t make a lot of sense to me. The address stored in flash_cmd was correct, and the cmd and ret arrays on the stack were both valid. But every time the function was called the system would fault with a seemingly random address in the PC register. As it turns out, this has nothing to do with the flash controller at all and everything to do with how data is stored in binaries and how code is actually executed in modern CPUs.

To start, I suppose I should go into binary formats a bit. On OS X programs are generally compiled into the Mach-O format, on Windows the PE format, and on Linux the ELF format. When working with cross-compiled toolchains and embedded systems, we typically generate a final flat binary that is simply the code laid out with no metadata, but before it reaches that point it is usually an ELF binary. ELF can have dozens of sections, but the four most important are:

text: code is stored in this section
rodata: variables that are constant and immutable
data: values for initialized variables are stored here
bss: variables that are initialized to zero or have no initializer

To put it simply, your code is in the .text section and your variables are in a mixture of the other three depending on their type. My flash_cmd above is a writable pointer with an initial value, so it is stored in data. Had it been declared const, it would typically land in rodata instead.
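As a rough illustration of that split, here is how a typical GCC-style toolchain would usually place a few globals. Every name other than flash_cmd is made up for this example, and the exact placement can vary with compiler flags and the linker script, so treat the comments as the common case and not a guarantee.

```c
#include <stdint.h>

/* .text: the machine code of every function ends up here */
uint32_t add_one(uint32_t x)
{
    return x + 1;
}

/* .rodata: const objects with static storage, never written at runtime */
const uint32_t magic = 0xCAFEF00Du;

/* .data: initialized and writable, so the initial value has to reach RAM */
uint32_t counter = 42;

/* .data as well: a writable pointer with a non-zero initializer */
void (*flash_cmd)(uint32_t *input, uint32_t *output) =
    (void (*)(uint32_t *, uint32_t *))(uintptr_t)0x1FFF1FF1u;

/* .bss: zero-initialized objects; usually only their size is recorded */
uint32_t scratch[16] = {0};
```

The point to take away is that counter and flash_cmd both carry an initial value that must exist in RAM before the first line of ordinary code touches them.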

When you compile a binary to something like ELF it contains metadata telling the operating system to load sections of it at certain memory addresses. On the NXP chip I’m using, code is stored starting at the address 0x00000000 and memory starts at 0x10000000. As an ELF file my binary would have told the system that data and bss need to be located at 0x10000000 (the start of memory) and that code itself would be at 0x00000000. A flat binary carries none of that metadata, and there is no loader on the chip to act on it anyway. With no virtual memory or relative memory addressing, the program will look for its data at 0x10000000 even though the initial values actually sit in the 0x00000000 code space, and bad things happen. It has the same effect as casting a random memory address to a function pointer and crossing your fingers. For this reason you need to make sure everything is in the right place before it is needed. If we look at the symbol listing for the binary we can see flash_cmd is placed properly at 0x10000000, but we’re still getting a junk jump and causing a hardfault when accessing it.

000005bc 00000024 T putchar
000005e0 00000011 r hex_tbl
0000071c R __text_end__
10000000 D __data_start__
10000000 00000004 d flash_cmd
10000004 B __bss_start__
10000004 D __data_end__
10000004 00000400 b cm3_stack
10000404 B __bss_end__
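One way to read this listing: the addresses are where the code expects each symbol at runtime, not where its bytes sit in the flat image. Assuming the initial values of .data are stored in flash directly after the code, starting at __text_end__, the flash location of a variable's initial value can be worked out as below. The macro and function names are invented for this sketch.

```c
#include <stdint.h>

/* Values taken from the symbol listing above */
#define TEXT_END    0x0000071Cu  /* __text_end__: end of code and rodata in flash */
#define DATA_START  0x10000000u  /* __data_start__: runtime address of .data in RAM */

/* Hypothetical helper: map a runtime (RAM) address inside .data to the
   flash address that holds its initial value in the flat image. */
uint32_t data_load_address(uint32_t runtime_addr)
{
    return TEXT_END + (runtime_addr - DATA_START);
}
```

For flash_cmd at 0x10000000 this gives 0x0000071C, so under that assumption the four bytes holding 0x1FFF1FF1 sit in flash at 0x71C through 0x71F until something copies them into RAM.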

To better explain why this is happening it’s easiest to take a look at the linker script:

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(cm3_start)

MEMORY {
    FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 32K
    RAM (rwx)   : ORIGIN = 0x10000000, LENGTH = 8K
}

SECTIONS {
    /* straightforward, put the vector table in front and lay out text/rodata first */
    .text : {
        . = ALIGN(4);
        KEEP (*(.text.vector_table))
        KEEP (*(.text.vector_table_platform))
        *(.text)
        *(.text.*)
        . = ALIGN(4);
    } >FLASH

    .rodata : {
        . = ALIGN(4);
        *(.rodata)
        *(.rodata.*)
        . = ALIGN(4);
        __text_end__ = . ;
    } >FLASH

    /* This represents initialized memory values.
       place data at __text_end__ but link so that references are in ram */
    .data : AT(__text_end__) {
        . = ALIGN(4);
        __data_start__ = . ;
        *(.data)
        *(.data.*)
        . = ALIGN(4);
        __data_end__ = . ;
    } >RAM

    /* data to clear on boot (ex: the stack) */
    .bss : {
        . = ALIGN(4);
        __bss_start__ = . ;
        *(.bss)
        *(.bss.*)
        . = ALIGN(4);
        __bss_end__ = . ;
    } >RAM
}

There’s a lot going on here, but the important details are that the MEMORY block lists two regions and that the sections I mentioned earlier all appear. You’ll also see symbols I’m creating, like __data_start__, and they’re the key to the whole thing. Since everything in the flat binary is laid out in one place, these symbols tell me where the initial data of the executable is stored and where it is supposed to live. Then at runtime I can copy from the stored location to the actual address in RAM so that the values can be accessed without causing faults and bad jumps. If this isn’t done, the memory will not contain the properly initialized values when it is accessed, and issues like the one I was seeing will appear. An example of how to do this is in the first function PC points to at boot:

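#include <stdint.h>

/* symbols created by the linker script; only their addresses are meaningful */
extern uint32_t __text_end__, __data_start__, __data_end__;
extern uint32_t __bss_start__, __bss_end__;

void cm3_start(void)
{
    /* copy the initial values of .data from flash into RAM */
    uint32_t *src = &__text_end__, *dest = &__data_start__;
    while (dest < &__data_end__)
        *dest++ = *src++;

    /* zero the .bss section */
    uint32_t *bss = &__bss_start__;
    while (bss < &__bss_end__)
        *bss++ = 0;

    platform_init();
    main();
}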

Hopefully it’s straightforward with the earlier explanation in mind. Another detail to note is that I am zeroing out the bss section. This is because you cannot assume memory is cleared on boot, whereas values in bss are expected to be initialized to zero. For this reason the entire section is cleared by hand.
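To see the effect without hardware, the same two loops can be run on a desktop machine against plain arrays standing in for flash and RAM. This is only a simulation under my own assumptions: the array names, the sizes, and the 0xDEADBEEF junk pattern are invented, and on the real chip the boundaries come from the linker symbols instead.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_WORDS 1u  /* one word of .data: the flash_cmd pointer value */
#define BSS_WORDS  4u  /* a small stand-in for .bss */

/* Stand-in for the .data initial values stored in flash after the code */
static const uint32_t fake_flash_data[DATA_WORDS] = { 0x1FFF1FF1u };

/* Stand-in for RAM: .data first, then .bss */
static uint32_t fake_ram[DATA_WORDS + BSS_WORDS];

/* RAM contents are not guaranteed at power-on, so fill it with junk */
void fake_power_on(void)
{
    for (size_t i = 0; i < DATA_WORDS + BSS_WORDS; i++)
        fake_ram[i] = 0xDEADBEEFu;
}

/* The same two loops as the startup function, run against the arrays */
void fake_startup(void)
{
    const uint32_t *src = fake_flash_data;
    uint32_t *dest = fake_ram;
    uint32_t *data_end = fake_ram + DATA_WORDS;
    uint32_t *bss_end = data_end + BSS_WORDS;

    while (dest < data_end)
        *dest++ = *src++;

    uint32_t *bss = data_end;
    while (bss < bss_end)
        *bss++ = 0;
}

/* What the program would read as the value of flash_cmd */
uint32_t fake_flash_cmd_value(void)
{
    return fake_ram[0];
}

/* Read one word of the simulated .bss */
uint32_t fake_bss_word(size_t i)
{
    return fake_ram[DATA_WORDS + i];
}
```

Calling fake_power_on() and then reading fake_flash_cmd_value() gives the junk pattern, which is roughly the situation my code was in when it jumped through flash_cmd. After fake_startup() the same read returns 0x1FFF1FF1 and every bss word reads as zero.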

In the end, my flash issues were because I had incorrectly assigned one of my linker symbols for relocation. But like most things in embedded, figuring out the problem is usually a great learning experience.