Implementation of QFP-lib for Cortex M0 MCUs

o_lampe · September 21, 2023, 10:19am

In the hoverboard thread I started to investigate using a smaller math library for Cortex-M0 MCUs, called qfplib.
I want to use a batch file to locate all the replaceable math functions in the simpleFOC libraries.
So far I have identified some of them, but the basic qfp routines are still unmatched.
(see end of file)

See the attached document for more and feel free to add the missing pieces

/* Single precision versions of ANSI functions.  */

extern float atanf (float);
extern float cosf (float);	//	float qfp_fcos(float x);
extern float sinf (float);	//	float qfp_fsin(float x);
extern float tanf (float);	//	float qfp_ftan(float x);
extern float tanhf (float);
extern float frexpf (float, int *);
extern float modff (float, float *);
extern float ceilf (float);
extern float fabsf (float);
extern float floorf (float);

#ifndef _REENT_ONLY
extern float acosf (float);
extern float asinf (float);
extern float atan2f (float, float);	//	float qfp_fatan2(float y,float x);
extern float coshf (float);
extern float sinhf (float);
extern float expf (float);	//	float qfp_fexp(float x);
extern float ldexpf (float, int);
extern float logf (float);	//	??? float qfp_fln(float x);
extern float log10f (float);
extern float powf (float, float);
extern float sqrtf (float);	//	float qfp_fsqrt(float x); or  float qfp_fsqrt_fast(float x);
extern float fmodf (float, float);
#endif /* ! defined (_REENT_ONLY) */

/* Other single precision functions.  */

extern float exp2f (float);
extern float scalblnf (float, long int);
extern float tgammaf (float);
extern float nearbyintf (float);
extern long int lrintf (float);
extern long long int llrintf (float);
extern float roundf (float);
extern long int lroundf (float);
extern long long int llroundf (float);
extern float truncf (float);
extern float remquof (float, float, int *);
extern float fdimf (float, float);
extern float fmaxf (float, float);
extern float fminf (float, float);
extern float fmaf (float, float, float);

extern float infinityf (void);
extern float nanf (const char *);
extern float copysignf (float, float);
extern float logbf (float);
extern int ilogbf (float);

extern float asinhf (float);
extern float cbrtf (float);
extern float nextafterf (float, float);
extern float rintf (float);
extern float scalbnf (float, int);
extern float log1pf (float);
extern float expm1f (float);

#ifndef _REENT_ONLY
extern float acoshf (float);
extern float atanhf (float);
extern float remainderf (float, float);
extern float gammaf (float);
extern float lgammaf (float);
extern float erff (float);
extern float erfcf (float);
extern float log2f (float);
extern float hypotf (float, float);
#endif /* ! defined (_REENT_ONLY) */

// unmatched qfp-routines

float qfp_fadd(float x,float y);
float qfp_fsub(float x,float y);
float qfp_fmul(float x,float y);
float qfp_fdiv(float x,float y);
float qfp_fdiv_fast(float x,float y);
int   qfp_float2int(float x);
int   qfp_float2fix(float x,int y);
unsigned int   qfp_float2uint(float x);
unsigned int   qfp_float2ufix(float x,int y);
float qfp_int2float(int x);
float qfp_fix2float(int x,int y);
float qfp_uint2float(unsigned int x);
float qfp_ufix2float(unsigned int x,int y);
int   qfp_fcmp(float x,float y);

Candas1 · September 21, 2023, 12:05pm

It’s not as simple as that lol
Example for the multiplication:

the source code uses the * operator
gcc replaces * with a call to the helper function __aeabi_dmul or __aeabi_fmul, depending on the data type, because you have no FPU
the nano-float library you shared defines a __wrap___aeabi_fmul function that calls the qfplib qfp_fmul function, and adds the -Wl,-wrap,__aeabi_fmul linker option so that calls to __aeabi_fmul are redirected to __wrap___aeabi_fmul

If * could be directly mapped to qfp_fmul with some gcc or linker magic, probably you wouldn’t have to change the source
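A minimal sketch of that wrapping mechanism, as I understand it. With GNU ld, --wrap=sym resolves undefined references to sym to __wrap_sym. The USE_QFPLIB switch and the host stand-in are assumptions added only so the snippet compiles on a PC; they are not part of qfplib or nano-float.

```c
/* Sketch only: forwarding the soft-float multiply helper to qfplib.
 * Assumed link flag on the target: -Wl,--wrap=__aeabi_fmul */

#ifdef USE_QFPLIB                       /* hypothetical build switch */
float qfp_fmul(float x, float y);       /* provided by the qfplib assembly file */
#else
/* Host-only stand-in so the sketch builds without qfplib.
 * Do not use it on an FPU-less target: x * y would itself be compiled
 * into a call to __aeabi_fmul and recurse. */
static float qfp_fmul(float x, float y) { return x * y; }
#endif

/* With --wrap=__aeabi_fmul, the calls gcc emits for a float '*' are
 * resolved to this function instead of libgcc's __aeabi_fmul. */
float __wrap___aeabi_fmul(float x, float y)
{
    return qfp_fmul(x, y);
}
```

The source using * stays untouched; only the link step changes.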

Ideally if this works, we could save memory and increase speed on Cortex M0 and M3, without touching SimpleFOC. This is already used with Cortex M0+ for RP2040 it seems.
I think we should only be careful not to use this for the precise sensor angle calculation because it needs a double.
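One plausible reason for the double (my assumption, the thread does not spell it out): a float has a 24-bit mantissa, so once an accumulated angle gets large, small increments are rounded away. The function names below are made up for the illustration.

```c
/* Adds a small step to a large accumulated angle in single precision.
 * The volatile store forces the result to be rounded to float. */
float add_step_float(float angle, float step)
{
    volatile float sum = angle + step;
    return sum;
}

/* Same operation in double precision. */
double add_step_double(double angle, double step)
{
    return angle + step;
}
```

Near 1,000,000 rad the spacing between adjacent floats is 0.0625 rad, so a 0.01 rad step is lost in the float version and kept in the double version.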

o_lampe · September 21, 2023, 2:31pm

But I can replace sinf, cosf, etc. functions directly, like you did with atan2?
Maybe that’s already a big step forward…

I found that the STM32 package uses a dedicated arm_cortexM0l_math.lib (Cortex-M0 / Cortex-M0+, little endian) and also has a readme on how to compile a new library with cmake (or replace the existing one?):
`…\AppData\Local\Arduino15\packages\STMicroelectronics\tools\CMSIS\5.7.0\CMSIS\DSP\README.md`
But that’s one step too far?

Problem is, that *variable_name would be mapped too. But it has a totally different meaning.
Same thing with var++ or var--

Candas1 · September 21, 2023, 2:39pm

Atan2 is a good example, it’s in simplefoc and the imu library.

From my side I need to investigate this:
“arm-none-eabi-objcopy” “--redefine-sym”

o_lampe · September 21, 2023, 2:58pm

Check this:
--extract-symbol generates an object file with only symbol data (which we can fill with our own stuff?)

Candas1 · September 21, 2023, 3:00pm

Yes I was thinking about something like that.

Another rabbit hole

o_lampe · September 21, 2023, 3:11pm

…LOL You can unwrap your new LCR-meter after christmas.

runger · September 21, 2023, 10:54pm

Are you sure you can wrap occurrences of float multiplication? I would expect the compiler to inline this…

Anyways, it sounds like you’d have to get pretty deep into the linking process and change a lot of stuff.

At that point, might it not be simpler to create a copy of the gcc-arm-eabi tools that actually has __aeabi_fmul implemented differently?

Candas1 · September 22, 2023, 5:31am

No, I think the functions that emulate floating point are too big and are not inlined; that's why this library seems to be able to replace those math functions.

I thought about either modifying the functions in gcc library, or renaming the function in Qfplib, which is what the linker functions might be doing anyway but after compilation.

Multiplication is just an example; whatever works for multiplication should work for the other operations.

Candas1 · September 22, 2023, 7:12am

My aim is not to change anything in SimpleFOC.
Otherwise it would look like that, and even the core FOC implementation would become hardware-specific.

@runger does it look familiar ?

o_lampe · September 22, 2023, 7:33am

I wouldn’t mind having a simpleFOC branch for Cortex M0 & M3 MCUs.

IMHO there should also be an 8-bit simpleFOC branch. Because the ongoing development will no longer work on them. It will probably be a dead branch with outdated sFOC-version, but at least it will fit.

Candas1 · September 22, 2023, 8:25am

Anyone is free to create his own branch and support it.
I have my own SimpleFOC branch for gd32, but it’s only for PWM and current sense driver, so I can just synchronize it with most of the new changes coming in the main SimpleFOC.

A branch with Qfplib will be a lot of work to keep in sync with upstream changes, or a dead branch as you said.

So no new features, no bug fixes, no support.

Candas1 · September 22, 2023, 2:43pm

It looks like the Raspberry Pi Pico (M0+ chip) has its own __aeabi_fmul in float_aeabi.s

Candas1 · September 22, 2023, 4:59pm

I tried something dirty.
Using the dev branch of SimpleFOC in foc_current mode, I added Qfplib_m3 files.
I renamed the functions in the .S file:

qfp_fadd to __aeabi_fadd
qfp_fsub to __aeabi_fsub
qfp_fmul to __aeabi_fmul
qfp_fdiv to __aeabi_fdiv
qfp_fatan2 to atan2

It complained about multiple definitions, so I used the option -Wl,--allow-multiple-definition.
It worked because the qfplib definitions always came first.
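A variant that leaves the .S file untouched would be to define the AEABI helper names in C and forward them to qfplib. This is only a sketch and not what I did above: it costs an extra call per operation, and whether the linker still needs --allow-multiple-definition with it is untested. USE_QFPLIB and the host stand-ins are assumptions so the snippet builds on a PC.

```c
#ifdef USE_QFPLIB                       /* hypothetical build switch */
float qfp_fadd(float x, float y);       /* provided by the qfplib assembly file */
float qfp_fsub(float x, float y);
float qfp_fmul(float x, float y);
float qfp_fdiv(float x, float y);
#else
/* Host-only stand-ins; on an FPU-less target these would recurse into
 * the helpers defined below, so never enable them there. */
static float qfp_fadd(float x, float y) { return x + y; }
static float qfp_fsub(float x, float y) { return x - y; }
static float qfp_fmul(float x, float y) { return x * y; }
static float qfp_fdiv(float x, float y) { return x / y; }
#endif

/* Helper names gcc calls for float +, -, *, / when there is no FPU. */
float __aeabi_fadd(float x, float y) { return qfp_fadd(x, y); }
float __aeabi_fsub(float x, float y) { return qfp_fsub(x, y); }
float __aeabi_fmul(float x, float y) { return qfp_fmul(x, y); }
float __aeabi_fdiv(float x, float y) { return qfp_fdiv(x, y); }
```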

Before:
RAM: [= ] 9.2% (used 4536 bytes from 49152 bytes)
Flash: [=== ] 27.1% (used 71124 bytes from 262144 bytes)
loopfoc = 360us

After:
RAM: [= ] 9.2% (used 4536 bytes from 49152 bytes)
Flash: [=== ] 27.8% (used 72908 bytes from 262144 bytes)
loopfoc = 312us

So it increased the size, probably because it needs both Qfplib and libgcc now.
But it made loopfoc faster.
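To put numbers on the comparison, here is a small helper pair (names made up for the illustration) that works out the differences from the figures above.

```c
/* Flash growth in bytes between two builds. */
long flash_delta_bytes(long before, long after)
{
    return after - before;
}

/* Loop-time reduction in tenths of a percent (integer math, rounds down). */
long speedup_permille(long before_us, long after_us)
{
    return (before_us - after_us) * 1000 / before_us;
}
```

With the values from this test, flash_delta_bytes(71124, 72908) gives 1784 bytes of extra flash, and speedup_permille(360, 312) gives 133, i.e. loopfoc is about 13.3% shorter.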

o_lampe · September 23, 2023, 8:48am

I guess that’s where nano-float comes into play? It provides the math functions qfplib doesn’t have and allows skipping libgcc completely.

o_lampe · September 23, 2023, 9:04am

It’s a bit off-topic, but it aims at the same goal: reducing flash usage.
I was crawling through the sFOC source code and found some code that doesn’t make sense to run at every start.
We are checking if “pins are configured twice” or if the “timers are the best match”.
That makes no sense on all-in-one boards IMHO, because the driver and sensor pins are already hardwired.
I’d like to remove these functions, the same way I can comment out motor.monitoring.

Furthermore I was thinking to split the program in two pieces:

run a program that only tests the configuration.
It contains all the pin mappings, init routines and commander declarations.
This program writes a board_init file, which we can include in the main program.
the main program would only have to read the board_init file and can focus on initFOC and the main loop.
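A rough sketch of what the first program could emit, assuming the board_init file is a C header made of #define lines. The struct, the field names and the BOARD_* macros are all hypothetical, nothing like this exists in sFOC today.

```c
#include <stdio.h>

/* Hypothetical result of the one-off configuration test program. */
struct board_config {
    int pwm_pin_a;
    int pwm_pin_b;
    int pwm_pin_c;
    int pole_pairs;
};

/* Writes the configuration as #define lines into buf.
 * Returns the number of characters written (without the terminator),
 * or -1 if buf is too small. */
int write_board_init(char *buf, size_t len, const struct board_config *cfg)
{
    int n = snprintf(buf, len,
                     "#define BOARD_PWM_A %d\n"
                     "#define BOARD_PWM_B %d\n"
                     "#define BOARD_PWM_C %d\n"
                     "#define BOARD_POLE_PAIRS %d\n",
                     cfg->pwm_pin_a, cfg->pwm_pin_b, cfg->pwm_pin_c,
                     cfg->pole_pairs);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return n;
}
```

The main program would then just include the generated header and skip the pin and timer checks.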

Candas1 · September 23, 2023, 9:07am

Exactly, it completely skips libgcc.
Nano-float is based on the qfplib variant for Cortex-M0 and focuses on small size (tiny).
A Nano-float-m3 would help.

But I am not sure why they wrap the function names if they replace the library fully.

Candas1 · September 23, 2023, 9:41am

Have you also looked at the minimal branch ?
It’s probably more to reduce the project size than the binary size though.

o_lampe · September 23, 2023, 9:54am

That’s a very good tip.
It will reduce the number of files I would have to crawl through if I wanted to include qfp-math manually.
It would also be easier to remove the useless parts for AIO-boards (as mentioned above)
In the end, it’s all in one directory and can’t be confused with other setups.

o_lampe · September 23, 2023, 10:37am

Maybe it’s because the minimal branch uses sFOC V2.01, but I compiled the example-file STM32_hall.ino including the commander (which is a lot different than the current version) and it uses only 89% of flash.
With the current version, I am almost 4k short of flash-mem and can’t use the commander.