Improved _sin and _cos calculations

robca · June 26, 2023, 4:53pm

Hello, is there any plan to include something like this https://community.simplefoc.com/t/embedded-world-2023-stm32-cordic-co-processor/3107/81 in SimpleFOC?

I see that in the dev branch the _sin and _cos functions are defined as weak, so they are easy to override. But when I was doing some tests using the CORDIC coprocessor, calling sincos was much faster than individual sin and cos calls, and in the SimpleFOC code sin and cos are always needed at the same time.

Anthony_Douglas · June 26, 2023, 9:25pm

I thought all it did was look stuff up in a table. I have found that the difference between an M4 core and an M0 core in total code execution time is several times greater than clock speed alone would indicate. IDK if the coprocessor helps like that. Seems like using a chip with floating-point hardware, i.e. an M4 core rather than an M0 core, is quite an effective way to speed things up if you need more speed.

robca · June 26, 2023, 9:56pm

Thanks for the reply.

Well, yes, unless you are trying to port SimpleFOC to the existing hoverboard side boards, which use a lowly GD32 with an M3 running at 48MHz. In that case, replacing the processor is not an option.

Of course an M4 would be faster, and a CORDIC-enabled STM32G431 even better. But I need the most efficient code for the M3, and from the tests at the time, Deku’s _sincos() seemed to be the fastest non-coprocessor option. I know I could implement it on my own, but I hoped it would be in the dev tree
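To illustrate why a combined call can beat two separate ones on a core without a coprocessor, here is a minimal sketch of a table-based sincos. It is not Deku's implementation: the names table_sincos and kSineTable, the table size, and the use of linear interpolation are all assumptions made for illustration. The point is only that the scaling of the angle and the index/fraction computation happen once and are shared by both results.

```cpp
#include <array>
#include <cmath>

// All names and sizes here are hypothetical, chosen for illustration.
constexpr int kTableSize = 256;            // entries per full turn (power of two)
constexpr float kTwoPi = 6.28318530718f;

// One full period of sine, filled once at program start.
static const std::array<float, kTableSize> kSineTable = [] {
    std::array<float, kTableSize> t{};
    for (int i = 0; i < kTableSize; ++i) {
        t[i] = std::sin(kTwoPi * i / kTableSize);
    }
    return t;
}();

// Linear interpolation between two neighbouring table entries.
// The bit mask wraps the index around the table in both directions.
static float lerp_table(int idx, float frac) {
    float a = kSineTable[idx & (kTableSize - 1)];
    float b = kSineTable[(idx + 1) & (kTableSize - 1)];
    return a + (b - a) * frac;
}

// Combined sine and cosine: the angle is converted to a table position once,
// and both lookups reuse the same index and fraction.
void table_sincos(float angle, float* s, float* c) {
    float pos = angle * (kTableSize / kTwoPi);  // angle measured in table steps
    float fl = std::floor(pos);
    int idx = static_cast<int>(fl);
    float frac = pos - fl;
    *s = lerp_table(idx, frac);
    *c = lerp_table(idx + kTableSize / 4, frac);  // cos(x) = sin(x + pi/2)
}
```

With 256 entries and linear interpolation the error stays well below 1e-3, which may or may not be enough for a given application; a larger table trades flash for accuracy.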

dekutree64 · June 27, 2023, 12:42am

Yeah, I think you’ll just have to edit the library source yourself to call the combined _sincos. At some point in that other thread it was decided to stick with separate calls for better readability of the code. I partly disagree because it’s such a small change with decent benefit (and the function could have a more verbose name to be clear what it does), but I don’t feel that strongly about it because it’s a drop in the bucket compared to what could be done in a fully speed-optimized rewrite of SimpleFOC.

runger · June 29, 2023, 12:48am

It’s on the list of TODOs. I think if Antun agrees, maybe we will add the _sincos() and make it weak. It seems to be a popular move, and with the nowait CORDIC solution there really are sizeable speed gains to be had.

It’s not yet on the dev branch, but perhaps we can get it there quickly.
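For anyone unfamiliar with the "weak" mechanism mentioned here, a rough sketch of the idea follows. The function name sincos_combined is a placeholder, not the name or signature that ends up in the library, and the attribute syntax assumes a GCC or Clang toolchain. The default version just makes the two separate calls; a project that has something faster can supply its own definition.

```cpp
#include <cmath>

// Hypothetical name and signature, for illustration only.
// Default implementation: two separate calls. Because it is marked weak
// (GCC/Clang attribute), a normal, non-weak definition with the same
// signature elsewhere in the program replaces it at link time.
__attribute__((weak)) void sincos_combined(float angle, float* s, float* c) {
    *s = std::sin(angle);
    *c = std::cos(angle);
}

// In a separate source file, a platform-specific version would be defined
// without the weak attribute, for example one that starts a hardware CORDIC
// operation and reads back both results.
```

The override has to live in a different translation unit than the weak default; defining both in the same file is a redefinition error.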

robca · June 29, 2023, 8:15pm

The only concern is that adding _sincos() by itself won’t speed up things, unless the motor code also calls _sincos() in the transforms. Maybe having a #define to enable _sincos() for the platforms that can benefit from it would be more general
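Something like the following is what I have in mind, as a sketch only. USE_COMBINED_SINCOS, platform_sincos and inverse_park are made-up names (not SimpleFOC options or functions), and platform_sincos is just a stand-in for whatever combined routine a platform provides. The transform fetches sine and cosine through one call when the switch is on, and through two separate calls otherwise.

```cpp
#include <cmath>

// Hypothetical build switch, not an existing SimpleFOC option.
// Set to 0 (e.g. with -DUSE_COMBINED_SINCOS=0) to use separate calls.
#ifndef USE_COMBINED_SINCOS
#define USE_COMBINED_SINCOS 1
#endif

// Stand-in for a platform-specific combined routine (CORDIC, lookup table, ...).
static void platform_sincos(float angle, float* s, float* c) {
    *s = std::sin(angle);
    *c = std::cos(angle);
}

struct AlphaBeta {
    float alpha;
    float beta;
};

// Inverse Park transform: rotate the (d, q) voltages by the electrical angle.
AlphaBeta inverse_park(float ud, float uq, float angle_el) {
#if USE_COMBINED_SINCOS
    float s, c;
    platform_sincos(angle_el, &s, &c);  // one call yields both values
#else
    float s = std::sin(angle_el);       // two independent evaluations
    float c = std::cos(angle_el);
#endif
    return { c * ud - s * uq, s * ud + c * uq };
}
```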

runger · June 29, 2023, 8:21pm

Yes yes, of course introducing _sincos also means making use of it wherever sin and cos are both used in the code…

robca · June 30, 2023, 5:47pm

Cool, I saw your check in, thanks!

runger · July 1, 2023, 10:05pm

And I’ve now merged the PR to the dev branch

So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!

I’ve tested it on SAMD21 and it was working fine. It seems to me that the main loop gained about a 5% speed bump.

I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in your setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
And if possible please test the default version of the dev branch before overriding it with your own even more optimised versions
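For the measurement itself, a simple harness along these lines should be enough. This is only a sketch: time_iterations_us is a made-up helper, and it uses std::chrono so that it compiles anywhere; on an Arduino target you would typically take the two timestamps with micros() instead and run your usual main loop body inside.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` the given number of times and returns
// the total elapsed time in microseconds.
template <typename LoopBody>
std::int64_t time_iterations_us(LoopBody body, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling it with 1000 iterations before and after pulling the new code gives two directly comparable numbers; dividing by the iteration count gives the average time per loop.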

This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.

Candas1 · July 23, 2023, 7:20pm

I tried the dev branch with the work in progress gd32 drivers.
I did some measurements with an oscilloscope by triggering output pins. That’s probably not the best method, as the speed is not stable; I need to calculate an average over 1000 iterations as you suggested.

trap120 was 10% faster
SinePWM was 25% faster
SVPWM was 15% faster

I also wanted to try the -Ofast and -O3 optimizations, but they were way slower.
With the peel-loops optimization disabled, it was then even a little faster than -Os (7%-10%), so maybe it’s worth experimenting.

build_unflags = -Os
build_flags =
    -O2
    -fgcse-after-reload
    -fipa-cp-clone
    -floop-interchange
    -floop-unroll-and-jam
    ;-fpeel-loops
    -fpredictive-commoning
    -fsplit-loops
    -fsplit-paths
    -ftree-loop-distribution
    -ftree-partial-pre
    -funswitch-loops
    -fvect-cost-model=dynamic
    -fversion-loops-for-strides
    -ffast-math

[EDIT] Never mind, -ffast-math seems to be the main contributor to the performance increase, and it might not be safe.