Embedded World 2023 - STM32 CORDIC CO-PROCESSOR

Candas1 · June 25, 2023, 6:35am

The VESC way, for reference

runger · June 29, 2023, 3:02pm

Huh, looks like VESC approximates using a quadratic and therefore uses just multiplication and addition… I wonder how accurate it is… I’ll try to add it to the comparisons tonight
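
For readers who haven't seen this kind of approximation, here is a minimal sketch of a generic parabolic sine that uses only multiplication, addition and an absolute value. It is the well-known textbook form and is not copied from the VESC sources, whose coefficients and range handling may differ. The name `parabolic_sin` is made up for illustration, and the input is assumed to be already normalized to [-pi, pi].

```cpp
#include <cmath>

// Parabolic sine approximation, valid for x in [-pi, pi].
// Generic textbook form; NOT taken from the VESC code base.
float parabolic_sin(float x) {
    const float kPi = 3.14159265f;
    const float B = 4.0f / kPi;            //  4/pi
    const float C = -4.0f / (kPi * kPi);   // -4/pi^2

    // First pass: one parabola per half-wave, exact at 0, +-pi/2 and +-pi.
    float y = B * x + C * x * std::fabs(x);

    // Second pass: blend y with y*|y| to reduce the error of the first pass.
    const float P = 0.225f;                // empirical blending weight
    return P * (y * std::fabs(y) - y) + y;
}
```

The first pass alone is off by several percent in places; the second pass costs two more multiplications and brings the error down to roughly the 1e-3 range.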

Candas1 · June 29, 2023, 3:15pm

I had one thought.
Calculating sin and cos real-time can be slow.
A look up table is faster but uses flash memory.
Could calculating the look up table at start up and storing it in RAM consume less flash overall?

runger · June 29, 2023, 3:22pm

In principle this kind of idea can work, but only if the LUT fits into actual RAM.

In practice, the MCUs have a lot less real RAM than flash on the one hand, and on the other hand the flash memory is normally mapped into the regular address space, and the MCU can execute code and read data sections from these address ranges.
This leads to the situation that the LUT, when used, is accessed directly from flash memory (or its cache), and doesn’t use space in RAM as well…

But there could be MCU architectures where calculating the LUT makes sense. It would also be a way to make the precision (e.g. number of entries and hence memory used) user-controllable.
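
A rough sketch of what a start-up-computed table could look like, assuming a quarter-wave table with linear interpolation. All names (`kLutSize`, `init_sine_lut`, `lut_sin`) are hypothetical and this is not SimpleFOC code. The input angle is assumed to be already normalized to [0, 2*pi).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kLutSize = 256;            // hypothetical, user-tunable entry count
constexpr float kHalfPi = 1.57079632679f;

// Quarter-wave table in RAM: (256 + 1) floats = 1028 bytes, zero bytes of flash data.
static std::array<float, kLutSize + 1> sine_lut;

// Fill the table once at start-up using the (slow) library sine.
void init_sine_lut() {
    for (std::size_t i = 0; i <= kLutSize; ++i)
        sine_lut[i] = std::sin(kHalfPi * static_cast<float>(i) / kLutSize);
}

// Sine of an angle in [0, 2*pi), using quadrant symmetry and linear interpolation.
float lut_sin(float angle) {
    int q = static_cast<int>(angle / kHalfPi);   // quadrant index before wrapping
    float local = angle - q * kHalfPi;           // offset inside the quadrant
    if (local < 0.0f) local = 0.0f;              // guard against rounding
    q &= 3;
    if (q & 1) local = kHalfPi - local;          // mirror in quadrants 1 and 3

    float pos = local * (kLutSize / kHalfPi);    // fractional table position
    std::size_t i = static_cast<std::size_t>(pos);
    if (i >= kLutSize) i = kLutSize - 1;         // keep i + 1 inside the table
    float frac = pos - static_cast<float>(i);

    float v = sine_lut[i] + frac * (sine_lut[i + 1] - sine_lut[i]);
    return (q & 2) ? -v : v;                     // negative half-wave
}
```

Whether this saves any flash depends on whether the library sine used in `init_sine_lut()` is linked in anyway; if it is not, the generator code may cost more flash than a small constant table would.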

Candas1 · June 29, 2023, 3:40pm

It’s even more complicated than that.
Some MCUs have zero wait states when accessing the flash, some only for a portion of the flash (e.g. GD32).

runger · June 29, 2023, 11:24pm

Here it finally is:

change sine implementation to deku65i version by runger1101001 · Pull Request #285 · simplefoc/Arduino-FOC · GitHub

This PR includes:

_sincos function, as requested, and replacement of all the code-parts using both sine and cosine to use the new function.
“Deku65i” version of _sin() replaces the original version
normalizeAngle is removed from the code where no longer needed

If Antun has no objections it should make it into the next release.

runger · July 1, 2023, 9:54pm

Incidentally, I also tested the VESC version, with only the sine part of the code, and while I didn’t copy the result to paste here I can report it performed worse than all the other options except stdlib sine, HAL function CORDIC and SimpleFOC sin + normalizeAngle.
In fact, it was also a bit worse than the arm_math.h sine, so they should have just used that. Maybe we can contribute our solution back to them?
Of course I didn’t spend loads of time investigating this, and also I tested STM32G471 while VESC is STM32F4 IIRC. So maybe my results aren’t representative for what they’re getting.

runger · July 1, 2023, 10:00pm

And I’ve now merged the PR to the dev branch

So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!

I’ve tested it (and it was working fine) on SAMD21 - it seems to me that the main loop gained about 5% in speed.

I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in your setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
And if possible please test the default version of the dev branch before overriding it with your own even more optimised versions
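
One possible way to take such a measurement, sketched for a desktop build with `std::chrono`. On an Arduino target the same pattern would presumably use `micros()` around the loop instead. The helper names `time_iterations_us` and `time_std_sin_us` are made up for illustration.

```cpp
#include <chrono>
#include <cmath>

// Call fn(angle) `iterations` times and return the elapsed time in microseconds.
template <typename Fn>
double time_iterations_us(Fn fn, int iterations) {
    volatile float sink = 0.0f;   // volatile so the compiler cannot drop the calls
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + fn(0.001f * static_cast<float>(i));
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Example: microseconds for 1000 calls of the standard library sine.
double time_std_sin_us() {
    return time_iterations_us([](float a) { return std::sin(a); }, 1000);
}
```

Running the same helper before and after the pull, with identical iteration counts, gives the kind of before/after numbers asked for above.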

This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.

Candas1 · July 2, 2023, 5:29am

Maybe this is the reason why it’s slower

But wouldn’t it be the case for all the algorithms?

runger · July 2, 2023, 8:52am

Hi,

Yeah, the way I wrote my timing test code, all the functions get called as non-inline functions.

They could all be inlined, but it should make no difference to their relative performance. It would just make each version a little bit faster.

Inlining is something you can do with any non-virtual function - to save on the overhead of creating a stack frame and jumping the instruction counter to the other function and back. For tiny functions, it can be a net gain in space and speed, but for most functions it’s a tradeoff - a little speed for more space used by the compiled code.

Candas1 · August 27, 2023, 12:43pm

Has anyone looked at a faster implementation of atan2? (e.g. ODrive)

It could make Space Vector PWM faster.
[EDIT] Hmm, I see discussions hinting about SVPWM implementations that don’t even use atan2.

runger · August 27, 2023, 2:29pm

Happy to add a more optimised function
We can also make a weakly bound _atan2() in the same way as sin/cos/sqrt and default it to the current implementation.

That way you can easily plug in your more optimised but possibly hardware-specific code.
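
As a sketch of the weak-binding idea, assuming a GCC or Clang toolchain (the `weak` attribute is a compiler extension): the library ships a default that a user project can replace at link time. The name `foc_atan2` is hypothetical and only stands in for whatever the library would actually call it.

```cpp
#include <cmath>

// Library side: default implementation, marked weak so it can be replaced.
// `foc_atan2` is a placeholder name, not an existing SimpleFOC function.
__attribute__((weak)) float foc_atan2(float y, float x) {
    return std::atan2(y, x);
}

// User side (in a separate source file of the project): a normal, non-weak
// definition with the same signature wins at link time, for example:
//
//   float foc_atan2(float y, float x) { return my_hardware_atan2(y, x); }
//
// where my_hardware_atan2 is whatever optimised routine the user provides.
```

Callers inside the library keep calling the same function name and never need to know which version was linked.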

Candas1 · August 27, 2023, 5:57pm

The STM FOC bible mentions this in chapter 4.11:

That seems to run after the Clarke Transform like in the SinePWM implementation of SimpleFOC, and doesn’t use atan2 at all.
I don’t understand much… but it’s probably worth investigating.

But still a fast atan2 can be useful somewhere else.

Candas1 · August 29, 2023, 11:46am

ok I think I found something even better.
To generate those bumps, SimpleFOC implements Space Vector PWM.

But you can also just apply a midpoint clamp to the existing SimpleFOC sinusoidal implementation, after the Clarke transform.
Replacing this here:

center = driver->voltage_limit/2;
// Clarke transform
Ua = Ualpha + center;
Ub = -0.5f * Ualpha + _SQRT3_2 * Ubeta + center;
Uc = -0.5f * Ualpha - _SQRT3_2 * Ubeta + center;

By this:

center = driver->voltage_limit/2;
// Clarke transform
Ua = Ualpha;
Ub = -0.5f * Ualpha + _SQRT3_2 * Ubeta;
Uc = -0.5f * Ualpha - _SQRT3_2 * Ubeta;
if (svpwm) {
  float Umin = min(Ua, min(Ub, Uc));
  float Umax = max(Ua, max(Ub, Uc));
  center -= (Umax+Umin) / 2;
}
Ua += center;
Ub += center;
Uc += center;

Tada, it should be nearly as fast as sinePWM, no atan2 anymore.

runger · August 29, 2023, 4:11pm

I love it!

Your rate of innovation greatly exceeds my ability to test everything

We should run this by @Antun_Skuric

If it really produces the svm waveform that would simplify and speed up the code

Candas1 · August 29, 2023, 4:17pm

Here you guys already have a bottom clamp

This formula could help simplify even more and you get top-clamp for almost free, but I am not sure if top-clamp is really useful:

Antun_Skuric · August 29, 2023, 4:55pm

Very interesting!

I’ve tested it in matlab quickly and it seems to work well.
I’ll make sure to test it properly in the simplefoc as well. This could be a great addition

Candas1 · September 14, 2023, 5:01am

I was checking Qfplib again, I read here it can be made to replace the standard math functions.

According to this document (page 23-25) Qfplib is already used on the Raspberry Pi Pico because of the poor floating-point support on the Cortex-M0+.

As SimpleFOC supports the RP2040, I imagine there is no impact.

Candas1 · September 18, 2023, 7:26am

Just an update. I am sharing this here because it’s related to the math functions but please move my previous post and this one if needed.

All the single-precision calculations would probably benefit from this library. For things like precise angle calculation that use double, it would be safer not to use it.
I am wondering how the RP2040 handles double precision, though.

I was looking for options to replace the gcc math functions like addition/subtraction/multiplication/division with the Qfplib ones without changing anything in SimpleFOC, but I haven’t been successful yet. But there are posts hinting at doing linker magic. Stay tuned.

Juan-Antonio_Soren_E · December 23, 2023, 6:24pm

Mary x mass