Embedded World 2023 - STM32 CORDIC CO-PROCESSOR

The VESC way, for reference

Huh, looks like VESC approximates using a quadratic and therefore uses just multiplication and addition… I wonder how accurate it is… I’ll try to add it to the comparisons tonight
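
For anyone who wants to play with the idea, here is a minimal sketch of a parabolic sine approximation that uses only multiplies and adds. To be clear, this is not copied from the VESC source: the name parabolic_sin and the constants are my own assumptions, based on the well-known two-pass parabola trick. It also assumes the angle has already been wrapped to [-pi, pi].

```cpp
#include <cmath>

// Hypothetical helper (not VESC's actual code): approximates sin(x) with
// two parabolic passes. Only valid for x in [-pi, pi]; wrap the angle first.
inline float parabolic_sin(float x) {
    constexpr float kPi = 3.14159265358979f;
    constexpr float B = 4.0f / kPi;           // linear coefficient
    constexpr float C = -4.0f / (kPi * kPi);  // quadratic coefficient
    constexpr float P = 0.225f;               // weight of the correction pass

    // First pass: a parabola through (0, 0), (pi/2, 1) and (pi, 0),
    // mirrored for negative x via fabs().
    float y = B * x + C * x * std::fabs(x);

    // Second pass: blend y with y*|y| to reduce the remaining error.
    return P * (y * std::fabs(y) - y) + y;
}
```

The error of this kind of approximation is on the order of 1e-3, so whether it is good enough depends on the application.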

I had one thought.
Calculating sin and cos real-time can be slow.
A look up table is faster but uses flash memory.
Could calculating the look up table at start up and storing it in ram consume less flash overall?

In principle this kind of idea can work, but only if the LUT fits into actual RAM.

In practice, the MCUs have a lot less real RAM than flash on the one hand, and on the other hand the flash memory is normally mapped into the regular address space, and the MCU can execute code and read data sections from these address ranges.
This leads to the situation that the LUT, when used, is accessed directly from flash memory (or its cache), and doesn’t use space in RAM as well…

But there could be MCU architectures where calculating the LUT makes sense. It would also be a way to make the precision (e.g. number of entries and hence memory used) user-controllable.
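
A minimal sketch of what a start-up-computed table with user-controllable size could look like. Everything here (RamSineTable, its methods, the nearest-entry lookup) is a hypothetical illustration, not SimpleFOC code. Note that it still needs some sine generator at start-up (std::sin here), so it only saves flash if that generator costs less flash than the table itself.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

namespace lut_sketch {
constexpr float kTwoPi = 6.28318530717959f;
}

// Hypothetical sketch: a sine table filled once at start-up and kept in RAM.
// N (the number of entries) sets both the precision and the RAM used.
template <std::size_t N>
class RamSineTable {
public:
    RamSineTable() {
        // One full period, N evenly spaced samples.
        for (std::size_t i = 0; i < N; ++i) {
            table_[i] = std::sin(lut_sketch::kTwoPi * static_cast<float>(i) /
                                 static_cast<float>(N));
        }
    }

    // Nearest-entry lookup, no interpolation. Accepts any finite angle.
    float sin(float angle) const {
        float turns = angle / lut_sketch::kTwoPi;
        turns -= std::floor(turns);  // wrap to [0, 1]
        std::size_t idx =
            static_cast<std::size_t>(turns * static_cast<float>(N) + 0.5f) % N;
        return table_[idx];
    }

    // RAM consumed by the table itself.
    static constexpr std::size_t ram_bytes() { return N * sizeof(float); }

private:
    std::array<float, N> table_{};
};
```

With N = 256 this takes 1 KiB of RAM, and the worst-case error of the nearest-entry lookup is about pi/256 (roughly 0.012); interpolating between entries would improve that at the cost of a few more operations.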

It’s even more complicated than that.
Some MCUs have zero wait states when accessing the flash, some only for a portion of the flash (e.g. GD32).

Here it finally is:

This PR includes:

  • _sincos function, as requested, and replacement of all the code-parts using both sine and cosine to use the new function.
  • “Deku65i” version of _sin() replaces the original version
  • normalizeAngle is removed from the code where no longer needed

If Antun has no objections it should make it into the next release.
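
To illustrate why a combined function is useful: the inverse Park transform needs both the sine and the cosine of the same electrical angle. Below is a rough sketch; sincos_sketch and inverse_park are hypothetical names, the signature is an assumption rather than the one in the PR, and the body simply falls back to the standard library where an optimised version or a CORDIC would go.

```cpp
#include <cmath>

// Hypothetical stand-in for a combined sine/cosine routine. A real version
// could share the angle wrapping between both results, or read both values
// from a CORDIC unit in one go. Here it just uses the standard library.
inline void sincos_sketch(float angle, float* s, float* c) {
    *s = std::sin(angle);
    *c = std::cos(angle);
}

struct AlphaBeta {
    float alpha;
    float beta;
};

// Inverse Park transform: rotates the d/q voltages into the stationary
// alpha/beta frame using the electrical angle.
inline AlphaBeta inverse_park(float Ud, float Uq, float angle_el) {
    float s, c;
    sincos_sketch(angle_el, &s, &c);  // one call yields both values
    return {c * Ud - s * Uq, s * Ud + c * Uq};
}
```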

Incidentally, I also tested the VESC version, with only the sine part of the code, and while I didn’t copy the result to paste here I can report it performed worse than all the other options except stdlib sine, HAL function CORDIC and SimpleFOC sin + normalizeAngle.
In fact, it was also a bit worse than the arm_math.h sine, so they should have just used that. Maybe we can contribute our solution back to them?
Of course I didn’t spend loads of time investigating this, and also I tested STM32G471 while VESC is STM32F4 IIRC. So maybe my results aren’t representative for what they’re getting. :man_shrugging:

And I’ve now merged the PR to the dev branch :slight_smile: :partying_face: :champagne:

So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!

I’ve tested it on SAMD21, where it worked fine, and it seems to me that the main loop gained a speed bump of about 5%.

I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in your setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
And if possible please test the default version of the dev branch before overriding it with your own even more optimised versions :smiley:

This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.

Maybe this is the reason why it’s slower

But wouldn’t it be the case for all the algorithms?

Hi,

Yeah, the way I wrote my timing test code, all the functions get called as non-inline functions.

They could all be inlined, but it should make no difference to their relative performance. It would just make each version a little bit faster.

Inlining is something you can do with any non-virtual function - to save on the overhead of creating a stack frame and jumping the instruction counter to the other function and back. For tiny functions, it can be a net gain in both space and speed, but for most functions it’s a tradeoff - a little more speed in exchange for more space used by the compiled code.

Has anyone looked at a faster implementation of atan2 (e.g. the one in ODrive)?

It could make Space Vector PWM faster.
[EDIT] Hmm, I see discussions hinting about SVPWM implementations that don’t even use atan2.
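
For reference, a rough sketch of what a cheap atan2 can look like. This is not the ODrive code; approx_atan2 is a hypothetical name and it uses a textbook polynomial for atan on [0, 1] plus octant folding, with an error of a few milliradians.

```cpp
#include <cmath>

// Hypothetical sketch (not ODrive's implementation): atan2 built from the
// approximation atan(z) ~ z * (pi/4 + 0.273 * (1 - z)) for z in [0, 1].
inline float approx_atan2(float y, float x) {
    constexpr float kPi = 3.14159265358979f;
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    if (ax == 0.0f && ay == 0.0f) return 0.0f;  // undefined input, pick 0

    // Fold into the first octant so that the ratio z stays within [0, 1].
    const bool swapped = ay > ax;
    const float z = swapped ? ax / ay : ay / ax;
    float a = z * (kPi / 4.0f + 0.273f * (1.0f - z));

    if (swapped) a = kPi / 2.0f - a;  // undo the octant fold
    if (x < 0.0f) a = kPi - a;        // left half-plane
    if (y < 0.0f) a = -a;             // lower half-plane
    return a;
}
```

It costs one division and a handful of multiplies and compares, so whether it beats the library atan2 depends on how fast division is on the target.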

Happy to add a more optimised function :slight_smile:
We can also make a weakly bound _atan2() in the same way as sin/cos/sqrt and default it to the current implementation.

That way you can easily plug in your more optimised but possibly hardware-specific code.
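
A minimal sketch of the weak-binding idea, assuming a GCC or Clang toolchain. The name foc_atan2 is made up for this example so it does not clash with anything in the library.

```cpp
#include <cmath>

// Default implementation, marked weak (GCC/Clang attribute). If any other
// translation unit defines foc_atan2 WITHOUT the attribute, the linker uses
// that strong definition instead and this one is discarded.
__attribute__((weak)) float foc_atan2(float y, float x) {
    return std::atan2(y, x);
}

// Library code calls the function as usual and does not need to know which
// implementation ends up in the final binary.
inline float vector_angle(float Ualpha, float Ubeta) {
    return foc_atan2(Ubeta, Ualpha);
}
```

An override is then just a plain definition of float foc_atan2(float y, float x) in the user's own sketch or source file.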

The STM FOC bible mentions this in chapter 4.11:

That seems to run after the Clarke Transform like in the SinePWM implementation of SimpleFOC, and doesn’t use atan2 at all.
I don’t understand much… but it’s probably worth investigating.

But still a fast atan2 can be useful somewhere else.

OK, I think I found something even better.
To generate those bumps, SimpleFOC implements Space Vector PWM.

But you can also just apply a midpoint clamp to the existing SimpleFOC sinusoidal implementation, right after the Clarke transform.
Replace this:

center = driver->voltage_limit/2;
// Clarke transform
Ua = Ualpha + center;
Ub = -0.5f * Ualpha + _SQRT3_2 * Ubeta + center;
Uc = -0.5f * Ualpha - _SQRT3_2 * Ubeta + center;

With this:

center = driver->voltage_limit/2;
// Clarke transform
Ua = Ualpha;
Ub = -0.5f * Ualpha + _SQRT3_2 * Ubeta;
Uc = -0.5f * Ualpha - _SQRT3_2 * Ubeta;
if (svpwm){
  // midpoint clamp: shift the common offset so that the highest and the
  // lowest phase voltage sit symmetrically around voltage_limit/2
  float Umin = min(Ua, min(Ub, Uc));
  float Umax = max(Ua, max(Ub, Uc));
  center -= (Umax+Umin) / 2;
}
Ua += center;
Ub += center;
Uc += center;
Tada, it should be nearly as fast as sinePWM, no atan2 anymore.
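
Here is the same idea as a self-contained function that can be checked on a PC. The names phase_voltages and PhaseVoltages are hypothetical and the driver object is replaced by a plain voltage_limit parameter; otherwise it follows the snippet above.

```cpp
#include <algorithm>
#include <cmath>

struct PhaseVoltages {
    float a, b, c;
};

// Hypothetical free-function version of the snippet above.
// midpoint_clamp = false gives plain sine PWM, true gives the centred
// (space-vector-like) waveform.
inline PhaseVoltages phase_voltages(float Ualpha, float Ubeta,
                                    float voltage_limit, bool midpoint_clamp) {
    constexpr float kSqrt3_2 = 0.86602540378f;  // sqrt(3)/2
    float center = voltage_limit / 2.0f;

    float Ua = Ualpha;
    float Ub = -0.5f * Ualpha + kSqrt3_2 * Ubeta;
    float Uc = -0.5f * Ualpha - kSqrt3_2 * Ubeta;

    if (midpoint_clamp) {
        // Shift all three phases by the same amount so that the largest and
        // smallest values end up symmetric around voltage_limit / 2.
        float Umin = std::min({Ua, Ub, Uc});
        float Umax = std::max({Ua, Ub, Uc});
        center -= (Umax + Umin) / 2.0f;
    }
    return {Ua + center, Ub + center, Uc + center};
}
```

Because the same offset is added to all three phases, the phase-to-phase voltages seen by the motor do not change. Only the common-mode level moves, which lets an amplitude of up to voltage_limit/sqrt(3) fit without clipping instead of voltage_limit/2.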

I love it!

Your rate of innovation greatly exceeds my ability to test everything :smiley:

We should run this by @Antun_Skuric

If it really produces the SVM waveform, that would simplify and speed up the code :blush:

Here you guys already have a bottom clamp

This formula could help simplify even more and you get top-clamp for almost free, but I am not sure if top-clamp is really useful:

Very interesting!

I’ve tested it quickly in MATLAB and it seems to work well.
I’ll make sure to test it properly in SimpleFOC as well. This could be a great addition :smiley:

I was checking Qfplib again, and I read here that it can be made to replace the standard math functions.

According to this document (pages 23-25), Qfplib is already used on the Raspberry Pi Pico because of the poor floating-point support on the Cortex-M0+.

As SimpleFOC supports the RP2040, I imagine there is no impact.

Just an update. I am sharing this here because it’s related to the math functions but please move my previous post and this one if needed.

All the single-precision calculations would probably benefit from this library. For things like precise angle calculation that use double, it would be safer not to use it.
I am wondering how the RP2040 handles double precision though.

I was looking for ways to replace the gcc math functions (addition, subtraction, multiplication, division) with the Qfplib ones without changing anything in SimpleFOC, but I haven’t been successful yet. There are posts hinting at doing this with linker magic, though. Stay tuned.

Merry Xmas