Embedded World 2023 - STM32 CORDIC CO-PROCESSOR

Awesome! Now it does work :smiley:

These are the total loop times with <TrapezoidalPlanner.h>:

time per iterasion:   17.0903
time per iterasion:   17.0944
time per iterasion:   17.0943
time per iterasion:   17.0951

Compared to:

time per iterasion:   14.5107
time per iterasion:   14.5110
time per iterasion:   14.5109

Can’t wait to try this feed-forward thing. I see there is a branch in the repo with some feed-forward changes.

I suppose we are more or less ready for OpenPnP integration. It’s been a while since I looked at how it parses commands. Maybe the dual-motor Y axis still needs some synchronization, possibly over CAN FD?

Here it is running at a maximum of 50 rad/s with 25 rad/s² acceleration/deceleration.

The first move is to G20, the second to G-400.
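
As a rough sanity check on that profile, here is a minimal C++ sketch of symmetric trapezoidal planning. It assumes the move starts and ends at rest, uses the same rate for acceleration and deceleration, and that G-values are positions in radians. `TrapezoidProfile` and `planTrapezoid` are hypothetical names, not the API of TrapezoidalPlanner.h.

```cpp
#include <cmath>

// Hypothetical summary of a symmetric trapezoidal (or triangular) move.
struct TrapezoidProfile {
  float peakVelocity;  // rad/s actually reached
  float accelTime;     // s spent accelerating (same time spent braking)
  float cruiseTime;    // s spent at peak velocity
  float totalTime;     // s for the whole move
};

// Plan a move of `distance` rad with velocity limit vMax (rad/s) and
// acceleration limit accel (rad/s^2), starting and ending at rest.
TrapezoidProfile planTrapezoid(float distance, float vMax, float accel) {
  const float d = std::fabs(distance);
  TrapezoidProfile p{};
  // Distance covered by a full ramp up to vMax plus a full ramp down.
  const float rampDistance = vMax * vMax / accel;
  if (d >= rampDistance) {
    // Trapezoid: reach vMax, cruise, then brake.
    p.peakVelocity = vMax;
    p.accelTime = vMax / accel;
    p.cruiseTime = (d - rampDistance) / vMax;
  } else {
    // Triangle: the move is too short to reach vMax.
    p.peakVelocity = std::sqrt(accel * d);
    p.accelTime = p.peakVelocity / accel;
    p.cruiseTime = 0.0f;
  }
  p.totalTime = 2.0f * p.accelTime + p.cruiseTime;
  return p;
}
```

With the limits above, the move from 20 to -400 (420 rad) would spend 2 s accelerating, 6.4 s cruising at 50 rad/s and 2 s braking, 10.4 s in total. A 20 rad move (e.g. 0 to G20, if the motor started at 0) never reaches 50 rad/s and becomes a triangle peaking at about 22.4 rad/s.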


The path planner receives commands through the Commander interface and then, based on the calculated trajectory, sends target commands to the linked motor. If you have more than one motor in your SimpleFOC node, you can have one instance of the path planner for each motor.

If your SimpleFOC node can receive CAN bus messages, you can use the planner to execute trajectories commanded over CAN bus instead of the Commander interface.
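
To illustrate the "one planner per motor, any transport" idea, a minimal sketch of a command front end. The class `PlannerFrontEnd` and the plain "G<position>" text format are assumptions for illustration; the same `handle()` call could be fed from a Commander callback or from the payload of a CAN frame.

```cpp
#include <cstdlib>

// Hypothetical planner front end: one instance per motor. It only turns a
// "G<position>" command into a new target; where the text comes from
// (Commander over serial, or a CAN frame payload) doesn't matter here.
class PlannerFrontEnd {
 public:
  // Returns true if the command was understood and a new target was set.
  bool handle(const char* cmd) {
    if (cmd == nullptr || cmd[0] != 'G') return false;
    char* end = nullptr;
    const float value = std::strtof(cmd + 1, &end);
    if (end == cmd + 1) return false;  // no number after 'G'
    target_ = value;
    hasNewTarget_ = true;  // the trajectory generator would pick this up
    return true;
  }

  float target() const { return target_; }
  bool hasNewTarget() const { return hasNewTarget_; }

 private:
  float target_ = 0.0f;
  bool hasNewTarget_ = false;
};
```

Keeping parsing separate from trajectory generation means a later CAN FD link only has to deliver the same short command string to the right instance.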

Brilliant! I hope I can contribute a dedicated STM32 USB solution at some point, maybe similar to how Klipper does it. That should be a first step in the direction of Klipper synchronization and timed moves, and it will speed up the communication. Luckily OpenPnP does not need Klipper, so once I have the machine set up for PnP, it will be a good playground for further development.

Also, since OpenPnP needs vacuum for pickup, an SFOC-controlled pump/vacuum device is on my list. These pumps are usually really loud, so making one less noisy would be a great achievement.

A completely different approach would be to integrate a basic G-code visual GUI into OpenPnP and use that Java code base for CNC/3D motion?

I just remembered: for the stepper to overcome the magnet cogging and keep a minimum of torque, the planner should not let the torque drop to zero when it reaches the end of a move.

Came across this on Hacker News today; you might be interested:

What SimpleFOC has in common with Mario 64 and Zelda: “Finding the BEST sine function for Nintendo 64” (YouTube)

The VESC way, for reference:

Huh, looks like VESC approximates using a quadratic and therefore uses just multiplication and addition… I wonder how accurate it is… I’ll try to add it to the comparisons tonight
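
I haven’t checked VESC’s exact code, but the classic quadratic (parabolic) sine approximation looks like the sketch below. It assumes the angle is already wrapped into [-π, π]; the function names are mine, and 0.225 is the commonly used constant for the optional refinement step.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265f;

// Parabolic sine approximation, valid for x in [-pi, pi]:
// sin(x) ~ (4/pi) x - (4/pi^2) x |x|. Only multiplies and adds.
float parabolicSin(float x) {
  constexpr float B = 4.0f / kPi;
  constexpr float C = -4.0f / (kPi * kPi);
  return B * x + C * x * std::fabs(x);
}

// Optional refinement: one more multiply-add pulls the parabola
// much closer to the true sine.
float parabolicSinRefined(float x) {
  constexpr float P = 0.225f;
  const float y = parabolicSin(x);
  return P * (y * std::fabs(y) - y) + y;
}
```

Over [-π, π] the plain parabola is off by up to about 0.056, and the refined version by roughly 0.001, so it is cheap but not very precise unless the refinement is used.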

I had one thought.
Calculating sin and cos in real time can be slow.
A lookup table is faster but uses flash memory.
Could calculating the lookup table at startup and storing it in RAM consume less flash overall?

In principle this kind of idea can work, but only if the LUT fits into actual RAM.

In practice, MCUs have a lot less real RAM than flash on the one hand; on the other hand, the flash memory is normally mapped into the regular address space, and the MCU can execute code and read data sections directly from those address ranges.
As a result, a LUT stored in flash is accessed directly from flash memory (or its cache) when used, and doesn’t take up any RAM…

But there could be MCU architectures where calculating the LUT makes sense. It would also be a way to make the precision (e.g. number of entries and hence memory used) user-controllable.
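
A minimal sketch of the startup-LUT idea with user-controlled precision, assuming a full-wave table of floats with linear interpolation. `SineTable` is a hypothetical name; on a real MCU you would more likely use a statically allocated array than `std::vector`, but the trade-off is the same: the table costs `entries + 1` floats of RAM instead of flash.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kTwoPi = 6.28318530718f;

// Hypothetical sine table built once at startup. The number of entries
// (and therefore the RAM used) is chosen by the user at runtime.
class SineTable {
 public:
  // entries: number of table intervals per full turn (at least 1).
  explicit SineTable(std::size_t entries)
      : table_((entries < 1 ? 1 : entries) + 1) {
    const std::size_t n = table_.size() - 1;
    for (std::size_t i = 0; i <= n; ++i) {
      table_[i] = std::sin(kTwoPi * static_cast<float>(i) / static_cast<float>(n));
    }
  }

  // Linearly interpolated sine; angle in radians, any finite value.
  float lookup(float angle) const {
    const std::size_t n = table_.size() - 1;
    float turns = angle / kTwoPi;
    turns -= std::floor(turns);                       // wrap into [0, 1)
    const float pos = turns * static_cast<float>(n);  // fractional index
    std::size_t i = static_cast<std::size_t>(pos);
    if (i >= n) i = n - 1;  // guard against rounding at the top end
    const float frac = pos - static_cast<float>(i);
    return table_[i] + frac * (table_[i + 1] - table_[i]);
  }

  std::size_t bytesUsed() const { return table_.size() * sizeof(float); }

 private:
  std::vector<float> table_;  // n + 1 points, so i + 1 never wraps
};
```

With 256 intervals the interpolation error stays below about 1e-4, at a cost of 257 floats (1028 bytes) of RAM.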

It’s even more complicated than that.
Some MCUs have zero wait states when accessing the flash, some only for a portion of the flash (e.g. GD32).

Here it finally is:

This PR includes:

  • a _sincos function, as requested, and all code parts that use both sine and cosine switched to the new function
  • “Deku65i” version of _sin() replaces the original version
  • normalizeAngle is removed from the code where no longer needed
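
To show the pattern the PR applies, here is a sketch of the inverse Park transform with one combined sine/cosine call instead of separate sin() and cos() calls. `sincosSketch` is a hypothetical placeholder built on `std::sin`/`std::cos`, not SimpleFOC’s `_sincos`; a fast version would share the angle reduction or use the CORDIC, which returns both values from one calculation.

```cpp
#include <cmath>

// Placeholder for a combined sine/cosine routine (hypothetical name).
void sincosSketch(float angle, float* s, float* c) {
  *s = std::sin(angle);
  *c = std::cos(angle);
}

struct AlphaBeta {
  float alpha;
  float beta;
};

// Inverse Park transform: rotate (Ud, Uq) by the electrical angle.
// Ualpha = Ud*cos - Uq*sin, Ubeta = Ud*sin + Uq*cos.
AlphaBeta inversePark(float Ud, float Uq, float angleEl) {
  float s, c;
  sincosSketch(angleEl, &s, &c);  // one call provides both values
  return {c * Ud - s * Uq, s * Ud + c * Uq};
}
```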

If Antun has no objections it should make it into the next release.


Incidentally, I also tested the VESC version (only the sine part of the code). I didn’t copy the result to paste here, but I can report that it performed worse than all the other options except stdlib sine, the HAL-function CORDIC and SimpleFOC sin + normalizeAngle.
In fact, it was also a bit worse than the arm_math.h sine, so they could just have used that. Maybe we can contribute our solution back to them?
Of course I didn’t spend loads of time investigating this, and I tested on an STM32G471 while VESC runs on STM32F4 IIRC, so my results may not be representative of what they’re getting. :man_shrugging:

And I’ve now merged the PR to the dev branch :slight_smile: :partying_face: :champagne:

So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!
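
If you want to feed the STM32 CORDIC directly (for instance from a `_sincos` override), the angle has to be converted to fixed point first. A minimal sketch, assuming the CORDIC is configured for 32-bit q1.31 arguments, where the sine/cosine functions take the angle divided by π in [-1, 1); please double-check this against the reference manual for your part. The helper names are hypothetical and the register setup itself is not shown.

```cpp
#include <cmath>
#include <cstdint>

constexpr float kPi = 3.14159265358979f;

// Convert an angle in radians to q1.31, scaled as angle / pi and wrapped
// into [-1, 1). Precision degrades for very large input angles.
int32_t radiansToQ31(float angle) {
  float x = angle / kPi;                       // angle in units of pi
  x -= 2.0f * std::floor((x + 1.0f) * 0.5f);   // wrap into [-1, 1)
  const float scaled = x * 2147483648.0f;      // times 2^31
  if (scaled >= 2147483647.0f) return INT32_MAX;  // rounding guard near +1
  return static_cast<int32_t>(scaled);
}

// Convert a q1.31 result (e.g. a sine or cosine value) back to float.
float q31ToFloat(int32_t q) {
  return static_cast<float>(q) / 2147483648.0f;
}
```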

I’ve tested it (and it was working fine) on SAMD21; it seems to me the main loop gained about 5% in speed.

I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in their setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
And if possible please test the default version of the dev branch before overriding it with your own even more optimised versions :smiley:
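
For comparable before/after numbers, one option is to time a fixed number of loop iterations. On the board that would typically be `micros()` around calls to `motor.loopFOC()` and `motor.move()`; the sketch below shows the same idea as a host-side C++ helper using `std::chrono`. The helper name is hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Average time per call of loopBody over `iterations` calls, in microseconds.
template <typename F>
double microsecondsPerIteration(F&& loopBody, std::uint32_t iterations) {
  if (iterations == 0) return 0.0;
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (std::uint32_t i = 0; i < iterations; ++i) {
    loopBody();
  }
  const auto elapsed = Clock::now() - start;
  const double totalUs = std::chrono::duration<double, std::micro>(elapsed).count();
  return totalUs / static_cast<double>(iterations);
}
```

Timing 1000 or more iterations in one block and dividing keeps the timer’s own overhead and resolution from dominating the result.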

This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.


Maybe this is the reason why it’s slower

But wouldn’t it be the case for all the algorithms?


Yeah, the way I wrote my timing test code, all the functions get called as non-inline functions.

They could all be inlined, but it should make no difference to their relative performance. It would just make each version a little bit faster.

Inlining is something you can do with any non-virtual function, to save on the overhead of creating a stack frame and jumping the instruction counter to the other function and back. For tiny functions, it can be a net gain in space and speed, but for most functions it’s a tradeoff: a little speed for more space used by the compiled code.
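
To make the inline/non-inline distinction concrete, a small sketch; `__attribute__((noinline))` is a GCC/Clang extension, and the function names are just for illustration.

```cpp
// GCC/Clang-specific: keep this function out-of-line, e.g. so every
// candidate in a benchmark pays the same call overhead.
__attribute__((noinline)) float squareOutOfLine(float x) { return x * x; }

// `inline` is only a hint; the compiler may expand calls to this tiny
// function in place, removing the call/return overhead.
inline float squareInline(float x) { return x * x; }

// Calling through a function pointer usually also prevents inlining,
// which is one way a timing harness ends up comparing non-inline calls.
using UnaryFn = float (*)(float);
float callThrough(UnaryFn fn, float x) { return fn(x); }
```

Since every candidate pays the same call overhead in such a harness, the ranking between them is unaffected; inlining would just shift all the numbers down a bit.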

Has anyone looked at a faster implementation of atan2 (e.g. the one in ODrive)?

It could make Space Vector PWM faster.
[EDIT] Hmm, I see discussions hinting at SVPWM implementations that don’t use atan2 at all.
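
Not ODrive’s code, but as an illustration of how a fast atan2 is usually built: reduce to one octant, approximate atan on [0, 1], then restore the quadrant. This sketch uses the published approximation atan(t) ≈ π/4·t − t(t − 1)(0.2447 + 0.0663t), whose error is about 1.5e-3 rad; if more accuracy is needed, a higher-order polynomial can be dropped into `atanUnit`. Both function names are hypothetical.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265f;
constexpr float kHalfPi = 1.57079633f;
constexpr float kQuarterPi = 0.785398163f;

// atan(t) for t in [0, 1]; max error roughly 1.5e-3 rad.
float atanUnit(float t) {
  return kQuarterPi * t - t * (t - 1.0f) * (0.2447f + 0.0663f * t);
}

// atan2 built from atanUnit by octant reduction. Returns 0 for (0, 0).
float fastAtan2(float y, float x) {
  const float ax = std::fabs(x);
  const float ay = std::fabs(y);
  if (ax == 0.0f && ay == 0.0f) return 0.0f;
  // First-quadrant angle: use the ratio that stays within [0, 1].
  float r = (ay <= ax) ? atanUnit(ay / ax)             // [0, pi/4]
                       : kHalfPi - atanUnit(ax / ay);  // (pi/4, pi/2]
  if (x < 0.0f) r = kPi - r;  // mirror into the left half-plane
  if (y < 0.0f) r = -r;       // lower half-plane
  return r;
}
```

One division and a handful of multiply-adds replace the library call, at the cost of accuracy.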

Happy to add a more optimised function :slight_smile:
We can also make a weakly bound _atan2() in the same way as sin/cos/sqrt and default it to the current implementation.

That way you can easily plug in your more optimised but possibly hardware-specific code.
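
A sketch of that weak-default pattern, assuming the proposed name `_atan2` with the same (y, x) argument order as `std::atan2`; this is not existing SimpleFOC code. `__attribute__((weak))` is a GCC/Clang feature: any strong definition of `_atan2` elsewhere in the build replaces this one at link time.

```cpp
#include <cmath>

// Default implementation, marked weak so a user-supplied strong definition
// (for example a faster approximation, or a CORDIC-based version on STM32)
// overrides it without changing any library code.
__attribute__((weak)) float _atan2(float y, float x) {
  return std::atan2(y, x);
}

// In your own project you would then just define, in some other .cpp file:
//   float _atan2(float y, float x) { /* hardware-specific version */ }
```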

The STM FOC bible mentions this in chapter 4.11:

That seems to run after the Clarke Transform like in the SinePWM implementation of SimpleFOC, and doesn’t use atan2 at all.
I don’t understand much… but it’s probably worth investigating.
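
I haven’t checked whether this is exactly what chapter 4.11 describes, but one common atan2-free approach finds the SVPWM sector directly from the Clarke outputs α and β, using only sign tests on three reference values: β, (√3α − β)/2 and (−√3α − β)/2. A sketch, with a hypothetical function name and sectors numbered 1 to 6 counter-clockwise, each 60° wide, starting at 0°:

```cpp
// Find the SVPWM sector (1..6) from Clarke-transformed alpha/beta using
// only comparisons. Returns 0 only for the zero vector.
int svpwmSector(float alpha, float beta) {
  constexpr float kSqrt3 = 1.7320508f;
  const int a = beta > 0.0f;                     // sign of Vref1 = beta
  const int b = (kSqrt3 * alpha - beta) > 0.0f;  // sign of 2 * Vref2
  const int c = (-kSqrt3 * alpha - beta) > 0.0f; // sign of 2 * Vref3
  // The three references sum to zero, so code 7 never occurs and code 0
  // only occurs at the origin.
  static const int kSectorFromCode[8] = {0, 2, 6, 1, 4, 3, 5, 0};
  return kSectorFromCode[a + 2 * b + 4 * c];
}
```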

Still, a fast atan2 could be useful elsewhere.