Embedded World 2023 - STM32 CORDIC CO-PROCESSOR

Juan-Antonio_Soren_E · May 5, 2023, 11:20am

Awesome! Now it does work

These are the total loop times with <TrapezoidalPlanner.h>:

time per iterasion:   17.0903
time per iterasion:   17.0944
time per iterasion:   17.0943
time per iterasion:   17.0951

Compared to:

time per iterasion:   14.5107
time per iterasion:   14.5110
time per iterasion:   14.5109

Can't wait to try this feed-forward thing. I see there is a branch in the repo with some feed-forward changes.

I suppose we are more or less ready for OpenPnP integration. It's been a while since I looked at how it parses commands. Maybe the dual-motor Y axis still needs some synchronization, possibly over CAN FD?

Juan-Antonio_Soren_E · May 5, 2023, 1:56pm

Here it is running at a maximum of 50 rad/s with 25 rad/s² acceleration/deceleration.

First move is to G20, second move is to G-400
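For reference, the timing of a trapezoidal move with these limits can be worked out in a few lines. This is only a minimal sketch, not the TrapezoidalPlanner code: the struct and function names are made up for illustration, and it assumes each move starts and ends at rest, that the G values are positions in radians, and that the first move starts from 0.

```cpp
#include <cmath>

// Hypothetical result type: timing of one point-to-point move.
struct ProfileTiming {
    float peakVelocity;  // rad/s actually reached
    float accelTime;     // s spent accelerating (deceleration takes the same)
    float cruiseTime;    // s at constant velocity (0 for a triangular profile)
    float totalTime;     // s for the whole move
};

// Symmetric trapezoidal profile, starting and ending at rest.
ProfileTiming planTrapezoid(float distance, float vMax, float accel) {
    float d = std::fabs(distance);
    // Distance needed to accelerate from rest to vMax.
    float accelDist = vMax * vMax / (2.0f * accel);

    ProfileTiming t{};
    if (2.0f * accelDist >= d) {
        // Short move: vMax is never reached, the profile is a triangle.
        t.peakVelocity = std::sqrt(accel * d);
        t.cruiseTime = 0.0f;
    } else {
        t.peakVelocity = vMax;
        t.cruiseTime = (d - 2.0f * accelDist) / vMax;
    }
    t.accelTime = t.peakVelocity / accel;
    t.totalTime = 2.0f * t.accelTime + t.cruiseTime;
    return t;
}
```

Under those assumptions, a 20 rad move at 50 rad/s and 25 rad/s² never reaches full speed (triangular profile, about 1.8 s), while the 420 rad move from 20 to -400 cruises for 6.4 s and takes 10.4 s in total.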

JorgeMaker · May 5, 2023, 3:03pm

It receives commands through the Commander interface and then, based on the calculated trajectory, sends target commands to the linked motor. If you have more than one motor in your SimpleFOC node, you can have one instance of the path planner for each motor.

If your SimpleFOC node is able to receive CAN bus messages, you can use the planner to perform trajectories ordered via CAN bus instead of the Commander interface.

Juan-Antonio_Soren_E · May 6, 2023, 7:34am

Brilliant! I hope I can contribute a dedicated STM32 USB solution at some point, maybe similar to how Klipper does it. That should be the first step in the direction of Klipper synchronization and timed moves, and it will speed up the communication. Luckily OpenPnP does not need Klipper, so when I have the machine set up for PnP, it will be a good playground for further development.

Also, since OpenPnP needs vacuum for pickup, a SFOC-controlled pump/vacuum device is on my list. Usually these pumps are really loud, so if we can make it less noisy, that will be a great achievement.

A completely different approach would be to integrate a basic Gcode visual GUI into OPnP and use that Java code base for CNC/3D motion?

Juan-Antonio_Soren_E · June 3, 2023, 10:12am

I just remembered: in order for the stepper to overcome the magnet cogging and keep a minimum of torque, the planner should not let the torque drop to zero when it reaches the end of a move.

Candas1 · June 5, 2023, 12:17pm

Came across this on Hacker News today, you guys might be interested.

Candas1 · June 24, 2023, 10:21pm

What SimpleFOC has in common with Mario64 and Zelda: https://youtu.be/xFKFoGiGlXQ

Candas1 · June 25, 2023, 6:35am

The VESC way, for reference

runger · June 29, 2023, 3:02pm

Huh, looks like VESC approximates using a quadratic and therefore uses just multiplication and addition… I wonder how accurate it is… I’ll try to add it to the comparisons tonight
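To illustrate the idea of a quadratic sine, here is a sketch of the well-known parabolic approximation. It is not taken from the VESC source; the function names and the refinement constant 0.225 come from the commonly circulated version of this trick, and the input is assumed to be already normalized to [-pi, pi].

```cpp
#include <cmath>

// Parabolic sine approximation, valid for x in [-pi, pi].
// Uses only multiplications, additions and an absolute value.
float parabolicSin(float x) {
    const float kPi = 3.14159265f;
    const float B = 4.0f / kPi;           //  4/pi
    const float C = -4.0f / (kPi * kPi);  // -4/pi^2
    return B * x + C * x * std::fabs(x);
}

// Optional refinement step: blends the parabola with its square
// to reduce the error, at the cost of two more multiplications.
float parabolicSinRefined(float x) {
    const float P = 0.225f;
    float y = parabolicSin(x);
    return P * (y * std::fabs(y) - y) + y;
}
```

The plain parabola is off by up to roughly 0.06 over the period, and the refined version by roughly 0.001, so whether this is accurate enough depends on the application.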

Candas1 · June 29, 2023, 3:15pm

I had one thought.
Calculating sin and cos real-time can be slow.
A look up table is faster but uses flash memory.
Could calculating the look up table at start up and storing it in ram consume less flash overall?

runger · June 29, 2023, 3:22pm

In principle this kind of idea can work, but only if the LUT fits into actual RAM.

In practice, the MCUs have a lot less real RAM than flash on the one hand, and on the other hand the flash memory is normally mapped into the regular address space, and the MCU can execute code and read data sections from these address ranges.
This leads to the situation that the LUT, when used, is accessed directly from flash memory (or its cache), and doesn’t use space in RAM as well…

But there could be MCU architectures where calculating the LUT makes sense. It would also be a way to make the precision (e.g. number of entries and hence memory used) user-controllable.
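A sketch of what a start-up-computed table with user-controllable size could look like is below. The class name and interface are hypothetical, not part of SimpleFOC, and std::vector is used only to keep the example short; on a small MCU a statically sized array would be the more likely choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kTwoPi = 6.28318530718f;

// Hypothetical sine table that is filled once at start-up and lives in RAM.
// The number of entries sets both the precision and the memory used.
class SineTable {
public:
    explicit SineTable(std::size_t entries) : table_(entries + 1) {
        // One extra entry so interpolation never reads past the end.
        for (std::size_t i = 0; i <= entries; ++i) {
            table_[i] = std::sin(kTwoPi * i / entries);
        }
    }

    // Sine of any angle in radians, using linear interpolation.
    float sin(float angle) const {
        std::size_t n = table_.size() - 1;
        float turns = angle / kTwoPi;
        turns -= std::floor(turns);          // wrap into [0, 1]
        float pos = turns * n;
        std::size_t i = static_cast<std::size_t>(pos);
        if (i >= n) i = n - 1;               // guard against rounding up to n
        float frac = pos - static_cast<float>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

    std::size_t ramBytes() const { return table_.size() * sizeof(float); }

private:
    std::vector<float> table_;
};
```

With 256 entries the table costs a little over 1 KB of RAM when float is 4 bytes, which shows the tradeoff mentioned above: the flash saved has to be paid for out of the much smaller RAM.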

Candas1 · June 29, 2023, 3:40pm

It’s even more complicated than that.
Some MCUs have zero wait states when accessing flash, some only for a portion of the flash (e.g. GD32).

runger · June 29, 2023, 11:24pm

Here it finally is:

change sine implementation to deku65i version by runger1101001 · Pull Request #285 · simplefoc/Arduino-FOC · GitHub

This PR includes:

_sincos function, as requested, and replacement of all the code-parts using both sine and cosine to use the new function.
“Deku65i” version of _sin() replaces the original version
normalizeAngle is removed from the code where no longer needed

If Antun has no objections it should make it into the next release.

runger · July 1, 2023, 9:54pm

Incidentally, I also tested the VESC version, with only the sine part of the code, and while I didn’t copy the result to paste here I can report it performed worse than all the other options except stdlib sine, HAL function CORDIC and SimpleFOC sin + normalizeAngle.
In fact, it was also a bit worse than the arm_math.h sine, so they should have just used that. Maybe we can contribute our solution back to them?
Of course I didn’t spend loads of time investigating this, and also I tested STM32G471 while VESC is STM32F4 IIRC. So maybe my results aren’t representative for what they’re getting.

runger · July 1, 2023, 10:00pm

And I’ve now merged the PR to the dev branch

So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!
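For readers unfamiliar with what the CORDIC unit actually computes, here is a software illustration of the rotation-mode CORDIC algorithm, which produces sine and cosine together. It is a sketch for understanding only: it is not the STM32 HAL or LL API, the names are made up, and real hardware works in fixed point with shifts and a precomputed arctangent table rather than calling atan and sqrt at run time.

```cpp
#include <cmath>

struct SinCos {
    float sine;
    float cosine;
};

// Rotation-mode CORDIC, valid for |angle| <= pi/2.
// Each step rotates the vector (x, y) by +/- atan(2^-i) so that the
// remaining angle z is driven towards zero.
SinCos cordicSinCos(float angle, int iterations = 24) {
    double x = 1.0, y = 0.0, z = angle;
    double scale = 1.0;  // 2^-i; a right shift in a fixed-point version
    double gain = 1.0;   // accumulated length growth of the vector

    for (int i = 0; i < iterations; ++i) {
        double dir = (z >= 0.0) ? 1.0 : -1.0;
        double xNext = x - dir * y * scale;
        double yNext = y + dir * x * scale;
        z -= dir * std::atan(scale);  // a table lookup in real implementations
        x = xNext;
        y = yNext;
        gain *= std::sqrt(1.0 + scale * scale);
        scale *= 0.5;
    }
    // Remove the CORDIC gain so the result lies on the unit circle.
    return {static_cast<float>(y / gain), static_cast<float>(x / gain)};
}
```

Because one pass yields both values, a combined sine/cosine call maps naturally onto this kind of hardware.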

I’ve tested it (and it was working fine) on SAMD21 - it seems to me that the main loop gained about 5% in speed.

I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in your setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
And if possible, please test the default version of the dev branch before overriding it with your own even more optimised versions.

This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.

Candas1 · July 2, 2023, 5:29am

Maybe this is the reason why it’s slower

But wouldn’t it be the case for all the algorithms?

runger · July 2, 2023, 8:52am

Hi,

Yeah, the way I wrote my timing test code, all the functions get called as non-inline functions.

They could all be inlined, but it should make no difference to their relative performance. It would just make each version a little bit faster.

Inlining is something you can do with any non-virtual function - to save on the overhead of creating a stack frame and jumping the instruction counter to the other function and back. For tiny functions, it can be a net gain in space and speed, but for most functions it’s a tradeoff - a little speed for more space used by the compiled code.

Candas1 · August 27, 2023, 12:43pm

Has anyone looked at a faster implementation of atan2? (e.g. ODrive)

It could make Space Vector PWM faster.
[EDIT] Hmm, I see discussions hinting about SVPWM implementations that don’t even use atan2.
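As an example of what a faster atan2 can look like, here is a sketch based on a published second-order arctangent approximation, with octant folding around it. This is not the ODrive implementation; the function name is made up, and the accuracy (a few milliradians) may or may not be sufficient for a given use.

```cpp
#include <cmath>

// Approximate atan2 using atan(z) ~= z * (pi/4 + 0.273 * (1 - z))
// for z in [0, 1], then unfolding the result into the correct quadrant.
float approxAtan2(float y, float x) {
    const float kPi = 3.14159265f;
    float ax = std::fabs(x);
    float ay = std::fabs(y);
    if (ax == 0.0f && ay == 0.0f) {
        return 0.0f;  // undefined input, return 0 by convention
    }
    // Ratio of the smaller to the larger magnitude, always in [0, 1].
    float z = (ax > ay) ? (ay / ax) : (ax / ay);
    float angle = z * (kPi / 4.0f + 0.273f * (1.0f - z));

    if (ay > ax) angle = kPi / 2.0f - angle;  // above the 45 degree line
    if (x < 0.0f) angle = kPi - angle;        // left half-plane
    if (y < 0.0f) angle = -angle;             // lower half-plane
    return angle;
}
```

The only expensive operation left is one division, which is usually much cheaper than a full library atan2 on MCUs without dedicated hardware.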

runger · August 27, 2023, 2:29pm

Happy to add a more optimised function.
We can also make a weakly bound _atan2() in the same way as sin/cos/sqrt and default it to the current implementation.

That way you can easily plug in your more optimised but possibly hardware-specific code.
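The weak-binding pattern could look roughly like this. The name foc_atan2 is a placeholder chosen for this sketch (the library function would presumably be called _atan2), and the weak attribute shown is the GCC/Clang syntax.

```cpp
#include <cmath>

// Library side: a weak default that simply forwards to the standard library.
// "foc_atan2" is a hypothetical name used only for this illustration.
__attribute__((weak)) float foc_atan2(float y, float x) {
    return std::atan2(y, x);
}

// Application side (in a different source file): a normal, non-weak
// definition with the same signature replaces the default at link time,
// for example:
//
//   float foc_atan2(float y, float x) {
//       return myHardwareAtan2(y, x);  // hypothetical optimised version
//   }
```

Users who do nothing get the default behaviour, and the override needs no changes to the library sources.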

Candas1 · August 27, 2023, 5:57pm

The STM FOC bible mentions this in chapter 4.11:

That seems to run after the Clarke Transform like in the SinePWM implementation of SimpleFOC, and doesn’t use atan2 at all.
I don’t understand much… but it’s probably worth investigating.

But still a fast atan2 can be useful somewhere else.