These are the total loop times with <TrapezoidalPlanner.h>:
time per iterasion: 17.0903
time per iterasion: 17.0944
time per iterasion: 17.0943
time per iterasion: 17.0951
Compared to:
time per iterasion: 14.5107
time per iterasion: 14.5110
time per iterasion: 14.5109
Can’t wait to try this feed-forward thing. I see there is a branch in the repo with some feed-forward changes.
I suppose we are more or less ready for OpenPnP integration. It’s been a while since I looked at how it parses commands. Maybe the dual-motor Y axis still needs some synchronization, possibly over CAN FD?
The path planner receives commands through the Commander interface and then, based on the calculated trajectory, sends target commands to the linked motor. If you have more than one motor on your SimpleFOC node, you can have one instance of the path planner for each motor.
If your SimpleFOC node is able to receive CAN bus messages, you can use the planner to execute trajectories commanded over the CAN bus instead of the Commander interface.
Brilliant! I hope I can contribute a dedicated STM32 USB solution at some point, maybe similar to how Klipper does it. That should be a first step in the direction of Klipper synchronization and timed moves, and it will speed up the communication. Luckily OpenPnP does not need Klipper, so when I have the machine set up for PnP, it will be a good playground for further development.
Also, since OpenPnP needs vacuum for pickup, a SimpleFOC-controlled pump/vacuum device is on my list. Usually these pumps are really loud, so if we can make it less noisy, that will be a great achievement.
A completely different approach would be to integrate a basic visual G-code GUI into OpenPnP and use that Java code base for CNC/3D motion?
I just remembered: for the stepper to overcome the magnet cogging and keep a minimum of torque, the planner should not let the torque drop to zero when it reaches the end of a move.
Huh, looks like VESC approximates sine using a quadratic and therefore uses just multiplication and addition… I wonder how accurate it is… I’ll try to add it to the comparisons tonight.
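For anyone who wants to try the idea, here is a minimal sketch of a parabola-based sine. It is a well-known generic approximation and is not copied from VESC or SimpleFOC, so the name `parabolicSin` and the constants are assumptions for illustration only. It expects the angle to be already normalized to [-pi, pi], and the error should be on the order of 0.001.

```cpp
#include <cmath>

// Hypothetical helper, not taken from VESC or SimpleFOC.
// Approximates sin(x) for x in [-pi, pi] with a parabola plus one
// correction pass, using only multiplications, additions and fabs().
float parabolicSin(float x) {
    const float kPi = 3.14159265f;
    const float B = 4.0f / kPi;             // linear coefficient
    const float C = -4.0f / (kPi * kPi);    // quadratic coefficient
    const float P = 0.225f;                 // weight of the correction pass

    float y = B * x + C * x * std::fabs(x); // first pass: plain parabola
    return P * (y * std::fabs(y) - y) + y;  // second pass: reduces the error
}
```

The input range matters: outside [-pi, pi] the parabola diverges quickly, so the cost of normalizing the angle has to be counted when comparing against the other versions.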
I had one thought.
Calculating sin and cos in real time can be slow.
A lookup table is faster but uses flash memory.
Could calculating the lookup table at startup and storing it in RAM consume less flash overall?
In principle this kind of idea can work, but only if the LUT fits into actual RAM.
In practice, MCUs have a lot less RAM than flash, and the flash memory is normally mapped into the regular address space, so the MCU can execute code and read data sections directly from these address ranges.
As a result, a LUT stored in flash is accessed directly from flash memory (or its cache) and doesn’t use any space in RAM…
But there could be MCU architectures where calculating the LUT makes sense. It would also be a way to make the precision (e.g. number of entries and hence memory used) user-controllable.
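A rough sketch of what such a startup-computed LUT could look like, with the number of entries as a template parameter so the user controls the precision and the RAM used. The class name `RamSineLut` and its methods are made up for this example and are not part of SimpleFOC.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr float kTwoPi = 6.28318531f;
constexpr float kHalfPi = 1.57079633f;

// Hypothetical sketch: a sine table filled once at startup and kept in RAM.
// N is the number of entries per full period (user-controllable precision).
template <std::size_t N>
class RamSineLut {
public:
    RamSineLut() {
        for (std::size_t i = 0; i < N; ++i) {
            table_[i] = std::sin(kTwoPi * static_cast<float>(i) / static_cast<float>(N));
        }
    }

    // Nearest-entry lookup; angle in radians, any sign or magnitude.
    float sine(float angle) const {
        float turns = angle / kTwoPi;
        turns -= std::floor(turns);  // wrap into [0, 1]
        std::size_t idx = static_cast<std::size_t>(turns * N + 0.5f) % N;
        return table_[idx];
    }

    float cosine(float angle) const { return sine(angle + kHalfPi); }

    // RAM taken by the table itself.
    static constexpr std::size_t ramBytes() { return N * sizeof(float); }

private:
    std::array<float, N> table_;
};
```

With `RamSineLut<256>` the table takes 256 floats of RAM and, without interpolation, the error is bounded by roughly pi/N. Note that filling the table needs the library sine at startup, and that code lives in flash too, so the flash saving may be smaller than hoped.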
Incidentally, I also tested the VESC version, with only the sine part of the code, and while I didn’t copy the result to paste here I can report it performed worse than all the other options except stdlib sine, HAL function CORDIC and SimpleFOC sin + normalizeAngle.
In fact, it was also a bit worse than the arm_math.h sine, so they should have just used that. Maybe we can contribute our solution back to them?
Of course I didn’t spend loads of time investigating this, and I tested on an STM32G471 while VESC runs on STM32F4, IIRC. So maybe my results aren’t representative of what they’re getting.
So optimised sine is now a reality, as is the possibility to use _sincos, for example in combination with the CORDIC NOWAIT on STMs that have it. That will be a pretty optimal solution!
I’ve tested it on SAMD21 (and it was working fine) - it seems to me that the main loop got about 5% faster.
I would kindly ask anyone who’s following this thread and uses the dev branch of SimpleFOC to do a pull and try out the new code in your setup.
If possible, please record the main loop speed before and after (for example in iterations per second, or microseconds for 1000 iterations or something like this).
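In case it helps with the measurements, here is a small sketch of a timer that averages the loop time over a fixed number of iterations. `LoopTimer` is a hypothetical helper written for this post, not something from the library; it only needs the current time in microseconds passed in.

```cpp
#include <stdint.h>

// Hypothetical helper for reporting the main loop speed.
struct LoopTimer {
    uint32_t window;                 // iterations per measurement
    uint32_t count = 0;              // iterations seen in the current window
    uint32_t start_us = 0;           // timestamp at the start of the window
    bool started = false;
    float us_per_iteration = 0.0f;   // last completed average

    explicit LoopTimer(uint32_t iterations) : window(iterations) {}

    // Call once per loop with the current time in microseconds.
    // Returns true each time a new average is available.
    bool tick(uint32_t now_us) {
        if (!started) {
            start_us = now_us;
            started = true;
            return false;
        }
        if (++count < window) return false;
        // Unsigned subtraction stays correct across a timer rollover.
        us_per_iteration = static_cast<float>(now_us - start_us) / static_cast<float>(window);
        count = 0;
        start_us = now_us;
        return true;
    }
};
```

On Arduino you would create e.g. `LoopTimer timer(1000);` and call `timer.tick(micros())` at the end of `loop()`, printing `timer.us_per_iteration` whenever it returns true. Keep the printing rare, since serial output itself slows the loop down.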
And if possible, please test the default version of the dev branch before overriding it with your own even more optimised versions.
This is a big change for us, as it is of course a very central part of the code and will affect all users. We’d greatly appreciate some feedback from people before releasing it in the next SimpleFOC release.
Yeah, the way I wrote my timing test code, all the functions get called as non-inline functions.
They could all be inlined, but it should make no difference to their relative performance. It would just make each version a little bit faster.
Inlining is something you can do with any non-virtual function - it saves the overhead of creating a stack frame and jumping the program counter to the other function and back. For tiny functions it can be a net gain in both space and speed, but for most functions it’s a tradeoff - a little more speed in exchange for more space used by the compiled code.
Happy to add a more optimised function.
We can also make a weakly bound _atan2() in the same way as sin/cos/sqrt and default it to the current implementation.
That way you can easily plug in your more optimised but possibly hardware-specific code.
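Roughly what I have in mind, as a sketch only: the name `_atan2` follows the proposal above, but the signature and the use of the standard library as the default are assumptions, and `__attribute__((weak))` is a GCC/Clang extension.

```cpp
#include <cmath>

// Default implementation, weakly bound so that user code can replace it.
// Signature is an assumption for illustration.
__attribute__((weak)) float _atan2(float y, float x) {
    return std::atan2(y, x);
}

// In a user sketch (a different source file), a normal (strong) definition
// with the same signature replaces the default at link time, for example:
//
//   float _atan2(float y, float x) {
//       return myFastAtan2(y, x);  // myFastAtan2 is hypothetical user code
//   }
```

The library code keeps calling `_atan2()` everywhere, and whichever definition the linker picks is used, so no configuration flags are needed.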
That seems to run after the Clarke Transform like in the SinePWM implementation of SimpleFOC, and doesn’t use atan2 at all.
I don’t understand much… but it’s probably worth investigating.
Still, a fast atan2 could be useful somewhere else.