Embedded World 2023 - STM32 CORDIC CO-PROCESSOR

Embedded World 2023 starts tomorrow

March 14-16.

You can also attend virtually…

I will try to push for STM CORDIC support for SimpleFOC. Considering the wide STM support SimpleFOC is providing for free and OPEN, it would be really amazing if STM could help out by collaborating on an STM-specific CORDIC implementation. For sure they have some brilliant folks who know how to set it up and possibly enhance SimpleFOC performance on STM-specific hardware.

Hey, I’m going to be there :slight_smile:

But the CORDIC and the SimpleFOC code really work on quite different paradigms: the CORDIC likes fixed-point math, while SimpleFOC uses floating point… I think we discussed it before, but at the moment SimpleFOC uses an interpolated lookup-table-based approach. That’s not slow, and the accuracy seems to be sufficient…
One would have to do a fairly deep dive into the whole thing, probably needing some test code, to see if there would be any benefit to using the CORDIC, either in terms of accuracy, or in terms of speed. My feeling is it won’t be a clear win.

Yes, something those STM engineers will most likely see as a fun exercise :smiley:

I thought you would attend since it’s relatively close. I did think about making the trip, it’s just too far. Keep us posted on cool things!

Edit: I’m not saying they should alter the core functions, but maybe they do see some use for it, e.g. in combination with sensor data/readings/trigonometry.


Up to two 32bit input arguments

If using e.g. a 21-bit angle sensor, do decimal numbers provide any real improvement over using the CORDIC?

As the CORDIC is an AHB bus slave, it can possibly fetch quadrature encoder data through DMA requests, that is, without MCU intervention. The result can be converted to a float if need be.


Ok, let’s get to the point here. I believe we could use the CORDIC without breaking anything in the repo, but that will most likely require a

case FOCModulationType::SinePWM_CORDIC :

The problem in a nutshell:

TheFieldStack will rely on the STM32G4 hardware encoder which, as far as I’m aware, will output an angle.

The timer, when configured in Encoder Interface mode provides information on the sensor’s
current position. Dynamic information can be obtained (speed, acceleration, deceleration)
by measuring the period between two encoder events using a second timer configured in
capture mode.

In this regard, is it probably best to use TIM2 (32-bit timer) in capture mode?
The NEOpix LED will then use the Adafruit lib, without too much RGB fading.
TIM2 can also be set up as the encoder interface, but then I will have to solder in a jumper. The pins are broken out, but the idea was to use TIM3 as the encoder interface.

Source page 1278

Is this angle/data output a float? I assume not.
Edit: The hardware encoder interface is a timer/counter, so the output is determined by the timer settings and resolution. This counter value then needs conversion into angle_el (depending on the pole_pair).

So we would like to do some sin/cos math with this angle_el (angle, converted to angle_el + offset)
Note: It is probably best to line up the mechanical zero with the sensor zero when calibrating/setting up the sensor → less calculations.

Is this not where the CORDIC is meant to shine? sine and cosine ?

The question is whether the conversion to float takes longer than the _sin/_cos math (which we would probably need to do anyway).

It should be possible to measure the performance:

/* Read systick counter */
start_ticks = SysTick->VAL;
/* Write angle */
/* Read cosine */
cosOutput = (int32_t)LL_CORDIC_ReadData(CORDIC);
/* Read sine */
sinOutput = (int32_t)LL_CORDIC_ReadData(CORDIC);
/* Read systick counter */
stop_ticks = SysTick->VAL;
/* Calculate number of cycles elapsed */
elapsed_ticks = start_ticks-stop_ticks;
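One caveat with start_ticks - stop_ticks: SysTick counts down and reloads when it reaches zero, so the plain subtraction is only right if no reload happened in between. A small sketch that handles a single reload, assuming the interval is shorter than one SysTick period. The function name is hypothetical; on the target, reload would be the value of SysTick->LOAD.

```c
#include <stdint.h>

/* SysTick counts DOWN from `reload` to 0, then starts again at `reload`.
 * Returns the ticks elapsed between two readings, allowing for one reload. */
static uint32_t systick_elapsed(uint32_t start, uint32_t stop, uint32_t reload)
{
    if (start >= stop) {
        return start - stop;                 /* no reload in between */
    }
    return start + (reload + 1u) - stop;     /* counter reloaded once */
}
```

Usage on the target would look like `elapsed_ticks = systick_elapsed(start_ticks, stop_ticks, SysTick->LOAD);`.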


Ahh, diving into the heart :slight_smile:

I see now, the algorithm does not use sin() cos() Arduino functions

float _sin(float a){
  if(a < _PI_2){
    //return sine_array[(int)(199.0f*( a / (_PI/2.0)))];
    //return sine_array[(int)(126.6873f* a)];           // float array optimized
    return 0.0001f*sine_array[_round(126.6873f* a)];      // int array optimized
  }else if(a < _PI){
    // return sine_array[(int)(199.0f*(1.0f - (a-_PI/2.0) / (_PI/2.0)))];
    //return sine_array[398 - (int)(126.6873f*a)];          // float array optimized
    return 0.0001f*sine_array[398 - _round(126.6873f*a)];     // int array optimized
  }else if(a < _3PI_2){
    // return -sine_array[(int)(199.0f*((a - _PI) / (_PI/2.0)))];
    //return -sine_array[-398 + (int)(126.6873f*a)];           // float array optimized
    return -0.0001f*sine_array[-398 + _round(126.6873f*a)];      // int array optimized
  } else {
    // return -sine_array[(int)(199.0f*(1.0f - (a - 3*_PI/2) / (_PI/2.0)))];
    //return -sine_array[796 - (int)(126.6873f*a)];           // float array optimized
    return -0.0001f*sine_array[796 - _round(126.6873f*a)];      // int array optimized
  }
}

I wonder how this computes next to the CORDIC?

Picking a sin value from a lookup table is very fast and portable.
What this code does is use the fact that one sine period is four mirrored copies of the same quarter wave, which minimizes the size of the lookup table, if I am not wrong.


By the way look at this, I am still amazed :star_struck:


K, let’s time it.

This function, using this value: float angle_el = 0.47936899621429443359375;

angle_el = _normalizeAngle(angle_el);


elapsed_ticks:   198
Normalized angle:   0.48

These functions, same angle_el value:

angle_el = _normalizeAngle(angle_el);
float _ca = _cos(angle_el);
float _sa = _sin(angle_el);


elapsed_ticks:   423
Normalized angle:   0.48
_ca:   0.89
_sa:   0.46

Here is the HAL


Can someone help out with the STM32 HAL → PlatformIO CORDIC setup?

This compiles:

#include "stm32g4xx_hal_cordic.h"


  CORDIC_HandleTypeDef thisCordic;
  CORDIC_ConfigTypeDef sConfig;
  thisCordic.Instance = CORDIC;
  sConfig.Function = CORDIC_FUNCTION_SINE;  
  sConfig.Precision = CORDIC_PRECISION_6CYCLES;
  sConfig.Scale = CORDIC_SCALE_0;
  sConfig.NbWrite = CORDIC_NBWRITE_1;
  sConfig.NbRead = CORDIC_NBREAD_2;
  sConfig.InSize = CORDIC_INSIZE_16BITS;
  sConfig.OutSize = CORDIC_OUTSIZE_16BITS;

But how should I write to the CORDIC and read the result?

The CORDIC HAL does not have an InitTypeDef

I would assume, that I can call : HAL_CORDIC_Init(&thisCordic);

But it throws this error : undefined reference to `HAL_CORDIC_Init’

Mkay, this compiles:

static void CORDIC_Config(void)
{
  CORDIC_ConfigTypeDef sConfig;
  thisCordic.Instance = CORDIC;

  if (HAL_CORDIC_Init(&thisCordic) != HAL_OK)
  {
    /* CORDIC initialization error */
  }

  sConfig.Function = CORDIC_FUNCTION_SINE;
  sConfig.Precision = CORDIC_PRECISION_6CYCLES;  // from the earlier snippet; sConfig is a local, so an unset field holds garbage
  sConfig.Scale = CORDIC_SCALE_0;
  sConfig.NbWrite = CORDIC_NBWRITE_1;
  sConfig.NbRead = CORDIC_NBREAD_2;
  sConfig.InSize = CORDIC_INSIZE_16BITS;
  sConfig.OutSize = CORDIC_OUTSIZE_16BITS;

  //HAL_CORDIC_Configure(&thisCordic, &sConfig);

  if (HAL_CORDIC_Configure(&thisCordic, &sConfig) != HAL_OK)
  {
    /* Channel Configuration Error */
  }
}


I found a good source here

That link is for ADC analog Watchdog? Not sure how useful it will be in this case…

To enable the CORDIC functionality, add a -D HAL_CORDIC_MODULE_ENABLED to your build flags. That should make these functions visible.

The HAL layer exposes different APIs to deal with the CORDIC.

Simplest is probably calling HAL_CORDIC_Calculate, which will submit the parameters to the CORDIC and wait for the result. Using the asynchronous APIs will be very complicated in conjunction with the SimpleFOC code, which expects to call _sin() or _cos() and get the result back when the function call returns.

Perfect. I still have to learn my way around this .ini thing. Thank you!

I know it’s for the ADC. The code broke as soon as I put it in the setup(); so it was not a real solution. Your trick seems to work. Now compiling with :smiley:

void setup() {

Okay, here it is

This is just int32_t data in and two int32_t out, like this:

CORDIC->WDATA = input_q31_sin;
cordic_sine = CORDIC->RDATA;
cordic_cosine = CORDIC->RDATA;

This is fast, but the math.h lib is heavy

radian:   0.48
elapsed_ticks:   29
Cordic_sine:   990459904.00
Cordic_cosine:   1905432576.00

Let’s convert those with this:

#define q31_to_f32(x) ldexp((int32_t) x, -31)
float converted_cordic_sine = 0.0f;
float converted_cordic_cosine = 0.0f;

converted_cordic_sine = q31_to_f32(cordic_sine);
converted_cordic_cosine = q31_to_f32(cordic_cosine);

Hurray! SimpleFOC is faster :stuck_out_tongue:

elapsed_ticks:   957
Cordic_sine:   990459904.00
Cordic_cosine:   1905432576.00
converted_Cordic_sine:   0.46
converted_Cordic_cosine:   0.89

Let’s try to run 2 cycles and not 6…

elapsed_ticks:   953
Cordic_sine:   994251008.00
Cordic_cosine:   1903432192.00
converted_Cordic_sine:   0.46
converted_Cordic_cosine:   0.89

Let’s try a different approach. The conversion above was taken from the ST video series (PART2) on the CORDIC. This should convert the int32_t to float (f32) as described here.

value_f32_sine = (float)cordic_sine/(float)0x8000000;
value_f32_cosine = (float)cordic_cosine/(float)0x8000000;

Muy muy better…

elapsed_ticks:   29
Cordic_sine:   994251008.00
Cordic_cosine:   1903432192.00
converted_Cordic_sine:   7.41
converted_Cordic_cosine:   14.18

So how can we convert the f32 format to something usable by SFOC?

Hmmm… another 0 and this is what I get:

elapsed_ticks:   29
Cordic_sine:   994251008.00
Cordic_cosine:   1903432192.00
converted_Cordic_sine:   0.46
converted_Cordic_cosine:   0.89
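What happened with the extra 0: a q1.31 value has to be divided by 2^31, which is 0x80000000 (eight hex digits). The seven-digit 0x8000000 is 2^27, so the first attempt came out 16 times too large (7.41 ≈ 16 × 0.463). A small sketch with hypothetical helper names, using the 2-cycle readings from above:

```c
#include <stdint.h>

/* q1.31 -> float: scale by 1 / 2^31 (2^31 = 0x80000000 = 2147483648). */
static float q31_to_float(int32_t q)
{
    return (float)q * (1.0f / 2147483648.0f);
}

/* The first attempt, kept for comparison: 0x8000000 is only 2^27. */
static float q31_div_2pow27(int32_t q)
{
    return (float)q / (float)0x8000000;
}
```

With cordic_sine = 994251008, q31_to_float gives about 0.463 and q31_div_2pow27 gives about 7.41, matching the two printouts. As for the 957 ticks: ldexp() works on double, and the Cortex-M4 FPU is single precision only, so my guess is that most of that time was software double arithmetic. A single float multiply like the one above stays on the FPU.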

Weird how that conversion takes 0 SysTicks?

Bumped the CORDIC back to 6 cycles:

elapsed_ticks:   33
Cordic_sine:   990459904.00
Cordic_cosine:   1905432576.00
converted_Cordic_sine:   0.46
converted_Cordic_cosine:   0.89

I love how they call it a co-processor :smiley:

float is what SimpleFOC uses…

To time the routines, do 1000 or 10000 conversions with the CORDIC in a tight loop, and time the whole loop. Then do the same for the SimpleFOC _sin() function call.
These times will be comparable and have some degree of validity. Timing individual calls isn’t really possible: the inaccuracy involved in getting the micros() values is too high to give you meaningful results.

Then the other test would be to output the results of the calculations for both methods in steps of 0.001 radians, from 0 to 2PI, and put these numbers into MS Excel. Then you can make a column with the sine values according to Excel’s sine() function (which will hopefully be fairly accurate) and then make columns for the differences - and sum the absolute value of the errors. That will give you a measure with which you can compare the accuracy of both methods (relative to MS Excel :smiley: ).

If you feel like doing all this work it would give a decent comparison between these two methods, and determine whether it makes sense to think about including it in the library…

How is that possible with floats? I see two decimals e.g. 0.46

Wouldn’t it make sense to increase the granularity? As long as the relationship between the two is preserved, when setting the phase voltage

Edit: decimals are rounded by the monitor.

Float can represent much better than .001 steps until the integer portion gets very large. Serial monitor just formats it with 2 digits. But it’s still 8 bits short of that q31 CORDIC output :slight_smile: Although I don’t like that q31 can’t represent a true 1. I always used 2.30 fixed-point for my sin/cos tables in the past. But some ARM processors have special instructions for multiplying q31 and q15 formats that do the multiply and shift down in a single cycle, so that’s why they use it.

Sure would be nice if SimpleFOC would have used fixed-point from the start… 16-bit angles are awesome. You can load them as either unsigned or signed to represent the same angle as 0-360 degrees or -180 to +180 degrees (often better for differences between angles), and they wrap around naturally, or you can use a 32-bit number with a union to operate on the whole value or access the full rotations and fractional portion separately (equivalent to getMechanicalAngle or _normalizeAngle in SimpleFOC). So much more convenient than working in radians, even aside from being massively faster. Of course doing an interpolated sin/cos table lookup is also super fast, for MCUs without CORDIC.
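A small illustration of the 16-bit angle idea, with hypothetical helper names. One turn is 65536 counts, so the wrap-around comes for free from integer overflow. Shifts are used for the 32-bit case instead of the union mentioned above, only to avoid depending on byte order.

```c
#include <stdint.h>

/* Add two angles; the result wraps at exactly one full turn. */
static uint16_t angle16_add(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed shortest difference a - b, i.e. in -180..+180 degrees.
 * Relies on two's complement conversion, which is what ARM GCC does. */
static int16_t angle16_diff(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b);
}

/* Only needed for display; the control math can stay in counts. */
static float angle16_to_degrees(uint16_t a)
{
    return (float)a * (360.0f / 65536.0f);
}

/* 32-bit angle: upper 16 bits = full rotations, lower 16 bits = fraction of a turn. */
static uint16_t angle32_turns(uint32_t a)    { return (uint16_t)(a >> 16); }
static uint16_t angle32_fraction(uint32_t a) { return (uint16_t)(a & 0xFFFFu); }
```

For example, 270° + 180° is 0xC000 + 0x8000, which wraps to 0x4000 (90°) with no explicit normalization step.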

Right! I always forget that. Then we’re good to go. I’ll set up the test.

How do you propose I collect the data?

I guess there are limits to what we can achieve with the CORDIC without breaking the whole repo apart.

Hmm, I’m not sure how you could go about transferring that volume of data to the computer… it certainly should be possible to stream incoming serial data into a file rather than outputting to the screen like serial monitor, but I’ve never done USB communication in a Windows program before so I’d have to do some studying.

But it would be easy to run the whole process on the MCU and only output the average errors. Try this:

float sfError = 0, cordicError = 0;
int count = 0;
for (float angle = -_2PI; angle <= _2PI; angle += 0.001f, count++) {
	float stlSin = sinf(angle);
	float sfSin = _sin(angle) - stlSin;
	// cordic_sine must be refreshed from the CORDIC for this angle before the next line
	float cordicSin = ((float)cordic_sine/(float)0x80000000) - stlSin;  // 0x80000000 = 2^31
	sfError += sfSin*sfSin;
	cordicError += cordicSin*cordicSin;
}
Serial.print("SimpleFOC sin RMS error x 1000000: ");
Serial.println(sqrtf(sfError / count) * 1e6f);
Serial.print("CORDIC sin RMS error x 1000000: ");
Serial.println(sqrtf(cordicError / count) * 1e6f);

Oh, this is just when you print them. The internal precision is 23 bits for the mantissa and 8 for the exponent, and 1 sign bit.
You can print the values more precisely if you want: Serial.print(val, 4); // 4 decimal digits
But internally they’re always stored with full precision, and the code is compiled with full precision, so if you say float x = 0.0001; this is represented as precisely as possible, even if you print it with less precision…

Yes yes, i know i know, just rub it in. I always forget that annoying monitor digit rounding :stuck_out_tongue: