
Work on STM32 SPI implementation #677

Merged
merged 1 commit into from
May 7, 2018

Conversation

josesimoes
Member

Description

  • Add CLR events to STM32 SPI native code to allow for smooth Tx/Rx of large buffers
  • Add SPI master CLR event
  • Add target configuration for SPI in order to use static buffers
  • Fix calls to cache helpers

Motivation and Context

  • Improvements with SPI native base class implementation
  • Allow large SPI buffers without blocking CLR execution

How Has This Been Tested?

Screenshots

Types of changes

  • Improvement (non-breaking change that improves a feature, code or algorithm)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Signed-off-by: josesimoes [email protected]

@josesimoes josesimoes added Type: enhancement Area: Interpreter Everything related with the interpreter, execution engine and such Status: under review Series: STM32xx Everything related specifically with STM32 targets labels May 3, 2018
@josesimoes josesimoes requested a review from piwi1263 May 3, 2018 15:01
@nfbot
Member

nfbot commented May 3, 2018

Hi @josesimoes,

I'm nanoFramework bot.
Thank you for your contribution!

A human will be reviewing it shortly. 😉

@josesimoes josesimoes force-pushed the rework-stm32-spi branch 3 times, most recently from 81c4ab9 to 59f430b Compare May 3, 2018 17:06
- Add CLR events to STM32 SPI native code to allow for smooth Tx/Rx of large buffers
- Add sanity check for buffer overflow
- Add SPI master CLR event
- Add code to assert/deassert chip select line (ChibiOS driver doesn't implement that)
- Add SPI configuration for all SPI enabled targets
- Fix calls to cache helpers

Signed-off-by: josesimoes <[email protected]>
Member

@piwi1263 piwi1263 left a comment

With a stable spin of the SPI display on the 429 board running for more than a day, this looks good, but the performance must be improved: instead of 15 secs, the test program now runs for 26 secs.

@josesimoes josesimoes merged commit a41e6d7 into nanoframework:develop May 7, 2018
@josesimoes josesimoes deleted the rework-stm32-spi branch May 7, 2018 07:25
@ghost

ghost commented May 7, 2018

You still merged with a nearly 50% performance penalty?

@josesimoes
Member Author

I do because (in no particular order):

  • It's now using CLR events, making it work for any buffer size without locking CLR execution. The previous implementation (which called the driver's sync API) blocked the thread, freezing the managed app until the SPI transaction completed (read+write, etc.). That's humanly unnoticeable for small buffer sizes, but it would start becoming noticeable as buffer sizes grow.
  • Considering that the previous implementation lacked the CLR events, and thus wasn't entirely functional, one can hardly use it for comparison. I could go and remove the CLR events from any of the other class libraries that use them (serial, I2C, ADC...) and they would complete their execution quicker too, but then developers would complain (as they rightfully did in the past) that «the app freezes when I call SPI.Write/Read», etc.
  • Performance is important and should be near the top of the priority list, but it cannot and should not be satisfied at the expense of bending base rules such as guaranteeing shared and equal thread execution time slots. nanoFramework is not a real-time OS.
  • This 'poor' performance can be improved. For example, I can shave ~4 seconds off the above test without changing a single line of code by simply switching the CLR heap to internal RAM instead of the external SDRAM it is using now (SDRAM has slower access, for obvious reasons).
  • Truly high-performance drivers (like the LCD one used in the mentioned test) have to be coded in C/C++, not in C#.

Having said all that, none of these changes is simple, plain black-and-white. nanoFramework is a rather complex system, and everything one changes has the potential to cause ramifications that aren't always obvious. Like any other system of this nature it's not perfect, and one can't always have the "best of both worlds". Anyway, this is software and it's not set in stone, so it can be changed at any time. 😺
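The contrast between the old sync path and the new event-driven path can be sketched as a small, runnable simulation (all names here are hypothetical illustrations; the real code uses the ChibiOS SPI driver and nanoFramework CLR events, and the "completion" below is triggered inline instead of by a DMA ISR):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion flag that the DMA-complete ISR would set. */
static volatile bool spi_transfer_done;

/* Stand-in for an end-of-transfer callback; in the real code this is
 * where a CLR event would be fired to resume the managed thread. */
static void spi_end_callback(void) {
    spi_transfer_done = true;
}

/* Old approach: the calling thread waits until the whole transfer
 * finishes, freezing managed execution for large buffers. */
static void spi_write_blocking(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    spi_transfer_done = false;
    /* start DMA ... (simulated as instant completion here) */
    spi_end_callback();
    while (!spi_transfer_done) { }   /* busy wait = frozen CLR thread */
}

/* New approach: kick off the DMA and return immediately; the CLR event
 * raised from the callback resumes the managed thread later. */
static void spi_write_async(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    spi_transfer_done = false;
    /* start DMA ...; spi_end_callback() fires from the ISR when done */
}
```

The async variant returns with the transfer still "in flight", which is exactly why other threads keep getting their execution slots.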

@ghost

ghost commented May 7, 2018

I think that you are right on the analysis but wrong on the solution.

Indeed, it can be a problem if/when the managed app freezes during large transfers. But... except for very specific uses, like graphics drivers, the size of the transmitted data is not that big. I have no real idea of which size could be given, but let's say 100 bytes is more or less a good value.

With your latest changes, everybody will get that 50% penalty (roughly rounded up, I admit), regardless of the size of the transmitted data. Where they would not have noticed anything before, now they will see that penalty in the long run. Granted, no one will see 50% less performance, but the slow-down will still be noticeable.
Think of sensors like GPS, ADC, or similar: they don't send 4K buffers but rather 10, 20 or even 100 bytes at each reading, not much more. But the frequency of the readings will have to be lower now.

Edit: what I want to say here is that 100% of the people will have the performance penalty where only a few percent would have had it before (only those sending large buffers).

You are right about the managed graphics driver performance, no doubt about this. But this example perfectly shows the difference in performance that will now impact everyone with these changes.

I personally would have preferred either a sync/async choice or some kind of "education", teaching that sending large buffers can be counter-productive and that the managed program should try to handle large buffers in chunks.
But even then, perhaps the temporary freeze can be safely "ignored"; it will depend on the application.
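The chunking idea suggested above can be sketched as follows (a minimal illustration, not nanoFramework code; the 100-byte chunk size and the function names are taken from this discussion as assumptions):

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 100u  /* illustrative threshold from the discussion */

static size_t chunks_sent;  /* instrumentation for the example */

/* Stand-in for a short synchronous SPI write that is cheap enough
 * not to stall the managed app noticeably. */
static void spi_write_small(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    chunks_sent++;
}

/* Split a large buffer into chunks so that each individual transfer
 * stays short, keeping any freeze per call humanly unnoticeable. */
static void spi_write_chunked(const uint8_t *buf, size_t len) {
    while (len > 0) {
        size_t n = (len < CHUNK_SIZE) ? len : CHUNK_SIZE;
        spi_write_small(buf, n);
        buf += n;
        len -= n;
    }
}
```

A 4096-byte buffer would go out as 41 calls of at most 100 bytes each, trading one long freeze for many short ones.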

@ghost

ghost commented May 7, 2018

Anyway, that's a useless discussion since the PR is closed and only people subscribed to this channel will see it.

@ghost

ghost commented May 7, 2018

Also, could you elaborate on this, please?

Considering that the previous implementation was lacking the CLR events, thus not being entirely functional,

@josesimoes
Member Author

I agree with your assessment on the use cases and the performance impact of adding the CLR events to properly deal with large buffers.
Provided that we can do it (in a reasonable manner, without too much craziness in the code and/or configuration files), I advocate that the "system" should be able to automatically adapt to these differences. Exposing options is not always a good approach, because one could end up adding a bunch of configuration options that make it more difficult for the board architect to understand them, potentially requiring a lot of documentation and such.

In this particular case it's quite easy to tackle this without exposing extra configuration options and having to produce massive documentation. It's a simple matter of evaluating whether the requested SPI transaction will take longer than a thread execution slot. If it will, set up a CLR event for continuation (the DMA will nicely handle it in the background); if it won't, perform it in a single stroke without any further ceremony.
This has been the plan since the beginning. But, because all this requires a set of consecutive changes in the code, I thought it would be better to do those in steps, being able to validate each change set.
The final set is almost done and a PR is imminent.
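The adaptive decision described in that comment could look roughly like this (a sketch under stated assumptions: the 20 ms thread slot is an illustrative figure, not the actual nanoFramework quantum, and both helper names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed thread execution quantum; illustrative only. */
#define THREAD_SLOT_US 20000u

/* Estimate how long a transfer of `bytes` takes at `clock_hz`
 * (8 bits per byte, ignoring inter-byte gaps and setup overhead). */
static uint32_t spi_transfer_time_us(uint32_t bytes, uint32_t clock_hz) {
    return (uint32_t)(((uint64_t)bytes * 8u * 1000000u) / clock_hz);
}

/* Decide: a short transfer is done synchronously in one stroke; a
 * long one starts the DMA and sets up a CLR event for continuation. */
static bool spi_needs_clr_event(uint32_t bytes, uint32_t clock_hz) {
    return spi_transfer_time_us(bytes, clock_hz) > THREAD_SLOT_US;
}
```

For example, 100 bytes at 10 MHz takes about 80 µs and would run in one stroke, while 100 KB at 1 MHz takes about 800 ms and would go through the event path.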

On your question above: without the CLR events to deal with large buffers, the implementation would freeze execution, thus not being entirely functional in those situations.
