
Work on STM32 SPI implementation #677

Merged
merged 1 commit into from
May 7, 2018

Conversation

josesimoes
Member

Description

  • Add CLR events to STM32 SPI native code to allow for smooth Tx/Rx of large buffers
  • Add SPI master CLR event
  • Add target configuration for SPI in order to use static buffers
  • Fix calls to cache helpers

Motivation and Context

  • Improvements with SPI native base class implementation
  • Allow large SPI buffers without blocking CLR execution

How Has This Been Tested?

Screenshots

Types of changes

  • Improvement (non-breaking change that improves a feature, code or algorithm)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Signed-off-by: josesimoes [email protected]

@josesimoes josesimoes added Type: enhancement Area: Interpreter Everything related with the interpreter, execution engine and such Status: under review Series: STM32xx Everything related specifically with STM32 targets labels May 3, 2018
@josesimoes josesimoes requested a review from piwi1263 May 3, 2018 15:01
@nfbot
Member

nfbot commented May 3, 2018

Hi @josesimoes,

I'm nanoFramework bot.
Thank you for your contribution!

A human will be reviewing it shortly. 😉

@josesimoes josesimoes force-pushed the rework-stm32-spi branch 3 times, most recently from 81c4ab9 to 59f430b Compare May 3, 2018 17:06
- Add CLR events to STM32 SPI native code to allow for smooth Tx/Rx of large buffers
- Add sanity check for buffer overflow
- Add SPI master CLR event
- Add code to assert/deassert chip select line (ChibiOS driver doesn't implement that)
- Add SPI configuration for all SPI enabled targets
- Fix calls to cache helpers

Signed-off-by: josesimoes <[email protected]>
Member

@piwi1263 piwi1263 left a comment

With a stable spin of the SPI display on the 429 board running for more than a day, this looks good, but the performance must be improved: instead of 15 secs, the test program now runs for 26 secs.

@josesimoes josesimoes merged commit a41e6d7 into nanoframework:develop May 7, 2018
@josesimoes josesimoes deleted the rework-stm32-spi branch May 7, 2018 07:25
@ghost

ghost commented May 7, 2018

You still merged with a nearly 50% performance penalty?

@josesimoes
Member Author

I do because (in no particular order):

  • It's now using CLR events, making it work for any buffer size without locking CLR execution. The previous implementation (which called the driver's sync API) blocked the thread, freezing the managed app until the SPI transaction completed (read+write, etc.). That's humanly unnoticeable for small buffer sizes, but it would start becoming noticeable as buffer sizes grow.
  • Considering that the previous implementation lacked the CLR events, and thus wasn't entirely functional, one can hardly use it for comparison. I could go and remove the CLR events from any of the other class libraries that use them (serial, I2C, ADC...) and they would complete their execution quicker too, but then developers would complain (as they rightfully did in the past) that «the app freezes when I call SPI.Write/Read», etc.
  • Performance is important and should be near the top of the priority list, but it cannot and should not be satisfied at the expense of bending base rules such as guaranteeing shared and equal thread execution time slots. nanoFramework is not a real-time OS.
  • This 'poor' performance can be improved. For example, I can shave ~4 seconds off the above test without changing a single line of code by simply switching the CLR heap to internal RAM instead of the external SDRAM it is using now (SDRAM has slower access, for obvious reasons).
  • Truly high-performance drivers (like the LCD one used in the mentioned test) have to be coded in C/C++, not in C#.

Having said all that, none of these changes is simple, plain black-and-white. nanoFramework is a rather complex system, and everything one changes has the potential to cause ramifications that aren't always obvious. Like any other system of this nature it's not perfect, and one can't always have the "best of both worlds". Anyway, this is software and it's not set in stone, so it can be changed at any time. 😺
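The contrast between the old sync path and the new event-driven path can be sketched as a small, runnable simulation (all names here are hypothetical illustrations; the real code uses the ChibiOS SPI driver and nanoFramework CLR events, and the "completion" below is triggered inline instead of by a DMA ISR):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion flag that the DMA-complete ISR would set. */
static volatile bool spi_transfer_done;

/* Stand-in for an end-of-transfer callback; in the real code this is
 * where a CLR event would be fired to resume the managed thread. */
static void spi_end_callback(void) {
    spi_transfer_done = true;
}

/* Old approach: the calling thread waits until the whole transfer
 * finishes, freezing managed execution for large buffers. */
static void spi_write_blocking(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    spi_transfer_done = false;
    /* start DMA ... (simulated as instant completion here) */
    spi_end_callback();
    while (!spi_transfer_done) { }   /* busy wait = frozen CLR thread */
}

/* New approach: kick off the DMA and return immediately; the CLR event
 * raised from the callback resumes the managed thread later. */
static void spi_write_async(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    spi_transfer_done = false;
    /* start DMA ...; spi_end_callback() fires from the ISR when done */
}
```

The async variant returns with the transfer still "in flight", which is exactly why other threads keep getting their execution slots.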

@ghost

ghost commented May 7, 2018

I think that you are right on the analysis but wrong on the solution.

Indeed, it can be a problem if/when the managed app freezes during large transfers. But... except for very specific uses, like graphics drivers, the size of the transmitted data is not that big. I have no real idea of which size could be given, but let's say 100 bytes is more or less a good value.

With your latest changes, everybody will get that 50% penalty (roughly rounded up, I admit), regardless of the size of the transmitted data. Where they would not have noticed anything before, now they will see that penalty in the long run. Granted, no one will see 50% less performance, but the slow-down will still be noticeable.
Think of sensors like GPS, ADC, or similar: they don't send 4K buffers but rather 10, 20 or even 100 bytes at each reading, not much more. But the frequency of the readings will have to be lower now.

Edit: what I want to say here is that 100% of the people will have the performance penalty where only a few percent would have had it before (only those sending large buffers).

You are right about the managed graphics driver performance, no doubt about this. But this example perfectly shows the difference in performance that will now impact everyone with these changes.

I personally would have preferred either a sync/async choice or some kind of "education", teaching that sending large buffers can be counter-productive and that the managed program should try to handle large buffers in chunks.
But even then, perhaps the temporary freeze can be safely "ignored"; it will depend on the application.
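The chunking idea suggested above can be sketched as follows (a minimal illustration, not nanoFramework code; the 100-byte chunk size and the function names are taken from this discussion as assumptions):

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 100u  /* illustrative threshold from the discussion */

static size_t chunks_sent;  /* instrumentation for the example */

/* Stand-in for a short synchronous SPI write that is cheap enough
 * not to stall the managed app noticeably. */
static void spi_write_small(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    chunks_sent++;
}

/* Split a large buffer into chunks so that each individual transfer
 * stays short, keeping any freeze per call humanly unnoticeable. */
static void spi_write_chunked(const uint8_t *buf, size_t len) {
    while (len > 0) {
        size_t n = (len < CHUNK_SIZE) ? len : CHUNK_SIZE;
        spi_write_small(buf, n);
        buf += n;
        len -= n;
    }
}
```

A 4096-byte buffer would go out as 41 calls of at most 100 bytes each, trading one long freeze for many short ones.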

@ghost

ghost commented May 7, 2018

Anyway, that's a useless discussion since the PR is closed and only people subscribed to this channel will see it.

@ghost

ghost commented May 7, 2018

Also, could you elaborate on this, please?

Considering that the previous implementation was lacking the CLR events, thus not being entirely functional,

@josesimoes
Member Author

I agree with your assessment on the use cases and the performance impact of adding the CLR events to properly deal with large buffers.
Provided that we can do it (in a reasonable manner, without too much craziness in the code and/or configuration files), I advocate that the "system" should be able to automatically adapt to these differences. Exposing options is not always a good approach, because one could end up adding a bunch of configuration options that make it more difficult for the board architect to understand them, potentially requiring a lot of documentation and such.

In this particular case it's quite easy to tackle this without exposing extra configuration options and having to produce massive documentation. It's a simple matter of evaluating whether the requested SPI transaction will take longer than a thread execution slot. If it will, set up a CLR event for continuation (the DMA will nicely handle it in the background); if it won't, perform it in a single stroke without any further ceremony.
This has been the plan since the beginning. But, because all this requires a set of consecutive changes in the code, I thought it would be better to do those in steps, being able to validate each change set.
The final set is almost done and a PR is imminent.
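The adaptive decision described in that comment could look roughly like this (a sketch under stated assumptions: the 20 ms thread slot is an illustrative figure, not the actual nanoFramework quantum, and both helper names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed thread execution quantum; illustrative only. */
#define THREAD_SLOT_US 20000u

/* Estimate how long a transfer of `bytes` takes at `clock_hz`
 * (8 bits per byte, ignoring inter-byte gaps and setup overhead). */
static uint32_t spi_transfer_time_us(uint32_t bytes, uint32_t clock_hz) {
    return (uint32_t)(((uint64_t)bytes * 8u * 1000000u) / clock_hz);
}

/* Decide: a short transfer is done synchronously in one stroke; a
 * long one starts the DMA and sets up a CLR event for continuation. */
static bool spi_needs_clr_event(uint32_t bytes, uint32_t clock_hz) {
    return spi_transfer_time_us(bytes, clock_hz) > THREAD_SLOT_US;
}
```

For example, 100 bytes at 10 MHz takes about 80 µs and would run in one stroke, while 100 KB at 1 MHz takes about 800 ms and would go through the event path.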

On your question above: without the CLR events to deal with large buffers, the implementation would freeze execution, thus not being entirely functional in those situations.
