CubicSDR uses lots of CPU #150
Thanks. I'm guessing I can get rid of most of the CPU usage in SDRPostThread, since that's where the DC offset correction is being done, which I believe the HackRF already handles. The FFT cost seems about right; I'll have to take a closer look at the rest. The demodulator seems a lot more expensive than it should be. I may also be able to reduce the FFT CPU usage by letting FFTW do more of the work (run a larger FFT and just crunch it down visually) and easing up on the liquid-dsp decimation stages.
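A minimal sketch of the "run a larger FFT and crunch it down visually" idea: compute more bins than there are screen columns, then average each run of adjacent bins into one column. The helper name `bins_to_pixels` is hypothetical, not CubicSDR code, and it assumes the input is linear power (averaging dB values would bias each column toward its quieter bins).

```python
def bins_to_pixels(power, width):
    """Average len(power) FFT bins (linear power) down to `width` columns.

    Hypothetical helper: each column takes the mean of the contiguous run of
    bins that maps onto it, so no bin is dropped and none is counted twice.
    """
    n = len(power)
    if width <= 0 or n < width:
        raise ValueError("need at least as many bins as display columns")
    columns = []
    for col in range(width):
        start = col * n // width       # first bin for this column
        stop = (col + 1) * n // width  # one past the last bin
        chunk = power[start:stop]
        columns.append(sum(chunk) / len(chunk))
    return columns


# e.g. a 4096-point FFT drawn in 1024 columns averages 4 bins per column
print(bins_to_pixels([1.0, 1.0, 3.0, 3.0], 2))
```

The FFT itself stays cheap in FFTW; the averaging pass is a single linear sweep over the bins.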
So the "Auto" DC offset correction doesn't do the trick? Yeah, FFTW is a powerhouse; it can handle the full bandwidth of the HackRF without much trouble, so I'd be surprised to see it high on the list. If I do some more work on using simple half-band liquid-dsp decimators, put way more load onto FFTW, and just average the bins to fit the screen, I think a few things will be achieved:
Ok, I noticed from https://github.com/jocover/SoapyHackRF/blob/master/HackRF_Settings.cpp#L214 that there's no built-in DC offset correction. So that will likely stay for now, but I can give the option to turn it off to save some cycles.
@bobobo1618 let me know if the https://github.com/cjcliffe/CubicSDR/tree/soapysdr-pfbch branch helps at all. It should primarily improve running multiple demodulators.
It very much doesn't help on the CPU front. It eats the CPU alive, is very jittery at 16MHz bandwidth, and doesn't even let me set anything above that. It gives me this (when I enter 24000000):
I haven't updated my build in a long time, so I'm not sure how much of this is unique to this branch.
That sounds strange; I'm getting several streams at 12Msps here on my 2010 MacBook without trouble. Does it perform well up until that bandwidth? Did you build with the release configuration? Also, can you install the fftw3 development files and build/install liquid-dsp with FFTW support to see if that helps? I believe this feature will depend heavily on FFTW being integrated.
@bobobo1618 looks like you're using a bundled app version? I've only released soapy-pfbch as source so far, and if you bundled the app locally it should be using your copy of libliquid (most people don't have it, so the app configuration bundles it automatically). Edit: on another note, the 16MHz limit was removed many commits ago. Are you seeing the new SoapySDR device selection dialog pop up on startup? If not, you may be building an old project.
Yup, locally built app bundle. Maybe it was, but it showed one thing before I swapped it out and another after, so I'm dubious. As for performance, I think I've identified where all my CPU is going: it seems to be all in the 'channelizer' and the DC filter in SDRPostThread. The DC filter seems to be the most intensive of the two, but the channelizer isn't able to run in real time on its own either. I don't understand much of what I'm looking at here, so I might be wrong, but it looks like SDRPostThread is taking in the raw samples ( These might be dumb questions, but can you somehow localize the DC filter to the narrow chunk of spectrum affected by the DC offset, so that the entire band isn't being filtered? Also, can you apply the demodulator resampling and so on (filtering the used chunk of the spectrum) before running the channelizer? These seem to be expensive operations, and reducing the amount of data they have to process could help.
I'm checking with @jgaeddert to see if it's possible to disable unused channelizer channels. This is just a first pass, but it's performing very well here, although I do have the DC filter disabled for SDRPlay. The channelizer is what splits the bandwidth into chunks the resampler can handle; without it, the demodulator gets the full stream. Right now it's processing all the channels (and as a result I have to increase the channel size), but I've put in a request to see if I can enable just the channels I need. Hopefully I can then make a simple binary tree and cascade a set of channelizers to handle active areas more efficiently. Edit: on another note, there's probably enough room in SDRThread now to do the DC filtering there, which may free up the post thread.
@bobobo1618 I tried 12Msps with the DC blocker on and it was jittery and unusable here. Moving the DC blocker to SDRThread frees up enough resources for it to work with a few WBFM streams at 12Msps, but CPU usage is up about 15-20%. I've committed the change to the soapy-pfbch branch, so let me know how it goes. I'm not sure I can apply the DC blocking to just a portion of the band, but I could hide the spike visually and apply DC correction when demodulating at baseband. Ultimately I'll probably create a few islands of channelizers for groups of streams, with some resamplers at the front end of each, but I'd like to see how far I can push the channelizer implementation so I can optimize/generalize it for re-use later.
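Hiding the spike visually could look something like this: patch the DC bin(s) of the display spectrum by interpolating between their neighbours, leaving the samples that go to the demodulators untouched. This is a sketch under assumptions: the spectrum is in fftshift order (DC at index `len // 2`), and `hide_dc_spike` and its `half_width` parameter are hypothetical names.

```python
def hide_dc_spike(spectrum, half_width=1):
    """Return a copy of an fftshift-ordered spectrum with the DC bins patched.

    Bins within `half_width` of DC are replaced by a straight line between the
    nearest untouched neighbours on each side. Display-only: nothing here
    changes what the demodulators receive.
    """
    n = len(spectrum)
    dc = n // 2
    lo = dc - half_width - 1   # last untouched bin below DC
    hi = dc + half_width + 1   # first untouched bin above DC
    if lo < 0 or hi >= n:
        raise ValueError("spectrum too short for this half_width")
    out = list(spectrum)
    for i in range(lo + 1, hi):
        t = (i - lo) / (hi - lo)   # 0 at lo, 1 at hi
        out[i] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return out


print(hide_dc_spike([0, 2, 50, 60, 50, 6, 8], half_width=1))
```

A wider `half_width` covers a DC notch that leaks into neighbouring bins, at the cost of blanking any real signal sitting right at the center frequency.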
The changes make it better, but it's still pretty jittery and still eats a ton of CPU. The DC blocker appears to run better in this configuration (its running time is only slightly higher than real time), but the channelizer still hits the CPU limit. This is at 16MHz. I had a look at one of the SDR applications I know runs well, and it looks like it uses an FFT rotator rather than a FIR filter, which seems to perform considerably better. Any chance that would make sense?
From what I know, it's using an FFT internally to perform the channelization; the FIR is just the prototype filter for post-channelization stop-band suppression. That's why it's important to have liquid-dsp built with FFTW3; otherwise it falls back to its internal FFT, which was rough here. This is far from the final implementation, so please keep building and posting results as I report tweaks. Thanks for your help!
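To make the "FFT inside the channelizer, FIR as prototype filter" point concrete, here is a pure-Python sketch (no liquid-dsp, no FFTW) of a textbook critically sampled polyphase analysis channelizer, checked against the brute-force "mix channel k down to 0 Hz, filter with the prototype `h`, keep every Mth sample" computation. It is not liquid-dsp's firpfbch code; the channel ordering and sign conventions are assumptions that may differ from liquid's, and the inner sum over `r` is a plain DFT that a real implementation would hand to an FFT.

```python
import cmath
import math


def naive_channel(x, h, M, k):
    """Channel k by brute force: mix down by k/M, FIR-filter with h, decimate by M."""
    out = []
    for n in range(len(x) // M):
        acc = 0j
        for l, tap in enumerate(h):
            i = n * M - l
            if i >= 0:
                acc += tap * x[i] * cmath.exp(-2j * math.pi * k * i / M)
        out.append(acc)
    return out


def polyphase_channelizer(x, h, M):
    """All M channels at once from M polyphase branches plus an M-point DFT."""
    taps_per_branch = -(-len(h) // M)  # ceil(len(h) / M)
    # branch r holds taps h[r], h[M + r], h[2M + r], ... (zero-padded)
    branches = [[h[p * M + r] if p * M + r < len(h) else 0.0
                 for p in range(taps_per_branch)] for r in range(M)]
    n_out = len(x) // M
    channels = [[0j] * n_out for _ in range(M)]
    for n in range(n_out):
        # low-rate branch outputs: v[r] = sum_p h[pM + r] * x[(n - p)M - r]
        v = []
        for r in range(M):
            acc = 0j
            for p, tap in enumerate(branches[r]):
                i = (n - p) * M - r
                if i >= 0:
                    acc += tap * x[i]
            v.append(acc)
        # y_k[n] = sum_r v[r] * exp(+j 2 pi k r / M), i.e. an M-point (inverse) DFT
        for k in range(M):
            channels[k][n] = sum(v[r] * cmath.exp(2j * math.pi * k * r / M)
                                 for r in range(M))
    return channels
```

Computing all M channels by brute force costs about `len(h)` multiply-accumulates per input sample (plus the mixers), while the polyphase form needs about `len(h) / M` per input sample for the branch filters plus one M-point transform every M input samples. That transform is where a fast FFT backend such as FFTW pays off.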
Are there any special tricks you're using to build FFTW or liquid-dsp? Liquid seems like it should automatically use FFTW if it's available, but there don't seem to be any calls to it coming out of the .dylib. And do you build FFTW with MPI or OpenMP or anything like that?
Nothing too special; the only points I would make with regard to fftw3 integration are:
If you have time, tinkering with the library configuration flags to look for improvements would be a good exercise at some point. Edit: I'm going to have a go at adding channelizer toggling support to liquid-dsp myself, wish me luck :)
Already well into implementing the channel toggling :) This may be a lot easier than I expected.
Ok, so assuming I didn't just hallucinate: my liquid-dsp channelizer toggle patch just let me run 12 evenly spaced 200kHz FM streams at 12Msps on the SDRPlay on my MacBook Pro before it started to jitter (~80% CPU). That's a fair improvement over 80% at 3 streams. I was able to push it to almost 20 when grouping them a little closer together. I'll update my liquid-dsp fork soon 😃
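A sketch of the bookkeeping a channel-toggle patch needs: work out which channelizer outputs actually feed a demodulator so the rest can be skipped. The function `active_channels` is hypothetical, not the API from the firpfbch_toggle_channels branch, and it assumes channel k is centred on `k * sample_rate / num_channels`, with negative offsets wrapping to the top indices (the usual DFT filterbank layout).

```python
import math


def active_channels(sample_rate, num_channels, demods):
    """Return the set of channel indices touched by a list of demodulators.

    demods: iterable of (offset_hz, bandwidth_hz), with offsets relative to
    the tuned centre frequency. Channel k spans
    [(k - 0.5) * spacing, (k + 0.5) * spacing).
    """
    spacing = sample_rate / num_channels
    active = set()
    for offset, bw in demods:
        # channel holding frequency f is floor(f / spacing + 0.5)
        first = math.floor((offset - bw / 2) / spacing + 0.5)
        last = math.floor((offset + bw / 2) / spacing + 0.5)
        for k in range(first, last + 1):
            active.add(k % num_channels)  # negative k wraps to the top
    return active


# 12 Msps split into 30 channels of 400 kHz; two 200 kHz FM demodulators
print(sorted(active_channels(12_000_000, 30,
                             [(1_000_000, 200_000), (-600_000, 200_000)])))
```

With 400kHz spacing, a 200kHz signal at +1MHz straddles channels 2 and 3, so both would need to stay enabled. That is the same channel-boundary problem that comes up later with firpfbch vs firpfbch2.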
@bobobo1618 Update your CubicSDR soapysdr-pfbch checkout, try building my fork/branch of liquid-dsp at https://github.com/cjcliffe/liquid-dsp/tree/firpfbch_toggle_channels, and let me know if that works any magic for you.
Well, it kind of did, but it was really... odd. Every single FM frequency I tuned to sounded exactly the same, even if I clicked and dragged the display. It was like the demodulator was looping over a tiny band of the total spectrum. Everything displayed properly, though. Really odd. Performance was great, though! A few lost samples at 20MHz, but that was it.
@bobobo1618 there may be a practical channel limit, which I've thoroughly uncapped. Can you step the bandwidth up from 10MHz to 20MHz and tell me where it goes wrong?
@bobobo1618 it might also be worth comparing against https://github.com/cjcliffe/CubicSDR/releases/tag/0.1.9-alpha-pfbch-issue150, which runs the exact build I'm testing here.
Hmm, yeah, I'm getting some strange bleed all over the place, likely from filtering issues. I'll investigate :)
Yeah, things are pretty unusable at the moment. As far as I can tell, the visual feedback is entirely disconnected from the audio. For example, in both of these screenshots I can hear a 100% clear, pure demodulated FM signal while the FFT shows me noise. Likewise, when I tune into a 'strong' signal, it's noise.
Yeah, I messed with the filter prototype earlier to try to reduce CPU usage; I'm going to take another look at that.
Hmm, so it's not my filter settings. I think what's going on is some extreme aliasing of my 400kHz channels with the 200kHz FM stations. I'm going to tweak the channel bandwidth and see if I can get it to the best non-aliasing division.
I haven't looked at the code, but looking at your callstack I think I see what you're doing, so you might consider using the firpfbch_crcf object rather than the firpfbch2_crcf. The firpfbch2 effectively runs the output at twice the rate of the firpfbch. So if you have an input sample rate of 16 MHz and are trying to break it into 16 channels, the firpfbch_crcf object will result in 16 channels each at 1 MHz while the firpfbch2 object will result in 16 channels each at 2 MHz. I can go into the details as to why later. |
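The rate arithmetic from the comment above, as a tiny hypothetical helper. It only models the output rates described there (critically sampled for firpfbch_crcf, 2x oversampled for firpfbch2_crcf) and does not call liquid-dsp.

```python
def channel_output_rate(input_rate_hz, num_channels, oversampled=False):
    """Per-channel output rate of an M-channel analysis filterbank.

    oversampled=False: critically sampled (firpfbch_crcf), input_rate / M.
    oversampled=True:  2x oversampled (firpfbch2_crcf), 2 * input_rate / M.
    """
    factor = 2 if oversampled else 1
    return factor * input_rate_hz / num_channels


for over in (False, True):
    per_channel = channel_output_rate(16_000_000, 16, over)
    # summed over all 16 channels: 16 MHz vs 32 MHz worth of output samples
    print(over, per_channel, 16 * per_channel)
```

Summed over every channel, the 2x bank hands twice as many samples downstream as it receives. The upside is that adjacent channels overlap, so a signal near a channel edge still appears whole in at least one channel.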
@jgaeddert I'm primarily using firpfbch2 because I'm assuming I wouldn't be able to demodulate signals that cross channel boundaries. Right now I have it doing 400kHz channels, and overlapping the bands like that was the only way I could see to place multiple 400kHz demodulators irregularly without any of them crossing a channel boundary. It's actually working perfectly, except for my patch to reduce CPU usage, which seems to just mash everything together. I haven't dug any deeper into what I've broken, but it definitely gets the performance I was looking for :)
@bobobo1618 @jgaeddert I'm going to try a slightly more complicated method: creating half-band resampler "islands" (grouping frequencies, with bandwidth-limit rules that decide the splitting, merging, and placement of resampling groups), essentially making my own "arbitrary channelizer". If nothing else, I'd like to see how it compares. I feel like I'm trying to find a one-size-fits-all solution when I should be making several paths and optimizations for different scenarios. The channelizer still seems like the easiest way, if I can just figure out how to skip the work done in unused channels. Edit: I tried running a test set of resamplers, and for just a few channels the CPU usage was higher than with firpfbch alone.
To cover all the bases, I've implemented firpfbch_crcf in place of firpfbch2_crcf, as @jgaeddert recommended, at https://github.com/cjcliffe/CubicSDR/tree/soapysdr-pfbch-single, and it seems to perform pretty well. I notice some spots where the signals seem to alias and degrade a bit, but overall it's a lot better than my firpfbch2_crcf hacks :)
Okay, it now runs fine with two channels (stereo FM) at 16MHz (175% CPU) and even 20MHz (~200% CPU), but only while the window is out of focus. When I bring the window into focus, CPU jumps to 240% and 290% respectively. I would've said this is due to rendering the FFT, but CPU stays low even while the entire window is visible. Want a CPU profile?
@bobobo1618 a CPU profile would be great. I've found some spots where I can eliminate another very heavy decimation used for the miniature visualizer in the upper left. When the app isn't focused, it reduces the frame rate by adding a delay in the main thread, so that's probably freeing up just enough CPU for it to squeak by. I'll have another update this afternoon that should cut the UI's CPU usage significantly, and then some more advanced updates for the waterfall soon, which should help yet again. Edit: wow, yeah, that's a nice view; I'm guessing 50% of your CPU is going to the mini-vis in the upper left ;)
Sorry, all of the above were at 16MHz. I can go higher if you want. |
I'm interested to see what it does with no demodulator at 20-24MHz; I think that's where we'll find some issues. It looks like the DC filter is still chewing up a fair bit, and I need to see what I can do about that. The DC filter only really needs to be applied to channel 0.
Yeah, that looks like mostly DC filter. I'm going to tackle restricting DC filtering to channel 0 only, since that's fairly in line with the fix for the mini-vis CPU usage.
I take it the DC-blocking filter is the iirfilt_crcf call that's eating up the CPU? I haven't looked at the code in detail, but are you performing DC blocking on each channel, or just on the full stream before channelization?
@jgaeddert I've been applying it to the entire stream up until now, but I just pushed a commit that no longer does that. @bobobo1618 soapysdr-pfbch-single has been updated: it removes DC blocking from the SDR thread and only applies it to channel 0 when needed. I'm working on using that same data to augment the waterfall/spectrum FFT and remove the visual spike. It also now uses the channel data to feed the demodulator mini-waterfall in the upper left, which frees up some resources as well.
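A sketch of what "DC blocking on channel 0 only" amounts to. The one-pole form `y[n] = x[n] - x[n-1] + a * y[n-1]` is a common textbook DC blocker, used here as a stand-in. The thread only says the app uses liquid-dsp's iirfilt_crcf, not which coefficients. `DCBlocker` and `dc_block_channel_zero` are hypothetical names.

```python
class DCBlocker:
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + a * y[n-1].

    Textbook form used as a stand-in for the app's actual IIR filter.
    `a` close to 1 gives a narrow notch at 0 Hz. State carries across calls,
    so successive blocks of one stream can be fed in turn.
    """

    def __init__(self, a=0.999):
        self.a = a
        self.prev_x = 0j
        self.prev_y = 0j

    def process(self, samples):
        out = []
        for x in samples:
            y = x - self.prev_x + self.a * self.prev_y
            self.prev_x, self.prev_y = x, y
            out.append(y)
        return out


def dc_block_channel_zero(channels, blocker):
    """Run the blocker on channel 0 only (where DC lands); pass the rest through."""
    return [blocker.process(ch) if k == 0 else ch
            for k, ch in enumerate(channels)]


# With 40 channels of 400 kHz at 16 MHz (critically sampled), channel 0 runs
# at 400 kS/s, so the filter sees 1/40th of the samples of the full stream.
```

In a DFT filterbank, 0 Hz falls in channel 0, so filtering only that channel removes the same offset at a fraction of the cost.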
@bobobo1618 that looks a fair bit better; are you able to demodulate any streams at 30MHz?
Nope. And I just noticed the rate is being capped at 25MHz. The highest rate at which I can get a (single) usable demodulated signal is 21MHz. An idea that may help: when only a single demodulator is running, could you use (or add an option to use) a polyphase decimator block rather than a channelizer? I played around with GNU Radio Companion a bit and found the former a lot easier on the CPU.
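The core of the "polyphase decimator for a single demodulator" suggestion is that a decimate-by-M filter only needs to compute the outputs it keeps. The sketch below shows that in direct form (a polyphase structure rearranges the same multiply-accumulates into M sub-filters, with the same count). It assumes the wanted signal has already been mixed to 0 Hz, works the same for complex samples, and `fir_decimate` is a hypothetical name, not a GNU Radio or liquid-dsp API.

```python
def fir_decimate(x, h, M):
    """Filter x with h and keep every Mth output, computing only those outputs.

    Cost is about len(h) / M multiply-accumulates per input sample, versus
    len(h) for filtering at the full rate and throwing samples away afterwards.
    """
    out = []
    for n in range(0, len(x), M):     # only the outputs that survive
        acc = 0.0
        for l, tap in enumerate(h):
            if n - l >= 0:
                acc += tap * x[n - l]
        out.append(acc)
    return out


x = [float(i % 5) for i in range(20)]
print(fir_decimate(x, [0.25, 0.5, 0.25], 4))
```

A channelizer does this work for every channel at once, so when only one stream is active a single decimator like this touches far fewer samples.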
Actually, I just noticed that the HackRF is only specified to reach 20MSPS, so I think the issues I'm seeing can safely be attributed to the hardware.
@bobobo1618 wow, I think I count 21 x 200kHz mono streams there; that's awesome progress 😃 I was just doing some testing here and also ran into the over-bandwidth issue: if you enter a bandwidth higher than the device limit, the main sample rate value isn't corrected and things just get weird. I accidentally set my RTL dongle to 12MHz when I thought I was on the SDRPlay, and it took me a minute to figure out what was going on. I'll fix that up soon.
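One way to avoid the over-bandwidth confusion is to snap the requested rate to one the device actually supports before it becomes the main sample rate value. This is a sketch only: `pick_sample_rate` is hypothetical, and `supported_hz` stands in for whatever rate list the driver reports. It isn't tied to a particular SoapySDR call, and the rate lists in the example are made up.

```python
def pick_sample_rate(requested_hz, supported_hz):
    """Highest supported rate not above the request, else the lowest supported."""
    candidates = sorted(supported_hz)
    if not candidates:
        raise ValueError("device reported no sample rates")
    usable = [r for r in candidates if r <= requested_hz]
    return usable[-1] if usable else candidates[0]


# hypothetical rate list for illustration; a 24 MHz request falls back to 20 MHz
print(pick_sample_rate(24_000_000, [2_000_000, 8_000_000, 20_000_000]))
```

Storing the returned value, rather than the raw request, keeps the UI and the actual device rate in agreement.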
It works, but it starts to struggle at ~16 streams. It's still intelligible at 21, but not really pleasant to listen to. It's fine at ~12 streams, though, particularly when it's in the background.
@bobobo1618 can you pull the latest soapysdr-pfbch-single branch and try tinkering with the following line in CubicSDRDefs.h:
And try various rates other than 400kHz? I'd be interested to see whether there's an ideal channel/CPU performance ratio at 20MHz input. At 400kHz you're getting 50 channels, and I'm thinking something like 500-800kHz might be better, but I'm unsure since I cap out at 12MHz (30 channels) here. If you could put together a quick list of CPU results for each channel rate, that would be great. Thanks!
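The channel counts quoted above are just the input rate divided by the channel rate. Here is a quick, hypothetical helper that tabulates candidate rates and flags the ones that don't divide the input evenly. Whether an uneven split matters depends on how the app derives its channel count, which isn't shown in the thread.

```python
def channel_plan(input_rate_hz, candidate_rates_hz):
    """Map each candidate channel rate to (channel count, divides evenly?)."""
    plan = {}
    for rate in candidate_rates_hz:
        count, remainder = divmod(input_rate_hz, rate)
        plan[rate] = (count, remainder == 0)
    return plan


for rate, (count, exact) in channel_plan(
        20_000_000, [400_000, 500_000, 600_000, 800_000]).items():
    print(f"{rate / 1e3:.0f} kHz -> {count} channels, exact={exact}")
```

At 20MHz input, 400k, 500k and 800k give 50, 40 and 25 channels, while 600k leaves a remainder. Fewer, wider channels mean a smaller transform per block but more work per channel for each demodulator's resampler.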
I tried changing it around but didn't see much difference. In fact, at least with the demodulator on, 400k used less CPU than 800k, though the difference was tiny (5%?).
@bobobo1618 ok, that's good to hear; it means I can add some adjustment to allow higher demodulation rates without affecting immediate performance too much.
@bobobo1618 I've merged everything down to the https://github.com/cjcliffe/CubicSDR/tree/soapysdr-support branch, so you can pull the latest updates from there now.
Going to close this one for now; I'll open some more specific optimization issues soon. Thanks!
On the profiling talk from #64
I ran some of the profiling tools on OS X and came up with this (times are a bit off since everything was built in debug mode):
TL;DR:
Working on digging into it more now.