
Pre-warm WaveNet on creation over the size of the receptive field #71

Merged (1 commit into sdatkinson:main, Sep 13, 2023)

Conversation

mikeoliphant
Contributor

This PR switches WaveNet from using the current anti-pop method to pre-warming the model over the size of the receptive field on creation.

This has the advantage that the pre-warming isn't done during real-time processing, and it also avoids the need for the ramped-in volume transition.

I used a single-sample buffer for simplicity, and to be conservative since we do not yet know the buffer size. I figure there isn't much cross-sample optimization happening anyway. This could be changed if we get around to telling the model its maximum audio buffer size.

@sdatkinson
Owner

Punchline first: I'd recommend that if you've got a project that needs this, that project should define its own function to handle it, like (copying your code...):

void warmup(DSP* model) {
  long receptive_field = model->get_receptive_field();  // Looks like this needs to be defined but not unreasonable to do so I think.
  NAM_SAMPLE sample = 0;
  NAM_SAMPLE* sample_ptr = &sample;
  std::unordered_map<std::string, double> param_dict = {};
  // Push zeros through one sample at a time until the model's internal
  // history covers its full receptive field.
  for (long i = 0; i < receptive_field; i++)
  {
    // One channel, one frame, unity input/output gain; input and output share the buffer.
    model->process(&sample_ptr, &sample_ptr, 1, 1, 1.0, 1.0, param_dict);
    model->finalize_(1);
    sample = 0; // reset in case process() wrote a non-zero output sample
  }
}

The reason why is that this won't work for all of the models that are currently supported by this library (specifically, parametric models). But also, I'm just not confident that it should be enshrined as "the way" to do this--like you said, this would incur a processing spike as the new net gets up to speed while an existing net continues to process samples in the RT loop. Since CPU usage is a priority for some folks, I don't want to box them out.

Other high-level comments:

  • By the same argument, yes, it doesn't make sense for WaveNet to own anti-popping. Something a little better would be to refactor anti-popping up to the abstract class that owns all of the NAM architectures. Even better would be to remove it entirely and have it be its own DSP module, similar to the function I sketched above; see the sketch after this list. The benefit is that it would be possible to run this utility on basically any model's output. (Parenthetically, one breaking refactor I intend to do after putting the "version 0.0.1" stake in the ground is to remove some of the yucky stuff hanging around, like these methods.)
  • I'm thinking about your comment in #61, where you pointed out that if you wanted something completely continuous, you'd have to feed in the samples currently being processed by the active model. But it also seems that some discontinuity is inevitable if you're loading a different model (which seems like a given--why would one stage the same model as the one that's already running?). So some sort of fade or ramp-in is inevitably necessary?
    • While I'm thinking about it, it would also kind of make sense to empower people to implement a "hot switch" that pops, if they thought that was preferable to the signal cut-out that the anti-pop does. Another argument in favor of factoring this out of the (NAM) DSP class 🙂
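To make that concrete: factored out of the DSP class, an anti-pop could be a small utility applied to any model's output after a swap. A minimal sketch, assuming a linear ramp (the function name and the caller-owned ramp state are hypothetical, not library API):

// Hypothetical standalone anti-pop: linearly ramp a freshly swapped model's
// output up from silence, independent of the model architecture.
// NAM_SAMPLE is the library's sample type (from dsp.h).
void anti_pop(NAM_SAMPLE* output, const int num_frames, long& ramp_remaining, const long ramp_length)
{
  for (int i = 0; i < num_frames && ramp_remaining > 0; i++, ramp_remaining--)
  {
    const double gain = 1.0 - (double)ramp_remaining / (double)ramp_length;
    output[i] = (NAM_SAMPLE)(gain * output[i]);
  }
}

A caller would reset ramp_remaining to ramp_length whenever a new model is swapped in; a "hot switch" is then just a matter of not calling it.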

A few lower-level comments:

there isn't much cross-sample optimization happening anyway

Mmh, not necessarily--some of the element-wise calculations like the activation should be able to be vectorized (putting aside for a moment how well that's done right now). WaveNet is different from an LSTM in that way--you can vectorize the layer calculations instead of having to walk a single sample all the way through the net before starting on the next.
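For a rough picture of the difference (illustrative only; Eigen is a dependency of this library, but this is not its actual kernel code):

#include <Eigen/Dense>

// WaveNet-style: one element-wise activation over a whole (channels x frames)
// block, which Eigen can vectorize across samples.
void activate_block(Eigen::MatrixXf& z)
{
  z.array() = z.array().tanh();
}

// LSTM-style (pseudocode): the hidden state forces a strictly sequential
// walk, one sample at a time:
//   for (int t = 0; t < num_frames; t++)
//     h = lstm_step(h, x[t]); // each step depends on the previous h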

This could be changed if we get around to telling the model its maximum audio buffer size.

This is something I'd like to do, yes--_set_num_frames_ includes .resize() calls that could commit the cardinal DSP sin of doing allocations in the RT thread. I chose to do it that way back when I wrote it, not knowing quite how serious folks were about this and figuring "well, let's just see if it's fine anyways". The reasoning was that resize() doesn't do anything if the requested size is the same as the current size (most calls after the first), and doesn't allocate if it already has enough capacity (e.g. if there's a short buffer on a loop in the DAW). (I haven't actually checked any of this.) But yes, I'd love to define e.g. set_buffer_size() to be called e.g. from OnReset() in NeuralAmpModelerPlugin and clean it up.
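The shape of that fix might look like the following (a sketch under assumed names; set_buffer_size() and the member buffer are hypothetical, not the library's actual declarations):

#include <vector>

// Hypothetical stand-in for the DSP class's internal buffers.
struct BufferOwner
{
  std::vector<float> mOutputBuffer; // hypothetical member

  // Called once from e.g. the plugin's OnReset(), outside the RT thread.
  void set_buffer_size(const int max_frames)
  {
    // reserve() allocates up front, so later per-block resize() calls
    // (up to max_frames) never touch the allocator on the audio thread.
    mOutputBuffer.reserve(max_frames);
  }
};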

@mikeoliphant
Contributor Author

Punchline first: I'd recommend that if you've got a project that needs this, that project should define its own function to handle it like (copying your code...)

I guess I can't see what kind of project wouldn't need an anti-pop solution. And the solution is model-type-specific. LSTM models don't need anything at all. WaveNet is dependent on the size of the receptive field. It doesn't seem ideal to push that complexity outside of the core NAM code.

The reason why is that this won't work for all of the models that are currently supported by this library (specifically, parametric models)

I haven't used (or seen) any parametric models - can you explain why doing a pre-warm wouldn't work in that case and the current anti-pop would?

this would incur a processing spike as the new net gets up to speed while an existing net continues to process samples in the RT loop.

The fact that the pre-warm is done in the background (and potentially on a lower-priority thread) is one of the key benefits. It doesn't take cycles away from real-time processing throughput (which the current anti-pop method does).
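A sketch of that pattern, assuming the warmup() function from earlier in the thread (the staging/publish mechanism here is hypothetical, not library API):

#include <memory>
#include <thread>

class DSP; // from the library's dsp.h
void warmup(DSP* model); // the sketch from earlier in the thread
void publish_to_rt_loop(std::shared_ptr<DSP> model); // hypothetical: e.g. an atomic pointer swap

// Pre-warm a freshly loaded model off the audio thread, then hand it to the
// RT loop; the active model keeps processing undisturbed in the meantime.
void stage_model(std::shared_ptr<DSP> new_model)
{
  std::thread([new_model]() {
    warmup(new_model.get());
    publish_to_rt_loop(new_model);
  }).detach();
}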

@sdatkinson
Owner

I guess I can't see what kind of project wouldn't need an anti-pop solution.

A few examples off the top of my head:

  • A case where the model starts with volume at zero.
  • Cases where warmup is used instead and is judged to be adequate.

There are a lot of use cases beyond the implementation in NeuralAmpModelerPlugin, which is only intended as an example.

I haven't used (or seen) any parametric models - can you explain why doing a pre-warm wouldn't work in that case and the current anti-pop would?

The value of params you gave (an empty dict) would be invalid. For a model like, e.g., the amp in the parametric modeling video, the params would look like

{
  "Gain": 2.0,
  "Low": 6.0,
  "Med": 3.0,
  "High": 7.0,
  "Presence": 4.0,
  "Resonance": 4.0,
  "Master": 2.5
}

and providing an empty dict wouldn't be what the model expects.

By contrast, the current anti-pop only needs the output buffer to operate on.

The fact that the pre-warm is done in the background (and potentially on a lower-priority thread) is one of the key benefits.

Good point. Though the code I suggested could be run in that same thread if that's a consideration for you? (Like I said, there are multiple valid ways to assess the trade-off, and I don't want the code to force one over the others.)

@mikeoliphant
Contributor Author

I guess I can't see what kind of project wouldn't need an anti-pop solution.

A few examples off the top of my head:

  • A case where the model starts with volume at zero.
  • Cases where warmup is used instead and is judged to be adequate.

Sorry - by "anti-pop solution" I meant either the current one or the pre-warm. It seems like you would always want one of them rather than the un-stabilized initial behavior.

providing an empty dict wouldn't be what the model expects

It seems like in that case, it would make sense to pass an initial set of parameters at model creation time, and pre-warm on that.
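For illustration, that could be a variant of the warmup() sketch from earlier that takes the initial values explicitly (hypothetical; the library doesn't currently accept parameters at construction):

#include <string>
#include <unordered_map>

// Hypothetical: pre-warm a parametric model on caller-supplied initial
// parameter values instead of an empty dict.
void warmup_parametric(DSP* model, const std::unordered_map<std::string, double>& initial_params)
{
  long receptive_field = model->get_receptive_field(); // assumed getter, as above
  NAM_SAMPLE sample = 0;
  NAM_SAMPLE* sample_ptr = &sample;
  for (long i = 0; i < receptive_field; i++)
  {
    model->process(&sample_ptr, &sample_ptr, 1, 1, 1.0, 1.0, initial_params);
    model->finalize_(1);
    sample = 0;
  }
}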

My key points remain:

  • Pre-warm gives a better user experience (immediate transition instead of volume cut-out and ramp-up)
  • Pre-warm does not happen during real-time processing
  • Pre-warm and anti-pop requirements are model type and model configuration specific
  • This kind of complexity is best handled transparently by the NAM core

From my perspective as a NAM core API consumer, I just want it to work - I don't care about model types, receptive fields, etc...

@sdatkinson
Owner

[Thanks for your patience / Apologies as some of this will probably be repeating myself from the discussion on #61]

I see the rationale for your key points, and I personally agree with many of them. The key sticking point is whether this would be a breaking change.

I checked back through the code to see if there's a chance that this might "just work". It happens to be the (convenient!) case that passing an empty params dict to process() causes the model to continue using whatever parameters it was using previously (realizing this made me sincerely excited that I might be able to accept this!). The sticking point is that parametric models don't provide initial parameter values on construction, so this change (specifically, removing the anti-pop) would likely cause an undesirable change in the behavior of those models. Is that breaking? It's a gray area for behavior like that, but subjectively I'd say a pop would be bad to a musician, so it's effectively a "bug", i.e. breaking.

However, I'm kind of keen to put a stake in the ground and get on with making some breaking changes for v0.1. And I'd be happy to accept this PR under those conditions.

So here's what I plan on doing, in order:

  1. Do some last checks
  2. Release v0.0.1
  3. Merge your PR and we can get on with prioritizing snapshot models for v0.1.

Sound good?

@mikeoliphant
Contributor Author

👍

@sdatkinson
Owner

LGTM! 👍🏻


long receptive_field = 1;
for (size_t i = 0; i < this->_layer_arrays.size(); i++)
  receptive_field += this->_layer_arrays[i].get_receptive_field();
@sdatkinson
Owner

Is a receptive field getter not defined for WaveNet? That'd be handy/surely this exists somewhere else in the code?

...looks like no! Huh.
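For reference, a getter assembled from the snippet above could look like this (a sketch; the namespace/class names are assumed from context, and the library doesn't declare this method yet):

// Sketch: expose WaveNet's receptive field instead of recomputing it inline.
long wavenet::WaveNet::get_receptive_field() const
{
  long receptive_field = 1;
  for (size_t i = 0; i < this->_layer_arrays.size(); i++)
    receptive_field += this->_layer_arrays[i].get_receptive_field();
  return receptive_field;
}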

@sdatkinson
Owner

On toward v0.1!

@sdatkinson sdatkinson merged commit 885a535 into sdatkinson:main Sep 13, 2023
@mikeoliphant
Contributor Author

👍

@mikeoliphant mikeoliphant deleted the prewarm-wavenet branch October 2, 2023 15:13