PRiSM Music Gesture Recognition is a software tool for creating musical gesture datasets and real-time recognition of musical gestures based on audio input. It utilizes machine learning techniques to classify and interpret musical gestures, enabling applications in interactive music performance, composition, and more.
- Custom Gesture Samples: Create and record custom gesture samples for personalized datasets.
- Machine Learning Model Training: Train a machine learning model with your own gesture recordings.
- Real-Time Recognition: Utilize pre-trained models for real-time gesture recognition from audio input.
- OSC/MIDI Output: Send recognition results through OSC or MIDI for further musical application.
- Playback Mapping: Map recognized gestures to audio playback for interactive experiences.
- Persistence: Save and load trained machine learning models and configurations for consistent performance.
- Multi-Channel Support: Accommodate multiple input channels for diverse audio setups.
- Visit the GitHub release page for PRiSM Music Gesture Recognition.
- Download the latest software package compatible with your system. (Currently available for macOS only.)
- Unzip and move the application to your Applications folder.
- Open PRiSM Music Gesture Recognition from your Applications folder.

Troubleshooting: If you encounter a security warning, please refer to Apple's guide on opening an app from an unidentified developer.
- Press `AudioStatus` to configure your audio settings.
- Enable or disable input channels as required.
- Engage or bypass compression as needed.

Important: Consistent sampling rates are crucial for reliable results.
- Choose a directory for your recordings with `SelectFolder`.
- Create a new gesture with `Create`. Make sure the gesture name is unique and does not contain spaces or other special characters.
- Record samples with `Record`. Optionally, use `Amp` for automatic triggering.
- Save your samples with `Save`.
- Review and play samples using the dropdown menu and `Play`.
Using your existing dataset: If you choose a folder that already contains gesture sample files, ensure that the files follow the naming convention `name-label-index`. The 'name' can be any preferred name, but by default it is the folder name. The 'label' is the actual gesture name used for training and prediction. The 'index' is not as crucial and is mostly used for reference.
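If you want to sanity-check an existing dataset before loading it, a minimal Python sketch like the one below can group files by their gesture label. It is not part of the application; the folder name `recordings`, the `.wav` extension, and the label `swipe` in the comment are only placeholders for illustration.

```python
# Illustrative only: group existing sample files by gesture label,
# assuming audio files named with the name-label-index convention
# described above (e.g. "recordings-swipe-3.wav" -> label "swipe").
from collections import defaultdict
from pathlib import Path

def group_samples_by_label(folder: str) -> dict:
    groups = defaultdict(list)
    for wav in Path(folder).glob("*.wav"):
        parts = wav.stem.split("-")
        if len(parts) < 3:
            continue  # skip files that do not follow name-label-index
        label = parts[-2]  # second-to-last field is the gesture label
        groups[label].append(wav)
    return groups

if __name__ == "__main__":
    for label, files in group_samples_by_label("recordings").items():
        print(f"{label}: {len(files)} samples")
```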
Note: You can use the dropdown menu to select a saved sample and the `Play` button to listen to it.
Apply random pitch and time stretch to existing samples to generate new files and enhance the dataset.
- Set the number of files to generate with `NumFiles`.
- Enable and set the random pitch range with `PitchRange`.
- Enable and set the random time stretch range with `StretchRange`.
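The application performs this augmentation for you; the sketch below only illustrates the idea of random pitch shifting and time stretching on an audio file, using the librosa and soundfile packages. The file names and parameter ranges are placeholders, not the values the application uses internally.

```python
# Illustration of random pitch-shift / time-stretch augmentation
# (the application does this internally; ranges here are placeholders).
import random
import librosa
import soundfile as sf

def augment(in_path: str, out_path: str,
            pitch_range: float = 2.0,      # +/- semitones
            stretch_range: float = 0.2):   # +/- proportion of original speed
    y, sr = librosa.load(in_path, sr=None)
    n_steps = random.uniform(-pitch_range, pitch_range)
    rate = 1.0 + random.uniform(-stretch_range, stretch_range)
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    y = librosa.effects.time_stretch(y, rate=rate)
    sf.write(out_path, y, sr)

# e.g. NumFiles = 10 new variants of one recorded sample
for i in range(10):
    augment("recordings/demo-swipe-0.wav", f"recordings/demo-swipe-aug{i}.wav")
```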
Click the `Activate` button to preprocess the data. Adjust `Spectrum Components` in the setting window for more detailed feature sets.
- Initiate training with `Train` and monitor the loss levels.
- Stop training manually or let it auto-stop at a loss below 0.05.
- Enable real-time recognition with `Prediction`.
- Save your model with `Save` and load it with `Load`.
Click the `Setting` button to open the setting window, where you can find the adjustable parameters.
Some parameters can be controlled with OSC messages; the receive port is 1123 (see the example after the table below).
Parameter Name | Description | Default Value | Range | OSC Address |
---|---|---|---|---|
On Threshold | Amplitude gate level used to trigger listening. | -39 dB | -60 dB to 0 dB | /OnThreshold |
Off Threshold | Amplitude gate level used to end listening and trigger prediction. | -59 dB | -60 dB to 0 dB | /OffThreshold |
Accuracy Threshold | Filters out prediction results below the threshold. | 0. | 0. - 1. | /AccuracyThreshold |
Timer | The system reports after listening. If the timer is set shorter than the default, it refreshes the buffer and forces a prediction; if set longer than the default, it is disabled. | The longest duration in the training files, but no more than 10 seconds. | 50 ms - 10000 ms | /Timmer |
Spectrum Components | The number of frequency components in the spectrogram to include in the feature set. | f0, f1 | f0 - f7 | 🚫 No OSC address; the data must be re-preprocessed and the model re-trained once this is changed. |
Prediction | Disable or enable prediction. | 0 | 0 / 1 | /Prediction |
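For example, the parameters with an OSC address can be set remotely by sending messages to port 1123. A minimal sketch using the python-osc package follows; the threshold values are arbitrary examples within the documented ranges, and the app is assumed to be running on the same machine.

```python
# Send parameter changes to the application's OSC receive port (1123).
# Values shown are arbitrary examples within the documented ranges.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 1123)   # assumes the app runs locally
client.send_message("/OnThreshold", -45.0)    # dB
client.send_message("/OffThreshold", -55.0)   # dB
client.send_message("/AccuracyThreshold", 0.6)
client.send_message("/Prediction", 1)         # enable prediction
```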
Test your model with the Player & Validation module.
- Use the dropdown menu and `Play` button to test the trained model with saved samples.
- Set the number of validations and use the toggle to enable automatic random validation. When auto-validation finishes, it displays each gesture's accuracy and the average accuracy.
Click the `OSC` button to enable OSC output and open the OSC setting window to configure the OSC IP address and port.
By default, the recognition results are sent to `127.0.0.1:9001` with the message address `/PRiSM_GR`.
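To receive the recognition results in your own software, listen on the configured port. A minimal python-osc sketch is shown below; the handler simply prints whatever arguments arrive, since the exact message contents depend on your trained gestures.

```python
# Listen for recognition results sent by the application
# (default destination 127.0.0.1:9001, address /PRiSM_GR).
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_gesture(address, *args):
    # args carry the recognition result; print them to inspect the format
    print(address, args)

dispatcher = Dispatcher()
dispatcher.map("/PRiSM_GR", on_gesture)

server = BlockingOSCUDPServer(("127.0.0.1", 9001), dispatcher)
server.serve_forever()
```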
Click the `MIDI` button to enable MIDI output and open the MIDI setting window to configure the MIDI output, for instance, to change the output MIDI channel.
The recognition results are automatically mapped to MIDI notes starting from note 60: the first gesture corresponds to MIDI note 60, the second gesture to note 61, and so on.
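If you route the MIDI output into your own program, the note-to-gesture mapping can be undone by subtracting 60 from the incoming note number. A minimal sketch using the mido package is below; the gesture names are placeholders for whatever you trained, it opens the system's default MIDI input, and mido additionally requires a backend such as python-rtmidi.

```python
# Map incoming MIDI notes back to gesture indices (note 60 -> gesture 0).
# Requires mido plus a backend such as python-rtmidi; the gesture names
# below are placeholders for whatever you trained.
import mido

GESTURES = ["gesture_0", "gesture_1", "gesture_2"]  # placeholder names

with mido.open_input() as inport:        # opens the default MIDI input
    for msg in inport:
        if msg.type == "note_on" and msg.velocity > 0:
            index = msg.note - 60
            if 0 <= index < len(GESTURES):
                print(f"Recognised: {GESTURES[index]}")
```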
Click the `Audio` button to enable gesture audio playback and open the Gesture AudioPlayer window to configure the gesture-audio mapping.
- After training is finished, the gesture-audio cells will appear.
- Click the `SelectFolder` button to load the folder of playback audio files.
- Use each cell to enable/disable and configure the mapping.
Contributions to the PRiSM Music Gesture Recognition project are welcome! If you encounter any issues or have ideas for improvements, please submit them as GitHub issues or create a pull request with your proposed changes.
PRiSM Music Gesture Recognition is licensed under the MIT License. You are free to use and distribute the software in accordance with the terms of the license. If you use this in your project, kindly give credit to the RNCM PRiSM team.
This work is supported by PRiSM, The RNCM Centre for Practice & Research in Science & Music, funded by the Research England fund Expanding Excellence in England (E3).