The evaluation results can be accessed in multiple ways: as a dictionary or a pandas DataFrame (Section 3.1), or rendered as Bokeh plots (Section 3.2). In addition, the ground truth samples and the generations can be rendered as piano rolls, audio files, or MIDI files (Section 3.3).
Note: All the code provided in this section is available here
The evaluation results can be accessed as a dictionary or a pandas DataFrame. The following numerical results are automatically computed and compiled into either format:
- Quality of Hits: Hit counts for gt and pred samples, as well as cross-comparison of hits (accuracy, PPV, ...)
- Quality of Velocities: Velocity distributions for gt and pred samples
- Quality of Offsets: Offset distributions for gt and pred samples
- Rhythmic Distances: Distance of pred samples from gt samples using the rhythm distance metrics implemented in HVO_Sequence (l1, l2, cosine, hamming, ...)
- Global features: Global features of the gt and pred samples, these features are the features implemented in HVO_Sequence (NoI, Midness, ...)
These results can be accessed as raw data, that is, per sample values extracted from the evaluation, or as aggregated statistics, that is, the mean and standard deviation of the per sample values.
Use the get_pos_neg_hit_scores(return_as_pandas_df=False) method to access the following hit performance scores for each of the samples or pairs of gt/pred samples:
- 'Relative - Accuracy': [Accuracy of Hits in Predicted Sample 1 with respect to Ground Truth Sample 1, ...]
- 'Relative - Precision'
- 'Relative - Recall'
- 'Relative - F1_Score'
- 'Relative - MCC (Hit/Silence Classification)'
- 'Relative - MCC (Correct Number of Instruments at each step)'
- 'Relative - TPR'
- 'Relative - FPR'
- 'Relative - PPV'
- 'Relative - FDR'
- 'Relative - Ratio of Silences Predicted as Hits'
- 'Relative - Ratio of Hits Predicted as Silences'
- 'Hit Count - Ground Truth': [Number of Hits in Ground Truth Sample 1, ...]
- 'Hit Count - Total Predictions': [Number of Hits in Predicted Sample 1, ...]
- 'Hit Count - True Predictions (Matching GMD)': [Number of Hits in Predicted Sample 1 that match the ground truth, ...]
- 'Hit Count - False Predictions (Different from GMD)': [Number of Hits in Predicted Sample 1 that do not match the ground truth, ...]
Note: The analysis of hits is done in terms of total counts as well as the locations at which the hits occur.
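To make the score names above concrete, here is a minimal sketch of how such scores can be derived from a confusion count over two binary hit sequences. It uses the textbook definitions of accuracy, PPV, TPR, FPR, FDR, F1 and MCC; the function name `hit_scores`, the dictionary keys, and the flat one-voice input are assumptions for illustration and are not the evaluator's internals.

```python
from math import sqrt


def hit_scores(gt, pred):
    """Compare two equal-length binary hit sequences (1 = hit, 0 = silence)."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of steps")

    # Confusion counts over all steps
    tp = sum(1 for g, p in zip(gt, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gt, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gt, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gt, pred) if g == 0 and p == 0)

    def ratio(num, den):
        # Assumption: undefined ratios are reported as 0.0
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(gt)),
        "ppv": precision,            # precision
        "tpr": recall,               # recall
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
        "fdr": ratio(fp, fp + tp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "hit_count_gt": tp + fn,
        "hit_count_pred": tp + fp,
        "true_predictions": tp,
        "false_predictions": fp,
    }


gt = [1, 0, 0, 1, 1, 0, 1, 0]
pred = [1, 0, 1, 1, 0, 0, 1, 0]
scores = hit_scores(gt, pred)
print(scores)
```

The evaluator reports one such value per sample (or per gt/pred pair), which is why every key in the returned dictionary maps to a list.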
These features are automatically computed and compiled into a dictionary or pandas DataFrame (using the return_as_pandas_df argument).
hit_scores = evaluator_test_set.get_pos_neg_hit_scores(return_as_pandas_df=False)
Moreover, boxplot statistics (mean, std, median, q1, q3, min, max) of the per sample/sample-pair collections can be automatically calculated:
statistics_of_hit_scores = evaluator_test_set.get_statistics_of_pos_neg_hit_scores(
    hit_weight=1, trim_decimals=1, csv_file="demos/GrooveEvaluator/misc/hit_scores.csv")
The results are available as a pandas DataFrame and can also be stored to a csv file if needed:
Relative - Accuracy ... Hit Count - False Predictions (Different from GMD)
mean 0.8 ... 44.0
std 0.1 ... 32.0
min 0.5 ... 0.0
max 0.9 ... 130.0
median 0.8 ... 47.0
q1 0.8 ... 24.2
q3 0.9 ... 55.0
[7 rows x 16 columns]
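As an illustration of what each row of the table above summarizes, the sketch below computes the same seven statistics for a single column of per-sample values using only the standard library. The helper name `boxplot_statistics` is hypothetical, and the choice of population standard deviation and inclusive quartiles is an assumption; the evaluator may use different conventions.

```python
import statistics


def boxplot_statistics(values, trim_decimals=1):
    """Summarize one column of per-sample values (hypothetical helper)."""
    # Quartiles with linear interpolation over the observed range (assumption)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    stats = {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population std (assumption)
        "min": min(values),
        "max": max(values),
        "median": median,
        "q1": q1,
        "q3": q3,
    }
    # Round every statistic to the requested number of decimals
    return {name: round(value, trim_decimals) for name, value in stats.items()}


per_sample_accuracy = [0.5, 0.8, 0.8, 0.9, 0.9]  # made-up values
summary = boxplot_statistics(per_sample_accuracy)
print(summary)
```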
Similar to the hit scores, the velocity distributions can be accessed as raw data or as aggregated statistics.
Note: The analysis of velocities is done on the mean and std of the velocity values in each of the samples. To better understand the generations, the velocities are analyzed at (1) all locations, (2) locations corresponding to true hits, and (3) locations corresponding to false hits.
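The three groupings described in the note can be sketched for a single sample as follows. This is only an illustration under stated assumptions: the function `velocity_summary` is hypothetical, the input is a flat one-voice sequence, a "true hit" is taken to mean a predicted hit that coincides with a ground truth hit, and empty groups are reported as 0.0.

```python
import statistics


def velocity_summary(gt_hits, pred_hits, pred_velocities):
    """Mean/std of predicted velocities at all, true, and false hit locations."""
    groups = {"total": [], "true": [], "false": []}
    for g, p, v in zip(gt_hits, pred_hits, pred_velocities):
        if p != 1:
            continue  # only steps with a predicted hit carry a velocity
        groups["total"].append(v)
        groups["true" if g == 1 else "false"].append(v)

    summary = {}
    for name, vels in groups.items():
        summary[name] = {
            "mean": statistics.fmean(vels) if vels else 0.0,
            "std": statistics.pstdev(vels) if vels else 0.0,
        }
    return summary


gt_hits = [1, 0, 0, 1, 1, 0, 1, 0]
pred_hits = [1, 0, 1, 1, 0, 0, 1, 0]
pred_velocities = [0.8, 0.0, 0.3, 0.6, 0.0, 0.0, 0.7, 0.0]
summary = velocity_summary(gt_hits, pred_hits, pred_velocities)
print(summary)
```

Collecting one such mean and std per sample yields distributions of the kind listed under "Means Per Sample" and "STD Per Sample".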
To get the velocity distributions as raw data, use the get_velocity_distributions(return_as_pandas_df=False) method:
velocity_distributions = evaluator_test_set.get_velocity_distributions(return_as_pandas_df=False)
Using this method, the following distributions are extracted:
- 'Means Per Sample - Total Ground Truth Hits',
- 'STD Per Sample - Total Ground Truth Hits',
- 'Means Per Sample - Total Predicted Hits',
- 'Means Per Sample - True Predicted Hits',
- 'Means Per Sample - False Predicted Hits',
- 'STD Per Sample - Total Predicted Hits',
- 'STD Per Sample - True Predicted Hits',
- 'STD Per Sample - False Predicted Hits'
The results are available as a dictionary or pandas DataFrame and can also be stored to a csv file if needed:
Means Per Sample - Total Ground Truth Hits ... STD Per Sample - False Predicted Hits
0 0.668933 ... 0.164607
1 0.732283 ... 0.133475
2 0.594488 ... 0.000000
To get the boxplot statistics (mean, std, median, q1, q3, min, max):
statistics_of_velocity_distributions = evaluator_test_set.get_statistics_of_velocity_distributions(
    trim_decimals=1, csv_file="demos/GrooveEvaluator/misc/vel_stats.csv")
Similar to the velocity distributions, the offset distributions can be accessed as raw data or as aggregated statistics.
offset_distributions = evaluator_test_set.get_offset_distributions(return_as_pandas_df=False)
statistics_of_offset_distributions = evaluator_test_set.get_statistics_of_offset_distributions(
    trim_decimals=1, csv_file="demos/GrooveEvaluator/misc/offset_stats.csv")
The rhythmic distances are computed as the distances between the hits in the ground truth and the hits in the predicted sample. Different distance measures are used for calculating the rhythmic distances:
- 'cosine-distance': cosine distance between pred and gt using hit, velocity and offsets combined
- 'cosine-similarity': 1 - cosine_distance
- 'fuzzy_hamming_distance-not_weighted': fuzzy hamming distance between pred and gt using velocity and offset information
- 'fuzzy_hamming_distance-weighted': metrically weighted fuzzy hamming distance
- 'hamming_distance -5partKit_not_weighted ': hamming distance between pred and gt using velocity information only
- 'hamming_distance -5partKit_weighted ',
- 'hamming_distance -all_voices_not_weighted ',
- 'hamming_distance -all_voices_weighted ',
- 'hamming_distance -low_mid_hi_not_weighted ',
- 'hamming_distance -low_mid_hi_weighted ',
- 'l1_distance -h': l1 norm between pred and gt using hit information only
- 'l1_distance -hvo': l1 norm between pred and gt using hit, velocity and offset information
- 'l1_distance -o': l1 norm between pred and gt using offset information only
- 'l1_distance -v': l1 norm between pred and gt using velocity information only
- 'l2_distance -h': l2 norm between pred and gt using hit information only
- 'l2_distance -hvo',
- 'l2_distance -o',
- 'l2_distance -v',
- 'structural_similarity-structural_similarity':
The distances are available either as raw data or as aggregated statistics.
rhythmic_distances = evaluator_test_set.get_rhythmic_distances_of_pred_to_gt(return_as_pandas_df=False)
rhythmic_distances_statistics_df = evaluator_test_set.get_statistics_of_rhythmic_distances_of_pred_to_gt(
    tag_by_identifier=False, csv_dir="demos/GrooveEvaluator/misc/distances", trim_decimals=3)
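For readers unfamiliar with the basic measures named above, the following sketch shows the textbook l1, l2, cosine, and (binary) Hamming distances between two flattened sequences. It is not the HVO_Sequence implementation: the function names are illustrative, the handling of all-zero vectors is an assumption, and the fuzzy Hamming, metrically weighted, and structural similarity variants are not reproduced here.

```python
from math import sqrt


def l1_distance(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty pattern is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def hamming_distance(a, b):
    """Number of positions at which the two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


gt_hits = [1, 0, 0, 1, 1, 0, 1, 0]
pred_hits = [1, 0, 1, 1, 0, 0, 1, 0]

distances = {
    "l1": l1_distance(gt_hits, pred_hits),
    "l2": l2_distance(gt_hits, pred_hits),
    "cosine": cosine_distance(gt_hits, pred_hits),
    "hamming": hamming_distance(gt_hits, pred_hits),
}
print(distances)
```

The same functions apply unchanged to velocity or offset vectors, which is the distinction the -h, -v, -o and -hvo suffixes in the key names draw.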
Global rhythmic features are extracted from both the ground truth and the predicted samples. The following features are extracted:
- 'Statistical::NoI',
- 'Statistical::Total Step Density',
- 'Statistical::Avg Voice Density',
- 'Statistical::Lowness',
- 'Statistical::Midness',
- 'Statistical::Hiness',
- 'Statistical::Vel Similarity Score',
- 'Statistical::Weak to Strong Ratio',
- 'Statistical::Poly Velocity Mean',
- 'Statistical::Poly Velocity std',
- 'Statistical::Poly Offset Mean',
- 'Statistical::Poly Offset std',
- 'Syncopation::Combined',
- 'Syncopation::Polyphonic',
- 'Syncopation::Lowsync',
- 'Syncopation::Midsync',
- 'Syncopation::Hisync',
- 'Syncopation::Lowsyness',
- 'Syncopation::Midsyness',
- 'Syncopation::Hisyness',
- 'Syncopation::Complexity',
- 'Auto-Correlation::Skewness',
- 'Auto-Correlation::Max',
- 'Auto-Correlation::Centroid',
- 'Auto-Correlation::Harmonicity',
- 'Micro-Timing::Swingness',
- 'Micro-Timing::Laidbackness',
- 'Micro-Timing::Accuracy'
The features are available either as raw data or as aggregated statistics:
global_features = evaluator_test_set.get_global_features_values(return_as_pandas_df=False)
statistics_of_global_features_df = evaluator_test_set.get_statistics_of_global_features(
    calc_gt=True, calc_pred=True, csv_file="demos/GrooveEvaluator/misc/global_features_statistics.csv", trim_decimals=3)
The results in Section 3.1 can also be automatically rendered as Bokeh plots. These are violin plots, superimposed with boxplots and the raw scatter data. The plots are separated into tabs, one for each set of analysis results.
Note: All the code provided in this section is available here
pos_neg_hit_plots = evaluator_test_set.get_pos_neg_hit_plots(
    save_path="demos/GrooveEvaluator/misc/pos_neg_hit_plots.html",
    plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
velocity_plots = evaluator_test_set.get_velocity_distribution_plots(
    save_path="demos/GrooveEvaluator/misc/velocity_plots.html", plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
offset_plots = evaluator_test_set.get_offset_distribution_plots(
    save_path="demos/GrooveEvaluator/misc/offset_plots.html", plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
rhythmic_distances_plot = evaluator_test_set.get_rhythmic_distances_of_pred_to_gt_plot(
    save_path="demos/GrooveEvaluator/misc/rhythmic_distances_plots.html", plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
evaluator_test_set.get_global_features_plot(
    only_combined_data_needed=False,
    save_path="demos/GrooveEvaluator/misc/global_features_all.html",
    plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
evaluator_test_set.get_global_features_plot(
    only_combined_data_needed=True,
    save_path="demos/GrooveEvaluator/misc/global_features_combinedOnly.html",
    plot_width=1200, plot_height=800,
    kernel_bandwidth=0.05)
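The violin plot methods above all take a kernel_bandwidth argument. The source does not describe how it is used; assuming it is the bandwidth of a Gaussian kernel density estimate that shapes the violin outline, the sketch below shows the effect of the parameter on a small set of made-up values. The function `gaussian_kde` is written here for illustration and is not part of the evaluator.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate on each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))
    return [
        norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


samples = [0.5, 0.8, 0.8, 0.9, 0.9]       # made-up per-sample scores
grid = [i / 100 for i in range(0, 151)]   # evaluation points from 0.0 to 1.5

narrow = gaussian_kde(samples, bandwidth=0.05, grid=grid)
wide = gaussian_kde(samples, bandwidth=0.2, grid=grid)

# A smaller bandwidth gives a taller, spikier outline; a larger one smooths it
print(max(narrow), max(wide))
```

Under this assumption, a small value such as 0.05 keeps the outline close to the raw scatter points, while larger values trade detail for smoothness.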
evaluator_test_set.get_velocity_heatmaps(
    s=(2, 4), bins=[32 * 4, 64], regroup_by_drum_voice=True,
    save_path="demos/GrooveEvaluator/misc/velocity_heatmaps.html")
Note: All the code provided in this section is available here
piano_rolls = evaluator_test_set.get_piano_rolls(save_path="demos/GrooveEvaluator/misc/piano_rolls.html")
Note: The piano roll tabs correspond to the audio files generated
You can render the ground truth and predicted audio as separate files, or as a single file in which the gt audio is followed by 1 sec of silence and then the generated pattern.
# get audios - separate files for ground truth and predictions
audio_tuples = evaluator_test_set.get_audio_tuples(
    sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2",
    save_directory="demos/GrooveEvaluator/misc/audios",
    concatenate_gt_and_pred=False)
Note: The audio file names correspond to the piano roll tabs
# get audios - a single file containing ground truth and predictions with a 1sec silence in between
audio_tuples = evaluator_test_set.get_audio_tuples(
    sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2",
    save_directory="demos/GrooveEvaluator/misc/audios",
    concatenate_gt_and_pred=True)
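The concatenated layout described above (ground truth, 1 sec of silence, then the prediction) can be sketched on plain lists of samples. The helper `concatenate_with_silence` and the sample rates used here are assumptions for illustration; the evaluator's own synthesis and sample rate are not shown in this section.

```python
def concatenate_with_silence(gt_audio, pred_audio, sample_rate=44100, silence_seconds=1.0):
    """Return gt audio, a block of silence, then pred audio as one list of samples."""
    # Number of zero-valued samples needed for the requested pause
    silence = [0.0] * int(round(sample_rate * silence_seconds))
    return list(gt_audio) + silence + list(pred_audio)


sample_rate = 8           # deliberately tiny so the result is easy to inspect
gt_audio = [0.1] * 4      # stand-in for the synthesized ground truth
pred_audio = [0.2] * 6    # stand-in for the synthesized prediction

combined = concatenate_with_silence(gt_audio, pred_audio, sample_rate=sample_rate)
print(len(combined))
```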
evaluator_test_set.export_to_midi(need_gt=True, need_pred=True, directory="demos/GrooveEvaluator/misc/midi")
Note: All the code provided in this section is available here
Use the get_logging_media method to compile the logging media into a single dictionary.
logging_media = evaluator_test_set.get_logging_media()
You can also automatically export all the requested media by providing a save_directory path:
logging_media = evaluator_test_set.get_logging_media(save_directory="demos/GrooveEvaluator/misc/logged")
The resulting dictionary has the following keys:
['hit_score_plots', 'velocity_distribution_plots', 'offset_distribution_plots',
'rhythmic_distance_plots', 'heatmap_plots', 'global_feature_plots',
'piano_roll_plots', 'audios']
Note: The get_logging_media method only returns the media whose flags were set to True in the GrooveEvaluator constructor (flags: need_hit_scores, need_velocity_distributions, need_offset_distributions, need_rhythmic_distances, need_heatmap, need_global_features, need_audio, need_piano_roll). To get an artifact that was not requested in the constructor, pass its flag as a parameter, for example get_logging_media(need_piano_roll=True).
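The relationship between constructor flags and per-call overrides described in the note can be modeled with a few lines of plain Python. This is a toy model only: `resolve_media_flags` is a hypothetical function, not part of GrooveEvaluator, and the flag values in the example are made up.

```python
def resolve_media_flags(constructor_flags, **overrides):
    """Toy model: per-call flags take precedence over the constructor's flags."""
    unknown = set(overrides) - set(constructor_flags)
    if unknown:
        raise KeyError(f"unknown flags: {sorted(unknown)}")
    resolved = dict(constructor_flags)
    resolved.update(overrides)
    # Only the flags that end up True are rendered
    return [name for name, wanted in resolved.items() if wanted]


# Example flag values as they might be passed to the constructor (made up)
constructor_flags = {
    "need_hit_scores": True,
    "need_velocity_distributions": True,
    "need_offset_distributions": True,
    "need_rhythmic_distances": True,
    "need_heatmap": True,
    "need_global_features": True,
    "need_audio": False,
    "need_piano_roll": False,
}

requested = resolve_media_flags(constructor_flags, need_piano_roll=True)
print(requested)
```

In this model the piano roll is rendered because the call asks for it, while audio stays excluded because neither the constructor nor the call requested it.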
The logging media can also be requested in a format easily compatible with WandB artifacts. To request this version, pass the prepare_for_wandb flag as True in the get_logging_media method.
logging_media_wandb = evaluator_test_set.get_logging_media(prepare_for_wandb=True)
To facilitate the use of the GrooveEvaluator class, we provide a set of ready-to-use templates for the most common use cases.
Below are the available templates:
Template Name | Description | need_hit_scores | need_velocity_distributions | need_offset_distributions | need_rhythmic_distances | need_heatmap | need_global_features | need_audio | need_piano_roll |
---|---|---|---|---|---|---|---|---|---|
asf | All artifacts except audio | True | True | True | True | True | True | False | True |