Solos: A Dataset for Audio-Visual Music Source Separation and Localization
Solos is a dataset gathered from YouTube that contains music excerpts of soloists playing different instruments, recorded for auditions.
This dataset is complementary to other datasets of this nature, such as MUSIC and MUSICes.
Solos provides frame-wise human skeletons for the soloist, computed using OpenPose. For each frame we provide the meaningful joints, namely the upper-body joints plus the hand detections.
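As an illustration of the upper-body selection described above, the following sketch filters one frame of OpenPose BODY_25 keypoints down to the head, shoulder, and arm joints. The joint subset, the `(x, y, confidence)` tuple layout, and the helper name `select_upper_body` are assumptions made for this example; the layout of the skeleton files actually distributed with Solos may differ.

```python
# Hypothetical sketch: keep only upper-body joints from one OpenPose BODY_25 frame.
# The chosen subset is an assumption, not the official Solos joint list.

# BODY_25 keypoints 0..7 as named by OpenPose.
UPPER_BODY = {
    0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
    5: "LShoulder", 6: "LElbow", 7: "LWrist",
}


def select_upper_body(frame, min_conf=0.0):
    """Return {joint_name: (x, y)} for upper-body joints with confidence > min_conf.

    `frame` is a list of 25 (x, y, confidence) tuples, one per BODY_25 keypoint.
    """
    joints = {}
    for idx, name in UPPER_BODY.items():
        x, y, conf = frame[idx]
        if conf > min_conf:  # OpenPose reports confidence 0 for undetected joints
            joints[name] = (x, y)
    return joints


# Synthetic frame: joint i sits at (10*i, 20*i); only joints 0..7 are "detected".
frame = [(10.0 * i, 20.0 * i, 0.9 if i < 8 else 0.0) for i in range(25)]
upper = select_upper_body(frame)
print(sorted(upper))
```

Raising `min_conf` drops low-confidence detections, which may be useful when the player is partially occluded by the instrument.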
Solos was presented at IEEE MMSP 2020.
Slides of the presentation can be downloaded from: Google Drive PDF
Categories
Solos contains the same categories as the URMP dataset. This is intentional, so that URMP can be used as a test set.
Dataset Statistics
In the following table we show the statistics of the dataset:
Category | # Recordings | Mean duration (min:sec) | Median resolution (pixels) |
Violin | 66 | 6:16 | 1080x720 |
Viola | 55 | 5:31 | 1280x720 |
Cello | 134 | 7:21 | 640x480 |
DoubleBass | 58 | 8:53 | 1280x720 |
Flute | 48 | 4:00 | 640x360 |
Oboe | 53 | 5:45 | 1280x720 |
Clarinet | 49 | 3:23 | 640x360 |
Bassoon | 56 | 5:08 | 1280x720 |
Saxophone | 45 | 2:42 | 1280x720 |
Trumpet | 50 | 1:14 | 640x360 |
Horn | 50 | 5:11 | 1280x720 |
Trombone | 50 | 5:03 | 1280x720 |
Tuba | 41 | 2:49 | 640x360 |
TOTAL | 755 | 5:16 | 854x480 |
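As a quick consistency check on the table above, the sketch below recomputes the TOTAL row from the per-category rows: the number of recordings is summed, and the overall mean duration is the recording-weighted average of the per-category means. The variable and function names are illustrative only, and because the per-category means are already rounded to the second, the derived totals are approximate.

```python
# Per-category (number of recordings, mean duration "m:ss"), copied from the table.
STATS = {
    "Violin": (66, "6:16"), "Viola": (55, "5:31"), "Cello": (134, "7:21"),
    "DoubleBass": (58, "8:53"), "Flute": (48, "4:00"), "Oboe": (53, "5:45"),
    "Clarinet": (49, "3:23"), "Bassoon": (56, "5:08"), "Saxophone": (45, "2:42"),
    "Trumpet": (50, "1:14"), "Horn": (50, "5:11"), "Trombone": (50, "5:03"),
    "Tuba": (41, "2:49"),
}


def to_seconds(mmss):
    """Convert an "m:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds):
    """Format a duration in seconds as "m:ss", rounded to the nearest second."""
    seconds = round(seconds)
    return f"{seconds // 60}:{seconds % 60:02d}"


total_recordings = sum(n for n, _ in STATS.values())
total_seconds = sum(n * to_seconds(d) for n, d in STATS.values())
overall_mean = to_mmss(total_seconds / total_recordings)  # weighted by recordings
total_hours = total_seconds / 3600

print(total_recordings, overall_mean, round(total_hours, 1))
```

The recomputed values agree with the TOTAL row (755 recordings, mean duration 5:16) and imply roughly 66 hours of video overall.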
Results
In the following table we compare Sound of Pixels trained on MUSIC (SoP), trained on Solos (SoP-Solos), and trained on MUSIC then fine-tuned on Solos (SoP-ft), together with a Multi-Head U-Net trained on Solos (MHU-Net).
Model | SDR $\uparrow$ | SIR $\uparrow$ | SAR $\uparrow$ |
SoP | $-3.76\pm4.00$ | $-1.45\pm4.68$ | $7.56\pm3.13$ |
SoP-Solos | $-2.98\pm5.07$ | $0.46\pm6.76$ | $6.37\pm2.94$ |
SoP-ft | $-2.57\pm4.99$ | $0.47\pm6.43$ | $6.89\pm2.48$ |
MHU-Net | $-0.56\pm5.96$ | $1.04\pm7.24$ | $10.37\pm3.48$ |
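For orientation on the SDR column, a signal-to-distortion ratio compares the energy of the reference source with the energy of the estimation error, expressed in decibels (higher is better). The sketch below uses the plain energy-ratio form, which is simpler than the BSS Eval decomposition commonly used to obtain SDR, SIR, and SAR together. Which toolkit produced the numbers above is not stated here, so this is an illustration only; the function name `simple_sdr` and the toy signals are hypothetical.

```python
import math


def simple_sdr(reference, estimate):
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    A simplified stand-in for BSS Eval SDR; assumes a non-silent reference.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: a 440 Hz tone sampled at 16 kHz, estimated with a 10% gain error.
reference = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
estimate = [1.1 * s for s in reference]

sdr_db = simple_sdr(reference, estimate)
print(f"{sdr_db:.2f} dB")
```

Here the error is one tenth of the reference in amplitude, so the energy ratio is 100 and the SDR is about 20 dB. Negative values, as in several rows of the table, mean the error carries more energy than the reference source.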
Citation
@inproceedings{montesinos2020solos,
  author    = {Juan F. Montesinos and Olga Slizovskaia and Gloria Haro},
  title     = {Solos: A Dataset for Audio-Visual Music Analysis},
  booktitle = {22nd {IEEE} International Workshop on Multimedia Signal Processing, {MMSP} 2020, Tampere, Finland, September 21-24, 2020},
  publisher = {IEEE},
  year      = {2020},
}