Open Collective
New major release - radio / YouTube / data quality distillation
Published on September 3, 2019 by Alexander Veysov

Read more here.

TLDR:

  1. 855 GB unarchived (.wav files, int16);
  2. (new!) A new domain - radio;
  3. (new!) A larger YouTube dataset with 1000+ additional hours;
  4. (new!) A small (300 hours) YouTube dataset downloaded in maximum quality;
  5. (new!) 18 hours in 3 validation sets for YouTube / books / public calls with ground truth annotations;
  6. See the distilled files with "bad" data in this issue.