New major release - radio / YouTube / data quality distillation
Published on September 3, 2019 by Alexander Veysov
Read more here.
TLDR:
- 855 GB of audio (int16 .wav files, not archived);
- (new!) A new domain - radio;
- (new!) A larger YouTube dataset with 1000+ additional hours;
- (new!) A small (300 hours) YouTube dataset downloaded in maximum quality;
- (new!) 18 hours in 3 validation sets for YouTube / books / public calls with ground truth annotations;
- See the distilled files with "bad" data in this issue.
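
For a rough sense of scale, the total size can be converted into hours of audio. The sketch below is only an estimate: the 16 kHz sample rate and mono channel layout are assumptions not stated in this announcement, `wav_hours` is a hypothetical helper, and .wav header overhead is ignored.

```python
def wav_hours(size_bytes, sample_rate=16_000, channels=1, sample_width=2):
    """Estimate hours of uncompressed PCM audio from a total size in bytes.

    sample_width=2 corresponds to int16 samples.
    sample_rate and channels are ASSUMED values, not taken from the release notes.
    """
    bytes_per_second = sample_rate * channels * sample_width
    return size_bytes / bytes_per_second / 3600


# 855 GB taken as 855 * 10**9 bytes (decimal gigabytes, also an assumption).
print(round(wav_hours(855 * 10**9)))  # 7422 under these assumptions
```

If the files use a different sample rate or more than one channel, pass the real values to get a comparable estimate.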