Benchmarks & Datasets

Summary

sLM-21

Datasets

Set | Language | Dataset Source | Type | Train Set (Duration/Speakers) | Test Set (Duration/Speakers) | Dev Set (Duration/Speakers)
--- | --- | --- | --- | --- | --- | ---
Lexical (sWUGGY) | English | | | | |
Syntactic (sBLIMP) | English | | | | |
Semantic (sSIMI), synthetic | English | | | | |
Semantic (sSIMI), LibriSpeech | English | | Audiobook | | |
Train-LibriSpeech | English | LibriSpeech | Audiobook | LibriSpeech, Libri-light, etc. | |