The following are the publicly available datasets produced as part of the MuMMER project. If you want access to any other project data, please contact the project team.

  • Depth Images with Humans Dataset: This dataset has been created for human body landmark detection and human pose estimation from depth images. It contains over 250k synthetic images with annotations, and 3500 annotated real images.

  • The MuMMER dataset: This dataset contains 100 minutes of human/robot interaction in entertainment scenarios. It was manually annotated for faces and identities: all faces are annotated, and each identity is consistent across all sequences in the dataset. There are 500,000 annotated faces for 28 identities.

  • The Sound Source Localization for Robot dataset: This dataset is a collection of real robot audio recordings made with Pepper. It includes over 24 hours of audio data labelled with sound source locations and speaker IDs. It serves as a standard benchmark for research on learning-based sound source localization.