Short Description
Estimates arousal, valence, age, gender, and Big Five personality traits from audio.
Full Description

RESTful web service developed by Hesam Sagha, Chair of Intelligent and Complex Systems, University of Passau, Germany. Open-source code and more information are available at

To start the service, run the following (the -p option publishes the container's port 8080 on host port 8888):
docker run -it --rm -p 8888:8080 audioanalysis


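Parameters:
getdims: the desired dimensions, separated by commas; any of arousal,valence,age,gender,big5O,big5C,big5E,big5A,big5N (big5O, big5C, big5E, big5A and big5N are the Big Five traits openness, conscientiousness, extraversion, agreeableness and neuroticism)
url: the URL of the video/audio file, or the name of an uploaded file
timing: the start and end of each segment, in seconds, formatted as start1,end1;start2,end2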
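As a rough illustration of how the three parameters fit together, the following Python sketch builds a query string from a list of dimensions, a file name or URL, and a list of (start, end) segments. The helper names format_timing and build_query are hypothetical, and the endpoint the query string is appended to is not specified here. The sketch also assumes the server decodes percent-encoded commas and semicolons, as standard query-string parsers do.

```python
from urllib.parse import urlencode

# Dimension names as listed in the parameter description above.
ALLOWED_DIMS = ("arousal", "valence", "age", "gender",
                "big5O", "big5C", "big5E", "big5A", "big5N")


def format_timing(segments):
    """Turn [(start, end), ...] in seconds into 'start1,end1;start2,end2'."""
    parts = []
    for start, end in segments:
        # Reject negative starts and empty or reversed segments early.
        if start < 0 or end <= start:
            raise ValueError(f"invalid segment: ({start}, {end})")
        parts.append(f"{start},{end}")
    return ";".join(parts)


def build_query(dims, url, segments=None):
    """Build the query string for getdims, url and (optionally) timing."""
    unknown = [d for d in dims if d not in ALLOWED_DIMS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    params = {"getdims": ",".join(dims), "url": url}
    if segments:
        params["timing"] = format_timing(segments)
    # urlencode percent-encodes ',' and ';' (assumed to be decoded by the server).
    return urlencode(params)
```

For example, build_query(["age"], "sample.wav", [(0, 5), (5, 10)]) yields getdims=age&url=sample.wav&timing=0%2C5%3B5%2C10. Validating the dimension names on the client side is a design choice of this sketch, so typos fail before any request is sent.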

To upload an audio or video file, use curl:
Windows: curl -v -H "Content-Type:multipart/form-data" --user meuser -i -X POST -F "file=@D:\path\to\sample.wav" http://localhost:8888/er/aer/upload
Linux: curl -v -H "Content-Type:multipart/form-data" --user meuser -i -X POST -F 'file=@./sample.wav' http://localhost:8888/er/aer/upload
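For scripted uploads, the curl call can be reproduced with the Python standard library. This is a sketch, not part of the service: build_multipart and upload are hypothetical helpers. It assumes the service uses HTTP Basic authentication (curl's default for --user) and reads the file from the form field named file, as in the curl examples.

```python
import base64
import mimetypes
import os
import urllib.request
import uuid

UPLOAD_URL = "http://localhost:8888/er/aer/upload"  # endpoint from the curl examples


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    # Fall back to a generic type when the extension is unknown.
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path, user, password, url=UPLOAD_URL):
    """POST a local audio/video file the way the curl examples do (-F file=@...)."""
    with open(path, "rb") as fh:
        body, ctype = build_multipart("file", os.path.basename(path), fh.read())
    # Assumed HTTP Basic authentication, matching curl's default for --user.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype,
                                          "Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode()
```

Usage: upload("sample.wav", "meuser", password) returns the HTTP status and response body. The uploaded file's name can then be passed as the url parameter.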

This service also fuses the outputs of the audio and video analyses.
To fuse the audio and video results, run:
wget "localhost:8888/er/general/fuse?video=$(cat json_video_plain.txt)&audio=$(cat json_audio_plain.txt)"
The files json_video_plain.txt and json_audio_plain.txt should contain the following entities.
Note: keep ':time=start,end' in the "@id" field.
See http://localhost:8888/er/general for more information.
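The wget command pastes the raw file contents into the query string, so characters such as spaces, quotes, '#' or '&' inside the JSON can break or truncate the request unless they are percent-encoded. A minimal Python sketch that encodes both documents before sending, assuming the service is reached on host port 8888 as mapped by the docker run command (build_fuse_url and fuse are hypothetical names):

```python
import urllib.parse
import urllib.request

FUSE_URL = "http://localhost:8888/er/general/fuse"  # host port from the docker run mapping


def build_fuse_url(video_json, audio_json, base=FUSE_URL):
    """Put both JSON documents into the video= and audio= parameters, URL-encoded."""
    return base + "?" + urllib.parse.urlencode({"video": video_json, "audio": audio_json})


def fuse(video_path, audio_path):
    """Read both result files and request the fused result (like the wget command)."""
    with open(video_path, encoding="utf-8") as v, open(audio_path, encoding="utf-8") as a:
        url = build_fuse_url(v.read(), a.read())
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Usage: fuse("json_video_plain.txt", "json_audio_plain.txt"). Because the whole JSON travels in the URL, very large result files may exceed the server's URL length limit.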


If you use this module, please cite the following papers:

  • EYBEN, F., WENINGER, F., GROSS, F., AND SCHULLER, B. Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor. In Proceedings of the 21st ACM International Conference on Multimedia, MM 2013 (Barcelona, Spain, October 2013), ACM, pp. 835–838.
  • SCHMITT, M., RINGEVAL, F., AND SCHULLER, B. At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech. In Proceedings of INTERSPEECH 2016, the 17th Annual Conference of the International Speech Communication Association.