Geoffroy Peeters

Project leader - Audio and Music Indexing Research Projects - IRCAM R&D



Technologies

You will find here a description of the technologies (C++ software and C++ libraries) for automatic music description that we produce. If you are interested in integrating these technologies into your product, please contact the IRCAM marketing department.

ircambeat (for automatic beat, downbeat, tempo, meter, ... estimation)

Ircambeat performs the automatic estimation of the global and time-variable tempo and meter of a music file, as well as the estimation of the positions of the beats and downbeats. For this, each digital music file is analyzed in terms of its time and frequency content in order to detect salient musical events. The periodicities of these musical events are then analyzed over time at various scales to obtain the tempo and meter. Beat and downbeat positions are estimated using music templates based on machine learning and music theory, yielding precise time positions.

Click here for more information on ircambeat
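
To illustrate the general idea (salient-event detection followed by periodicity analysis), here is a minimal Python/numpy sketch of global tempo estimation. It is not ircambeat, which is C++ and far more elaborate and also estimates meter, beats and downbeats; the frame size, hop size and tempo search range are arbitrary illustrative choices.

import numpy as np

def estimate_tempo(x, sr, frame=1024, hop=512, bpm_range=(90, 180)):
    """Global tempo (BPM) from energy-flux onset strength and its autocorrelation.
    The BPM search range crudely resolves the tempo-octave ambiguity."""
    n_frames = 1 + (len(x) - frame) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n_frames)])
    # Onset strength: half-wave rectified energy flux
    flux = np.maximum(0.0, np.diff(energy, prepend=energy[0]))
    flux -= flux.mean()
    # Autocorrelation reveals the dominant periodicity of the onsets
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]
    frame_rate = sr / hop
    lag_min = int(frame_rate * 60.0 / bpm_range[1])
    lag_max = int(frame_rate * 60.0 / bpm_range[0])
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return 60.0 * frame_rate / best_lag

# Toy check: a click train at 120 BPM (one click every 0.5 s)
sr = 22050
x = np.zeros(sr * 10)
x[::sr // 2] = 1.0
print(estimate_tempo(x, sr))   # close to 120 BPM, up to frame-grid quantisation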

ircamsummary (for music structure discovery estimation and music audio summary generation)

Ircamsummary performs the automatic generation of music audio summaries. It uses several strategies: extracting the most representative excerpt (in terms of content repetition and position within the track), or downbeat-synchronous concatenation of the most representative parts. The duration of the summary can be set by the user (from 10 s to 30 s). Ircamsummary also estimates the structure of a music file in terms of repeated parts (such as verse, chorus, bridge, ..., but without explicit labeling of the parts). For this, it extracts the timbral, harmonic and rhythmic content of the file over time and analyzes content repetition using two strategies: sequence repetition and state repetition. The generation of the audio summary is parameterizable in type (continuous summary, or summary obtained by concatenating the most informative parts) and in duration. The estimation of the structure is parameterizable in terms of the number of parts and the part type (sequence or state).

Click here for more information on ircamsummary
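
To illustrate the repetition-analysis idea, here is a small Python/numpy sketch, not ircamsummary itself: a cosine self-similarity matrix over a synthetic A-B-A feature sequence, in which repeated sections appear as high-similarity off-diagonal blocks. The feature dimension, section lengths and similarity measure are illustrative assumptions.

import numpy as np

def self_similarity_matrix(features):
    """Cosine self-similarity of an (n_frames, n_dims) feature sequence.
    Repeated sections appear as high-similarity off-diagonal blocks/stripes."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

# Toy A-B-A form built from two synthetic 'timbre' prototypes (40 frames each)
rng = np.random.default_rng(0)
proto_a, proto_b = rng.normal(size=12), rng.normal(size=12)
frames = np.vstack(
    [proto_a + 0.1 * rng.normal(size=12) for _ in range(40)]
    + [proto_b + 0.1 * rng.normal(size=12) for _ in range(40)]
    + [proto_a + 0.1 * rng.normal(size=12) for _ in range(40)])

ssm = self_similarity_matrix(frames)
print("A vs A' block:", round(float(ssm[:40, 80:].mean()), 2))    # high: the section repeats
print("A vs B  block:", round(float(ssm[:40, 40:80].mean()), 2))  # much lower: contrasting section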

ircamkeychord (for key and chord estimation)

Ircamkeychord performs the automatic estimation of the chord succession of a music track using a 24-chord dictionary (C major, C minor, ...). For this, the harmonic content of the music file is first extracted in a beat-synchronous way. A statistical model (a double-state hidden Markov model) representing music theory (chord transitions), expected downbeat positions and the estimated local key is used for precise estimation.
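
A minimal sketch of the underlying idea, assuming simple frame-wise matching of a chroma vector against binary templates for the 24 major/minor triads; the beat-synchronous extraction and the HMM decoding over chord transitions, downbeats and local key used by the actual system are omitted here.

import numpy as np

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary chroma templates for the 24 major and minor triads."""
    major, minor = np.zeros(12), np.zeros(12)
    major[[0, 4, 7]] = 1.0   # root, major third, fifth
    minor[[0, 3, 7]] = 1.0   # root, minor third, fifth
    labels, templates = [], []
    for root in range(12):
        for quality, t in (("maj", major), ("min", minor)):
            labels.append(PITCHES[root] + ":" + quality)
            templates.append(np.roll(t, root))
    return labels, np.array(templates)

def estimate_chord(chroma):
    """Pick the triad template best correlated with a 12-bin chroma vector."""
    labels, templates = chord_templates()
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    c = chroma / (np.linalg.norm(chroma) + 1e-12)
    return labels[int(np.argmax(t @ c))]

# Toy beat-synchronous chroma dominated by C, E and G
chroma = np.zeros(12)
chroma[[0, 4, 7]] = [1.0, 0.8, 0.9]
print(estimate_chord(chroma))   # C:maj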

Automatic classification (for auto-tagging: genre, mood, instrumentation)

Ircammusicgenre and Ircammusicmood are based on the Ircamclassifier technology. Ircamclassifier learns new concepts related to music content by training on example databases. For this, a large set of audio features is extracted from labeled music items and used to find relationships between the labels and the example audio content. Ircamclassifier uses over 500 different audio features and performs automatic feature selection and statistical-model parameter selection. It applies a full binarization of the labels and a set of SVM classifiers; mono-labeling and multi-labeling are obtained from the set of SVM decisions. The performance and computation time of the resulting trained system are then optimized for a specific task, giving a ready-to-use system for music genre or music mood.
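
As a hedged illustration of the full-binarization idea, the sketch below uses scikit-learn as a stand-in (Ircamclassifier is a C++ technology with its own feature set and model selection): synthetic features, a hypothetical three-label task, simple feature selection, and one binary SVM per label combined in a one-vs-rest scheme.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an audio-feature matrix: 200 tracks x 50 features,
# labelled with one of 3 hypothetical tags (e.g. genres).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 3, size=200)
X[:, :5] += y[:, None]            # make a few features genuinely informative

clf = make_pipeline(
    StandardScaler(),                                 # feature normalisation
    SelectKBest(f_classif, k=10),                     # simple automatic feature selection
    OneVsRestClassifier(SVC(kernel="rbf", C=1.0)),    # one binary SVM per label
)
clf.fit(X, y)
print(clf.predict(X[:5]), y[:5])   # should largely agree on this easy synthetic task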

ircamaudiosim (for search in a database by acoustic similarity)

Ircamaudiosim estimates the acoustical similarity between two audio tracks. For this, each music track of a database is first analyzed in terms of its acoustical content (timbre, rhythm, harmony). An efficient representation of this content allows fast comparison between two music tracks, which makes the system scalable to large databases. Given a target music track, the most similar items of the database (in terms of acoustical content) can be found quickly and used to provide recommendations to the listener.
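
A toy Python/numpy sketch of the "compact representation plus fast comparison" idea: each track's frame-level features are collapsed into a fixed-size signature and the database is ranked by distance to the query signature. The per-dimension mean/std signature and the Euclidean distance are illustrative choices, not ircamaudiosim's actual representation.

import numpy as np

def track_signature(frame_features):
    """Collapse an (n_frames, n_dims) feature sequence into a compact,
    fixed-size vector (per-dimension mean and std) for fast comparison."""
    return np.concatenate([frame_features.mean(axis=0), frame_features.std(axis=0)])

def most_similar(query_sig, database_sigs):
    """Indices of database tracks sorted by distance to the query signature."""
    return np.argsort(np.linalg.norm(database_sigs - query_sig, axis=1))

# Toy database: 100 'tracks', each with 500 frames of 20-dim features
rng = np.random.default_rng(0)
db = [rng.normal(loc=rng.normal(size=20), size=(500, 20)) for _ in range(100)]
sigs = np.vstack([track_signature(t) for t in db])
query = track_signature(db[42] + 0.01 * rng.normal(size=(500, 20)))
print(most_similar(query, sigs)[:3])   # track 42 should rank first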

AudioPrint (for search in a database by audio fingerprint identification)

AudioPrint is an efficient technology for the live or offline recognition of musical tracks within a database of learned tracks. It captures the acoustical properties of the audio signal by computing a symbolic representation of the sound profile that is robust to common alterations. Moreover, it provides a very precise estimate of the temporal offset within the detected musical track. This offset estimate can be used as a means to synchronize devices.
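
AudioPrint's actual representation is not detailed here, so the sketch below illustrates the general idea with a simplified Haitsma/Kalker-style fingerprint in Python/numpy: each frame is reduced to a 16-bit code derived from sub-band energy differences, the codes are stored in an inverted index, and a query votes for (track, offset) pairs. A real system additionally tolerates bit errors and signal alterations; this toy version relies on exact matches of aligned frames.

import numpy as np
from collections import Counter, defaultdict

def frame_hashes(x, frame=2048, hop=1024, n_bands=17):
    """One 16-bit code per frame: sign of the energy difference between
    adjacent spectral bands (a simplified Haitsma/Kalker-style fingerprint)."""
    win = np.hanning(frame)
    hashes = []
    for start in range(0, len(x) - frame, hop):
        spec = np.abs(np.fft.rfft(x[start:start + frame] * win))
        bands = np.array_split(spec, n_bands)
        energy = np.array([np.sum(b ** 2) for b in bands])
        bits = (np.diff(energy) > 0).astype(int)      # 16 bits per frame
        hashes.append(int("".join(map(str, bits)), 2))
    return hashes

def build_index(tracks):
    """Inverted index: hash -> list of (track_id, frame_index) postings."""
    index = defaultdict(list)
    for tid, x in enumerate(tracks):
        for i, h in enumerate(frame_hashes(x)):
            index[h].append((tid, i))
    return index

def identify(query, index):
    """Vote for (track, frame offset) pairs; the winner names the matching
    track and its temporal offset, usable e.g. to synchronize devices."""
    votes = Counter()
    for i, h in enumerate(frame_hashes(query)):
        for tid, j in index.get(h, []):
            votes[(tid, j - i)] += 1
    return votes.most_common(1)[0] if votes else None

# Toy check: three random 'tracks'; the query is a 3 s excerpt of track 1
sr = 22050
rng = np.random.default_rng(0)
tracks = [rng.normal(size=sr * 20) for _ in range(3)]
index = build_index(tracks)
query = tracks[1][100 * 1024 : 100 * 1024 + 3 * sr]
print(identify(query, index))   # ((1, 100), ...): track 1, offset 100 frames (~4.6 s)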