
Audio Feature
Automatic extraction of factual, cultural, and high-level music descriptions has been a subject of intense study in the MIREX music information retrieval experimental audio matching methods rather than extracting a high-level melody feature from audio. The authors suggest that low-level audio methods outperform symbolic methods even when clean symbolic information is available, as in this task.
The great number of music recordings available for use as first-stage input to a high-level music description system motivates work on extracting high-level music features from low-level audio content. The MIREX community extends the range of tasks that are evaluated each year, allowing valuable knowledge to be gained about the limits of current algorithms and techniques.
Low-Level Audio Features
The third strategy for content-based music description is to use the information in the digital audio. Low-level audio features are measurements of audio signals that contain information about a musical work and music performance. They also contain extraneous information due to the difficulty of precisely measuring just a single aspect of music, so there is a trade-off between the signal-level description and the high-level music concept that is encoded.
In general, low-level audio features are segmented in three different ways: frame-based segmentation (periodic sampling at intervals of 10 ms to 1000 ms), beat-synchronous segmentation (features are aligned to musical beat boundaries), and statistical measures that construct probability distributions out of features (bag-of-features models). Many low-level audio features are based on the short-time spectrum of the audio signal.
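
The three segmentation styles can be illustrated with a minimal NumPy sketch. It is not a method from the text: the spectral centroid is chosen here only as one example of a spectrum-based low-level feature, the beat times are hypothetical values rather than the output of a beat tracker, and a mean and standard deviation stand in for the fuller probability distributions of a bag-of-features model. All function names are illustrative.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (frame-based segmentation)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_spectrum(x, frame_len, hop):
    """Magnitude of the Hann-windowed DFT of every frame."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(mag, sr, frame_len):
    """Example low-level feature: magnitude-weighted mean frequency per frame."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / np.maximum(mag.sum(axis=1), 1e-12)

def beat_synchronous(feature, frame_times, beat_times):
    """Average a per-frame feature between consecutive beat boundaries."""
    out = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        mask = (frame_times >= start) & (frame_times < end)
        out.append(feature[mask].mean() if mask.any() else np.nan)
    return np.array(out)

# Synthetic input (assumption): two seconds of a 440 Hz sine at 8 kHz.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Frame-based: 32 ms frames sampled every 10 ms.
frame_len, hop = 256, 80
mag = short_time_spectrum(x, frame_len, hop)
centroid = spectral_centroid(mag, sr, frame_len)
frame_times = np.arange(len(centroid)) * hop / sr  # frame start times in seconds

# Beat-synchronous: hypothetical beat boundaries every 0.5 s (120 BPM).
beat_times = np.arange(0.0, 2.0, 0.5)
beat_centroid = beat_synchronous(centroid, frame_times, beat_times)

# Statistical summary: discard time order, keep only distribution parameters.
summary = {"mean": centroid.mean(), "std": centroid.std()}
```

The frame-based output keeps one value per 10 ms step, the beat-synchronous output keeps one value per inter-beat interval, and the summary keeps no temporal information at all, which is the trade-off among the three representations.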