Accepted posters/short papers
ISMIR 2002 Web pages will be regularly updated
[Abstract1]This paper describes the design policy and specifications of the RWC Music Database, a music database that gives researchers freedom of common use and research use. Various commonly available databases have been built in other research fields and have made a significant contribution to the research in those fields. The field of musical information processing, however, has lacked a commonly available music database. We therefore built the RWC Music Database containing four original databases: Popular Music Database (100 pieces), Royalty-Free Music Database (15 pieces), Classical Music Database (50 pieces), and Jazz Music Database (50 pieces). These databases enable researchers to compare and evaluate various methods by using them as a common benchmark. We also expect that they will accelerate the progress of various forms of research that use statistical methods. In addition, researchers can use the databases for research publication and presentation without copyright restrictions. The music compact discs of these databases are now available in Japan at a cost equal to only duplication, shipping, and handling charges (virtually for free), and we plan to make them available outside Japan in 2003. We hope that our database will encourage further advances in musical information processing research.
[Abstract2]The success of information retrieval systems depends heavily on the quality of the data input into them. Musical scores, as a complex visual format with small details, are particularly difficult to capture digitally and deliver well. Virtually all capture decisions should be made with a clear idea of the purpose of the resulting digital images. Master images must also be flexible enough to fulfill unanticipated future uses. In order to provide a framework for decision-making in musical score capture projects, best practices for detail and color capture are presented for creating an archival image containing all relevant data from the print source, based on commonly defined purposes of digital capture. Options and recommendations for file formats for archival storage, web delivery and printing of musical materials are presented.
[Abstract3]Opuscope is an initiative targeted at sharing musical corpora and their analyses between researchers. The Opuscope repository will contain musical corpora of high quality which can be annotated with hand-made or algorithmic musical analyses. So, analytical results obtained by others can be used as a starting point for one's own investigations. Experiments performed on Opuscope corpora can easily be compared to other approaches, since an unequivocal mechanism for describing a certain corpus will be provided.
[Abstract4]We study the use of content-based techniques to form playlists from a given seed song. Our techniques use as a basis our previously presented audio similarity measure. This measure compares songs according to the novelty of their frequency spectrum and has been shown to have good performance on a non-trivial database. In this paper we investigate extensions to this basic technique. Specifically, we study playlists formed by trajectories through the distance space and playlists formed using automatic relevance feedback. We report results on a database of over 8000 songs. We find that when information about the songs’ genre is added, improvements over the basic distance measure are obtained, suggesting both approaches are suitable for incorporating user input or labeling information if available.
[Abstract5]This paper analyzes a set of 161 music-related information requests posted to the rec.music.country.old-time newsgroup. These postings are categorized by the types of detail used to characterize the poster's information need, the type of music information requested, the intended use for the information, and additional social and contextual elements present in the postings. The results of this analysis suggest that similar studies of 'native' music information requests can be used to inform the design of effective, usable music information retrieval interfaces.
[Abstract6]This article describes MIR research in Carnatic music (from southern India), which is characterised by aural transmission and improvisation. These features have profound implications for the relative importance and accessibility of different forms of music information available and for the indigenous attitude towards dissemination of Carnatic music information. Following Smiraglia's methodology, the author identifies the crucial MIR problems in designing an information resource in this music for Western users as (i) understanding the indigenous view of the music and (ii) embedding this understanding in the organisation of the information resource. The indigenous representation of raga is summarised and illustrated by sample WAV files and their more detailed analysis, which are downloadable from the author's home Web page. The relationship of raga to compositions and consequently the relationship of improvised performance to cultural and social meanings is also explained. The author then details the issues arising in the embedding of this representation in the organisation of an information resource. Colleagues' views (e.g. on auditory quality and technical feasibility) and participation (e.g. in tool sharing and experimental digital audio editing) are sought.
[Abstract7]We claim that the core mechanism of a sufficiently general MIR system should be expressed in symbolic terms. We defend the idea that music databases should be pre-analyzed before being scanned for MIR queries. We suggest a new vision of automated pattern analysis that generalizes the multiple viewpoint approach by adding a new paradigm based on analogy and a temporal approach to musical scores. Through a chronological scanning of the score, analogies are inferred between local relationships -- namely, notes and intervals -- and global structures -- namely, patterns -- whose paradigms are stored inside an abstract pattern tree (APT). Basic mechanisms for inference of new patterns are described and illustrated. The same pattern-matching algorithm used for pattern discovery during pre-analysis of musical works is reused during MIR applications. Such an elastic vision of music enables a generalized understanding of its plastic expression. This project, in an early stage, introduces a broader paradigm of automated music analysis.
[Abstract8]We present methods for characterizing both the rhythm and tempo of music. We also present ways to quantitatively measure the rhythmic similarity between two or more works of music. This allows rhythmically similar works to be retrieved from a large collection. A related application is to sequence music by rhythmic similarity, thus providing an automatic "disc jockey" function for musical libraries. Besides specific analysis and retrieval methods, we present small-scale experiments that demonstrate ranking and retrieving musical audio by rhythmic similarity.
[Abstract9]A system is described which segments musical signals according to the presence or absence of drum instruments. Two different yet approximately equally accurate approaches were taken to solve the problem. The first is based on periodicity detection in the amplitude envelopes of the signal at subbands. The band-wise periodicity estimates are aggregated into a summary autocorrelation function, the characteristics of which reveal the drums. The other mechanism applies a straightforward acoustic pattern recognition approach with mel-frequency cepstrum coefficients as features and a Gaussian mixture model classifier. The integrated system achieves 88% correct segmentation over a database of 28 hours of music from different musical genres. For both methods, errors occur for borderline cases with soft percussive-like drum accompaniment, or transient-like instrumentation without drums.
[Abstract10]Optical Music Recognition is the process of converting a graphical representation of music (such as sheet music) into a symbolic format of use to music software. Music notation is rich in structural information, and the relative positions of objects can often help to identify them. When objects are unidentified or mis-identified, many current systems "coerce" the set of objects into some semantic representation, for example by modifying the detected durations. This could cause correctly identified symbols to be modified. The knowledge that the current set of identified symbols can not be semantically parsed could instead be used to re-examine some of the symbols before deciding whether or not the classification is correct. This paper describes work in progress involving the use of feedback between the various phases of the optical music recognition process to automatically correct mistakes, such as symbolic classification errors or mis-detected staff systems.
[Abstract11]The M-MIMOR approach presented here makes productive use of the multidimensionality of music retrieval. It integrates heterogeneous poly-representation into a self adapting system. The different perspectives of users can be expressed by relevance feedback and serve as direction for a learning process which ultimately leads to an optimal solution for a user within a certain context. The paper explores the diversity within music retrieval stemming from an abundance of approaches for representing musical objects and searching for similarity. As a result, the system designer is usually confronted with a large number of arbitrary decisions. These challenges are discussed within the M-MIMOR framework which provides an appropriate solution. A fusion with linear combination guarantees that every perspective is integrated. The weight and therefore the strength of one perspective is reflected by the weight of the representation scheme or matching algorithm in the fusion. These weights are adapted according to their success in previous retrieval tasks.
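The fusion idea in this abstract can be illustrated with a minimal sketch. It assumes each representation scheme or matching algorithm returns a score per document, fuses them by a weighted linear combination, and then adjusts the weights from relevance feedback. The update rule, the `rate` parameter, the scheme names and the scores are all hypothetical choices made for illustration; the abstract does not specify how M-MIMOR adapts its weights.

```python
def fuse(scores_by_scheme, weights):
    """Weighted linear combination of per-scheme scores for each document."""
    fused = {}
    for scheme, scores in scores_by_scheme.items():
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[scheme] * score
    return fused


def update_weights(weights, scores_by_scheme, relevant, rate=0.5):
    """Hypothetical update rule (not taken from the paper): a scheme gains
    weight in proportion to the mean score it gave the documents the user
    marked as relevant; weights are then renormalized to sum to 1."""
    updated = {}
    for scheme, weight in weights.items():
        scores = scores_by_scheme[scheme]
        success = sum(scores.get(doc, 0.0) for doc in relevant) / max(len(relevant), 1)
        updated[scheme] = weight * (1.0 + rate * success)
    total = sum(updated.values())
    return {scheme: w / total for scheme, w in updated.items()}


# Illustrative scores from two made-up schemes for two made-up documents.
scores_by_scheme = {
    "melody": {"doc_a": 0.9, "doc_b": 0.1},
    "rhythm": {"doc_a": 0.2, "doc_b": 0.8},
}
weights = {"melody": 0.5, "rhythm": 0.5}

fused = fuse(scores_by_scheme, weights)
new_weights = update_weights(weights, scores_by_scheme, relevant={"doc_a"})
```

Because every scheme contributes a term to the sum, each perspective stays in the result; feedback only changes how strongly it counts.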
[Abstract12]This paper compares the relative ease of creating a useful quantization of time from linear and log2 representations. The quantization is created by mapping these timing representations onto different size alphabets and studying the ability of a simple string-matcher to differentiate between themes in a melodic corpus when different representations are used. The results indicate that time is better represented by a logarithmic scale than a linear one. We also compare the merits of representing timing between events as Inter Onset Intervals (IOIs) and as the ratio of adjacent IOI values, looking at the kind of information preserved by each and the kinds of variation each minimizes.
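The two timing representations in this abstract can be sketched as follows, assuming onset times are given in seconds. The function names, the clamping range of the log2 alphabet and the example onsets are illustrative assumptions, not details from the paper.

```python
import math


def inter_onset_intervals(onsets):
    """Time differences between consecutive note onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def ioi_ratios(iois):
    """Ratio of each IOI to the one before it."""
    return [b / a for a, b in zip(iois, iois[1:])]


def quantize_log2(values, lo=-2, hi=2):
    """Map positive values onto a small integer alphabet by rounding
    log2(value) and clamping to [lo, hi] (an assumed alphabet size)."""
    return [max(lo, min(hi, round(math.log2(v)))) for v in values]


# The same rhythm at two tempi (made-up onsets; the second is twice as fast).
onsets_slow = [0.0, 1.0, 2.0, 4.0, 5.0]
onsets_fast = [t / 2 for t in onsets_slow]

iois_slow = inter_onset_intervals(onsets_slow)
iois_fast = inter_onset_intervals(onsets_fast)
ratios_slow = ioi_ratios(iois_slow)
ratios_fast = ioi_ratios(iois_fast)
symbols = quantize_log2(ratios_slow)
```

In this example the raw IOIs differ between the two tempi, while the IOI ratios are identical. That is one kind of variation the ratio representation removes, at the cost of discarding absolute duration.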
[Abstract13]In this paper, we study transposition-invariant content-based music retrieval (TI-CBMR) in polyphonic music. The aim is to find transposition invariant occurrences of a given query pattern called a template, in a database of polyphonic music called a dataset. Between the musical events (represented by points) in the dataset that have been found to match points in the template, there may be any finite number of other intervening musical events. For this task, we introduce an algorithm, called SIA(M)ESE, which is based on the SIA pattern induction algorithm. The algorithm is first introduced in abstract mathematical form, then we show how we have implemented it using sophisticated techniques and equipped it with appropriate heuristics. The resulting efficient algorithm has a worst case running time of O(mn log(mn)), where m and n are the size of the template and the dataset, respectively. Moreover, the algorithm is generalizable to any arbitrary, multidimensional translation invariant pattern matching problem, where the events considered can be represented by points in a multidimensional dataset.
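The matching problem in this abstract can be illustrated with a brute-force sketch. It assumes notes are points `(onset, pitch)` with integer coordinates and counts how many template points each translation vector maps onto a dataset point. This is not the SIA(M)ESE algorithm and omits its heuristics; it uses a hash-based counter where the paper's stated bound suggests sorting, and the function name and data are hypothetical.

```python
from collections import Counter


def find_translations(template, dataset, min_matches=None):
    """Return the translation vectors that map at least `min_matches`
    template points onto dataset points (default: all of them)."""
    template = set(template)
    dataset = set(dataset)
    if min_matches is None:
        min_matches = len(template)
    # Each (template point, dataset point) pair votes for one vector.
    # With distinct points, a vector's vote count equals the number of
    # template points it maps onto the dataset.
    votes = Counter(
        (d_time - t_time, d_pitch - t_pitch)
        for (t_time, t_pitch) in template
        for (d_time, d_pitch) in dataset
    )
    return sorted(v for v, count in votes.items() if count >= min_matches)


# Made-up data: onset in ticks, pitch as MIDI note number.
template = [(0, 60), (1, 62), (2, 64)]
dataset = [(0, 48), (4, 65), (5, 50), (5, 67), (6, 69), (7, 72)]

matches = find_translations(template, dataset)
```

Here the template occurs transposed up five semitones and shifted by four ticks, even though the unrelated event `(5, 50)` falls inside the occurrence. Lowering `min_matches` would also report partial occurrences.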
[Abstract14]The audio processing and post-processing of singing hold a fundamental role in the context of query-by-humming applications. Through the analysis of a sung query, we should perform some kind of meta-information extraction, and this topic is the focus of the present paper. Some considerations are presented aiming to give a systematic view of a number of issues related to the transcription of singing into music. A critical review of previous approaches and findings is followed by novel experimental results. Starting from the similarities between speech sounds and sung notes, the peculiar facets of singing voices are introduced and analyzed along three different directions: extraction of a microintonation contour (or pitch contour at frame level), note estimation and study of singing accuracy. A segmentation algorithm has been developed combining the Spectral Flatness Measure with pitch and envelope information. A practical implementation for smoothing raw output from pitch tracking and a rule-based schema for reducing the pitch contour to a sequence of note-duration pairs are illustrated. Finally, we report an experiment on the deviations from pure tone intonation in performances of untrained singers.
[Abstract15]The use of audio queries for searching multimedia content has increased rapidly with the rise of music information retrieval; there are now many Internet-accessible systems that take audio queries as input. However, testing the robustness of such a system can be a large part of the development process. A corpus of audio queries would aid researchers in the development of both audio signal processing techniques and audio query systems. Such a corpus would also be of use for making empirical comparisons between different systems and methods. We propose the creation of a set of audio queries taken from attendees of the ISMIR 2002 Conference that would be made readily available to MIR researchers.
[Abstract16]One of the problems encountered in music transcription is to produce an algorithm that detects whether or not a note should be repeated when a new onset is found during its duration; in other words, whether two or more shorter notes should be produced instead of a single longer note. The paper describes our approach to solving this problem, implemented within our system for transcription of piano music. The approach is based on a multilayer perceptron neural network, trained to recognize repeated notes. We compare this method to a more naive method that tracks the amplitude of the first partial of each note and also present performance statistics of our system on transcriptions of several real piano recordings.
[Abstract17]Electronic music distribution, the internet success of MP3 and current activities concerning the semantic web of music call for convenient music information retrieval and question-answering systems. In this paper we give an overview of the concepts behind our "super-convenience" approach for MIR. By using natural language as input for human-oriented queries to large-scale music collections we were able to address the needs of non-musicians. The entire system is applicable to future semantic web services, existing music web-sites and future electronic devices such as CD changers for cars, or PDAs. It is a full-fledged architecture combining state-of-the-art approaches from different research disciplines. In a cross-discipline approach, we customized techniques from natural language understanding, phonetic matching, automatic analysis of audio for meta tag construction, content-based classification and music ontologies as a backbone for the representation of musical knowledge. Besides the basic framework we present a novel idea to incorporate the processing of lyrics based on standard information retrieval methods, i.e., the vector space model. This work has been performed at the German Research Center for AI and the authors' spin-off company -- sonicson -- specialized in music web services.
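The vector space model mentioned for lyrics processing can be sketched in its simplest form: raw term-frequency vectors compared by cosine similarity. How the authors tokenize or weight terms is not stated in the abstract, so the whitespace tokenizer, the absence of weighting, and the song titles and lyrics below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector using a naive lowercase whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_songs(query, lyrics_by_song):
    """Song identifiers ordered by decreasing similarity to the query."""
    q = tf_vector(query)
    return sorted(
        lyrics_by_song,
        key=lambda song: cosine(q, tf_vector(lyrics_by_song[song])),
        reverse=True,
    )


# Made-up lyrics, for illustration only.
lyrics_by_song = {
    "song_a": "rain falls on the quiet river",
    "song_b": "dancing all night in the city lights",
}
ranking = rank_songs("quiet river rain", lyrics_by_song)
```

A practical system would typically add stop-word removal and a term weighting scheme such as tf-idf; both are left out here to keep the sketch short.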
[Abstract18]In this article, a heuristic version of Multidimensional Scaling (MDS), named FastMap, is used for audio retrieval and browsing. FastMap, like MDS, maps objects into a Euclidean space such that similarities are preserved. In addition to being more efficient than MDS, it allows query-by-example queries, which makes it suitable for content-based retrieval purposes.
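The FastMap projection can be sketched compactly, assuming only a distance function between objects is available. The pivot heuristic used here (start from the first object and take two farthest-point steps) and the example points are illustrative choices; the abstract does not describe the audio distance measure or the implementation details used in the article.

```python
import math


def fastmap(objects, dist, k):
    """Embed `objects` into k dimensions given a pairwise distance `dist`."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance left over after removing the first `dim` axes.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic: farthest object from object 0, then farthest from that.
        b = max(range(n), key=lambda j: residual_sq(0, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        d_ab_sq = residual_sq(a, b, dim)
        if d_ab_sq == 0.0:
            break  # all remaining residual distances are zero
        d_ab = math.sqrt(d_ab_sq)
        for i in range(n):
            # Projection onto the line through the pivots (cosine law).
            coords[i][dim] = (
                residual_sq(a, i, dim) + d_ab_sq - residual_sq(b, i, dim)
            ) / (2.0 * d_ab)
    return coords


# Made-up 2-D points; FastMap only ever sees their pairwise distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
embedded = fastmap(points, math.dist, k=2)
```

A query object can be placed in the same space by computing its distances to the stored pivots only, which is what makes query-by-example practical. The sketch does not store the pivots, so that step is not shown.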
[Abstract19]The increasing availability of digital music has created a greater need for methods to organize large collections of music. The eXtensible PlayList (XPL) representation allows users to express playlists with varying degrees of specificity. XPL handles references to exact files or URLs as well as rules for selecting content based on metadata constraints. XPL also allows the transitions between tracks in a playlist to be specified. This paper describes the features of XPL, a system for rendering XPL specifications and use of an advanced XPL renderer in an existing application.
[Abstract20]Current XML encoding systems for music focus almost exclusively on western music from the 17th century onwards, and on the western notation system. In order to ensure that music information retrieval (MIR) systems have full theoretical generality, and wide practical application, we have begun a project to explore the representation, in XML, of a genre of traditional Korean music which has a distinctive notation system (Chôngganbo). Our project takes seriously the specific notational expression of musical intention and intends to ultimately contribute to the analysis of theoretical issues in music representation, as well as to the improvement of methods for representing Korean music specifically. The present paper is an introduction to the music and its notation, and to our exploratory XML representation system.
[Abstract21]Singing is the characteristic vocal part of popular music, so retrieval by singing with lyrics is a natural way to search for popular music. Unlike music retrieval systems that represent music by melody contour and retrieve it by string matching, we match acoustic singing input directly against the vocal part of popular music. Exact matching between the two is very difficult, so both are represented by self-similarity sequences to eliminate error propagation. Our approach works on raw audio in WAV format. Independent Component Analysis (ICA) is employed to separate the singing from the accompaniment, MFCCs are used as the features for calculating the self-similarity sequence, the weights of a recurrent neural network serve as indices into the music database, and the retrieval list is generated by degree of correlation.
[Abstract22]A melody recognition system with a voice-only user interface is presented in this paper. By integrating speech recognition and music recognition technology we have built an end-to-end melody recognition system that allows voice-controlled melodic queries and melody generation using a dial-in service with a mobile phone. In this paper we present the system behind the service, report user evaluation results and consider the strengths and weaknesses of such a service.