ISMIR 2002
[Abstract1] This article describes MIR research in Carnatic
music (from southern India), which is characterised by aural transmission and
improvisation. These features have profound implications for the relative importance
and accessibility of different forms of music information available and for the
indigenous attitude towards dissemination of Carnatic music information.
Following Smiraglia's [2001] methodology, the author identifies the crucial MIR
problems in designing an information resource in this music for Western users
as (i) understanding the indigenous view of the music and (ii) embedding this
understanding in the organisation of the information resource. The indigenous
representation of raga is summarised and illustrated by sample WAV files and
their more detailed analysis, which are downloadable from the author's home Web
page. The relationship of raga to compositions and, consequently, the
relationship of improvised performance to cultural and social meanings are also
explained. The author then details the issues arising in the embedding of this
representation in the organisation of an information resource. Colleagues'
views (e.g. on auditory quality and technical feasibility) and participation
(e.g. in tool sharing and experimental digital audio editing) are sought.
[Abstract2] Optical Music Recognition is the process of
converting a graphical representation of music (such as sheet music) into a
symbolic format of use to music software. Music notation is rich in structural
information, and the relative positions of objects can often help to identify
them. When objects are unidentified or mis-identified, many current systems
"coerce" the set of objects into some semantic representation, for example
by modifying the detected durations. This could cause correctly identified
symbols to be modified. The knowledge that the current set of identified
symbols cannot be semantically parsed could instead be used to re-examine some
of the symbols before deciding whether or not the classification is correct.
This paper describes work in progress involving the use of feedback between the
various phases of the optical music recognition process to automatically
correct mistakes, such as symbolic classification errors or mis-detected staff
systems.
[Abstract3] The success of information retrieval systems depends
heavily on the quality of the data input into them. Musical scores, as a complex visual
format with small details, are particularly difficult to digitally capture and
deliver well. Virtually all capture decisions should be made with a clear idea
of the purpose of the resulting digital images. Master images must be flexible
enough to fulfill unanticipated future uses as well. In order to provide a
framework for decision-making in musical score capture projects, best practices
for detail and color capture are presented for creating an archival image
containing all relevant data from the print source, based on commonly defined
purposes of digital capture. Options and recommendations for file formats for
archival storage, web delivery and printing of musical materials are presented.
[Abstract4] Current XML encoding systems for music focus
almost exclusively on Western music from the 17th century onwards, and on the
Western notation system. In order to ensure that music information retrieval
(MIR) systems have full theoretical generality, and wide practical application,
we have begun a project to explore the representation, in XML, of a genre of
traditional Korean music which has a distinctive notation system (Chôngganbo).
Our project takes seriously the specific notational expression of musical
intention and intends to ultimately contribute to the analysis of theoretical
issues in music representation, as well as to the improvement of methods for
representing Korean music specifically. The present paper is an introduction to
the music and its notation, and to our exploratory XML representation system.
[Abstract5] We present methods for characterizing both
the rhythm and tempo of music. We also present ways to quantitatively measure
the rhythmic similarity between two or more works of music. This allows rhythmically
similar works to be retrieved from a large collection. A related application is
to sequence music by rhythmic similarity, thus providing an automatic
"disc jockey" function for musical libraries. Besides specific
analysis and retrieval methods, we present small-scale experiments that
demonstrate ranking and retrieving musical audio by rhythmic similarity.
[Abstract6] This paper compares the relative ease of
creating a useful quantization of time from linear and log2 representations. The
quantization is created by mapping these timing representations onto different
size alphabets and studying the ability of a simple string-matcher to
differentiate between themes in a melodic corpus when different representations
are used. The results indicate that time is better represented by a logarithmic
scale than a linear one. We also compare the merits of representing timing
between events as Inter Onset Intervals (IOIs) and as the ratio of
adjacent IOI values, looking at the kind of information preserved by each and
the kinds of variation each minimizes.
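The two timing representations and the two quantization scales can be illustrated with a minimal sketch. The function names, the example theme, and the bin bounds below are assumptions made for illustration; they are not taken from the paper.

```python
import math

def iois(onsets):
    """Inter-onset intervals (IOIs) between successive onset times."""
    return [b - a for a, b in zip(onsets, onsets[1:])]

def ioi_ratios(intervals):
    """Ratios of adjacent IOIs; unchanged when the whole theme is scaled in tempo."""
    return [b / a for a, b in zip(intervals, intervals[1:])]

def quantize_linear(value, lo, hi, size):
    """Map value to one of `size` equal-width bins on a linear scale."""
    value = min(max(value, lo), hi)
    return min(int((value - lo) * size / (hi - lo)), size - 1)

def quantize_log2(value, lo, hi, size):
    """Map value to one of `size` equal-width bins on a log2 scale."""
    value = min(max(value, lo), hi)
    span = math.log2(hi) - math.log2(lo)
    return min(int((math.log2(value) - math.log2(lo)) * size / span), size - 1)

# Hypothetical theme, and the same theme at half tempo (onsets in seconds).
theme = [0.0, 0.5, 1.0, 2.0, 2.25]
slow = [2 * t for t in theme]
print(iois(theme), iois(slow))                            # absolute IOIs differ
print(ioi_ratios(iois(theme)) == ioi_ratios(iois(slow)))  # ratios agree
```

With the assumed bounds of 0.0625 s to 4 s and a six-symbol alphabet, the log2 quantizer assigns one symbol per octave of duration, whereas the linear quantizer places most short durations in the same bin.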
[Abstract7] We claim that the core mechanism of a
sufficiently general MIR system should be expressed in symbolic terms. We defend
the idea that a music database should be pre-analyzed before being scanned for
MIR queries. We suggest a new vision of automated pattern analysis that
generalizes the multiple viewpoint approach by adding a new paradigm based on
analogy and a temporal approach to musical scores. Through a chronological
scanning of the score, analogies are inferred between local relationships --
namely, notes and intervals -- and global structures -- namely, patterns --
whose paradigms are stored inside an abstract pattern tree (APT). Basic
mechanisms for inference of new patterns are described and illustrated. The
same pattern-matching algorithm used for pattern discovery during pre-analysis
of musical works is reused during MIR applications. Such an elastic vision of music
enables a generalized understanding of its plastic expression. This project, in
an early stage, introduces a broader paradigm of automated music analysis.
[Abstract8] A system is described which segments musical
signals according to the presence or absence of drum instruments. Two different
yet approximately equally accurate approaches were taken to solve the problem.
The first is based on periodicity detection in the amplitude envelopes of the
signal at subbands. The band-wise periodicity estimates are aggregated into a
summary autocorrelation function, the characteristics of which reveal the
drums. The other mechanism applies a straightforward acoustic pattern recognition
approach with mel-frequency cepstrum coefficients as features and a Gaussian
mixture model classifier. The integrated system achieves 88% correct
segmentation over a database of 28 hours of music from different musical
genres. For both methods, errors occur for borderline cases with soft
percussive-like drum accompaniment, or transient-like instrumentation without
drums.
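The aggregation step of the first approach can be sketched as follows. This is only an illustration of a summary autocorrelation function over subband amplitude envelopes; the normalization, the function names, and the toy envelopes are assumptions, and the decision rule that detects drums from the function's characteristics is not shown.

```python
def autocorr(envelope, max_lag):
    """Normalized autocorrelation of one amplitude envelope for lags 0..max_lag."""
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]
    energy = sum(v * v for v in centered) or 1.0  # avoid dividing by zero
    return [
        sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag)) / energy
        for lag in range(max_lag + 1)
    ]

def summary_autocorr(band_envelopes, max_lag):
    """Average the band-wise autocorrelations into one summary function."""
    acfs = [autocorr(env, max_lag) for env in band_envelopes]
    return [sum(acf[lag] for acf in acfs) / len(acfs) for lag in range(max_lag + 1)]

# Hypothetical envelopes: two subbands pulsing every 4 frames, out of phase.
band_low = [1.0, 0.0, 0.0, 0.0] * 8
band_high = [0.0, 0.0, 1.0, 0.0] * 8
sacf = summary_autocorr([band_low, band_high], max_lag=6)
strongest = max(range(1, 7), key=lambda lag: sacf[lag])
print(strongest)  # lag of the strongest periodicity, in frames
```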
[Abstract9] One of the problems encountered in music
transcription is to produce an algorithm that detects whether a note should be
repeated when a new onset is found during its duration; in other
words, whether two or more shorter notes should be produced instead of a single
longer note. The paper describes our approach to solving this problem,
implemented within our system for transcription of piano music. The approach is
based on a multilayer perceptron neural network, trained to recognize repeated
notes. We compare this method to a more naive method that tracks the amplitude
of the first partial of each note and also present performance statistics of
our system on transcriptions of several real piano recordings.
[Abstract10] In this article, a heuristic version
of Multidimensional Scaling (MDS) named FastMap is used for audio retrieval and
browsing. FastMap, like MDS, maps objects into a Euclidean space such that
similarities are preserved. In addition to being more efficient than MDS, it
allows query-by-example queries, which makes it suitable for
content-based retrieval purposes.
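A minimal sketch of the general FastMap idea follows: pick two distant pivot objects, place every object along the line through them using the cosine law, then repeat on the residual distances. The function names and the pivot heuristic are assumptions for illustration, and nothing here reflects the audio features or distance measure used in the article.

```python
import math

def fastmap(objects, dist, k):
    """Embed objects in k dimensions using only pairwise distances.

    Returns (coords, pivots); pivots are kept so a query can be projected later.
    """
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots = []

    def residual_sq(i, j, dim):
        # Squared distance left after removing the first `dim` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        d2 -= sum((coords[i][c] - coords[j][c]) ** 2 for c in range(dim))
        return max(d2, 0.0)

    for dim in range(k):
        # Heuristic pivot choice: jump to the farthest object, twice.
        b = max(range(n), key=lambda j: residual_sq(0, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        dab2 = residual_sq(a, b, dim)
        if dab2 == 0.0:
            break  # nothing left to spread out; remaining coordinates stay 0
        for i in range(n):
            # Cosine law: position of object i along the line through a and b.
            coords[i][dim] = (residual_sq(a, i, dim) + dab2
                              - residual_sq(b, i, dim)) / (2 * math.sqrt(dab2))
        pivots.append((a, b, dab2))
    return coords, pivots

def project(query, objects, dist, coords, pivots):
    """Map a new query object into the same space (query by example)."""
    q = []
    for dim, (a, b, dab2) in enumerate(pivots):
        def res(i):
            d2 = dist(query, objects[i]) ** 2
            d2 -= sum((q[c] - coords[i][c]) ** 2 for c in range(dim))
            return max(d2, 0.0)
        q.append((res(a) + dab2 - res(b)) / (2 * math.sqrt(dab2)))
    return q
```

Projecting a query needs only its distances to the stored pivots, two per dimension, which is what makes query by example cheap once the collection has been mapped.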
[Abstract11] The M-MIMOR approach presented here
makes productive use of the multidimensionality of music retrieval. It
integrates heterogeneous poly-representation into a self-adapting system. The
different perspectives of users can be expressed by relevance feedback and
serve as direction for a learning process which ultimately leads to an optimal
solution for a user within a certain context. The paper explores the diversity
within music retrieval stemming from an abundance of approaches for
representing musical objects and searching for similarity. As a result, the
system designer is usually confronted with a large number of arbitrary
decisions. These challenges are discussed within the M-MIMOR framework which
provides an appropriate solution. A fusion with linear combination guarantees
that every perspective is integrated. The weight and therefore the strength of
one perspective is reflected by the weight of the representation scheme or
matching algorithm in the fusion. These weights are adapted according to their
success in previous retrieval tasks.
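The fusion idea can be sketched as a weighted linear combination whose weights shift toward the representations that ranked relevant items well. The multiplicative update rule, the `rate` parameter, and the example scores below are hypothetical; the abstract does not specify how M-MIMOR adapts its weights.

```python
def fuse(scores_by_system, weights):
    """Weighted linear combination of per-system similarity scores."""
    fused = {}
    for system, scores in scores_by_system.items():
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[system] * score
    return fused

def update_weights(weights, scores_by_system, relevant, rate=0.5):
    """Hypothetical update: reward systems that scored the relevant items highly."""
    raw = {}
    for system, weight in weights.items():
        scores = scores_by_system[system]
        success = sum(scores.get(doc, 0.0) for doc in relevant) / max(len(relevant), 1)
        raw[system] = weight * (1.0 + rate * success)
    total = sum(raw.values())
    return {system: value / total for system, value in raw.items()}

# Two assumed representation schemes scoring two pieces for one query.
scores = {
    "melody": {"piece1": 0.9, "piece2": 0.2},
    "rhythm": {"piece1": 0.1, "piece2": 0.8},
}
weights = {"melody": 0.5, "rhythm": 0.5}
weights = update_weights(weights, scores, relevant={"piece1"})
print(weights)                # melody now carries more weight than rhythm
print(fuse(scores, weights))  # piece1 now outranks piece2
```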
[Abstract12] A melody recognition system with a
voice-only user interface is presented in this paper. By integrating speech
recognition and music recognition technology we have built an end-to-end melody
recognition system that allows voice controlled melodic queries and melody
generation using a dial-in service with a mobile phone. In this paper we
present the system behind the service, report user evaluation results and
consider the strengths and weaknesses of such a service.
[Abstract13] Singing is the characteristic vocal
part in popular music, so retrieval by singing with lyrics is a natural way to
query popular music. Unlike some music retrieval systems, which use melody contour
to represent music and string matching to retrieve it, we match acoustic
singing input directly with the vocal part of popular music. Matching the two
exactly seems very difficult, so both are represented by
self-similarity sequences to eliminate error propagation. Our approach deals
with raw audio music in WAV format. Independent Component Analysis (ICA) is employed
to separate the singing from the accompaniment, MFCCs are used as the features
to calculate the self-similarity sequence, the weights of a recurrent neural network
are used as indices on the music database, and the retrieval list is generated by
correlation degree.
[Abstract14] In this paper, we study
transposition-invariant content-based music retrieval (TI-CBMR) in polyphonic
music. The aim is to find transposition invariant occurrences of a given query
pattern called a template, in a database of polyphonic music called a dataset.
Between the musical events (represented by points) in the dataset that have
been found to match points in the template, there may be any finite number of
other intervening musical events. For this task, we introduce an algorithm,
called SIA(M)ESE, which is based on the SIA pattern induction algorithm. The
algorithm is first introduced in abstract mathematical form, then we show how
we have implemented it using sophisticated techniques and equipped it with
appropriate heuristics. The resulting efficient algorithm has a worst case
running time of O(mn log(mn)), where m and n are the sizes of the template and
the dataset, respectively. Moreover, the algorithm
is generalizable to any arbitrary, multidimensional translation invariant
pattern matching problem, where the events considered can be represented by
points in a multidimensional dataset.
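The basic idea behind translation-invariant point-set matching can be sketched by computing one difference vector per (template point, dataset point) pair and sorting the mn vectors, which gives the O(mn log(mn)) bound. This is a simplified illustration, not the SIA(M)ESE implementation or its heuristics; the function name and the example points are assumptions, and points are taken to be distinct (onset, pitch) pairs with integer coordinates.

```python
from itertools import groupby

def best_translations(template, dataset):
    """Return (match_count, vectors) for the translations matching most template points."""
    # One difference vector per (template point, dataset point) pair: m * n vectors.
    diffs = sorted((q[0] - p[0], q[1] - p[1]) for p in template for q in dataset)
    # After sorting, equal vectors are adjacent, so each run length is a match count.
    counts = [(len(list(run)), vector) for vector, run in groupby(diffs)]
    best = max(count for count, _ in counts)
    return best, [vector for count, vector in counts if count == best]

# Hypothetical data: (onset in ticks, MIDI pitch).
template = [(0, 60), (1, 62), (2, 64)]
dataset = [(4, 65), (4, 50), (5, 67), (5, 52), (6, 69), (7, 40)]
print(best_translations(template, dataset))  # → (3, [(4, 5)])
```

The query is found four ticks later and five semitones higher, even though bass notes intervene between the matched points. Vectors with lower counts correspond to partial matches.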
[Abstract15] The audio processing and
post-processing of singing hold a fundamental role in the context of
query-by-humming applications. Through the analysis of a sung query, we should
perform some kind of meta-information extraction, and this topic is the
focus of the present paper. Some considerations are presented aiming to give
a systematic view of a number of issues related to the transcription of singing
into music. A critical review of previous approaches and findings is followed
by novel experimental results. Starting from the similarities between speech
sounds and sung notes, the peculiar facets of singing voices are introduced and
analyzed in accordance with three different directions: extraction of a
microintonation contour (or pitch contour at frame level), note estimation and
study of singing accuracy. A segmentation algorithm has been developed
combining the Spectral Flatness Measure with pitch and envelope information. A
practical implementation for smoothing raw output from pitch tracking and a
rule-based schema for reducing the pitch contour to a sequence of note-duration
pairs are illustrated. Finally, we report an experiment on the deviations from
pure tone intonation in performances of untrained singers.
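The post-processing chain can be illustrated with a small sketch: smooth a frame-level pitch contour, reduce it to note-duration pairs, and measure how far a sung frequency lies from the nearest reference pitch. The median filter, the rounding to the nearest equal-tempered semitone, and all names below are assumptions for illustration; the paper's own smoothing method and rule-based schema are not reproduced here.

```python
import math
from itertools import groupby
from statistics import median

def hz_to_midi(freq_hz):
    """Fractional MIDI note number, with A4 = 440 Hz = 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def median_smooth(values, width=3):
    """Median filter that removes isolated outliers such as octave errors."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def to_notes(f0_hz, width=3):
    """Reduce a frame-level pitch contour to (MIDI note, duration in frames) pairs."""
    smoothed = median_smooth([hz_to_midi(f) for f in f0_hz], width)
    rounded = [round(m) for m in smoothed]
    return [(note, len(list(run))) for note, run in groupby(rounded)]

def cents_off(freq_hz):
    """Deviation in cents from the nearest equal-tempered semitone (assumed reference)."""
    midi = hz_to_midi(freq_hz)
    return 100 * (midi - round(midi))

# Hypothetical contour: an A4 with one octave-error frame, followed by a B4.
contour = [440.0] * 4 + [880.0] + [440.0] * 4 + [493.88] * 5
print(to_notes(contour))  # → [(69, 9), (71, 5)]
```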
[Abstract16] This paper describes the design policy
and specifications of the RWC Music Database, a music database that gives
researchers freedom of common use and research use. Various commonly available
databases have been built in other research fields and have made a significant
contribution to the research in those fields. The field of musical information
processing, however, has lacked a commonly available music database. We
therefore built the RWC Music Database containing four original databases:
Popular Music Database (100 pieces), Royalty-Free Music Database (15 pieces),
Classical Music Database (50 pieces), and Jazz Music Database (50 pieces).
These databases enable researchers to compare and evaluate various methods by
using them as a common benchmark. We also expect that they will accelerate the
progress of various forms of research that use statistical methods. In
addition, researchers can use the databases for research publication and
presentation without copyright restrictions. The music compact discs of these
databases are now available in Japan at a cost equal to only duplication,
shipping, and handling charges (virtually for free), and we plan to make them
available outside Japan in 2003. We hope that our database will encourage
further advances in musical information processing research.
[Abstract17] The use of audio queries for
searching multimedia content has increased rapidly with the rise of music
information retrieval; there are now many Internet-accessible systems that take
audio queries as input. However, testing the robustness of such a system can be
a large part of the development process. A corpus of audio queries would aid
researchers in the development of both audio signal processing techniques and
audio query systems. Such a corpus would also be of use for making empirical
comparisons between different systems and methods. We propose the creation of a
set of audio queries taken from attendees of the ISMIR 2002 Conference that
would be made readily available to MIR researchers.
[Abstract18] Opuscope is an initiative targeted
at sharing musical corpora and their analyses between researchers. The Opuscope
repository will contain musical corpora of high quality which can be annotated
with hand-made or algorithmic musical analyses. So, analytical results obtained
by others can be used as a starting point for one's own investigations.
Experiments performed on Opuscope corpora can easily be compared to other
approaches, since an unequivocal mechanism for describing a certain corpus will
be provided.
[Abstract19] The increasing availability of
digital music has created a greater need for methods to organize large
collections of music. The eXtensible PlayList (XPL) representation allows users
to express playlists with varying degrees of specificity. XPL handles
references to exact files or URLs as well as rules for selecting content based
on metadata constraints. XPL also allows the transitions between tracks in a
playlist to be specified. This paper describes the features of XPL, a system
for rendering XPL specifications, and the use of an advanced XPL renderer in an
existing application.
[Abstract20] We study the use of content-based techniques to form playlists
from a given seed song.
Our techniques use as a basis our previously presented audio similarity
measure. This measure compares songs according to the novelty of their
frequency spectrum and has been shown to have good performance on a non-trivial
database. In this paper we investigate extensions to this basic technique.
Specifically, we study playlists formed by trajectories through the distance
space and playlists formed using automatic relevance feedback. We report
results on a database of over 8000 songs. We find that when information about
the songs’ genre is added, improvements over the basic distance measure are
obtained, suggesting both approaches are suitable for incorporating user input
or labeling information if available.
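One simple way to picture a playlist as a trajectory through a distance space is a greedy walk that always moves to the nearest song not yet played. This is only an assumed illustration: the paper's actual trajectory construction and its audio similarity measure are not described in the abstract, and the one-dimensional features below are made up.

```python
def greedy_playlist(seed, songs, dist, length):
    """Walk from the seed, always stepping to the nearest song not yet used."""
    playlist = [seed]
    remaining = set(songs) - {seed}
    while remaining and len(playlist) < length:
        current = playlist[-1]
        # sorted() makes tie-breaking deterministic.
        nearest = min(sorted(remaining), key=lambda song: dist(current, song))
        playlist.append(nearest)
        remaining.remove(nearest)
    return playlist

# Hypothetical one-dimensional "features" standing in for an audio similarity measure.
features = {"a": 0.0, "b": 1.0, "c": 1.5, "d": 5.0}

def toy_dist(x, y):
    return abs(features[x] - features[y])

print(greedy_playlist("a", features, toy_dist, length=3))  # → ['a', 'b', 'c']
```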
[Abstract21] Electronic music distribution, the
internet success of MP3 and current activities concerning the semantic web
of music call for convenient music information retrieval and
question-answering systems. In this paper we give an overview of the
concepts behind our "super-convenience" approach for MIR. By using
natural language as input for human-oriented queries to large-scale music
collections we were able to address the needs of non-musicians. The entire
system is applicable for future semantic web services, existing music web-sites
and future electronic devices such as CD changers for cars, or PDAs. It is a
full-fledged architecture combining state-of-the-art approaches from different
research disciplines. In a cross-disciplinary approach, we customized techniques
from natural language understanding, phonetic matching, automatic analysis of
audio for meta tag construction, content-based classification and music
ontologies as a backbone for the representation of musical knowledge. Besides
the basic framework, we present a novel idea to incorporate the processing of
lyrics based on standard information retrieval methods, i.e. the vector space
model. This work has been performed at the German Research Center for AI and
the authors' spin-off company -- sonicson -- specialized in music web services.
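The vector space model mentioned for lyrics can be sketched as TF-IDF weighting with cosine similarity. This is a generic textbook formulation under assumed tokenization and weighting choices, not the system's actual implementation; the lyric snippets are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight) for a dict of name -> text."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log(len(docs) / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, docs):
    """Rank document names by cosine similarity to the query, best first."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    q = {term: count * idf.get(term, 0.0) for term, count in counts.items()}
    return sorted(docs, key=lambda name: cosine(q, vectors[name]), reverse=True)

# Invented lyric snippets, for illustration only.
lyrics = {
    "song_a": "blue river runs to the sea",
    "song_b": "rain falls on the old town",
    "song_c": "dancing in the rain tonight",
}
print(search("rain town", lyrics))  # → ['song_b', 'song_c', 'song_a']
```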
[Abstract22] This paper analyzes a set of 161
music-related information requests posted to the rec.music.country.old-time
newsgroup. These postings are categorized by the types of detail used to
characterize the poster's information need, the type of music information
requested, the intended use for the information, and additional social and
contextual elements present in the postings. The results of this analysis
suggest that similar studies of 'native' music information requests can be used
to inform the design of effective, usable music information retrieval
interfaces.