In this thesis we approach the task of audio identification via audio fingerprinting, with special emphasis on designing a system that is highly robust to various signal modifications. We build a system that can account for linear and non-linear time-stretching, pitch-shifting and speed changes of query audio excerpts, as well as for severe noise distortions. We motivate the design of yet another fingerprinting method to complement the large body of existing methods and research in this field. We propose a novel, efficient, highly accurate and precise fingerprinting method that operates on geometric hashes of local maxima of the spectrogram representation of audio signals. We propose to match features using efficient range search, and to subsequently verify match hypotheses in order to maintain high precision and specificity on challenging datasets. We gradually refine this method from its early concept to a practically applicable system that is evaluated on queries against a database of 430,000 tracks, with a total duration of 3.37 years of audio content.
Our proposed method is the first in the academic literature that is shown to cope with severe signal modifications while remaining applicable to large reference collections. This claim is supported by extensive evaluation on manually crafted data with speed, time-stretching and pitch-scale modifications in the range of ±30%. We further evaluate the system on noise-distorted queries, and show the influence of various parameters on identification performance and processing run times. We identify DJ mix monitoring as one of the most challenging application areas for audio fingerprinting, due to the wide variety of signal modifications that performers can introduce. We observe that the identification performance of systems can suffer tremendously when applied to DJ mixes, much more so than on manually crafted evaluation datasets, since it is hard to create test cases that cover the variety of modifications encountered in DJ mixes. To close this gap in evaluation methodology, we manually compile and annotate a free dataset of DJ mixes to support the research community in investigating and evaluating the particular strengths and weaknesses of proposed systems. We use this dataset for extensive evaluation of our method. Finally, we show that a sequence detection program can be built on top of the fingerprinter, enabling the monitoring of long query recordings for either interactive analysis or fully automated result reporting.
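To illustrate why a geometric hash of spectral peaks can tolerate speed changes, the following minimal sketch may be helpful. It is not the hash construction used in this thesis: the grouping of exactly four peaks, the linear frequency axis, the tolerance value and the names `quad_hash`, `change_speed` and `hashes_match` are all assumptions made for illustration. Two inner peaks are described relative to the box spanned by the earliest and latest peak, so a uniform scaling of the time and frequency axes leaves the description unchanged, and candidates can then be compared within a small tolerance, in the spirit of a range search.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (time in seconds, frequency in Hz) of a local maximum


def quad_hash(peaks: List[Peak]) -> Tuple[float, float, float, float]:
    """Hypothetical hash: positions of two inner peaks relative to the
    box spanned by the earliest peak A and the latest peak B."""
    if len(peaks) != 4:
        raise ValueError("expected exactly four peaks")
    a, c, d, b = sorted(peaks)  # order by time (ties broken by frequency)
    width, height = b[0] - a[0], b[1] - a[1]
    if width == 0 or height == 0:
        raise ValueError("A and B must differ in both time and frequency")
    # Relative coordinates are unchanged when either axis is scaled.
    return ((c[0] - a[0]) / width, (c[1] - a[1]) / height,
            (d[0] - a[0]) / width, (d[1] - a[1]) / height)


def change_speed(peaks: List[Peak], factor: float) -> List[Peak]:
    """Simulate a speed change on a linear frequency axis (assumption):
    time is compressed by `factor`, frequency is raised by `factor`."""
    return [(t / factor, f * factor) for t, f in peaks]


def hashes_match(h1, h2, eps: float = 0.01) -> bool:
    """Tolerance-based comparison, standing in for a range search."""
    return all(abs(x - y) <= eps for x, y in zip(h1, h2))


# Illustrative peak groups (made-up values, not data from the thesis).
original = [(1.00, 400.0), (1.20, 900.0), (1.35, 650.0), (1.60, 1200.0)]
other = [(1.00, 400.0), (1.10, 500.0), (1.50, 1100.0), (1.60, 1200.0)]

if __name__ == "__main__":
    faster = change_speed(original, 1.3)  # +30% speed
    print(hashes_match(quad_hash(original), quad_hash(faster)))
    print(hashes_match(quad_hash(original), quad_hash(other)))
```

In this sketch the tolerance `eps` trades robustness against specificity: a larger value accepts more distorted queries but also more false candidates, which is one reason a subsequent verification stage for match hypotheses is useful.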