This thesis is about the automatic detection and classification of sound events (e.g. notes or percussive sounds) in musical audio. It deals with four different sub-aspects, namely (i) the detection of the timing of these events (onset detection), (ii) the determination of their position within the metrical grid (beat and downbeat tracking), (iii) the estimation of the dominant periodicity (tempo estimation), and (iv) the identification of the frequency of the played notes (note onset transcription). Historically, beat tracking, tempo estimation, and note transcription systems were built upon onset detection algorithms. Most of these incorporated hand-crafted features designed specifically for the given task, for certain sounds, or for particular music styles. Unlike previous approaches, we avoid hand-crafted features almost entirely and instead learn them directly from audio. We present several algorithms addressing the aforementioned tasks of detecting and classifying sound events. All proposed methods achieve state-of-the-art performance in their respective fields across a wide range of sounds and music styles, and demonstrate the superiority of learned features with respect to both overall performance and generalisation capability. Reference implementations of the algorithms developed in this thesis are released as an open-source audio processing and music information retrieval (MIR) library written in Python. Additionally, we make the data used to develop and train the algorithms publicly available to stimulate further research and development in this area.
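To illustrate how such reference implementations are typically used, the following is a minimal Python sketch under the assumption that the released library is madmom (the library name, the processor classes, and the input file 'audio.wav' are assumptions not stated in this abstract). It computes a beat activation function with a pre-trained recurrent neural network, i.e. with learned rather than hand-crafted features, and decodes beat times from it.

    # Minimal usage sketch; assumes the released library is madmom
    # (https://github.com/CPJKU/madmom) and 'audio.wav' is a placeholder file.
    from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

    # Beat activation function from a pre-trained RNN with learned features.
    activations = RNNBeatProcessor()('audio.wav')

    # Decode beat times (in seconds) with a dynamic Bayesian network at 100 fps.
    beats = DBNBeatTrackingProcessor(fps=100)(activations)
    print(beats)

The same processor-chaining pattern applies to the other tasks, with task-specific activation and decoding stages in place of the beat processors.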