The amount of available music has grown so rapidly in recent years that media collections can no longer be managed manually, which makes automatic audio analysis crucial for the content-based search, organisation, and processing of data. This thesis focuses on the automatic extraction of a metrical grid, determined by beats, downbeats, and time signature, from a music piece. I propose several algorithms to tackle this problem, all comprising three stages: First, (low-level) features are extracted from the audio signal. Second, an acoustic model maps these features to probabilities in the musical domain. Third, a probabilistic sequence model finds the most probable sequence of labels under the model assumptions. This thesis contributes to the second and third stages. I (i) explore acoustic models based on machine learning methods, and (ii) develop models and algorithms for efficient probabilistic inference in both online and offline scenarios. Further, I design applications such as an automatic drummer that listens to and accompanies a musician in a live setting. The most recent algorithms developed in this thesis exhibit state-of-the-art performance and clearly demonstrate the superiority of systems incorporating machine learning over the hand-designed systems that were prevalent when this thesis was begun. All algorithms developed in this thesis are publicly available as open-source software. I also publish beat and downbeat annotations for the Ballroom dataset to foster further research in this area.
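To illustrate the third stage described above, the sketch below shows a minimal Viterbi decoder for a two-state (beat / no-beat) hidden Markov model. This is a generic, hypothetical example, not the thesis's actual models: the per-frame beat activations stand in for the output of an acoustic model (stage two), and the transition probabilities are invented for illustration.

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Most probable state sequence of an HMM (log-domain Viterbi).

    log_obs:   (T, S) log observation likelihoods per frame and state
    log_trans: (S, S) log transition matrix, entry [i, j] = log P(j | i)
    log_init:  (S,)   log initial state distribution
    """
    T, S = log_obs.shape
    delta = log_init + log_obs[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical per-frame beat activations from an acoustic model
acts = np.array([0.9, 0.1, 0.1, 0.8, 0.2])
log_obs = np.log(np.column_stack([1.0 - acts, acts]))  # state 0: no beat, 1: beat
log_trans = np.log(np.array([[0.7, 0.3],   # beats are sparse: discourage
                             [0.9, 0.1]])) # consecutive beat frames
log_init = np.log(np.array([0.5, 0.5]))

path = viterbi(log_obs, log_trans, log_init)
print(path)  # beats decoded at frames 0 and 3: [1, 0, 0, 1, 0]
```

In a real beat tracker the state space also encodes tempo and metrical position, but the inference principle, finding the single best label sequence under the model, is the same.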