The rise of next-generation sequencing (NGS) techniques has enabled the production of large amounts of sequencing data in less time and at lower cost than before. However, equally powerful bioinformatic tools are needed to analyze these data and fully exploit the information encoded in the sequenced DNA. Segments of DNA that are identical by descent (IBD) in two or more individuals, that is, inherited from a common ancestor, can be used to uncover relationships ranging from Neandertals to present-day families. In this thesis, the recently developed IBD detection methods HapFABIA and HapRFN were applied to whole-genome sequencing (WGS) data from the 1000 Genomes Project to uncover relationships between and within populations as well as with Neandertals and Denisovans. We extracted two types of very old IBD segments that are shared with Neandertals/Denisovans: (1) longer segments, found primarily in East Asians, South Asians, and Europeans, that confirm previously reported introgression events outside of Africa; and (2) shorter segments, shared mainly by Africans, that may indicate events involving ancestors of humans and other ancient hominins within Africa.
In clinical diagnostics, NGS techniques, especially targeted NGS panels, have largely replaced Sanger sequencing for the detection of single-nucleotide variants and small insertions/deletions. For the detection of copy-number variations (CNVs), however, previous computational methods had shortcomings regarding accuracy, quality control (QC), incidental findings, and user-friendliness. To address all of these shortcomings, panelcn.MOPS was developed as part of this thesis. panelcn.MOPS builds upon the successful cn.MOPS model, which was adapted for targeted NGS panel data and, in particular, for use in a clinical diagnostic setting. In addition to increased sensitivity, the extension implements QC criteria for samples and genetic regions of interest (ROIs) and a filter for user-selected genes to avoid incidental findings. Furthermore, panelcn.MOPS was made freely available as an R package and as standalone software with a graphical user interface that is easy to use for clinical geneticists without any programming experience. This thesis demonstrates the value of bioinformatics, and especially of machine learning methods, not only for gaining new insights into human history but also for facilitating routine clinical genetic diagnostics.