Rätsch, G.Hepp, B.Schulze, U.Ong, C. S.Huson, DanielKohlbacher, OliverLupas, AndreiNieselt, KayZell, Andreas2019-08-122019-08-122006978-3-88579-177-5https://dl.gi.de/handle/20.500.12116/24204Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on an hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.enPALMA: Perfect alignments using large margin algorithmsText/Conference Paper1617-5468