Quantitative comparison of genomic-wide protein domain distributions

Parikesit, Arli A.; Stadler, Peter F.; Prohaska, Sonja J.; Schomburg, Dietmar; Grote, Andreas
2019-01-17; 2019-01-17; 2010
ISBN 978-3-88579-267-3
https://dl.gi.de/handle/20.500.12116/19676
en
Text/Conference Paper
ISSN 1617-5468

Investigations into the origins and evolution of regulatory mechanisms require quantitative estimates of the abundance and co-occurrence of functional protein domains among distantly related genomes. Currently available databases, such as SUPERFAMILY, are not designed for quantitative comparisons because they are built on transcript and protein annotations provided by the various genome annotation projects. Large biases are introduced by differences in genome annotation protocols, which depend strongly on the availability of transcript information and of well-annotated, closely related organisms. Here we show that combining de novo gene predictors with subsequent HMM-based annotation of SCOP domains in the predicted peptides yields consistent estimates of acceptable accuracy, which can in particular be used for systematic studies of the evolution of protein domain occurrences and co-occurrences. As an application, we considered four major classes of DNA binding domains: zinc-finger, leucine-zipper, winged-helix, and HMG-box. We found that different types of DNA binding domains systematically avoid each other throughout the evolution of Eukarya. In contrast, DNA binding domains belonging to the same superfamily readily co-occur in the same protein.
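
The occurrence and co-occurrence counts described in the abstract can be illustrated with a minimal sketch. It assumes that domain annotation has already been done and is available as a mapping from predicted peptide to domain labels. The function names, the toy data, and the independence-based expectation are illustrative assumptions, not the authors' actual procedure. Repeated copies of one domain in a protein are collapsed to a single presence here, so the sketch only addresses co-occurrence of different domain types.

```python
from collections import Counter
from itertools import combinations


def domain_cooccurrence(annotations):
    """Count proteins per domain and proteins per unordered domain pair.

    annotations: dict mapping a protein id to an iterable of domain labels
    (hypothetical input format; repeated labels count once per protein).
    """
    single = Counter()
    pair = Counter()
    for domains in annotations.values():
        present = sorted(set(domains))  # presence/absence, canonical order
        single.update(present)
        pair.update(combinations(present, 2))
    return single, pair


def observed_vs_expected(annotations, a, b):
    """Observed number of proteins carrying both a and b, and the number
    expected if the two domains were distributed independently (an
    assumed null model chosen only for illustration)."""
    n = len(annotations)
    single, pair = domain_cooccurrence(annotations)
    observed = pair[tuple(sorted((a, b)))]
    expected = single[a] * single[b] / n if n else 0.0
    return observed, expected


# Toy annotation table, invented for this example.
toy = {
    "p1": ["zinc-finger", "zinc-finger"],
    "p2": ["zinc-finger", "HMG-box"],
    "p3": ["winged-helix"],
    "p4": ["leucine-zipper"],
}

print(observed_vs_expected(toy, "zinc-finger", "HMG-box"))
```

An observed count well below the expected one across many genomes would be consistent with the avoidance pattern reported above, although a real analysis would need a proper statistical test and genome-scale data.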