"Everything should be made as simple as possible, but not simpler." - Albert Einstein

Seminar Announcement

Predicting Motifs Using Multiple Species Regression Models

Katerina Kechris, Biostatistics & Informatics, University of Colorado, Denver

Monday, March 28, 2011

4:00 p.m., room 223, Weber Bldg

ABSTRACT

Transcription factors (TFs) are important for regulating gene expression. By binding to their recognition sites, TFs can help activate or repress gene activity. Transcription factor binding sites (TFBSs) are located within the genome but are difficult to predict computationally because they are relatively short sequence patterns buried in long genomic regions. Earlier methods for identifying TFBS motifs incorporated genome-wide expression data and genomic sequences into a linear model framework, regressing gene expression values onto counts of putative TFBSs in sequences for a single species. More recently, the growing availability of both genomic sequences and expression data from multiple species has made it possible to explore multivariate regression models for TFBS motif prediction. We have developed methods that expand the search space to both sequence and expression information from multiple species and that incorporate the evolutionary relationships among species. Using data from yeast, we show that the multiple-species methods improve TFBS prediction over the single-species method according to several evaluation criteria.
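
For readers unfamiliar with the single-species linear model framework mentioned in the abstract, the following is a minimal illustrative sketch in Python. It is not the speaker's method: the function names, the toy sequences, and the expression values are all hypothetical, and the multiple-species extension described in the abstract is not reproduced here. The sketch counts occurrences of one candidate motif in each gene's sequence and fits an ordinary least-squares regression of expression on those counts.

```python
import numpy as np

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def fit_single_species(seqs, expression, motif):
    """Ordinary least squares of expression on motif counts.

    Returns (intercept, slope). A slope far from zero suggests the
    motif count is associated with expression in this toy setting.
    """
    counts = np.array([count_motif(s, motif) for s in seqs], dtype=float)
    # Design matrix: a column of ones (intercept) and the motif counts.
    X = np.column_stack([np.ones_like(counts), counts])
    y = np.asarray(expression, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical toy data: one sequence and one expression value per gene.
seqs = ["ACGTACGT", "TTTTTTTT", "ACGTTTTT", "ACGTACGTACGT"]
expression = [2.0, 0.0, 1.0, 3.0]

intercept, slope = fit_single_species(seqs, expression, "ACGT")
```

In practice a regression of this kind would be repeated over many candidate motifs, with the best-fitting ones kept as putative TFBS motifs; how candidates are scored and selected is an assumption of this sketch rather than something stated in the abstract.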