NPEST: a nonparametric method and a database for transcription start site prediction

Tatiana Tatarinova, Alona Kryshchenko, Martin Triska, Mehedi Hassan, Denis Murphy, Michael Neely, Alan Schumitzky

Research output: Contribution to journal › Article › peer-review


In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. The method estimates the unknown probability distribution of ESTs using a maximum likelihood (ML) approach, and this estimate is then used to predict TSS positions. Accurate identification of TSSs is an important genomics task: the position of regulatory elements relative to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct TSS identification. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help in understanding alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at
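
The idea of estimating an EST position distribution by maximum likelihood and reading TSS candidates off it can be illustrated with a minimal sketch. This is not the NPEST implementation: it assumes a Gaussian kernel with a fixed width `sd`, a fixed grid of candidate positions, and a weight cutoff `min_weight`, and it uses plain EM updates of the mixing weights as a simple stand-in for a nonparametric ML estimate. The function names `npml_weights` and `predict_tss` are hypothetical.

```python
import math


def npml_weights(positions, grid, sd=5.0, iters=200):
    """Estimate mixing weights over a fixed grid of candidate TSS positions.

    Assumption (not from the paper): each EST 5' end is modelled as a
    Gaussian with fixed standard deviation `sd` around its true TSS.
    """
    norm = sd * math.sqrt(2.0 * math.pi)
    # dens[i][j] = density of observation i under candidate position j
    dens = [[math.exp(-0.5 * ((x - g) / sd) ** 2) / norm for g in grid]
            for x in positions]
    w = [1.0 / len(grid)] * len(grid)  # start from uniform weights
    for _ in range(iters):
        acc = [0.0] * len(grid)
        for row in dens:
            mix = sum(wj * d for wj, d in zip(w, row))
            for j, d in enumerate(row):
                acc[j] += w[j] * d / mix  # posterior responsibility
        w = [a / len(positions) for a in acc]
    return w


def predict_tss(positions, grid, min_weight=0.1, sd=5.0, iters=200):
    """Return (position, weight) pairs whose estimated weight is at least min_weight."""
    w = npml_weights(positions, grid, sd=sd, iters=iters)
    return [(g, wj) for g, wj in zip(grid, w) if wj >= min_weight]


# Toy data: EST 5' ends clustered around two positions in one locus.
ests = [98, 99, 100, 100, 101, 102, 158, 160, 160, 161, 162]
candidates = list(range(80, 181, 10))
for pos, weight in predict_tss(ests, candidates):
    print(pos, round(weight, 3))
```

On this toy input the estimated weight concentrates on the two grid points nearest the clusters, which is how a mixing-distribution estimate can report more than one TSS per locus. The grid spacing, kernel width, and cutoff are illustrative choices only.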

Original language: English
Pages (from-to): 261-271
Number of pages: 11
Journal: Quantitative Biology
Issue number: 4
Publication status: Published - Dec 2013


  • transcription start site
  • TSS
  • nonparametric maximum likelihood

