Discriminative seeding DNA motif discovery  

Welcome to the Seeder website!


Seeder is an exact discriminative seeding DNA motif discovery algorithm designed for fast and reliable prediction of cis-regulatory elements in eukaryotic promoters.

The algorithm starts by enumerating all words of a given length. For each word, it calculates the Hamming distance (HD) between the word and its best-matching substring of the same length in each sequence of a background set; we call this distance the substring minimal distance (SMD). These distances are used to build a word-specific background probability distribution of the SMD. For each word, the algorithm then calculates the sum of SMDs to the sequences in a positive set, and the P-value of this sum is computed from the word-specific background distribution. The word with the minimal P-value is retained, and a seed PWM is built from the closest matches to this word found in every positive sequence.

The seed PWM is extended to the full motif width, and in each positive sequence the site with the highest score against the extended PWM is selected. A new PWM is built from those sites, and the process is iterated until convergence or until a maximum number of iterations is reached.
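The SMD and P-value computation described above can be sketched in Python. This is a minimal, unoptimized sketch under stated assumptions, not Seeder's implementation: it scans only the forward strand, it assumes the positive sequences behave like independent draws from the background so that the null distribution of the summed SMD is the n-fold convolution of the word-specific SMD distribution, and it takes the P-value as the probability of a sum at least as small as the observed one. The function names (`smd`, `background_distribution`, `convolve`, `sum_smd_pvalue`) are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def smd(word: str, seq: str) -> int:
    """Substring minimal distance: smallest Hamming distance between `word` and
    any same-length substring of `seq` (forward strand only in this sketch;
    assumes len(seq) >= len(word))."""
    k = len(word)
    return min(hamming(word, seq[i:i + k]) for i in range(len(seq) - k + 1))


def background_distribution(word: str, background: list[str]) -> list[float]:
    """Empirical probability of each SMD value 0..k over the background set."""
    counts = Counter(smd(word, s) for s in background)
    return [counts[d] / len(background) for d in range(len(word) + 1)]


def convolve(p: list[float], q: list[float]) -> list[float]:
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_smd_pvalue(word: str, positives: list[str], background: list[str]) -> float:
    """P(summed SMD <= observed sum) under the word-specific background,
    ASSUMING the positive sequences are independent background draws."""
    dist = background_distribution(word, background)
    null = [1.0]  # an empty sum equals 0 with probability 1
    for _ in positives:
        null = convolve(null, dist)
    observed = sum(smd(word, s) for s in positives)
    return sum(null[:observed + 1])


if __name__ == "__main__":
    positives = ["TTACGTTT", "GGACGTGG", "CCCACGTC"]  # each contains ACGT exactly
    background = ["ACGTTTTT", "TTTTTTTT", "GGGGCCCC", "CCCCCCCC"]  # SMDs 0, 3, 3, 3
    print(sum_smd_pvalue("ACGT", positives, background))  # 0.015625 (= 0.25 ** 3)
```

Because the background distribution is estimated per word, a word that is common in the background (small SMDs everywhere) needs a much smaller summed SMD in the positive set to reach the same P-value as a rare word.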

Key features of the algorithm:
 • Guaranteed optimality of seed selection through exhaustive enumeration;
 • A background model based on the empirical distribution of SMDs;
 • Efficient data structures that make computations fast.
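The guaranteed optimality of the seed follows from exhaustiveness: every one of the 4^k words of length k is scored, so no candidate seed can be missed. The Python sketch below illustrates that idea together with the seed-PWM step. It is deliberately naive (it uses none of Seeder's efficient data structures), and the names `select_seed`, `closest_site` and `seed_pwm`, the leftmost-match tie-breaking and the 0.25 pseudocount are illustrative assumptions rather than Seeder's actual choices. The scoring function is passed in, so the background P-value would be plugged in there; the usage lines use the plain sum of SMDs as a stand-in.

```python
from itertools import product
from typing import Callable

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closest_site(word: str, seq: str) -> str:
    """Leftmost substring of `seq` with minimal Hamming distance to `word`."""
    k = len(word)
    start = min(range(len(seq) - k + 1), key=lambda i: hamming(word, seq[i:i + k]))
    return seq[start:start + k]


def select_seed(k: int, score: Callable[[str], float]) -> str:
    """Score all 4**k words (lower is better). Exhaustive enumeration means the
    returned word is optimal for `score`; ties go to the first word generated."""
    words = ("".join(letters) for letters in product(ALPHABET, repeat=k))
    return min(words, key=score)


def seed_pwm(word: str, positives: list[str], pseudocount: float = 0.25) -> list[dict[str, float]]:
    """Per-column base frequencies of the closest match to `word` in each
    positive sequence, smoothed with a (hypothetical) pseudocount."""
    sites = [closest_site(word, s) for s in positives]
    pwm = []
    for column in zip(*sites):
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({b: (column.count(b) + pseudocount) / total for b in ALPHABET})
    return pwm


if __name__ == "__main__":
    positives = ["TTACGTTT", "GGACGAGG", "CCACGTCC"]

    # Stand-in score: plain sum of SMDs (Seeder ranks words by a background P-value).
    def sum_smd(w: str) -> float:
        return sum(hamming(w, closest_site(w, s)) for s in positives)

    seed = select_seed(4, sum_smd)
    print(seed)  # ACGT
    print(seed_pwm(seed, positives)[3])  # last column, built from sites ending in T, A, T
```

The cost of this brute-force version grows as 4^k times the total sequence length, which is why a practical implementation needs the faster data structures mentioned in the list above.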

Website design and mastering
François Fauteux © 2008-2009