Friday, July 25, 2008

BiRG minutes: 7-18-2008

Comparing Two Sequences

 Objectives:

          Get the basics about dot plots

          Know how to interpret the most common patterns in a dot plot

          Use Dotlet

          Use Lalign to extract local alignments

 
Why compare two?

          Database searches are useful for finding homologues

          Database searches don't provide precise comparisons

          More precise tools are needed to analyze the sequences in detail, including:

        Dot plots for graphic analysis

        Local or global alignments for residue/residue analysis

          The alignment of two sequences is called a pairwise alignment

 
Dot Plot:

          A dot plot is a graphic representation of pairwise similarity

          The simplicity of dot plots prevents artifacts

          Ideal for looking for features that may come in different orders

          Reveal complex patterns

          Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain

 
Choosing your two sequences:

          Making pairwise comparisons takes time

          Use BLAST to rapidly select your sequences

        More than 70% identity for DNA

        More than 25% identity for proteins

          If your sequences are too similar, comparing them yields no useful information
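
The identity cutoffs above can be checked with a few lines of Python. This is only a sketch: the function names and the `MIN_IDENTITY` table are made up for illustration, and the identity is computed over alignment columns, which is one of several common definitions and may differ slightly from the value BLAST reports.

```python
# Minimum identity cutoffs taken from the notes above (percent).
MIN_IDENTITY = {"dna": 70.0, "protein": 25.0}


def percent_identity(aligned1, aligned2):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    if not aligned1:
        return 0.0
    # A gap character ("-") never counts as a match.
    same = sum(1 for x, y in zip(aligned1, aligned2) if x == y and x != "-")
    return 100.0 * same / len(aligned1)


def similar_enough(identity, kind):
    """True when the identity exceeds the cutoff for this sequence type."""
    return identity > MIN_IDENTITY[kind]


identity = percent_identity("GATTACA", "GACTACA")
print(round(identity, 1), similar_enough(identity, "dna"))
```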

 What can you analyze with Dot Plot?

          Any pair of sequences

        DNA

        Proteins

        RNA

          DNA with proteins

        Dotlet is an appropriate tool

        To compare full genomes, install the program locally

          Sequences longer than 1000 symbols are hard to analyze online

           Divergent sequences where only a segment is homologous

          Long insertions and deletions

          Tandem repeats

The square shape of the pattern is characteristic of these repeats

Using Dotlet:

          Dotlet is one of the handiest tools for making dot plots

          Dotlet is a Java applet

          Open and download the applet at the following site:

        www.isrec.isb-sib.ch/java/dotlet

          Dotlet slides a window along each sequence

          If the windows are more similar than the threshold, Dotlet prints a dot at their intersection

          You can control the similarity threshold with the little window on the left

          Every dot has a score given by the window comparison

          When the score is:

        Below threshold 1 => black dot

        Between thresholds 1 and 2 => grey dot

        Above threshold 2 => white dot

          The blue curve is the distribution of scores in the sequences

          The peak => most common score

        Most common => less informative

          Window size and stringency control the appearance of your dot plot

        Very stringent = clean dot plot, little signal

        Not stringent enough = noisy dot plot, too much signal

          Play with the threshold until a usable signal appears

 
          The square shape is typical of tandem repeats

          The repeats are not perfect because the sequences have diverged after their duplication
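
The sliding-window idea and the two-threshold shading described above can be sketched in Python. This is an illustration, not Dotlet's code: it scores a window by counting identical positions (Dotlet uses a scoring matrix), and the names `window_score`, `dot_plot`, `shade`, and `render` are invented for the example.

```python
def window_score(a, b):
    """Count identical positions between two equal-length windows."""
    return sum(1 for x, y in zip(a, b) if x == y)


def dot_plot(seq1, seq2, window=3, threshold=3):
    """Return the (i, j) window start positions whose score reaches the threshold."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            if window_score(seq1[i:i + window], seq2[j:j + window]) >= threshold:
                dots.add((i, j))
    return dots


def shade(score, threshold1, threshold2):
    """Map a window score to a colour following the rule in the notes."""
    if score < threshold1:
        return "black"
    if score > threshold2:
        return "white"
    return "grey"


def render(seq1, seq2, dots):
    """Draw the plot as text: seq1 across the top, seq2 down the side."""
    lines = ["  " + seq1]
    for j, ch in enumerate(seq2):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq1)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


# A sequence made of three copies of "ABC", compared with itself.
repeat = "ABCABCABC"
print(render(repeat, repeat, dot_plot(repeat, repeat, window=3, threshold=3)))
```

Comparing the repeated sequence with itself gives a main diagonal plus parallel diagonals offset by the repeat length, which together fill the square described above. Lowering `threshold` below `window` lets imperfect repeats show up, at the cost of more noise.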

Comparing a Gene and its Product:

          Eukaryotic genes are transcribed into RNA

          The RNA is then spliced to remove the intron sequences

          It may be necessary to compare the gene and its product

          Dotlet makes this comparative analysis easy

 
Aligning Sequences:

          Dotlet dot plots are a good way to provide an overview

          Dot plots don't provide residue/residue analysis

          For this analysis you need an alignment

          The most convenient tool for making precise local alignments is Lalign

 Lalign and BLAST:

          Lalign is like a very precise BLAST

          It works on only two sequences at a time

          You must provide both sequences
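
To show what a local alignment of two sequences looks like, here is a minimal Smith-Waterman sketch in Python. It is not Lalign's own implementation: the scoring values (match 2, mismatch -1, gap -2), the single linear gap penalty, and the function name `local_align` are assumptions chosen to keep the example short, and it returns only the single best alignment.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below zero, which is
    # what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, row1, row2 = local_align("TTTGATTACATTT", "CCCGATTACACCC")
print(score)
print(row1)
print(row2)
```

Only the shared segment is reported; the unrelated flanking residues are left out, which is the residue/residue view a dot plot cannot give.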

 
Going Farther:

          If you need to align coding DNA with a protein, try these sites:

        www.tcoffee.org => protogene

        coot.embl.de/pal2nal

          If you need to align very large sequences, try this site:

        www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

          If you need a precise estimate of your alignment's statistical significance, use PRSS

        The program is available at fasta.bioch.virginia.edu

        Low E-value => good alignment
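
PRSS judges significance by comparing the real score with scores obtained after shuffling one of the sequences. The Python sketch below shows that shuffling idea only. It is not PRSS: the toy score (longest common substring), the empirical p-value it returns instead of an E-value, and the names `longest_common_substring` and `shuffle_p_value` are all assumptions made for illustration.

```python
import random


def longest_common_substring(a, b):
    """Length of the longest run of residues shared by a and b (toy score)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shuffle_p_value(seq1, seq2, score=longest_common_substring,
                    shuffles=200, seed=0):
    """Fraction of shuffled comparisons scoring at least as well as the real one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    real = score(seq1, seq2)
    letters = list(seq2)
    as_good = 0
    for _ in range(shuffles):
        rng.shuffle(letters)  # same composition, order destroyed
        if score(seq1, "".join(letters)) >= real:
            as_good += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (as_good + 1) / (shuffles + 1)


seq = "ACGTTGCAAGCTTACGGATC"
print(shuffle_p_value(seq, seq))
```

A small value means shuffled sequences rarely score as well as the real pair, which is the same reading as a low E-value.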
