Nottingham

Undergraduate Bioinformatics Practical

Do not close this window during the session or your answers will be lost

Sequence Comparison

A basic introduction to BLAST as a tool for sequence comparison, including the various concepts of homology and types of database available.


The demonstrator will briefly describe the concepts behind sequence similarity comparisons and the meaning of homology. This will include definitions of orthologues and paralogues, and the reason why '% homology' is a meaningless description.

IMPORTANT: READ ALL OF THE INSTRUCTIONS
Virtual Experiment

You have experimentally sequenced random clones from an Arabidopsis cDNA library. Please find potential candidate genes for these sequences using blastn.


Protocol for gene 1:
  • Go to the BLAST server at NASC
  • DO NOT change any of the choices.
  • Read all of the text
  • Note the differences between the blast programs

Now paste the following DNA sequence into the appropriate box and run a BLAST search:

GAAGCATACTGTGACATGTTGGTTAAATATCGTGAGGAGCTAA
CAAGGCCCATTCAGGAAGCAATGGAGTTTATACGTCGTATTGA
ATCTCAGCTTAGCATGTTGTGTCAGAGTCCCATTCACATCCTCA
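
The query above is split over three lines. Most BLAST forms accept it as pasted, but if you want to check it first, the short Python sketch below joins the lines, confirms that only nucleotide letters are present, and prints a FASTA record. The function name `to_fasta` and the record name `gene1_query` are made up for this illustration; they are not part of the NASC server.

```python
# The three lines of the gene 1 query, copied from the worksheet.
RAW_LINES = [
    "GAAGCATACTGTGACATGTTGGTTAAATATCGTGAGGAGCTAA",
    "CAAGGCCCATTCAGGAAGCAATGGAGTTTATACGTCGTATTGA",
    "ATCTCAGCTTAGCATGTTGTGTCAGAGTCCCATTCACATCCTCA",
]


def to_fasta(lines, name="gene1_query", width=60):
    """Join sequence lines and return a FASTA-formatted string.

    `name` is a hypothetical label chosen for this example.
    """
    seq = "".join(line.strip() for line in lines).upper()
    # Reject anything that is not a plain nucleotide letter.
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"Unexpected characters: {sorted(unexpected)}")
    # Wrap the sequence at a fixed width, as FASTA files usually do.
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(wrapped)


fasta = to_fasta(RAW_LINES)
print(fasta)
print("Length:", sum(len(line) for line in RAW_LINES))
```

The same function can be reused for the second sequence later in the practical by passing its two lines and a different name.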

Wait for a match to appear

What is the chromosome position of the top match?
Type it here:

What is its 'E-value'?:

What is the chromosome position of the next highest match?
Type it here:

What is its 'E-value'?:

'Mouse over' the best match on the chromosome view.
Use the dropdown for the best match to see the ContigView (context).

What is the title of the gene?:


So what is an E-value, and what does BLAST do?
Please find some answers here
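
As a rough illustration of what the E-value measures: for a hit with bit score S', the expected number of chance hits scoring at least as well is approximately E = m × n × 2^(-S'), where m is the query length and n is the total length of the database. The Python sketch below applies that formula directly. It is a simplification: real BLAST servers use corrected "effective" lengths, so the numbers will not match a server's output exactly, and the function name and example values are invented for this illustration.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths rather than BLAST's effective lengths,
    so it is only an approximation.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values: the same hit searched against two database sizes.
small_db = expected_hits(bit_score=50, query_len=130, db_len=1_000_000)
large_db = expected_hits(bit_score=50, query_len=130, db_len=100_000_000)

print("E-value, small database:", small_db)
print("E-value, large database:", large_db)
```

Note that the same alignment gets a larger E-value in a larger database, because there are more places for a chance match to occur.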

Now that you know the procedure, please repeat it for this sequence:

GTAGGCTGTAACGCTTTATCACTTCTGGACACTTTTGGAATGCAAAACTACTCAACTGC
ATGCTTGTCATTATGCGATTCTCCCCCAGAGGCTGATGGAGAATGTAATGGTAGAGGTT

What is the chromosome position of the top match?:

What is its 'E-value'?:

What is the chromosome position of the next highest match?:

What is its 'E-value'?:

In the ContigView, what is the title of the gene?:


Protocol for part two:
  • Go to the EBI.
    Note that the default program there is different from the one you used above.
    Change the program to blastn, the default you used above for DNA.
  • Run blastn on the sequences of both genes (from above) again.
Gene 1: EMBL accession number using EBI

Gene 1: E-value using EBI

Gene 2: EMBL accession number using EBI

Gene 2: E-value using EBI

Are the answers the same when comparing EBI and AtEnsembl (the NASC server used above)? Would you expect this? Why?


Please answer the following questions (briefly) using the help files at any of the links above (or Google):

What is blastp, and how does it differ from blastn?

What is blastx?

What is tblastx?
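
Once you have written your own answers, the sketch below may help make the translated searches concrete. blastx translates a nucleotide query in all six reading frames and compares the results with a protein database; tblastx translates both the query and a nucleotide database. The Python code shows only the six-frame translation step, using the standard genetic code. The function names are invented for this illustration and the code is not BLAST's own implementation.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAAGGC").items():
    print(frame, protein)
```

A stop codon is shown as `*`. Only one of the six frames normally corresponds to the real protein, which is why a translated search has to try all of them.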


With the following protein sequence, please find a match using blastp at EBI:

  • QHMLFPHMSSLLPQTTENCF

If the search fails, reconsider the database that you are searching and how blastp differs from blastn. Then retry the search.
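
A common reason for a failed search is a mismatch between the type of query and the type of program or database. The Python sketch below guesses whether a sequence is nucleotide or protein from its letters and suggests the matching program. The 90% threshold and the function name are assumptions made for this illustration, not the rule any BLAST server uses.

```python
def guess_sequence_type(seq, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the letters in a sequence.

    `threshold` is an assumed cut-off: the fraction of letters that
    must be A, C, G, T, U or N for the sequence to count as nucleotide.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("Sequence contains no letters")
    nucleotide_fraction = sum(c in "ACGTUN" for c in letters) / len(letters)
    return "nucleotide" if nucleotide_fraction >= threshold else "protein"


# Which program compares a query with a database of the same type.
PROGRAM_FOR = {"nucleotide": "blastn", "protein": "blastp"}

for query in ("QHMLFPHMSSLLPQTTENCF", "GAAGCATACTGTGACATGTTGG"):
    kind = guess_sequence_type(query)
    print(query, "->", kind, "->", PROGRAM_FOR[kind])
```

A protein query needs both a protein program (blastp) and a protein database; selecting one without the other is enough to make the search fail.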

What are the name and proposed function of the gene that is conserved between the best matches?