Thursday, November 12, 2009

Bioinformatics: FASTA format

FASTA is the name of one of the most widely used sequence alignment and database-searching programs, created by W.R. Pearson and D.J. Lipman in 1988. The FASTA file format takes its name from this program.

Sequences given to FASTA have to be written like this:

>My_Sequence_Name
MKLWLVIVISVFSFVIMGTGVVQKAEAELSEEGRQVAKPNAEAQLSGEEYKKANVVQKVEAEL

That is the general format: the first line starts with the ">" sign, followed by a definition (a description) of your protein or DNA sequence.
The protein or DNA sequence itself begins on the second line and may continue over several lines.
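To make the layout concrete, here is a small Python sketch that writes one record in this format. The function name format_fasta is hypothetical, and the 60-character line width is an assumption: wrapping long sequences is a common convention, but the format does not require a particular width.

```python
def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' definition line, then the sequence.

    The sequence is upper-cased and wrapped every `width` characters
    (the width is a convention chosen here, not part of the format).
    """
    seq = "".join(sequence.split()).upper()  # drop whitespace, use capital letters
    lines = [">" + name]
    for start in range(0, len(seq), width):
        lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta("My_Sequence_Name",
                       "MKLWLVIVISVFSFVIMGTGVVQKAEAELSEEGRQVAKPNAEAQLSGEEYKKANVVQKVEAEL"),
          end="")
```

Running it prints the example record from above, with the 63-letter sequence split into a 60-letter line and a 3-letter line.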

Notes:
* The first line can contain information such as:
1- The database name, e.g. sp, which means SwissProt.
2- The database accession number, e.g. Q3LGA9.
3- The protein or DNA sequence name.
4- The organism, for example Homo sapiens, etc.
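* Sequences are written with one-letter codes in capital letters (one letter per nucleotide or amino acid). The software starts reading at the line after the one containing the ">" sign and continues to the end of the sequence, which is the end of the file if the file holds only one sequence (otherwise the next ">" line starts a new sequence).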
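The reading rule in these notes can be sketched in Python as follows. read_fasta and split_header are hypothetical helper names, and split_header assumes a pipe-separated first line in the SwissProt/UniProt style (sp|accession|name ...); other databases lay out the definition line differently.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (definition_line, sequence) pairs from FASTA-formatted text.

    A record starts at a line beginning with '>'; every following line,
    up to the next '>' line or the end of the text, is sequence.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # previous record is complete
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # last record ends at end of text


def split_header(header: str) -> List[str]:
    """Split a pipe-separated definition line (assumed 'db|accession|rest').

    Field 0 is the database code (e.g. 'sp'), field 1 the accession number,
    and field 2 keeps the name, description and organism as one string.
    """
    return header.split("|", 2)
```

For example, list(read_fasta(">seq1\nACGT\nAC\n>seq2\nMKLW\n")) returns [("seq1", "ACGTAC"), ("seq2", "MKLW")], and split_header on a made-up line such as "sp|Q3LGA9|EXAMPLE_HUMAN Made-up description" returns ["sp", "Q3LGA9", "EXAMPLE_HUMAN Made-up description"].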

FASTA has become the default sequence format because it is easy to parse, which is why most analysis software, such as BLAST and CLUSTALW, uses it.
Some programs use the RAW format, which is the FASTA format without the first line (the definition line).
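Going from FASTA to RAW is just a matter of dropping the definition line. A minimal Python sketch, assuming the input holds a single sequence and that the target program wants the whole sequence on one line (fasta_to_raw is a hypothetical name):

```python
def fasta_to_raw(text: str) -> str:
    """Strip the '>' definition line and return only the sequence (RAW).

    Assumes exactly one record: with several '>' lines the sequences
    would be merged together, so that case is rejected.
    """
    kept = []
    headers = 0
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers += 1  # definition line: count it, do not keep it
        else:
            kept.append(line)
    if headers > 1:
        raise ValueError("expected a single FASTA record")
    return "".join(kept) + "\n"
```

Input that is already RAW (no ">" line) passes through unchanged apart from joining its lines.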

Any questions? You're welcome to ask.
