Description

• FASTA is a plaintext format for storing protein or nucleic acid (DNA or RNA) data as character sequences. It is a popular interchange format for molecular biology software.

Details on the FASTA format

• The FASTA format employs the following standard IUB/IUPAC conventions for encoding protein or nucleic acid sequences as alphabetic characters.
• In addition to codes specifying particular nucleotides or amino acids, the convention supports ambiguity codes for positions that may be occupied by more than one possible nucleotide or amino acid. For example, the code R matches either adenine (A) or guanine (G).

Table 1: Nucleic Acid Codes

Code | Meaning  | Description        | Code | Meaning     | Description
A    | A        | Adenine            | B    | {C,G,T,U}   | Not A
C    | C        | Cytosine           | D    | {A,G,T,U}   | Not C
G    | G        | Guanine            | H    | {A,C,T,U}   | Not G
T    | T        | Thymine            | V    | {A,C,G}     | Not T or U
U    | U        | Uracil             | N    | {A,C,G,T,U} | Any nucleic acid
R    | {A,G}    | Purine             | Y    | {C,T,U}     | Pyrimidine
K    | {G,T,U}  | Ketone             | M    | {A,C}       | Amino
S    | {C,G}    | Strong interaction | W    | {A,T,U}     | Weak interaction

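The ambiguity codes in Table 1 can be read as sets of allowed bases. The following is a minimal Python sketch of that idea, not part of any particular FASTA library: the dictionary is transcribed from the table, and the helper name `matches` is hypothetical. It assumes the pattern contains only codes from Table 1 and raises `KeyError` otherwise.

```python
# IUPAC nucleotide codes mapped to the set of bases each one matches,
# transcribed from Table 1.
IUPAC_NUCLEOTIDES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"U"},
    "R": {"A", "G"},
    "Y": {"C", "T", "U"},
    "K": {"G", "T", "U"},
    "M": {"A", "C"},
    "S": {"C", "G"},
    "W": {"A", "T", "U"},
    "B": {"C", "G", "T", "U"},
    "D": {"A", "G", "T", "U"},
    "H": {"A", "C", "T", "U"},
    "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T", "U"},
}


def matches(pattern, sequence):
    """Return True if each base in `sequence` is allowed by the code at the
    same position in `pattern`. Strings of different length never match."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_NUCLEOTIDES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("GRT", "GAT"))  # R allows A
print(matches("GRT", "GCT"))  # R does not allow C
```

Storing each code as a set keeps the lookup a simple membership test and makes the table easy to check against the source.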
Table 2: Amino Acid Codes

Code | Description    | Code | Description      | Code | Description
A    | Alanine        | J    | I or L           | S    | Serine
B    | D or N         | K    | Lysine           | T    | Threonine
C    | Cysteine       | L    | Leucine          | U    | Selenocysteine
D    | Aspartic acid  | M    | Methionine       | V    | Valine
E    | Glutamic acid  | N    | Asparagine       | W    | Tryptophan
F    | Phenylalanine  | O    | Pyrrolysine      |      |
G    | Glycine        | P    | Proline          | Y    | Tyrosine
H    | Histidine      | Q    | Glutamine        | Z    | E or Q
I    | Isoleucine     | R    | Arginine         |      |
X    | any amino acid | *    | translation stop | -    | gap of indeterminate length

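As a rough illustration of how Table 2 might be used, the sketch below checks a protein sequence for characters that fall outside the table. It is an assumption-laden example in plain Python: the names `AMINO_ACIDS`, `AMBIGUOUS`, `SPECIAL`, and `invalid_positions` are hypothetical, and positions are reported 1-based.

```python
# One-letter codes from Table 2, split into specific residues,
# ambiguity codes, and special symbols.
AMINO_ACIDS = set("ACDEFGHIKLMNOPQRSTUVWY")
AMBIGUOUS = {
    "B": {"D", "N"},
    "Z": {"E", "Q"},
    "J": {"I", "L"},
    "X": AMINO_ACIDS,  # any amino acid
}
SPECIAL = {"*", "-"}  # translation stop, gap of indeterminate length


def invalid_positions(sequence):
    """Return (position, character) pairs for characters not in Table 2."""
    allowed = AMINO_ACIDS | set(AMBIGUOUS) | SPECIAL
    return [
        (i, ch)
        for i, ch in enumerate(sequence.upper(), start=1)
        if ch not in allowed
    ]


print(invalid_positions("MKV1*"))  # the digit at position 4 is reported
```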
Notes

• Content-Type: chemical/seq-aa-fasta, chemical/seq-na-fasta

Examples
Import a DNA sequence from a FASTA file.
Read the descriptor for the first sequence in the file.
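The two steps above, importing a file and reading the first descriptor, can be sketched in plain Python. This sketch assumes the usual FASTA layout, in which each record begins with a `>` descriptor line followed by one or more sequence lines. The function name `read_fasta` and the sample records are hypothetical and stand in for a real file.

```python
import io


def read_fasta(lines):
    """Yield (descriptor, sequence) pairs from an iterable of text lines."""
    descriptor, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new descriptor closes the previous record, if any.
            if descriptor is not None:
                yield descriptor, "".join(chunks)
            descriptor, chunks = line[1:].strip(), []
        elif descriptor is not None:
            chunks.append(line)  # sequence data may span several lines
    if descriptor is not None:
        yield descriptor, "".join(chunks)


# Hypothetical in-memory data; for a real file, pass an open text file instead.
sample = io.StringIO(
    ">seq1 hypothetical example record\n"
    "ACGTAC\n"
    "GGTTAA\n"
    ">seq2 second record\n"
    "TTGACA\n"
)
records = list(read_fasta(sample))
first_descriptor, first_sequence = records[0]
print(first_descriptor)
print(first_sequence)
```

Because the function accepts any iterable of lines, the same code works on an open file object, for example `with open(path) as handle: records = list(read_fasta(handle))`.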
Examine positions 100 through 150 in this sequence.
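Extracting a range of positions is a simple slice once the indexing convention is fixed. The sketch below assumes that "positions 100 through 150" is 1-based and inclusive, so it converts to Python's 0-based, end-exclusive slicing. The helper name `subsequence` and the sample sequence are hypothetical.

```python
def subsequence(sequence, start, end):
    """Return positions start..end of `sequence` (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return sequence[start - 1:end]


sequence = "ACGT" * 50  # hypothetical 200-base sequence
window = subsequence(sequence, 100, 150)
print(len(window))  # 51 positions, since both endpoints are included
print(window)
```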
Count the frequency of each nucleotide base within the sequence.
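Counting bases can be sketched with the standard library's `collections.Counter`. The function name `base_frequencies` and the sample sequence are hypothetical. Each base maps to a pair of its count and its fraction of the whole sequence.

```python
from collections import Counter


def base_frequencies(sequence):
    """Map each character in `sequence` to (count, fraction of total)."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {base: (n, n / total) for base, n in sorted(counts.items())}


frequencies = base_frequencies("ACGTACGGTTAA")  # hypothetical sequence
for base, (count, fraction) in frequencies.items():
    print(base, count, round(fraction, 3))
```

Ambiguity codes such as N or R, if present, are counted as their own symbols rather than being distributed over the bases they stand for.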