FASTA - Maple Help


FASTA (.fasta) File Format


Description

Details on the FASTA format

Notes

Examples

References

Description

• FASTA is a plain-text format for storing protein or nucleic acid (DNA or RNA) data as character sequences. It is a popular interchange format for molecular biology software.

• The Import and Export commands support this format.
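As a rough illustration of what "character sequences" means in practice, the following Python sketch reads FASTA text into (descriptor, sequence) pairs. It relies on the widely used FASTA layout, which this page does not spell out: each record starts with a descriptor line beginning with ">", followed by one or more lines of sequence characters. The function name parse_fasta and the sample data are hypothetical, and this is not how Maple's Import command is implemented.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (descriptor, sequence) pairs.

    Assumes the common layout: a line starting with '>' opens a record,
    and the lines that follow hold that record's sequence characters.
    """
    records = []
    descriptor, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new descriptor closes the previous record, if any.
            if descriptor is not None:
                records.append((descriptor, "".join(chunks)))
            descriptor, chunks = line[1:].strip(), []
        elif descriptor is not None:
            chunks.append(line)  # sequence data may span several lines
    if descriptor is not None:
        records.append((descriptor, "".join(chunks)))
    return records


# Made-up sample data, for illustration only.
sample = """>seq1 example descriptor
GATTACA
GGCC
>seq2
ACGU
"""

for descriptor, sequence in parse_fasta(sample):
    print(descriptor, sequence)
```

Sequence lines are concatenated, so a record wrapped across several lines yields a single string.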

Details on the FASTA format

• The FASTA format uses the standard IUB/IUPAC conventions, listed in the tables below, for encoding protein or nucleic acid sequences as alphabetic characters.

• In addition to codes for specific nucleotides or amino acids, the conventions include ambiguity codes for positions that may be occupied by more than one nucleotide or amino acid. For example, the code R matches either adenine (A) or guanine (G).

 

Table 1: Nucleic Acid Codes

Code  Meaning       Description          Code  Meaning       Description
A     A             Adenine              B     {C,G,T,U}     Not A
C     C             Cytosine             D     {A,G,T,U}     Not C
G     G             Guanine              H     {A,C,T,U}     Not G
T     T             Thymine              V     {A,C,G}       Not T or U
U     U             Uracil               N     {A,C,G,T,U}   Any nucleic acid
R     {A,G}         Purine               Y     {C,T,U}       Pyrimidine
K     {G,T,U}       Ketone               M     {A,C}         Amino
S     {C,G}         Strong interaction   W     {A,T,U}       Weak interaction
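
The ambiguity codes in Table 1 can be read as a mapping from each code to the set of bases it allows. The Python sketch below transcribes the table that way and checks whether a concrete sequence is compatible with a pattern that contains ambiguity codes. The names NUCLEIC_CODES and matches are hypothetical; they are not part of Maple or of the FASTA format.

```python
# Table 1 transcribed as a mapping from code to the set of allowed bases.
NUCLEIC_CODES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"U"},
    "R": {"A", "G"},                 # purine
    "Y": {"C", "T", "U"},            # pyrimidine
    "K": {"G", "T", "U"},
    "M": {"A", "C"},
    "S": {"C", "G"},
    "W": {"A", "T", "U"},
    "B": {"C", "G", "T", "U"},       # not A
    "D": {"A", "G", "T", "U"},       # not C
    "H": {"A", "C", "T", "U"},       # not G
    "V": {"A", "C", "G"},            # not T or U
    "N": {"A", "C", "G", "T", "U"},  # any
}


def matches(pattern, sequence):
    """Return True if every base in sequence is allowed by the code at the
    same position in pattern. Comparison is case-insensitive."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in NUCLEIC_CODES.get(code, set())
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("GRT", "GAT"))  # True: R allows A
print(matches("GRT", "GCT"))  # False: R does not allow C
```

Unknown codes map to an empty set, so a pattern containing a character outside Table 1 never matches.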

 

Table 2: Amino Acid Codes

Code  Description       Code  Description        Code  Description
A     Alanine           J     I or L             S     Serine
B     D or N            K     Lysine             T     Threonine
C     Cysteine          L     Leucine            U     Selenocysteine
D     Aspartic acid     M     Methionine         V     Valine
E     Glutamic acid     N     Asparagine         W     Tryptophan
F     Phenylalanine     O     Pyrrolysine
G     Glycine           P     Proline            Y     Tyrosine
H     Histidine         Q     Glutamine          Z     E or Q
I     Isoleucine        R     Arginine
X     Any amino acid    *     Translation stop   -     Gap of indeterminate length
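
One practical use of Table 2 is checking that a protein sequence contains only recognized codes. The Python sketch below does this under the assumption that lowercase letters should be treated like their uppercase forms. The names AMINO_CODES and invalid_positions are hypothetical and not part of Maple.

```python
# Every code listed in Table 2: the 26 letters plus '*' (translation stop)
# and '-' (gap of indeterminate length).
AMINO_CODES = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")


def invalid_positions(sequence):
    """Return (position, character) pairs for characters not in Table 2.

    Positions are 1-based. Letters are upper-cased before the check,
    which is an assumption of this sketch.
    """
    return [
        (index, char)
        for index, char in enumerate(sequence.upper(), start=1)
        if char not in AMINO_CODES
    ]


print(invalid_positions("MKT*"))  # [] : all characters are valid codes
print(invalid_positions("MK7T"))  # [(3, '7')] : digits are not valid codes
```

Because Table 2 assigns a meaning to every letter of the alphabet, only digits, whitespace, and other punctuation are reported.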

Notes

• Content-Type: chemical/seq-aa-fasta, chemical/seq-na-fasta

Examples

Import a DNA sequence from a FASTA file.

> DNASequence := Import("example/humanmtDNA.fasta", base=datadir):

Read the descriptor for the first sequence in the file.

> DNASequence[1,1];

Human mitochondrial genome,HVR2,CR,HVR1    (1)

Examine positions 100 through 150 in this sequence.

> DNASequence[1,2][100..150];

GGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC    (2)

Count the frequency of each nucleotide base in the sequence.

> frequencies := [StringTools:-CharacterFrequencies(DNASequence[1,2], dna)];

frequencies := [A = 5118, C = 5185, G = 2175, T = 4092]    (3)

> Statistics:-ColumnGraph(frequencies);

References

  

IUPAC code for incomplete nucleic acid specification, National Center for Biotechnology Information.

  

A One-Letter Notation for Amino Acid Sequences, International Union of Pure and Applied Chemistry.

See Also

Formats

Formats,FASTQ

Formats,GenBank