Required FASTA header format? #10

@bioinfoMMS

Hello,

I wonder if you could give me more information about the required format for the FASTA headers. I have been running pasteTaxID, and while I don't get any errors, the tax IDs do not show up in the results.

This header:

'>acc|GENBANK|AB866984.1|Human_immunodeficiency_virus_1_gene_for_pol_protein,_partial_cds,_isolate:_F10-5112353-1.|Human_immunodeficiency_virus_1|VRL|25-JUL-2014'

Comes out as:

'>ti||acc|GENBANK|AB866984.1|Human_immunodeficiency_virus_1_gene_for_pol_protein,_partial_cds,_isolate:_F10-5112353-1.|Human_immunodeficiency_virus_1|VRL|25-JUL-2014'

I am guessing it has something to do with the header format. I tried removing the GENBANK field, so that the header was:

'>acc|AB866984.1|Human_immunodeficiency_virus_1_gene_for_pol_protein,_partial_cds,_isolate:_F10-5112353-1.|Human_immunodeficiency_virus_1|VRL|25-JUL-2014'

With that change, a few of the tax IDs were found, but most were not. For example, of these two consecutive records, only the second received a tax ID:

'>ti||acc|FJ640294.1|Uncultured_marine_virus_isolate_CBSM-188_genomic_sequence.|uncultured_marine_virus|ENV|07-APR-2009'
'>ti|186617|acc|FJ640295.1|Uncultured_marine_virus_isolate_CBSM-189_genomic_sequence.|uncultured_marine_virus|ENV|07-APR-2009'

Any help would be appreciated.

Thanks,
Maddy
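
To gauge how many records are affected, the output headers can be scanned for an empty `ti` field. The following is only a minimal sketch, not part of pasteTaxID: it assumes the output layout is `>ti|<taxid>|<original header>`, which is inferred solely from the examples above, and the function names `parse_ti_header` and `headers_missing_taxid` are hypothetical.

```python
def parse_ti_header(header):
    """Split a '>ti|<taxid>|<rest>' header into (taxid, rest).

    The layout is an assumption based on the example headers in this issue.
    taxid is returned as None when the field is empty (i.e. '>ti||...').
    """
    line = header.strip()
    if not line.startswith(">ti|"):
        raise ValueError(f"not a ti-prefixed header: {line!r}")
    # Split into at most three fields: 'ti', the tax ID, and everything else.
    _, taxid, rest = line[1:].split("|", 2)
    return (taxid or None), rest


def headers_missing_taxid(lines):
    """Return the original header text of every record whose tax ID is empty."""
    missing = []
    for line in lines:
        if line.strip().startswith(">"):
            taxid, rest = parse_ti_header(line)
            if taxid is None:
                missing.append(rest)
    return missing
```

Passing the lines of the output FASTA file (for example `headers_missing_taxid(open(path))`, with `path` pointing at that file) returns the headers that came out as `>ti||...`, which makes it easier to compare them against the ones that were resolved.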
