bcdatabaser

A pipeline to create reference databases for arbitrary markers and taxonomic groups from NCBI data

This project is maintained by molbiodiv

BCdatabaser output files

Directory name format

In the web interface, the name of the zip file/directory is determined by the input parameters and follows the scheme:

<MARKER>.<TAXONOMIC-RANGE>.<TAXA-RESTRICTION-LIST>.<DATE>

e.g.:

coi.insecta.DE-Frankonia.2019-10-24

The extracted directory contains the files described below. We recommend following this format in the command-line version as well, especially if the result is deposited in a database as a reference set (see also: Public deposition).
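To keep command-line output consistent with the web interface, the directory name can be assembled from its four components. The following is only a sketch: the function name `bcdb_dirname` is hypothetical and not part of BCdatabaser, and the date is assumed to default to the current day when omitted.

```shell
# Hypothetical helper (not part of BCdatabaser): compose a directory name
# following <MARKER>.<TAXONOMIC-RANGE>.<TAXA-RESTRICTION-LIST>.<DATE>
bcdb_dirname() {
    # $1 marker, $2 taxonomic range, $3 taxa restriction list,
    # $4 date in YYYY-MM-DD format (assumption: defaults to today)
    printf '%s.%s.%s.%s\n' "$1" "$2" "$3" "${4:-$(date +%Y-%m-%d)}"
}

bcdb_dirname coi insecta DE-Frankonia 2019-10-24
# prints: coi.insecta.DE-Frankonia.2019-10-24
```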

Sequence data

Syntax of taxonomy

The taxonomy is included directly in the FASTA header, in a syntax used by a variety of classifiers. Slight modifications might be necessary for some software tools to accept this format (see also the classification documentation). In detail, we strictly follow the SINTAX nomenclature as described in the USEARCH manual:

><UNIQUEID>;tax=k:Kingdom,p:Phylum,c:Class,o:Order,f:Family,g:Genus,s:Species;

e.g.:

>LS453445;tax=k:Metazoa,p:Arthropoda,c:Insecta,o:Coleoptera,f:Carabidae,g:Molops,s:Molops_piceus;
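Because every rank carries a one-letter prefix, single ranks can be extracted from the headers with standard tools. The following is a minimal sketch, assuming all headers follow the pattern shown above; `sintax_rank` is a hypothetical helper and not part of BCdatabaser.

```shell
# Hypothetical helper: print "ID<TAB>name" for one rank of every SINTAX header.
# $1: rank letter (k, p, c, o, f, g or s)
# $2: FASTA file (assumption: standard input is read when omitted)
sintax_rank() {
    awk -v rank="$1" '
        /^>/ {
            id = substr($0, 2)                  # drop the leading ">"
            sub(/;.*/, "", id)                  # keep the text before the first ";"
            tax = $0
            sub(/^[^;]*;tax=/, "", tax)         # keep only the lineage part
            sub(/;$/, "", tax)                  # drop the trailing ";"
            n = split(tax, parts, ",")
            for (i = 1; i <= n; i++)
                if (index(parts[i], rank ":") == 1)
                    print id "\t" substr(parts[i], length(rank) + 2)
        }' "${2:--}"
}

printf '%s\n' '>LS453445;tax=k:Metazoa,p:Arthropoda,c:Insecta,o:Coleoptera,f:Carabidae,g:Molops,s:Molops_piceus;' 'ACGT' |
    sintax_rank s
# prints: LS453445<TAB>Molops_piceus
```

For example, piping the output of `sintax_rank f sequences.tax.fa` through `cut -f2 | sort | uniq -c` gives the number of sequences per family.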

The syntax is very similar to the RDP variant of the nomenclature:

>LS453445	Metazoa;Arthropoda;Insecta;Coleoptera;Carabidae;Molops;Molops_piceus

and can be converted to it with this command (the last expression removes the trailing semicolon of the SINTAX header):

sed -e "s/;tax=k:/\t/" -e "s/,[^:]:/;/g" -e "s/;$//" sequences.tax.fa > sequences.tax.rdp.fa

It is also very similar to the Greengenes variant of the nomenclature:

>LS453445	k__Metazoa;p__Arthropoda;c__Insecta;o__Coleoptera;f__Carabidae;g__Molops;s__Molops_piceus

and can be converted to it with this command (the kingdom prefix is kept so that it becomes k__, and the trailing semicolon is removed):

sed -e "s/;tax=/\t/" -e "s/,/;/g" -e "s/:/__/g" -e "s/;$//" sequences.tax.fa > sequences.tax.gg.fa
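Both conversions can also be wrapped as small functions that touch only header lines and leave sequence lines unchanged. The following sketch uses awk so that the tab separator does not depend on how a given sed implementation treats `\t`; the function names `sintax_to_rdp` and `sintax_to_greengenes` are hypothetical and not part of BCdatabaser, and the headers are assumed to follow the SINTAX pattern shown above.

```shell
# Hypothetical helpers; both read FASTA from a file argument or,
# if no argument is given, from standard input.
sintax_to_rdp() {
    awk '/^>/ {
            sub(/;tax=k:/, "\t")     # separate ID and lineage with a tab
            gsub(/,[a-z]:/, ";")     # drop the rank prefixes, join ranks with ";"
            sub(/;$/, "")            # remove the trailing ";" of the SINTAX header
         }
         { print }' "${1:--}"
}

sintax_to_greengenes() {
    awk '/^>/ {
            sub(/;tax=/, "\t")       # keep "k:" so the kingdom gets a prefix too
            gsub(/,/, ";")           # join ranks with ";"
            gsub(/:/, "__")          # turn "k:" into "k__", "p:" into "p__", ...
            sub(/;$/, "")            # remove the trailing ";" of the SINTAX header
         }
         { print }' "${1:--}"
}
```

Usage would then be, for example, `sintax_to_rdp sequences.tax.fa > sequences.tax.rdp.fa`. Note that the Greengenes variant replaces every colon in a header, so it assumes that the sequence identifiers themselves contain none.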

Visualisation as Krona Charts

The directory also contains a visual, interactive summary of the sequences included in the final dataset, named taxonomy.krona.html. This file can be viewed and explored in a standard web browser. For large databases, the chart data may be stored in a corresponding subdirectory.

Example Krona Chart

Additional files

The directory also contains further files that may be of interest: