This is Ugene's (http://ugene.net/) fork of the CLARK tool (http://clark.cs.ucr.edu/Tool/), which adds support for building the database directly from gzip- and 7z-compressed RefSeq files.

CLARK: CLAssifier based on Reduced K-mers

The problem of DNA sequence classification is central to several application domains in molecular biology, genomics, metagenomics and genetics. The problem is computationally challenging because of the size of the datasets generated by modern sequencing instruments and the growing size of reference sequence databases.

CLARK is a method for supervised sequence classification based on discriminative k-mers. Unlike many other metagenomic and genomic classification methods, CLARK provides a confidence score for its assignments, which can be used in downstream analysis.

The utility of CLARK is demonstrated on two distinct classification problems:
1) the assignment of metagenomic reads to known bacterial genomes;
2) the assignment of BAC clones and transcripts to chromosome arms (in the absence of a finished assembly for the reference genome).

Three classifiers (variants) are provided in the CLARK framework:

CLARK (default): designed for powerful workstations. It may require a significant amount of RAM to run with a large database (e.g., all bacterial genomes from NCBI/RefSeq). This classifier queries k-mers with exact matching.

CLARK-l (light): designed for workstations with limited memory. It provides precise classification on small metagenomes. For metagenomics analysis, CLARK-l works with a sparse or "light" database (up to 4 GB of RAM) that is built from distant, non-overlapping k-mers. This classifier queries k-mers with exact matching.

CLARK-S (spaced): designed for powerful workstations. It uses spaced k-mers and requires more RAM than CLARK or CLARK-l, but it offers higher sensitivity. CLARK-S completes the CLARK series of classifiers.
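
The idea of classifying with discriminative k-mers and exact matching can be illustrated with a small sketch. This is not CLARK's implementation: the function names (`kmers`, `build_discriminative_index`, `classify`), the toy sequences and the choice of k = 4 are all hypothetical, reverse complements are ignored, and the confidence score is assumed here to be h1 / (h1 + h2), where h1 and h2 are the hit counts of the best and second-best targets.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer that occurs in exactly one target to that target.

    k-mers shared by two or more targets are dropped, because they
    cannot tell the targets apart.
    """
    owners = defaultdict(set)
    for name, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    return {km: next(iter(names))
            for km, names in owners.items() if len(names) == 1}


def classify(read, index, k):
    """Return (best_target, confidence) using exact k-mer lookups.

    Confidence is assumed to be h1 / (h1 + h2); a read with no
    discriminative k-mer is left unassigned as (None, 0.0).
    """
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None, 0.0
    ranked = hits.most_common(2)
    best, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    return best, h1 / (h1 + h2)


# Toy reference "genomes" (hypothetical data, for illustration only).
targets = {"genomeA": "ACGTACGTGGA", "genomeB": "TTGACCATGGA"}
index = build_discriminative_index(targets, k=4)

print(classify("ACGTACG", index, k=4))  # read drawn from genomeA
print(classify("TGGA", index, k=4))     # k-mer shared by both genomes
```

In this toy example the 4-mer TGGA occurs in both references, so it is excluded from the index and a read consisting only of that k-mer stays unassigned. A real database would be far larger, which is why the memory footprint differs so much between the variants described above.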