Diffstat (limited to 'academic/clark-ugene/README')
-rw-r--r-- academic/clark-ugene/README 39
1 files changed, 39 insertions, 0 deletions
diff --git a/academic/clark-ugene/README b/academic/clark-ugene/README
new file mode 100644
index 0000000000..4e9386f2ff
--- /dev/null
+++ b/academic/clark-ugene/README
@@ -0,0 +1,39 @@
+This is Ugene's (http://ugene.net/) fork of the CLARK tool
+(http://clark.cs.ucr.edu/Tool/), which supports building the database
+directly from gzip- and 7z-packed RefSeq files.
+
+CLARK: CLAssifier based on Reduced K-mers
+
+The problem of DNA sequence classification is central to several
+application domains in molecular biology, genomics, metagenomics and
+genetics. The problem is computationally challenging due to the size of
+datasets generated by modern sequencing instruments and the growing size
+of reference sequence databases.
+
+CLARK is a novel method for supervised sequence classification based on
+discriminative k-mers. Unusually among metagenomic and genomic
+classification methods, CLARK provides a confidence score for its
+assignments, which can be used in downstream analysis. The utility of
+CLARK is demonstrated on two distinct classification problems:
+
+1) the assignment of metagenomic reads to known bacterial genomes
+2) the assignment of BAC clones and transcripts to chromosome arms (in
+ the absence of a finished assembly for the reference genome).
+
+Three classifiers (variants) are provided in the CLARK framework:
+CLARK (default): created for powerful workstations, it may require a
+significant amount of RAM to run with a large database (e.g., all
+bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
+with exact matching.
+
+CLARK-l (light): created for workstations with limited memory, this
+software tool provides precise classification on small metagenomes.
+Indeed, for metagenomics analysis, CLARK-l works with a sparse or
+"light" database (up to 4 GB of RAM) that is built using distant and
+non-overlapping k-mers. This classifier queries k-mers with exact
+matching.
+
+CLARK-S (spaced): created for powerful workstations, this classifier
+exploits spaced k-mers. It requires more RAM than CLARK or CLARK-l,
+but offers higher sensitivity. CLARK-S completes the CLARK series of
+classifiers.
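
Reviewer note (not part of the patch): the README describes classification
by discriminative k-mers with exact matching and a confidence score. The
toy sketch below illustrates that idea only. It is not CLARK's code; the
function names (kmers, build_index, classify) are hypothetical, and the
score used here (best hit count divided by the sum of the two best hit
counts) is an assumption chosen for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(targets, k):
    """Map each discriminative k-mer to the single target that contains it.

    A k-mer counts as discriminative here if it occurs in exactly one
    target sequence; k-mers shared between targets are dropped.
    """
    per_target = {name: kmers(seq, k) for name, seq in targets.items()}
    seen_in = Counter(km for kms in per_target.values() for km in kms)
    return {
        km: name
        for name, kms in per_target.items()
        for km in kms
        if seen_in[km] == 1
    }


def classify(read, index, k):
    """Assign a read to the target with the most exact k-mer hits.

    Returns (target, score); the score is an illustrative assumption,
    h1 / (h1 + h2) for the best and second-best hit counts.
    """
    hits = Counter()
    for i in range(len(read) - k + 1):
        target = index.get(read[i:i + k])  # exact-match lookup
        if target is not None:
            hits[target] += 1
    if not hits:
        return None, 0.0
    ranked = hits.most_common(2)
    best, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    return best, h1 / (h1 + h2)


# Two made-up reference sequences; "ACGT" is shared, so it is not indexed.
targets = {"A": "ACGTACGTTT", "B": "GGGCCCACGT"}
index = build_index(targets, k=4)
label, score = classify("GTACGTT", index, k=4)
```

A real database holds far more k-mers than fit comfortably in a Python
dict, which is why the README's variants differ mainly in how much RAM
their databases need.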