CLOSET -- CLoud Open SequencE clusTering

What is CLOSET?

CLOSET (CLoud Open SequencE clusTering) is a map-reduce framework for taxonomy-independent metagenomic clustering using sketching and quasi-clique enumeration. It is highly scalable and accurate, and can handle metagenomic collections consisting of millions of reads. Although it has been designed for 16S rRNA 454 amplicon reads, it is applicable to a wider range of problems.

CLOSET operates in two stages. First, it builds a graph capturing pairwise similarity between reads. Then, it performs clustering by enumerating maximal quasi-cliques in the similarity graph. Because it applies a DNA sketching technique, CLOSET can construct similarity graphs without resorting to the expensive all-versus-all comparison. CLOSET is implemented in C++ and HADOOP Pipes, combined with the libhdfs library.

CLOSET has been developed by Jaroslaw Zola, Xiao Yang and Srinivas Aluru. You can find more details about CLOSET in this white paper. If you are interested in CLOSET you may also want to consider ELaSTIC.
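To make the two-stage idea more concrete, here is a minimal single-machine sketch in Python. It is not CLOSET's implementation: the sketching scheme (keeping the `s` smallest k-mer hashes), the Jaccard similarity measure, the degree-based quasi-clique test, and all function and parameter names (`k`, `s`, `threshold`, `gamma`) are assumptions chosen for illustration. It only shows how sketches can limit comparisons to reads that share a sketch value, and how a candidate cluster can be checked against a quasi-clique condition.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of all substrings of length k in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(read, k, s):
    """Assumed sketch: the s smallest k-mer hash values of the read."""
    return set(sorted(stable_hash(x) for x in kmers(read, k))[:s])


def candidate_pairs(reads, k, s):
    """Pair up only those reads that share at least one sketch value."""
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        for value in sketch(read, k, s):
            buckets[value].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def similarity_graph(reads, k, s, threshold):
    """Verify candidates with Jaccard similarity of their full k-mer sets."""
    edges = {}
    for i, j in candidate_pairs(reads, k, s):
        a, b = kmers(reads[i], k), kmers(reads[j], k)
        jaccard = len(a & b) / len(a | b)
        if jaccard >= threshold:
            edges[(i, j)] = jaccard
    return edges


def is_quasi_clique(nodes, edges, gamma):
    """Degree-based test: every node is adjacent to >= gamma * (n - 1) others."""
    nodes = set(nodes)
    need = gamma * (len(nodes) - 1)
    for v in nodes:
        degree = sum(
            1 for u in nodes
            if u != v and (min(u, v), max(u, v)) in edges
        )
        if degree < need:
            return False
    return True


if __name__ == "__main__":
    # Toy reads, invented for this example.
    reads = ["ACGTACGTGG", "ACGTACGTGG", "TTTTTTTTTT"]
    graph = similarity_graph(reads, k=4, s=3, threshold=0.9)
    print(graph)
    print(is_quasi_clique([0, 1], graph, gamma=1.0))
```

In this toy run the two identical reads land in the same sketch buckets and are compared, while the third read shares no k-mer with them and is never compared at all. In a map-reduce setting the bucketing step would correspond naturally to grouping by key, although how CLOSET actually distributes the work is described in the papers cited below, not here.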


  • HADOOP cluster – CLOSET is a map-reduce framework implemented in HADOOP. We tested HADOOP 0.23 and 1.0.
  • Java SDK – Java SDK 1.6 or newer. We tested Oracle Java SE 1.6 update 21.
  • C++ compiler – we suggest using the GNU Compiler Collection (GCC), version 4.6 or newer.
  • GNU make – we use make to build CLOSET. Make is routinely available with most Linux distributions.


The latest version of CLOSET is r78, released on Oct 9, 2012. CLOSET is distributed under the MIT License, with some components covered by the Boost Software License.


The package contains documentation explaining the installation requirements and procedure, as well as examples illustrating how to configure and run CLOSET. For any questions or feedback, please contact Jaroslaw Zola.


When using CLOSET please cite:

X. Yang, J. Zola, S. Aluru, “Large Scale Metagenomic Clustering on Map-Reduce Clusters”, Journal of Bioinformatics and Computational Biology 11(1):1340001, 2013.

X. Yang, J. Zola, S. Aluru, “Parallel Metagenomic Sequence Clustering Via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clouds”, In Proc. IEEE Int. Parallel and Distributed Processing Symposium (IPDPS), pp. 1223-1233, 2011.