Data Mining & Bioinformatics Lab

The Dragon Toolkit

Designed for Language Modeling and Information Retrieval
The Dragon Toolkit is a Java-based development package for academic research in language modeling (LM) and information retrieval (IR). Language modeling has recently emerged as an attractive new framework for text information retrieval and text mining (TM). However, most free Java-based search engines, such as Lucene, do not support LM very well. The Lemur toolkit is designed for LM and IR but is written in C and C++, which may be a hindrance to people who prefer Java. The Dragon Toolkit is tailored for researchers who work on large-scale LM and IR and prefer Java programming. Moreover, unlike Lucene and Lemur, it provides built-in support for semantic-based IR and TM. The Dragon Toolkit seamlessly integrates a set of NLP tools, which enable it to index text collections with various representation schemes, including words, phrases, ontology-based concepts, and relationships. To minimize learning time, we intentionally keep the package small and simple. As a result, the toolkit lacks some features that are part of the Lemur toolkit, including distributed IR and cross-language IR.
The Personnel

Supervisor of the Dragon Project:
Xiaohua (Tony) Hu, thu@ischool.drexel.edu

System Analysis and Design:
Xiaohua (Davis) Zhou, xiaohua.zhou@drexel.edu

Chief Programmers:
Xiaohua (Davis) Zhou, xiaohua.zhou@drexel.edu
Xiaodan (Tom) Zhang, xzhang@ischool.drexel.edu

The Examples for Development
To help developers get started with the Dragon Toolkit, we provide examples for different applications: (1) MaxMatcher (Biological Term Extraction), (2) Text Retrieval (Language Modeling Information Retrieval with Semantic Smoothing), and (3) Text Clustering (Model-based Document Clustering with Semantic Smoothing).
How to Cite Dragon Toolkit

If you are using the Dragon Toolkit for research work, please cite it in your published papers:

Zhou, X., Zhang, X., and Hu, X., The Dragon Toolkit, Data Mining & Bioinformatics Lab, iSchool at Drexel University, http://www.ischool.drexel.edu/dmbio/dragontool

Download Dragon Toolkit

Get the Dragon Toolkit source code, binary libraries (including external libraries), and necessary supporting data. Click here to download.