
inf.uni-hamburg.de

ASVToolbox

Reinhard Zierke

ASV Toolbox

This is a mirror of the homepage of the ASV Toolbox project, which has not been actively
supported since 2008. Version 1.0 of the ASV Toolbox remains available for download for
legacy reasons. It is written in Java (compiled with version 1.5) and is distributed under
the MIT license.
The program was developed at ASV Leipzig.

Introduction

ASV Toolbox is a modular collection of tools for the exploration of written language data.
The tools work on either word lists or text and solve several linguistic classification and
clustering tasks. The topics covered include language detection, POS tagging, base form
reduction, named entity recognition, and terminology extraction. On a more abstract level,
the algorithms deal with various kinds of word similarity, using pattern-based and statistical
approaches. The collection can be used to work on large real-world data sets as well as
to study the underlying algorithms. The ASV Toolbox can work on plain text files and
connect to a MySQL database. While it is designed especially for corpora of the
Leipzig Corpora Collection, it can easily be adapted to other sources.

Installation

Download the zip file and unzip it into a directory of your choice.
ASV Toolbox modules and module resources (examples, documentation, languages, ...):
download the zip file and unzip it into the directory containing the ASV Toolbox home.
Windows users can simply use "extract here"; UNIX users should use "unzip -o
<filename>.zip".
If you download a module, you have to edit the file toolbox.start, which you will find in the
config folder of your ASV Toolbox home. Every module comes with its own copy of this file,
named toolbox.start.modulename, which is located in the config folder after you unzip the
module. Copy the line from that file into toolbox.start, on a new line. Example: if you want to
include Genetomorph and ViterbiTagger, your toolbox.start file should look like this:
de.uni_leipzig.asv.toolbox.genetoMorph.GenetoMorph
de.uni_leipzig.asv.toolbox.viterbitagger.gui.ViterbiTagger
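
The copy step described above can also be scripted. The following is a minimal sketch and not part of the ASV Toolbox distribution: the class name ToolboxStartMerger and its method mergeModuleLines are hypothetical, and the sketch assumes that toolbox.start and toolbox.start.modulename are UTF-8 text files with one class name per line. It uses java.nio.file and therefore needs Java 7 or later to run, even though the toolbox itself was compiled with version 1.5.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not shipped with the ASV Toolbox.
class ToolboxStartMerger {

    // Returns the non-blank lines of toolbox.start followed by every
    // non-blank module line that is not already present.
    // Assumption: each line holds exactly one fully qualified class name.
    static List<String> mergeModuleLines(List<String> startLines, List<String> moduleLines) {
        List<String> merged = new ArrayList<>();
        for (String line : startLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                merged.add(trimmed);
            }
        }
        for (String line : moduleLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !merged.contains(trimmed)) {
                merged.add(trimmed);
            }
        }
        return merged;
    }

    // Usage: java ToolboxStartMerger <toolbox-home> <modulename>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ToolboxStartMerger <toolbox-home> <modulename>");
            return;
        }
        Path config = Paths.get(args[0], "config");
        Path start = config.resolve("toolbox.start");
        Path module = config.resolve("toolbox.start." + args[1]);

        List<String> merged = mergeModuleLines(
                Files.readAllLines(start, StandardCharsets.UTF_8),
                Files.readAllLines(module, StandardCharsets.UTF_8));

        // Overwrites toolbox.start with the merged list, one entry per line.
        Files.write(start, merged, StandardCharsets.UTF_8);
    }
}
```

Because entries that are already present are skipped, running the helper twice for the same module leaves toolbox.start unchanged. Note that it also drops blank lines from the file, which is a choice made for this sketch rather than a documented requirement of the toolbox.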

The complete ASV Toolbox package contains the following modules:


• Chinese Whispers: graph clustering tool
• Levenshtein: spell checking tool
• Baseforms: tool for base form reduction and compound noun splitting
• Pretree: tool for training pretrees and classifying with them
• TE: terminology extraction tool
• Pendulum: gazetteer bootstrapping tool (for Named Entity Recognition)
• Namerec: Named Entity Recognition system
• JLanI: language identification tool
• Viterbitagger: POS tagging tool
• Zipfel: tool for Zipf's law
• AHC: agglomerative hierarchical clustering tool
• Genetomorph: finding morphological structure with a genetic algorithm
• Your Tool: template tool for your program
Version: 1.0
File format: zip
File size: 258 MB
File link: ASV Toolbox.zip
Alternative file link: ASVToolbox.zip

Citing ASV Toolbox

Chris Biemann, Uwe Quasthoff, Gerhard Heyer and Florian Holz (2008): ASV Toolbox: a
Modular Collection of Language Exploration Tools. Proceedings of LREC-08, pp. 1760-1767,
Marrakech, Morocco (pdf)
