langutils - A natural language toolkit for Common Lisp
Download
Overview
The library provides a hierarchy of major functions and auxiliary
functions for the structured analysis and processing of
open text. The major functions, working up from raw text, are:
- String tokenization (string -> string)
- Part of speech tagging (string -> tokens -> vector-document)
- Phrase chunking (vector-document -> phrases)
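To make the string -> string shape of the first stage concrete, here is a minimal sketch of a tokenizer that sets punctuation off from word tokens. It is an illustration only: `sketch-tokenize` is a hypothetical name and is not part of the langutils API, and it makes the simplifying assumption that every character that is neither alphanumeric nor whitespace is punctuation (so a contraction such as "don't" is split apart, which a real tokenizer would likely handle differently).

```lisp
;; Hypothetical helper, NOT the langutils tokenizer: it only illustrates
;; a tokenization step that maps a string to a string.
(defun sketch-tokenize (text)
  "Return a copy of TEXT with each punctuation character set off by spaces."
  (string-right-trim
   " "
   (with-output-to-string (out)
     (let ((prev-space t))   ; true at start so no leading space is written
       (flet ((emit (ch)
                (write-char ch out)
                (setf prev-space (char= ch #\Space))))
         (loop for ch across text
               do (cond
                    ;; Collapse any run of whitespace into a single space.
                    ((member ch '(#\Space #\Tab #\Newline))
                     (unless prev-space (emit #\Space)))
                    ;; Letters and digits pass through as part of a word.
                    ((alphanumericp ch) (emit ch))
                    ;; Assumption: anything else is punctuation and
                    ;; becomes a token of its own.
                    (t
                     (unless prev-space (emit #\Space))
                     (emit ch)
                     (emit #\Space)))))))))
```

For example, `(sketch-tokenize "Hello, world!")` returns `"Hello , world !"`, a string that a later stage could split on spaces to obtain tokens.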
A detailed guide to the main parts of the API can be found in the
distribution README file. Also included is a paper
presented at the 2005 Lisp Conference
discussing aspects of the library implementation.
Tested platforms:
- Allegro 7.0 under Mac OS X 10.3 and 10.4.1
I hope to test soon under SBCL and CLISP on the Mac, and perhaps CLISP on the
PC. If you have successfully loaded the libraries and main code on another platform,
please write to me at the address at the bottom of the page so I can update this page.
Functions
Strings:
- Tokenize a string (separate punctuation from word tokens)
- POS-tag a string or file, returning a file, string, or vector-document
- Identify suspicious strings that may become tokens
Tokens: