
The primary mission of CL Research is to investigate the structure of dictionaries (computational lexicons) and their role in natural language processing applications, and to enhance existing dictionary databases with publicly available lexical resource data (see Dictionary Analysis Services).
The structure of computational lexicons is investigated using DIMAP (DIctionary MAintenance Program). This Windows program provides a generalized structure for creating entries with multiple senses (focus on computational lexicography). Unlike ordinary dictionaries, DIMAP provides specific capabilities for representing superordinate and instance links, feature attributes and values, and generalized semantic relations to other entries and their senses. DIMAP includes functionality which permits the following types of computations within the lexicon (focus on computational lexicology), along with a range of maintenance functions:
- parsing definitions to identify superordinates and other types of semantic relations;
- analyzing the definitional hierarchies established by superordinates to identify definitional cycles (digraph analysis); and
- mapping between entries in different dictionaries.
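The digraph analysis mentioned above can be illustrated with a small sketch. It is not DIMAP's implementation: the function name `find_definitional_cycles` and the toy entries are hypothetical, and the lexicon is assumed to be reduced to a mapping from each headword to the superordinates found in its definitions. A depth-first search then reports a definitional cycle whenever a superordinate link leads back to a word still on the current path.

```python
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_definitional_cycles(superordinates: Dict[str, List[str]]) -> List[List[str]]:
    """Report cycles in a headword -> superordinates digraph.

    Depth-first search records one cycle per back edge, so this is a
    detector for circular definitions, not an enumeration of every
    elementary cycle in the graph.
    """
    color: Dict[str, int] = {}
    path: List[str] = []
    cycles: List[List[str]] = []

    def visit(word: str) -> None:
        color[word] = GRAY
        path.append(word)
        for parent in superordinates.get(word, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The superordinate is already on the path: a cycle.
                cycles.append(path[path.index(parent):] + [parent])
            elif state == WHITE:
                visit(parent)
        path.pop()
        color[word] = BLACK

    for word in superordinates:
        if color.get(word, WHITE) == WHITE:
            visit(word)
    return cycles


# Hypothetical toy lexicon: each word is defined in terms of the next.
toy_lexicon = {
    "vessel": ["container"],
    "container": ["receptacle"],
    "receptacle": ["vessel"],
    "cup": ["vessel"],
}
cycles = find_definitional_cycles(toy_lexicon)
```

Here `cycles` holds the single loop vessel, container, receptacle, vessel; "cup" lies outside the loop and is not reported. The recursive search is adequate for a sketch, but a lexicon with very deep hierarchies would need an iterative version.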
DIMAP dictionaries have been created for several publicly-available lexicons, as well as electronic versions of published dictionaries, including
- an alphabetic version of the publicly-available WordNet 3.0 (see Electronic Dictionaries),
- an alphabetic version of the publicly-available UMLS Specialist Lexicon (August 2021, with a raw form for the July 2021 version) (see Electronic Dictionaries),
- an alphabetic version of FrameNet 1.5, incorporating all lexical items, frame characterizations, and frame relations (see Electronic Dictionaries), created with FrameNet Explorer for Windows (which can also be used to explore FrameNet frames, frame elements, and lexical units, and to create samples for testing frame identification),
- a FrameNet frame element dictionary, used to create a frame element taxonomy, identifying hypernymic links between frame elements and the number of frames in which these frame elements appear (see Electronic Dictionaries and an online version that allows exploration of this taxonomy),
- a dictionary of all English prepositions (courtesy of Oxford University Press), further developed and analyzed in The Preposition Project. It has an online version, is broken down into preposition classes with digraphs showing derivational relationships, and is accompanied by preposition corpora containing 80,000 sentences, including tokenized, lemmatized, and dependency-parsed versions in CoNLL-X format (see The Preposition Project Corpora). A corpus pattern analysis of preposition behavior, following principles developed by Patrick Hanks, has been initiated with a Pattern Dictionary of English Prepositions (PDEP), which has an online version and current data available for download in MySQL files, and
- the Oxford Dictionary of English (1st and 2nd editions) and the Macquarie Dictionary.
The electronic versions of the Oxford and Macquarie lexical resources are not publicly available, but may be licensed through CL Research for research purposes.
CL Research has implemented the content analysis tool Minnesota Contextual Content Analysis (MCCA), used for statistical characterization and analysis of texts, from sentences to books, tweets and Likert scales, newspaper articles and blogs, including multiple-person texts such as transcripts of focus groups, interviews, hearings, and TV scripts or plays such as Hamlet. CL Research converted the original MCCA Fortran program for use on personal computers and converted its dictionary into a DIMAP dictionary. MCCALite is free for non-commercial use.
The results of our research in examining the role of computational lexicons are incorporated in the Knowledge Management System (KMS), which is a unified platform for
- parsing and analyzing text (from most formats, including Word, PDF, XML, and web pages),
- answering free-form natural language questions,
- summarizing one or more documents, generally or topic-based,
- extracting information,
- exploring document contents, and
- dynamically creating ontological representations of document contents.
KMS is accompanied by several supporting programs that feed into KMS or provide specialized analysis functions; parts of these programs are incorporated directly into KMS. These include:
- a Text Parser, which can be used to provide background parsing and processing of large numbers of texts into XML representations for KMS use, and
- an XML Analyzer, which provides more specialized XML tools for examining XML documents of any type.
Fully-functional demonstration versions of these supporting programs are available (see Demos).
CL Research has also developed a Windows-based utility, FrameNet Explorer (FNE), for examining the FrameNet database (see Demos). CL Research is using FNE to support an in-depth, comprehensive, publicly available characterization of the behavior of English prepositions (their semantic roles and the properties of the preposition complements and attachment points) in The Preposition Project.
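As an illustration of what preposition complements and attachment points look like in data, the sketch below reads a dependency parse in CoNLL-X format, the format used for The Preposition Project corpora. It assumes the standard ten tab-separated CoNLL-X columns with a blank line between sentences. The sample sentence, the Penn-style `IN` tag, and the dependency labels are invented for the example, not taken from the corpora, and the helper names `read_conllx` and `preposition_attachments` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# The ten columns defined by the CoNLL-X shared task format.
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conllx(text: str) -> Iterator[List[Token]]:
    """Yield sentences as lists of token dicts; blank lines end a sentence."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(dict(zip(CONLLX_FIELDS, line.split("\t"))))
    if sentence:
        yield sentence


def preposition_attachments(
    sentence: List[Token], prep_tag: str = "IN"
) -> List[Tuple[str, Optional[str], List[str]]]:
    """For each preposition, return (form, attachment point, complements).

    The attachment point is the token the preposition depends on; the
    complements are the tokens that depend on the preposition.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    results = []
    for tok in sentence:
        if tok["postag"] == prep_tag:
            governor = by_id.get(tok["head"], {}).get("form")
            complements = [t["form"] for t in sentence if t["head"] == tok["id"]]
            results.append((tok["form"], governor, complements))
    return results


# Invented sample: "She sat on the bench"
sample = "\n".join([
    "1\tShe\tshe\tPRP\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tsat\tsit\tVBD\tVBD\t_\t0\troot\t_\t_",
    "3\ton\ton\tIN\tIN\t_\t2\tprep\t_\t_",
    "4\tthe\tthe\tDT\tDT\t_\t5\tdet\t_\t_",
    "5\tbench\tbench\tNN\tNN\t_\t3\tpobj\t_\t_",
])
sentences = list(read_conllx(sample))
attachments = preposition_attachments(sentences[0])
```

For the sample sentence, `attachments` pairs the preposition "on" with its attachment point "sat" and its complement "bench". A different tagset or dependency scheme would need a different `prep_tag` value.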
Ken Litkowski of CL Research was previously the webmaster for the Association for Computational Linguistics Special Interest Group on the Lexicon and was a guest editor for a special issue of Computational Linguistics on semantic role labeling.
CL Research also provides consulting assistance on
- lexicon development, with a particular expertise in the bioinformatics domain (using the UMLS Metathesaurus and the UMLS Specialist Lexicon),
- research on the creation of ontologically-oriented lexicons out of standard lexicons and dictionaries, and
- advanced text processing and natural language applications, with a focus on extracting textual content from documents, particularly using KMS.
Maintained by Ken Litkowski.
© 1992-2021 CL Research
Modified: Sep 25, 2021