Oxford University Press-CL Research Collaboration

Oxford University Press has designated CL Research as an agent for licensing the machine-readable version of the New Oxford Dictionary of English (NODE) to the academic and commercial research community. CL Research has created a machine-tractable version of NODE in its DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. By parsing the dictionary definitions, DIMAP has further enhanced NODE with many semantic links, including hypernyms, synonyms, and other semantic relations, making NODE+DIMAP a semantic network of the English language. (Details on contents of NODE+DIMAP.)

NODE is considered "a major achievement" in lexicography, integrating principles of native-speaker and learner dictionaries and providing considerable linguistic and encyclopedic information. Importantly, NODE provides a new entry style, clearly identifying an entry's core meaning ("in ordinary modern usage") and the subsenses that specialize the core meaning. NODE bases the "ordinary modern usage" on the British National Corpus and Oxford's reading program. NODE contains considerable information useful in natural language processing (NLP): subcategorization patterns for verbs, lexical preferences of verb subjects and objects and the modificands of adjectives, syntactic features for nouns and adjectives, and collocative evidence. Collocations are provided in three forms: extensive phrasal main entries, distinct phrases associated with a head word, and corpus examples chosen to reflect actual usage and phrasal patterns.

In making NODE machine-tractable, CL Research has converted the NODE fields into clearly identified fields suitable for use in NLP, based on CL Research's experience in creating lexicons for word-sense disambiguation, question-answering, and information extraction. For example, variable multiword units are converted into regular expressions containing lexical and syntactic preferences. After conversion, NODE data are available for many forms of analysis using DIMAP functionality, most notably the parsing of definitions to populate the data with semantic relation links, which makes the dictionary a vast semantic network rooted in lexicographically sound data. Inside DIMAP, the data are available for further analysis, most notably with CL Research's digraph-based primitive finding and dictionary mapping routines. DIMAP also provides considerable flexibility in searching for definitional patterns (regular expression searches on headwords, definitions, hypernyms, features, and other semantic links), extracting subdictionaries, and comparing entries with an integrated WordNet. (More details on CL Research's conversion and analysis of dictionary data are available in "The Synergy of NLP and Computational Lexicography".)
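
As a rough illustration of the two regular-expression ideas mentioned above, the following Python sketch turns a variable multiword unit into a regular expression and runs a definitional-pattern search over a few entries. It is not DIMAP code and does not reflect DIMAP's actual data format: the angle-bracket slot notation, the SLOTS table, the function names, and the three toy entries are all assumptions invented for this example, and the definitions are not NODE text.

```python
import re

# Hypothetical slot notation (an assumption, not DIMAP's format):
# a token written as <name> stands for a variable position in the phrase.
SLOTS = {
    "one's": r"(?:my|your|his|her|its|our|their|one's)",  # possessive slot
}

def mwu_to_regex(pattern):
    """Compile a multiword unit such as "take <one's> time" into a regex."""
    parts = []
    for token in pattern.split():
        if token.startswith("<") and token.endswith(">"):
            parts.append(SLOTS[token[1:-1]])   # variable slot
        else:
            parts.append(re.escape(token))     # fixed word
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

# Toy entries invented for illustration only.
ENTRIES = {
    "spaniel": "a dog of a breed with long drooping ears",
    "sparrow": "a small bird with brown and grey plumage",
    "spanner": "a tool for gripping and turning a nut",
}

def search_definitions(entries, pattern):
    """Return the headwords whose definition matches the regex pattern."""
    rx = re.compile(pattern)
    return sorted(head for head, text in entries.items() if rx.search(text))

if __name__ == "__main__":
    rx = mwu_to_regex("take <one's> time")
    print(bool(rx.search("You can take your time with this.")))
    print(search_definitions(ENTRIES, r"^a (?:small )?(?:dog|bird)\b"))
```

A real conversion would also need to handle inflected forms (for example "took") and the syntactic preferences the dictionary records, which this sketch leaves out.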

NODE+DIMAP provides an unparalleled resource for research into the nature and application of the lexicon. Moreover, the CL Research collaboration with Oxford University Press lexicographers is an ongoing process of extracting and mining the data. CL Research is available for customizing NODE to meet your requirements.

An academic research license is $1,500 for NODE and $2,000 for NODE+DIMAP for a two-year period. A commercial research license is $7,500 for NODE and $10,000 for NODE+DIMAP for two years. The license agreement, also available as a Microsoft Word template, specifies the terms of the license.

Contact Ken Litkowski, CL Research, 9208 Gue Road, Damascus, MD 20872 (301-482-0237) for further details on licensing arrangements. Please inquire about other lexical resources available from Oxford University Press (the New Oxford Thesaurus of English, bilingual dictionaries, and a lemma<-->inflection/derived-form list).

This document is maintained by Kenneth Litkowski (ken@clres.com).
Material Copyright © 2001 CL Research