Alphabetic WordNet 3.0: WordNet 3.0 has been fully converted into an alphabetically ordered DIMAP dictionary. This conversion uses the official WordNet distribution. See the full description of how the alphabetic version of WordNet 3.0 was created, then follow the link here to register your download. A separate entry has been made for each distinct word (including underscore words) in every synset of WordNet, with a distinct sense for each synset in which the word appears. All information available in WordNet has been converted into DIMAP format. All hypernyms have been entered as DIMAP superconcepts; all hyponyms as DIMAP instances; and all other relations (synonyms, meronyms, holonyms, troponyms, antonyms, pertainyms, entailments, causes, similars, and also sees) as distinct DIMAP roles. All verb frames have been converted into various kinds of features in DIMAP senses, with complex frames explicitly represented as collocation patterns. Each sense is identified by an id feature corresponding to the WordNet file number and sense number. Adjective types are explicitly identified in DIMAP features. Glosses have been split into definitional components, example usages, and grammatical patterns (usually prepositional accompaniments). Full details of the conversion can be found in the DIMAP help file. A separate Heads dictionary contains one sense for each word that appears as either the final word of multiword and hyphenated noun and adjective entries or the first word of multiword verb entries. DIMAP dictionaries are also available for earlier versions of WordNet (from version 1.5 onward). The compressed files are approximately 20 MB; uncompressed, the DIMAP dictionaries are 55 MB.
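The relation mapping and the Heads dictionary rule described for the WordNet conversion can be illustrated with a minimal Python sketch. It is not the conversion code used by CL Research. The function names, the slot labels ("superconcept", "instance", "role:...") and the part-of-speech strings are hypothetical, chosen only for illustration, and the sketch assumes that multiword entries are joined with underscores, as in WordNet.

```python
from typing import Optional

# Hypothetical slot labels; the real DIMAP storage format is not shown here.
SLOT_FOR_RELATION = {
    "hypernym": "superconcept",
    "hyponym": "instance",
}


def dimap_slot(relation: str) -> str:
    """Return the kind of DIMAP slot a WordNet relation would be stored in."""
    # Every relation other than hypernym and hyponym becomes its own role.
    return SLOT_FOR_RELATION.get(relation, "role:" + relation)


def head_word(entry: str, pos: str) -> Optional[str]:
    """Return the word that would get a Heads-dictionary sense, if any."""
    if pos == "verb":
        # Multiword verb entries: the first word is the head.
        parts = entry.split("_")
        return parts[0] if len(parts) > 1 else None
    if pos in ("noun", "adjective"):
        # Multiword and hyphenated entries: the final word is the head.
        parts = entry.replace("-", "_").split("_")
        return parts[-1] if len(parts) > 1 else None
    return None


if __name__ == "__main__":
    for rel in ("hypernym", "hyponym", "antonym"):
        print(rel, "->", dimap_slot(rel))
    for entry, pos in (("ice_cream", "noun"), ("give_up", "verb")):
        print(entry, "->", head_word(entry, pos))
```

Single-word entries return None in this sketch because the description mentions only multiword and hyphenated entries as sources of heads.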
UMLS Specialist Lexicon: The UMLS Specialist Lexicon (2021) has been fully converted into an alphabetically ordered DIMAP dictionary. The Specialist Lexicon of the Unified Medical Language System is designed for the specialized lexical needs of the medical community. This lexicon contains 524,323 terms and was developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System. Alphabetic DIMAP dictionaries have been created for 504,524 main entries, as well as for a variants dictionary of 345,679 entries. These dictionaries also provide comprehensive coverage of general English, in addition to the extensive coverage of biomedical terms. The data elements in the lexicon describe syntactic characteristics of each entry, including inflection codes, case, gender, syntactic category, complements for verbs and nouns, modification types for adverbs, and more. This lexicon was developed as a free, publicly available resource, with only moderate restrictions (e.g., you can't claim it as your own). The DIMAP distribution includes an extensive help file that describes how each element of Specialist has been handled, along with the Perl scripts used to create the files uploaded into DIMAP. The online version of the help file provides sufficient details to use this latest version of Specialist. The installation file is approximately 50.5 MB; the DIMAP dictionaries are 62.9 MB and 38.8 MB. These dictionaries are very large and take considerable time to load (2:40 for the full dictionary and 1:44 for the variant dictionary).
UMLS Specialist Lexicon (Raw): The UMLS Specialist Lexicon (2021) has been fully converted into an alphabetical format ready for uploading into a DIMAP dictionary. The raw data has been uploaded into a DIMAP dictionary (as above). The Specialist Lexicon of the Unified Medical Language System is designed for the specialized lexical needs of the medical community. This raw form may be useful as a lexicon for researchers in natural language processing. The online version of the help file provides sufficient details to use this latest version of Specialist. Specialist contains 524,323 terms that were converted into 504,524 DIMAP entries (i.e., about 20,000 entries with multiple senses) and a variants dictionary of 345,679 entries. These dictionaries provide comprehensive coverage of general English, in addition to the extensive coverage of biomedical terms. The data elements in the lexicon describe syntactic characteristics of each entry, including inflection codes, case, gender, syntactic category, complements for verbs and nouns, modification types for adverbs, and more. This lexicon was developed as a free, publicly available resource, with only moderate restrictions (e.g., you can't claim it as your own). The CL Research distribution consists of a zipped file containing the raw data, Perl scripts for processing the SPECIALIST LEXICON, and a Raw SPECIALIST Lexicon Use file describing the file contents.
Alphabetic FrameNet Dictionary: The FrameNet 1.5 data have been converted into an alphabetic dictionary. This dictionary contains 11,053 entries, with 8,568 entries for lexical items (many having multiple senses with different parts of speech) and 2,485 entries that encode the frames and frame relations. Details of these items can be found through the main FrameNet site. The help file accompanying the FrameNet Dictionary provides a more detailed description of the dictionary and how it was constructed.
FrameNet Frame Element Dictionary: The FrameNet 1.3 frame-to-frame relations and frame element definitions have been analyzed to create a dictionary of frame elements (see details). This dictionary contains 1,004 entries, with hypernymic links between frame elements that permit the creation of a frame element taxonomy. (An online version also permits examination of this taxonomy.) The distribution also includes files used in the creation of a MySQL database of the taxonomy.
The Preposition Project Data: Data from The Preposition Project Online include a DIMAP dictionary of all English prepositions (November 2008) (courtesy of Oxford University Press), containing much of the data and with disambiguated hypernymic relationships as used in the digraph analysis of preposition classes.
The Preposition Project Corpora: This package contains three preposition corpora: (1) the training and test sets used in the SemEval-2007 task on preposition disambiguation, drawn from FrameNet (FN) (24,481 sentences), (2) a set of 7,650 sentences from the Oxford English Corpus (OEC) as examples for senses in the Oxford Dictionary of English (ODE), and (3) a set of 48,000 sentences from the written portion of the British National Corpus, drawn using the methodology of the Corpus Pattern Analysis project (CPA). The first corpus covers 34 prepositions, while the latter two include all single-word prepositions and many phrasal prepositions. Each corpus consists of sentences following the SemEval format. In addition, each sentence has been lemmatized, part-of-speech tagged, and parsed with a dependency parser. These corpora are described in an accompanying paper, The Preposition Project Corpora.
Pattern Dictionary of English Prepositions Data: All data from the Pattern Dictionary of English Prepositions (PDEP) are available in a 46.7 MB zipped file. This file contains (1) a script to create the vertical files uploaded to Sketch Engine, with all supporting data and results, (2) three MySQL files suitable for upload into a MySQL database (definitions for all 1040 senses (patterns) of 304 prepositions, properties for each sense in 27 fields, and tagged instances for all sentences in the TPP corpora), and (3) help files describing the status of the corpora and the scripts for creating the vertical files. See Pattern Dictionary of English Prepositions for a full description of PDEP. As significant changes to PDEP are made, a new version of this data will be made available. (Latest: July 23, 2019.)
DIMAP ODE Dictionary: The electronic versions of the Oxford Dictionary of English are not publicly available, but may be licensed through CL Research for research purposes. See Oxford University Press - CL Research Collaboration for details (also see paper on the synergy between NLP and computational lexicography). DIMAP versions are available for the 1st edition (1998), the 2nd edition (2003), and the next edition currently under development.
The Minnesota Contextual Content Analysis (MCCA) DIMAP Dictionary: This dictionary (which is included in MCCALite) contains 11,000 entries, many with multiple senses. Each sense has one of 116 emphasis categories as used in MCCA. MCCALite can be freely downloaded via the link.
DIMAP dictionaries have been developed over many years. These dictionaries may serve as examples of what is possible in constructing dictionary data.
The compiled dictionary used in the Proximity Parser as integrated in DIMAP can be examined in the Proximity Parser Demo.