CL Research Knowledge Management System (KMS)

The Knowledge Management System (KMS) is designed to provide a single interface for a range of text processing functions:

creating repositories of texts (frequently based on integrated web searches),
creating a single XML representation of texts for several types of analysis (incorporating full parsing, discourse analysis, discourse entity analysis, and XML tagging of text elements),
characterizing document contexts through automatic keyword generation and headline creation,
answering natural language questions,
creating general and topic-based summaries (where a topic can be described by a single word or a full paragraph),
semantic category analysis of major text elements (nouns and verbs), and
creation of single or multiple document ontologies.

(See the KMS slide show for a detailed description, including screen shots.)

CL Research is now seeking beta-testers for KMS. KMS is best viewed as a tool for a regularized, knowledge-intensive process, such as intelligence gathering, scientific research, litigation support, or any other continuing need for information of a specific type. In working with clients, CL Research has found that each client has a different information need (that is, follows a unique user model). The research community has not developed general models of how users employ the technologies incorporated in KMS (i.e., question answering, summarization, information extraction, document exploration, and ontology use). CL Research has therefore developed a beta-testing paradigm designed to examine and characterize different user models.

The Beta-Testing Program

Acceptable beta-testers will have a reasonably well-developed and characterized information need. CL Research will provide, at no cost, all components of KMS and its supporting programs for a one-year period, once the beta-tester has signed a non-disclosure agreement and a beta-testing agreement. KMS contains an integrated component for requesting assistance, reporting bugs, and suggesting features. CL Research will not provide any direct assistance, other than attempting to incorporate user comments in revisions of KMS (unless the beta-tester wishes to enter into a separate contractual agreement). The beta-tester may keep any output generated by KMS, without any restriction. (All KMS output is in an XML format with simple structures, and may include answers to questions, keyword lists, single or multiple document summaries, and single or multiple document ontologies.) CL Research makes no promise that KMS will be released as a formal product. For further information, contact CL Research.

Core Technology

KMS incorporates the latest language engineering technologies covering the full spectrum of text processing from the word level to summaries of multiple texts. Text from a variety of common formats (such as HTML, DOC, PDF, and WPD) is converted into XML documents and is then processed into a unified framework (XML tagged) that enables full exploitation of the meaning of the text. Using a single interface to access the XML-tagged representation, the user can create general summaries of one or more documents, create topical summaries focused on events or points of view, obtain answers to fact-based questions (with the sentences in which they're found), create essay summaries answering more general questions, extract information for databases, examine a document's semantic network structure, and probe the details of documents from many perspectives. CL Research's software consists of three principal components: text processing, text summarization, and text analysis. The overall architecture of KMS is shown below.

Text Processing

CL Research's core text processing technology creates an XML representation of the text and includes the following features:

Full Text Parsing: Separates text into sentences, parses each sentence, and analyzes the semantic content of each sentence (including relationships to previous sentences in the text).
XML Representation: Creates an XML representation of the text as a nested and tagged structure of sentences, clauses, noun phrases, verbs, and prepositions (important carriers of semantic relationships), each of which is tagged with important syntactic and semantic characteristics.
Multiple File Formats: Processes arbitrary XML and SGML files (with the user specifying the DTD text elements to be processed). Auxiliary programs are available to convert several web page styles (HTML, CFM, SFM), Word documents (DOC), Adobe Acrobat files (PDF), and WordPerfect documents (WPD) into a common XML representation.
Rapid Processing: Processes text at a state-of-the-art speed of 400 sentences per minute.
Dictionary and Thesaurus Incorporation: Makes use of a standard publicly available dictionary/thesaurus (WordNet) and, optionally, licensable dictionaries and thesauruses from major publishers. CL Research's core dictionary technology (DIMAP) facilitates the incorporation of specialized dictionaries into KMS (law dictionaries, technology dictionaries, medical dictionaries, or any dictionary/thesaurus specific to the user's needs). (Will be extended to incorporate word sense disambiguation technologies developed by CL Research.)
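
To make the idea of a nested, tagged representation concrete, the following is a minimal sketch and not CL Research's implementation. It splits text into sentences with a naive regular expression and wraps each sentence and token in XML elements using Python's standard library. The element and attribute names (document, sentence, token, id, num) are assumptions chosen for illustration; the actual KMS tag set is not given in this description, and the sketch performs no parsing or semantic analysis.

```python
import re
import xml.etree.ElementTree as ET


def to_xml(doc_id, text):
    """Wrap naively split sentences and tokens in a nested XML tree.

    All element and attribute names here are hypothetical, not the KMS schema.
    """
    root = ET.Element("document", id=doc_id)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for number, sentence in enumerate(sentences, start=1):
        sent_el = ET.SubElement(root, "sentence", num=str(number))
        # Tokens are runs of word characters or single punctuation marks.
        for token in re.findall(r"\w+|[^\w\s]", sentence):
            tok_el = ET.SubElement(sent_el, "token")
            tok_el.text = token
    return root


tree = to_xml("d1", "KMS parses text. It tags each sentence!")
xml_string = ET.tostring(tree, encoding="unicode")
print(xml_string)
```

A full system of the kind described above would replace the token level with clauses, noun phrases, verbs, and prepositions, each carrying syntactic and semantic attributes.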

Text Summarization

Text summarization is performed with an XML analyzer that enables examination of one or many documents from many angles (virtually instantaneously for moderately-sized collections, such as 50 newspaper articles), including the following:

General Text Summarization: Creates an extractive summary (selecting sentences) of all or selected documents, of a user-selected length, based on a unique frequency analysis of noun phrases that includes substitution of full names for referring phrases (such as pronouns or a person's last name).
Topical Text Summarization: Uses the same methods as general summarization, with the addition that the user can write a topical statement to focus the summary to a particular topic, event, or slant (the "topical statement" can be a simple list of key words).
Sentence Question Answering: Uses the same methods as topical summarization, except with "topics" phrased as questions to select the most likely sentences answering the question (user can also create a summary of the sentences).
Batch Screening against Established Topics or Questions: Allows user to specify a set of topics or questions to screen each document or all documents, to create a summary for each topic or question (optionally outputting all the answers to an XML file).
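
The general and topical summarization methods listed above can be illustrated with a simplified sketch. This is an assumption-laden stand-in, not the KMS algorithm: it scores sentences by plain word frequency after removing stop words, whereas KMS is described as analyzing noun phrases and substituting full names for referring phrases. The function names, the stop list, and the TOPIC_WEIGHT constant are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "was", "on", "for"}
# Hypothetical bonus added for each topic word a sentence contains.
TOPIC_WEIGHT = 2.0


def content_words(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(sentences, max_sentences=2, topic=""):
    """Return the highest-scoring sentences, kept in document order."""
    freq = Counter(w for s in sentences for w in content_words(s))
    topic_words = set(content_words(topic))

    def score(sentence):
        ws = content_words(sentence)
        if not ws:
            return 0.0
        # Average collection-wide frequency of the sentence's content words.
        average_frequency = sum(freq[w] for w in ws) / len(ws)
        # Number of distinct topic words that appear in the sentence.
        topic_hits = len(set(ws) & topic_words)
        return average_frequency + TOPIC_WEIGHT * topic_hits

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


sentences = [
    "The merger was announced on Monday.",
    "Analysts expect the merger to close in June.",
    "The weather in the city was mild.",
    "Shareholders approved the merger terms.",
]
general = summarize(sentences, max_sentences=2)
topical = summarize(sentences, max_sentences=1, topic="weather")
print(general)
print(topical)
```

Passing a question as the topic argument (for example, "What was the weather in the city?") mirrors the sentence question answering feature in this simplified setting: the question's content words steer selection toward the sentences most likely to contain the answer.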

Text Analysis

Text analysis allows the user to probe more deeply into the document collection (exploiting the rich underlying XML structure) with a variety of tools.

Factual Question Answering: Using XPath expressions (the XML path query language), provides the exact answer to a question, identifying the document and sentence number and providing the full sentence that answers the question.
Information Extraction: (Planned) Using the underlying technology described in batch screening summarization, allows the development of a set of syntactic and semantic patterns to extract information nuggets from the document set, for use in filling templates (e.g., a doctor's diagnosis and treatment, or an organization's mergers and acquisitions).
Frequency Counts: Generates a frequency list of words in noun phrases, uniquely substituting the full reference for co-referring expressions (such as pronouns and shortened coreferences to organizations or people), and ignoring commonly used words (a stop list).
Finding Occurrences: Finds occurrences of user-selected words from frequency counts and returns their full noun phrases, the document and sentence number, and the full sentence.
Finding Relations: For the selected words, examines the relations they hold in their occurrences, such as subject, object, and prepositional object, along with the words that govern them (such as the verbs or prepositions).
Synonyms and Hierarchical Relations: Establishes a hierarchy of the nouns and verbs in the document(s), using the publicly available WordNet, allowing users to identify sentences expressing similar concepts with different words.
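
As an illustration of factual question answering over a tagged XML representation, the sketch below answers a "who" question with path expressions. It is a hedged example under stated assumptions: the sample XML, its element names (collection, document, sentence, np, verb), and its attributes (role, lemma, num, text) are invented for this example and are not the KMS schema. It uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged collection; the real KMS markup is not documented here.
SAMPLE = """
<collection>
  <document id="d1">
    <sentence num="1" text="Marie Curie discovered polonium.">
      <np role="subject">Marie Curie</np>
      <verb lemma="discover">discovered</verb>
      <np role="object">polonium</np>
    </sentence>
    <sentence num="2" text="Marie Curie named polonium after Poland.">
      <np role="subject">Marie Curie</np>
      <verb lemma="name">named</verb>
      <np role="object">polonium</np>
    </sentence>
  </document>
  <document id="d2">
    <sentence num="1" text="William Herschel discovered Uranus.">
      <np role="subject">William Herschel</np>
      <verb lemma="discover">discovered</verb>
      <np role="object">Uranus</np>
    </sentence>
  </document>
</collection>
"""


def answer_who(root, verb_lemma, object_text):
    """Find subjects of sentences with the given verb lemma and object.

    Returns (answer, document id, sentence number, full sentence) tuples.
    """
    answers = []
    for doc in root.iterfind("document"):
        for sent in doc.iterfind("sentence"):
            # Path predicate: a child <verb> whose lemma attribute matches.
            if sent.find(f"verb[@lemma='{verb_lemma}']") is None:
                continue
            obj = sent.find("np[@role='object']")
            if obj is None or obj.text != object_text:
                continue
            subj = sent.find("np[@role='subject']")
            if subj is not None:
                answers.append(
                    (subj.text, doc.get("id"), int(sent.get("num")), sent.get("text"))
                )
    return answers


root = ET.fromstring(SAMPLE)
# "Who discovered polonium?"
results = answer_who(root, "discover", "polonium")
print(results)
```

The same traversal pattern extends to the other tools in this list: counting np text gives frequency lists, and reading the role attribute alongside the governing verb gives the relations a selected word holds.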