P029 - Natural language processing and information systems 2003

Recent publications

1 - 10 of 22
  • Conference paper
    From scenarios to KCPM dynamic schemas: Aspects of automatic mapping
    (Natural language processing and information systems, 2003) Fliedl, Günther; Kop, Christian; Mayr, Heinrich C.
    Scenarios are a very popular means for describing and analyzing behavioral aspects on the level of natural language. In information systems design, they form the basis for a subsequent step of conceptual dynamic modeling. To enhance this step, linguistic instruments prove applicable for transforming scenarios into conceptual schemas of various models. This transformation usually consists of three steps: linguistic analysis, component mapping and schema construction. In this paper we investigate to what extent these steps may be performed automatically in the framework of KCPM, a conceptual predesign model which is used as an interlingua between natural language and arbitrary conceptual models.
  • Conference paper
    SEWISE : An ontology-based web information search engine
    (Natural language processing and information systems, 2003) Gardarin, Georges; Kou, Huaizhong; Zetourni, Karina; Meng, Xiaofeng; Wang, Haiyan
    Since the beginning of the 1990s, the World Wide Web (WWW) has rapidly guided the world into an amazing new electronic village, where everybody can publish everything in electronic form and find almost all required information. The volume of available information is increasing exponentially in different formats, 80% being text. It remains hard to find interesting information directly from Web sources. SEWISE is an ontology-based Web information system to support Web information description and retrieval. According to a domain ontology, SEWISE can map text information from various Web sources into one uniform XML structure and make the hidden semantics in text accessible to programs. The textual information of interest is automatically extracted by Web wrappers from various Web sources, and then text mining techniques such as categorization and summarization are used to process the retrieved text. Finally, text descriptions are built in XML format that can be directly queried. SEWISE provides support for topic-centric Web information search. The SEWISE prototype is implemented and has been tested on French financial Web news from several popular sites.
  • Conference paper
    Improving the efficacy of approximate searching by personal-name
    (Natural language processing and information systems, 2003) Camps, Rafael; Daudé, Jordi
    We discuss the design and evaluation of a method to find information about a person, using his/her name as a search key, even if the name has deformations. We present a similarity function that is an edit distance function with costs based on the probabilities of the edit operations, depending on the letters involved and their position. The distance threshold varies with the length of the searched name. The efficacy of approximate matching methods is usually evaluated by subjective relevance judgements. An objective comparison of five methods reveals that the proposed function greatly improves the efficacy: for a recall of 94%, a fallout of 0.2% is obtained.
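The idea in this abstract, an edit distance whose operation costs depend on the letters and their position, combined with a threshold that scales with name length, can be illustrated with a minimal sketch. The cost functions, the vowel rule, the first-letter penalty, and the 0.2 threshold factor below are all hypothetical placeholders; the paper derives its costs from edit-operation probabilities that the abstract does not list.

```python
VOWELS = set("aeiou")

def sub_cost(a, b, pos):
    # Hypothetical: vowel-for-vowel confusions are cheap, and errors in the
    # first letter are penalised more than errors elsewhere.
    base = 0.5 if a in VOWELS and b in VOWELS else 1.0
    return base * (1.5 if pos == 0 else 1.0)

def indel_cost(ch, pos):
    # Hypothetical: inserting or deleting the first letter costs more.
    return 1.5 if pos == 0 else 1.0

def weighted_edit_distance(query, candidate):
    """Dynamic-programming edit distance with letter- and position-dependent costs."""
    m, n = len(query), len(candidate)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(query[i - 1], i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(candidate[j - 1], j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = query[i - 1], candidate[j - 1]
            d[i][j] = min(
                d[i - 1][j] + indel_cost(a, i - 1),        # delete from query
                d[i][j - 1] + indel_cost(b, j - 1),        # insert into query
                d[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b, i - 1)),
            )
    return d[m][n]

def threshold(name):
    # Hypothetical: allow more total cost for longer names.
    return 0.2 * len(name)

def matches(query, candidate):
    q, c = query.lower(), candidate.lower()
    return weighted_edit_distance(q, c) <= threshold(q)
```

With these placeholder costs, `matches("Maria", "Marie")` accepts the final-vowel variant, while `matches("Maria", "Karia")` rejects a first-letter change.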
  • Conference paper
    Ontology-driven discourse analysis in GenIE
    (Natural language processing and information systems, 2003) Cimiano, Philip
  • Conference paper
    Assessing the effectiveness of the DAML ontologies for the semantic web
    (Natural language processing and information systems, 2003) Burton-Jones, Andrew; Storey, Veda C.; Sugumaran, Vijayan; Ahluwalia, Punit
    The continued growth of the World Wide Web makes the retrieval of relevant information for a user's query increasingly difficult. Current search engines provide the user with many web pages, but with varying levels of relevance. In response, the Semantic Web has been proposed to retrieve and use more semantic information from the web. Our prior research has developed an intelligent agent to automate the processing of a user's query while taking into account the query's context. The intelligent agent uses WordNet and the DARPA Agent Markup Language (DAML) ontologies to act as surrogates for understanding the context of terms in a user's query. This research develops a set of syntactic, semantic, and pragmatic constructs to assess the effectiveness of the DAML ontologies so that the intelligent agent can select the most useful ontologies. These constructs have been implemented in a tool called the "Ontology Auditor" for use by the intelligent agent.
  • Conference paper
    Information extraction for reorganizing specifications
    (Natural language processing and information systems, 2003) Thirunarayan, Krishnaprasad; Berkovich, Aaron; Grace, Steve; Sokol, Dan
  • Conference paper
    Natural language interaction using a scalable reference dictionary
    (Natural language processing and information systems, 2003) Boonjing, Veera; Hsu, Cheng
    A truly natural language interface needs to be feasible for actual implementation. We developed such a new approach for database query and tested it successfully in a laboratory environment. The new result is based on metadata search, where the metadata grow in a largely linear manner and the search is linguistics-free (allowing for grammatically incorrect and incomplete input). A new class of reference dictionary integrates four types of enterprise metadata: enterprise information models, database values, user-words, and query cases, using an ontology-based meta-structure. The layered information models allow user-words to stay in the original forms in which users articulated them, as opposed to relying on permutations of individual words contained in the natural input. These properties make the approach scalable to the number of users and the size of the database. A graphical representation method turns the dictionary into searchable graphs representing all possible interpretations of the input. A branch-and-bound algorithm then identifies optimal interpretations, which lead to SQL implementation of the original queries. Query cases enhance the metadata and the search of metadata, as well as provide case-based reasoning to directly answer the queries. This design assures feasible solutions at the termination of the search, i.e., the results always contain the correct answer.
  • Conference paper
    NLIDB templates for semantic parsing
    (Natural language processing and information systems, 2003) Stratica, Niculae; Kosseim, Leila; Desai, Bipin C.
    In this paper, we present our work in building a template-based system for translating English sentences into SQL queries for a relational database system. The input sentences are syntactically parsed using the Link Parser, and semantically parsed through the use of domain-specific templates. The system is composed of a pre-processor and a run-time module. The pre-processor builds a conceptual knowledge base from the database schema using WordNet. The knowledge base is then used at run-time to semantically parse the input and create the SQL query. The system is meant to be domain independent and has been tested with the Cindi database that contains information on a virtual library.
  • Conference paper
    On detection of malapropisms by multistage collocation testing
    (Natural language processing and information systems, 2003) Bolshakov, Igor A.; Gelbukh, Alexander
    A malapropism is a (real-word) error in a text consisting of the unintended replacement of one content word by another existing content word similar in sound but semantically incompatible with the context, thus destroying text cohesion, e.g.: they travel around the word. We present an algorithm for malapropism detection and correction based on evaluating cohesion. As a measure of the semantic compatibility of words we consider their ability to form syntactically linked and semantically admissible word combinations (collocations), e.g.: travel (around the) world. With this, text cohesion at a content word is measured as the number of collocations it forms with the words in its immediate context. We detect malapropisms as words forming no collocations in the context. To test whether two words can form a collocation, we consider two types of resources: a collocation DB and an Internet search engine, e.g., Google. We illustrate the proposed method by classifying, tracing, and evaluating several English malapropisms.
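The collocation test described in the last abstract can be sketched roughly as follows. This is only an illustration under strong assumptions: the collocation DB is a tiny hand-made set, syntactic links are approximated by a window of neighbouring content words, and "similar in sound" is replaced by a one-letter edit difference. None of the names or data below come from the paper.

```python
# Toy resources, all hypothetical.
STOPWORDS = {"the", "a", "an", "around", "of", "they", "in", "to"}
COLLOCATIONS = {frozenset(p) for p in [
    ("travel", "world"), ("write", "word"), ("small", "world"),
]}
VOCAB = {"word", "world", "travel", "write", "small"}

def collocates(a, b):
    return frozenset((a, b)) in COLLOCATIONS

def content_words(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def cohesion(word, context):
    # Number of collocations the word forms with its context words.
    return sum(collocates(word, c) for c in context)

def one_edit_apart(a, b):
    # Stand-in for phonetic similarity: exactly one substitution,
    # insertion, or deletion separates the two words.
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def detect_malapropisms(text, window=2):
    words = content_words(text)
    findings = []
    for i, w in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if not context or cohesion(w, context) > 0:
            continue  # the word is cohesive with its context
        # Report only words that a similar word would make cohesive again.
        fixes = sorted(v for v in VOCAB
                       if one_edit_apart(w, v) and cohesion(v, context) > 0)
        if fixes:
            findings.append((w, fixes))
    return findings
```

Run on the abstract's own example, `detect_malapropisms("they travel around the word")` flags "word" and proposes "world", while the corrected sentence yields no findings. In a fuller system the set lookup in `collocates` would presumably be backed by a real collocation database or search-engine counts, as the abstract mentions.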