Concept Indexing using Ontology and Supervised Machine Learning

Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of ontology-based approaches, however, focus on indexing texts at the document level. Recently, with the advances in ontological engineering, it has become clear that information indexing can benefit greatly from general-purpose ontologies, which aid the indexing of documents at the word level. This paper presents a concept indexing algorithm that adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. The algorithm uses a general-purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91%, which is encouraging for the further improvement of the algorithm.
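The idea of indexing at the word level and searching at different levels of abstraction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, its concept names and the function `build_concept_index` are invented and are not taken from OntoRo or OntoCorp. The sketch also ignores word-sense ambiguity, which the paper handles with learned tagging rules.

```python
from collections import defaultdict

# Hypothetical toy ontology: word -> concept path, from general to specific.
# These entries are invented for illustration and do not come from OntoRo.
TOY_ONTOLOGY = {
    "dog": ("entity", "animal", "mammal"),
    "cat": ("entity", "animal", "mammal"),
    "sparrow": ("entity", "animal", "bird"),
    "oak": ("entity", "plant", "tree"),
}


def build_concept_index(docs):
    """Map each concept on a word's path to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            # Words missing from the ontology are simply left untagged.
            for concept in TOY_ONTOLOGY.get(token, ()):
                index[concept].append((doc_id, pos))
    return index


docs = {
    "d1": "the dog chased the cat",
    "d2": "a sparrow sat in the oak",
}
index = build_concept_index(docs)
```

Looking up `index["mammal"]` returns only the occurrences of "dog" and "cat", while the more abstract `index["animal"]` also returns "sparrow". That difference is the kind of abstraction-level search the abstract describes.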

Tagging by Combining Rules-Based Method and Memory-Based Learning

Many natural language expressions are ambiguous and can be interpreted only by drawing on other sources of information. Whether the word تعاون is interpreted as a noun or a verb depends on the presence of contextual cues. To interpret words, we need to be able to discriminate between their different usages. This paper proposes a hybrid of a rule-based method and a machine learning method for tagging Arabic words. Because an Arabic word may be composed of a stem plus affixes and clitics, a small number of rules dominate the performance (affixes include inflectional markers for tense, gender and number; clitics include some prepositions, conjunctions and others). Tagging is closely related to the notion of word class used in syntax. The method first applies rules (which consider the post-position, the ending of a word, and patterns), and the anomalies are then corrected by a memory-based learning (MBL) method. Memory-based learning is an efficient way to integrate various sources of information and to handle exceptional data in natural language processing tasks. In the second step, the exceptional cases of the rules are checked, and more information is made available to the learner for treating those cases. A number of experiments have been run to evaluate the proposed method and to examine the importance of the various sources of information in learning.
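The two-step scheme described above (rules first, then memory-based correction of the exceptions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The rules, the feature set, the English toy data and the names `rule_tag`, `features` and `HybridTagger` are all hypothetical, and a 1-nearest-neighbour lookup with an overlap distance stands in for the memory-based learner.

```python
DETERMINERS = {"the", "a", "an"}


def rule_tag(prev_word, word):
    """Placeholder rules based on the preceding word and the word ending."""
    if prev_word in DETERMINERS:
        return "NOUN"
    if word.endswith(("ing", "ed")):
        return "VERB"
    return "NOUN"


def features(prev_word, word):
    """Feature vector: preceding word, last three characters, and the rule's guess."""
    return (prev_word, word[-3:], rule_tag(prev_word, word))


def overlap_distance(a, b):
    """Number of feature positions on which two instances disagree."""
    return sum(x != y for x, y in zip(a, b))


class HybridTagger:
    def __init__(self, max_distance=1):
        self.memory = []                  # stored (features, gold_tag) instances
        self.max_distance = max_distance  # beyond this, trust the rule instead

    def train(self, examples):
        """Memorise every training instance, including exceptions to the rules."""
        for prev_word, word, gold in examples:
            self.memory.append((features(prev_word, word), gold))

    def tag(self, prev_word, word):
        feats = features(prev_word, word)
        if self.memory:
            stored, label = min(
                self.memory, key=lambda item: overlap_distance(feats, item[0])
            )
            if overlap_distance(feats, stored) <= self.max_distance:
                return label              # a close stored instance overrides the rule
        return feats[2]                   # otherwise fall back to the rule's tag


tagger = HybridTagger()
tagger.train([
    ("the", "house", "NOUN"),
    ("is", "running", "VERB"),
    ("new", "building", "NOUN"),          # exception: the ending rule alone says VERB
    ("was", "painted", "VERB"),
])
```

With this toy memory, `tagger.tag("new", "painting")` is corrected to NOUN although the ending rule alone predicts VERB. A context with no close stored instance, such as `tagger.tag("she", "jumped")`, keeps the rule's tag.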

Housing Defect of Newly Completed House: An Analysis Using Condition Survey Protocol (CSP) 1 Matrix

Housing is a basic human right. A new house should be free from any defects, even those that people normally consider 'cosmetic defects'. This paper studies the building defects of newly completed houses in 72 units of double-storey terraced houses located in Bangi, Selangor. The building survey was implemented using Protocol 1 (visual inspection). Because the houses are new, the survey work is very stringent in determining the condition and priority of defects. The survey and reporting procedure is based on the CSP1 Matrix, which involves a scoring system, photographs and plan tagging. The analysis was done using the Statistical Package for the Social Sciences (SPSS). The findings reveal that 2119 defects were recorded in the 72 terraced houses. The cumulative score obtained was 27644, while the overall rating is 13.05. These results indicate that the construction quality of the newly completed terraced houses is low and not up to the standard expected of a new house.
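The reported figures can be cross-checked with a short calculation. The abstract does not state how the overall rating is derived. The sketch below assumes it is the cumulative score divided by the number of recorded defects, an assumption that reproduces the reported value. The variable names are illustrative.

```python
# Figures reported in the abstract.
total_defects = 2119
cumulative_score = 27644
houses = 72

# Assumed formula (not stated in the abstract): rating = cumulative score / defects.
overall_rating = cumulative_score / total_defects

# Average number of defects per house, derived from the same figures.
defects_per_house = total_defects / houses

print(f"overall rating: {overall_rating:.2f}")
print(f"defects per house: {defects_per_house:.2f}")
```

Under this assumption the rating rounds to 13.05, matching the abstract, and the survey averages roughly 29 defects per house.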