Highlighting Document's Structure

In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).

Information Retrieval in the Semantic LIFE Personal Digital Memory Framework

Ever increasing capacities of contemporary storage devices inspire the vision to accumulate (personal) information without the need of deleting old data over a long time-span. Hence the target of SemanticLIFE project is to create a Personal Information Management system for a human lifetime data. One of the most important characteristics of the system is its dedication to retrieve information in a very efficient way. By adopting user demands regarding the reduction of ambiguities, our approach aims at a user-oriented and yet powerful enough system with a satisfactory query performance. We introduce the query system of SemanticLIFE, the Virtual Query System, which uses emerging Semantic Web technologies to fulfill users- requirements.

Research on the Relevance Feedback-based Image Retrieval in Digital Library

In recent years, the relevance feedback technology is regarded in content-based image retrieval. This paper suggests a neural networks feedback algorithm based on the radial basis function, coming to extract the semantic character of image. The results of experiment indicated that the performance of this relevance feedback is better than the feedback algorithm based on Single-RBF.

A New Fast Skin Color Detection Technique

Skin color can provide a useful and robust cue for human-related image analysis, such as face detection, pornographic image filtering, hand detection and tracking, people retrieval in databases and Internet, etc. The major problem of such kinds of skin color detection algorithms is that it is time consuming and hence cannot be applied to a real time system. To overcome this problem, we introduce a new fast technique for skin detection which can be applied in a real time system. In this technique, instead of testing each image pixel to label it as skin or non-skin (as in classic techniques), we skip a set of pixels. The reason of the skipping process is the high probability that neighbors of the skin color pixels are also skin pixels, especially in adult images and vise versa. The proposed method can rapidly detect skin and non-skin color pixels, which in turn dramatically reduce the CPU time required for the protection process. Since many fast detection techniques are based on image resizing, we apply our proposed pixel skipping technique with image resizing to obtain better results. The performance evaluation of the proposed skipping and hybrid techniques in terms of the measured CPU time is presented. Experimental results demonstrate that the proposed methods achieve better result than the relevant classic method.

A Weighted-Profiling Using an Ontology Basefor Semantic-Based Search

The information on the Web increases tremendously. A number of search engines have been developed for searching Web information and retrieving relevant documents that satisfy the inquirers needs. Search engines provide inquirers irrelevant documents among search results, since the search is text-based rather than semantic-based. Information retrieval research area has presented a number of approaches and methodologies such as profiling, feedback, query modification, human-computer interaction, etc for improving search results. Moreover, information retrieval has employed artificial intelligence techniques and strategies such as machine learning heuristics, tuning mechanisms, user and system vocabularies, logical theory, etc for capturing user's preferences and using them for guiding the search based on the semantic analysis rather than syntactic analysis. Although a valuable improvement has been recorded on search results, the survey has shown that still search engines users are not really satisfied with their search results. Using ontologies for semantic-based searching is likely the key solution. Adopting profiling approach and using ontology base characteristics, this work proposes a strategy for finding the exact meaning of the query terms in order to retrieve relevant information according to user needs. The evaluation of conducted experiments has shown the effectiveness of the suggested methodology and conclusion is presented.

Optimal Document Archiving and Fast Information Retrieval

In this paper, an intelligent algorithm for optimal document archiving is presented. It is kown that electronic archives are very important for information system management. Minimizing the size of the stored data in electronic archive is a main issue to reduce the physical storage area. Here, the effect of different types of Arabic fonts on electronic archives size is discussed. Simulation results show that PDF is the best file format for storage of the Arabic documents in electronic archive. Furthermore, fast information detection in a given PDF file is introduced. Such approach uses fast neural networks (FNNs) implemented in the frequency domain. The operation of these networks relies on performing cross correlation in the frequency domain rather than spatial one. It is proved mathematically and practically that the number of computation steps required for the presented FNNs is less than that needed by conventional neural networks (CNNs). Simulation results using MATLAB confirm the theoretical computations.

Content-based Retrieval of Medical Images

With the advance of multimedia and diagnostic images technologies, the number of radiographic images is increasing constantly. The medical field demands sophisticated systems for search and retrieval of the produced multimedia document. This paper presents an ongoing research that focuses on the semantic content of radiographic image documents to facilitate semantic-based radiographic image indexing and a retrieval system. The proposed model would divide a radiographic image document, based on its semantic content, and would be converted into a logical structure or a semantic structure. The logical structure represents the overall organization of information. The semantic structure, which is bound to logical structure, is composed of semantic objects with interrelationships in the various spaces in the radiographic image.

Using Dempster-Shafer Theory in XML Information Retrieval

XML is a markup language which is becoming the standard format for information representation and data exchange. A major purpose of XML is the explicit representation of the logical structure of a document. Much research has been performed to exploit logical structure of documents in information retrieval in order to precisely extract user information need from large collections of XML documents. In this paper, we describe an XML information retrieval weighting scheme that tries to find the most relevant elements in XML documents in response to a user query. We present this weighting model for information retrieval systems that utilize plausible inferences to infer the relevance of elements in XML documents. We also add to this model the Dempster-Shafer theory of evidence to express the uncertainty in plausible inferences and Dempster-Shafer rule of combination to combine evidences derived from different inferences.

Exploring Performance-Based Music Attributes for Stylometric Analysis

Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in midi music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs then mined using supervised learning data mining techniques. Two attributes are identified that provide high informational gain. These attributes are then used as style markers to predict authorship. Using these style markers the authors are able to correctly distinguish songs written by the Beatles from those that were not with a precision and accuracy of over 98 per cent. The identification of these style markers as well as the architecture for this research provides a foundation for future research in musical stylometry.

SIFT Accordion: A Space-Time Descriptor Applied to Human Action Recognition

Recognizing human action from videos is an active field of research in computer vision and pattern recognition. Human activity recognition has many potential applications such as video surveillance, human machine interaction, sport videos retrieval and robot navigation. Actually, local descriptors and bag of visuals words models achieve state-of-the-art performance for human action recognition. The main challenge in features description is how to represent efficiently the local motion information. Most of the previous works focus on the extension of 2D local descriptors on 3D ones to describe local information around every interest point. In this paper, we propose a new spatio-temporal descriptor based on a spacetime description of moving points. Our description is focused on an Accordion representation of video which is well-suited to recognize human action from 2D local descriptors without the need to 3D extensions. We use the bag of words approach to represent videos. We quantify 2D local descriptor describing both temporal and spatial features with a good compromise between computational complexity and action recognition rates. We have reached impressive results on publicly available action data set

Object Recognition in Color Images by the Self Configuring System MEMORI

System MEMORI automatically detects and recognizes rotated and/or rescaled versions of the objects of a database within digital color images with cluttered background. This task is accomplished by means of a region grouping algorithm guided by heuristic rules, whose parameters concern some geometrical properties and the recognition score of the database objects. This paper focuses on the strategies implemented in MEMORI for the estimation of the heuristic rule parameters. This estimation, being automatic, makes the system a self configuring and highly user-friendly tool.

Information Retrieval: Improving Question Answering Systems by Query Reformulation and Answer Validation

Question answering (QA) aims at retrieving precise information from a large collection of documents. Most of the Question Answering systems composed of three main modules: question processing, document processing and answer processing. Question processing module plays an important role in QA systems to reformulate questions. Moreover answer processing module is an emerging topic in QA systems, where these systems are often required to rank and validate candidate answers. These techniques aiming at finding short and precise answers are often based on the semantic relations and co-occurrence keywords. This paper discussed about a new model for question answering which improved two main modules, question processing and answer processing which both affect on the evaluation of the system operations. There are two important components which are the bases of the question processing. First component is question classification that specifies types of question and answer. Second one is reformulation which converts the user's question into an understandable question by QA system in a specific domain. The objective of an Answer Validation task is thus to judge the correctness of an answer returned by a QA system, according to the text snippet given to support it. For validating answers we apply candidate answer filtering, candidate answer ranking and also it has a final validation section by user voting. Also this paper described new architecture of question and answer processing modules with modeling, implementing and evaluating the system. The system differs from most question answering systems in its answer validation model. This module makes it more suitable to find exact answer. Results show that, from total 50 asked questions, evaluation of the model, show 92% improving the decision of the system.

Learning Styles of University Students in Bangkok: The Characteristics and the Relevant Instructional Context

The purposes of this study are 1) to identify learning styles of university students in Bangkok, and 2) to study the frequency of the relevant instructional context of the identified learning styles. Learning Styles employed in this study are those of Honey and Mumford, which include 1) Reflectors, 2) Theorists, 3) Pragmatists, and 4) Activists. The population comprises 1383 students and 5 lecturers. Research tools are 2 questionnaires – one used for identifying students- learning styles, and the other used for identifying the frequency of the relevant instructional context of the identified learning styles. The research findings reveal that 32.30 percent - are Activists, while 28.10 percent are Theorists, 20.10 are Reflectors, and 19.50 are Pragmatists. In terms of the relevant instructional context of the identified 4 learning styles, it is found that the frequency level of the instructional context is totally in high level. Moreover, 2 lists of the context being conducted most frequently are 'Lead'in activity to review background knowledge,- and 'Information retrieval report.' And these two activities serve the learning styles of theorists and activists. It is, therefore, suggested that more instructional context supporting the activists, the majority of the population, learning best by doing, as well as emotional learning situation should be added.

Signed Approach for Mining Web Content Outliers

The emergence of the Internet has brewed the revolution of information storage and retrieval. As most of the data in the web is unstructured, and contains a mix of text, video, audio etc, there is a need to mine information to cater to the specific needs of the users without loss of important hidden information. Thus developing user friendly and automated tools for providing relevant information quickly becomes a major challenge in web mining research. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones that are likely to contain outlying data such as noise, irrelevant and redundant data. This paper mainly focuses on Signed approach and full word matching on the organized domain dictionary for mining web content outliers. This Signed approach gives the relevant web documents as well as outlying web documents. As the dictionary is organized based on the number of characters in a word, searching and retrieval of documents takes less time and less space.

Classification of Acoustic Emission Based Partial Discharge in Oil Pressboard Insulation System Using Wavelet Analysis

Insulation used in transformer is mostly oil pressboard insulation. Insulation failure is one of the major causes of catastrophic failure of transformers. It is established that partial discharges (PD) cause insulation degradation and premature failure of insulation. Online monitoring of PDs can reduce the risk of catastrophic failure of transformers. There are different techniques of partial discharge measurement like, electrical, optical, acoustic, opto-acoustic and ultra high frequency (UHF). Being non invasive and non interference prone, acoustic emission technique is advantageous for online PD measurement. Acoustic detection of p.d. is based on the retrieval and analysis of mechanical or pressure signals produced by partial discharges. Partial discharges are classified according to the origin of discharges. Their effects on insulation deterioration are different for different types. This paper reports experimental results and analysis for classification of partial discharges using acoustic emission signal of laboratory simulated partial discharges in oil pressboard insulation system using three different electrode systems. Acoustic emission signal produced by PD are detected by sensors mounted on the experimental tank surface, stored on an oscilloscope and fed to computer for further analysis. The measured AE signals are analyzed using discrete wavelet transform analysis and wavelet packet analysis. Energy distribution in different frequency bands of discrete wavelet decomposed signal and wavelet packet decomposed signal is calculated. These analyses show a distinct feature useful for PD classification. Wavelet packet analysis can sort out any misclassification arising out of DWT in most cases.

Modeling Peer-to-Peer Networks with Interest-Based Clusters

In the world of Peer-to-Peer (P2P) networking different protocols have been developed to make the resource sharing or information retrieval more efficient. The SemPeer protocol is a new layer on Gnutella that transforms the connections of the nodes based on semantic information to make information retrieval more efficient. However, this transformation causes high clustering in the network that decreases the number of nodes reached, therefore the probability of finding a document is also decreased. In this paper we describe a mathematical model for the Gnutella and SemPeer protocols that captures clustering-related issues, followed by a proposition to modify the SemPeer protocol to achieve moderate clustering. This modification is a sort of link management for the individual nodes that allows the SemPeer protocol to be more efficient, because the probability of a successful query in the P2P network is reasonably increased. For the validation of the models, we evaluated a series of simulations that supported our results.

Effective Digital Music Retrieval System through Content-based Features

In this paper, we propose effective system for digital music retrieval. We divided proposed system into Client and Server. Client part consists of pre-processing and Content-based feature extraction stages. In pre-processing stage, we minimized Time code Gap that is occurred among same music contents. As content-based feature, first-order differentiated MFCC were used. These presented approximately envelop of music feature sequences. Server part included Music Server and Music Matching stage. Extracted features from 1,000 digital music files were stored in Music Server. In Music Matching stage, we found retrieval result through similarity measure by DTW. In experiment, we used 450 queries. These were made by mixing different compression standards and sound qualities from 50 digital music files. Retrieval accurate indicated 97% and retrieval time was average 15ms in every single query. Out experiment proved that proposed system is effective in retrieve digital music and robust at various user environments of web.

Application of l1-Norm Minimization Technique to Image Retrieval

Image retrieval is a topic where scientific interest is currently high. The important steps associated with image retrieval system are the extraction of discriminative features and a feasible similarity metric for retrieving the database images that are similar in content with the search image. Gabor filtering is a widely adopted technique for feature extraction from the texture images. The recently proposed sparsity promoting l1-norm minimization technique finds the sparsest solution of an under-determined system of linear equations. In the present paper, the l1-norm minimization technique as a similarity metric is used in image retrieval. It is demonstrated through simulation results that the l1-norm minimization technique provides a promising alternative to existing similarity metrics. In particular, the cases where the l1-norm minimization technique works better than the Euclidean distance metric are singled out.

Automatic Building an Extensive Arabic FA Terms Dictionary

Field Association (FA) terms are a limited set of discriminating terms that give us the knowledge to identify document fields which are effective in document classification, similar file retrieval and passage retrieval. But the problem lies in the lack of an effective method to extract automatically relevant Arabic FA Terms to build a comprehensive dictionary. Moreover, all previous studies are based on FA terms in English and Japanese, and the extension of FA terms to other language such Arabic could be definitely strengthen further researches. This paper presents a new method to extract, Arabic FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules and corpora comparison. Experimental evaluation is carried out for 14 different fields using 251 MB of domain-specific corpora obtained from Arabic Wikipedia dumps and Alhyah news selected average of 2,825 FA Terms (single and compound) per field. From the experimental results, recall and precision are 84% and 79% respectively. Therefore, this method selects higher number of relevant Arabic FA Terms at high precision and recall.

Response Quality Evaluation in Heterogeneous Question Answering System: A Black-box Approach

The evaluation of the question answering system is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation is never a problem as information retrieval-based metrics are readily available for use. However, when question answering systems began to be more domains specific, evaluation becomes a real issue. This is especially true when understanding and reasoning is required to cater for a wider variety of questions and at the same time achieve higher quality responses The research in this paper discusses the inappropriateness of the existing measure for response quality evaluation and in a later part, the call for new standard measures and the related considerations are brought forward. As a short-term solution for evaluating response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different nature, this research presents a black-box approach using observation, classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).