Continuous Text Translation Using Text Modeling in the Thetos System

In the paper a method of modeling text for Polish is discussed. The method is aimed at transforming continuous input text into a text consisting of sentences in so called canonical form, whose characteristic is, among others, a complete structure as well as no anaphora or ellipses. The transformation is lossless as to the content of text being transformed. The modeling method has been worked out for the needs of the Thetos system, which translates Polish written texts into the Polish sign language. We believe that the method can be also used in various applications that deal with the natural language, e.g. in a text summary generator for Polish.

Event Information Extraction System (EIEE): FSM vs HMM

Automatic Extraction of Event information from social text stream (emails, social network sites, blogs etc) is a vital requirement for many applications like Event Planning and Management systems and security applications. The key information components needed from Event related text are Event title, location, participants, date and time. Emails have very unique distinctions over other social text streams from the perspective of layout and format and conversation style and are the most commonly used communication channel for broadcasting and planning events. Therefore we have chosen emails as our dataset. In our work, we have employed two statistical NLP methods, named as Finite State Machines (FSM) and Hidden Markov Model (HMM) for the extraction of event related contextual information. An application has been developed providing a comparison among the two methods over the event extraction task. It comprises of two modules, one for each method, and works for both bulk as well as direct user input. The results are evaluated using Precision, Recall and F-Score. Experiments show that both methods produce high performance and accuracy, however HMM was good enough over Title extraction and FSM proved to be better for Venue, Date, and time.

A New Fuzzy Mathematical Model in Recycling Collection Networks: A Possibilistic Approach

Focusing on the environmental issues, including the reduction of scrap and consumer residuals, along with the benefiting from the economic value during the life cycle of goods/products leads the companies to have an important competitive approach. The aim of this paper is to present a new mixed nonlinear facility locationallocation model in recycling collection networks by considering multi-echelon, multi-suppliers, multi-collection centers and multifacilities in the recycling network. To make an appropriate decision in reality, demands, returns, capacities, costs and distances, are regarded uncertain in our model. For this purpose, a fuzzy mathematical programming-based possibilistic approach is introduced as a solution methodology from the recent literature to solve the proposed mixed-nonlinear programming model (MNLP). The computational experiments are provided to illustrate the applicability of the designed model in a supply chain environment and to help the decision makers to facilitate their analysis.

Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

A Study on Removal of Toluidine Blue Dye from Aqueous Solution by Adsorption onto Neem Leaf Powder

Adsorption of Toluidine blue dye from aqueous solutions onto Neem Leaf Powder (NLP) has been investigated. The surface characterization of this natural material was examined by Particle size analysis, Scanning Electron Microscopy (SEM), Fourier Transform Infrared (FTIR) spectroscopy and X-Ray Diffraction (XRD). The effects of process parameters such as initial concentration, pH, temperature and contact duration on the adsorption capacities have been evaluated, in which pH has been found to be most effective parameter among all. The data were analyzed using the Langmuir and Freundlich for explaining the equilibrium characteristics of adsorption. And kinetic models like pseudo first- order, second-order model and Elovich equation were utilized to describe the kinetic data. The experimental data were well fitted with Langmuir adsorption isotherm model and pseudo second order kinetic model. The thermodynamic parameters, such as Free energy of adsorption (AG"), enthalpy change (AH') and entropy change (ASĀ°) were also determined and evaluated.

Multi-View Neural Network Based Gait Recognition

Human identification at a distance has recently gained growing interest from computer vision researchers. Gait recognition aims essentially to address this problem by identifying people based on the way they walk [1]. Gait recognition has 3 steps. The first step is preprocessing, the second step is feature extraction and the third one is classification. This paper focuses on the classification step that is essential to increase the CCR (Correct Classification Rate). Multilayer Perceptron (MLP) is used in this work. Neural Networks imitate the human brain to perform intelligent tasks [3].They can represent complicated relationships between input and output and acquire knowledge about these relationships directly from the data [2]. In this paper we apply MLP NN for 11 views in our database and compare the CCR values for these views. Experiments are performed with the NLPR databases, and the effectiveness of the proposed method for gait recognition is demonstrated.

Data Extraction of XML Files using Searching and Indexing Techniques

XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.

A Fuzzy Mixed Integer Multi-Scenario Portfolio Optimization Model

In this paper, we propose a multiple objective optimization model with respect to portfolio selection problem for investors looking forward to diversify their equity investments in a number of equity markets. Based on Markowitz-s M-V model we developed a Fuzzy Mixed Integer Multi-Objective Nonlinear Programming Problem (FMIMONLP) to maximize the investors- future gains on equity markets, reach the optimal proportion of the budget to be invested in different equities. A numerical example with a comprehensive analysis on artificial data from several equity markets is presented in order to illustrate the proposed model and its solution method. The model performed well compared with the deterministic version of the model.