Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

Hardiness vs Alienation Personality Construct Essentially Explains Burnout Proclivity and Erroneous Computer Entry Problems in Rural Hellenic Hospital Labs

Erroneous computer entry problems [here: 'e'errors] in hospital labs threaten the patients-–health carers- relationship, undermining the health system credibility. Are e-errors random, and do lab professionals make them accidentally, or may they be traced through meaningful determinants? Theories on internal causality of mistakes compel to seek specific causal ascriptions of hospital lab eerrors instead of accepting some inescapability. Undeniably, 'To Err is Human'. But in view of rapid global health organizational changes, e-errors are too expensive to lack in-depth considerations. Yet, that efunction might supposedly be entrenched in the health carers- job description remains under dispute – at least for Hellenic labs, where e-use falls behind generalized(able) appreciation and application. In this study: i) an empirical basis of a truly high annual cost of e-errors at about €498,000.00 per rural Hellenic hospital was established, hence interest in exploring the issue was sufficiently substantiated; ii) a sample of 270 lab-expert nurses, technicians and doctors were assessed on several personality, burnout and e-error measures, and iii) the hypothesis that the Hardiness vs Alienation personality construct disposition explains resistance vs proclivity to e-errors was tested and verified: Hardiness operates as a resilience source in the encounter of high pressures experienced in the hospital lab, whereas its 'opposite', i.e., Alienation, functions as a predictor, not only of making e-errors, but also of leading to burn-out. Implications for apt interventions are discussed.

Extending the Conceptual Neighborhood Graph of the Relations for the Semantic Adaptation of Multimedia Documents

The recent developments in computing and communication technology permit to users to access multimedia documents with variety of devices (PCs, PDAs, mobile phones...) having heterogeneous capabilities. This diversification of supports has trained the need to adapt multimedia documents according to their execution contexts. A semantic framework for multimedia document adaptation based on the conceptual neighborhood graphs was proposed. In this framework, adapting consists on finding another specification that satisfies the target constraints and which is as close as possible from the initial document. In this paper, we propose a new way of building the conceptual neighborhood graphs to best preserve the proximity between the adapted and the original documents and to deal with more elaborated relations models by integrating the relations relaxation graphs that permit to handle the delays and the distances defined within the relations.

Effect of Real Wastewater on Biotransformation of 17α-ethynylestradiol by Ammonia-Oxidizing Bacteria in Nitrifying Activated Sludge

17α-ethynylestradiol (EE2) is a synthetic estrogen used as a key ingredient in an oral contraceptives pill. EE2 is an endocrine disrupting compound, high in estrogenic potency. Although EE2 exhibits low degree of biodegradability with common microorganisms in wastewater treatment plants (WWTPs), this compound can be biotransformed by ammonia-oxidizing bacteria (AOB) via a co-metabolism mechanism in WWTPs. This study aimed to investigate the effect of real wastewater on biotransformation of EE2 by AOB. A preliminary experiment on the effect of nitrite and pH levels on abiotic transformation of EE2 suggested that the abiotic transformation occurred at only pH

Empirical Evidence on Equity Valuation of Thai Firms

This study aims at providing empirical evidence on a comparison of two equity valuation models: (1) the dividend discount model (DDM) and (2) the residual income model (RIM), in estimating equity values of Thai firms during 1995-2004. Results suggest that DDM and RIM underestimate equity values of Thai firms and that RIM outperforms DDM in predicting cross-sectional stock prices. Results on regression of cross-sectional stock prices on the decomposed DDM and RIM equity values indicate that book value of equity provides the greatest incremental explanatory power, relative to other components in DDM and RIM terminal values, suggesting that book value distortions resulting from accounting procedures and choices are less severe than forecast and measurement errors in discount rates and growth rates. We also document that the incremental explanatory power of book value of equity during 1998-2004, representing the information environment under Thai Accounting Standards reformed after the 1997 economic crisis to conform to International Accounting Standards, is significantly greater than that during 1995-1996, representing the information environment under the pre-reformed Thai Accounting Standards. This implies that the book value distortions are less severe under the 1997 Reformed Thai Accounting Standards than the pre-reformed Thai Accounting Standards.

Identification of Most Frequently Occurring Lexis in Winnings-announcing Unsolicited Bulke-mails

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the winnings-announcing UBE documents. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexisset in the given UBE and the probability that the given UBE will be the one announcing fake winnings. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in winningsannouncing UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Association of Selected Biochemical Markers and Body Mass Index in Women with Endocrine Disorders

Obesity is frequent attendant phenomenon of patients with endocrinological disease. Between BMI and endocrinological diseases is close correlation. In thesis we focused on the allocation of hormone concentration – PTH and TSH, CHOL a mineral element Ca in a blood serum. The examined group was formed by 100 respondents (women) aged 36 – 83 years, who were divided into two groups – control group (CG), group with diagnosed endocrine disease (DED). The concentration of PTH and TSH, Ca and CHOL was measured through the medium of analyzers Cobas e411 (Japan); Cobas Integra 400 (Switzerland). At individuals was measured body weight as well as stature and thereupon from those data we enumerated BMI. On the basis of Student T-test in biochemical parameter of PTH and Ca we found out significantly meaningful difference (p

Applications of Artificial Neural Network to Building Statistical Models for Qualifying and Indexing Radiation Treatment Plans

The main goal in this paper is to quantify the quality of different techniques for radiation treatment plans, a back-propagation artificial neural network (ANN) combined with biomedicine theory was used to model thirteen dosimetric parameters and to calculate two dosimetric indices. The correlations between dosimetric indices and quality of life were extracted as the features and used in the ANN model to make decisions in the clinic. The simulation results show that a trained multilayer back-propagation neural network model can help a doctor accept or reject a plan efficiently. In addition, the models are flexible and whenever a new treatment technique enters the market, the feature variables simply need to be imported and the model re-trained for it to be ready for use.

A Study of Filmmakers Interaction through Social Exchange Theory

Film, as an art form playing a vital role and is a powerful tool in documenting, influencing and shaping the society. Films are the collective creation of a large number of separate individuals, each contributing with creative input, unique talents, and technical expertise to the project. Recently, the Malaysian Independent (or “Indie") filmmakers have made their presence felt by winning awards at various international film festivals. Working in the digital video (DV) format, a number of independent filmmakers really hit their stride with a range of remarkably strong titles and international recognition has been quick in coming and their works are now regularly in exhibition or in competition, winning many top prizes at prestigious festivals around the world. The interaction factors among crewmembers are emphasized as imperative for group success. An in-depth interview is conducted to analyze the social interactions and exchanges between filmmakers through Social Exchanges Theory (SET). Certainly the new millennium that was marked as the digital technology revolution has changed the face of filmmaking in Malaysia. There is a clear need to study the Malaysian independent cinema especially from the perspective of understanding what causes the independent filmmakers to work so well given all of the difficulties and constraints.

Extended Study on Removing Gaussian Noise in Mechanical Engineering Drawing Images using Median Filters

In this paper, an extended study is performed on the effect of different factors on the quality of vector data based on a previous study. In the noise factor, one kind of noise that appears in document images namely Gaussian noise is studied while the previous study involved only salt-and-pepper noise. High and low levels of noise are studied. For the noise cleaning methods, algorithms that were not covered in the previous study are used namely Median filters and its variants. For the vectorization factor, one of the best available commercial raster to vector software namely VPstudio is used to convert raster images into vector format. The performance of line detection will be judged based on objective performance evaluation method. The output of the performance evaluation is then analyzed statistically to highlight the factors that affect vector quality.

Mining News Sites to Create Special Domain News Collections

We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second “general" corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.

Localizing and Recognizing Integral Pitches of Cheque Document Images

Automatic reading of handwritten cheque is a computationally complex process and it plays an important role in financial risk management. Machine vision and learning provide a viable solution to this problem. Research effort has mostly been focused on recognizing diverse pitches of cheques and demand drafts with an identical outline. However most of these methods employ templatematching to localize the pitches and such schemes could potentially fail when applied to different types of outline maintained by the bank. In this paper, the so-called outline problem is resolved by a cheque information tree (CIT), which generalizes the localizing method to extract active-region-of-entities. In addition, the weight based density plot (WBDP) is performed to isolate text entities and read complete pitches. Recognition is based on texture features using neural classifiers. Legal amount is subsequently recognized by both texture and perceptual features. A post-processing phase is invoked to detect the incorrect readings by Type-2 grammar using the Turing machine. The performance of the proposed system was evaluated using cheque and demand drafts of 22 different banks. The test data consists of a collection of 1540 leafs obtained from 10 different account holders from each bank. Results show that this approach can easily be deployed without significant design amendments.

Connectionist Approach to Generic Text Summarization

As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving connectionist System that is adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for Part of Speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.

Semi-Automatic Approach for Semantic Annotation

The third phase of web means semantic web requires many web pages which are annotated with metadata. Thus, a crucial question is where to acquire these metadata. In this paper we propose our approach, a semi-automatic method to annotate the texts of documents and web pages and employs with a quite comprehensive knowledge base to categorize instances with regard to ontology. The approach is evaluated against the manual annotations and one of the most popular annotation tools which works the same as our tool. The approach is implemented in .net framework and uses the WordNet for knowledge base, an annotation tool for the Semantic Web.

IFDewey: A New Insert-Friendly Labeling Schemafor XML Data

XML has become a popular standard for information exchange via web. Each XML document can be presented as a rooted, ordered, labeled tree. The Node label shows the exact position of a node in the original document. Region and Dewey encoding are two famous methods of labeling trees. In this paper, we propose a new insert friendly labeling method named IFDewey based on recently proposed scheme, called Extended Dewey. In Extended Dewey many labels must be modified when a new node is inserted into the XML tree. Our method eliminates this problem by reserving even numbers for future insertion. Numbers generated by Extended Dewey may be even or odd. IFDewey modifies Extended Dewey so that only odd numbers are generated and even numbers can then be used for a much easier insertion of nodes.

Elimination of Redundant Links in Web Pages– Mathematical Approach

With the enormous growth on the web, users get easily lost in the rich hyper structure. Thus developing user friendly and automated tools for providing relevant information without any redundant links to the users to cater to their needs is the primary task for the website owners. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent one that are likely to contain the outlying data such as noise, irrelevant and redundant data. This paper proposes new algorithm for mining the web content by detecting the redundant links from the web documents using set theoretical(classical mathematics) such as subset, union, intersection etc,. Then the redundant links is removed from the original web content to get the required information by the user..

Ethnobotany and Distribution of Wild Edible Tubers in Pulau Redang and Nearby Islands of Terengganu, Malaysia

An ethnobotanical study was conducted to document local knowledge and potentials of wild edible tubers that has been reported and sighted and to investigate and record their distribution in Pulau Redang and nearby islands of Terengganu, Malaysia. Information was gathered from 42 villagers by using semi-structured questionnaire. These respondents were selected randomly and no appointment was made prior to the visits. For distribution, the locations of wild edible tubers were recorded by using the Global Positioning System (GPS). The wild edible tubers recorded were ubi gadung, ubi toyo, ubi kasu, ubi jaga, ubi seratus and ubi kertas. Dioscorea or commonly known as yam is reported to be one of the major food sources worldwide. The majority of villagers used Dioscorea hispida Dennst. or ubi gadung in many ways in their life such as for food, medicinal purposes and fish poison. The villagers have identified this ubi gadung by looking at the morphological characteristics; that include leaf shape, stem and the color of the tuber-s flesh.

Sensory Evaluation of the Selected Coffee Products Using Fuzzy Approach

Knowing consumers' preferences and perceptions of the sensory evaluation of drink products are very significant to manufacturers and retailers alike. With no appropriate sensory analysis, there is a high risk of market disappointment. This paper aims to rank the selected coffee products and also to determine the best of quality attribute through sensory evaluation using fuzzy decision making model. Three products of coffee drinks were used for sensory evaluation. Data were collected from thirty judges at a hypermarket in Kuala Terengganu, Malaysia. The judges were asked to specify their sensory evaluation in linguistic terms of the quality attributes of colour, smell, taste and mouth feel for each product and also the weight of each quality attribute. Five fuzzy linguistic terms represent the quality attributes were introduced prior analysing. The judgment membership function and the weights were compared to rank the products and also to determine the best quality attribute. The product of Indoc was judged as the first in ranking and 'taste' as the best quality attribute. These implicate the importance of sensory evaluation in identifying consumers- preferences and also the competency of fuzzy approach in decision making.

Web Page Watermarking: XML files using Synonyms and Acronyms

Advent enhancements in the field of computing have increased massive use of web based electronic documents. Current Copyright protection laws are inadequate to prove the ownership for electronic documents and do not provide strong features against copying and manipulating information from the web. This has opened many channels for securing information and significant evolutions have been made in the area of information security. Digital Watermarking has developed into a very dynamic area of research and has addressed challenging issues for digital content. Watermarking can be visible (logos or signatures) and invisible (encoding and decoding). Many visible watermarking techniques have been studied for text documents but there are very few for web based text. XML files are used to trade information on the internet and contain important information. In this paper, two invisible watermarking techniques using Synonyms and Acronyms are proposed for XML files to prove the intellectual ownership and to achieve the security. Analysis is made for different attacks and amount of capacity to be embedded in the XML file is also noticed. A comparative analysis for capacity is also made for both methods. The system has been implemented using C# language and all tests are made practically to get the results.

Scope and Application of Collaborative Tools and Digital Manufacturing in Dentistry

It is necessary to incorporate technological advances achieved in the field of engineering into dentistry in order to enhance the process of diagnosis, treatment planning and enable the doctors to render better treatment to their patients. To achieve this ultimate goal long distance collaborations are often necessary. This paper discusses the various collaborative tools and their applications to solve a few burning problems confronted by the dentists. Customization is often the solution to most of the problems. But rapid designing, development and cost effective manufacturing is a difficult task to achieve. This problem can be solved using the technique of digital manufacturing. Cases from 6 major branches of dentistry have been discussed and possible solutions with the help of state of art technology using rapid digital manufacturing have been proposed in the present paper. The paper also entails the usage of existing tools in collaborative and digital manufacturing area.