Abstract: Named Entity Recognition (NER) aims to classify each word of a document into predefined named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction and question answering systems. This paper reports on the development of a NER system for Bengali and Hindi using a Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its application to Indian languages (ILs) is very new. The system makes use of the contextual information of words along with a variety of features that are helpful in predicting four different named entity (NE) classes: Person name, Location name, Organization name and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with twelve different NE classes, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. These lexical patterns have been used as features of the SVM to improve system performance. The NER system has been tested with gold-standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation demonstrates recall, precision and f-score values of 88.61%, 80.12% and 84.15%, respectively, for Bengali and 80.23%, 74.34% and 77.17%, respectively, for Hindi. Results show an improvement in f-score of 5.13% with the use of context patterns. A statistical analysis of variance (ANOVA) is also performed to compare the performance of the proposed NER system with that of an existing HMM-based system for both languages.
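The abstract does not spell out the feature templates or the SVM toolkit used; purely as an illustration of window-based NE classification, the Python sketch below (assuming scikit-learn, with hypothetical feature names and toy data) trains a linear SVM on simple contextual features of each token:

    # Window-based NER features with a linear SVM (illustrative sketch;
    # the paper's actual feature set and SVM implementation may differ).
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.svm import LinearSVC

    def token_features(tokens, i):
        # Contextual features for token i: the word itself, its
        # neighbours, and simple suffix/shape cues.
        word = tokens[i]
        return {
            "w0": word,
            "w-1": tokens[i - 1] if i > 0 else "<S>",
            "w+1": tokens[i + 1] if i < len(tokens) - 1 else "</S>",
            "suffix3": word[-3:],
            "is_title": word.istitle(),
        }

    # Toy training data: one tokenized sentence with BIO-style NE tags.
    sentence = ["Rabindranath", "Tagore", "lived", "in", "Kolkata", "."]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]

    X = [token_features(sentence, i) for i in range(len(sentence))]
    vec = DictVectorizer()
    clf = LinearSVC().fit(vec.fit_transform(X), tags)
    print(clf.predict(vec.transform([token_features(sentence, 4)])))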
Abstract: Erroneous computer entry problems (e-errors) in hospital labs threaten the patient–health carer relationship, undermining the credibility of the health system. Are e-errors random, made by lab professionals accidentally, or can they be traced to meaningful determinants? Theories on the internal causality of mistakes compel us to seek specific causal ascriptions of hospital lab e-errors instead of accepting some inescapability. Undeniably, 'To Err is Human'. But in view of rapid global changes in health organizations, e-errors are too expensive to be left without in-depth consideration. Yet whether that e-function should be entrenched in health carers' job descriptions remains under dispute, at least for Hellenic labs, where e-use falls behind generalized(able) appreciation and application. In this study: i) an empirical basis of a truly high annual cost of e-errors, at about €498,000.00 per rural Hellenic hospital, was established, hence interest in exploring the issue was sufficiently substantiated; ii) a sample of 270 lab-expert nurses, technicians and doctors was assessed on several personality, burnout and e-error measures; and iii) the hypothesis that the Hardiness vs Alienation personality construct explains resistance vs proclivity to e-errors was tested and verified: Hardiness operates as a source of resilience in the encounter with the high pressures experienced in the hospital lab, whereas its 'opposite', Alienation, functions as a predictor not only of making e-errors but also of burnout. Implications for apt interventions are discussed.
Abstract: Recent developments in computing and communication technology allow users to access multimedia documents with a variety of devices (PCs, PDAs, mobile phones...) having heterogeneous capabilities. This diversification of devices has created the need to adapt multimedia documents to their execution contexts. A semantic framework for multimedia document adaptation based on conceptual neighborhood graphs has previously been proposed. In this framework, adaptation consists in finding another specification that satisfies the target constraints and that is as close as possible to the initial document. In this paper, we propose a new way of building the conceptual neighborhood graphs to best preserve the proximity between the adapted and the original documents and to deal with more elaborate relation models, by integrating relation relaxation graphs that make it possible to handle the delays and distances defined within the relations.
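To make the neighborhood idea concrete: adaptation replaces a relation the target context cannot satisfy with the nearest relation in the conceptual neighborhood graph. The Python sketch below uses a simplified fragment over Allen-style interval relations, which is only an assumed example; the paper's graphs, including the relaxation of delays and distances, are richer than this:

    # Fragment of a conceptual neighborhood graph over Allen-style
    # interval relations (simplified, hypothetical edge set).
    from collections import deque

    NEIGHBORS = {
        "before": ["meets"],
        "meets": ["before", "overlaps"],
        "overlaps": ["meets", "starts", "finished-by"],
        "starts": ["overlaps", "equals"],
        "finished-by": ["overlaps", "equals"],
        "equals": ["starts", "finished-by"],
    }

    def closest_allowed(initial, allowed):
        # BFS from the initial relation: the first allowed relation
        # reached is a closest substitute for the adapted document.
        seen, queue = {initial}, deque([initial])
        while queue:
            rel = queue.popleft()
            if rel in allowed:
                return rel
            for nxt in NEIGHBORS[rel]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return None

    # If the target context cannot render "overlaps", relax it:
    print(closest_allowed("overlaps", {"before", "equals"}))
    # -> "before" (both candidates lie 2 hops away; BFS returns the
    #    first one reached)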
Abstract: 17α-ethynylestradiol (EE2) is a synthetic estrogen used as a key ingredient in oral contraceptive pills. EE2 is an endocrine disrupting compound with high estrogenic potency. Although EE2 exhibits a low degree of biodegradability with common microorganisms in wastewater treatment plants (WWTPs), the compound can be biotransformed by ammonia-oxidizing bacteria (AOB) via a co-metabolism mechanism in WWTPs. This study aimed to investigate the effect of real wastewater on the biotransformation of EE2 by AOB. A preliminary experiment on the effect of nitrite and pH levels on the abiotic transformation of EE2 suggested that the abiotic transformation occurred only at pH
Abstract: This study aims at providing empirical evidence on a
comparison of two equity valuation models: (1) the dividend discount
model (DDM) and (2) the residual income model (RIM), in
estimating equity values of Thai firms during 1995-2004. Results
suggest that DDM and RIM underestimate equity values of Thai
firms and that RIM outperforms DDM in predicting cross-sectional
stock prices. Results on regression of cross-sectional stock prices on
the decomposed DDM and RIM equity values indicate that book
value of equity provides the greatest incremental explanatory power,
relative to other components in DDM and RIM terminal values,
suggesting that book value distortions resulting from accounting
procedures and choices are less severe than forecast and
measurement errors in discount rates and growth rates.
We also document that the incremental explanatory power of book
value of equity during 1998-2004, representing the information
environment under Thai Accounting Standards reformed after the
1997 economic crisis to conform to International Accounting
Standards, is significantly greater than that during 1995-1996,
representing the information environment under the pre-reformed
Thai Accounting Standards. This implies that the book value distortions are less severe under the 1997 Reformed Thai Accounting Standards than under the pre-reformed Thai Accounting Standards.
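For reference, the two models can be written in their standard textbook forms (the paper's empirical specification, with finite horizons and the terminal values discussed above, may differ). With D_t the dividend, X_t earnings, B_t the book value of equity and r the cost of equity:

    V_0^{DDM} = \sum_{t=1}^{\infty} \frac{E[D_t]}{(1+r)^t},
    \qquad
    V_0^{RIM} = B_0 + \sum_{t=1}^{\infty} \frac{E[X_t - r\,B_{t-1}]}{(1+r)^t}

Because RIM discounts only the residual income X_t - r B_{t-1}, the book value B_0 enters the valuation directly, which is consistent with the finding above that book value of equity carries the greatest incremental explanatory power.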
Abstract: E-mail has become an important means of electronic communication, but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE comes in many types, including pornographic, virus-infected and 'cry-for-help' messages, as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to the usage of e-mail. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE messages. Technically, this is an application of text parsing and tokenization to unstructured textual documents, and we approach it using Bag of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the winnings-announcing UBE documents, and an analysis of the top 100 such lexis is presented. We exhibit the relationship between the occurrence of a word from the identified lexis-set in a given UBE message and the probability that the message announces fake winnings. To the best of our knowledge and our survey of the related literature, this is the first formal attempt at identifying the most frequently occurring lexis in winnings-announcing UBE through textual analysis. Finally, this is a sincere attempt to raise alertness against, and mitigate the threat of, such luring but fake UBE.
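As an illustration of the BOW approach described above, the following Python sketch (with toy messages, not the actual ~3000-message corpus) extracts the most frequent lexis and estimates the probability that a message containing a given word is winnings-announcing UBE:

    # Bag-of-words lexis frequencies plus a conditional probability
    # estimate (toy data; illustrative only).
    import re
    from collections import Counter

    ube = ["congratulations you have won the lottery prize",
           "claim your prize money now winner"]
    ham = ["meeting agenda attached for monday"]

    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    freq = Counter(t for doc in ube for t in tokens(doc))
    print(freq.most_common(5))          # most frequent UBE lexis

    def p_ube_given(word):
        in_ube = sum(word in tokens(d) for d in ube)
        in_all = in_ube + sum(word in tokens(d) for d in ham)
        return in_ube / in_all if in_all else 0.0

    print(p_ube_given("prize"))         # -> 1.0 in this toy sample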
Abstract: Obesity is a frequent attendant phenomenon in patients with endocrinological disease, and there is a close correlation between BMI and endocrinological diseases. In this study we focused on the determination of the hormone concentrations PTH and TSH, CHOL and the mineral element Ca in blood serum. The examined group was formed by 100 respondents (women) aged 36–83 years, who were divided into two groups: a control group (CG) and a group with diagnosed endocrine disease (DED). The concentrations of PTH, TSH, Ca and CHOL were measured using the analyzers Cobas e411 (Japan) and Cobas Integra 400 (Switzerland). Body weight and stature were measured for each individual, and BMI was calculated from these data. On the basis of Student's t-test we found a significant difference in the biochemical parameters PTH and Ca (p
Abstract: The main goal of this paper is to quantify the quality of different techniques for radiation treatment plans. A back-propagation artificial neural network (ANN) combined with biomedicine theory was used to model thirteen dosimetric parameters and to calculate two dosimetric indices. The correlations between the dosimetric indices and quality of life were extracted as features and used in the ANN model to make decisions in the clinic. The simulation results show that a trained multilayer back-propagation neural network model can help a doctor accept or reject a plan efficiently. In addition, the models are flexible: whenever a new treatment technique enters the market, the feature variables simply need to be imported and the model re-trained for it to be ready for use.
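As a rough illustration of such an accept/reject model, the Python sketch below (assuming scikit-learn; the features and labels are synthetic stand-ins for the paper's thirteen dosimetric parameters and clinical decisions) trains a small back-propagation network:

    # Toy accept/reject classifier over 13 dosimetric features
    # (synthetic data; illustrative only).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.random((40, 13))                   # 13 parameters per plan
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy accept(1)/reject(0)

    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                          random_state=0).fit(X, y)
    print(model.predict(X[:3]))                # decisions for 3 plans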
Abstract: Film, as an art form, plays a vital role and is a powerful tool in documenting, influencing and shaping society. Films are the collective creation of a large number of separate individuals, each contributing creative input, unique talents and technical expertise to the project. Recently, Malaysian independent (or "indie") filmmakers have made their presence felt by winning awards at various international film festivals. Working in the digital video (DV) format, a number of independent filmmakers have hit their stride with a range of remarkably strong titles; international recognition has been quick in coming, and their works are now regularly in exhibition or in competition, winning many top prizes at prestigious festivals around the world. The interaction factors among crew members are emphasized as imperative for group success. An in-depth interview was conducted to analyze the social interactions and exchanges between filmmakers through Social Exchange Theory (SET). Certainly the new millennium, marked as the digital technology revolution, has changed the face of filmmaking in Malaysia. There is a clear need to study Malaysian independent cinema, especially from the perspective of understanding what causes independent filmmakers to work so well given all of the difficulties and constraints.
Abstract: In this paper, an extended study is performed on the effect of different factors on the quality of vector data, building on a previous study. For the noise factor, one kind of noise that appears in document images, namely Gaussian noise, is studied, whereas the previous study involved only salt-and-pepper noise; both high and low levels of noise are considered. For the noise-cleaning methods, algorithms that were not covered in the previous study are used, namely the Median filter and its variants. For the vectorization factor, one of the best available commercial raster-to-vector software packages, VPstudio, is used to convert raster images into vector format. The performance of line detection is judged using an objective performance evaluation method, and the output of the evaluation is then analyzed statistically to highlight the factors that affect vector quality.
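To illustrate the noise-cleaning stage, the following Python sketch (assuming NumPy and SciPy; the study's specific Median filter variants are not reproduced) applies a 3x3 median filter to a synthetically corrupted line image:

    # Median filtering of a noisy raster image (illustrative sketch).
    import numpy as np
    from scipy.ndimage import median_filter

    rng = np.random.default_rng(1)
    image = np.zeros((64, 64))
    image[30:34, :] = 1.0                      # a horizontal "line"
    noise = rng.random(image.shape) < 0.05     # 5% of pixels flipped
    noisy = np.where(noise, 1.0 - image, image)

    cleaned = median_filter(noisy, size=3)     # 3x3 median window
    print(np.abs(cleaned - image).mean())      # residual error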
Abstract: We present a method to create special domain
collections from news sites. The method only requires a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms it does not
require a second "general" corpus to compute statistics. Third, in our
testing the algorithm outperformed others in creating collections
made up of highly relevant articles.
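The abstract does not name the winning similarity measure; as a minimal illustration of ranking candidates against a single seed without any second corpus, the Python sketch below uses plain term-frequency cosine similarity:

    # One-seed collection building: rank candidates by cosine
    # similarity to the seed over raw term-frequency vectors, so no
    # external corpus statistics are needed (illustrative only).
    import math
    import re
    from collections import Counter

    def tf(text):
        return Counter(re.findall(r"\w+", text.lower()))

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    seed = tf("central bank raises interest rates to curb inflation")
    candidates = ["bank lowers rates as inflation cools",
                  "local team wins the championship final"]
    ranked = sorted(candidates, key=lambda d: cosine(seed, tf(d)),
                    reverse=True)
    print(ranked[0])   # the economics article ranks first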
Abstract: Automatic reading of handwritten cheques is a computationally complex process that plays an important role in financial risk management. Machine vision and learning provide a viable solution to this problem. Research effort has mostly been focused on recognizing diverse pitches of cheques and demand drafts with an identical outline. However, most of these methods employ template matching to localize the pitches, and such schemes can fail when applied to the different types of outline maintained by different banks. In this paper, the so-called outline problem is resolved by a cheque information tree (CIT), which generalizes the localizing method to extract the active-region-of-entities. In addition, a weight-based density plot (WBDP) is used to isolate text entities and read complete pitches. Recognition is based on texture features using neural classifiers; the legal amount is subsequently recognized using both texture and perceptual features. A post-processing phase is invoked to detect incorrect readings with a Type-2 grammar using a Turing machine. The performance of the proposed system was evaluated using cheques and demand drafts from 22 different banks. The test data consist of a collection of 1540 leaves obtained from 10 different account holders from each bank. Results show that this approach can easily be deployed without significant design amendments.
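The abstract gives no formula for the WBDP; one plausible reading is an ink-density projection profile, sketched below in Python (synthetic image; the paper's actual computation may well differ) to isolate a text band:

    # Isolating a text band via an ink-density projection profile
    # (an assumed stand-in for the weight-based density plot).
    import numpy as np

    rng = np.random.default_rng(2)
    page = (rng.random((100, 200)) < 0.01).astype(float)  # faint noise
    page[20:28, 30:170] = 1.0                             # a text band

    row_density = page.mean(axis=1)            # ink weight per row
    text_rows = np.flatnonzero(row_density > 0.3)
    print(text_rows.min(), text_rows.max())    # -> 20 27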
Abstract: As the enormous amount of on-line text on the World-Wide Web grows, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving Connectionist System: an adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach to part-of-speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that the connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.
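As an illustration of recurrent sequence tagging of the kind described, the following PyTorch sketch (toy vocabulary and tag set; the paper's network is not specified in the abstract) trains a minimal RNN to disambiguate parts of speech:

    # Minimal recurrent part-of-speech tagger (illustrative sketch).
    import torch
    import torch.nn as nn

    word_ix = {"the": 0, "dog": 1, "barks": 2}
    tag_ix = {"DET": 0, "NOUN": 1, "VERB": 2}

    class RNNTagger(nn.Module):
        def __init__(self, vocab, tags, dim=16):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.rnn = nn.RNN(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, tags)

        def forward(self, x):                 # x: (batch, seq)
            h, _ = self.rnn(self.emb(x))
            return self.out(h)                # per-token tag scores

    model = RNNTagger(len(word_ix), len(tag_ix))
    xs = torch.tensor([[0, 1, 2]])            # "the dog barks"
    ys = torch.tensor([[0, 1, 2]])            # DET NOUN VERB
    opt = torch.optim.Adam(model.parameters(), lr=0.1)
    for _ in range(50):                       # a few gradient steps
        opt.zero_grad()
        loss = nn.functional.cross_entropy(
            model(xs).view(-1, len(tag_ix)), ys.view(-1))
        loss.backward()
        opt.step()
    print(model(xs).argmax(-1))               # expected: [[0, 1, 2]]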
Abstract: The third phase of the web, the Semantic Web, requires many web pages annotated with metadata; a crucial question is thus where to acquire these metadata. In this paper we propose a semi-automatic method that annotates the texts of documents and web pages and employs a fairly comprehensive knowledge base to categorize instances with respect to an ontology. The approach is evaluated against manual annotations and against one of the most popular annotation tools that works in the same way as ours. The approach, an annotation tool for the Semantic Web, is implemented in the .NET framework and uses WordNet as its knowledge base.
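The paper's system is implemented in .NET; purely to illustrate the WordNet lookup idea, the Python sketch below (assuming NLTK with the WordNet corpus installed) categorizes an instance by the top hypernyms of its first sense:

    # Categorizing an instance against an ontology-like hierarchy via
    # WordNet hypernyms (illustrative; not the paper's .NET system).
    from nltk.corpus import wordnet as wn  # needs nltk WordNet data

    def categories(word, depth=3):
        # Top hypernyms of the word's first sense, root first.
        synsets = wn.synsets(word)
        if not synsets:
            return []
        path = synsets[0].hypernym_paths()[0]
        return [s.name() for s in path[:depth]]

    print(categories("jaguar"))  # e.g. entity -> physical_entity -> ...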
Abstract: XML has become a popular standard for information exchange via the web. Each XML document can be represented as a rooted, ordered, labeled tree, where a node's label shows the exact position of the node in the original document. Region and Dewey encoding are two well-known methods of labeling trees. In this paper, we propose a new insert-friendly labeling method named IFDewey, based on the recently proposed scheme called Extended Dewey. In Extended Dewey, many labels must be modified when a new node is inserted into the XML tree. Our method eliminates this problem by reserving even numbers for future insertions: whereas numbers generated by Extended Dewey may be even or odd, IFDewey modifies Extended Dewey so that only odd numbers are generated, and even numbers can then be used for much easier insertion of nodes.
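The core odd-numbering idea can be sketched as follows in Python (a simplification for illustration, not the full IFDewey specification): siblings receive odd ordinals, so an even number is always available between two adjacent siblings and insertion needs no relabeling:

    # Dewey-style labels using only odd component values, leaving even
    # values free for insertions (simplified illustrative sketch).
    def label_children(parent_label, n):
        return [parent_label + [2 * i + 1] for i in range(n)]

    def label_between(left, right):
        # An even component fits between two odd sibling components.
        assert left[:-1] == right[:-1] and right[-1] - left[-1] >= 2
        return left[:-1] + [left[-1] + 1]

    kids = label_children([1], 3)           # [[1,1], [1,3], [1,5]]
    print(label_between(kids[0], kids[1]))  # -> [1, 2], no relabeling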
Abstract: With the enormous growth of the web, users easily get lost in its rich hyper-structure. Developing user-friendly, automated tools that provide relevant information without redundant links is therefore a primary task for website owners. Most existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones that are likely to contain outlying data such as noise and irrelevant and redundant data. This paper proposes a new algorithm for mining web content that detects redundant links in web documents using set-theoretic operations (from classical mathematics) such as subset, union and intersection. The redundant links are then removed from the original web content so that the user obtains only the required information.
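As a minimal illustration of the set-theoretic idea, the Python sketch below (hypothetical pages and URLs) finds links common to every page by intersection and removes them by set difference:

    # Detecting redundant links with plain set operations: links that
    # recur on every page (navigation boilerplate) are found by
    # intersection and removed by difference (toy data).
    pages = {
        "p1": {"/home", "/about", "/article-1"},
        "p2": {"/home", "/about", "/article-2"},
        "p3": {"/home", "/about", "/article-3"},
    }

    redundant = set.intersection(*pages.values())   # on every page
    cleaned = {name: links - redundant for name, links in pages.items()}
    print(redundant)        # -> {'/home', '/about'}
    print(cleaned["p1"])    # -> {'/article-1'}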
Abstract: An ethnobotanical study was conducted to document local knowledge and the potential of wild edible tubers that have been reported and sighted, and to investigate and record their distribution in Pulau Redang and nearby islands of Terengganu, Malaysia. Information was gathered from 42 villagers using a semi-structured questionnaire; the respondents were selected randomly and no appointment was made prior to the visits. For the distribution survey, the locations of wild edible tubers were recorded using the Global Positioning System (GPS). The wild edible tubers recorded were ubi gadung, ubi toyo, ubi kasu, ubi jaga, ubi seratus and ubi kertas. Dioscorea, commonly known as yam, is reported to be one of the major food sources worldwide. The majority of villagers used Dioscorea hispida Dennst., or ubi gadung, in many ways, such as for food, medicinal purposes and fish poison. The villagers identified ubi gadung by looking at morphological characteristics that include leaf shape, stem and the color of the tuber's flesh.
Abstract: Knowing consumers' preferences and perceptions of the sensory qualities of drink products is very significant to manufacturers and retailers alike; without appropriate sensory analysis, there is a high risk of market disappointment. This paper aims to rank selected coffee products and to determine the best quality attribute through sensory evaluation using a fuzzy decision making model. Three coffee drink products were used for the sensory evaluation. Data were collected from thirty judges at a hypermarket in Kuala Terengganu, Malaysia. The judges were asked to express their sensory evaluation in linguistic terms for the quality attributes of colour, smell, taste and mouth feel for each product, and also the weight of each quality attribute. Five fuzzy linguistic terms representing the quality attributes were introduced prior to the analysis. The judgment membership functions and the weights were compared to rank the products and to determine the best quality attribute. The Indoc product was ranked first, and 'taste' emerged as the best quality attribute. These results underline the importance of sensory evaluation in identifying consumers' preferences and the competency of the fuzzy approach in decision making.
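The abstract does not give the membership functions or weights; the following Python sketch (with assumed triangular fuzzy numbers and weights) illustrates one simple way such linguistic judgments can be aggregated and defuzzified to rank products:

    # Linguistic judgments -> triangular fuzzy numbers -> weighted,
    # defuzzified scores (hypothetical terms and weights).
    TERMS = {  # (low, mid, high) on a 0-1 scale
        "poor": (0.0, 0.0, 0.25), "fair": (0.0, 0.25, 0.5),
        "good": (0.25, 0.5, 0.75), "very good": (0.5, 0.75, 1.0),
        "excellent": (0.75, 1.0, 1.0),
    }
    WEIGHTS = {"colour": 0.2, "smell": 0.2, "taste": 0.4,
               "mouth feel": 0.2}

    def centroid(tfn):                  # defuzzify a triangular number
        return sum(tfn) / 3.0

    def score(judgments):               # attribute -> linguistic term
        return sum(WEIGHTS[a] * centroid(TERMS[t])
                   for a, t in judgments.items())

    products = {
        "A": {"colour": "good", "smell": "fair",
              "taste": "excellent", "mouth feel": "good"},
        "B": {"colour": "very good", "smell": "good",
              "taste": "fair", "mouth feel": "fair"},
    }
    for name, judged in sorted(products.items(),
                               key=lambda kv: -score(kv[1])):
        print(name, round(score(judged), 3))  # ranked product list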
Abstract: Recent advances in the field of computing have led to massive use of web-based electronic documents. Current copyright protection laws are inadequate for proving ownership of electronic documents and do not provide strong protection against copying and manipulating information from the web. This has opened many channels for securing information, and significant evolution has taken place in the area of information security. Digital watermarking has developed into a very dynamic area of research and has addressed challenging issues for digital content. Watermarking can be visible (logos or signatures) or invisible (encoding and decoding). Many visible watermarking techniques have been studied for text documents, but there are very few for web-based text. XML files are used to trade information on the internet and contain important information. In this paper, two invisible watermarking techniques, using synonyms and acronyms, are proposed for XML files to prove intellectual ownership and to achieve security. An analysis is made for different attacks, the capacity that can be embedded in the XML file is measured, and a comparative analysis of capacity is made for both methods. The system has been implemented in C# and all tests were carried out in practice to obtain the results.
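To illustrate the synonym-based embedding idea only (the paper's C# system, its acronym method and its attack and capacity analyses are not reproduced), the Python sketch below hides watermark bits by choosing between synonym pairs inside an XML text node:

    # Synonym-substitution watermarking sketch: each watermark bit
    # selects one member of a synonym pair (hypothetical synonym
    # table and XML content).
    import xml.etree.ElementTree as ET

    PAIRS = [("big", "large"), ("fast", "quick")]  # bit 0 / bit 1

    def embed(text, bits):
        it = iter(bits)
        words = []
        for w in text.split():
            pair = next((p for p in PAIRS if w in p), None)
            words.append(pair[next(it, 0)] if pair else w)
        return " ".join(words)

    root = ET.fromstring("<doc><item>a big and fast parser</item></doc>")
    root.find("item").text = embed(root.find("item").text, [1, 0])
    print(ET.tostring(root, encoding="unicode"))
    # -> <doc><item>a large and fast parser</item></doc>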
Abstract: It is necessary to incorporate technological advances achieved in the field of engineering into dentistry in order to enhance diagnosis and treatment planning and enable doctors to render better treatment to their patients. To achieve this ultimate goal, long-distance collaborations are often necessary. This paper discusses various collaborative tools and their applications to solving a few pressing problems confronted by dentists. Customization is often the solution to most of these problems, but rapid design, development and cost-effective manufacturing are difficult to achieve; this problem can be solved using the technique of digital manufacturing. Cases from six major branches of dentistry are discussed, and possible solutions with the help of state-of-the-art rapid digital manufacturing technology are proposed in the present paper. The paper also covers the usage of existing tools in the collaborative and digital manufacturing area.