Abstract: The development of Internet technology in recent years has led to a more active role of users in creating Web content. This has significant effects both on individual learning and collaborative knowledge building. This paper will present an integrative framework model to describe and explain learning and knowledge building with shared digital artifacts on the basis of Luhmann's systems theory and Piaget's model of equilibration. In this model, knowledge progress is based on cognitive conflicts resulting from incongruities between an individual's prior knowledge and the information which is contained in a digital artifact. Empirical support for the model will be provided by 1) applying it descriptively to texts from Wikipedia, 2) examining knowledge-building processes using a social network analysis, and 3) presenting a survey of a series of experimental laboratory studies.
Abstract: Term extraction, a key data preparation step in text mining, extracts terms, i.e. relevant collocations of words, attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept "Machine Learning"). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting or not interesting. From these examples, the ROGER algorithm learns a numerical function inducing a ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (the Area Under the ROC Curve, AUC). The approach uses a particular representation for word collocations, namely the vector of values of the standard statistical interestingness measures attached to each collocation. As this representation is general across corpora and natural languages, generality tests were performed by applying the ranking function learned from an English corpus in biology to a French corpus of curricula vitae, and vice versa; the approach proved more robust than the state-of-the-art Support Vector Machine (SVM).
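The AUC criterion that ROGER's genetic search maximizes can be sketched in a few lines. The feature vectors and labels below are invented stand-ins for the statistical interestingness measures of labelled collocations, and the weighted-sum scorer stands in for one individual of the genetic population:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney statistic:
    the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(weights, features, labels):
    """Evaluate one candidate ranking function (a weighted sum of the
    interestingness measures) by the AUC it achieves on the examples."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    return auc(scores, labels)

# Hypothetical measure vectors for four collocations, two labelled interesting:
features = [(0.9, 0.8), (0.7, 0.6), (0.2, 0.3), (0.1, 0.4)]
labels = [1, 1, 0, 0]
print(fitness((0.5, 0.5), features, labels))  # 1.0: perfect ranking
```

A genetic algorithm would evolve the weight vector, using this fitness as its selection criterion.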
Abstract: The paper presents the pilot results of a project oriented toward the use of data mining techniques and the knowledge discovered from production systems through them, applied to the management of these systems. Simulation models of manufacturing systems were developed to obtain the necessary data about production. The authors developed a way of storing the data obtained from the simulation models in a data warehouse. A data mining model was created using specific methods and selected techniques for defined problems of production system management. The new knowledge was applied to the production management system and tested on simulation models of the production system. An important benefit of the project is the proposal of a new methodology, focused on data mining from the databases that store operational data about the production process.
Abstract: This paper proposes a method for improving classification efficiency in a classification model. The model is used in a risk search system that extracts specific labels from articles posted on bulletin board sites, allowing the system to analyze the important discussions composed of those articles. The improvement method introduces ensemble learning, which combines multiple classification models, and incorporates expressions related to the specific labels into the generation of word vectors. The paper applies the method to articles collected from three bulletin board sites selected by users and verifies its effectiveness.
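The ensemble step can be sketched as majority voting over the labels proposed by several classification models; the models and article labels below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine label predictions from multiple classification models.
    `predictions` is a list of per-model label lists; ties go to the
    label that sorts first, keeping the rule deterministic."""
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        combined.append(max(sorted(counts), key=counts.get))
    return combined

# Three hypothetical models labelling four articles:
model_a = ["risk", "ok", "risk", "ok"]
model_b = ["risk", "risk", "ok", "ok"]
model_c = ["risk", "ok", "risk", "risk"]
print(majority_vote([model_a, model_b, model_c]))
# ['risk', 'ok', 'risk', 'ok']
```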
Abstract: This paper aims to (1) analyze the profiles of transgressors (detected evaders); (2) examine the reason(s) that triggered a tax audit, the causes of tax evasion, the audit timeframe and the tax penalty charged; and (3) assess whether tax auditors followed the guidelines stated in the 'Tax Audit Framework' when conducting tax audits. In 2011, the Inland Revenue Board Malaysia (IRBM) audited and finalized 557 company cases. With official permission, data on all 557 cases were obtained from the IRBM. Of these, a total of 421 cases with complete information were analyzed. About 58.1% were small and medium corporations, and 32.8% were from the construction industry. Selection for tax audit was based on risk analysis (66.8%), information from third parties (11.1%), and firms with low profitability or fluctuating profit patterns (7.8%). The three persistent causes of tax evasion by firms were over-claimed expenses (46.8%), fraudulent reporting of income (38.5%) and overstated purchases (10.5%). These findings are consistent with past literature. Results showed that tax auditors took six to 18 months to close audit cases. More than half of the tax evaders were fined 45% of the additional tax raised during the audit for a first offence. The study found that tax auditors did follow the guidelines in the 'Tax Audit Framework' in audit selection, settlement and penalty imposition.
Abstract: The healthcare environment is generally perceived as information rich yet knowledge poor, and there is a lack of effective analysis tools to discover hidden relationships and trends in the data. Valuable knowledge can, however, be discovered through the application of data mining techniques in healthcare systems. This study presents an efficient methodology for extracting significant patterns from Coronary Heart Disease warehouses for heart attack prediction, heart attack unfortunately remaining a leading cause of mortality worldwide. For this purpose, we propose to enumerate dynamically the optimal subsets of reduced features of high interest, using rough set theory combined with dynamic programming, and to validate the classification using a Random Forest (RF) decision-tree ensemble to identify risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions, based on the medical profiles of patients. Moreover, experts' knowledge in this field has been taken into consideration in order to define the disease and its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system was developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.
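The bagging idea behind the Random Forest validation step can be sketched with one-level decision stumps trained on bootstrap samples of invented patient records; a real system would use a full RF implementation on the rough-set-reduced feature set:

```python
import random

def train_stump(rows):
    """Pick the single (feature, threshold, direction) split with the
    fewest errors on the given rows."""
    best = None
    for j in range(len(rows[0][0])):
        for x, _ in rows:
            for sign in (1, -1):
                err = sum(1 for x2, y in rows
                          if (1 if sign * (x2[j] - x[j]) > 0 else 0) != y)
                if best is None or err < best[0]:
                    best = (err, j, x[j], sign)
    _, j, t, sign = best
    return lambda x: 1 if sign * (x[j] - t) > 0 else 0

def train_forest(rows, n_trees=15, seed=0):
    """Bootstrap-aggregate stumps and predict by majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(rows) for _ in rows])
              for _ in range(n_trees)]
    return lambda x: int(sum(s(x) for s in stumps) * 2 > len(stumps))

# Hypothetical records: (age, cholesterol) -> heart-disease risk label
rows = [((40, 180), 0), ((45, 190), 0), ((50, 200), 0),
        ((60, 260), 1), ((65, 280), 1), ((70, 300), 1)]
model = train_forest(rows)
print(model((30, 150)), model((80, 340)))
```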
Abstract: Communication has been a significant tool for engaging stakeholders for half a century. In recent years, there has been rapid growth in new technology developments. In tandem with such developments, there has been growing emphasis on communication strategies and management, especially in determining the level of influence of, and the management strategies for, stakeholders in a particular field. This paper presents a research conceptual framework focusing on stakeholder theories and on communication and management strategies to be applied to the engagement of stakeholders in new technology developments in the fertilizer industry in Malaysia. The framework espoused in this paper will provide insights into the various stakeholder theories and engagement strategies from different principals necessary for a successful introduction of new technology developments in the above-stated industry. The proposed framework has theoretical significance in filling a gap in the body of knowledge on the implementation of communication strategies in the Malaysian fertilizer industry.
Abstract: The benefits of rooftop greenery systems in buildings (such as energy savings, reduction of greenhouse gas emissions for mitigating climate change and maintaining sustainable development, and indoor temperature control) are well recognized; however, very little research has been conducted to quantify these benefits in subtropical climates such as Australia's. This study focuses on measuring the temperature profile and the air-conditioning energy savings achieved by implementing rooftop greenery systems in subtropical Central Queensland, Australia. An experimental set-up was installed at the Rockhampton campus of Central Queensland University, where two standard shipping containers (6 m x 2.4 m x 2.4 m) were converted into small offices, one with a green roof and one without, and used for temperature, humidity and energy consumption data collection. The study found that energy savings of up to 11.70% and a temperature difference of up to 4°C can be achieved in March in the subtropical Central Queensland climate. More energy savings are expected on peak summer days (December to February), as the temperature difference between the green roof and the non-green roof is higher in those months.
Abstract: Given that implied volatility is an important factor in financial decision-making, in particular in option pricing and valuation, and that the pricing biases of Leland option pricing models and the implied volatility structure of options are related, this study examines the implied adjusted volatility smile patterns and term structures in S&P/ASX 200 index options using different Leland option pricing models. The examination of the implied adjusted volatility smiles and term structures in the Australian index options market covers the global financial crisis that began in mid-2007. The implied adjusted volatility was found to escalate to approximately triple its pre-crisis rate.
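The inversion step behind any implied-volatility study (backing the volatility out of an observed option price) can be sketched with plain Black-Scholes and bisection; a Leland model would additionally adjust the volatility for transaction costs. All parameters below are illustrative, not the paper's data:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes European call price."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection works because the call price is increasing in sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

price = bs_call(100, 100, 0.05, 0.5, 0.25)   # synthesise a quote at sigma=0.25
print(round(implied_vol(price, 100, 100, 0.05, 0.5), 4))  # 0.25
```

Plotting the recovered volatility against strike and maturity yields the smile patterns and term structures the study analyses.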
Abstract: In this paper, we propose novel algorithmic models based on information fusion and feature transformation in a cross-modal subspace for different types of residue features extracted from several intra-frame and inter-frame pixel sub-blocks in video sequences, for detecting digital video tampering or forgery. An evaluation of the proposed residue features (the noise residue features and the quantization features), their transformation in the cross-modal subspace, and their multimodal fusion for an emulated copy-move tamper scenario shows a significant improvement in tamper detection accuracy compared to single-mode features without transformation in the cross-modal subspace.
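Score-level fusion of the two residue modalities can be sketched as follows; the per-block scores are invented, and the cross-modal subspace transformation itself (e.g. a learned projection) is omitted from this sketch:

```python
def minmax(scores):
    """Bring one modality's scores to a common [0, 1] range."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(noise_scores, quant_scores, w=0.5):
    """Score-level fusion: normalise each modality, then combine with a
    weighted sum; w weights the noise-residue modality."""
    a, b = minmax(noise_scores), minmax(quant_scores)
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

# Hypothetical per-block tamper scores from the two residue features:
noise = [0.1, 0.9, 0.2, 0.8]
quant = [5.0, 40.0, 10.0, 35.0]
fused = fuse(noise, quant, w=0.6)
flags = [s > 0.5 for s in fused]   # blocks flagged as tampered
print(flags)  # [False, True, False, True]
```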
Abstract: This paper explores the effectiveness of machine learning techniques in detecting firms that issue fraudulent financial statements (FFS) and deals with the identification of factors associated with FFS. To this end, a number of experiments were conducted using representative learning algorithms, trained on a data set of 164 fraud and non-fraud Greek firms from the period 2001-2002. Deciding which particular method to choose is a complicated problem; a good alternative to choosing only one method is to create a hybrid forecasting system incorporating a number of possible solution methods as components (an ensemble of classifiers). For this purpose, we have implemented a hybrid decision support system that combines the representative algorithms using a stacking variant methodology and achieves better performance than any of the examined simple and ensemble methods. In sum, this study indicates that the investigation of financial information can be used to identify FFS, and underlines the importance of financial ratios.
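The stacking idea can be sketched as follows: the level-0 classifiers' outputs become the feature vector of a level-1 (meta) model. The base outputs below are invented, and the fixed weighted rule stands in for a trained meta-learner, which would normally be fit on out-of-fold predictions:

```python
def stack_predict(base_preds, meta):
    """Stacking: feed each instance's vector of base-model outputs
    to the level-1 model."""
    return [meta(p) for p in zip(*base_preds)]

# Hypothetical probability-of-fraud outputs from three base classifiers
# on four firms:
base_preds = [
    [0.9, 0.2, 0.6, 0.1],   # e.g. decision tree
    [0.8, 0.3, 0.7, 0.2],   # e.g. naive Bayes
    [0.7, 0.1, 0.4, 0.3],   # e.g. k-NN
]
meta = lambda p: int(0.5 * p[0] + 0.3 * p[1] + 0.2 * p[2] > 0.5)
print(stack_predict(base_preds, meta))  # [1, 0, 1, 0]
```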
Abstract: In any distributed system, process scheduling plays a vital role in determining the efficiency of the system. Process scheduling algorithms are used to ensure that the components of the system maximize their utilization and complete all assigned processes within a specified period of time. This paper focuses on the development of a comparative simulator for distributed process scheduling algorithms. The objectives of the work include the development of the comparative simulator and a comparative study of three distributed process scheduling algorithms: sender-initiated, receiver-initiated and hybrid sender-receiver-initiated algorithms. The comparative study was based on the Average Waiting Time (AWT) and Average Turnaround Time (ATT) of the processes involved. The simulation results show that the performance of the algorithms depends on the number of nodes in the system.
Abstract: To investigate the correspondence of theory and practice, a successfully implemented Knowledge Management System (KMS) is explored through the lens of Alavi and Leidner's proposed framework for the analysis of an information system in knowledge management (Framework-AISKM). The applied KMS was designed to manage curricular knowledge in a distributed university environment. The motivation for the KMS is discussed along with the types of knowledge necessary in an academic setting. Elements of the KMS involved in all phases of capturing and disseminating knowledge are described. As the KMS matures, the resulting data stores form the precursor to, and the potential for, knowledge mining. The findings from this exploratory study indicate substantial correspondence between the successful KMS and the theory-based framework, providing provisional confirmation for the framework while suggesting factors that contributed to the system's success. Avenues for future work are described.
Abstract: This paper deals with the application of a well-known neural network technique, the multilayer back-propagation (BP) neural network, in financial data mining. A modified neural network forecasting model is presented, and an intelligent mining system is developed. The system can forecast buying and selling signals according to the predicted future trend of the stock market, and provide decision support for stock investors. A simulation over seven years of the Shanghai Composite Index shows that the return achieved by this mining system is about three times that achieved by the buy-and-hold strategy, so it is advantageous to apply neural networks to forecasting financial time series, and different investors could benefit from it.
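The comparison against buy-and-hold can be sketched as follows; the prices and buy/sell signals are invented, the signals standing in for the network's forecasts:

```python
def strategy_return(prices, signals):
    """Cumulative return of trading on buy/sell signals: hold the index
    only on days flagged 'buy' (1), stay out otherwise."""
    r = 1.0
    for i in range(1, len(prices)):
        if signals[i - 1] == 1:            # in the market for that day
            r *= prices[i] / prices[i - 1]
    return r - 1.0

prices = [100, 104, 101, 99, 105, 110]     # hypothetical index levels
signals = [1, 0, 0, 1, 1]                  # hypothetical forecast signals
buy_hold = prices[-1] / prices[0] - 1.0
print(round(strategy_return(prices, signals), 4), round(buy_hold, 4))
```

Here the signal-following strategy sidesteps the dip and beats buy-and-hold, which is the kind of comparison the simulation reports.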
Abstract: Nowadays, the Gene Ontology is widely used by researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in the size of the Gene Ontology has caused problems in maintaining and processing it. One way to keep it accessible is to cluster it into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach to the automatic clustering of the Gene Ontology is proposed, incorporating a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of the modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
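A cohesion-and-coupling style metric for a candidate partition can be sketched on a toy graph; the edges and clusters below are hypothetical, not taken from the Gene Ontology:

```python
def cohesion_coupling(edges, clusters):
    """For a partition of a graph: cohesion counts intra-cluster edges,
    coupling counts edges crossing cluster boundaries. A good partition
    has high cohesion and low coupling."""
    where = {n: i for i, c in enumerate(clusters) for n in c}
    intra = sum(1 for u, v in edges if where[u] == where[v])
    return intra, len(edges) - intra

# Toy ontology graph: terms linked by is-a/part-of style relations
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("c", "d")]
print(cohesion_coupling(edges, [{"a", "b", "c"}, {"d", "e"}]))  # (4, 1)
```

A genetic algorithm would search over partitions (and their number k), using such a metric inside its fitness function.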
Abstract: In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First, the dataset is mapped onto a binary image plane. The synthesized image is then processed using efficient image processing techniques to cluster the data in the dataset. The algorithm thus avoids an exhaustive search to identify clusters: it considers only a small set of the data that contains critical boundary information sufficient to identify the contained clusters. Compared to available data clustering techniques, the proposed algorithm produces results of similar quality and outperforms them in execution time and storage requirements.
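The core idea (clusters as connected pixel components of the binary image plane) can be sketched with a flood fill; the points below are hypothetical, and the paper's boundary-based refinements are omitted:

```python
def cluster_pixels(pixels):
    """Label 4-connected components of 'on' pixels in a binary image
    plane; each component is one cluster of the mapped data points."""
    on = set(pixels)
    labels, k = {}, 0
    for p in sorted(on):
        if p in labels:
            continue
        k += 1
        stack = [p]
        while stack:                       # iterative flood fill
            q = stack.pop()
            if q in labels or q not in on:
                continue
            labels[q] = k
            x, y = q
            stack += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return labels

# Two well-separated groups of mapped data points:
pts = [(0, 0), (0, 1), (1, 1), (5, 5), (5, 6)]
lab = cluster_pixels(pts)
print(lab[(0, 0)] == lab[(1, 1)], lab[(0, 0)] != lab[(5, 5)])  # True True
```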
Abstract: The one-class support vector machine "support vector data description" (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, ease of use is crucial. The results of SVDD are largely determined by the choice of the regularisation parameter C and the kernel parameter of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD based solely on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.
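A heavily simplified stand-in for SVDD illustrates the data-description idea: enclose the one-class training set in a sphere and flag points outside it. Real SVDD solves a quadratic program with a kernel; the fraction q here merely mimics the role of the regularisation parameter C that the paper's method tunes:

```python
def fit_sphere(points, q=0.95):
    """Crude data description: centre a sphere at the mean and set the
    radius so roughly a fraction q of training points fall inside."""
    d = len(points[0])
    c = [sum(p[i] for p in points) / len(points) for i in range(d)]
    dists = sorted(sum((p[i] - c[i]) ** 2 for i in range(d)) ** 0.5
                   for p in points)
    r = dists[min(len(dists) - 1, int(q * len(dists)))]
    return c, r

def is_outlier(x, c, r):
    return sum((x[i] - c[i]) ** 2 for i in range(len(c))) ** 0.5 > r

# Hypothetical one-class (all-normal) training data:
pts = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1)]
c, r = fit_sphere(pts, q=1.0)
print(is_outlier((5.0, 5.0), c, r), is_outlier((0.0, 0.1), c, r))  # True False
```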
Abstract: Much research on handwritten Thai character recognition has been proposed, based for example on comparing the heads of characters, fuzzy logic, or structure trees. This paper presents a system for handwritten Thai character recognition based on the Ant-miner algorithm (data mining based on ant colony optimization). Zoning is initially used to partition each character. Then three distinct features (also called attributes) of each character in each zone are extracted: Head zone, End point, and Feature code. All attributes are used to construct the classification rules with the Ant-miner algorithm in order to classify 112 Thai characters. For this experiment, the Ant-miner algorithm was adapted with a small change to increase the recognition rate. The result of the experiment is a 97% recognition rate on the training set (11,200 characters) and an 82.7% recognition rate on unseen test data (22,400 characters).
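The zoning step can be sketched as follows; the bitmap is a toy stand-in for a scanned character, and counting ink pixels per zone is a generic zoning attribute rather than the paper's specific Head-zone/End-point/Feature-code attributes:

```python
def zone_features(bitmap, zones=3):
    """Split a character bitmap into zones x zones regions and count the
    'ink' pixels in each, yielding a simple zoning attribute vector."""
    h, w = len(bitmap), len(bitmap[0])
    feats = [[0] * zones for _ in range(zones)]
    for y, row in enumerate(bitmap):
        for x, px in enumerate(row):
            if px:
                feats[y * zones // h][x * zones // w] += 1
    return [n for row in feats for n in row]

# 6x6 toy bitmap: a stroke down the left side with a head at the top
bm = [[1, 1, 0, 0, 0, 0],
      [1, 1, 0, 0, 0, 0],
      [1, 0, 0, 0, 0, 0],
      [1, 0, 0, 0, 0, 0],
      [1, 0, 0, 0, 0, 0],
      [1, 1, 1, 1, 0, 0]]
print(zone_features(bm))  # [4, 0, 0, 2, 0, 0, 3, 2, 0]
```

Such per-zone attribute vectors are what the rule-induction step would consume.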
Abstract: The purpose of this Classifying Bird Sounds (chip notes) project is to reduce the unwanted noise in recorded bird sound chip notes, design a scheme to detect differences and similarities between recorded chip notes, and classify bird sound chip notes. Technologies for determining the similarity of sound waves have been used in communication, sound engineering and wireless sound applications for many years. Our research is focused on the similarity of chip notes, which are the sounds from different birds. The program we use is written in Microsoft C#.
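A basic similarity measure for two chip-note waveforms is normalised cross-correlation at zero lag, sketched here in Python rather than the project's language; the sample waveforms are synthetic:

```python
def similarity(a, b):
    """Normalised cross-correlation at zero lag: 1.0 for identical
    waveforms (up to amplitude scale), near 0 for unrelated ones."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den

chip1 = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
chip2 = [0.0, 0.25, 0.5, 0.25, 0.0, -0.25, -0.5, -0.25]  # same shape, quieter
chip3 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]     # different bird
print(round(similarity(chip1, chip2), 3), round(similarity(chip1, chip3), 3))
# 1.0 0.0
```

Classification then amounts to comparing a new chip note against reference notes and picking the most similar bird.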
Abstract: These days people love to travel around the world. Regardless of location and time, travelers, and especially Muslims, still need to perform their prayers. Normally, travelers need to bring maps and a compass, and Muslims even have to bring a Qibla pointer when they travel, as it is somewhat difficult to determine the Qibla direction and to know the time for each prayer. As technology advances, many PDAs are equipped with maps and GPS to determine the user's location. In this paper we present a new electronic device called the Mobile Qibla and Prayer Time Finder, which locates the Qibla direction and determines each prayer time based on the user's current location using a PDA. The device uses a PIC microcontroller equipped with a digital compass; it communicates with the PDA using Bluetooth technology and displays the exact Qibla direction and prayer times automatically anywhere in the world. The device is reliable and accurate in determining the Qibla direction and prayer time.
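The Qibla direction is the initial great-circle bearing from the user's position to the Kaaba. A spherical-Earth sketch (ignoring magnetic declination, which the device's digital compass would have to account for) looks like this:

```python
from math import radians, degrees, sin, cos, atan2

KAABA = (21.4225, 39.8262)   # latitude, longitude of the Kaaba

def qibla_bearing(lat, lon):
    """Initial great-circle bearing, in degrees clockwise from true
    north, from (lat, lon) to the Kaaba."""
    p1, p2 = radians(lat), radians(KAABA[0])
    dl = radians(KAABA[1] - lon)
    y = sin(dl) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl)
    return (degrees(atan2(y, x)) + 360.0) % 360.0

# Kuala Lumpur (3.14 N, 101.69 E): the Qibla points roughly west-northwest.
print(round(qibla_bearing(3.14, 101.69), 1))
```

The GPS supplies (lat, lon); the device rotates this bearing relative to the compass heading to show the direction on screen.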