Abstract: In EFL programs, rating scales used in writing
assessment are often constructed by intuition. Intuition-based scales
tend to provide inaccurate and divisive ratings of learners’ writing
performance. Hence, following an empirical approach, this study
attempted to develop a rating scale for elementary-level writing at an
EFL program in Saudi Arabia. Towards this goal, 98 students’ essays
were scored and then coded using a comprehensive taxonomy of writing constructs and their measures. Automatic linear modeling was run to determine which measures would best predict essay scores.
A nonparametric ANOVA, the Kruskal-Wallis test, was then used to
determine which measures could best differentiate among scoring
levels. Findings indicated that there were certain measures that could
serve as either good predictors of essay scores or differentiators
among scoring levels, or both. The main conclusion was that a rating
scale can be empirically developed using predictive and
discriminative statistical tests.
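The discriminative step described above can be sketched with a Kruskal-Wallis test. The groups and values below are synthetic stand-ins for a writing measure grouped by scoring level, not the study's data.

```python
from scipy.stats import kruskal

# Hypothetical values of one writing measure (e.g., lexical diversity)
# grouped by scoring level; the numbers are illustrative only.
low_level = [0.31, 0.28, 0.35, 0.30, 0.33]
mid_level = [0.42, 0.45, 0.40, 0.44, 0.41]
high_level = [0.55, 0.52, 0.58, 0.54, 0.57]

# Kruskal-Wallis H-test: does the measure differentiate scoring levels?
h_stat, p_value = kruskal(low_level, mid_level, high_level)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

A measure with a small p-value here would qualify as a differentiator among scoring levels in the sense used by the abstract.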
Abstract: One of the most important tasks in risk management is the correct determination of the probability of default (PD) of particular financial subjects. This paper discusses the possibility of determining a financial institution's PD by means of credit-scoring models. The paper is divided into two parts. The first part is devoted to the estimation of three different models (based on linear discriminant analysis, logit regression and probit regression) from a sample of almost three hundred US commercial banks. These models are then compared and verified on a control sample in order to choose the best one. The second part of the paper applies the chosen model to the portfolios of three key Czech banks to estimate their present financial stability. However, it is no less important to be able to estimate the evolution of PD in the future. For this reason, the second task of this paper is to estimate the probability distribution of the future PD for the Czech banks. To this end, the values of particular indicators are sampled randomly and the distribution of PDs is estimated, assuming that the indicators follow a multidimensional subordinated Lévy model (specifically, the Variance Gamma and Normal Inverse Gaussian models). Although the obtained results show that all the banks are relatively healthy, there is still a high chance, at least in terms of probability, that a financial crisis will occur. This is indicated by the estimates of various quantiles of the estimated distributions. Finally, it should be noted that the applicability of the estimated model (with respect to the data used) is limited to the recessionary phase of the financial market.
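The logit branch of the approach above can be sketched as follows. The indicators, the synthetic default rule, and the "weak"/"strong" bank profiles are all illustrative assumptions, not the paper's US bank sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for two financial indicators (e.g., capital ratio,
# ROA); default is made more likely when both indicators are low.
n = 300
X = rng.normal(size=(n, 2))
default = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) < -1).astype(int)

logit = LogisticRegression().fit(X, default)
# PD for a hypothetical weak bank (both indicators low) vs a strong one.
pd_weak = logit.predict_proba([[-2.0, -2.0]])[0, 1]
pd_strong = logit.predict_proba([[2.0, 2.0]])[0, 1]
print(f"PD(weak) = {pd_weak:.2f}, PD(strong) = {pd_strong:.2f}")
```

The discriminant-analysis and probit variants differ only in the link function and estimation procedure; the comparison on a control sample then picks whichever scores best.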
Abstract: The assessment of the risk posed by a borrower to a
lender is one of the common problems that financial institutions have
to deal with. Consumers applying for a mortgage are generally compared to each other by means of a number called the Credit Score, which is generated by applying a mathematical algorithm to the information in the applicant's credit report. The higher the credit score, the lower the risk posed by the candidate and the more willing a lender is to take the candidate on. The objective of the present work is to
use fuzzy logic and linguistic rules to create a model that generates
Credit Scores.
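A minimal sketch of the fuzzy-logic idea follows. The membership functions, the two linguistic rules, and the consequent score values (800/500) are invented for illustration and are not the model proposed in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def credit_score(income, debt_ratio):
    """Weighted-average (Sugeno-style) defuzzification over two toy rules."""
    # Rule 1: IF income is high AND debt ratio is low THEN score is high (800)
    r1 = min(tri(income, 40, 100, 160), tri(debt_ratio, -0.4, 0.0, 0.4))
    # Rule 2: IF income is low OR debt ratio is high THEN score is low (500)
    r2 = max(tri(income, -20, 20, 60), tri(debt_ratio, 0.3, 0.8, 1.3))
    if r1 + r2 == 0:
        return 600.0  # neutral default when no rule fires
    return (r1 * 800 + r2 * 500) / (r1 + r2)

print(credit_score(income=90, debt_ratio=0.1))   # mostly rule 1 fires (high)
print(credit_score(income=25, debt_ratio=0.9))   # mostly rule 2 fires (low)
```

A real system would use more inputs (payment history, utilization, etc.) and many more rules, but the firing-strength-weighted aggregation is the core mechanism.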
Abstract: Protein kinases participate in a myriad of cellular
processes of major biomedical interest. The in vivo substrate
specificity of these enzymes is a process determined by several
factors, and despite several years of research on the topic, is still
far from being totally understood. In the present work, we have
quantified the contributions to the kinase substrate specificity of
i) the phosphorylation sites and their surrounding residues in the
sequence and of ii) the association of kinases to adaptor or scaffold
proteins. We have used position-specific scoring matrices (PSSMs),
to represent the stretches of sequences phosphorylated by 93 families
of kinases. We have found negative correlations between the number
of sequences from which a PSSM is generated and the statistical
significance and the performance of that PSSM. Using a subset
of 22 statistically significant PSSMs, we have identified specificity
determinant residues (SDRs) for 86% of the corresponding kinase
families. Our results suggest that different SDRs can function as
positive or negative elements of substrate recognition by the different
families of kinases. Additionally, we have found that human proteins
with known function as adaptors or scaffolds (kAS) tend to interact
with a significantly large fraction of the substrates of the kinases to
which they associate. Based on this characteristic we have identified
a set of 279 potential adaptors/scaffolds (pAS) for human kinases,
which is enriched in Pfam domains and functional terms tightly
related to the proposed function. Moreover, our results show that for 74.6% of the kinase–pAS associations found, the pAS colocalize with the substrates of the kinases with which they are associated. Finally, we have found evidence suggesting that the association of kinases with adaptors and scaffolds may contribute significantly to diminishing the in vivo substrate cross-specificity of protein kinases. In general, our
results indicate the relevance of several SDRs for both the positive
and negative selection of phosphorylation sites by kinase families and
also suggest that the association of kinases to pAS proteins may be
an important factor for the localization of the enzymes with their set
of substrates.
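The way a PSSM scores a candidate phosphosite window can be sketched as below. The matrix values, window length, and peptides are toy illustrations, not any of the study's 93 kinase-family PSSMs.

```python
# Toy position-specific scoring matrix for a 5-residue window centred on
# the phosphoacceptor: per-position log-odds of residues (invented values).
pssm = [
    {"R": 1.2, "K": 0.9, "A": -0.3},   # position -2
    {"R": 1.5, "K": 1.1, "A": -0.5},   # position -1
    {"S": 2.0, "T": 1.6, "A": -2.0},   # position 0 (phosphoacceptor)
    {"P": 0.8, "A": 0.1, "R": -0.2},   # position +1
    {"L": 0.7, "A": 0.0, "R": -0.4},   # position +2
]

def score_window(window, pssm, default=-1.0):
    """Sum per-position log-odds; unseen residues get a default penalty."""
    return sum(col.get(res, default) for res, col in zip(window, pssm))

print(score_window("RRSPL", pssm))  # basophilic-like site, high score
print(score_window("AAAAA", pssm))  # poor match, low score
```

Positions with strongly positive entries act as positive specificity determinants, while strongly negative entries act as negative elements of recognition, matching the SDR interpretation in the abstract.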
Abstract: A rubric is a very important tool for teachers and students for a variety of purposes. Teachers use rubrics for evaluating student work, while students use rubrics for self-assessment. This paper therefore focuses on the scoring rubric as a scoring tool for teachers in an environment of Competency Based Education and Training (CBET) in Malaysian vocational colleges. A total of three teachers in the fields of electrical and electronic engineering were interviewed to identify how the use of rubrics has been practiced since the vocational transformation was implemented in 2012. Overall, a holistic rubric was used to determine students' performance in the skills area.
Abstract: In this paper we propose a new classification method for automatic sleep scoring using an artificial neural network based decision tree. It treats the sleep scoring process as a series of two-class problems and solves them with a decision tree made up of a group of neural network classifiers, each of which uses a special feature set and is aimed at only one specific sleep stage in order to maximize the classification effect. A single electroencephalogram (EEG) signal is used for our analysis rather than multiple biological signals, which greatly simplifies the data acquisition process. Experimental results demonstrate that the average epoch-by-epoch agreement between visual scoring and the proposed method in separating 30 s wakefulness+S1, REM, S2 and SWS epochs was 88.83%. This study shows that the proposed method performed well in all four stages and can effectively limit error propagation at the same time. It could, therefore, be an efficient method for automatic sleep scoring. Additionally, since it requires only a small volume of data, it could be suited to pervasive applications.
Abstract: Psoriasis is a chronic inflammatory skin condition
which affects 2-3% of the population around the world. Psoriasis Area
and Severity Index (PASI) is a gold standard to assess psoriasis
severity as well as the treatment efficacy. Although a gold standard,
PASI is rarely used because it is tedious and complex. In practice, the PASI score is determined subjectively by dermatologists; therefore, inter- and intra-rater variations in assessment can occur even among expert dermatologists. This research develops an algorithm to assess psoriasis lesions for objective PASI scoring. The focus of this research is thickness assessment, one of the four PASI parameters besides area, erythema and scaliness. Psoriasis lesion thickness is measured by averaging the total elevation from the lesion base to the lesion surface. Thickness values of 122 3D images taken from 39 patients are grouped into four PASI thickness scores using K-means clustering. Validation of the lesion base construction is performed using twelve body curvature models and shows good results, with a coefficient of determination (R²) equal to 1.
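The clustering step can be sketched as below. The thickness values are synthetic stand-ins for the 122 measured images; the group means and counts are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic lesion thickness values (in mm, illustrative) forming four
# loose groups, mimicking the four PASI thickness score bands.
thickness = np.concatenate([
    rng.normal(0.1, 0.03, 30),   # minimal elevation
    rng.normal(0.4, 0.05, 35),
    rng.normal(0.9, 0.08, 35),
    rng.normal(1.6, 0.10, 22),   # marked elevation
]).reshape(-1, 1)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(thickness)
# Order the cluster centres so cluster index maps onto a 1-4 score.
centres = sorted(km.cluster_centers_.ravel())
print([round(c, 2) for c in centres])
```

Sorting the centres gives a natural mapping from cluster membership to an ordinal thickness score.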
Abstract: The objective of the current study is to investigate the differences between winning and losing teams in terms of goal scoring and passing sequences. A total of 31 matches from UEFA EURO 2012 were analyzed; 5 matches were excluded from the analysis because they ended in a draw. Two groups of variables were used in the study: i. goal scoring variables and ii. passing sequence variables. Data were analyzed using the Wilcoxon matched-pairs rank test with the significance level set at p < 0.05. The current study found that goal scoring was significantly higher for the winning team in the 1st half (Z=-3.416, p=.001) and 2nd half (Z=-3.252, p=.001). The scoring frequency was also found to increase as time progressed, and the last 15 minutes of the game was the time interval in which the most goals were scored. The indicators that differed significantly between winning and losing teams were goals scored (Z=-4.578, p=.000), headers (Z=-2.500, p=.012), the right foot (Z=-3.788, p=.000), corners (Z=-2.126, p=.033), open play (Z=-3.744, p=.000), inside the penalty box (Z=-4.174, p=.000), attackers (Z=-2.976, p=.003) and midfielders (Z=-3.400, p=.001). Regarding the passing sequences, there was a significant difference between the two teams in short passing sequences (Z=-4.141, p=.000), while for long passing sequences there was no significant difference (Z=-1.795, p=.073). The data gathered in the present study can be used by coaches to construct detailed training programs based on their objectives.
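The paired test used throughout the abstract can be sketched as follows. The goal counts are invented for illustration, not the EURO 2012 data.

```python
from scipy.stats import wilcoxon

# Goals scored by the winning vs losing side in hypothetical paired
# matches (illustrative numbers only).
winning = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
losing  = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]

# Wilcoxon matched-pairs signed-rank test on the paired differences.
stat, p_value = wilcoxon(winning, losing)
print(f"W = {stat}, p = {p_value:.4f}")
```

Each indicator in the study (headers, corners, open play, etc.) would be tested the same way, pairing the winning and losing team of each match.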
Abstract: Protein-protein interactions (PPI) play a crucial role in many biological processes such as cell signalling, transcription, translation, replication, signal transduction, and drug targeting. Structural information about protein-protein interactions is essential for understanding the molecular mechanisms of these processes. Structures of protein-protein complexes are still difficult to obtain by biophysical methods such as NMR and X-ray crystallography, and therefore protein-protein docking computation is considered an important approach for understanding protein-protein interactions. However, reliable prediction of protein-protein complexes remains an open problem. In the past decades, several grid-based docking algorithms based on the Katchalski-Katzir scoring scheme were developed, e.g., FTDock, ZDOCK, HADDOCK, RosettaDock, and HEX. However, the success rate of protein-protein docking prediction is still far from ideal. In this work, we first propose a more practical measure for evaluating the success of protein-protein docking predictions, the rate of first success (RFS), which is similar to the concept of mean first passage time (MFPT). Accordingly, we have assessed the ZDOCK bound and unbound benchmarks 2.0 and 3.0. We also created a new benchmark set for protein-protein docking predictions in which the complexes have experimentally determined binding affinity data. We performed free energy calculations based on the solution of the non-linear Poisson-Boltzmann equation (nlPBE) to improve binding mode prediction. We used the well-studied barnase-barstar system to validate the parameters for the free energy calculations. In addition, nlPBE-based free energy calculations were conducted for the cases badly predicted by ZDOCK and ZRANK.
We found that direct molecular mechanics energetics cannot be used to discriminate the native binding pose from the decoys. Our results indicate that nlPBE-based calculations appear to be one of the promising approaches for improving the success rate of binding pose predictions.
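The Katchalski-Katzir grid idea behind FTDock-style programs can be sketched in 2D: discretise the two molecules on a grid and score every translation at once by FFT cross-correlation. The binary square "shapes" below are toy stand-ins for real protein grids, which also carry surface/core weights and electrostatic terms.

```python
import numpy as np

grid = 32
receptor = np.zeros((grid, grid))
receptor[8:16, 8:16] = 1.0          # toy receptor body
ligand = np.zeros((grid, grid))
ligand[0:8, 0:8] = 1.0              # toy ligand body at the origin

# Cross-correlation over all cyclic translations in one pass:
# scores[t] = sum_y receptor(y + t) * ligand(y)
scores = np.real(np.fft.ifft2(np.fft.fft2(receptor) *
                              np.conj(np.fft.fft2(ligand))))
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best)   # translation that superimposes the ligand on the receptor
```

Real docking codes repeat this correlation over many ligand rotations and rank the resulting poses, which is exactly the decoy list that measures such as RFS evaluate.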
Abstract: This paper presents an economic game for sybil
detection in a distributed computing environment. Cost parameters
reflecting impacts of different sybil attacks are introduced in the sybil
detection game. The optimal strategies for this game in which both
sybil and non-sybil identities are expected to participate are devised.
A cost sharing economic mechanism called Discriminatory
Rewarding Mechanism for Sybil Detection is proposed based on this
game. A detective accepts a security deposit from each active agent,
negotiates with the agents and offers rewards to the sybils if the latter
disclose their identity. The basic objective of the detective is to
determine the optimum reward amount for each sybil which will
encourage the maximum possible number of sybils to reveal
themselves. Maintaining privacy is an important issue for the
mechanism since the participants involved in the negotiation are
generally reluctant to share their private information. The mechanism
has been applied to Tor by introducing a reputation scoring function.
Abstract: In this paper we focus on event extraction from Tamil
news article. This system utilizes a scoring scheme for extracting and
grouping event-specific sentences. Using this scoring scheme, event-specific clustering is performed for multiple documents. Events are
extracted from each document using a scoring scheme based on
feature score and condition score. Similarly event specific sentences
are clustered from multiple documents using this scoring scheme.
The proposed system builds the Event Template based on user
specified query. The templates are filled with event specific details
like person, location and timeline extracted from the formed clusters.
The proposed system applies these methodologies for Tamil news
articles that have been enconverted into UNL graphs using a Tamil to
UNL-enconverter. The main intention of this work is to generate an
event-based template.
Abstract: The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when question answering systems began to be more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and at the same time achieve higher quality responses. The research in this paper discusses the inappropriateness of existing measures for response quality evaluation and, in a later part, brings forward the call for new standard measures and the related considerations. As a short-term solution for evaluating the response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different nature, this research presents a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).
Abstract: State-of-the-art methods for secondary structure (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility (Sable, ACCpro) predictions use evolutionary profiles represented by the position-specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately, applying the PSSM matrix leads to high-dimensional feature spaces that may create problems with parameter optimization and generalization. Several recently published studies suggested that applying feature extraction to the PSSM matrix may result in improvements in secondary structure predictions. However, none of the top-performing methods considered here utilizes dimensionality reduction to improve generalization. In the present study, we used simple and fast methods for feature selection (t-statistics, information gain) that allow us to decrease the dimensionality of the PSSM matrix by 75% and improve generalization in the case of secondary structure prediction compared to the Sable server.
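The information-gain filter can be sketched with scikit-learn's mutual-information scorer. The synthetic data below stands in for a flattened PSSM window; keeping the top 25% of features mirrors the 75% reduction mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for flattened PSSM features: 100 dimensions, of
# which only a few are informative for the class label.
X, y = make_classification(n_samples=400, n_features=100,
                           n_informative=10, random_state=0)

# Rank features by mutual information (information gain) with the label
# and keep the top 25, i.e. a 75% dimensionality reduction.
selector = SelectKBest(mutual_info_classif, k=25).fit(X, y)
X_reduced = selector.transform(X)
print(X_reduced.shape)
```

A t-statistic filter would work the same way, only with a different per-feature scoring function.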
Abstract: It has become crucial over the years for nations to improve their credit scoring methods and techniques in light of the increasing volatility of the global economy. Statistical methods and tools have been the favoured means for this; however, artificial intelligence and soft computing based techniques are becoming increasingly preferred due to their proficient and precise nature and relative simplicity. This work presents a comparison between Support Vector Machines and Artificial Neural Networks, two popular soft computing models, when applied to credit scoring. Among the different criteria that can be used for comparison, accuracy, computational complexity and processing time are the criteria selected to evaluate both models. Furthermore, the German credit scoring dataset, which is a real-world dataset, is used to train and test both developed models. Experimental results obtained from our study suggest that although both soft computing models can be used with a high degree of accuracy, Artificial Neural Networks deliver better results than Support Vector Machines.
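The comparison setup can be sketched as below. A synthetic dataset stands in for the German credit data, and the model hyperparameters are illustrative defaults, not the tuned configurations of the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a credit dataset: 20 applicant features,
# binary good/bad credit label.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

svm = SVC().fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

acc_svm = accuracy_score(y_te, svm.predict(X_te))
acc_ann = accuracy_score(y_te, ann.predict(X_te))
print(f"SVM: {acc_svm:.3f}  ANN: {acc_ann:.3f}")
```

Processing time and model complexity, the other two criteria, would be measured around the `fit` and `predict` calls.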
Abstract: The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when chatterbots began to become more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and at the same time to achieve high quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation, and the call for new standard measures and related considerations is brought forward. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different nature, this research proposes a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems: AnswerBus, START and AINI.
Abstract: Throughout this paper, a relatively new technique, the Tabu search variable selection model, is elaborated, showing how it can be efficiently applied within the financial world whenever researchers come across the selection of a subset of variables from a whole set of descriptive variables under analysis. In the field of financial prediction, researchers often have to select a subset of variables from a larger set to solve different types of problems such as corporate bankruptcy prediction, personal bankruptcy prediction, mortgage and credit scoring, and the Arbitrage Pricing Model (APM). Consequently, to demonstrate how the method operates and to illustrate its usefulness as well as its superiority compared to other commonly used methods, the Tabu search algorithm for variable selection is compared to two main alternative search procedures, namely stepwise regression and the maximum R² improvement method. The Tabu search is then implemented in finance, where it attempts to predict corporate bankruptcy by selecting the most appropriate financial ratios and thus creating its own prediction score equation. In comparison to other methods, most notably the Altman Z-Score model, the Tabu search model produces a higher success rate in correctly predicting the failure of firms or the continued running of existing entities.
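The Tabu search mechanics for variable selection can be sketched as follows. The objective function is a toy stand-in for a regression fit criterion: it rewards an invented "true" subset of informative ratios and penalises extra variables, which is not the paper's actual scoring equation.

```python
# Toy objective: the (hypothetical) informative variables are 0 and 3;
# each superfluous variable costs 0.4, standing in for a fit criterion.
TRUE = {0, 3}

def score(subset):
    return len(TRUE & subset) - 0.4 * len(subset - TRUE)

def tabu_search(n_vars=8, iters=50, tabu_len=5):
    """Toggle one variable in/out per move; recent moves are tabu."""
    current = set()
    best, best_s = set(current), score(current)
    tabu = []
    for _ in range(iters):
        moves = [v for v in range(n_vars) if v not in tabu]
        # Best admissible neighbour: toggle the most improving variable,
        # even if the move worsens the current solution (escapes optima).
        v = max(moves, key=lambda v: score(current ^ {v}))
        current ^= {v}
        tabu = (tabu + [v])[-tabu_len:]
        if score(current) > best_s:
            best, best_s = set(current), score(current)
    return best

print(tabu_search())   # expected to recover the informative subset
```

The tabu list is what distinguishes this from plain hill climbing: recently toggled variables cannot be immediately reversed, forcing the search through worse neighbourhoods instead of cycling.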
Abstract: Small interfering RNA (siRNA) alters the regulatory role of mRNA during gene expression by translational inhibition. Recent studies show that upregulation of mRNA causes serious diseases like cancer. Therefore, designing effective siRNAs with good knockdown effects plays an important role in gene silencing. Various siRNA design tools have been developed earlier. In this work, we analyze the existing well-scoring second-generation siRNA prediction tools and try to optimize the efficiency of siRNA prediction by designing a computational model using an Artificial Neural Network and the whole stacking energy (%G), which may help in gene silencing and drug design for cancer therapy. Our model is trained and tested against a large dataset of siRNA sequences. Validation of our results is done by finding the correlation coefficient between experimental and observed inhibition efficacies of siRNA. We achieved a correlation coefficient of 0.727 in our previous computational model, and we could improve the correlation coefficient up to 0.753 when the threshold of the whole stacking energy is greater than or equal to -32.5 kcal/mol.
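The validation step, a Pearson correlation between predicted and observed inhibition efficacies, can be sketched as below. The efficacy values are synthetic placeholders, not the study's siRNA dataset.

```python
import numpy as np

# Hypothetical predicted vs experimentally observed siRNA inhibition
# efficacies (fractions); the numbers are for illustration only.
predicted = np.array([0.82, 0.45, 0.91, 0.30, 0.67, 0.55, 0.78, 0.40])
observed  = np.array([0.80, 0.50, 0.88, 0.35, 0.60, 0.52, 0.75, 0.48])

# Pearson correlation coefficient between the two series.
r = np.corrcoef(predicted, observed)[0, 1]
print(f"correlation coefficient: {r:.3f}")
```

A value near 1 indicates the model's ranking of siRNAs tracks the experimental knockdown well; the paper reports improving this figure from 0.727 to 0.753.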
Abstract: Sleep stage scoring is the process of classifying the sleep stage the subject is in. Sleep is classified into two states based on a constellation of physiological parameters: non-rapid eye movement (NREM) and rapid eye movement (REM). NREM sleep is further classified into four stages (1-4). These states and the state of wakefulness are distinguished from each other based on brain activity. In this work, a classification method for automated sleep stage scoring based on a single EEG recording using wavelet packet decomposition was implemented. Thirty-two polysomnographic recordings from the MIT-BIH database were used for training and validation of the proposed method. A single EEG recording was extracted and smoothed using a Savitzky-Golay filter. Wavelet packet decomposition up to the fourth level, based on a 20th-order Daubechies filter, was used to extract features from the EEG signal. A feature vector of 54 features was formed. It was reduced to a size of 25 using the gain ratio method and fed into a classifier of regression trees. The regression trees were trained using 67% of the available records, selected based on cross-validation of the records. The remaining records were used for testing the classifier. The overall correct rate of the proposed method was found to be around 75%, which is acceptable compared to the techniques in the literature.
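The smoothing stage of the pipeline can be sketched with SciPy's Savitzky-Golay filter. The synthetic "EEG" below (a slow 2 Hz delta-like rhythm in noise), the sampling rate, and the filter window are illustrative choices, not the study's settings.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(2)
# Synthetic 30 s epoch sampled at 100 Hz: a 2 Hz rhythm plus noise.
fs = 100
t = np.arange(30 * fs) / fs
clean = np.sin(2 * np.pi * 2 * t)
eeg = clean + 0.5 * rng.normal(size=t.size)

# Savitzky-Golay smoothing: local cubic fits over an 11-sample window.
smoothed = savgol_filter(eeg, window_length=11, polyorder=3)

# The slow rhythm is preserved while the noise power drops.
noise_before = np.var(eeg - clean)
noise_after = np.var(smoothed - clean)
print(noise_before, noise_after)
```

The smoothed epoch would then be passed to the wavelet packet decomposition stage for feature extraction.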
Abstract: This article proposes a methodology for computer numerical control (CNC) machine scoring. The case study company is a manufacturer of hard disk drive parts in Thailand. In this company, samples of parts manufactured on CNC machines are usually taken randomly for quality inspection. These inspection data are used to make a decision to shut down a machine if it has a tendency to produce parts that are out of specification. A large amount of data is produced in this process, and data mining can be a very useful technique for analyzing it. In this research, data mining techniques were used to construct a machine scoring model called the 'machine priority assessment model (MPAM)'. This model helps to ensure that machines with a higher risk of producing defective parts are inspected before those with a lower risk. If a defect-prone machine is identified sooner, defective parts and rework can be reduced, hence improving overall productivity. The results showed that the proposed method can be successfully implemented and that approximately 351,000 baht of opportunity cost could have been saved in the case study company.
Abstract: Using Dynamic Bayesian Networks (DBN) to model genetic regulatory networks from gene expression data is one of the major paradigms for inferring the interactions among genes. Averaging over a collection of models for predicting the network is desirable, rather than relying on a single high-scoring model. In this paper, two kinds of model searching approaches are compared: Greedy hill-climbing Search with Restarts (GSR) and Markov Chain Monte Carlo (MCMC) methods. GSR is preferred in many papers, but there has been no comparative study of which one is better for DBN models. Different types of experiments have been carried out to provide a benchmark test of these approaches. Our experimental results demonstrate that, on average, the MCMC methods outperform GSR in the accuracy of the predicted network while having comparable time efficiency. By proposing different variations of MCMC and employing a simulated annealing strategy, the MCMC methods become more efficient and stable. Apart from the comparison between these approaches, another objective of this study is to investigate the feasibility of using DBN modeling approaches for inferring gene networks from a few snapshots of high-dimensional gene profiles. Through synthetic data experiments as well as systematic data experiments, the results revealed how the performance of these approaches is influenced as the target gene network varies in network size, data size, and system complexity.
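The MCMC-with-model-averaging idea can be sketched as a Metropolis-Hastings walk over edge sets. The four-gene network, the "true" edges, and the additive score function below are toy assumptions standing in for a real Bayesian network score on expression data.

```python
import math
import random

random.seed(0)

# Toy structure space: directed edges among 4 genes. The made-up score
# rewards edges of a hypothetical true network and penalises the rest.
GENES = range(4)
EDGES = [(i, j) for i in GENES for j in GENES if i != j]
TRUE = {(0, 1), (1, 2), (2, 3)}

def log_score(g):
    return 2.0 * len(g & TRUE) - 1.0 * len(g - TRUE)

def mcmc(steps=5000):
    """Metropolis-Hastings: propose toggling one edge, accept by score."""
    g = set()
    counts = {e: 0 for e in EDGES}
    for _ in range(steps):
        e = random.choice(EDGES)
        proposal = g ^ {e}
        if random.random() < min(1.0, math.exp(log_score(proposal)
                                               - log_score(g))):
            g = proposal
        for edge in g:
            counts[edge] += 1
    # Model averaging: report posterior edge frequencies over the chain
    # instead of committing to one high-scoring structure.
    return {e: c / steps for e, c in counts.items()}

freq = mcmc()
print(sorted(e for e, f in freq.items() if f > 0.5))
```

Thresholding the averaged edge frequencies recovers a consensus network; simulated annealing would additionally temper the acceptance ratio early in the chain to speed up mixing.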