Abstract: Numerical analysis naturally finds applications in all
fields of engineering and the physical sciences, but in the
21st century, the life sciences and even the arts have adopted
elements of scientific computation. Numerical data analysis has
become a key process in research and development across all of
these fields [6]. In this paper we attempt to analyze specified
numerical patterns using association rule mining
techniques with minimum-confidence and minimum-support mining
criteria. The extracted rules and the analyzed results are presented
graphically. Association rules are a simple but very useful form of
data mining that describe the probabilistic co-occurrence of certain
events within a database [7]. They were originally designed to
analyze market-basket data, in which the likelihood of items being
purchased together within the same transaction is analyzed.
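The support and confidence criteria mentioned in the abstract above can be illustrated with a minimal sketch. The function names, the restriction to single-item rules, and the toy `baskets` data are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0

def pair_rules(transactions, min_support, min_confidence):
    """Single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support(transactions, {lhs, rhs})
            c = confidence(transactions, {lhs}, {rhs})
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

# Hypothetical market-basket data, invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for lhs, rhs, s, c in pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A full miner such as Apriori would also enumerate larger itemsets; this sketch stops at pairs to keep the two thresholds visible.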
Abstract: The occurrence of missing values in databases is a serious problem for Data Mining tasks, degrading data quality and the accuracy of analyses. The area has lacked standardized experiments for treating missing values, which makes it difficult to compare evaluations across studies because common parameters are not used. This paper proposes a testbed intended to facilitate the implementation of experiments and to provide unbiased parameters, using available datasets and suitable performance metrics, in order to improve the evaluation and comparison of state-of-the-art missing value treatments.
Abstract: The aim of this study was to estimate the frequency of
EBV infection in Hodgkin's lymphoma (HL) and non-Hodgkin's
lymphoma (NHL) occurring in Jordanian patients. A total of 55
patients with lymphoma were examined in this study. Of 55 patients,
30 and 25 were diagnosed with HL and NHL, respectively. All four
HL subtypes were observed, with the majority of cases exhibiting
the mixed cellularity (MC) subtype, followed by nodular sclerosis
(NS). High grade was the commonest subtype of
NHL in our sample, followed by low grade. The presence of EBV
was detected by immunostaining for expression of latent
membrane protein-1 (LMP-1). LMP-1 expression
was more frequent in patients with HL (60.0%) than in patients
with NHL (32.0%). The frequency of LMP-1 expression was also
higher in patients with the MC subtype (61.11%) than in patients with
NS (28.57%). No age or gender difference in the occurrence of EBV
infection was observed among patients with HL. By contrast, the
prevalence of EBV infection in NHL patients aged below 50 was
lower (16.66%) than in NHL patients aged 50 or above (46.15%). In
addition, EBV infection was more frequent in females with NHL
(38.46%) than in males with NHL (25%). In NHL cases, the
frequency of EBV infection in intermediate grade (60.0%) was higher
than in low grade (25%) or high grade (25%).
In conclusion, analysis of LMP-1 expression indicates an important
role for this viral oncogene in the pathogenesis of EBV-associated
malignant lymphomas. These data also support previous findings
that people with EBV may develop lymphoma and that efforts to
keep the risk of lymphoma low should be considered for people with EBV
infection.
Abstract: In biological and biomedical research, motif finding tools are important for locating regulatory elements in DNA sequences. Many such tools are available, and they often yield position weight matrices and significance indicators. These indicators, p-values and E-values, describe the likelihood that a motif alignment is generated by the background process and the expected number of occurrences of the motif in the data set, respectively. The various tools often estimate these indicators differently, so they are not directly comparable. One approach for comparing motifs from different tools is to compute the E-value as the product of the p-value and the number of possible alignments in the data set. In this paper we explore the combinatorics of the motif alignment models OOPS, ZOOPS, and ANR, and propose a generic algorithm for computing the number of possible combinations accurately. We also show that using the wrong alignment model can give E-values that diverge significantly from their true values.
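The relationship stated above (E-value as the p-value multiplied by the number of possible alignments) can be sketched for the two simpler models. The counting formulas below are a simplified reading of OOPS (exactly one site per sequence) and ZOOPS (zero or one site per sequence, here including the empty alignment); they are assumptions for illustration and not the generic algorithm the paper proposes. ANR is omitted because its count depends on overlap rules the abstract does not specify.

```python
from math import prod

def start_positions(seq_length, width):
    """Number of places a motif of the given width fits in one sequence."""
    return max(seq_length - width + 1, 0)

def oops_alignments(seq_lengths, width):
    """Assumed OOPS count: exactly one site chosen in every sequence."""
    return prod(start_positions(n, width) for n in seq_lengths)

def zoops_alignments(seq_lengths, width):
    """Assumed ZOOPS count: each sequence has no site or one site."""
    return prod(start_positions(n, width) + 1 for n in seq_lengths)

def e_value(p_value, n_alignments):
    """E-value as the product of p-value and number of alignments."""
    return p_value * n_alignments

# Hypothetical data set: two sequences of length 10 and 12, motif width 8.
lengths = [10, 12]
print(oops_alignments(lengths, 8))   # 3 * 5 = 15
print(zoops_alignments(lengths, 8))  # 4 * 6 = 24
```

Even in this toy case the two models give different alignment counts for the same data, so the same p-value maps to different E-values depending on the model assumed.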
Abstract: The Internet is largely composed of textual content, and a
huge volume of digital content circulates over the Internet daily.
The ease of information sharing and reproduction has made it
difficult to preserve author's copyright. Digital watermarking emerged
after 1993 as a solution to the problem of copyright protection for
plain text. In this paper, we propose a zero text watermarking algorithm
based on the occurrence frequency of non-vowel ASCII characters and
words for copyright protection of plain text. The embedding
algorithm uses the frequency of non-vowel ASCII characters and
words to generate a specialized author key. The extraction algorithm
uses this key to extract the watermark and hence identify the original
copyright owner. Experimental results illustrate the effectiveness of
the proposed algorithm on text subjected to meaning-preserving
attacks performed by five independent attackers.
Abstract: E-mail has become an important means of electronic
communication, but the viability of its use is marred by Unsolicited
Bulk e-mail (UBE) messages. UBE comes in many types,
such as pornographic, virus-infected and 'cry-for-help' messages, as well
as fake and fraudulent offers for jobs, winnings and medicines. UBE
poses technical and socio-economic challenges to the use of e-mail.
To meet this challenge and combat this menace, we need to
understand UBE. Towards this end, the current paper presents a
content-based textual analysis of more than 2700 UBE messages
advertising body-enhancement medicinal products. Technically, this is
an application of text parsing and tokenization to unstructured
textual documents, and we
approach it using Bag of Words (BOW) and Vector Space Document
Model techniques. We attempt to identify the most
frequently occurring lexis in UBE documents that advertise
various products for body enhancement. An analysis of the top
100 such lexis is also presented. We show the relationship between
the occurrence of a word from the identified lexis set in a given UBE
and the probability that the UBE advertises a
fake medicinal product. To the best of our knowledge and survey of
related literature, this is the first formal attempt to identify the
most frequently occurring lexis in such UBE through textual analysis.
Finally, this is a sincere attempt to raise alertness against, and
mitigate the threat of, such luring but fake UBE.
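The Bag of Words step described above can be sketched with the standard library. The tokenizer pattern, the function names, and the three sample messages are hypothetical; the paper's actual preprocessing and corpus are not given in the abstract.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_lexis(documents, k):
    """Return the k most frequent tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(k)

def document_frequency(documents, word):
    """Fraction of documents in which the word occurs at least once."""
    hits = sum(1 for doc in documents if word in set(tokenize(doc)))
    return hits / len(documents)

# Invented messages, used only to exercise the functions.
docs = ["Cheap pills online", "Best pills, best price", "Meeting at noon"]
print(top_lexis(docs, 2))
print(document_frequency(docs, "pills"))
```

The document frequency of a word is one simple ingredient for relating word occurrence to message class; estimating the probability the abstract refers to would additionally need labeled messages.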
Abstract: In built-up structures, one of the effective ways of
dissipating unwanted vibration is to exploit the occurrence of slip at
the interfaces of structural laminates. The present work focuses on
the dynamic analysis of welded structures. A mathematical
formulation has been developed for the mechanism of slip damping
in layered and welded mild steel beams with unequal thickness
subjected to both periodic and non-periodic forces. It is observed that
a number of vital parameters, such as thickness ratio, pressure
distribution characteristics, relative slip and kinematic coefficient of
friction at the interfaces, nature of the exciting forces, and length and
thickness of the beam specimen, govern the damping characteristics of
these structures. Experimental verification has been carried out to
validate the analysis and study the effect of these parameters. The
developed damping model for the structure is found to be in fairly
good agreement with the measured data. Finally, the results of the
analysis are discussed and rationalized.
Abstract: This work develops a process for extracting pixel values from satellite remote sensing image data over Thailand, which is an important and effective basis for forecasting rainfall. This paper presents an approach for forecasting a possible rainfall area based on pixel values from remote sensing satellite images. First, pixel value data are extracted automatically from the satellite image sequence. Then, a data process is designed to enable the inference of correlations between pixel values and possible rainfall occurrences. The results show that a high averaged pixel value of daily water vapor data coincides with a high amount of daily rainfall, which suggests that the averaged pixel value can be used as an indicator of rain events. There are some positive associations between pixel values of daily water vapor images and the amount of daily rainfall at each rain-gauge station throughout Thailand. The proposed approach proved to be a helpful aid for meteorologists in rainfall forecasting, through automated analysis and interpretation of meteorological remote sensing data.
Abstract: A large amount of valuable information is available in
plain text clinical reports. New techniques and technologies are
applied to extract information from these reports. In this study, we
developed a domain-based software system to transform 600
Otorhinolaryngology discharge notes into a structured form for
extracting clinical data. To decrease
the system's processing time, the discharge notes were transformed into
a data table after preprocessing. Several word lists were compiled to
identify common sections in the discharge notes, including patient
history, age, problems, and diagnosis. The N-gram method was used
to discover term co-occurrences within each section. Using this
method, a dataset of concept candidates was generated for the
validation step, and then the Predictive Apriori algorithm for Association
Rule Mining (ARM) was applied to validate the candidate concepts.
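The N-gram step used to find term co-occurrences within a section can be sketched as follows. Whitespace tokenization and the two sample phrases are assumptions for illustration; the paper's preprocessing and the Predictive Apriori validation step are not reproduced here.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(sections, n):
    """Count n-grams over a list of section texts (whitespace tokens)."""
    counts = Counter()
    for text in sections:
        counts.update(ngrams(text.lower().split(), n))
    return counts

# Hypothetical section snippets, not taken from the study's data.
sections = ["chronic otitis media", "Chronic otitis externa"]
bigrams = ngram_counts(sections, 2)
print(bigrams.most_common(1))
```

Frequent n-grams such as the bigram counted twice here would serve as concept candidates to be passed on to a validation step.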
Abstract: Microarray data profile gene expression on a whole-genome
scale and therefore provide a good way to study associations
between gene expression and the occurrence or progression of cancer.
More and more researchers have realized that microarray data are helpful
for predicting cancer samples. However, the dimension of the gene
expression data is much larger than the sample size, which makes this
task very difficult. Therefore, how to identify the significant genes
causing cancer has become an urgent, active and difficult research
topic. Many feature selection algorithms proposed in
the past focus on improving cancer predictive accuracy at the
expense of ignoring the correlations between features. In this
work, a novel framework (named SGS) is presented for stable gene
selection and efficient cancer prediction. The proposed framework
first applies a clustering algorithm to find gene groups in which the
genes have higher correlation coefficients, then
selects the significant genes in each group with Bayesian Lasso and
the important gene groups with group Lasso, and finally builds a
prediction model on the shrunken gene space with an efficient
classification algorithm (such as SVM, 1NN or regression). Experimental
results on real-world data show that the proposed framework often
outperforms existing feature selection and prediction methods
such as SAM, IG and Lasso-type prediction models.
Abstract: In this paper, a Markovian risk model with two-type claims is considered. In such a risk model, the occurrences of the two type claims are described by two point processes {N_i(t), t ≥ 0}, i = 1, 2, where {N_i(t), t ≥ 0} is the number of jumps during the interval (0, t] for the Markov jump process {X_i(t), t ≥ 0}. The ruin probability Ψ(u) of a company facing such a risk model is mainly discussed. An integral equation satisfied by the ruin probability Ψ(u) is obtained and the bounds for the convergence rate of the ruin probability Ψ(u) are given by using key-renewal theorem.
Abstract: In this paper the effect of faults in the elements and
parts of discrete event systems is investigated. When faults
occur, some states of the system must be changed and some
must be forbidden. To this end, the different states of these elements are
examined and a model for the fail-safe behavior of each state is
introduced. Systematically substituting the new models of the target
elements into the preliminary model leads to a fail-safe
discrete event system.
Abstract: The breakdown strength characteristic of Low Density
Polyethylene films (LDPE) under DC voltage application and the
effect of water absorption have been studied. Our experiments
were carried out under two conditions: dry and heavy water
absorption. Under DC ramp voltage, the
breakdown strength under heavy water absorption was found to be lower
than under the dry condition. To clarify this effect, the temperature
rise of the film was observed using a non-contact thermograph until the
occurrence of electrical breakdown, and the conduction current of the
sample was measured in correlation with the thermograph measurement.
The observations showed that under heavy water
absorption, the hot spot in the samples appeared at a lower voltage. At
the same voltage, the temperature of the hot spot and the conduction
current were higher than under the dry condition. The measurement
results show a good correlation between the existence of a critical field
for the conduction current and the thermograph observations. Under
heavy water absorption, the threshold field occurred
earlier than under the dry condition, resulting in a higher conduction
current, and the temperature rise that appeared beyond the threshold
field increased significantly with increasing field. The higher
temperature rise was caused by the higher conduction current; as a
result, the insulation broke down at a lower applied field.
Abstract: An embedded system for SEU (single event upset) testing
needs to be designed to prevent system failure caused by high-energy
particles while SEUs are being measured. An SEU is a phenomenon in
which data in a semiconductor device is temporarily changed by
high-energy particles. In this paper, we present an embedded system for
SRAM (static random access memory) SEU testing. The SRAMs are on the
DUT (device under test), which is separated from the control board that
manages the DUT and measures the occurrence of SEUs. The design must
take care to prevent system failure while managing the
DUT and to measure SEUs accurately. We measure the occurrence of SEUs
in five different SRAMs at three different
cyclotron beam energies: 30, 35, and 40 MeV. The average number of SEUs
in the SRAMs ranges from 3.75 to 261.00.
Abstract: Road crashes not only claim lives and inflict injuries but also create an economic burden on society through loss of productivity. Deaths and injuries resulting from road traffic crashes are now acknowledged to be a global phenomenon, with authorities in virtually all countries concerned about the growth in the number of people killed and seriously injured on their roads. However, the road crash situation in a developing country like Bangladesh is much worse than that in developed countries. To develop proper countermeasures, it is necessary to identify the factors affecting crash occurrence. The objective of this study is to examine the effect of district-wise road infrastructure and socioeconomic and demographic features on crash occurrence. The unit of analysis is the individual district, which has not been explored much in the past. Reported crash data obtained from the Bangladesh Road Transport Authority (BRTA) for the years 2004 to 2010 are used to develop a negative binomial model. The model results will reveal the effect of road length (both paved and unpaved), road infrastructure and several socioeconomic characteristics on district-level crash frequency in Bangladesh.
Abstract: The main purpose of this study is to provide a detailed
statistical overview of the temporal and regional distribution and the
relative timing of economic crises and government changes in 51
economies over the 1990–2007 period. In addition, the
predictive power of economic crises for the onset of government changes
is examined using the "signal approach".
The results show that the percentage of government changes is
highest in transition economies (86 percent of observations) and
lowest in Latin American economies (39 percent of observations).
The percentage of government changes is the same in developed
and developing countries (43 percent of observations). However,
the average number of crises per year (frequency of crises) is higher in
developing countries than in developed
countries. Also, the predictive power of economic crises for the
onset of a government change is highest in transition economies (81
percent) and lowest in Latin American countries (30 percent). The
predictive power of economic crises in developing countries (43
percent) is lower than in developed countries (55 percent).
Abstract: Sickness absence represents a major economic and
social issue. Analysis of sick leave data is a recurrent challenge to analysts because of the complexity of the data structure which is
often time dependent, highly skewed and clumped at zero. Ignoring these features to make statistical inference is likely to be inefficient
and misguided. Traditional approaches do not address these problems. In this study, we discuss modelling methodologies in terms of statistical techniques for addressing the difficulties with sick leave data. We also introduce and demonstrate a new method by performing a longitudinal assessment of long-term absenteeism using
a large registration dataset, available from the Helsinki Health Study of municipal employees in Finland for the period 1990-1999, as a working example. We present a comparative study on model
selection and a critical analysis of the temporal trends, the occurrence
and degree of long-term sickness absences among municipal employees. The strengths of this working example include the large
sample size over a long follow-up period, providing strong evidence in support of the new model. Our main goal is to propose a way to
select an appropriate model and to introduce a new methodology for analysing sickness absence data, as well as to demonstrate model
applicability to complicated longitudinal data.
Abstract: The traditional Failure Mode and Effects Analysis
(FMEA) uses Risk Priority Number (RPN) to evaluate the risk level
of a component or process. The RPN index is determined by
calculating the product of severity, occurrence and detection indexes.
The most critically debated disadvantage of this approach is that
various sets of these three indexes may produce an identical value of
RPN. This research paper seeks to address the drawbacks in
traditional FMEA and to propose a new approach to overcome these
shortcomings. The Risk Priority Code (RPC) is used to prioritize
failure modes when two or more failure modes have the same RPN.
A new method is proposed to prioritize failure modes when there is
disagreement in the ranking scale for severity, occurrence and detection.
An Analysis of Variance (ANOVA) is used to compare the means of
RPN values. The SPSS (Statistical Package for the Social Sciences)
statistical analysis package is used to analyze the data. The results
presented are based on two case studies. It is found that the proposed
new approach resolves the limitations of the traditional
FMEA approach.
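The RPN calculation and the tie problem described above can be shown in a few lines. The failure-mode names and index values are invented for illustration, and the tie detection below is only a sketch; it does not implement the paper's Risk Priority Code.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: product of the three indexes."""
    return severity * occurrence * detection

def tied_groups(failure_modes):
    """Group failure-mode names that share the same RPN value."""
    groups = {}
    for name, (s, o, d) in failure_modes.items():
        groups.setdefault(rpn(s, o, d), []).append(name)
    # Keep only RPN values shared by two or more failure modes.
    return {value: names for value, names in groups.items() if len(names) > 1}

# Hypothetical failure modes as (severity, occurrence, detection).
modes = {"A": (9, 2, 5), "B": (3, 6, 5), "C": (4, 4, 4)}
print(tied_groups(modes))
```

Modes A and B both score 90 although A is far more severe, which is the kind of ambiguity a secondary prioritization rule is meant to resolve.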
Abstract: Dengue fever has become a major concern for health
authorities all over the world, particularly in tropical countries.
These countries in particular are experiencing the most worrying
outbreaks of dengue fever (DF) and dengue haemorrhagic fever
(DHF). The DF and DHF epidemics have become main
causes of hospital admissions and deaths in Malaysia. This paper
therefore attempts to examine the environmental factors that may
influence the recent dengue outbreak. The aim of this study is twofold:
first, to establish a statistical model to describe the
relationship between the number of dengue cases and a range of
explanatory variables, and second, to identify the lag operator for
the explanatory variables that affect dengue incidence the most.
The explanatory variables involved include the level of cloud cover,
percentage of relative humidity, amount of rainfall, maximum
temperature, minimum temperature and wind speed. The Poisson and
Negative Binomial regression analyses were used in this study. The
results of the analyses on the 915 observations (daily data taken from
July 2006 to Dec 2008) reveal that the climatic factors of
daily temperature and wind speed significantly
influence the incidence of dengue fever 2 and 3 weeks after their
occurrence. The effect of humidity, on the other hand, appears to be
significant only after 2 weeks.
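Identifying a lag for an explanatory variable starts with pairing each response value with the explanatory value observed some number of steps earlier. The helper below is a minimal sketch of that pairing only; the function name and the toy series are assumptions, and the Poisson and Negative Binomial regressions used in the study are not reproduced.

```python
def lagged_pairs(explanatory, response, lag):
    """Pair explanatory[t - lag] with response[t] for every valid t."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    usable = len(explanatory) - lag
    return list(zip(explanatory[:max(usable, 0)], response[lag:]))

# Invented daily series; with daily data, a 2-week lag would be lag=14.
temperature = [1, 2, 3, 4]
cases = [10, 20, 30, 40]
print(lagged_pairs(temperature, cases, 2))
```

Fitting the same count model on the pairs produced for several candidate lags and comparing the fits is one way to choose the lag with the strongest effect.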
Abstract: Aluminum alloy has an extensive range of industrial applications due to its consistent mechanical properties and structural integrity. Heat treatment by the precipitation technique affects the magnesium, silicon, manganese and copper crystals dissolved in the aluminum alloy. When thermal energy is supplied, these crystals migrate and precipitate at the crystal boundaries of the aluminum alloy, which increases its hardness. In this project, time and temperature were varied to find the combination that maximizes the precipitation of these metals at the aluminum crystal boundaries and thus yields the highest hardness. The specimens were then tested for hardness and tensile strength. It was observed that as the temperature increases, the precipitation increases and consequently the hardness increases. A threshold temperature (264 °C) should not be reached, because recrystallization occurs and causes the crystals to grow. This recrystallization affects the ductility of the alloy and decreases its hardness. In addition, as the temperature is increased, the alloy's mechanical properties decrease. The mechanical properties, namely tensile and hardness properties, were investigated according to standard procedures. In this research, different temperatures and times were applied to increase hardening. The highest hardness, 207.31 HBR, was obtained at 100 °C after 6 hours, while the lowest elongation at the same temperature and time was 146.5.