Role of Association Rule Mining in Numerical Data Analysis

Numerical analysis naturally finds applications in all fields of engineering and the physical sciences, but in the 21st century the life sciences and even the arts have adopted elements of scientific computation. Numerical data analysis has become a key process in research and development across all these fields [6]. In this paper we analyze the specified numerical patterns using association rule mining techniques under minimum-confidence and minimum-support mining criteria. The extracted rules and analyzed results are demonstrated graphically. Association rules are a simple but very useful form of data mining that describe the probabilistic co-occurrence of certain events within a database [7]. They were originally designed to analyze market-basket data, in which the likelihood of items being purchased together within the same transaction is analyzed.
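As a minimal illustration of the support/confidence criteria described above, the following Python sketch mines pairwise rules from a toy transaction set; the items and thresholds are invented for the example and are not taken from the paper's data.

```python
from itertools import combinations

# Toy transaction database; items and thresholds are illustrative.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
min_support, min_confidence = 0.5, 0.6

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate candidate rules a -> b from frequent 2-itemsets.
items = set().union(*transactions)
for a, b in combinations(sorted(items), 2):
    supp = support({a, b})
    if supp < min_support:
        continue
    conf = supp / support({a})  # confidence of the rule a -> b
    if conf >= min_confidence:
        print(f"{a} -> {b}  support={supp:.2f} confidence={conf:.2f}")
```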

A Testbed for the Experiments Performed in Missing Value Treatments

The occurrence of missing values in databases is a serious problem for data mining tasks, degrading data quality and the accuracy of analyses. In this context, the area lacks standardization in experiments for treating missing values, which makes it difficult to compare evaluations across different studies because common parameters are not used. This paper proposes a testbed intended to facilitate the implementation of experiments and provide unbiased parameters, using available datasets and suitable performance metrics, in order to streamline the evaluation and comparison of state-of-the-art missing value treatments.
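A sketch of the kind of standardized evaluation such a testbed would provide, assuming a mask-and-impute protocol with column-mean imputation as the baseline treatment and RMSE on the masked cells as the performance metric (both are common choices, not necessarily the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic complete dataset standing in for a benchmark dataset.
X_true = rng.normal(size=(100, 5))

# Inject missingness completely at random (MCAR) at a fixed rate,
# so every treatment is scored against the same ground truth.
mask = rng.random(X_true.shape) < 0.2
X_missing = np.where(mask, np.nan, X_true)

# Baseline treatment: column-mean imputation.
col_means = np.nanmean(X_missing, axis=0)
X_imputed = np.where(mask, col_means, X_missing)

# Common performance metric: RMSE on the masked entries only.
rmse = np.sqrt(np.mean((X_imputed[mask] - X_true[mask]) ** 2))
print(f"Mean-imputation RMSE on masked cells: {rmse:.3f}")
```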

Prevalence of Epstein-Barr Virus Latent Membrane Protein-1 in Jordanian Patients with Hodgkin's Lymphoma and Non-Hodgkin's Lymphoma

The aim of this study was to estimate the frequency of EBV infection in Hodgkin's lymphoma (HL) and non-Hodgkin's lymphoma (NHL) occurring in Jordanian patients. A total of 55 patients with lymphoma were examined in this study; of these, 30 were diagnosed with HL and 25 with NHL. All four HL subtypes were observed, with the majority of cases exhibiting the mixed cellularity (MC) subtype, followed by nodular sclerosis (NS). High grade was found to be the commonest subtype of NHL in our sample, followed by low grade. The presence of EBV was detected by immunostaining for expression of latent membrane protein-1 (LMP-1). LMP-1 expression occurred more frequently in patients with HL (60.0%) than in patients with NHL (32.0%). The frequency of LMP-1 expression was also higher in patients with the MC subtype (61.11%) than in those with NS (28.57%). No age or gender difference in the occurrence of EBV infection was observed among patients with HL. By contrast, the prevalence of EBV infection in NHL patients aged below 50 was lower (16.66%) than in NHL patients aged 50 or above (46.15%). In addition, EBV infection was more frequent in females with NHL (38.46%) than in males with NHL (25%). In NHL cases, the frequency of EBV infection in the intermediate grade (60.0%) was high compared with the low (25%) and high (25%) grades. In conclusion, analysis of LMP-1 expression indicates an important role for this viral oncogene in the pathogenesis of EBV-associated malignant lymphomas. These data also support previous findings that people with EBV infection may develop lymphoma, and suggest that efforts to monitor for lymphoma should be considered for people with EBV infection.

Exploring the Combinatorics of Motif Alignments for Accurately Computing E-values from P-values

In biological and biomedical research, motif finding tools are important for locating regulatory elements in DNA sequences. Many such motif finding tools are available, and they often yield position weight matrices and significance indicators. These indicators, p-values and E-values, describe respectively the likelihood that a motif alignment is generated by the background process and the expected number of occurrences of the motif in the data set. The various tools often estimate these indicators differently, making them not directly comparable. One approach for comparing motifs from different tools is computing the E-value as the product of the p-value and the number of possible alignments in the data set. In this paper we explore the combinatorics of the motif alignment models OOPS, ZOOPS, and ANR, and propose a generic algorithm for computing the number of possible combinations accurately. We also show that using the wrong alignment model can give E-values that diverge significantly from their true values.
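As a hedged illustration of the p-value-to-E-value relationship described above, the sketch below computes E = p x N using the usual enumeration of alignments for OOPS (exactly one site per sequence, giving L - w + 1 positions per sequence) and ZOOPS (one extra "no site" choice per sequence); the more involved ANR count, which the paper's generic algorithm handles, is omitted. The sequence lengths, motif width, and p-value are illustrative.

```python
from math import prod

def oops_alignments(seq_lengths, w):
    """OOPS: exactly one motif occurrence per sequence, so each
    sequence of length L contributes (L - w + 1) possible positions."""
    return prod(L - w + 1 for L in seq_lengths)

def zoops_alignments(seq_lengths, w):
    """ZOOPS: zero or one occurrence per sequence, adding one extra
    'no occurrence' choice for each sequence."""
    return prod(L - w + 2 for L in seq_lengths)

def e_value(p_value, num_alignments):
    """E-value as the product of the p-value and the number of
    possible alignments in the data set."""
    return p_value * num_alignments

lengths = [200, 250, 300]  # illustrative sequence lengths
w = 8                      # motif width
p = 1e-12                  # p-value reported by a motif finder

for name, n in [("OOPS", oops_alignments(lengths, w)),
                ("ZOOPS", zoops_alignments(lengths, w))]:
    print(f"{name}: {n} alignments, E-value = {e_value(p, n):.3e}")
```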

Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks

The Internet is largely composed of textual content, and a huge volume of digital content circulates over it daily. The ease of information sharing and reproduction has made it difficult to preserve authors' copyright. Digital watermarking emerged after 1993 as a solution to the problem of copyright protection for plain text. In this paper, we propose a zero text watermarking algorithm based on the occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm uses the frequencies of non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract the watermark and hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text subjected to meaning preserving attacks performed by five independent attackers.
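The following Python sketch conveys the flavor of such a zero-watermarking key: it profiles the occurrence frequencies of non-vowel ASCII letters and of words, then hashes the profile into an author key. The specific profile layout and the hashing step are assumptions made for this illustration, not the paper's actual embedding algorithm.

```python
from collections import Counter
import hashlib

VOWELS = set("aeiouAEIOU")

def author_key(text: str) -> str:
    """Hypothetical sketch: derive a zero-watermark author key from
    the occurrence frequencies of non-vowel ASCII letters and words."""
    consonants = Counter(c for c in text
                         if c.isascii() and c.isalpha() and c not in VOWELS)
    words = Counter(text.split())
    # Serialize the frequency profile deterministically, then hash it
    # so the key itself reveals nothing about the original text.
    profile = "|".join(f"{c}:{n}" for c, n in sorted(consonants.items()))
    profile += "||" + "|".join(f"{w}:{n}" for w, n in words.most_common(10))
    return hashlib.sha256(profile.encode()).hexdigest()

text = "The quick brown fox jumps over the lazy dog."
print(author_key(text))
```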

Identification of Most Frequently Occurring Lexis in Body-enhancement Medicinal Unsolicited Bulk e-mails

E-mail has become an important means of electronic communication, but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE comes in many types, including pornographic, virus-infected, and 'cry-for-help' messages, as well as fake and fraudulent offers for jobs, winnings, and medicines. UBE poses technical and socio-economic challenges to the usage of e-mail. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of more than 2700 body-enhancement medicinal UBE messages. Technically, this is an application of text parsing and tokenization to unstructured textual documents, which we approach using Bag of Words (BOW) and Vector Space Document Model techniques. We identify the most frequently occurring lexis in UBE documents that advertise various products for body enhancement, and present an analysis of the top 100 such lexis. We exhibit the relationship between the occurrence of a word from the identified lexis-set in a given UBE message and the probability that the message is advertising a fake medicinal product. To the best of our knowledge and our survey of the related literature, this is the first formal attempt to identify the most frequently occurring lexis in such UBE through textual analysis. Finally, this is a sincere attempt to raise alertness against, and mitigate the threat of, such luring but fake UBE.
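A minimal sketch of the BOW-style tokenization and lexis counting described above, on toy stand-in messages (the study's corpus of 2700+ UBE e-mails is not reproduced here); the document-frequency figure is only a rough illustrative proxy for the advertised-product probability discussed in the paper.

```python
import re
from collections import Counter

# Toy stand-ins for body-enhancement UBE messages.
ube_messages = [
    "Amazing pills for fast muscle growth, order now",
    "Grow muscle fast with our miracle pills, limited offer",
    "Order miracle growth pills today",
]

def tokenize(text):
    """Bag-of-words tokenization: lowercase, split on non-letters."""
    return re.findall(r"[a-z]+", text.lower())

# Count occurrences of each lexis across the UBE corpus.
counts = Counter(tok for msg in ube_messages for tok in tokenize(msg))

# Document frequency of each top lexis: the fraction of messages
# containing it (an illustrative proxy, not the paper's estimator).
for word, n in counts.most_common(5):
    df = sum(word in tokenize(m) for m in ube_messages) / len(ube_messages)
    print(f"{word:10s} count={n} doc-frequency={df:.2f}")
```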

Vibration Attenuation in Layered and Welded Beams with Unequal Thickness

In built-up structures, one of the effective ways of dissipating unwanted vibration is to exploit the occurrence of slip at the interfaces of structural laminates. The present work focuses on the dynamic analysis of welded structures. A mathematical formulation has been developed for the mechanism of slip damping in layered and welded mild steel beams of unequal thickness subjected to both periodic and non-periodic forces. It is observed that a number of vital parameters, such as thickness ratio, pressure distribution characteristics, relative slip, kinematic coefficient of friction at the interfaces, nature of the exciting forces, and length and thickness of the beam specimen, govern the damping characteristics of these structures. Experimental verification has been carried out to validate the analysis and study the effect of these parameters. The developed damping model for the structure is found to be in fairly good agreement with the measured data. Finally, the results of the analysis are discussed and rationalized.

The Pixel Value Data Approach for Rainfall Forecasting Based on GOES-9 Satellite Image Sequence Analysis

This work develops a process for extracting pixel values from satellite remote sensing image data over Thailand, which is an important and effective method for forecasting rainfall. This paper presents an approach for forecasting possible rainfall areas based on pixel values from remote sensing satellite images. First, pixel value data are automatically extracted from the satellite image sequence. Then, a data process is designed to infer correlations between pixel values and possible rainfall occurrences. The results show that a high average pixel value in the daily water vapor data is accompanied by a high amount of daily rainfall, suggesting that the average pixel value can be used as an indicator of rain events. There are positive associations between the pixel values of daily water vapor images and the amount of daily rainfall at each rain-gauge station throughout Thailand. The proposed approach proved to be a helpful aid for rainfall forecasting by meteorologists, automating the analysis and interpretation of meteorological remote sensing data.
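A minimal sketch of the two-step process, assuming synthetic arrays in place of the actual GOES-9 water vapor images and rain-gauge records (with random data the printed correlation is of course meaningless; it only shows the computation):

```python
import numpy as np

# Illustrative stand-ins for one month of daily water vapor images;
# in practice each array would be read from a satellite image file.
rng = np.random.default_rng(1)
daily_images = [rng.integers(0, 256, size=(480, 640)) for _ in range(30)]
daily_rainfall = rng.gamma(2.0, 5.0, size=30)  # mm, from rain gauges

# Step 1: automatically extract the average pixel value per image.
avg_pixel = np.array([img.mean() for img in daily_images])

# Step 2: correlate average pixel values with daily rainfall.
r = np.corrcoef(avg_pixel, daily_rainfall)[0, 1]
print(f"Correlation between avg pixel value and rainfall: {r:.2f}")
```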

Concepts Extraction from Discharge Notes using Association Rule Mining

A large amount of valuable information is available in plain-text clinical reports, and new techniques and technologies are being applied to extract information from them. In this study, we developed a domain-based software system to transform 600 otorhinolaryngology discharge notes into a structured form for extracting clinical data. To decrease processing time, the discharge notes were transformed into a data table after preprocessing. Several word lists were constructed to identify common sections in the discharge notes, including patient history, age, problems, and diagnosis. An n-gram method was used to discover term co-occurrences within each section. Using this method, a dataset of candidate concepts was generated for the validation step, and the Predictive Apriori algorithm for Association Rule Mining (ARM) was then applied to validate the candidate concepts.
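A small sketch of the n-gram candidate-generation step on an invented section string; the Predictive Apriori validation step is not shown here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy discharge-note section text; the real input would be one of the
# identified sections (history, problems, diagnosis, ...).
section = "chronic otitis media with effusion chronic otitis media"
tokens = section.split()

# Candidate concepts: frequent bigrams and trigrams within the section.
candidates = Counter(ngrams(tokens, 2)) + Counter(ngrams(tokens, 3))
for gram, n in candidates.most_common(3):
    print(" ".join(gram), n)
```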

Novel Hybrid Method for Gene Selection and Cancer Prediction

Microarray data profile gene expression on a whole-genome scale and therefore provide a good way to study associations between gene expression and the occurrence or progression of cancer. More and more researchers have realized that microarray data are helpful for predicting cancer samples. However, the dimensionality of the gene expression data is much larger than the sample size, which makes this task very difficult. Identifying the significant genes causing cancer has therefore become an urgent, active, and hard research topic. Many feature selection algorithms proposed in the past focus on improving cancer predictive accuracy at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for stable gene selection and efficient cancer prediction. The proposed framework first performs a clustering algorithm to find gene groups in which the genes within each group are highly correlated, then selects the significant genes in each group with the Bayesian Lasso and important gene groups with the group Lasso, and finally builds a prediction model on the shrunken gene space with an efficient classification algorithm (such as SVM, 1NN, or regression). Experimental results on real-world data show that the proposed framework often outperforms existing feature selection and prediction methods such as SAM, IG, and Lasso-type prediction models.
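A simplified sketch of the three-stage cluster-select-classify structure, with ordinary Lasso standing in for the Bayesian Lasso / group Lasso steps and synthetic data in place of real microarray profiles; it is meant only to show the pipeline's shape, not to reproduce SGS.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))   # 60 samples, 500 genes (synthetic)
y = rng.integers(0, 2, size=60)  # cancer / normal labels (synthetic)

# Stage 1: group correlated genes by clustering the gene profiles.
groups = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X.T)

# Stage 2: within each group, keep the gene with the largest Lasso
# coefficient (a crude stand-in for Bayesian Lasso / group Lasso).
lasso = Lasso(alpha=0.05).fit(X, y)
selected = [np.where(groups == g)[0][
            np.argmax(np.abs(lasso.coef_[groups == g]))]
            for g in range(20)]

# Stage 3: build the prediction model on the shrunken gene space.
score = cross_val_score(SVC(), X[:, selected], y, cv=5).mean()
print(f"CV accuracy on {len(selected)} selected genes: {score:.2f}")
```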

Ruin Probability for a Markovian Risk Model with Two-type Claims

In this paper, a Markovian risk model with two types of claims is considered. In such a risk model, the occurrences of the two claim types are described by two point processes {N_i(t), t ≥ 0}, i = 1, 2, where N_i(t) is the number of jumps of the Markov jump process {X_i(t), t ≥ 0} during the interval (0, t]. The ruin probability Ψ(u) of a company facing such a risk model is mainly discussed. An integral equation satisfied by the ruin probability Ψ(u) is obtained, and bounds for the convergence rate of Ψ(u) are given using the key renewal theorem.

Fail-safe Modeling of Discrete Event Systems using Petri Nets

In this paper the effect of faults in the elements and parts of discrete event systems is investigated. When faults occur, some states of the system must be changed and others must be forbidden. To this end, the different states of these elements are examined and a model for the fail-safe behavior of each state is introduced. Replacing the target elements in the preliminary model with the new models by a systematic method leads to a fail-safe discrete event system.

Breakdown of LDPE Film under Heavy Water Absorption

The breakdown strength characteristics of low-density polyethylene (LDPE) films under DC voltage and the effect of water absorption have been studied. Our experiments were conducted under two conditions: dry and heavy water absorption. Under a DC ramp voltage, the breakdown strength under heavy water absorption was found to be lower than under the dry condition. To clarify this effect, the temperature rise of the film was observed with a non-contact thermograph until electrical breakdown occurred, and the conduction current of the sample was measured in correlation with the thermograph measurement. The observations showed that under heavy water absorption the hot spot in the samples appeared at a lower voltage, and at the same voltage the hot-spot temperature and conduction current were higher than under the dry condition. The measurements show a good correlation between the existence of a critical field for conduction current and the thermograph observations. Under heavy water absorption, the threshold field occurred earlier than in the dry condition, leading to a higher conduction current, and beyond the threshold field the temperature rise increased significantly with increasing field. The higher temperature rise was caused by the higher conduction current, and as a result the insulation broke down at a lower applied field.

An Embedded System Design for SRAM SEU Test

An embedded system for SEU (single event upset) testing must be designed to prevent system failure caused by high-energy particles while SEUs are being measured. An SEU is a phenomenon in which data in a semiconductor device are changed temporarily by high-energy particles. In this paper, we present an embedded system for SRAM (static random access memory) SEU testing. The SRAMs are placed on the DUT (device under test), which is separated from the control board that manages the DUT and measures the occurrence of SEUs. The design must guard against system failure while managing the DUT and making accurate measurements of SEUs. We measured the occurrence of SEUs in five different SRAMs at three cyclotron beam energies: 30, 35, and 40 MeV. The average number of SEUs per SRAM ranged from 3.75 to 261.00.
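A minimal sketch of the SEU counting idea: write a known pattern to the SRAM before beam exposure, read it back afterwards, and count flipped bits. The read-back is simulated in software here, and the pattern value and memory size are illustrative.

```python
# Alternating 0101... test pattern written to every byte of the SRAM.
PATTERN = 0x55

def count_seus(written: list[int], read_back: list[int]) -> int:
    """Count single event upsets as the number of flipped bits
    between the written image and the post-exposure read-back."""
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))

written = [PATTERN] * 1024   # memory image written before exposure
read_back = written.copy()
read_back[10] ^= 0x04        # simulate one upset bit
read_back[100] ^= 0x81       # simulate two upset bits
print(f"SEUs detected: {count_seus(written, read_back)}")
```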

Study on the Effect of Road Infrastructure, Socio-Economic and Demographic Features on Road Crashes in Bangladesh

Road crashes not only claim lives and inflict injuries but also impose an economic burden on society through lost productivity. The problem of deaths and injuries resulting from road traffic crashes is now acknowledged to be a global phenomenon, with authorities in virtually all countries of the world concerned about the growth in the number of people killed and seriously injured on their roads. However, the road crash situation in a developing country like Bangladesh is much worse than in developed countries. Developing proper countermeasures requires identifying the factors affecting crash occurrence. The objective of this study is to examine the effect of district-wise road infrastructure, socio-economic, and demographic features on crash occurrence. The unit of analysis is the individual district, which has not been explored much in the past. Reported crash data obtained from the Bangladesh Road Transport Authority (BRTA) for the years 2004 to 2010 are used to develop a negative binomial model. The model results reveal the effect of road length (both paved and unpaved), road infrastructure, and several socio-economic characteristics on district-level crash frequency in Bangladesh.
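A sketch of fitting such a district-level negative binomial model with statsmodels, on synthetic covariates standing in for the BRTA crash records and district-level features:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_districts = 64

# Illustrative district-level covariates (synthetic, not BRTA data).
paved_km = rng.gamma(5, 100, n_districts)
unpaved_km = rng.gamma(5, 60, n_districts)
population = rng.gamma(10, 2e5, n_districts)
crashes = rng.poisson(0.5 + paved_km / 300, n_districts)

# Negative binomial GLM of crash counts on the covariates.
X = sm.add_constant(np.column_stack([paved_km, unpaved_km, population]))
model = sm.GLM(crashes, X, family=sm.families.NegativeBinomial())
print(model.fit().summary())
```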

Are Economic Crises and Government Changes Related? A Descriptive Statistic Analysis

The main purpose of this study is to provide a detailed statistical overview of the time and regional distribution and the relative timing of economic crises and government changes in 51 economies over the 1990–2007 period. At the same time, the predictive power of economic crises for the onset of government changes is examined using the "signal approach". The results show that the percentage of government changes is highest in transition economies (86 percent of observations) and lowest in Latin American economies (39 percent of observations). The percentages of government changes are the same in developed and developing countries (43 percent of observations). However, the average number of crises per year (the frequency of crises) is higher in developing countries than in developed countries. Also, the predictive power of economic crises for the onset of a government change is highest in transition economies (81 percent) and lowest in Latin American countries (30 percent). The predictive power of economic crises in developing countries (43 percent) is lower than in developed countries (55 percent).

On Methodologies for Analysing Sickness Absence Data: An Insight into a New Method

Sickness absence represents a major economic and social issue. The analysis of sick leave data is a recurrent challenge for analysts because of the complexity of the data structure, which is often time dependent, highly skewed, and clumped at zero. Ignoring these features when making statistical inferences is likely to be inefficient and misguided, and traditional approaches do not address these problems. In this study, we discuss modelling methodologies in terms of statistical techniques for addressing the difficulties with sick leave data. We also introduce and demonstrate a new method by performing a longitudinal assessment of long-term absenteeism, using as a working example a large registration dataset from the Helsinki Health Study covering municipal employees in Finland during the period 1990-1999. We present a comparative study of model selection and a critical analysis of the temporal trends and the occurrence and degree of long-term sickness absences among municipal employees. The strengths of this working example include the large sample size over a long follow-up period, providing strong evidence in support of the new model. Our main goal is to propose a way to select an appropriate model, to introduce a new methodology for analysing sickness absence data, and to demonstrate the model's applicability to complicated longitudinal data.

A New Approach for Prioritization of Failure Modes in Design FMEA using ANOVA

The traditional Failure Mode and Effects Analysis (FMEA) uses the Risk Priority Number (RPN) to evaluate the risk level of a component or process. The RPN index is determined by calculating the product of the severity, occurrence, and detection indexes. The most critically debated disadvantage of this approach is that various combinations of these three indexes may produce an identical RPN value. This paper addresses the drawbacks of traditional FMEA and proposes a new approach to overcome these shortcomings. A Risk Priority Code (RPC) is used to prioritize failure modes when two or more failure modes have the same RPN, and a new method is proposed to prioritize failure modes when there is disagreement in the ranking scales for severity, occurrence, and detection. An Analysis of Variance (ANOVA) is used to compare the means of RPN values, with the SPSS (Statistical Package for the Social Sciences) statistical analysis package used to analyze the data. The results presented are based on two case studies. The proposed methodology is found to resolve the limitations of the traditional FMEA approach.
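A small sketch of the RPN calculation (severity x occurrence x detection) and an ANOVA comparison of mean RPN values, here via SciPy rather than SPSS; all ratings and team scores are invented for the illustration, not taken from the paper's case studies.

```python
from scipy import stats

# Severity, Occurrence, Detection ratings for a few failure modes.
ratings = {
    "seal leak":     (7, 4, 3),
    "shaft fatigue": (8, 3, 4),
    "sensor drift":  (4, 6, 7),
}
for mode, (s, o, d) in ratings.items():
    print(f"{mode:14s} RPN = {s * o * d}")  # RPN = S x O x D

# One-way ANOVA comparing mean RPN values assigned by three
# independent assessment teams (hypothetical data).
team_a = [84, 96, 168, 105]
team_b = [90, 100, 150, 120]
team_c = [60, 72, 140, 96]
f_stat, p_val = stats.f_oneway(team_a, team_b, team_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")
```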

Modelling Dengue Fever (DF) and Dengue Haemorrhagic Fever (DHF) Outbreak Using Poisson and Negative Binomial Model

Dengue fever has become a major concern for health authorities all over the world, particularly in tropical countries, which are experiencing the most worrying outbreaks of dengue fever (DF) and dengue haemorrhagic fever (DHF). The DF and DHF epidemics have thus become major causes of hospital admissions and deaths in Malaysia. This paper therefore attempts to examine the environmental factors that may influence the recent dengue outbreak. The aim of this study is twofold: first, to establish a statistical model describing the relationship between the number of dengue cases and a range of explanatory variables, and second, to identify the lag operators for the explanatory variables that most affect dengue incidence. The explanatory variables include the level of cloud cover, percentage of relative humidity, amount of rainfall, maximum temperature, minimum temperature, and wind speed. Poisson and negative binomial regression analyses were used in this study. The results of the analyses on 915 observations (daily data from July 2006 to December 2008) reveal that the climatic factors of daily temperature and wind speed significantly influence the incidence of dengue fever 2 and 3 weeks after their occurrence. The effect of humidity, on the other hand, appears to be significant only after 2 weeks.
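A sketch of fitting a Poisson model with 2- and 3-week lagged climate covariates, using synthetic series in place of the study's daily data; the lag lengths follow the findings above, while the variable names and values are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_days = 915  # July 2006 - Dec 2008, as in the study

# Illustrative daily weather series standing in for the real data.
df = pd.DataFrame({
    "max_temp": rng.normal(32, 2, n_days),
    "wind": rng.gamma(2, 2, n_days),
})
df["cases"] = rng.poisson(5, n_days)  # daily dengue case counts

# Lag the climate covariates by 2 and 3 weeks before fitting.
df["temp_lag14"] = df["max_temp"].shift(14)
df["wind_lag21"] = df["wind"].shift(21)
df = df.dropna()

# Poisson GLM of daily cases on the lagged covariates.
X = sm.add_constant(df[["temp_lag14", "wind_lag21"]])
poisson_fit = sm.GLM(df["cases"], X, family=sm.families.Poisson()).fit()
print(poisson_fit.params)
```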

Heat Treatment of Aluminum Alloy 7449

Aluminum alloys have an extensive range of industrial applications due to their consistent mechanical properties and structural integrity. Heat treatment by the precipitation technique affects the magnesium, silicon, manganese, and copper crystals dissolved in the aluminum alloy: when thermal energy is supplied, these crystals migrate and precipitate on the grain boundaries of the alloy, increasing its hardness. In this project, various times and temperatures were tested to find the combination of these variables that maximizes the precipitation of the metals on the grain boundaries and thus yields the highest hardness. The specimens were then tested for hardness and tensile strength. It is noticed that as the temperature increases, precipitation increases and consequently the hardness increases. A threshold temperature of 264 °C should not be reached, because recrystallization occurs there and causes grain growth; this recrystallization process affects the ductility of the alloy and decreases its hardness. Beyond this point, further increases in temperature degrade the alloy's mechanical properties. The mechanical properties, namely tensile and hardness properties, were investigated according to standard procedures. In this research, different temperatures and times were applied to increase hardening. The highest hardness, 207.31 HBR, was obtained at 100 °C for 6 hours; at the same temperature and time the lowest elongation was 146.5.