Development of Subjective Measures of Interestingness: From Unexpectedness to Shocking

Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown but useful and significant information from large volumes of data. Data mining is a stage in the KDD process that applies an algorithm to extract interesting patterns. Such algorithms usually generate a huge volume of patterns, which must be evaluated with interestingness measures that reflect the user's requirements. Interestingness measures fall into two classes: (i) objective measures and (ii) subjective measures. Objective measures such as support and confidence extract meaningful patterns based on the structure of the patterns, while subjective measures such as unexpectedness and novelty reflect the user's perspective. In this report, we briefly review the more widely used and successful subjective measures and propose a new subjective measure of interestingness, namely shocking.
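
As an illustration of the objective measures named above, the following minimal sketch computes support and confidence for an association rule over a list of transactions. The function names and the toy transactions are assumptions made for this example; they do not come from the report.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Both values depend only on the data and the structure of the rule, which is what distinguishes them from subjective measures that also need the user's background knowledge.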

Actionable Rules: Issues and New Directions

Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from huge amounts of data stored in databases. Data mining is the stage of the KDD process that selects and applies a particular data mining algorithm to extract interesting and useful knowledge. Data mining methods are expected to find patterns in databases that are interesting according to some measure, so it is vital to define good measures of interestingness that allow the system to discover only the useful patterns. Measures of interestingness are divided into objective and subjective measures. Objective measures depend only on the structure of a pattern and can be quantified using statistical methods, whereas subjective measures depend on the subjectivity and understanding of the user who examines the patterns. Subjective measures are further divided into actionable, unexpected and novel. A key issue facing the data mining community is how to take action on the basis of discovered knowledge. For a pattern to be actionable, the user's subjectivity is captured by having the user provide background knowledge about the domain. Here, we consider the actionability of discovered knowledge as a measure of interestingness and raise important issues that need to be addressed in order to discover actionable knowledge.

AGHAZ: An Expert System Based Approach for the Translation of English to Urdu

Machine Translation (MT) of English text to its Urdu equivalent is a difficult challenge. Many attempts have been made, but only a few limited solutions have been provided so far. We present a direct approach that uses an expert system to translate English text into its Urdu equivalent, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1), Range: 0600–06FF. The expert system works with a knowledge base that contains grammatical patterns of English and Urdu, as well as a tense- and gender-aware dictionary of Urdu words (with their English equivalents).

Mining Sequential Patterns Using Hybrid Evolutionary Algorithm

Mining sequential patterns in large databases has become an important data mining task with broad applications; it describes potential sequential relationships among items in a database. Many different algorithms have been introduced for this task. Conventional algorithms can find the exact optimal sequential pattern rule, but they take a long time, particularly when applied to large databases. Recently, evolutionary algorithms such as Particle Swarm Optimization and Genetic Algorithm have been proposed and applied to this problem. This paper introduces a new hybrid evolutionary algorithm that combines the Genetic Algorithm (GA) with Particle Swarm Optimization (PSO) to mine sequential patterns, in order to improve the convergence speed of evolutionary algorithms. This algorithm is referred to as SP-GAPSO.

Analysis of Metallothionein Gene MT1A (rs11076161) and MT2A (rs10636) Polymorphisms as a Molecular Marker in Type 2 Diabetes Mellitus among Malay Population

Type 2 diabetes mellitus (T2DM) is a complex metabolic disorder characterized by high blood glucose resulting from insulin resistance and insulin insufficiency due to deterioration of β-cell function in the islets of Langerhans. T2DM is commonly caused by a combination of inherited genetic variations and lifestyle. Metallothionein (MT) is a cysteine-rich protein that helps maintain zinc homeostasis, which is important in insulin signaling and secretion, and that protects the body from reactive oxygen species (ROS). MT scavenges ROS and free radicals, which are among the causes of T2DM and its complications. The objective of this study was to investigate the association of MT1A and MT2A polymorphisms with T2DM by comparing T2DM and control subjects in the Malay population. The study involved 150 T2DM patients and 120 healthy individuals of Malay ethnicity and of both genders. Genomic DNA was extracted from buccal cells and amplified at the MT1A and MT2A loci by polymerase chain reaction (PCR), producing bands of 347 bp and 238 bp respectively. The PCR products were digested with the MluCI and Tsp45I restriction enzymes respectively, producing fragment lengths of 158/189/347 bp and 103/135/238 bp. An ANOVA test showed a significant difference between diabetic and control subjects for age, BMI, WHR, SBP, FPG, HbA1c, LDL, TG, TC and family history (P < 0.05). The genotype frequencies of AA, AG and GG for the MT1A polymorphism were 72.7%, 22.7% and 4.7% in cases and 15%, 55% and 30% in controls respectively. For MT2A, the genotype frequencies of GG, GC and CC were 42.7%, 27.3% and 30% in cases and 5%, 40% and 55% in controls respectively. Both polymorphisms showed a significant difference between the two groups (P = 0.000). A post hoc test also showed a significant difference between the genotypes within each polymorphism (P = 0.000). The MT1A and MT2A polymorphisms are believed to be reliable molecular markers for distinguishing T2DM subjects from healthy individuals in the Malay population.

Study of Electro-Optical Properties of ZnS Nanoparticles Prepared by Colloidal Particles Method

ZnS nanoparticles of different sizes have been synthesized using a colloidal particles method. The ZnS nanoparticles, prepared with a capping agent (mercaptoethanol), were then characterized using X-ray diffraction (XRD) and UV-Vis spectroscopy. The particle size calculated from the XRD patterns was found to be in the range 1.85-2.44 nm. Absorption spectra were obtained with a UV-Vis spectrophotometer to find the optical band gap, and the values obtained were found to be in the range 3.83-4.59 eV. It was also found that the energy band gap increases with increasing molarity of the capping agent solution.

Implementation of On-Line Cutting Stock Problem on NC Machines

This paper presents the applicability of the high-speed cutting stock problem (CSP). Because orders keep arriving through various on-line channels, a professional cutting company must sustain production at high levels to stay competitive. In other words, operators have to keep the machine running to stay ahead of the pack. Therefore, the continuous stock cutting problem with setup is proposed to minimize the cutting time and pattern changing time needed to meet the demand given on-line. In this paper, a novel method is proposed that solves the problem directly by using cutting patterns. A major advantage of the proposed method in serial on-line production is that the system can adjust the cutting plan according to the floating orders. Examples with multiple items are demonstrated. The results show considerable efficiency and reliability in high-speed cutting for the CSP.

Identification of Complex Sense-antisense Gene's Module on 17q11.2 Associated with Breast Cancer Aggressiveness and Patient's Survival

A sense-antisense gene pair (SAGP) is a pair of oppositely transcribed genes sharing a common region on a chromosome. In mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGAs) in which at least one gene shares loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in the context of their role in human diseases, including cancers. In this work we characterize the structural-functional properties of a cluster of five genes: TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199, termed the TNFAIP1/POLDIP2 module. This cluster is organized as a CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data from two large cohorts of breast cancer patients (410 patients in total) and patient survival data were used. For both cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of the genes of the TNFAIP1/POLDIP2 module in breast cancer cell subtypes, and significant associations of the TNFAIP1/POLDIP2 CSAGA with (ii) amplification of the CSAGA region in breast cancer, (iii) cancer aggressiveness (e.g. genetic grades) and (iv) disease-free patient survival. Moreover, gene pairs of this module show a strong synergistic effect in the prognosis of time to breast cancer relapse. We suggest that the TNFAIP1/POLDIP2 cluster can be considered a novel type of structural-functional gene module in the human genome.

Effects of Annealing Treatment on Optical Properties of Anatase TiO2 Thin Films

In this investigation, anatase TiO2 thin films were grown by radio frequency magnetron sputtering on glass substrates at a high sputtering pressure and room temperature. The anatase films were then annealed at 300-600 °C in air for 1 hour. X-ray diffraction (XRD) and atomic force microscopy (AFM) were used to examine the structure and the morphology of the films, respectively. The X-ray diffraction patterns showed that the as-deposited film differed somewhat from the annealed films and that the intensities of the peaks of the crystalline phase increased with annealing temperature. The AFM images also showed distinct variations in the morphology of the thin films. The optical constants were characterized using the transmission spectra of the films obtained with a UV-VIS-IR spectrophotometer. In addition, the optical thickness of the film deposited at room temperature was calculated and cross-checked against a cross-sectional SEM image. The optical band gaps were evaluated using the Tauc model. The TiO2 films produced at room temperature exhibited high visible transmittance, and the transmittance decreased slightly with increasing annealing temperature. The films were found to be crystalline with the anatase phase. The refractive index of the films ranged from 2.31 to 2.35 in the visible range. The extinction coefficient was nearly zero in the visible range and increased with annealing temperature. The allowed indirect optical band gap of the films was estimated to be in the range 3.39 to 3.42 eV, a small variation. The allowed direct band gap increased from 3.67 to 3.72 eV. The porosity was also found to decrease at higher annealing temperatures, making the film compact and dense.
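
For reference, the Tauc model is commonly written in the form below. The abstract does not state the exact expression used, so the symbols are assumptions: \alpha is the absorption coefficient, h\nu the photon energy, B a material-dependent constant, E_g the optical band gap, and r the transition exponent.

```latex
\alpha h\nu = B\,(h\nu - E_g)^{r},
\qquad
r = \tfrac{1}{2}\ \text{(allowed direct)},
\quad
r = 2\ \text{(allowed indirect)}
```

Under this form, E_g is read from the intercept with the energy axis of the linear part of a plot of (\alpha h\nu)^{1/r} against h\nu, which is why separate direct and indirect gap values can be reported for the same film.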

The Optimization of an Intelligent Traffic Congestion Level Classification from Motorists' Judgments on Vehicle's Moving Patterns

We propose a technique to identify road traffic congestion levels from the velocity of mobile sensors with high accuracy and in a way consistent with motorists' judgments. Data collection used a GPS device, a webcam, and an opinion survey. Human perceptions were used to rate traffic congestion into three levels: light, heavy, and jam. The ratings and velocity were then fed into a decision tree learning model (J48). We successfully extracted vehicle movement patterns to feed into the learning model using a sliding window technique. The parameters capturing the vehicle movement patterns and the window size were heuristically optimized. The model achieved accuracy as high as 99.68%. If the model is implemented on existing traffic report systems, the reports will cover more comprehensive areas. The proposed method can be applied in any part of the world.
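
The sliding window step can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not list the actual window parameters or features, so the mean/min/max features, the function name and the sample speeds are all hypothetical.

```python
def sliding_window_features(velocities, window_size, step=1):
    """Summarize each window of consecutive velocity samples.

    Returns one feature dict per window; these rows are the kind of input
    that could be paired with congestion ratings to train a decision tree.
    """
    features = []
    for start in range(0, len(velocities) - window_size + 1, step):
        window = velocities[start:start + window_size]
        features.append({
            "mean": sum(window) / window_size,
            "min": min(window),
            "max": max(window),
        })
    return features


# Hypothetical speed samples in km/h, for illustration only.
speeds = [40, 38, 12, 5, 0, 3]
rows = sliding_window_features(speeds, window_size=3)
```

Each row describes how the vehicle moved over a short interval rather than at a single instant, so the window size controls how much recent history the classifier sees.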

FIR Filter Design via Linear Complementarity Problem, Messy Genetic Algorithm, and Ising Messy Genetic Algorithm

In this paper the design of maximally flat linear phase finite impulse response (FIR) filters is considered. The problem is handled with two entirely different approaches. The first is a completely deterministic numerical approach in which the problem is formulated as a Linear Complementarity Problem (LCP). The other is based on a combination of the Markov Random Field (MRF) approach with a messy genetic algorithm (MGA). MRFs are a class of probabilistic models that have been applied for many years to the analysis of visual patterns or textures. Our objective is to establish MRFs as an interesting approach to modeling messy genetic algorithms. We establish a theoretical result that every genetic algorithm problem can be characterized in terms of an MRF model. This allows us to construct an explicit probabilistic model of the MGA fitness function and to introduce the Ising MGA. Experiments with the Ising MGA are less costly than those with the standard MGA because far fewer computations are involved. The LCP requires the fewest computations of all. Results of the LCP, random search, random seeded search, MGA, and Ising MGA are discussed.

Customer Knowledge and Service Development, the Web 2.0 Role in Co-production

The paper is concerned with the relationships between SSME and ICTs and focuses on the role of Web 2.0 tools in the service development process. The research presented aims to explore how collaborative technologies can support and improve service processes, highlighting customer centrality and value co-production. The core idea of the paper is the centrality of user participation, with collaborative technologies as enabling factors; Wikipedia is analyzed as an example. The result of this analysis is the identification and description of a pattern characterising specific services in which users collaborate by means of web tools with value co-producers during the service process. The pattern of collaborative co-production, which concerns several categories of services including knowledge-based services, is then discussed.

Shear-Layer Instabilities of a Pulsed Stack-Issued Transverse Jet

Shear-layer instabilities of a pulsed stack-issued transverse jet were studied experimentally in a wind tunnel. Jet pulsations were induced by means of acoustic excitation. Streak pictures of the smoke-flow patterns illuminated by a laser-light sheet in the median plane were recorded with a high-speed digital camera. Instantaneous velocities of the shear-layer instabilities in the flow were measured with a hot-wire anemometer and digitized. By analyzing the streak pictures of the smoke-flow visualization, three characteristic flow modes (synchronized flapping jet, transition, and synchronized shear-layer vortices) are identified in the shear layer of the pulsed stack-issued transverse jet at various excitation Strouhal numbers. The shear-layer instabilities of the pulsed stack-issued transverse jet are synchronized by the acoustic excitation except in the transition mode. In the transition mode, the shear-layer vortices exhibit a frequency twice the acoustic excitation frequency.

Unsteady Flow between Two Concentric Rotating Spheres along with Uniform Transpiration

In this study, the numerical solution of unsteady flow between two concentric rotating spheres with suction and blowing at their boundaries is presented. The spheres rotate about a common axis at constant angular velocities. The Navier-Stokes equations are solved using the finite difference method with an implicit scheme. The resulting flow patterns are presented for various values of the flow parameters, including the rotational Reynolds number Re and the blowing/suction Reynolds number Re_w. Viscous torques at the inner and outer spheres are also calculated. It is seen that increasing the amount of suction and blowing decreases the size of the eddies generated in the annulus.

Navigation Patterns Mining Approach based on Expectation Maximization Algorithm

Web usage mining algorithms have been widely used for modeling user web navigation behavior. In this study we advance a model for mining users' navigation patterns. The model builds a user model based on the expectation-maximization (EM) algorithm. An EM algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. The experimental results show that as the number of clusters decreases, the log-likelihood converges toward lower values, and that the probability of the largest cluster decreases as the number of clusters increases in each treatment.
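
To make the EM iteration concrete, here is a minimal sketch for a one-dimensional Gaussian mixture. It is not the model used in the study: the Gaussian form, the function name `em_gmm_1d`, the starting means and the sample data are assumptions chosen only to show the alternating E-step and M-step and the log-likelihood that is tracked.

```python
import math


def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: responsibility of each cluster (latent variable) per point.
        resp = []
        ll = 0.0
        for x in data:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        log_likelihoods.append(ll)
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate cluster
    return weights, means, variances, log_likelihoods


# Hypothetical data with two well-separated groups, for illustration only.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
weights, means, variances, lls = em_gmm_1d(data, means=[0.0, 6.0])
```

The fitted `weights` play the role of cluster probabilities, and `lls` records the log-likelihood per iteration, the two quantities the experimental results refer to.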

Family Structure between Muslim and Santal Communities in Rural Bangladesh

Family structure, which is culturally constructed in every society, is the basic unit of social structure. The purpose of this study was to compare family structure, including marriage, residence, family size, type, role sharing, authority, and communication patterns, between Muslim and Santal communities in rural Bangladesh. We hypothesized that family structure, in these elements, differed significantly between the two communities. To test this, 288 active couples (145 Muslim and 143 Santal) selected by cluster random sampling were intensively interviewed using a semi-structured questionnaire. The results of the Pearson chi-square test reveal that there were significant differences in the family structure followed by the two communities in the study area. Further cross-cultural study should be done on why family structure varies between the communities in Bangladesh.
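
The Pearson chi-square statistic for a contingency table can be computed as in the sketch below. The function name and the counts are hypothetical and are not taken from the study; the example only shows how observed and expected frequencies enter the test.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows are two communities, columns are two family types.
example_table = [
    [10, 20],
    [20, 10],
]
stat, dof = chi_square_statistic(example_table)
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to decide whether the difference between the groups is significant.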

A Review on Technology Forecasting Methods and Their Application Area

Technology change has been acknowledged as a critical factor in determining the competitiveness of an organization. In such an environment, correctly anticipating technology change is of great importance in strategic planning. To monitor technology change, technology forecasting (TF) is frequently used. From an academic perspective, TF has received great attention for a long time. However, few studies have provided an overview of the TF literature. Although some studies review TF research, they generally focus on the types and characteristics of various TF methods, and so provide little information about patterns of TF research or about which TF methods are used in particular technology industries. Accordingly, this study profiles developments in and patterns of scholarly research in TF over time. It also investigates which technology industries have used which TF methods and identifies their relationships. This study will help in understanding TF research trends and their application areas.

Photomechanical Analysis of Wooden Testing Bodies under Flexural Loadings

The use of wood in rural construction has been widespread around the world since ancient times. However, its inclusion in structural design needs strong support from broad knowledge of material properties. The pertinent literature reports the application of optical methods to determine the full-field displacement of bodies with regular as well as irregular surfaces. The use of moiré techniques in experimental mechanics consists of analyzing the patterns generated on the body surface before and after deformation. The objective of this research is to study the qualitative deformation behavior of wooden test specimens under specific loading situations. The experimental setup follows the description of shadow moiré methods in the literature. The results indicate a strong influence of anisotropy on the generated displacement field. Important qualitative as well as quantitative stress and strain distributions were obtained for wooden members that are applicable to rural construction.

Wedding in Thailand: Traditional and Business

This study is purely qualitative. It has two main objectives: to explain wedding traditions and to study the wedding business economically. The study covers both the traditional beauty of weddings and the strongly competitive wedding business market, and its population is limited to domestic wedding consumers in Thailand. Focus groups with newly married couples and in-depth interviews with highly experienced wedding business owners were used. Traditionally, Thai weddings vary widely; therefore, recent patterns are briefly summarized, the processes of the traditional Thai wedding are described and explained, and further detail is given on the formal procedures. Economically, the wedding business is related to many types of business, for example catering, hospitality and tourism, pre-wedding photography, and full-service wedding organizers. The situations, changes and obstacles of wedding-related businesses are discussed.

M2LGP: Mining Multiple Level Gradual Patterns

Gradual patterns have been studied for many years as they contain precious information. They have been integrated in many expert systems and rule-based systems, for instance to reason on knowledge such as “the greater the number of turns, the greater the number of car crashes”. In many cases, this knowledge has been considered as a rule “the greater the number of turns → the greater the number of car crashes”. Historically, work has thus focused on the representation of such rules, studying how implication could be defined, especially fuzzy implication. These rules were defined by experts who were in charge of describing the systems they were working on so that they could be operated automatically. More recently, approaches have been proposed to mine databases in order to discover such knowledge automatically. Several approaches have been studied, the main scientific topics being how to determine what a relevant gradual pattern is, and how to discover such patterns as efficiently as possible (in terms of both memory and CPU usage). However, in some cases, end-users are not interested in raw-level knowledge and are instead interested in trends. Moreover, it may be the case that no relevant pattern can be discovered at a low level of granularity (e.g. city), whereas some can be discovered at a higher level (e.g. county). In this paper, we thus extend gradual pattern approaches in order to consider multiple level gradual patterns. For this purpose, we consider two aggregation policies, namely horizontal and vertical.
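
One way to judge whether a gradual pattern such as “the greater the number of turns, the greater the number of car crashes” is relevant is to count how many pairs of records vary in the same direction on every attribute. The sketch below uses that pair-based support as an assumption; the paper may define support differently, and the function name and the records are hypothetical.

```python
from itertools import combinations


def gradual_support(rows, attrs):
    """Share of record pairs in which all `attrs` increase together."""
    pairs = list(combinations(rows, 2))
    concordant = 0
    for a, b in pairs:
        # A pair supports the pattern if one record is strictly greater
        # than the other on every attribute of the pattern.
        if all(a[x] < b[x] for x in attrs) or all(a[x] > b[x] for x in attrs):
            concordant += 1
    return concordant / len(pairs)


# Hypothetical road records, for illustration only.
roads = [
    {"turns": 1, "crashes": 2},
    {"turns": 2, "crashes": 3},
    {"turns": 3, "crashes": 1},
    {"turns": 4, "crashes": 5},
]
score = gradual_support(roads, ["turns", "crashes"])
```

Aggregating the records to a coarser level before calling the function (for example, summing by county instead of by city) is one way a pattern with low support at the raw level could become visible at a higher level.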