Three Dimensional Analysis of Pollution Dispersion in Street Canyon

Three-dimensional simulations are carried out to estimate the effect of wind direction, wind speed and geometry on the flow and dispersion of vehicular pollutants in a street canyon. The pollutant sources are motor vehicles passing between the two buildings. Emission factors for petrol and diesel vehicles at varying vehicle speeds, adopted from the literature for varying street traffic conditions, are used to estimate the rate of emission from the street. The dispersion of the pollutant released from the street is simulated by introducing the vehicular emission source term as a fixed-flux boundary condition at ground level over the road. It is observed that an increase in wind angle disturbs the symmetric pattern of pollution distribution along the street length: the concentration increases at the far end of the street compared to the near end.

Balancing Neural Trees to Improve Classification Performance

In this paper, a neural tree (NT) classifier having a simple perceptron at each node is considered. A new concept for building a balanced tree is applied in the learning algorithm of the tree. At each node, if the perceptron classification is inaccurate and unbalanced, the perceptron is replaced by a new one that separates the training set so that an almost equal number of patterns falls into each of the classes. Moreover, each perceptron is trained only for the classes present at its node and ignores the other classes. Splitting nodes are introduced into the neural tree architecture to divide the training set when the current perceptron node repeats the classification of its parent node. A new error function based on the depth of the tree is introduced to reduce the computational time for training a perceptron. Experiments are performed to check the efficiency, and encouraging results are obtained in terms of accuracy and computational cost.

Food Safety and Perceived Risk: A Case Study of Khao San Road, Bangkok, Thailand

Food safety is an important concern for holiday makers in foreign and unfamiliar tourist destinations. In fact, risk from food in these destinations influences tourist perception. This risk can potentially affect physical health and lead to an inability to pursue planned activities. The objective of this paper was to compare foreign tourists' demographics, including gender, age and education level, with their level of perceived risk towards food safety. A total of 222 foreign tourists staying at Khao San Road in Bangkok were used as the sample. The independent-samples t-test, analysis of variance, and the Least Significant Difference (LSD) post hoc test were utilized. The findings revealed that there were few demographic differences in the level of perceived risk among the foreign tourists. The post hoc test indicated a significant difference between older and younger tourists, and between tourists with higher and lower levels of education. The rankings of tourists' perceived risk towards food safety revealed some interesting results. Tourists' perceived risk of food safety in established restaurants can be ranked as i) cleanliness of dining utensils, ii) sanitation of the food preparation area, and iii) cleanliness of food seasoning and ingredients, whereas their perceived risk of food safety in street food and drink can be ranked as i) cleanliness of stalls and pushcarts, ii) cleanliness of the food sold, and iii) personal hygiene of street food hawkers or vendors.

Application of Machine Learning Methods to Online Test Error Detection in Semiconductor Test

Because test costs in today's semiconductor industry can make up as much as 50 percent of total production costs, efficient test error detection is becoming increasingly important. In this paper, we present a new machine learning approach to test error detection that should provide faster recognition of test system faults as well as improved test error recall. The key idea is to learn a classifier ensemble that detects typical test error patterns in wafer test results immediately after the tests finish. Since test error detection has not yet been discussed in the machine learning community, we define central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies reflecting the failure detection performance of three individual classifiers and three ensemble methods based upon them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple majority vote, weighted majority vote and stacking. For the evaluation, we used cross validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department for the observation of two products, we demonstrated its practical applicability.
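
The simple and weighted majority votes named in the abstract can be illustrated with a minimal sketch. The function name, the labels and the weights below are hypothetical and are not taken from the paper; the sketch only shows how the two voting rules can disagree on the same base-classifier outputs.

```python
from collections import defaultdict


def majority_vote(predictions, weights=None):
    """Combine the labels predicted by several base classifiers.

    predictions: one label per base classifier.
    weights: optional non-negative weight per classifier; when omitted,
             every classifier counts equally (simple majority vote).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # On a tie, the label that was seen first wins (dict insertion order).
    return max(totals, key=totals.get)


# Hypothetical outputs of a decision tree, an SVM and a Bayesian network.
votes = ["test_error", "ok", "ok"]
simple = majority_vote(votes)                     # two of three say "ok"
weighted = majority_vote(votes, [0.9, 0.3, 0.4])  # the trusted tree dominates
```

In this illustration the weights stand in for some per-classifier reliability estimate, for example validation accuracy; how the paper derives its weights is not stated in the abstract.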

Speaker Independent Quranic Recognizer Based on Maximum Likelihood Linear Regression

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, and it is recited all over the world. In this research, a speaker-independent automatic speech recognizer for Quranic recitation was developed and tested. The system is based on the tri-phone Hidden Markov Model (HMM) and Maximum Likelihood Linear Regression (MLLR). MLLR computes a set of transformations that reduces the mismatch between an initial model set and the adaptation data. It uses a regression class tree and estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, recited by five of the most famous readers of the Quran, was used for training and testing. The chapter includes about 2000 distinct words. The advantages of using Quranic verses as the database for this recognizer are the uniqueness of the words and the high level of orderliness between verses. The accuracy on the test data ranged from 68% to 85%.

Extraction of Symbolic Rules from Artificial Neural Networks

Although backpropagation ANNs generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained as readily as those of decision trees. In many applications, it is desirable to extract knowledge from trained ANNs so that users gain a better understanding of how the networks solve the problems. A new rule extraction algorithm, called rule extraction from artificial neural networks (REANN), is proposed and implemented to extract symbolic rules from ANNs. A standard three-layer feedforward ANN is the basis of the algorithm. A four-phase training algorithm is proposed for backpropagation learning. The explicitness of the extracted rules is supported by comparing them to the symbolic rules generated by other methods. The extracted rules are comparable with those of other methods in terms of number of rules, average number of conditions per rule, and predictive accuracy. Extensive experimental studies on several benchmark classification problems, such as the breast cancer, iris, diabetes, and season classification problems, demonstrate the effectiveness of the proposed approach and its good generalization ability.

A Consistency Protocol Multi-Layer for Replicas Management in Large Scale Systems

Large scale systems such as computational Grids are distributed computing infrastructures that can provide globally available network resources. The evolution of information processing systems in Data Grids is characterized by a strong decentralization of data in several fields, whose objective is to ensure the availability and reliability of the data in order to provide fault tolerance and scalability, which is possible only with the use of replication techniques. Unfortunately, the use of these techniques has a high cost, because it is necessary to maintain consistency between the distributed data. Nevertheless, agreeing to live with certain imperfections can improve the performance of the system by improving concurrency. In this paper, we propose a multi-layer protocol combining the pessimistic and optimistic approaches, conceived for data consistency maintenance in large scale systems. Our approach is based on a hierarchical representation model with tree layers and has a dual purpose: first, to reduce response times compared to a completely pessimistic approach and, second, to improve the quality of service compared to an optimistic approach.

Ensembling Classifiers – An Application to Image Data Classification from Cherenkov Telescope Experiment

Ensemble learning algorithms such as AdaBoost and Bagging have been actively researched and have shown improvements in classification results on several benchmark data sets, mainly with decision trees as their base classifiers. In this paper we experiment with applying these meta-learning techniques to classifiers such as random forests, neural networks and support vector machines. The data sets are from MAGIC, a Cherenkov telescope experiment. The task is to separate gamma signals from the overwhelming number of hadron and muon signals, which makes this a rare-class classification problem. We compare the individual classifiers with their ensemble counterparts and discuss the results. WEKA, a machine learning toolkit, was used for the experiments.

Context-aware Recommender Systems using Data Mining Techniques

This study proposes a novel recommender system for delivering advertisements of context-aware services. Our proposed model applies a modified collaborative filtering (CF) algorithm that takes into account several dimensions for the personalization of mobile devices: location, time and the user's needs type. In particular, we employ a classification rule, obtained with a decision tree algorithm, to understand the user's needs type. In addition, we collect primary data from mobile phone users and apply them to the proposed model to validate its effectiveness. Experimental results show that the proposed system delivers more accurate and satisfactory advertisements than the comparative systems.

Evaluation of Algorithms for Sequential Decision in Biosonar Target Classification

A sequential decision problem, based on the task of identifying the species of trees given acoustic echo data collected from them, is considered with well-known stochastic classifiers, including single and mixture Gaussian models. Echoes are processed with a preprocessing stage based on a model of mammalian cochlear filtering, using a new discrete low-pass filter characteristic. Stopping time performance of the sequential decision process is evaluated and compared. It is observed that the new low-pass filter processing results in faster sequential decisions.

Enhanced-Delivery Overlay Multicasting Scheme by Optimizing Bandwidth and Latency Discrepancy Ratios

With optimized bandwidth and latency discrepancy ratios, Node Gain Scores (NGSs) are determined and used as a basis for shaping the max-heap overlay. The NGSs, determined as the respective bandwidth-latency products, govern the construction of max-heap-form overlays. Each NGS is earned as a synergy of the discrepancy ratio of the requested bandwidth with respect to the estimated available bandwidth, and the latency discrepancy ratio between the nodes and the source node. The tree leads to enhanced-delivery overlay multicasting, increasing packet delivery that could otherwise be hindered by the induced packet loss occurring in other schemes that do not consider the synergy of these parameters when placing nodes on the overlays. The NGS is a function of four main parameters: estimated available bandwidth (Ba), individual node's requested bandwidth (Br), proposed node latency to its prospective parent (Lp), and suggested best latency as advised by the source node (Lb). Bandwidth discrepancy ratio (BDR) and latency discrepancy ratio (LDR) carry weights of α and (1,000 - α), respectively, with an arbitrarily chosen α ranging between 0 and 1,000 to ensure that the NGS values, used as node IDs, maintain a good possibility of uniqueness and a balance between the BDR and the LDR according to which is the more critical factor. A max-heap-form tree is constructed on the assumption that all nodes possess an NGS less than that of the source node. To maintain a sense of load balance, children of each level's siblings are evenly distributed such that a node cannot accept a second child, and so on, until all its siblings able to do so have already acquired the same number of children. This is done logically from left to right in a conceptual overlay tree. Records of the pair-wise approximate available bandwidths, as measured by a pathChirp scheme at individual nodes, are maintained. Evaluations against other schemes, namely Bandwidth Aware multicaSt architecturE (BASE), Tree Building Control Protocol (TBCP), and Host Multicast Tree Protocol (HMTP), have been conducted. The new scheme generally performs better in terms of the trade-off between packet delivery ratio, link stress, control overhead, and end-to-end delays.

Experiments on Element and Document Statistics for XML Retrieval

This paper presents an information retrieval model for XML documents based on tree matching. Queries and documents are represented by extended trees. An extended tree is built from the original tree by adding weighted virtual links between each node and its indirect descendants, so that each descendant can be reached directly. Therefore only one level separates each node from its indirect descendants. This allows the user query and the document to be compared flexibly while respecting the structural constraints of the query. The content of each node is very important in deciding whether a document element is relevant or not, so the content should be taken into account in the retrieval process. We separate the structure-based and the content-based retrieval processes. The content-based score of each node is commonly based on the well-known Tf × Idf criterion. In this paper, we compare this criterion with another one that we call Tf × Ief. The comparison is based on experiments on a dataset provided by INEX, and shows the effectiveness of our approach on the one hand and that of both weighting functions on the other.
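
The difference between the two weighting criteria can be sketched as follows. The abstract does not define Ief, so this sketch assumes that it is an inverse element frequency, computed like Idf but counting XML elements instead of whole documents. The logarithmic formula, the function names and the toy collection are all assumptions made for illustration.

```python
import math


def inverse_frequency(term, units):
    """log(N / n_t), where N is the number of text units and n_t the number
    of units containing the term. Units are documents (Idf) or elements
    (Ief, as assumed here)."""
    containing = sum(1 for tokens in units if term in tokens)
    if containing == 0:
        return 0.0
    return math.log(len(units) / containing)


def weight(term, node_tokens, units):
    """Term frequency inside one node times the inverse frequency."""
    return node_tokens.count(term) * inverse_frequency(term, units)


# Toy collection: two documents, each a list of elements (token lists).
documents = [
    [["xml", "retrieval"], ["tree", "matching"]],
    [["xml", "tree"], ["xml", "query"]],
]
doc_units = [[tok for element in doc for tok in element] for doc in documents]
element_units = [element for doc in documents for element in doc]

node = documents[1][1]                          # the element ["xml", "query"]
tf_idf = weight("xml", node, doc_units)         # "xml" occurs in every document
tf_ief = weight("xml", node, element_units)     # but not in every element
```

Under these assumptions a term that appears in every document gets a zero Tf × Idf weight, while Tf × Ief can still distinguish the elements that contain it.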

Tree Based Decomposition of Sunspot Images

Sunspot rotation and latitudinal bands are studied using intelligent computation methods. An image fusion method is combined with tree decomposition to obtain quantitative values for the latitudes of the trajectories on the Sun's surface along which sunspots rotate. Daily solar images taken by the Solar and Heliospheric Observatory (SOHO) satellite are fused for each month separately. The fused image is then decomposed with the Quad Tree decomposition method in order to obtain precise information about the latitudes of sunspot trajectories. Such analysis is useful for gathering information about the regions on the Sun's surface, and the coordinates in space, that are more exposed to solar geomagnetic storms, tremendous flares and hot plasma gases permeating interplanetary space, and can help people maintain their technical systems. Here, sunspot images from September, October and November 2001 are used to study the magnetic behavior of the Sun.
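
As a rough illustration of quadtree decomposition, the sketch below recursively splits a square image into four quadrants until each block is homogeneous. The homogeneity test (intensity range within a threshold), the function name and the toy image are assumptions; the abstract does not state which splitting criterion or parameters were used.

```python
def quadtree(image, x, y, size, threshold, min_size=1):
    """Return the leaf blocks (x, y, size) of a quadtree decomposition.

    image: square list of rows whose side is a power of two.
    A block is kept whole when max - min intensity <= threshold
    (an assumed homogeneity criterion); otherwise it is split in four.
    """
    block = [image[r][c]
             for r in range(y, y + size)
             for c in range(x, x + size)]
    if size <= min_size or max(block) - min(block) <= threshold:
        return [(x, y, size)]

    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree(image, x + dx, y + dy, half,
                               threshold, min_size)
    return leaves


# Toy 4x4 "fused image" with a single bright pixel in the top-left corner.
image = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
leaves = quadtree(image, 0, 0, 4, threshold=0)
```

Small leaf blocks cluster around the non-uniform region, so the rows covered by the smallest blocks indicate where the structure of interest lies in the image.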

Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

This research presents a system for post-processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using a Learning Classifier System approach. A Learning Classifier System (LCS) is a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. An LCS learns by interacting with an environment from which it receives feedback in the form of a numerical reward. Learning is achieved by trying to maximize the amount of reward received. A crisp description of a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System, the initial population is constructed as a random collection of HPR–trees (related production rules), and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and, based on the Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement, a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

A Hybrid Scheme for on-Line Diagnostic Decision Making Using Optimal Data Representation and Filtering Technique

Early diagnostic decision making in industrial processes is necessary to produce high quality final products. It provides early warning of a special event in a process and helps to find its assignable cause. This work presents hybrid diagnostic schemes for batch processes. A nonlinear representation of raw process data is combined with classification tree techniques. Nonlinear kernel-based dimension reduction is used to obtain nonlinear classification decision boundaries for the fault classes. In order to enhance diagnosis performance for batch processes, the data are filtered to remove irrelevant information from the process data. To compare the diagnosis performance of several representation, filtering, and future observation estimation methods, four diagnostic schemes are evaluated. The performance of the presented diagnosis schemes is demonstrated using batch process data.

Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

In this paper, we represent each protein structure as a graph, so that a protein structure database becomes a graph database. Each graph is represented by a spectral vector. We use the Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of the adjacency matrix of the graph. To measure the similarity between two graphs, we calculate the Euclidean distance between their spectral vectors. To cluster the graphs, we use an M-tree with the Euclidean distance to cluster the spectral vectors. In addition, the M-tree can be used for graph searching in the graph database. Our proposed method was tested on a graph database of 100 graphs representing 100 protein structures downloaded from the Protein Data Bank (PDB), and we compare the results with the SCOP hierarchical structure.
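
The pipeline from graph to spectral vector to distance can be sketched in plain Python. This is a minimal illustration and not the authors' implementation. It assumes simple undirected graphs given as 0/1 adjacency matrices, takes the sorted eigenvalues as the spectral vector, uses a basic cyclic Jacobi rotation scheme, and compares only graphs with the same number of vertices. All function names are hypothetical.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^(-1/2) A D^(-1/2); rows of isolated vertices stay zero."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] > 0:
                lap[i][j] = 1.0
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle chosen so that the rotated entry a[p][q] becomes zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # columns p, q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):          # rows p, q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_vector(adj):
    return jacobi_eigenvalues(normalized_laplacian(adj))


def spectral_distance(u, v):
    """Euclidean distance between two spectral vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("spectral vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # path on three vertices
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # complete graph on three vertices
distance = spectral_distance(spectral_vector(path), spectral_vector(triangle))
```

The resulting vectors could then be indexed by any metric access method. The M-tree used in the paper is not reproduced here.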

Comparative Analysis of Measures to Secure Two-Way Evacuation Routes for Vulnerable People during Large Disasters in a Historic Area

Historic preservation areas are extremely vulnerable to disasters because they are home to many vulnerable people and contain many closely spaced wooden houses. However, the narrow streets in these areas have historic meaning, which means that they cannot be widened and can easily become blocked during large disasters. Here, we describe our efforts to establish a methodology for planning evacuation routes in such historic preservation areas. In particular, this study aims to clarify the effectiveness of measures intended to secure two-way evacuation routes for vulnerable people during large disasters in a historic area preserved under the Cultural Properties Protection Law, Japan.

Generating Concept Trees from Dynamic Self-organizing Map

The self-organizing map (SOM) provides both clustering and visualization capabilities in mining data. Dynamic self-organizing maps such as the Growing Self-organizing Map (GSOM) have been developed to overcome the problem of the fixed structure in SOM and to enable better representation of the discovered patterns. However, in mining large datasets or historical data, the hierarchical structure of the data is also useful for viewing the cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of trees from different spread factor values of the GSOM is also investigated and the quality of the trees is analyzed. The results show that concept trees can be generated from the GSOM, thus eliminating the need to re-cluster the data from scratch to obtain a hierarchical view of the data under study.

Artificial Intelligence Support for Interferon Treatment Decision in Chronic Hepatitis B

Chronic hepatitis B can evolve to cirrhosis and liver cancer. Interferon is the only effective treatment for carefully selected patients, but it is very expensive. Some of the selection criteria are based on liver biopsy, an invasive, costly and painful medical procedure. Therefore, developing efficient non-invasive selection systems could benefit patients and also save money. We investigated the possibility of creating intelligent systems to assist the Interferon therapeutic decision, mainly by predicting the results of the biopsy with acceptable accuracy. We applied knowledge discovery to integrated medical data: imaging, clinical, and laboratory data. The resulting intelligent systems, based on C5.0 decision trees and boosting and tested on 500 patients with chronic hepatitis B, predict the results of the liver biopsy with 100% accuracy. In addition, by integrating the other patient selection criteria, they offer non-invasive support for the correct Interferon therapeutic decision. To the best of our knowledge, these decision systems outperform all similar systems published in the literature and offer a realistic opportunity to replace liver biopsy in this medical context.

Architectural Stratification and Woody Species Diversity of a Subtropical Forest Grown in a Limestone Habitat in Okinawa Island, Japan

The forest stand consisted of four layers. The species composition of the third and the bottom layers was almost the same, whereas it was almost exclusive between the top layer and the lower three layers. The values of Shannon's index H' and Pielou's index J' tended to increase from the bottom layer upward, except for the H' value of the top layer. The values of H' and J' were 4.21 bits and 0.73, respectively, for the total stand. The high woody species diversity of the forest depended on large trees in the upper layers, a trend that differed from that of a subtropical evergreen broadleaf forest grown in a silicate habitat in the northern part of Okinawa Island. The spatial distributions of trees overlapped between the third and the bottom layers, whereas they were independent or slightly exclusive between the top layer and the lower three layers. The mean tree weight of each layer decreased from the top toward the bottom layer, whereas the corresponding tree density increased from the top downward. This relationship was analogous to the process of self-thinning in plant populations.
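
For readers unfamiliar with the two diversity indices, the sketch below computes them from per-species individual counts using the standard definitions: H' = -Σ p_i log2 p_i (in bits, matching the unit reported above) and J' = H' / log2 S, where S is the number of species present. The function names and the counts are hypothetical and are not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon's index H' in bits from per-species individual counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's index J' = H' / log2(S), with S the number of species present."""
    species = sum(1 for c in counts if c > 0)
    if species < 2:
        return 0.0  # evenness is not meaningful for fewer than two species
    return shannon_index(counts) / math.log2(species)


# Hypothetical layer with four equally abundant species.
even_layer = [10, 10, 10, 10]
h_even = shannon_index(even_layer)      # 2 bits, the maximum for four species
j_even = pielou_evenness(even_layer)    # perfectly even, so J' is 1

# Hypothetical layer dominated by one species: both indices are lower.
skewed_layer = [37, 1, 1, 1]
h_skewed = shannon_index(skewed_layer)
j_skewed = pielou_evenness(skewed_layer)
```

Because J' divides H' by its maximum for the observed number of species, a layer can have many species (large S) and still show a low J' if a few species dominate.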