Speedup Breadth-First Search by Graph Ordering

Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes’ overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads.We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.

The Applicability of Distillation as an Alternative Nuclear Reprocessing Method

A customized two-stage model has been developed to simulate, analyse, and visualize distillation of actinides as a useful alternative low-pressure separation method in the nuclear recycling cases. Under the most optimal conditions of idealized thermodynamic equilibrium stages and under total reflux of distillate the investigated cases of chloride systems for the separation of such actinides are (A) UCl4-CsCl-PuCl3 and (B) ThCl4-NaCl-PuCl3. Simulatively, uranium tetrachloride in case A is successfully separated by distillation into a six-stage distillation column, and thorium tetrachloride from case B into an eight-stage distillation column. For this, a permissible mole fraction value of 1E-06 has been assumed for the residual impurification degree. With further separation effort of eleven to seventeen required separation stages, the monochlorides of plutonium trichloride from both systems A and B are simulatively shown to be separated as high pure distillation products.

Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening

We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.

Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks

This paper presents and benchmarks a number of end-to-end Deep Learning based models for metaphor detection in Greek. We combine Convolutional Neural Networks and Recurrent Neural Networks with representation learning to bear on the metaphor detection problem for the Greek language. The models presented achieve exceptional accuracy scores, significantly improving the previous state-of-the-art results, which had already achieved accuracy 0.82. Furthermore, no special preprocessing, feature engineering or linguistic knowledge is used in this work. The methods presented achieve accuracy of 0.92 and F-score 0.92 with Convolutional Neural Networks (CNNs) and bidirectional Long Short Term Memory networks (LSTMs). Comparable results of 0.91 accuracy and 0.91 F-score are also achieved with bidirectional Gated Recurrent Units (GRUs) and Convolutional Recurrent Neural Nets (CRNNs). The models are trained and evaluated only on the basis of training tuples, the related sentences and their labels. The outcome is a state-of-the-art collection of metaphor detection models, trained on limited labelled resources, which can be extended to other languages and similar tasks.

Neural Network Based Speech to Text in Malay Language

Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.  

Measuring Text-Based Semantics Relatedness Using WordNet

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.

Measuring Text-Based Semantics Relatedness Using WordNet

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.

Waste Management in a Hot Laboratory of Japan Atomic Energy Agency – 1: Overview and Activities in Chemical Processing Facility

Chemical Processing Facility of Japan Atomic Energy Agency is a basic research field for advanced back-end technology developments with using actual high-level radioactive materials such as irradiated fuels from the fast reactor, high-level liquid waste from reprocessing plant. In the nature of a research facility, various kinds of chemical reagents have been offered for fundamental tests. Most of them were treated properly and stored in the liquid waste vessel equipped in the facility, but some were not treated and remained at the experimental space as a kind of legacy waste. It is required to treat the waste in safety. On the other hand, we formulated the Medium- and Long-Term Management Plan of Japan Atomic Energy Agency Facilities. This comprehensive plan considers Chemical Processing Facility as one of the facilities to be decommissioned. Even if the plan is executed, treatment of the “legacy” waste beforehand must be a necessary step for decommissioning operation. Under this circumstance, we launched a collaborative research project called the STRAD project, which stands for Systematic Treatment of Radioactive liquid waste for Decommissioning, in order to develop the treatment processes for wastes of the nuclear research facility. In this project, decomposition methods of chemicals causing a troublesome phenomenon such as corrosion and explosion have been developed and there is a prospect of their decomposition in the facility by simple method. And solidification of aqueous or organic liquid wastes after the decomposition has been studied by adding cement or coagulants. Furthermore, we treated experimental tools of various materials with making an effort to stabilize and to compact them before the package into the waste container. It is expected to decrease the number of transportation of the solid waste and widen the operation space. Some achievements of these studies will be shown in this paper. The project is expected to contribute beneficial waste management outcome that can be shared world widely.

Waste Management in a Hot Laboratory of Japan Atomic Energy Agency – 3: Volume Reduction and Stabilization of Solid Waste

In the Japan Atomic Energy Agency, three types of experimental research, advanced reactor fuel reprocessing, radioactive waste disposal, and nuclear fuel cycle technology, have been carried out at the Chemical Processing Facility. The facility has generated high level radioactive liquid and solid wastes in hot cells. The high level radioactive solid waste is divided into three main categories, a flammable waste, a non-flammable waste, and a solid reagent waste. A plastic product is categorized into the flammable waste and molten with a heating mantle. The non-flammable waste is cut with a band saw machine for reducing the volume. Among the solid reagent waste, a used adsorbent after the experiments is heated, and an extractant is decomposed for its stabilization. All high level radioactive solid wastes in the hot cells are packed in a high level radioactive solid waste can. The high level radioactive solid waste can is transported to the 2nd High Active Solid Waste Storage in the Tokai Reprocessing Plant in the Japan Atomic Energy Agency.

Optimized Brain Computer Interface System for Unspoken Speech Recognition: Role of Wernicke Area

In this paper, we propose an optimized brain computer interface (BCI) system for unspoken speech recognition, based on the fact that the constructions of unspoken words rely strongly on the Wernicke area, situated in the temporal lobe. Our BCI system has four modules: (i) the EEG Acquisition module based on a non-invasive headset with 14 electrodes; (ii) the Preprocessing module to remove noise and artifacts, using the Common Average Reference method; (iii) the Features Extraction module, using Wavelet Packet Transform (WPT); (iv) the Classification module based on a one-hidden layer artificial neural network. The present study consists of comparing the recognition accuracy of 5 Arabic words, when using all the headset electrodes or only the 4 electrodes situated near the Wernicke area, as well as the selection effect of the subbands produced by the WPT module. After applying the articial neural network on the produced database, we obtain, on the test dataset, an accuracy of 83.4% with all the electrodes and all the subbands of 8 levels of the WPT decomposition. However, by using only the 4 electrodes near Wernicke Area and the 6 middle subbands of the WPT, we obtain a high reduction of the dataset size, equal to approximately 19% of the total dataset, with 67.5% of accuracy rate. This reduction appears particularly important to improve the design of a low cost and simple to use BCI, trained for several words.

A Comprehensive Evaluation of Supervised Machine Learning for the Phase Identification Problem

Power distribution circuits undergo frequent network topology changes that are often left undocumented. As a result, the documentation of a circuit’s connectivity becomes inaccurate with time. The lack of reliable circuit connectivity information is one of the biggest obstacles to model, monitor, and control modern distribution systems. To enhance the reliability and efficiency of electric power distribution systems, the circuit’s connectivity information must be updated periodically. This paper focuses on one critical component of a distribution circuit’s topology - the secondary transformer to phase association. This topology component describes the set of phase lines that feed power to a given secondary transformer (and therefore a given group of power consumers). Finding the documentation of this component is call Phase Identification, and is typically performed with physical measurements. These measurements can take time lengths on the order of several months, but with supervised learning, the time length can be reduced significantly. This paper compares several such methods applied to Phase Identification for a large range of real distribution circuits, describes a method of training data selection, describes preprocessing steps unique to the Phase Identification problem, and ultimately describes a method which obtains high accuracy (> 96% in most cases, > 92% in the worst case) using only 5% of the measurements typically used for Phase Identification.

Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.

Exploration of Hydrocarbon Unconventional Accumulations in the Argillaceous Formation of the Autochthonous Miocene Succession in the Carpathian Foredeep

The article shows results of the project which aims at evaluating possibilities of effective development and exploitation of natural gas from argillaceous series of the Autochthonous Miocene in the Carpathian Foredeep. To achieve the objective, the research team develop a world-trend based but unique methodology of processing and interpretation, adjusted to data, local variations and petroleum characteristics of the area. In order to determine the zones in which maximum volumes of hydrocarbons might have been generated and preserved as shale gas reservoirs, as well as to identify the most preferable well sites where largest gas accumulations are anticipated a number of task were accomplished. Evaluation of petrophysical properties and hydrocarbon saturation of the Miocene complex is based on laboratory measurements as well as interpretation of well-logs and archival data. The studies apply mercury porosimetry (MICP), micro CT and nuclear magnetic resonance imaging (using the Rock Core Analyzer). For prospective location (e.g. central part of Carpathian Foredeep – Brzesko-Wojnicz area) reprocessing and reinterpretation of detailed seismic survey data with the use of integrated geophysical investigations has been made. Construction of quantitative, structural and parametric models for selected areas of the Carpathian Foredeep is performed on the basis of integrated, detailed 3D computer models. Modeling are carried on with the Schlumberger’s Petrel software. Finally, prospective zones are spatially contoured in a form of regional 3D grid, which will be framework for generation modelling and comprehensive parametric mapping, allowing for spatial identification of the most prospective zones of unconventional gas accumulation in the Carpathian Foredeep. Preliminary results of research works indicate a potentially prospective area for occurrence of unconventional gas accumulations in the Polish part of Carpathian Foredeep.

Identity Verification Using k-NN Classifiers and Autistic Genetic Data

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

A Study of Mode Choice Model Improvement Considering Age Grouping

The purpose of this study is providing an improved mode choice model considering parameters including age grouping of prime-aged and old age. In this study, 2010 Household Travel Survey data were used and improper samples were removed through the analysis. Chosen alternative, date of birth, mode, origin code, destination code, departure time, and arrival time are considered from Household Travel Survey. By preprocessing data, travel time, travel cost, mode, and ratio of people aged 45 to 55 years, 55 to 65 years and over 65 years were calculated. After the manipulation, the mode choice model was constructed using LIMDEP by maximum likelihood estimation. A significance test was conducted for nine parameters, three age groups for three modes. Then the test was conducted again for the mode choice model with significant parameters, travel cost variable and travel time variable. As a result of the model estimation, as the age increases, the preference for the car decreases and the preference for the bus increases. This study is meaningful in that the individual and households characteristics are applied to the aggregate model.

Application of Data Mining Techniques for Tourism Knowledge Discovery

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

The Implementation of the Javanese Lettered-Manuscript Image Preprocessing Stage Model on the Batak Lettered-Manuscript Image

This paper presents the results of a study to test whether the Javanese character manuscript image preprocessing model that have been more widely applied, can also be applied to segment of the Batak characters manuscripts. The treatment process begins by converting the input image into a binary image. After the binary image is cleaned of noise, then the segmentation lines using projection profile is conducted. If unclear histogram projection is found, then the smoothing process before production indexes line segments is conducted. For each line image which has been produced, then the segmentation scripts in the line is applied, with regard of the connectivity between pixels which making up the letters that there is no characters are truncated. From the results of manuscript preprocessing system prototype testing, it is obtained the information about the system truth percentage value on pieces of Pustaka Batak Podani Ma AjiMamisinon manuscript ranged from 65% to 87.68% with a confidence level of 95%. The value indicates the truth percentage shown the initial processing model in Javanese characters manuscript image can be applied also to the image of the Batak characters manuscript.

Computer Modeling and Plant-Wide Dynamic Simulation for Industrial Flare Minimization

Flaring emissions during abnormal operating conditions such as plant start-ups, shut-downs, and upsets in chemical process industries (CPI) are usually significant. Flare minimization can help to save raw material and energy for CPI plants, and to improve local environmental sustainability. In this paper, a systematic methodology based on plant-wide dynamic simulation is presented for CPI plant flare minimizations under abnormal operating conditions. Since off-specification emission sources are inevitable during abnormal operating conditions, to significantly reduce flaring emission in a CPI plant, they must be either recycled to the upstream process for online reuse, or stored somewhere temporarily for future reprocessing, when the CPI plant manufacturing returns to stable operation. Thus, the off-spec products could be reused instead of being flared. This can be achieved through the identification of viable design and operational strategies during normal and abnormal operations through plant-wide dynamic scheduling, simulation, and optimization. The proposed study includes three stages of simulation works: (i) developing and validating a steady-state model of a CPI plant; (ii) transiting the obtained steady-state plant model to the dynamic modeling environment; and refining and validating the plant dynamic model; and (iii) developing flare minimization strategies for abnormal operating conditions of a CPI plant via a validated plant-wide dynamic model. This cost-effective methodology has two main merits: (i) employing large-scale dynamic modeling and simulations for industrial flare minimization, which involves various unit models for modeling hundreds of CPI plant facilities; (ii) dealing with critical abnormal operating conditions of CPI plants such as plant start-up and shut-down. Two virtual case studies on flare minimizations for start-up operation (over 50% of emission savings) and shut-down operation (over 70% of emission savings) of an ethylene plant have been employed to demonstrate the efficacy of the proposed study.

Urban Growth Analysis Using Multi-Temporal Satellite Images, Non-stationary Decomposition Methods and Stochastic Modeling

Remotely sensed data are a significant source for monitoring and updating databases for land use/cover. Nowadays, changes detection of urban area has been a subject of intensive researches. Timely and accurate data on spatio-temporal changes of urban areas are therefore required. The data extracted from multi-temporal satellite images are usually non-stationary. In fact, the changes evolve in time and space. This paper is an attempt to propose a methodology for changes detection in urban area by combining a non-stationary decomposition method and stochastic modeling. We consider as input of our methodology a sequence of satellite images I1, I2, … In at different periods (t = 1, 2, ..., n). Firstly, a preprocessing of multi-temporal satellite images is applied. (e.g. radiometric, atmospheric and geometric). The systematic study of global urban expansion in our methodology can be approached in two ways: The first considers the urban area as one same object as opposed to non-urban areas (e.g. vegetation, bare soil and water). The objective is to extract the urban mask. The second one aims to obtain a more knowledge of urban area, distinguishing different types of tissue within the urban area. In order to validate our approach, we used a database of Tres Cantos-Madrid in Spain, which is derived from Landsat for a period (from January 2004 to July 2013) by collecting two frames per year at a spatial resolution of 25 meters. The obtained results show the effectiveness of our method.

Automatic Music Score Recognition System Using Digital Image Processing

Music has always been an integral part of human’s daily lives. But, for the most people, reading musical score and turning it into melody is not easy. This study aims to develop an Automatic music score recognition system using digital image processing, which can be used to read and analyze musical score images automatically. The technical approaches included: (1) staff region segmentation; (2) image preprocessing; (3) note recognition; and (4) accidental and rest recognition. Digital image processing techniques (e.g., horizontal /vertical projections, connected component labeling, morphological processing, template matching, etc.) were applied according to musical notes, accidents, and rests in staff notations. Preliminary results showed that our system could achieve detection and recognition rates of 96.3% and 91.7%, respectively. In conclusion, we presented an effective automated musical score recognition system that could be integrated in a system with a media player to play music/songs given input images of musical score. Ultimately, this system could also be incorporated in applications for mobile devices as a learning tool, such that a music player could learn to play music/songs.