Two Class Motor Imagery Classification via Wave Atom Sub-Bands

The goal of motor imagery brain-computer interface (BCI) research is to create a link between the central nervous system and a computer or device. The most important signal for brain-computer interfaces is the electroencephalogram (EEG). The aim of this research is to explore a set of effective features from EEG signals, separated into frequency bands, using wave atom sub-bands to discriminate right- and left-hand motor imagery signals. Feature vectors are constructed from the transform coefficients for each frequency range and each transform sub-band, and their classification performance is tested. The method is validated using EEG signals from BCI Competition III dataset IIIa and classifiers such as support vector machines and k-nearest neighbors.

A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs based on Machine Learning Algorithms

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as the population, pathogenicity and aflatoxigenic capacity of the strains, and the topography, soil and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for aflatoxin contamination of dried figs, based on the location and altitude of the fig orchards, the population of Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P and trace elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, the proposed method integrates three machine learning techniques, namely dimensionality reduction of the original dataset (Principal Component Analysis), metric learning (Mahalanobis Metric for Clustering) and the K-nearest Neighbors (KNN) learning algorithm, into an enhanced model with a mean performance of 85% in terms of the Pearson Correlation Coefficient (PCC) between observed and predicted values.
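The three-stage pipeline described above (dimensionality reduction, a Mahalanobis-type metric, then KNN, scored by the PCC) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the data are synthetic, the function names are hypothetical, and the learned metric is replaced by a plain rescaling of the PCA scores rather than the Mahalanobis Metric for Clustering used in the paper.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_regress(train_X, train_y, query_X, k):
    """Predict each query value as the mean target of its k nearest training rows."""
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data (assumption): 120 "orchards", 18 correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(120, 3))
X = latent @ rng.normal(size=(3, 18)) + 0.1 * rng.normal(size=(120, 18))
y = latent[:, 0] + 0.5 * latent[:, 1] + 0.1 * rng.normal(size=120)

train_X, test_X, train_y, test_y = X[:90], X[90:], y[:90], y[90:]

mean, components = pca_fit(train_X, n_components=3)
train_Z = pca_transform(train_X, mean, components)
test_Z = pca_transform(test_X, mean, components)

# Stand-in for the learned metric (assumption): rescale each PCA score by its
# training standard deviation, so Euclidean distance acts as a Mahalanobis distance.
scale = train_Z.std(axis=0)
pred = knn_regress(train_Z / scale, train_y, test_Z / scale, k=5)

pcc = pearson(test_y, pred)
print(f"PCC between observed and predicted values: {pcc:.2f}")
```

The number of components, the value of k and the train/test split are arbitrary choices for the illustration and are not taken from the paper.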

Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

t-SNE is an embedding method widely used by the data science community. It helps with two main tasks: displaying results by coloring items according to their class or feature value, and, in forensic analysis, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are its structure preservation property and its answer to the crowding problem, in which all neighbors in a high-dimensional space cannot be represented correctly in a low-dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation in which the area of a cluster is proportional to the number of items it contains, and relationships between clusters are materialized by closeness in the embedding. The algorithm is non-parametric: the transformation from the high-dimensional to the low-dimensional space is described but not learned, so two initializations of the algorithm lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly because the complexity of t-SNE is quadratic, and it would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of the data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable, and this type of model would be unable to adapt to new outliers or concept drift. This paper presents a methodology that reuses an embedding to create a new one in which cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the other relative to the match with the support embedding. The embedding-with-support process can be repeated more than once with the newly obtained embedding. 
The successive embeddings can be used to study the impact of one variable on the dataset distribution or to monitor changes over time. The method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity is reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing the birth, evolution and death of clusters to be observed. The proposed approach facilitates the identification of significant trends and changes, which supports monitoring the dynamics of high-dimensional datasets.
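The complexity and memory figures quoted above follow from simple counting, which the short sketch below works through. It assumes, for illustration only, that the cost of one embedding is the number of pairwise terms (m^2 for m points) and that n is divisible by k; the function names are hypothetical.

```python
def pairwise_terms(m):
    """Number of pairwise terms for an embedding of m points (quadratic cost model)."""
    return m * m

def split_cost(n, k):
    """Total time units and peak memory units when n points are split into k subsets."""
    m = n // k                           # size of each subset
    total_time = k * pairwise_terms(m)   # k embeddings of n/k points: k*(n/k)^2 = n^2/k
    peak_memory = 2 * pairwise_terms(m)  # current subset plus its support: 2*(n/k)^2
    return total_time, peak_memory

n, k = 100_000, 10
full_time = full_memory = pairwise_terms(n)
split_time, split_memory = split_cost(n, k)
print(f"time:   {full_time:.1e} -> {split_time:.1e}")
print(f"memory: {full_memory:.1e} -> {split_memory:.1e}")
```

With n = 100,000 and k = 10, the time count drops by a factor of k, while the memory count drops by a factor of k^2/2.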

Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms: Adaptive Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). The Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy on the training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in avoiding expensive and painful procedures for some children.

A Study about the Distribution of the Spanning Ratios of Yao Graphs

A critical problem in wireless sensor networks is the limited battery and memory of the nodes. As a result, each node in the network can maintain only a subset of its neighbors to communicate with. This increases battery usage in the network because each packet may need more hops to reach its destination. Spanner graphs are defined to tackle these problems. Since each node has a small degree in a spanner graph and the distance between two nodes in the graph is not much greater than their actual geographical distance, spanner graphs are suitable candidates for the topology of a wireless sensor network. In this paper, we study Yao graphs and their behavior for randomly selected sets of points. We generate several random point sets and compare the properties of their Yao graphs with those of the complete graph. Based on our data sets, we obtain several charts demonstrating how Yao graphs behave for a randomly chosen point set. As the results show, the stretch factor of a Yao graph follows a normal distribution. Moreover, the stretch factor is on average far less than the worst-case stretch factor proved for Yao graphs in previous results. Finally, we construct the Yao graph of a realistic point set and study its stretch factor in a real-world setting.
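The kind of experiment described above can be sketched as follows: build the Yao graph of a random point set, then measure its stretch factor as the largest ratio of shortest-path distance to Euclidean distance over all pairs. The number of cones, the point-set size and the number of trials are assumptions chosen for illustration, not the settings of the study, and the undirected variant of the Yao graph is used.

```python
import math
import random

def yao_graph(points, cones):
    """Undirected Yao graph: each point keeps its nearest neighbor in each cone."""
    sector = 2 * math.pi / cones
    edges = set()
    for i, (px, py) in enumerate(points):
        best = {}  # cone index -> (distance, index of nearest point in that cone)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(qy - py, qx - px) % (2 * math.pi)
            cone = min(int(angle / sector), cones - 1)  # clamp rounding at 2*pi
            dist = math.hypot(qx - px, qy - py)
            if cone not in best or dist < best[cone][0]:
                best[cone] = (dist, j)
        for _, j in best.values():
            edges.add((min(i, j), max(i, j)))
    return edges

def stretch_factor(points, edges):
    """Largest ratio of graph distance to Euclidean distance over all point pairs."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    path = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        path[i][i] = 0.0
    for i, j in edges:
        path[i][j] = path[j][i] = euclid[i][j]
    # Floyd-Warshall all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = path[i][m] + path[m][j]
                if via < path[i][j]:
                    path[i][j] = via
    return max(path[i][j] / euclid[i][j] for i in range(n) for j in range(i + 1, n))

random.seed(1)
samples = []
for _ in range(20):
    pts = [(random.random(), random.random()) for _ in range(30)]
    samples.append(stretch_factor(pts, yao_graph(pts, cones=8)))
print(f"mean stretch factor over {len(samples)} point sets: {sum(samples) / len(samples):.3f}")
```

Collecting `samples` over many more point sets and plotting a histogram would give an empirical view of the distribution of the stretch factor.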

Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Missing values are a common problem in real-world datasets. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we use a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known and unknown values: the distance between two known values is the Mahalanobis distance, whereas when one of them is missing, the distance is computed from the distribution of the known values for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps improve the prevention of chronic diseases such as diabetes and cancer. For Wikaya’s recommendation system to work, the distance between users needs to be measured. Since there are missing values in the collected data, a distance function is needed that measures distances between incomplete user profiles. To evaluate how accurately the proposed distance function reflects the actual similarity between objects when some of them contain missing values, we integrated it within the k-nearest neighbors (kNN) classifier, whose computation is based only on the similarity between objects. To validate this, we ran the algorithm over the diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that the kNN classifier using our proposed distance function outperforms kNN using other existing methods.

A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Datasets or collections are becoming important assets in themselves and can now be accepted as a primary intellectual output of research. The quality and usage of a dataset depend mainly on the context in which it has been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection was produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared with each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and the Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested. The Classifier Chains method showed the worst performance. To recap, the benchmark achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, suggesting new directions for research and providing a baseline for future studies.

Automatic Staging and Subtype Determination for Non-Small Cell Lung Carcinoma Using PET Image Texture Analysis

In this study, our goal was to perform tumor staging and subtype determination automatically using different texture analysis approaches for a very common cancer type, non-small cell lung carcinoma (NSCLC). In particular, we introduced a texture analysis approach, Laws’ texture filters, to this context for the first time. The 18F-FDG PET images of 42 patients with NSCLC were evaluated. The number of patients in each tumor stage group, i.e., I-II, III or IV, was 14. Approximately 45% of the patients had adenocarcinoma (ADC) and approximately 55% had squamous cell carcinoma (SqCC). MATLAB was employed to extract 51 features using first order statistics (FOS), the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRLM), and Laws’ texture filters. Sequential forward selection (SFS) was employed as the feature selection method. The selected textural features were used for automatic classification by k-nearest neighbors (k-NN) and support vector machines (SVM). In the automatic classification of tumor stage, the accuracy was approximately 59.5% with the k-NN classifier (k=3) and 69% with SVM (with the one-versus-one paradigm), using 5 features. In the automatic classification of tumor subtype, the accuracy was around 92.7% with one-versus-one SVM. Texture analysis of FDG-PET images might be used, in addition to metabolic parameters, as an objective tool to assess tumor histopathological characteristics and for automatic classification of tumor stage and subtype.
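Sequential forward selection wrapped around a k-NN classifier, as described above, can be sketched in a few lines. The study itself used MATLAB; the Python version below is only an illustration, and the selection criterion (leave-one-out k-NN accuracy), the synthetic data and the function names are all assumptions rather than details from the paper.

```python
import numpy as np

def loo_knn_accuracy(X, y, k):
    """Leave-one-out accuracy of a majority-vote k-NN classifier."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample may not vote for itself
    nearest = np.argsort(dists, axis=1)[:, :k]
    pred = np.array([np.bincount(votes).argmax() for votes in y[nearest]])
    return float((pred == y).mean())

def sequential_forward_selection(X, y, n_features, k=3):
    """Greedily add the feature that most improves the k-NN criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = [loo_knn_accuracy(X[:, selected + [f]], y, k) for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data (assumption): 42 samples, 14 per class, 10 features,
# of which only feature 3 carries class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 14)
X = rng.normal(size=(42, 10))
X[:, 3] = 10.0 * y + 0.1 * rng.normal(size=42)

selected = sequential_forward_selection(X, y, n_features=5)
print("selected feature indices:", selected)
```

Because the search is greedy, the informative feature is picked first here; later picks among pure-noise features are essentially arbitrary.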

Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

This study uses a database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning models from this data that can predict future traffic speed would benefit applications such as car navigation systems, building predictive models for every link becomes a nontrivial job if the number of links in the network is huge. An advantage of adopting k-nearest neighbor (k-NN) as the predictive model is that it does not require any explicit model building. However, k-NN takes a long time to make a prediction because it needs to search for the k nearest neighbors in the database at prediction time. In this paper, we investigate how much k-NN can be sped up in making traffic speed predictions by reducing the amount of data to be searched, without a significant sacrifice in prediction accuracy. The rationale is that it is better to look only at recent data, because traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several k-NN models employing different sets of features, namely the current and past traffic speeds of the target link and of its upstream and downstream neighbor links. The performance of these models is compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.
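The idea of restricting the k-NN search to recent data can be illustrated with a small sketch. It is a simplification under stated assumptions: a single link, features made only of that link's last few speeds, and a hypothetical `window` parameter that limits how many of the most recent historical patterns are searched.

```python
import numpy as np

def knn_forecast(series, lags, k, window):
    """Predict the next value from the k most similar recent patterns.

    A pattern is `lags` consecutive speeds; its target is the speed that followed.
    Only the `window` most recent patterns with a known target are searched.
    """
    series = np.asarray(series, dtype=float)
    query = series[-lags:]                  # the current and past speeds
    last_start = len(series) - lags - 1     # latest pattern with a known target
    starts = np.arange(max(0, last_start - window + 1), last_start + 1)
    patterns = np.stack([series[s:s + lags] for s in starts])
    targets = series[starts + lags]
    dists = np.linalg.norm(patterns - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(targets[nearest].mean())

# Toy periodic "speed" series (assumption): 10, 20, 30 repeated.
speeds = [10, 20, 30] * 10
print(knn_forecast(speeds, lags=2, k=3, window=12))
```

A smaller `window` means fewer distance computations per prediction, which is the speed-versus-accuracy trade-off the abstract investigates.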

Consensus of Multi-Agent Systems under the Special Consensus Protocols

Two consensus problems are considered in this paper. One is the consensus of linear multi-agent systems with a weakly connected directed communication topology. The other is the consensus of nonlinear multi-agent systems with a strongly connected directed communication topology. For the first problem, a simplified consensus protocol is designed in which each child agent can communicate with only one of its neighbors. That is, the real communication topology is a directed spanning tree of the original communication topology, without any cycles. A necessary and sufficient condition is then put forward under which the multi-agent system can reach consensus. It is worth noting that the given conditions do not require any eigenvalue of the Laplacian matrix of the original directed communication network. For the second problem, the feedback gain is designed in the nonlinear consensus protocol, and a sufficient condition is proposed under which the system can achieve consensus. In addition, the consensus interval is introduced and analyzed to solve the consensus problem. Finally, two numerical simulations are included to verify the theoretical analysis.

Nutrition Program Planning Based on Local Resources in Urban Fringe Areas of a Developing Country

Obesity prevalence and severe malnutrition in Indonesia increased from 2007 to 2013. Utilizing local resources in nutrition program planning can improve program efficiency and help reach program goals. The aim of this research is to plan a nutrition program based on local resources for urban fringe areas in a developing country. This research used a qualitative approach, with a focus on local resources including social capital, the social system and the cultural system. The study was conducted in Mijen, Central Java, one of the urban fringe areas in Indonesia. Purposive and snowball sampling techniques were used to select participants. A total of 16 participants took part in the study. Observation, interviews, focus group discussion, SWOT analysis, brainstorming and the Miles and Huberman model were used to analyze the data. We identified several local resources, such as the contributions of nutrition cadres, social organizations and social financial resources, as well as the cultural system and social system. The outstanding contribution of nutrition cadres is their participation and creativity in improving nutritional status. In addition, social organizations, such as the integrated health center for children (Pos Pelayanan Terpadu), can be engaged in nutrition program planning. This center is supported by the House of Nutrition to assist in nutrition program planning and to provide social support to families, neighbors and communities as social capital. The study also found that cultural systems that show appreciation for well-nourished children are a better way to address the problem of balanced nutrition. Social systems such as teamwork and mutual cooperation can also be a potential resource to support nutrition programs and overcome associated problems. 
The impact of development in urban areas, such as the introduction of more green areas, which improve the perceived status of local people, as well as new health services facilitated by people and companies, can also be a resource to support nutrition programs. Local resources in urban fringe areas can be used in the planning of nutrition programs. With regard to nutrition program planning, we recommend expanding partnerships with all stakeholders and empowering the community by optimizing the roles of nutrition care centers for children.

Relevant Stakeholders in Environmental Management Organization: The Case of Industries Três Rios/RJ

The intense process of economic acceleration and the expansion of industrial activities and capitalism, combined with population growth, promote development but also bring environmental consequences and change the dynamics of locations. Society is seeking to break with the old paradigms of capitalist society and to reconcile growth with sustainable development, which requires a change of mentality among the stakeholders of the production process (shareholders, employees, suppliers, customers, governments, neighbors, citizen groups and the public in general). In this context, this research aims to map the stakeholders interested in environmental management in industries located in the city of Três Rios/RJ, in the South-Central region of the state of Rio de Janeiro, Brazil. The methodology combines descriptive and field research of a qualitative and quantitative nature. It is also a multi-case study: data were collected by means of semi-structured questionnaires and interviews with employees working in the environmental area of industries located in Três Rios that were registered with the Federation of Industries of the State of Rio de Janeiro (FIRJAN) in its 2013 edition and active with the federal revenue service. Among other things, the research identified the stakeholders involved in the environmental management process of the responding industries in Três Rios, and those responding to the demands of environmental management.

A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation

Hyperspectral imagery (HSI) typically provides a wealth of information captured over a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution, which makes semantic interpretation a challenging task in HSI analysis. In this paper, we focus on object classification as a form of HSI semantic interpretation. However, HSI classification still faces several issues, including the spatial variability of spectral signatures, the high number of spectral bands, and the high cost of labeling true samples. Together, the high number of spectral bands and the low number of training samples give rise to the curse of dimensionality. To resolve this problem, we propose introducing a dimensionality reduction step to improve HSI classification. The presented approach is a semi-supervised band selection method based on a spatial hypergraph embedding model that represents higher-order relationships, with different weights for the spatial neighbors corresponding to the centroid pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared with other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared with many existing dimensionality reduction methods for HSI classification.

Prevalence of Headache among Adult Population in Urban Varanasi, India

Headache is one of the most ubiquitous and frequent neurological disorders interfering with everyday life in all countries, and India appears to be no exception. The objectives of this study were to assess the prevalence of headache among the adult population in the urban area of Varanasi and to identify factors influencing the occurrence of headache. A community-based cross-sectional study was conducted among the adult population in the urban area of Varanasi district, Uttar Pradesh, India. A total of 151 eligible respondents, selected by simple random sampling, were interviewed. Proportions, percentages and the Chi-square test were applied for data analysis. Of the 151 respondents, the majority (58.3%) were female. In this study, 92.8% of respondents belonged to the age group 18-60 years, while 7.2% were 60 years of age or above. The overall prevalence of headache was 51.1%. The highest and lowest prevalence of headache were recorded in the age groups 18-29 years and 40-49 years, respectively. The prevalence of headache was 62.1% among illiterate respondents and 40.0% among graduates and above. Unskilled workers had a higher prevalence of headache (73.1%) than respondents in other types of occupation. Headache was more prevalent among the unemployed (35.9%) than the employed (6.4%). Females more often had a family history of headache (48.9%) than males (41.3%). Study subjects who had peaceful relations with family members, relatives and neighbors had more headache than those who did not.

Hamiltonian Related Properties with and without Faults of the Dual-Cube Interconnection Network and Their Variations

In this paper, a thorough review of dual-cubes DC_n, the related studies and their variations is given. DC_n was introduced as a network that retains the pleasing properties of the hypercube Q_n but has a much smaller diameter. In fact, it is constructed so that the number of vertices of DC_n equals the number of vertices of Q_(2n+1). However, each vertex in DC_n is adjacent to n + 1 neighbors, so DC_n has (n + 1) × 2^(2n) edges in total, which is roughly half the number of edges of Q_(2n+1). In addition, the diameter of DC_n is 2n + 2, which is of the same order as that of Q_(2n+1). For completeness, basic definitions, construction rules and symbols are provided. We chronicle the results, presenting eleven significant theorems, and include some open problems at the end.
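The counts quoted above can be checked with a little arithmetic. The sketch below uses only the figures stated in the abstract (vertex count, degree and diameter of the dual-cube) plus the standard facts that a regular graph with V vertices of degree d has V × d / 2 edges and that the hypercube Q_d has 2^d vertices, degree d and diameter d; the function names are hypothetical.

```python
def dual_cube_stats(n):
    """Vertex, edge and diameter counts of the dual-cube DC_n as stated in the abstract."""
    vertices = 2 ** (2 * n + 1)      # same number of vertices as Q_(2n+1)
    degree = n + 1
    edges = vertices * degree // 2   # handshake lemma: equals (n + 1) * 2^(2n)
    return {"vertices": vertices, "degree": degree, "edges": edges, "diameter": 2 * n + 2}

def hypercube_stats(d):
    """Vertex, edge and diameter counts of the hypercube Q_d."""
    vertices = 2 ** d
    return {"vertices": vertices, "degree": d, "edges": vertices * d // 2, "diameter": d}

for n in (1, 2, 3, 8):
    dc, q = dual_cube_stats(n), hypercube_stats(2 * n + 1)
    ratio = dc["edges"] / q["edges"]  # (n + 1) / (2n + 1), which tends to 1/2
    print(n, dc["vertices"], dc["edges"], q["edges"], round(ratio, 3), dc["diameter"], q["diameter"])
```

For n = 3, for example, DC_3 and Q_7 both have 128 vertices, with 256 and 448 edges and diameters 8 and 7, respectively.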

Recognition of Tifinagh Characters with Missing Parts Using Neural Network

In this paper, we present an algorithm for the reconstruction of Tifinagh characters from incomplete 2D scans. The algorithm is based on the correlation between the lost block and its neighbors. The proposed system contains three main parts: pre-processing, feature extraction and recognition. In the first step, we construct a database of Tifinagh characters. In the second step, we apply a shape analysis algorithm. In the classification part, we use a neural network. The simulation results demonstrate that the proposed method gives good results.

Liability Aspects Related to Genetically Modified Food under the Food Safety Legislation in India

The question of legal liability for injury arising from the import and introduction of GM food emerges as a crucial issue confronting the promotion of GM food and its derivatives. There is a strong possibility that commercialized GM food from an exporting country will enter an importing country where its approval status is not the same. This makes it important to establish a liability mechanism to address any damage that occurs during transboundary movement or in the market. There was widespread consensus to develop the Cartagena Protocol on Biosafety and to provide a dedicated regime on liability and redress at the international level in the form of the Nagoya-Kuala Lumpur Supplementary Protocol on Liability and Redress (‘N-KL Protocol’). National legal frameworks based on this protocol are not adequately established in the prevailing food legislation of developing countries. A developing economy like India is willing to import GM food and its derivatives following the successful commercialization of Bt cotton in 2002. As a party to the N-KL Protocol, India must formulate a legal framework and address the safety, liability, and regulatory issues surrounding GM foods in conformity with the provisions of the Protocol. The liability mechanism is also important because risk assessment and risk management are still at the implementation stage. Moreover, the country is facing GM infiltration issues with its neighbor Bangladesh. As a precautionary approach, there is a need to formulate rules and procedures of legal liability to address any damage that occurs in transboundary trade. In this context, the proposed work will attempt to analyze the liability regime in the existing Food Safety and Standards Act, 2006 from the standpoint of applicability and domestic compliance, and to suggest legal and policy options for regulatory authorities.

A Survey on Opportunistic Routing in Mobile Ad Hoc Networks

Opportunistic Routing (OR) increases transmission reliability and network throughput. Traditional routing protocols preselect one or more nodes before transmission starts and use a predetermined neighbor to forward a packet at each hop. Opportunistic routing overcomes the drawback of unreliable wireless transmission by broadcasting, so that one transmission can be overheard by multiple neighbors. The first cooperation-optimal protocol for Multirate OR (COMO) is used to achieve social efficiency and prevent selfish behavior of the nodes. The novel link-correlation-aware OR improves performance by exploiting diverse, low-correlated forwarding links. Context-aware Adaptive OR (CAOR) uses an active suppression mechanism to reduce packet duplication. Context-aware OR (COR) can provide efficient routing in mobile networks. Cooperative Opportunistic Routing in Mobile Ad hoc Networks (CORMAN) tackles the problem of opportunistic data transfer. Compared with all the other protocols, COMO is the best, as it achieves social efficiency and prevents selfish behavior of the nodes.

The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Problems arising from unbalanced data sets commonly appear in real-world applications. Many researchers have found that, due to unequal class distribution, the performance of existing classifiers tends to be biased towards the majority class. k-nearest neighbors nonparametric discriminant analysis is a method that has been proposed for classifying unbalanced classes with good performance. In this study, discriminant analysis methods are used to investigate misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance of parametric and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, including the variables diabetes risk group, age, gender, blood glucose, and BMI, were analyzed and bootstrapped for 50 and 100 samples, with 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was examined for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were applied to the 50 and 100 bootstrap samples and to the original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10) and (0.70:0.15:0.15). 
The results from the 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities for non-risk:risk:diabetic of 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.
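How the choice of prior probabilities can change a k-nearest neighbors discriminant decision is illustrated by the sketch below. It uses one common formulation, in which the posterior of class t is proportional to prior_t × k_t / n_t (k_t of the k nearest neighbors belong to class t, which has n_t training samples). The abstract does not state which rule the authors used, so this formulation, the toy data and the function name are assumptions.

```python
import numpy as np

def knn_discriminant(train_X, train_y, x, k, priors):
    """Classify x using posterior_t proportional to prior_t * k_t / n_t."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    scores = {}
    for c in np.unique(train_y):
        k_c = np.sum(nearest == c)   # neighbors of x that belong to class c
        n_c = np.sum(train_y == c)   # training size of class c
        scores[int(c)] = priors[int(c)] * k_c / n_c
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Toy imbalanced data (assumption): 8 majority samples, 2 minority samples.
train_X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.4], [20.0]])
train_y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
x = np.array([7.3])  # its 3 nearest neighbors are one minority and two majority samples

label_equal, _ = knn_discriminant(train_X, train_y, x, k=3, priors={0: 0.5, 1: 0.5})
label_prop, _ = knn_discriminant(train_X, train_y, x, k=3, priors={0: 0.9, 1: 0.1})
print("equal priors ->", label_equal, "| unequal priors ->", label_prop)
```

With equal priors the minority class wins because its single neighbor is weighted by a small class size, whereas priors that favor the majority class reverse the decision.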

Factors Associated with Mammography Screening Behaviors: A Cross-Sectional Descriptive Study of Egyptian Women

Breast cancer is a substantial health concern, and practicing mammography screening (MS) is important in minimizing its related morbidity. It is therefore essential to have a better understanding of women’s breast cancer screening behaviors and the factors that influence their utilization. The aim of this study is to identify the factors linked to MS behaviors among Egyptian women. A cross-sectional descriptive design was used to provide a snapshot of the factors linked to MS behaviors. A convenience sample of 311 women was used; all eligible participants admitted to the Women Imaging Unit who were 40 years of age or above, were coming for mammography assessment, were not pregnant or breastfeeding, and agreed to participate in the study were included. A structured questionnaire was developed by the researchers, containing three parts: socio-demographic data; motivating factors associated with MS; and the association between MS and a model of behavior change. The analyzed data indicated that most of the participating women (66.6%) belonged to the age group 40-49. A high proportion of participants (58.1%) in the group with previous MS were influenced by their neighbors to practice MS, whereas 32.7% in the group without previous MS were influenced by family members, which indicated significant differences (P