Abstract: The goal of motor imagery brain-computer interface research is to create a link between the central nervous system and a computer or device. The most important signal for brain-computer interfaces is the electroencephalogram (EEG). The aim of this research is to explore a set of effective features from EEG signals, separated into frequency bands, using wave atom sub-bands to discriminate right- and left-hand motor imagery signals. Feature vectors are constructed from the transform coefficients for each frequency range and each transform sub-band, and their classification performance is tested. The method is validated using EEG signals from BCI Competition III dataset IIIa and classifiers such as support vector machines and k-nearest neighbors.
Abstract: Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as the population, pathogenicity and aflatoxigenic capacity of the strains, and the topography, soil and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a machine learning-based risk assessment tool for aflatoxin contamination of dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P and trace elements (B, Fe, Mn, Zn and Cu). In particular, our proposed method integrates three machine learning techniques, i.e., dimensionality reduction on the original dataset (Principal Component Analysis), metric learning (Mahalanobis Metric for Clustering) and the K-nearest Neighbors learning algorithm (KNN), into an enhanced model, with a mean performance of 85% in terms of the Pearson Correlation Coefficient (PCC) between observed and predicted values.
Abstract: t-SNE is an embedding method widely used by the data science community. It supports two main tasks: displaying results by coloring items according to their class or feature value, and forensics, by giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are its structure preservation property and its answer to the crowding problem, in which not all neighbors in a high-dimensional space can be represented correctly in a low-dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to the number of items it contains, and relationships between clusters are materialized by closeness in the embedding. The algorithm is non-parametric: the transformation from a high- to a low-dimensional space is described but not learned, so two initializations of the algorithm lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly, as the complexity of t-SNE is quadratic, and it would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of the data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable, and this type of model would be unable to adapt to new outliers or concept drift. This paper presents a methodology to reuse an embedding to create a new one in which cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the match with the support embedding. The embedding-with-support process can be repeated more than once, using the newly obtained embedding.
The successive embeddings can be used to study the impact of one variable on the dataset distribution or to monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, making it possible to observe the birth, evolution and death of clusters. The proposed approach facilitates the identification of significant trends and changes, which supports monitoring the dynamics of high-dimensional datasets.
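The complexity and memory figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of that bookkeeping, assuming k divides n evenly and counting one unit of work or memory per pairwise term; the function names are hypothetical and nothing here implements the embedding itself.

```python
def pairwise_cost(n: int) -> int:
    """Number of pairwise terms for an n-point embedding (quadratic in n)."""
    return n * n


def chunked_costs(n: int, k: int) -> tuple[int, int]:
    """Return (total time cost, peak memory) for n points split into k equal subsets.

    Assumption: each subset is embedded once, and keeping the support
    embedding alongside the new one doubles the memory of a single embedding.
    """
    m = n // k                          # points per subset (assumes k divides n)
    total_time = k * pairwise_cost(m)   # k * (n/k)^2 = n^2 / k
    peak_memory = 2 * pairwise_cost(m)  # 2 * (n/k)^2
    return total_time, peak_memory


n, k = 100_000, 10
total_time, peak_memory = chunked_costs(n, k)
print("time reduction factor:", pairwise_cost(n) / total_time)
print("memory reduction factor:", pairwise_cost(n) / peak_memory)
```

Under these assumptions the time cost shrinks by a factor of k and the memory by a factor of k^2/2.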
Abstract: This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including Adaptive Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). The Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy on training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in avoiding expensive and painful procedures on some children.
Abstract: A critical problem in wireless sensor networks is the limited battery and memory of nodes. Therefore, each node in the network can maintain only a subset of its neighbors to communicate with. This increases battery usage in the network because each packet must take more hops to reach its destination. Spanner graphs are defined to tackle these problems. Since each node has a small degree in a spanner graph and the distance in the graph is not much greater than the actual geographical distance, spanner graphs are suitable candidates for the topology of a wireless sensor network. In this paper, we study Yao graphs and their behavior for randomly selected sets of points. We generate several random point sets and compare the properties of their Yao graphs with those of the complete graph. Based on our data sets, we obtain several charts demonstrating how Yao graphs behave for randomly chosen point sets. As the results show, the stretch factor of a Yao graph follows a normal distribution. Furthermore, the stretch factor is on average far less than the worst-case stretch factor proved for Yao graphs in previous results. Finally, we use a Yao graph for a realistic point set and study its stretch factor in the real world.
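To make the Yao graph and its stretch factor concrete, the following is a minimal sketch using the usual textbook definitions rather than the authors' code. It assumes k equal cones around each point, the nearest neighbor chosen per cone, edges treated as undirected, and distinct points; the function names, the value k = 8 and the random point set are illustrative choices.

```python
import math
import random
from itertools import combinations


def yao_graph(points, k):
    """Undirected Yao graph: each point links to its nearest neighbor in each of k cones."""
    edges = set()
    cone_angle = 2 * math.pi / k
    for i, (px, py) in enumerate(points):
        nearest = {}  # cone index -> (distance, point index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(qy - py, qx - px) % (2 * math.pi)
            cone = min(int(angle / cone_angle), k - 1)  # guard against rounding up to k
            d = math.hypot(qx - px, qy - py)
            if cone not in nearest or d < nearest[cone][0]:
                nearest[cone] = (d, j)
        for _, j in nearest.values():
            edges.add((min(i, j), max(i, j)))
    return edges


def stretch_factor(points, edges):
    """Largest ratio of shortest-path length to Euclidean distance over all pairs."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for i, j in edges:
        dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return max(dist[i][j] / math.dist(points[i], points[j])
               for i, j in combinations(range(n), 2))


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(50)]
yao_edges = yao_graph(pts, 8)
print(len(yao_edges), "edges, stretch factor", stretch_factor(pts, yao_edges))
```

Repeating the last four lines over many seeds would give an empirical distribution of the stretch factor, which is the kind of experiment the abstract describes.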
Abstract: Missing values in real-world datasets are a common
problem. Many algorithms have been developed to deal with this
problem; most of them replace the missing values with a fixed
value computed from the observed values. In
our work, we use a distance function based on the Bhattacharyya
distance to measure the distance between objects with missing
values. The Bhattacharyya distance measures the similarity of
two probability distributions. The proposed distance distinguishes
between known and unknown values: the distance between
two known values is the Mahalanobis distance, whereas when
one of them is missing, the distance is computed from the
distribution of the known values for the coordinate that contains
the missing value. This method was integrated with Wikaya, a
digital health company developing a platform that helps to improve
prevention of chronic diseases such as diabetes and cancer. For
Wikaya’s recommendation system to work, the distance between users
needs to be measured. Since the collected data contain missing
values, a distance function is needed that can measure distances between
incomplete user profiles. To evaluate the accuracy of the proposed
distance function in reflecting the actual similarity between different
objects, when some of them contain missing values, we integrated it
within the framework of k nearest neighbors (kNN) classifier, since
its computation is based only on the similarity between objects. To
validate this, we ran the algorithm on the diabetes and breast cancer
datasets, standard benchmark datasets from the UCI repository. Our
experiments show that the kNN classifier using our proposed distance
function outperforms kNN using other existing methods.
Abstract: Datasets or collections are becoming important assets in their own right and can now be accepted as a primary intellectual output of research. The quality and usage of a dataset depend mainly on the context in which it has been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection was produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method showed the worst performance. To recap, the benchmark achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
Abstract: In this study, our goal was to perform tumor staging and subtype determination automatically using different texture analysis approaches for a very common cancer type, non-small cell lung carcinoma (NSCLC). In particular, we introduced a texture analysis approach, Laws’ texture filters, to be used in this context for the first time. The 18F-FDG PET images of 42 patients with NSCLC were evaluated. There were 14 patients in each tumor stage group, i.e., I-II, III or IV. Approximately 45% of the patients had adenocarcinoma (ADC) and approximately 55% had squamous cell carcinoma (SqCC). MATLAB was used to extract 51 features using first order statistics (FOS), gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), and Laws’ texture filters. Sequential forward selection (SFS) was used for feature selection. The selected textural features were used for automatic classification by k-nearest neighbors (k-NN) and support vector machines (SVM). In the automatic classification of tumor stage, the accuracy was approximately 59.5% with the k-NN classifier (k=3) and 69% with SVM (with the one-versus-one paradigm), using 5 features. In the automatic classification of tumor subtype, the accuracy was around 92.7% with one-versus-one SVM. Texture analysis of FDG-PET images might be used, in addition to metabolic parameters, as an objective tool to assess tumor histopathological characteristics and for automatic classification of tumor stage and subtype.
Abstract: Consider a database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning models from this data that can predict future traffic speed would benefit applications such as car navigation systems, building predictive models for every link becomes a nontrivial job if the number of links in the network is huge. An advantage of adopting k-nearest neighbor (k-NN) as the predictive model is that it does not require any explicit model building. However, k-NN takes a long time to make a prediction because it needs to search for the k nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched, without a significant sacrifice of prediction accuracy. The rationale is that it is better to look only at recent data, because traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features, namely the current and past traffic speeds of the target link and of the neighboring links upstream and downstream. The performance of these models is compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.
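The idea of restricting the k-NN search to recent records can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a brute-force Euclidean search, a prediction equal to the mean of the neighbors' next-interval speeds, and a hypothetical `recent` parameter that caps how many of the newest records are searched.

```python
import math


def knn_predict(history, query, k=3, recent=None):
    """Predict the next speed as the mean over the k nearest stored records.

    history: list of (feature_vector, next_speed) pairs ordered oldest to newest.
    recent:  if given, only the newest `recent` records are searched (assumption:
             a smaller search window trades some accuracy for prediction speed).
    """
    start = 0 if recent is None else max(0, len(history) - recent)
    records = history[start:]
    nearest = sorted(records, key=lambda rec: math.dist(rec[0], query))[:k]
    return sum(speed for _, speed in nearest) / len(nearest)


# Toy records: features are (current speed, previous speed) of the target link.
history = [
    ((10, 10), 12),
    ((50, 50), 48),
    ((52, 51), 53),
    ((49, 50), 50),
    ((20, 21), 22),
]
print(knn_predict(history, (50, 50), k=2))            # searches all records
print(knn_predict(history, (50, 50), k=2, recent=2))  # searches the last two only
```

Timing the two calls on a realistically sized history would show the speed and accuracy trade-off that the abstract sets out to measure.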
Abstract: Two consensus problems are considered in this
paper. One is the consensus of linear multi-agent systems with
weakly connected directed communication topology. The other
is the consensus of nonlinear multi-agent systems with strongly
connected directed communication topology. For the first problem,
a simplified consensus protocol is designed: each child agent can
only communicate with one of its neighbors. That is, the real
communication topology is a directed spanning tree of the original
communication topology, without any cycles. Then, a necessary
and sufficient condition is put forward under which the multi-agent system can
reach consensus. It is worth noting that the given conditions do
not need any eigenvalue of the corresponding Laplacian matrix of the
original directed communication network. For the second problem,
the feedback gain is designed in the nonlinear consensus protocol.
Then, a sufficient condition is proposed under which the systems can
achieve consensus. In addition, the consensus interval is introduced
and analyzed to solve the consensus problem. Finally, two numerical
simulations are included to verify the theoretical analysis.
Abstract: The prevalence of obesity and severe malnutrition in Indonesia increased from 2007 to 2013. Using local resources in nutrition program planning can improve program efficiency and help reach program goals. The aim of this research is to plan a nutrition program based on local resources for urban fringe areas in a developing country. This research used a qualitative approach, with a focus on local resources including social capital, the social system and the cultural system. The study was conducted in Mijen, Central Java, one of the urban fringe areas in Indonesia. Purposive and snowball sampling techniques were used to select participants. A total of 16 participants took part in the study. Observation, interviews, focus group discussion, SWOT analysis, brainstorming and the Miles and Huberman models were used to analyze the data. We identified several local resources, such as the contributions of nutrition cadres, social organizations and social financial resources, as well as the cultural system and social system. The outstanding contribution of nutrition cadres is their participation and creativity in improving nutritional status. In addition, social organizations, such as the integrated health center for children (Pos Pelayanan Terpadu), can be engaged in nutrition program planning. This center is supported by the House of Nutrition to assist in nutrition program planning and to provide social support to families, neighbors and communities as social capital. The study also reported that cultural systems that show appreciation for well-nourished children are a better way to address the problem of balanced nutrition. Social systems such as teamwork and mutual cooperation can also be a potential resource to support nutrition programs and overcome associated problems.
The impact of development in urban areas, such as the introduction of more green areas that improve the perceived status of local people, as well as new health services facilitated by people and companies, can also be a resource to support nutrition programs. Local resources in urban fringe areas can be used in the planning of nutrition programs. With regard to nutrition program planning, we recommend expanding partnerships with all stakeholders and empowering the community by optimizing the roles of nutrition care centers for children.
Abstract: The intense process of economic acceleration and the expansion of industrial activities and capitalism, combined with population growth, promote development but also bring environmental consequences and change the dynamics of locations. Society is seeking to break with the old paradigms of capitalist society and to reconcile growth with sustainable development, through a change of mentality among the stakeholders of the production process (shareholders, employees, suppliers, customers, governments, neighbors, citizen groups and the public in general). In this context, this research aims to map the stakeholders interested in environmental management in industries located in the city of Três Rios/RJ. The city of Três Rios is located in the South-Central region of the state of Rio de Janeiro, Brazil. The methodological resources used are descriptive and field research, of a qualitative and quantitative nature. The research is also a multi-case study in the study area; data were collected by means of semi-structured questionnaires and interviews with employees working in the environmental area of industries located in Três Rios that were registered with the Federation of Industries of the State of Rio de Janeiro (FIRJAN) in the 2013 version and active in federal revenue. Among other things, this research identified the stakeholders involved in the environmental management process of the responding industries in Três Rios, and those responding to the demands of environmental management.
Abstract: Hyperspectral imagery (HSI) typically provides a
wealth of information captured in a wide range of the
electromagnetic spectrum for each pixel in the image. Hence, a
pixel in HSI is a high-dimensional vector of intensities with a
large spectral range and a high spectral resolution. Semantic
interpretation is therefore a challenging task in HSI analysis. In this
paper, we focus on object classification as a form of HSI semantic
interpretation. However, HSI classification still faces some issues,
among which are the following: The spatial variability of spectral
signatures, the high number of spectral bands, and the high cost
of true sample labeling. The high number of spectral
bands combined with the low number of training samples gives rise to
the curse of dimensionality. To address this problem, we
propose a dimensionality reduction step intended
to improve HSI classification. The presented approach is a
semi-supervised band selection method based on a spatial hypergraph
embedding model that represents higher-order relationships with
different weights for the spatial neighbors corresponding to the
centroid of the pixel. This semi-supervised band selection has been
developed to select useful bands for object classification. The
presented approach is evaluated on AVIRIS and ROSIS HSIs
and compared to other dimensionality reduction methods. The
experimental results demonstrate the efficacy of our approach
compared to many existing dimensionality reduction methods for
HSI classification.
Abstract: Headache is one of the most ubiquitous and frequent
neurological disorders interfering with everyday life in all countries.
India appears to be no exception. The objectives were to assess the
prevalence of headache among the adult population in the urban area of
Varanasi and to identify factors influencing the occurrence of
headache. A community-based cross-sectional study was conducted
among the adult population in the urban area of Varanasi district, Uttar
Pradesh, India. A total of 151 eligible respondents, selected by
simple random sampling, were interviewed. Proportions, percentages and
the Chi-square test were used for data analysis. Of the 151 respondents,
the majority (58.3%) were female. In this study, 92.8% of respondents
belonged to the 18-60 year age group, while 7.2% were 60 years of
age or above. The overall prevalence of headache was found to be
51.1%. The highest and lowest prevalence of headache were recorded in
the 18-29 year and 40-49 year age groups, respectively. The prevalence
of headache was 62.1% among illiterate respondents and 40.0% among
graduates and above. Unskilled workers had a higher prevalence of
headache (73.1%) than other occupational groups. Headache was more
prevalent among the unemployed (35.9%) than the employed (6.4%).
Females more often had a family history of headache (48.9%) than
males (41.3%). Study subjects having peaceful relations with family
members, relatives and neighbors had more headache than those without
peaceful relations.
Abstract: In this paper, a thorough review of dual-cubes, DCn,
the related studies and their variations is given. DCn was introduced
as a network which retains the pleasing properties of the hypercube Qn
but has a much smaller diameter. In fact, it is so constructed that the
number of vertices of DCn is equal to the number of vertices of Q2n+1.
However, each vertex in DCn is adjacent to n + 1 neighbors, and
so DCn has (n + 1) × 2^(2n) edges in total, which is roughly half the
number of edges of Q2n+1. In addition, the diameter of any DCn is 2n+2,
which is of the same order as that of Q2n+1. For completeness,
basic definitions, construction rules and symbols are
provided. We chronicle the results, where eleven significant theorems
are presented, and include some open problems at the end.
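The vertex, edge and diameter counts quoted above can be checked on small cases. The abstract does not spell out the construction, so the sketch below assumes the commonly used one: each vertex is a (2n+1)-bit label made of a class bit and two n-bit fields, a class-0 vertex may flip any bit of the first field, a class-1 vertex any bit of the second field, and every vertex has one cross edge that flips the class bit. The function names are illustrative.

```python
from collections import deque


def dual_cube(n):
    """Adjacency lists of DC_n on (2n+1)-bit labels: [class bit | field II | field I]."""
    adj = {}
    for v in range(1 << (2 * n + 1)):
        cls = v >> (2 * n)
        # Class 0 moves inside field I (bits 0..n-1), class 1 inside field II (bits n..2n-1).
        offset = 0 if cls == 0 else n
        nbrs = [v ^ (1 << (offset + b)) for b in range(n)]
        nbrs.append(v ^ (1 << (2 * n)))  # cross edge: flip the class bit
        adj[v] = nbrs
    return adj


def diameter(adj):
    """Largest breadth-first-search eccentricity over all vertices."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


for n in (1, 2, 3):
    adj = dual_cube(n)
    edge_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    print(n, len(adj), edge_count, diameter(adj))
```

For comparison, Q2n+1 has (2n + 1) × 2^(2n) edges, so the ratio of edge counts is (n + 1)/(2n + 1), which approaches one half as n grows.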
Abstract: In this paper, we present an algorithm for reconstruction from incomplete 2D scans of Tifinagh characters. The algorithm is based on the correlation between the lost block and its neighbors. The proposed system contains three main parts: pre-processing, feature extraction and recognition. In the first step, we construct a database of Tifinagh characters. In the second step, we apply a “shape analysis algorithm”. In the classification part, we use a neural network. The simulation results demonstrate that the proposed method gives good results.
Abstract: The question of legal liability for injury arising out
of the import and introduction of GM food emerges as a crucial
issue confronting the promotion of GM food and its derivatives. There is a
greater possibility that commercialized GM food from an exporting
country will enter an importing country where its approval status is
not the same. This makes it important to establish a liability
mechanism to address any damage that occurs during
transboundary movement or in the market. There was widespread
consensus to develop the Cartagena
Protocol on Biosafety and to provide a dedicated regime on liability
and redress at the international level, in the form of the
Nagoya-Kuala Lumpur Supplementary Protocol on Liability and Redress
(‘N-KL Protocol’). The national legal frameworks based on this
protocol are not adequately established in the prevailing food
legislation of developing countries. A developing economy
like India is willing to import GM food and its derivatives after the
successful commercialization of Bt cotton in 2002. As a party to the
N-KL Protocol, it is indispensable for India to formulate a legal
framework and to address the safety, liability, and regulatory issues
surrounding GM foods in conformity with the provisions of the
Protocol. The liability mechanism is also important where
risk assessment and risk management are still at the implementation
stage. Moreover, the country is facing GM infiltration issues with its
neighbor Bangladesh. As a precautionary approach, there is a need
to formulate rules and procedures of legal liability to address any
damage that occurs in transboundary trade. In this context, the
proposed work will attempt to analyze the liability regime in the
existing Food Safety and Standards Act, 2006 in terms of applicability
and domestic compliance and to suggest legal and policy options for
regulatory authorities.
Abstract: Opportunistic Routing (OR) increases
transmission reliability and network throughput. Traditional routing
protocols preselect one or more nodes before
transmission starts and use a predetermined neighbor to forward a
packet in each hop. Opportunistic routing overcomes the
drawback of unreliable wireless transmission by broadcasting, so that one
transmission can be overheard by multiple neighbors. The first
cooperation-optimal protocol for Multirate OR (COMO) is used to
achieve social efficiency and prevent selfish behavior of the
nodes. The novel link-correlation-aware OR improves
performance by exploiting diverse, weakly correlated forwarding
links. Context-aware Adaptive OR (CAOR) uses an active suppression
mechanism to reduce packet duplication. Context-aware OR
(COR) can provide efficient routing in mobile networks. By using
Cooperative Opportunistic Routing in Mobile Ad hoc Networks
(CORMAN), the problem of opportunistic data transfer can be
tackled. Compared with the other protocols, COMO is the best, as it
achieves social efficiency and prevents selfish behavior of the
nodes.
Abstract: The problems arising from unbalanced data sets
generally appear in real world applications. Due to unequal class
distribution, many researchers have found that the performance of
existing classifiers tends to be biased towards the majority class. The
k-nearest neighbors nonparametric discriminant analysis is a method
that was proposed for classifying unbalanced classes with good
performance. In this study, discriminant analysis methods are
used to investigate misclassification error rates for class-imbalanced
data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data, including the variables
diabetes risk group, age, gender, blood glucose, and BMI, were
analyzed and bootstrapped into 50 and 100 samples, with 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was examined for departure from multivariate
normality and for equality of the covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed
non-normality and unequal covariance matrices. The parametric linear
discriminant function, the quadratic discriminant function, and the
nonparametric k-nearest neighbors discriminant function were
applied to the 50 and 100 bootstrap samples and to the
original data. In searching for the optimal classification rule, the
prior probabilities were set to both equal proportions (0.33:0.33:0.33)
and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10)
and (0.70:0.15:0.15). The results from the 50 and 100 bootstrap samples
indicated that the k-nearest neighbors approach with k=3 or k=4 and
prior probabilities for non-risk:risk:diabetic of 0.90:0.05:0.05
or 0.80:0.10:0.10 gave the smallest misclassification error
rate. The k-nearest neighbors approach is therefore
suggested for classifying three-class imbalanced data of diabetes
risk groups.
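The role of the prior probabilities in a k-nearest neighbors discriminant rule can be illustrated with a small sketch. It assumes one common formulation, in which each class is scored by its prior times the share of its own training points that fall among the k nearest neighbors, and it uses plain Euclidean distance on made-up data; the study's actual software, distance metric and data are not given in the abstract.

```python
import math
from collections import Counter


def knn_posteriors(train, query, k, priors):
    """Posterior class probabilities from a k-NN discriminant rule with priors.

    train:  list of (feature_vector, label) pairs.
    priors: dict mapping each label to its prior probability.
    Assumed rule: score(c) = prior(c) * (neighbors from c) / (size of c).
    """
    class_sizes = Counter(label for _, label in train)
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    scores = {c: priors[c] * votes[c] / class_sizes[c] for c in class_sizes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Toy one-feature data with an imbalanced class distribution (hypothetical values).
train = [((x,), "non-risk") for x in (0, 1, 2, 3, 4, 4.5)] + [((x,), "risk") for x in (5, 6)]
print(knn_posteriors(train, (4.8,), k=3, priors={"non-risk": 0.5, "risk": 0.5}))
print(knn_posteriors(train, (4.8,), k=3, priors={"non-risk": 0.75, "risk": 0.25}))
```

With equal priors the minority class wins this query, while priors proportional to the class sizes reduce the rule to a plain majority vote among the neighbors, which is why the choice of priors matters for imbalanced groups.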
Abstract: Breast cancer is considered a substantial health
concern, and practicing mammography screening (MS) is important in
minimizing its related morbidity. It is therefore essential to have a better
understanding of women’s breast cancer screening behaviors and
the factors that influence their utilization. The aim of this study is to
identify the factors that are linked to MS behaviors among the
Egyptian women. A cross-sectional descriptive design was used
to provide a snapshot of the factors that are linked to MS
behaviors. A convenience sample of 311 women was used; all
eligible participants admitted to the Women Imaging Unit who were 40
years of age or above, were coming for mammography assessment, were not
pregnant or breastfeeding, and agreed to participate in the
study were included. A structured questionnaire was developed by
the researchers and contained three parts: socio-demographic data;
motivating factors associated with MS; and the association between MS
and a model of behavior change. The analyzed data indicated that most
of the participating women (66.6%) belonged to the 40-49 age group.
A high proportion of participants (58.1%) in the group with
previous MS were influenced by their neighbors to practice MS, whereas
32.7% in the group without previous MS were influenced by family
members which indicated significant differences (P