Abstract: Accurate segmentation of the optic disc is essential
for computer-aided diagnosis of several ocular diseases,
such as glaucoma, diabetic retinopathy, and hypertensive retinopathy.
This paper presents an accurate and fast optic disc detection and
segmentation method using an attention-based fully convolutional
network. The network is trained from scratch on the fundus images
of the extended MESSIDOR database, and the trained model is used to
segment the optic disc. False positives are removed using
morphological operations and shape features. The method is evaluated
using three-fold cross-validation on six public fundus image databases:
DIARETDB0, DIARETDB1, DRIVE, AV-INSPIRE, CHASE_DB1, and
MESSIDOR. The attention-based fully convolutional network is robust
and effective for detecting and segmenting the optic disc in images
affected by diabetic retinopathy, and it outperforms existing
techniques.
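The following is a minimal sketch of the false-positive removal step. It makes several assumptions. The network output is taken to be already thresholded into a binary mask (a list of lists of booleans). The morphological clean-up is approximated by connected-component labelling. Three simple shape features are used: area, bounding-box aspect ratio, and extent. The thresholds `min_area`, `min_aspect`, and `min_extent` are hypothetical placeholders, not values from the paper.

```python
from collections import deque


def components(mask):
    """Return the 4-connected components of a binary mask as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps


def shape_features(pixels):
    """Area, bounding-box aspect ratio (<= 1) and extent (area / box area)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    bh, bw = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
    return len(pixels), min(bh, bw) / max(bh, bw), len(pixels) / (bh * bw)


def keep_disc_candidate(mask, min_area=4, min_aspect=0.6, min_extent=0.5):
    """Keep the largest roughly round component and drop the rest as false positives.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    candidates = []
    for pixels in components(mask):
        area, aspect, extent = shape_features(pixels)
        if area >= min_area and aspect >= min_aspect and extent >= min_extent:
            candidates.append(pixels)
    out = [[False] * len(row) for row in mask]
    if candidates:
        for y, x in max(candidates, key=len):
            out[y][x] = True
    return out
```

An elongated, vessel-like blob is rejected by the aspect-ratio test even when it is the largest component. An isolated speck is rejected by the area test.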
Abstract: The aim of this paper is to compare and discuss classifier algorithm options for credit risk assessment by applying different machine learning techniques. Drawing on records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank: 2,600 are classified as non-defaulters, 1,551 as defaulters, and 1,281 as temporary defaulters, meaning clients that are up to 180 days overdue on their payments. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Network Multilayer Perceptron (ANN-MLP), Artificial Neural Network Radial Basis Function (ANN-RBF), Logistic Regression (LR), and Support Vector Machines (SVM). For each method, several parameter settings were analyzed, and the best configuration of each technique was then compared with the others. The data were first encoded using thermometer coding for numerical attributes and dummy coding for nominal attributes. Each method was then evaluated for each parameter setting, and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives, and true negatives. This comparison showed that the most accurate method was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters, and 75.37% for temporary defaulters). However, the highest accuracy does not always indicate the best technique. For instance, in the classification of temporary defaulters, ANN-RBF was surpassed in terms of false positives by SVM, which had the lowest false positive rate (0.07%). These details are discussed in light of the results, and an overview of the findings is given in the conclusion of this study.
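The two encodings and the one-against-all setup can be sketched as follows. The bin edges, category list, and attribute values are hypothetical. Dummy coding is shown in its k-1 form, with the first category as the reference level. This is one common convention; full one-hot coding is also used.

```python
def thermometer(value, thresholds):
    """Thermometer code: one bit per threshold, set while value >= threshold.

    Larger values switch on a longer prefix of ones, so the code preserves order.
    """
    return [1 if value >= t else 0 for t in sorted(thresholds)]


def dummy(value, categories):
    """Dummy coding with k-1 indicators; categories[0] is the reference level."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[1:]]


def one_against_all(labels, positive):
    """Binary targets for one class against all the others."""
    return [1 if y == positive else 0 for y in labels]


# Hypothetical client record: one numerical and one nominal attribute.
revenue_bins = [0, 1_000, 10_000, 100_000]
sectors = ["industry", "retail", "services"]
row = thermometer(25_000, revenue_bins) + dummy("retail", sectors)
```

With three classes, `one_against_all` is applied once per class. This yields the three binary problems (non-defaulter, defaulter, temporary defaulter) on which each technique is evaluated.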
Abstract: Telemedicine services use a large amount of data, most of which consists of diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated for each image to support its identification. This study presents the use of decision trees to optimize the search for diagnostic images hosted on a cloud server. To analyze server performance, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency, and throughput. These are measured in five test scenarios, totaling 26 experiments, during the uploading and downloading of DICOM images hosted on the server of the telemedicine group of the Universidad Militar Nueva Granada, Bogotá, Colombia. Decision trees were applied as a data mining technique and compared with sequential search to evaluate the search times for diagnostic images on the server. The results show that using the metadata in decision trees substantially reduces search times, optimizes computational resources, and improves request management in the telemedicine image service. In the experiments carried out, search efficiency increased by 45% relative to sequential search, because false positives are avoided in the management and acquisition of diagnostic images when they are downloaded. It is concluded that, for diagnostic image services in telemedicine, decision trees ensure accessibility and robustness in the acquisition and manipulation of medical images, supporting improved diagnoses and medical procedures for patients.
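The difference between sequential search and a metadata-driven tree can be sketched in a few lines. This is a simplified stand-in, not the authors' decision-tree model. It assumes that each image record carries hypothetical metadata fields such as `modality` and `body_part`. It builds a tree in which each level branches on one field. It then counts how many records or branch tests each strategy needs to answer the same query.

```python
from collections import defaultdict


def sequential_search(records, query):
    """Scan every record; return matching ids and the number of records examined."""
    hits = [r["id"] for r in records if all(r.get(k) == v for k, v in query.items())]
    return hits, len(records)


def build_tree(records, attrs):
    """Hypothetical metadata tree: each level branches on one attribute; leaves hold ids."""
    if not attrs:
        return [r["id"] for r in records]
    branches = defaultdict(list)
    for r in records:
        branches[r.get(attrs[0])].append(r)
    return {value: build_tree(group, attrs[1:]) for value, group in branches.items()}


def tree_search(tree, attrs, query):
    """Follow one branch per attribute; return matching ids and the number of tests."""
    node, tests = tree, 0
    for attr in attrs:
        tests += 1
        node = node.get(query[attr])
        if node is None:
            return [], tests
    return node, tests


# Hypothetical image metadata.
RECORDS = [
    {"id": 1, "modality": "CT", "body_part": "head"},
    {"id": 2, "modality": "CT", "body_part": "chest"},
    {"id": 3, "modality": "MR", "body_part": "head"},
    {"id": 4, "modality": "MR", "body_part": "knee"},
    {"id": 5, "modality": "XR", "body_part": "chest"},
    {"id": 6, "modality": "CT", "body_part": "head"},
]
ATTRS = ["modality", "body_part"]
TREE = build_tree(RECORDS, ATTRS)
```

The sequential scan grows with the number of stored images. The tree lookup grows only with the number of metadata levels. Both return exactly the requested images, so no non-matching image is fetched.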
Abstract: This paper presents a method that uses figure-ground color segmentation to extract an effective global feature for reducing false positives in head-shoulder detection. Conventional detectors, which rely on local features such as HOG to achieve real-time operation, suffer from false positives. The color cue in an input image provides salient information about global characteristics, which is needed to alleviate the false positives of local-feature-based detectors. An effective approach that uses figure-ground color segmentation has previously been presented to reduce false positives in object detection. In this paper, an extended version of that approach is presented. It adopts separate multipart foregrounds instead of a single prior foreground and performs figure-ground color segmentation with each of the foregrounds. The multipart foregrounds include parts of the head-shoulder shape and additional auxiliary foregrounds optimized by a search algorithm. A classifier is constructed on a feature consisting of the set of resulting segmentations. Experimental results show that the presented method rejects more false positives than both the single-prior-shape-based classifier and detectors using local features. The improvement arises because the presented approach can reduce false positives that have the same colors in the head and shoulder foregrounds.
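The idea of using several part-wise figure-ground segmentations as one feature can be sketched with a deliberately simple segmenter. The sketch makes the following assumptions:

* Pixels are RGB tuples in a flat list.
* Each part prior is a boolean mask over the same pixels, containing both foreground and background pixels.
* Segmentation is a single nearest-mean color assignment initialized from the prior, not the paper's segmentation algorithm.

The feature is the overlap (IoU) between each part's segmentation and its prior. A window whose colors do not follow the head-shoulder layout yields low overlaps.

```python
def mean_color(pixels):
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_with_prior(pixels, prior):
    """One nearest-mean step: a pixel is figure if it is closer to the prior's
    foreground mean color than to its background mean color.
    Assumes the prior contains both foreground and background pixels."""
    fg = [p for p, m in zip(pixels, prior) if m]
    bg = [p for p, m in zip(pixels, prior) if not m]
    mu_fg, mu_bg = mean_color(fg), mean_color(bg)
    return [sq_dist(p, mu_fg) < sq_dist(p, mu_bg) for p in pixels]


def iou(a, b):
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union if union else 0.0


def multipart_feature(pixels, part_priors):
    """Feature vector: agreement between each part's segmentation and its prior."""
    return [iou(segment_with_prior(pixels, prior), prior) for prior in part_priors]
```

In a uniformly colored window, the foreground and background means coincide. No pixel is then labelled figure and every overlap is 0. This mirrors the abstract's point that same-colored head and shoulder regions can be recognized as false positives.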
Abstract: Background modeling and subtraction has been widely
used in video analysis as an effective method for moving-object
detection in many computer vision applications. Recently, a
large number of approaches have been developed to tackle different
types of challenges in this field. However, dynamic backgrounds
and illumination variations are the most frequently occurring problems
in practice. This paper presents a two-layer
model based on the codebook algorithm combined with the local binary
pattern (LBP) texture measure, targeted at handling dynamic
background and illumination variation problems. More specifically,
the first layer is a block-based codebook that combines the
LBP histogram with the mean value of each RGB color channel. Because
LBP features are invariant to monotonic gray-scale changes,
this layer produces block-wise detection results with considerable
tolerance to illumination variations. The second layer, a pixel-based
codebook, refines the output of the first layer to improve precision
and further eliminate false positives. As a result, the
proposed approach substantially improves accuracy under
dynamic background and illumination changes.
Experimental results on several popular background subtraction
datasets demonstrate very competitive performance compared to
previous models.
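The illumination tolerance claimed for the first layer follows from how an LBP code is computed. Each neighbor is compared only with the center pixel, so any strictly increasing gray-scale mapping leaves the code unchanged. The sketch below does three things:

* computes the basic 8-neighbor LBP code,
* builds a normalized block histogram of those codes,
* compares two histograms with histogram intersection.

The neighbor ordering is an illustrative assumption, as is the use of histogram intersection as the codeword-matching score. Neither detail comes from the paper.

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP code of pixel (r, c): bit i is set when neighbor i >= center."""
    center = img[r][c]
    # Neighbors clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over the block's interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Brightening a block with a mapping such as `2 * v + 10` changes every pixel value but no LBP code. A block-level codeword built on these histograms therefore still matches after a global illumination change.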
Abstract: Detecting changes in multiple images of the same
scene has recently attracted increased interest owing to its many
contemporary applications, including smart security systems, smart
homes, remote sensing, surveillance, medical diagnosis, weather
forecasting, speed and distance measurement, and post-disaster
forensics. These applications differ in the scale, nature, and
speed of change. This paper presents an application of image
processing techniques to implement a real-time change detection
system. Change is identified by comparing the RGB representation of
two consecutive frames captured in real-time. The detection threshold
can be controlled to account for various luminance levels. The
comparison result is passed through a filter before a decision is made,
to reduce false positives, especially under low-luminance conditions. The
system is implemented with a MATLAB graphical user interface (GUI)
with several controls to manage its operation and performance.
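The detection pipeline described above can be sketched in Python rather than MATLAB. It compares consecutive frames pixel by pixel in RGB, applies an adjustable threshold, and filters the result before the decision. Several details are assumptions chosen for illustration: the 3x3 majority filter, the `threshold` and `min_pixels` parameters, and the frame representation (lists of rows of RGB tuples).

```python
def diff_mask(prev, curr, threshold):
    """Mark a pixel as changed when any RGB channel differs by more than threshold."""
    return [[max(abs(a - b) for a, b in zip(p, q)) > threshold
             for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(prev, curr)]


def majority_filter(mask):
    """3x3 majority vote (a binary median filter) to suppress isolated noisy pixels."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [mask[y][x]
                      for y in range(max(0, r - 1), min(h, r + 2))
                      for x in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = 2 * sum(window) > len(window)
    return out


def change_detected(prev, curr, threshold=30, min_pixels=1):
    """Decide on change only after filtering the raw difference mask."""
    filtered = majority_filter(diff_mask(prev, curr, threshold))
    return sum(map(sum, filtered)) >= min_pixels
```

A single noisy pixel, which is common in dark frames, is voted away by its unchanged neighbors. A genuinely changed region survives the filter. Raising `threshold` plays the role of the luminance control described in the abstract.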
Abstract: In the past few years, the amount of malicious software
has increased exponentially, and machine learning algorithms have
therefore become instrumental in distinguishing clean files from malware through
(semi-)automated classification. When working with very large
datasets, the major challenge is to reach both a very high malware
detection rate and a very low false positive rate. Another challenge
is to minimize the time needed for the machine learning algorithm to
do so. This paper presents a comparative study between different
machine learning techniques such as linear classifiers, ensembles,
decision trees or various hybrids thereof. The training dataset consists
of approximately 2 million clean files and 200,000 infected files,
which is a realistic proportion. The paper investigates the
above-mentioned methods with respect to both their performance
(detection rate and false positive rate) and their practicability.
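The tension between a high detection rate and a very low false positive rate is often handled through the decision threshold. The threshold on a classifier's scores is chosen so that the false positive rate on clean files stays within a budget. The sketch below shows that calculation. It assumes the classifier outputs one score per file, with higher scores meaning more likely malware. It is not one of the methods compared in the paper, and the scores are invented for illustration.

```python
def rates(clean_scores, malware_scores, threshold):
    """Detection rate and false positive rate when files scoring >= threshold are flagged."""
    detected = sum(s >= threshold for s in malware_scores)
    false_pos = sum(s >= threshold for s in clean_scores)
    return detected / len(malware_scores), false_pos / len(clean_scores)


def threshold_for_fpr(clean_scores, malware_scores, max_fpr):
    """Lowest candidate threshold whose false positive rate does not exceed max_fpr.

    Lowering the threshold can only raise both rates, so the lowest admissible
    threshold gives the highest detection rate within the false-positive budget.
    """
    candidates = sorted(set(clean_scores) | set(malware_scores)) + [float("inf")]
    for t in candidates:
        if rates(clean_scores, malware_scores, t)[1] <= max_fpr:
            return t


# Hypothetical classifier scores.
clean = [0.1, 0.2, 0.3, 0.9]
malware = [0.5, 0.8, 0.85, 0.95]
```

With a 25% budget, every malware sample is caught. With a zero budget, the threshold must rise above the highest-scoring clean file, and the detection rate drops to 25%. The same trade-off appears at the scale of millions of clean files.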
Abstract: A Bloom filter is a probabilistic, memory-efficient
data structure designed to answer rapidly whether an element is
present in a set. A query reports either that the element is definitely
not in the set or that it is in the set with a certain probability.
The trade-off of using a Bloom filter is a configurable risk of false
positives. The probability of a false positive can be made very low by
choosing a sufficiently large bit array and an appropriate number of
hash functions. For spam detection, a weight is attached to each set
of elements. The spam weight of a word is used to rate the
e-mail, and each word is assigned to a Bloom filter based on its weight.
The proposed work introduces an enhanced Bloom filter
called the Bin Bloom Filter (BBF). The performance of the BBF relative to the
conventional Bloom filter is evaluated under various optimization
techniques. Real-time and synthetic datasets are used for
experimental analysis, and results are reported for bin sizes 4,
5, 6, and 7. The analysis shows that the BBF
using heuristic techniques performs better than the traditional
Bloom filter in spam detection.
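A standard Bloom filter can be sketched as follows, together with the usual estimate of its false positive probability, p ≈ (1 - e^(-kn/m))^k. Here m is the number of bits, k the number of hash functions, and n the number of inserted elements. The formula shows that the false positive rate depends on the ratio m/n. It is minimized near k = (m/n) ln 2, rather than decreasing indefinitely as k grows.

The `BinBloomSketch` class is a hypothetical illustration of assigning words to weighted bins, with one Bloom filter per bin. It is not the paper's BBF, whose details are not given here. Deriving the k positions from salted SHA-256 digests is also an assumption.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # k positions from salted SHA-256 digests (an illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(m, n, k):
    """Standard approximation (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Number of hash functions that minimizes the approximation above."""
    return max(1, round(m / n * math.log(2)))


class BinBloomSketch:
    """Hypothetical: one Bloom filter per spam-weight bin."""

    def __init__(self, bin_weights, m, k):
        self.bins = [(w, BloomFilter(m, k)) for w in bin_weights]

    def add(self, word, bin_index):
        self.bins[bin_index][1].add(word)

    def score(self, words):
        # Each word contributes the largest weight among the bins that report it.
        return sum(max((w for w, bf in self.bins if word in bf), default=0.0)
                   for word in words)
```

For example, 1,000 bits holding 100 elements work best with about 7 hash functions, giving a false positive rate of roughly 0.8%.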
Abstract: Network security attacks are violations of
information security policy that have received much attention from the
computational intelligence community in recent decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large volumes of network data
or logs. The naïve Bayesian classifier is one of the most popular data
mining algorithms for classification. It provides an optimal way to
predict the class of an unknown example under the assumption that
attributes are conditionally independent given the class.
It has been observed that a single set of probabilities derived from the data
is not sufficient to achieve a good classification rate. In this paper, we propose a new learning
algorithm for mining network logs to detect network intrusions
through a naïve Bayesian classifier. The algorithm first clusters the network
logs into several groups based on their similarity, and then
calculates the prior and conditional probabilities for each group of
logs. To classify a new log, the algorithm determines the cluster
to which the log belongs and then uses that cluster's probability set
to classify it. We evaluated the proposed algorithm on the
KDD99 benchmark network intrusion detection dataset,
and the experimental results show that it improves detection rates
and reduces false positives for different types of network
intrusions.
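The cluster-then-classify scheme can be sketched for categorical log attributes. Clustering is reduced to assigning each log to the nearest of a few fixed prototype logs by Hamming distance. This is an assumption, standing in for whatever similarity-based clustering the paper uses. Each cluster gets its own naïve Bayes model with Laplace smoothing. The attribute values are hypothetical KDD99-style symbols.

```python
import math
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of attributes on which two logs differ."""
    return sum(x != y for x, y in zip(a, b))


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing."""

    def fit(self, X, y):
        self.priors = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (class, attribute index) -> value counts
        self.values = defaultdict(set)      # attribute index -> values seen
        for x, c in zip(X, y):
            for j, v in enumerate(x):
                self.counts[c, j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, x, c):
        score = math.log(self.priors[c] / self.n)
        for j, v in enumerate(x):
            # The extra +1 in the denominator reserves mass for unseen values.
            score += math.log((self.counts[c, j][v] + 1) / (self.priors[c] + len(self.values[j]) + 1))
        return score

    def predict(self, x):
        return max(self.priors, key=lambda c: self.log_posterior(x, c))


class ClusteredNaiveBayes:
    """Hypothetical: group logs by nearest prototype, then one naive Bayes per group."""

    def __init__(self, prototypes):
        self.prototypes = prototypes

    def _nearest(self, x, candidates):
        return min(candidates, key=lambda i: hamming(x, self.prototypes[i]))

    def fit(self, X, y):
        groups = defaultdict(lambda: ([], []))
        for x, c in zip(X, y):
            gx, gy = groups[self._nearest(x, range(len(self.prototypes)))]
            gx.append(x)
            gy.append(c)
        self.models = {g: NaiveBayes().fit(gx, gy) for g, (gx, gy) in groups.items()}
        return self

    def predict(self, x):
        # Use the probability set of the nearest cluster that received training logs.
        return self.models[self._nearest(x, self.models)].predict(x)


# Hypothetical (protocol, service, flag) logs.
PROTOTYPES = [("tcp", "http", "SF"), ("icmp", "ecr_i", "SF")]
X = [("tcp", "http", "SF"), ("tcp", "http", "SF"), ("tcp", "http", "REJ"),
     ("icmp", "ecr_i", "SF"), ("icmp", "ecr_i", "SF"), ("icmp", "eco_i", "SF")]
Y = ["normal", "normal", "attack", "attack", "attack", "normal"]
model = ClusteredNaiveBayes(PROTOTYPES).fit(X, Y)
```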
Abstract: Cryptography provides a secure means of
transmitting information over an insecure channel. It authenticates
messages based on the key rather than on the user. It requires a lengthy
key to encrypt sent messages and decrypt received messages.
However, such keys can be guessed or cracked. Moreover,
maintaining and sharing lengthy, random keys for enciphering and
deciphering is a critical problem in cryptographic
systems. A new approach is described for generating a cryptographic key
from a person's iris pattern. In the biometric field, a
template created by a biometric algorithm can only be
authenticated against the same person. Among biometric templates,
iris features distinguish efficiently between individuals and
produce fewer false positives in large populations. The resulting iris
code distribution has low intra-class variability, which helps
the cryptosystem decrypt messages confidently when the
iris pattern matches exactly. In the proposed approach, the iris features
are extracted using multiresolution wavelets, producing a 135-bit iris
code for each subject that is used to encrypt and decrypt
messages. Autocorrelators are used to recall the original messages
from the partially corrupted data produced by the decryption process.
The approach aims to resolve the repudiation and key management problems.
Results were analyzed for both a conventional iris cryptography system
(CIC) and a non-repudiation iris cryptography system (NRIC). They
show that the new approach provides a considerably high level of
authentication in the enciphering and deciphering processes.
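The following sketch illustrates why an exact iris match matters for decryption. It contrasts matching with key derivation on a 135-bit iris code. Matching usually tolerates a small fractional Hamming distance. A key derived by hashing the code, by contrast, changes completely if a single bit differs. This is not the paper's wavelet- and autocorrelator-based scheme. The SHA-256 derivation and the matching threshold are hypothetical choices.

```python
import hashlib

CODE_BITS = 135  # iris code length reported in the abstract
MASK = (1 << CODE_BITS) - 1


def fractional_hamming(a, b):
    """Fraction of the 135 bits in which two iris codes differ."""
    return bin((a ^ b) & MASK).count("1") / CODE_BITS


def same_subject(a, b, threshold=0.3):
    """Hypothetical matcher: accept codes whose fractional distance is small."""
    return fractional_hamming(a, b) <= threshold


def derive_key(iris_code):
    """Hypothetical 256-bit key derived by hashing the 135-bit code (17 bytes)."""
    return hashlib.sha256((iris_code & MASK).to_bytes(17, "big")).digest()
```

A probe that differs from the enrolled code in one bit still matches the same subject, but it yields an unrelated key. This gap is what an error-tolerant recall stage, such as the autocorrelators in the abstract, has to bridge.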
Abstract: In this paper, a new learning approach for network
intrusion detection using a naïve Bayesian classifier and the ID3 algorithm
is presented. It identifies effective attributes from the training
dataset, calculates the conditional probabilities for the best attribute
values, and then correctly classifies all the examples in the training and
testing datasets. Most current intrusion detection datasets are
dynamic and complex and contain a large number of attributes. Some of
these attributes may be redundant or contribute little to detection.
It has been shown that selecting significant attributes is important
for designing real-world intrusion detection
systems (IDS). The purpose of this study is to identify effective
attributes from the training dataset to build a classifier for network
intrusion detection using data mining algorithms. Experimental
results on the KDD99 benchmark intrusion detection dataset demonstrate
that the new approach achieves high classification rates and reduces
false positives using limited computational resources.
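The attribute-selection step rests on ID3's information gain. The sketch below computes entropy and information gain for categorical attributes and ranks attributes by gain. The toy records are hypothetical. The way the paper combines the selected attributes with the naïve Bayesian probabilities is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Attribute indices sorted from most to least informative."""
    return sorted(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a), reverse=True)


# Hypothetical records: attribute 0 separates the classes, attribute 1 does not.
ROWS = [("tcp", "a"), ("tcp", "b"), ("udp", "a"), ("udp", "b")]
LABELS = ["normal", "normal", "attack", "attack"]
```

An attribute with near-zero gain, like attribute 1 here, is a candidate for removal. This is the kind of redundancy the abstract describes in large intrusion detection datasets.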
Abstract: Static analysis of source code is used for auditing web
applications to detect vulnerabilities. In this paper, we propose a
new algorithm that analyzes PHP source code to detect potential local file
inclusion (LFI) and remote file inclusion (RFI) vulnerabilities. In our
approach, we first define patterns for finding functions that can
potentially be abused because of unhandled user input. More precisely,
we use regular expressions as a fast and simple way to define patterns
for detecting vulnerabilities. Because inclusion functions can also be
used safely, many false positives (FPs) can occur. The first
cause of these FPs is that the function may not use a user-supplied
variable as an argument. We therefore extract a list of user-supplied
variables and use it to detect vulnerable lines of code.
In addition, because a vulnerability can propagate between variables,
for example through multi-level assignment, we also extract hidden
user-supplied variables. We use the resulting list to reduce the false
positives of our method. Finally, because there are ways to prevent
inclusion function vulnerabilities, we also define patterns
to detect these safeguards and further reduce our false positives.
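The three filtering stages described above can be sketched with Python regular expressions:

1. patterns for inclusion functions,
2. tracking of user-supplied variables through multi-level assignments,
3. patterns for prevention safeguards.

The patterns are deliberately minimal assumptions. Only `$_GET`, `$_POST`, `$_REQUEST`, and `$_COOKIE` are treated as sources. `basename()` is the only recognized safeguard. Each assignment or inclusion is assumed to fit on one line. A real analyzer would need a much richer pattern set.

```python
import re

SOURCE_RE = re.compile(r"\$_(?:GET|POST|REQUEST|COOKIE)\b")
VAR_RE = re.compile(r"\$\w+")
ASSIGN_RE = re.compile(r"(\$\w+)\s*=\s*(.+?);")
INCLUDE_RE = re.compile(r"\b(?:include|include_once|require|require_once)\b\s*\(?(.+?)\)?\s*;")
SAFEGUARD_RE = re.compile(r"\bbasename\s*\(")  # hypothetical prevention pattern


def is_tainted(expr, tainted):
    """True if expr reads user input directly or through a tainted variable."""
    return bool(SOURCE_RE.search(expr)) or any(v in tainted for v in VAR_RE.findall(expr))


def user_supplied_variables(lines):
    """Propagate taint through assignments until no new variable is added."""
    tainted, changed = set(), True
    while changed:
        changed = False
        for line in lines:
            for target, expr in ASSIGN_RE.findall(line):
                if target not in tainted and is_tainted(expr, tainted) and not SAFEGUARD_RE.search(expr):
                    tainted.add(target)
                    changed = True
    return tainted


def find_inclusion_vulns(source):
    """Return 1-based line numbers of inclusions that take unsanitized user input."""
    lines = source.splitlines()
    tainted = user_supplied_variables(lines)
    findings = []
    for number, line in enumerate(lines, start=1):
        match = INCLUDE_RE.search(line)
        if match and is_tainted(match.group(1), tainted) and not SAFEGUARD_RE.search(match.group(1)):
            findings.append(number)
    return findings


SAMPLE = "\n".join([
    "<?php",
    "$p = $_GET['page'];",
    "$q = $p;",
    "include($q . '.php');",
    "include 'header.php';",
    "$f = basename($_GET['f']);",
    "require_once($f);",
])
```

On `SAMPLE`, only line 4 is reported. The `include` with a constant argument is dropped by the user-supplied-variable check. The taint reaches `$q` only through the two-level assignment. The `require_once` is dropped because its variable passed through the `basename()` safeguard.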
Abstract: Phishing, the theft of sensitive information on the
web, has dealt a major blow to Internet security in recent times. Most
existing anti-phishing solutions fail to handle the fuzziness
involved in phishing detection, which leads to a large number of false
positives. This fuzziness is attributed to the use of HTML, a language
that is highly flexible and, at the same time, highly ambiguous. We introduce a
new perspective on phishing detection that tries to systematically prove
whether a given page is phished, using the corresponding
original page as the basis for comparison. It analyzes the layout of
the pages under consideration to determine the percentage distortion
between them, which indicates any form of malicious alteration. The
system is designed as an intelligent system that employs dynamic
assessment to accurately identify brand-new phishing attacks,
and it should prove effective in reducing the number of false positives.
This framework could potentially be used as a knowledge base for
educating internet users about phishing.
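One simple way to turn a layout comparison into a percentage distortion is to compare the sequences of start tags in the two pages. The sketch below does this with Python's `html.parser` and `difflib`. Two choices are assumptions for illustration: treating the tag sequence as the page layout, and using `SequenceMatcher.ratio()` as the similarity measure. The paper's layout analysis may differ.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects start tags in document order as a crude layout signature."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def layout_signature(html):
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def percentage_distortion(original_html, suspect_html):
    """0 for identical tag sequences, up to 100 for completely different ones."""
    a, b = layout_signature(original_html), layout_signature(suspect_html)
    return 100.0 * (1.0 - SequenceMatcher(None, a, b).ratio())
```

Injecting a single `<iframe>` after the login form of a five-tag page gives a distortion of about 9%. An unmodified copy gives 0%.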
Abstract: It has been established that microRNAs (miRNAs) play
an important role in gene expression through post-transcriptional regulation
of messenger RNAs (mRNAs). However, the precise relationships
between miRNAs and their target genes in terms of number,
type, and biological relevance remain largely unclear. Dissecting
miRNA-target relationships will provide more insight into miRNA
target identification and validation and therefore promote understanding
of miRNA function. In miRBase, miRanda is the key
algorithm used for target prediction in zebrafish. This algorithm
is high-throughput but produces many false positives (noise). Since
validating a large number of targets through laboratory experiments
is very time consuming, computational methods for miRNA
target validation need to be developed. In this paper, we present an
integrative method to investigate several aspects of the relationships
between miRNAs and their targets, with the final purpose of extracting
high-confidence targets from the pool of miRanda-predicted targets. This is
achieved using techniques ranging from statistical tests to
clustering and association rules. Our research focuses on zebrafish.
It was found that validated targets are not necessarily associated with
the highest sequence matching. In addition, for some miRNA families,
the frequency of predicted targets is significantly higher in the
genomic region near the miRNA's own physical location. Finally, in a case
study of dre-miR-10 and dre-miR-196, it was found that the predicted
target genes hoxd13a, hoxd11a, hoxd10a, and hoxc4a of dre-miR-10,
as well as hoxa9a, hoxc8a, and hoxa13a of dre-miR-196, have
characteristics similar to those of validated target genes and
therefore represent high-confidence target candidates.
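A one-sided binomial test can check the kind of claim made above, that predicted targets cluster near a miRNA's own genomic location. Suppose a window around the miRNA locus covers a fraction p of the genome. Under a null hypothesis of no enrichment, each of n predicted targets falls inside that window with probability p. The sketch computes the resulting tail probability exactly. The choice of test, the window fraction, and the counts in the example are assumptions. They are not the paper's data, nor necessarily its method.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 6 of 40 predicted targets lie in a window covering 2% of the genome.
p_value = binomial_tail(6, 40, 0.02)
```

Only about 0.8 targets would be expected in the window by chance, so observing 6 gives a very small p-value. A result of this kind would support the enrichment reported for some miRNA families.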
Abstract: In this paper, we present a new learning algorithm for
anomaly-based network intrusion detection using an improved self-adaptive
naïve Bayesian tree (NBTree), which induces a hybrid of a
decision tree and a naïve Bayesian classifier. The proposed approach
balances detection rates across different attack types and keeps
false positives at an acceptable level in intrusion detection. On
large, complex, and dynamic intrusion detection datasets, the detection
accuracy of the naïve Bayesian classifier does not scale up as well as
that of a decision tree. In other problem domains, naïve Bayesian trees
have been shown to improve classification rates on
large datasets. In a naïve Bayesian tree, the internal nodes split as in
a regular decision tree, but the leaves contain naïve Bayesian
classifiers. Experimental results on the KDD99 benchmark network
intrusion detection dataset demonstrate that the new approach scales
up detection rates for different attack types and reduces false
positives in network intrusion detection.
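The structure described above, with decision-tree splits at internal nodes and naïve Bayesian classifiers at the leaves, can be sketched as a one-level tree. As a simplifying assumption, the split attribute is fixed by hand. A full NBTree instead decides whether and where to split by estimating how much a split improves naïve Bayes accuracy. The paper's self-adaptive improvements are not reproduced, and the records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(X, y):
    """Categorical naive Bayes with Laplace smoothing; returns a predict function."""
    priors = Counter(y)
    counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    values = defaultdict(set)       # attribute index -> values seen
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            counts[c, j][v] += 1
            values[j].add(v)

    def predict(x):
        def log_posterior(c):
            score = math.log(priors[c] / len(y))
            for j, v in enumerate(x):
                score += math.log((counts[c, j][v] + 1) / (priors[c] + len(values[j]) + 1))
            return score
        return max(priors, key=log_posterior)

    return predict


class NBTreeStump:
    """One internal split on a chosen attribute, with a naive Bayes model per leaf."""

    def __init__(self, split_attr):
        self.split_attr = split_attr

    def fit(self, X, y):
        branches = defaultdict(lambda: ([], []))
        for x, c in zip(X, y):
            bx, by = branches[x[self.split_attr]]
            bx.append(x)
            by.append(c)
        self.leaves = {v: train_naive_bayes(bx, by) for v, (bx, by) in branches.items()}
        self.fallback = train_naive_bayes(X, y)  # for split values unseen in training
        return self

    def predict(self, x):
        return self.leaves.get(x[self.split_attr], self.fallback)(x)


# Hypothetical (protocol, flag) records.
X = [("tcp", "SF"), ("tcp", "SF"), ("tcp", "REJ"), ("udp", "SF"), ("udp", "SF"), ("udp", "REJ")]
Y = ["normal", "normal", "attack", "attack", "attack", "normal"]
tree = NBTreeStump(split_attr=0).fit(X, Y)
```

Each leaf estimates its probabilities only from the records that reach it. This lets the leaf models adapt to different traffic types, such as TCP and UDP here, instead of sharing one global set of probabilities.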