Abstract: A Bloom filter is a probabilistic, memory-efficient data
structure designed to answer rapidly whether an element is present in
a set. It can report that an element is definitely not in the set, but
can confirm its presence only with a certain probability. The trade-off
of using a Bloom filter is a configurable risk of false positives; the
odds of a false positive can be made very low if the number of hash
functions is sufficiently large. For spam detection, a weight is
attached to each set of elements: the spam weight of a word is a
measure used to rate the e-mail, and each word is assigned to a Bloom
filter based on its weight. The proposed work introduces an enhanced
Bloom filter concept called the Bin Bloom Filter (BBF). The
performance of the BBF over the conventional Bloom filter is evaluated
under various optimization techniques. Real-world and synthetic data
sets are used for the experimental analysis, and results are reported
for bin sizes 4, 5, 6, and 7. Analysis of the results shows that the
BBF with heuristic techniques performs better than the traditional
Bloom filter in spam detection.
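As a point of reference for the abstract above, here is a minimal Python sketch of the conventional Bloom filter the BBF builds on. It is illustrative only, not the authors' implementation: the bit-array size m, the hash count k, and the double-hashing scheme are assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal conventional Bloom filter (illustrative sketch, not the BBF)."""
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k        # bit-array size, number of hash functions
        self.bits = [0] * m

    def _indices(self, item):
        # Double hashing: derive k bit positions from two halves of an MD5 digest.
        digest = hashlib.md5(item.encode()).hexdigest()
        h1, h2 = int(digest[:16], 16), int(digest[16:], 16)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def __contains__(self, item):
        # False => definitely not in the set; True => in the set with some
        # false positive probability, exactly the trade-off described above.
        return all(self.bits[idx] for idx in self._indices(item))

bf = BloomFilter()
bf.add("viagra")
print("viagra" in bf)   # True
print("hello" in bf)    # almost certainly False
```

A Bin Bloom Filter, as described, would maintain one such filter per spam-weight bin and route each word to the filter for its bin.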
Abstract: This study uses simulated meta-analyses to assess the effects of publication bias on meta-analysis estimates and to evaluate the efficacy of the trim and fill method in adjusting for these biases. The estimated effect sizes and standard errors were evaluated in terms of statistical bias and coverage probability. The results demonstrate that if publication bias is not adjusted for, it can lead to up to 40% bias in the treatment effect estimates. The trim and fill method can reduce the bias in the overall estimate by more than half. The method performs best in the presence of moderate underlying bias but has minimal effect when publication bias is low or severe. Additionally, the trim and fill method improves the coverage probability by more than half when subjected to the same level of publication bias as the unadjusted data. The method, however, tends to produce false positive results and will incorrectly adjust the data for publication bias up to 45% of the time. Nonetheless, the bias introduced into the estimates by this adjustment is minimal.
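To make the kind of simulation described above concrete, here is a hedged Python sketch of one plausible publication-bias setup: non-significant studies are censored, and the pooled fixed-effect estimate is compared with the truth. Every parameter value (effect size, study count, censoring rule) is an illustrative assumption rather than the study's actual design, and the trim and fill adjustment itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, n_studies, n_sims = 0.3, 20, 1000

biases = []
for _ in range(n_sims):
    se = rng.uniform(0.1, 0.5, n_studies)            # per-study standard errors
    est = rng.normal(true_effect, se)                # observed study effects
    published = est / se > 1.96                      # censor non-significant studies
    if published.sum() < 2:
        continue
    w = 1 / se[published] ** 2                       # inverse-variance weights
    pooled = np.sum(w * est[published]) / np.sum(w)  # fixed-effect pooled estimate
    biases.append(pooled - true_effect)

print(f"mean relative bias under censoring: {np.mean(biases) / true_effect:.1%}")
```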
Abstract: Network security attacks are violations of information
security policy that have received much attention from the
computational intelligence community in recent decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large volumes of network data
or logs. The naïve Bayesian classifier is one of the most popular data
mining algorithms for classification, providing an optimal way
to predict the class of an unknown example. It has been shown that
a single set of probabilities derived from the data is not sufficient
to achieve a good classification rate. In this paper, we propose a new
learning algorithm for mining network logs to detect network intrusions
with a naïve Bayesian classifier: it first clusters the network
logs into several groups based on the similarity of the logs, and then
calculates the prior and conditional probabilities for each group of
logs. To classify a new log, the algorithm determines which cluster
the log belongs to and then uses that cluster's probability set to
classify it. We tested the performance of the proposed algorithm on
the KDD99 benchmark network intrusion detection dataset, and the
experimental results show that it improves detection rates
as well as reduces false positives for different types of network
intrusions.
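A hedged sketch of the clustered naïve Bayes idea follows, using scikit-learn stand-ins: KMeans as a placeholder for the paper's log-similarity clustering and GaussianNB for the per-cluster probability sets. The paper's actual clustering criterion and probability estimation are not specified here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

def fit_clustered_nb(X, y, n_clusters=3):
    # Group logs by feature similarity, then fit one NB model per group.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    models = {c: GaussianNB().fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, models

def predict_clustered_nb(km, models, X):
    # Route each new log to its cluster, then classify with that
    # cluster's probability set.
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(km.predict(X), X)])
```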
Abstract: This paper presents a new technique for detection of
human faces within color images. The approach relies on image
segmentation based on skin color, features extracted from the two-dimensional
discrete cosine transform (DCT), and self-organizing
maps (SOM). After candidate skin regions are extracted, feature
vectors are constructed using DCT coefficients computed from those
regions. A supervised SOM training session is used to cluster feature
vectors into groups, and to assign "face" or "non-face" labels to those
clusters. Evaluation was performed using a new image database of
286 images, containing 1027 faces. After training, our detection
technique achieved a detection rate of 77.94% during subsequent
tests, with a false positive rate of 5.14%. To our knowledge, the
proposed technique is the first to combine DCT-based feature
extraction with a SOM for detecting human faces within color
images. It is also one of a few attempts to combine a feature-invariant
approach, such as color-based skin segmentation, together with
appearance-based face detection. The main advantage of the new
technique is its low computational requirements, in terms of both
processing speed and memory utilization.
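As an illustration of the feature-extraction step, the sketch below computes low-frequency 2D-DCT coefficients for a candidate skin region using SciPy. The region size, coefficient count, and coefficient-selection order are assumptions; the resulting vectors would then be clustered and labeled by the SOM.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(region, n_coeffs=15):
    """Low-frequency 2D-DCT coefficients of a candidate region (sketch)."""
    block = np.asarray(region, dtype=float)
    coeffs = dctn(block, type=2, norm='ortho')     # 2D discrete cosine transform
    # Low frequencies concentrate in the top-left corner; take them as features.
    flat = [coeffs[i, j] for i in range(4) for j in range(4)]
    return np.array(flat[1:n_coeffs + 1])          # drop the DC term
```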
Abstract: Cryptography provides a secure means of transmitting
information over an insecure channel. It authenticates messages based
on the key, not on the user, and it requires a lengthy key to encrypt
and decrypt messages. Such keys can, however, be guessed or cracked.
Moreover, maintaining and sharing lengthy random keys for the
enciphering and deciphering process is a critical problem in
cryptographic systems. A new approach is described for generating a
cryptographic key derived from a person's iris pattern. In the
biometric field, a template created by a biometric algorithm can be
authenticated only by the same person. Among biometric templates,
iris features efficiently distinguish individuals and produce fewer
false positives in large populations. The distribution of iris codes
exhibits low intra-class variability, which helps the cryptosystem
confidently decrypt messages given an exact match of the iris pattern.
In the proposed approach, the iris features are extracted using
multi-resolution wavelets, producing a 135-bit iris code for each
subject that is used for encrypting and decrypting messages.
Autocorrelators are used to recall the original messages from the
partially corrupted data produced by the decryption process. The
approach is intended to resolve the repudiation and key management
problems. Results were analyzed for both a conventional iris
cryptography system (CIC) and a non-repudiation iris cryptography
system (NRIC), and they show that the new approach provides
considerably high authentication accuracy in the enciphering and
deciphering processes.
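For orientation only, here is a toy Python sketch of deriving a symmetric key from a 135-bit iris code. The SHA-256 derivation and XOR cipher are stand-ins, not the paper's scheme, and the autocorrelator-based recall of corrupted plaintext is not shown.

```python
import hashlib
from itertools import cycle

def key_from_iris(iris_bits):
    """Derive a symmetric key from an iris code bit string (toy stand-in)."""
    code = int(iris_bits, 2).to_bytes((len(iris_bits) + 7) // 8, 'big')
    return hashlib.sha256(code).digest()

def xor_cipher(data, key):
    # Toy XOR cipher for illustration; not the paper's encryption scheme.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

iris_bits = '1011' * 33 + '101'        # hypothetical 135-bit iris code
key = key_from_iris(iris_bits)
ciphertext = xor_cipher(b'secret message', key)
print(xor_cipher(ciphertext, key))     # b'secret message'
```

The point the sketch mirrors is that only an exactly matching iris code regenerates the key; tolerance to small biometric variations (left out here) is what the abstract's autocorrelators address.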
Abstract: In this paper, a new learning approach for network
intrusion detection using the naïve Bayesian classifier and the ID3
algorithm is presented. It identifies effective attributes from the
training dataset, calculates the conditional probabilities for the
best attribute values, and then correctly classifies the examples of
the training and testing datasets. Most current intrusion detection
datasets are dynamic, complex, and contain a large number of
attributes, some of which may be redundant or contribute little to
detection. It has been shown that selecting significant attributes is
important for designing real-world intrusion detection systems (IDS).
The purpose of this study is to identify effective attributes in the
training dataset and build a classifier for network intrusion
detection using data mining algorithms. The experimental results on
the KDD99 benchmark intrusion detection dataset demonstrate that this
new approach achieves high classification rates and reduces false
positives using limited computational resources.
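The attribute-selection step can be pictured with an ID3-style information gain computation, sketched below; the ranking threshold and the exact gain variant the authors use are assumptions.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attr_values, labels):
    """ID3-style gain of one categorical attribute (sketch)."""
    n, cond = len(labels), 0.0
    for v in set(attr_values):
        subset = [l for a, l in zip(attr_values, labels) if a == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

protocol = ['tcp', 'udp', 'tcp', 'tcp']
labels   = ['attack', 'normal', 'attack', 'normal']
print(information_gain(protocol, labels))   # ~0.31 bits
```

Attributes would be ranked by gain and only the most informative ones kept for the naïve Bayesian classifier.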
Abstract: Static analysis of source code is used to audit web
applications for vulnerabilities. In this paper, we propose a new
algorithm for analyzing PHP source code to detect potential LFI and
RFI vulnerabilities. In our approach, we first define patterns for
finding functions that can be abused through unhandled user inputs.
More precisely, we use regular expressions as a fast and simple method
of defining patterns for vulnerability detection. Since inclusion
functions can also be used safely, many false positives (FPs) can
occur. The first cause of these FPs is that the function may not use a
user-supplied variable as an argument, so we extract a list of
user-supplied variables to be used when detecting vulnerable lines of
code. Furthermore, since a vulnerability can propagate among
variables, for example through multi-level assignment, we also extract
hidden user-supplied variables. We use the resulting list to decrease
the false positives of our method. Finally, since there are ways to
guard inclusion functions against these vulnerabilities, we also
define patterns to detect such safeguards and further decrease our
false positives.
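The pattern-matching core of such an analysis can be sketched in a few lines of Python; the regular expressions below are illustrative stand-ins, not the paper's actual patterns, and the safeguard-detection pass is omitted.

```python
import re

# Illustrative patterns only: inclusion calls with a variable argument,
# and assignments that mark a variable as user-supplied.
INCLUDE_CALL = re.compile(r'\b(include|include_once|require|require_once)'
                          r'\s*\(?\s*(\$\w+)')
USER_INPUT = re.compile(r'(\$\w+)\s*=\s*\$_(GET|POST|REQUEST|COOKIE)\b')

def scan_php(source):
    tainted = {m.group(1) for m in USER_INPUT.finditer(source)}
    findings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        m = INCLUDE_CALL.search(line)
        if m and m.group(2) in tainted:   # flag only user-supplied arguments
            findings.append((lineno, line.strip()))
    return findings

php = '$page = $_GET["p"];\ninclude($page);'
print(scan_php(php))   # [(2, 'include($page);')]
```

Multi-level assignments ($a = $_GET[...]; $b = $a;) would require the extra propagation pass the abstract describes.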
Abstract: Breast cancer detection techniques have been reported
to aid radiologists in analyzing mammograms. We note that most
techniques are performed on uncompressed digital mammograms.
Mammogram images are huge, necessitating the use of compression to
reduce storage and transmission requirements. In this paper, we
present an algorithm for the detection of microcalcifications in the
JPEG2000 domain. The algorithm is based on the statistical properties
of the wavelet transform that the JPEG2000 coder employs. Simulations
were carried out at different compression ratios. The sensitivity of
this algorithm ranges from 92% (with a false positive rate of 4.7)
under lossless compression down to 66% (with a false positive rate of
2.1) under lossy compression at a ratio of 100:1.
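A hedged sketch of wavelet-domain outlier flagging in the spirit of the abstract is shown below, using PyWavelets; the 'bior4.4' wavelet (an approximation of the CDF 9/7 filter used by lossy JPEG2000), the decomposition depth, and the k-sigma rule are all assumptions rather than the paper's statistical model.

```python
import numpy as np
import pywt

def candidate_masks(image, k=3.0):
    """Flag detail coefficients far from their subband's mean (sketch)."""
    coeffs = pywt.wavedec2(image, 'bior4.4', level=2)
    masks = []
    for level in coeffs[1:]:                 # detail subbands (LH, HL, HH)
        for band in level:
            mu, sigma = band.mean(), band.std()
            masks.append(np.abs(band - mu) > k * sigma)   # statistical outliers
    return masks
```

Working directly on such coefficients is what lets detection run in the JPEG2000 domain without full decompression.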
Abstract: Phishing, the stealing of sensitive information on the
web, has dealt a major blow to Internet security in recent times. Most
existing anti-phishing solutions fail to handle the fuzziness involved
in phish detection, leading to a large number of false positives. This
fuzziness is attributed to the use of the highly flexible and, at the
same time, highly ambiguous HTML language. We introduce a new
perspective on phishing that tries to systematically determine whether
a given page is a phish, using the corresponding original page as the
basis of comparison. It analyzes the layout of the pages under
consideration to determine the percentage distortion between them,
which is indicative of malicious alteration. The design represents an
intelligent system, employing dynamic assessment, that accurately
identifies brand-new phishing attacks and proves effective in reducing
the number of false positives. This framework could also serve as a
knowledge base for educating Internet users about phishing.
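One hedged way to picture the layout comparison is as a structural diff over HTML tag sequences, sketched below with the standard library; the paper's actual layout model and distortion metric may differ.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect the sequence of start tags as a crude layout signature."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def layout_distortion(original_html, suspect_html):
    seqs = []
    for html in (original_html, suspect_html):
        p = TagCollector()
        p.feed(html)
        seqs.append(p.tags)
    # Percentage structural difference between the two pages.
    return 100 * (1 - SequenceMatcher(None, seqs[0], seqs[1]).ratio())

print(layout_distortion('<html><body><form></form></body></html>',
                        '<html><body><form><input></form></body></html>'))
```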
Abstract: It is well established that microRNAs (miRNAs) play
an important role in gene expression through post-transcriptional
regulation of messenger RNAs (mRNAs). However, the precise
relationships between miRNAs and their target genes, in terms of
numbers, types, and biological relevance, remain largely unclear.
Dissecting miRNA-target relationships will yield more insight into
miRNA target identification and validation and thereby promote the
understanding of miRNA function. In miRBase, miRanda is the key
algorithm used for target prediction in zebrafish. The algorithm is
high-throughput but produces many false positives (noise). Since
laboratory validation of targets at large scale is very
time-consuming, computational methods for miRNA target validation are
needed. In this paper, we present an integrative method for
investigating several aspects of the relationships between miRNAs and
their targets, with the final purpose of extracting high-confidence
targets from the pool of miRanda-predicted targets. This is achieved
using techniques ranging from statistical tests to clustering and
association rules. Our research focuses on zebrafish. It was found
that validated targets are not necessarily those with the highest
sequence match scores. Moreover, for some miRNA families, the
frequency of their predicted targets is significantly higher in the
genomic region near their own physical location. Finally, in a case
study of dre-miR-10 and dre-miR-196, the predicted target genes
hoxd13a, hoxd11a, hoxd10a, and hoxc4a of dre-miR-10, and hoxa9a,
hoxc8a, and hoxa13a of dre-miR-196, were found to have characteristics
similar to validated target genes and therefore represent high
confidence target candidates.
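The proximity finding above suggests the flavor of statistical test involved; as a hypothetical example (the counts and genome fraction below are invented, not the paper's data), a one-sided binomial test could ask whether predicted targets fall near the miRNA locus more often than chance:

```python
from scipy.stats import binomtest

# Hypothetical: of 200 predicted targets for one miRNA family, 30 lie in a
# window around the miRNA locus covering 5% of the genome.
result = binomtest(k=30, n=200, p=0.05, alternative='greater')
print(f"p-value for proximity enrichment: {result.pvalue:.2e}")
```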
Abstract: Increasing detection rates and reducing false positive
rates are important problems in intrusion detection systems (IDS).
Although preventative techniques such as access control and
authentication attempt to keep intruders out, these can fail, and
intrusion detection has been introduced as a second line of defence.
Rare events are events that occur very infrequently, and their
detection is a common problem in many domains. In this paper we
propose an intrusion detection method that combines rough sets and
fuzzy clustering. Rough sets are used to decrease the amount of data
and eliminate redundancy; fuzzy c-means clustering allows objects to
belong to several clusters simultaneously, with different degrees of
membership. Our approach recognizes not only known attacks but also
suspicious activity that may be the result of a new, unknown attack.
Experimental results on the Knowledge Discovery and Data Mining
(KDD Cup 1999) dataset show that the method is efficient and practical
for intrusion detection systems.
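For the fuzzy clustering half of the method, a minimal NumPy fuzzy c-means sketch is given below (the rough-set reduction step is omitted, and the fuzzifier m, iteration count, and initialization are assumptions):

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: soft memberships U, cluster centers V (sketch)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1))
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return U, V
```

The soft memberships in U are what allow one connection record to be partially associated with both a normal and an anomalous cluster.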
Abstract: Searching for a tertiary substructure that geometrically
matches the 3D pattern of the binding site of a well-studied protein
provides a way to predict protein function. In our previous work, a
web server was built to predict protein-ligand binding sites based on
automatically extracted templates. A drawback of such templates,
however, is that the web server was prone to producing many false
positive matches. In this study, we present a sequence-order
constraint that reduces the false positive matches incurred when using
automatically extracted templates to predict protein-ligand binding
sites. The binding site predictor comprises i) an automatically
constructed template library and ii) a local structure alignment
algorithm for querying the library. The sequence-order constraint is
employed to identify inconsistencies between the local regions of the
query protein and the templates. Experimental results reveal that the
sequence-order constraint greatly reduces the false positive matches
and is effective for template-based binding site prediction.
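A hedged sketch of the constraint itself: given the residue pairs matched by the geometric alignment, require that the matched query residues preserve the template's sequence order. The pair-list representation is an assumption about how matches are encoded.

```python
def satisfies_sequence_order(match_pairs):
    """Sequence-order constraint on a local structure alignment (sketch).

    match_pairs: (template_residue_index, query_residue_index) tuples.
    When template residues are taken in sequence order, the matched query
    residues must also be strictly increasing along the query sequence.
    """
    ordered = sorted(match_pairs)               # sort by template index
    query_idx = [q for _, q in ordered]
    return all(a < b for a, b in zip(query_idx, query_idx[1:]))

print(satisfies_sequence_order([(5, 102), (9, 110), (14, 131)]))  # True
print(satisfies_sequence_order([(5, 110), (9, 102), (14, 131)]))  # False
```

Geometric matches that fail this check would be discarded as likely false positives.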
Abstract: In this paper, we present a new learning algorithm for
anomaly-based network intrusion detection using an improved,
self-adaptive naïve Bayesian tree (NBTree), which induces a hybrid of
a decision tree and a naïve Bayesian classifier. The proposed approach
balances detection rates across different attack types while keeping
false positives at an acceptable level. On complex, dynamic, and large
intrusion detection datasets, the detection accuracy of the naïve
Bayesian classifier does not scale as well as that of a decision tree,
and it has been shown in other problem domains that the naïve Bayesian
tree improves classification rates on large datasets. In a naïve
Bayesian tree, internal nodes split the data as in a regular decision
tree, but the leaves contain naïve Bayesian classifiers. The
experimental results on the KDD99 benchmark network intrusion
detection dataset demonstrate that this new approach scales up the
detection rates for different attack types and reduces false positives
in network intrusion detection.
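The hybrid structure can be pictured with the following scikit-learn sketch: a shallow decision tree routes samples to leaves, and a naïve Bayesian classifier is fitted within each leaf. This illustrates only the plain NBTree idea; the self-adaptive improvements of the proposed algorithm are not reproduced.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

class SimpleNBTree:
    """Decision-tree splits with naïve Bayes leaves (illustrative sketch)."""
    def fit(self, X, y, max_depth=2):
        self.tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
        leaf_ids = self.tree.apply(X)           # leaf index for each sample
        self.leaf_nb = {leaf: GaussianNB().fit(X[leaf_ids == leaf],
                                               y[leaf_ids == leaf])
                        for leaf in np.unique(leaf_ids)}
        return self

    def predict(self, X):
        # Route each sample down the tree, then classify with its leaf's NB model.
        return np.array([self.leaf_nb[leaf].predict(x.reshape(1, -1))[0]
                         for leaf, x in zip(self.tree.apply(X), X)])
```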