Abstract: In this paper we propose a new method for
simultaneously estimating multiple quantiles corresponding to given
probability levels from data streams and massive data sets. This
method provides a basis for the development of single-pass, low-storage
quantile estimation algorithms that differ in complexity, storage
requirements, and accuracy. We demonstrate that such algorithms may
perform well even for heavy-tailed data.
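To make the idea of single-pass, low-storage estimation concrete, the sketch below tracks one running estimate per probability level using a classical stochastic-approximation update. This is not the method proposed in the paper; it is a generic illustration, and the function name `stream_quantiles` and the `gain` parameter are assumptions introduced here.

```python
import random


def stream_quantiles(stream, probs, gain=1.0):
    """Estimate several quantiles in one pass with O(len(probs)) storage.

    Illustrative stochastic-approximation sketch, not the paper's algorithm.
    `gain` (hypothetical parameter) scales the step size and would need
    tuning to the scale of the data, especially for heavy tails.
    """
    estimates = None
    for n, x in enumerate(stream, start=1):
        if estimates is None:
            # Initialise every estimate with the first observation.
            estimates = [float(x)] * len(probs)
            continue
        step = gain / n  # decreasing step size
        for i, p in enumerate(probs):
            if x > estimates[i]:
                estimates[i] += step * p          # observation above estimate
            else:
                estimates[i] -= step * (1.0 - p)  # observation at or below
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    data = (rng.random() for _ in range(100_000))  # uniform(0, 1) stream
    print(stream_quantiles(data, [0.1, 0.5, 0.9]))
```

Each observation is seen once and then discarded, so memory use does not grow with the length of the stream.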
Abstract: This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS, a content-based sampling algorithm that works on the landmark window model of data streams and preserves a more informative sample in the sample space. The algorithm, which is based on closed frequent itemset mining, first builds a concept lattice from the initial data and then updates the lattice structure using an incremental mechanism. The incremental mechanism inserts, updates, and deletes nodes in the concept lattice in batches. The algorithm extracts the final samples on user demand. Experimental results on synthetic and real datasets show the accuracy of CFISDS, although CFISDS is not faster than existing sampling algorithms such as Z and DSS.
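For context on what a content-agnostic baseline for stream sampling looks like, here is a minimal sketch of classic uniform reservoir sampling (often called Algorithm R). It is not CFISDS, and it does not include the skip optimisation used by faster reservoir variants; the function name `reservoir_sample` is an assumption made for this illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    transactions = ({"id": t} for t in range(10_000))
    print(reservoir_sample(transactions, 5, random.Random(1)))
```

Unlike a content-based sampler, this procedure ignores what each transaction contains, which is why it is cheap but may miss informative items.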
Abstract: In metal cutting industries, mathematical/statistical
models are typically used to predict tool replacement time. These
off-line methods usually result in a less than optimal replacement
time, thereby either wasting resources or causing quality problems.
The few online real-time methods proposed use indirect measurement
techniques and are prone to similar errors. Our idea is based on
identifying the optimal replacement time using an electronic nose to
detect the airborne compounds released when the tool wear reaches
a chemical substrate doped into the tool material during
fabrication. The study investigates the feasibility of the idea, possible
doping materials and methods, and data stream mining
techniques for detecting and monitoring the different phases of tool
wear.
Abstract: Frequent pattern discovery over data streams is a hard
problem because the continuously generated nature of a stream does not
allow each data element to be revisited. Furthermore, the pattern discovery
process must be fast to produce timely results. Based on these
requirements, we propose an approximate approach to tackle the
problem of discovering frequent patterns over a continuous stream.
Our approximation algorithm is intended to be applied to
a stream prior to the pattern discovery process. The results of
approximate frequent pattern discovery are reported in the
paper.
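As an illustration of one-pass approximate frequency counting, the sketch below implements the well-known Misra-Gries summary for single items. It is a stand-in chosen for illustration, not the approximation algorithm of the paper, and it handles individual items rather than itemsets; the function name `misra_gries` is an assumption.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to remain in the summary; stored counts underestimate the
    true counts by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
    print(misra_gries(stream, 3))
```

The summary never revisits an element and uses memory independent of the stream length, which matches the two constraints named in the abstract.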
Abstract: This paper presents the design and prototype implementation of an intelligent data processing framework for ubiquitous sensor networks. The focus is on handling the sensor data stream and on interoperability between low-level sensor data and application clients. Our framework first provides systematic middleware that mediates the interaction between the application layer and low-level sensors, so that a great volume of sensor data can be analyzed by filtering and integration to create value-added context information. Then, an agent-based architecture is proposed for real-time data distribution, efficiently forwarding a specific event to the appropriate application registered in the directory service via the open interface. The prototype implementation demonstrates that our framework can host a sophisticated application on the ubiquitous sensor network and can autonomously evolve into new middleware, taking advantage of promising technologies such as software agents, XML, cloud computing, and the like.
Abstract: In Grid computing, a data transfer protocol called
GridFTP has been widely used for efficiently transferring a large volume
of data. Currently, two versions of the GridFTP protocol, GridFTP
version 1 (GridFTP v1) and GridFTP version 2 (GridFTP v2), have
been proposed in the GGF. GridFTP v2 supports several advanced
features such as data streaming, dynamic resource allocation, and
checksum transfer, by defining a transfer mode called X-block mode.
However, the effectiveness of GridFTP v2 has not been
fully investigated in the literature. In this paper, we therefore quantitatively evaluate
the performance of GridFTP v1 and GridFTP v2 using mathematical
analysis and simulation experiments. We reveal the performance
limitations of GridFTP v1 and quantitatively show the effectiveness of
GridFTP v2. Through several numerical examples, we show that by
utilizing the data streaming feature, the average file transfer time of
GridFTP v2 is significantly smaller than that of GridFTP v1.
Abstract: Data stream analysis is the process of computing
various summaries and derived values from large amounts of data
which are continuously generated at a rapid rate. The nature of a
stream does not allow each data element to be revisited. Furthermore,
data processing must be fast to produce timely analysis results. These
requirements impose constraints on the design of the algorithms to
balance correctness against timely responses. Several techniques
have been proposed over the past few years to address these
challenges. These techniques can be categorized as either data-oriented
or task-oriented. The data-oriented approach analyzes a
subset of data or a smaller transformed representation, whereas the task-oriented
scheme solves the problem directly via approximation
techniques. We propose a hybrid approach to tackle the data stream
analysis problem. The data stream is statistically
transformed to a smaller size, and its characteristics are
computationally approximated. We adopt a Monte Carlo method in the approximation
step. The data reduction is performed horizontally and
vertically through our EMR sampling method. The proposed method
is analyzed through a series of experiments. We apply our algorithm to
clustering and classification tasks to evaluate the utility of our
approach.
Abstract: Recent developments in storage technology and networking architectures have made it possible for a broad range of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from real-world events, so it is logical that associations among the occurrences of these events in the real world carry over to the concepts of data streams. Extracting these hidden associations can be useful for predicting subsequent concepts in concept-shifting data streams. In this paper we present a new method for learning associations among the concepts of a data stream and predicting what the next concept will be. Knowing the next concept makes an informed update of the data model possible. The results of the conducted experiments show that the proposed method is suitable for classification of concept-shifting data streams.
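One simple way to picture next-concept prediction is a first-order transition table over the observed concept sequence. The sketch below assumes that concepts arrive as discrete labels and that associations can be approximated by transition counts; both are assumptions made for illustration, and the class `NextConceptPredictor` is hypothetical rather than the method of the paper.

```python
from collections import Counter, defaultdict


class NextConceptPredictor:
    """Count concept-to-concept transitions and predict the likeliest successor."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.previous = None

    def observe(self, concept):
        # Record a transition only when the concept actually changes.
        if self.previous is not None and concept != self.previous:
            self.transitions[self.previous][concept] += 1
        self.previous = concept

    def predict(self, current=None):
        current = self.previous if current is None else current
        followers = self.transitions.get(current)
        if not followers:
            return None  # no shift out of this concept has been seen yet
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    predictor = NextConceptPredictor()
    for label in ["A", "A", "B", "C", "A", "B", "A", "B"]:
        predictor.observe(label)
    print(predictor.predict("A"))
```

With a prediction available, a classifier for the expected concept could be prepared before the shift occurs, which is the informed update the abstract refers to.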
Abstract: In this paper, we first consider the quality of service
problems in heterogeneous wireless networks when sending video
data, for which real-time requirements are pronounced. We then
present a method for ensuring end-to-end quality of service at
the application layer for adaptive transmission of video data over
heterogeneous wireless networks. To do this, mechanisms in different
layers are used: the stop mechanism, the
adaptation mechanism, and graceful degradation at the application
layer; the multi-level congestion feedback mechanism in the network
layer; and the connection cut-off decision mechanism in the link
layer. Finally, the presented method is simulated in NS-2, and the
achieved improvement is reported.
Abstract: Knowledge is indispensable, but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining is the generation of a large number of potential rules as a result of the mining process. In fact, the result size is sometimes comparable to the original data. Traditional pruning measures such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, making the knowledge more voluminous with each change. The predominant representation of discovered knowledge is the standard Production Rule (PR) of the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs) as an extension of production rules that exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (the censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. With a rule of this type, we are free to ignore the exception condition when the resources needed to establish its presence are tight or when there is simply no information available as to whether it holds. Thus the 'If P Then D' part of the CPR expresses the important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. In this paper, a scheme based on a Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. Discovering CPRs from flat rules results in a considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also considerably reduces the size of the knowledge base with each episode. Examples are given to demonstrate the behaviour of the proposed scheme.
The suggested cumulative learning scheme would be useful in mining data streams.
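The rule form If P Then D Unless C can be pictured with a small data structure in which the censor check is optional. The sketch below only illustrates how a CPR is evaluated; it does not implement the DST-based discovery scheme, and the class `CensoredRule`, its fields, and the bird example are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CensoredRule:
    """If premise Then decision Unless censor (illustrative sketch)."""
    premise: Callable[[dict], bool]
    decision: str
    censor: Callable[[dict], Optional[bool]]

    def fire(self, facts: dict, check_censor: bool = True) -> Optional[str]:
        if not self.premise(facts):
            return None  # the rule does not apply
        # The censor is consulted only when resources allow; an unknown
        # censor value (None) is treated as "exception not established".
        if check_censor and self.censor(facts):
            return "not " + self.decision  # censor flips D to ~D
        return self.decision


if __name__ == "__main__":
    rule = CensoredRule(
        premise=lambda f: f.get("bird", False),
        decision="flies",
        censor=lambda f: f.get("penguin"),
    )
    print(rule.fire({"bird": True, "penguin": True}))
    print(rule.fire({"bird": True, "penguin": True}, check_censor=False))
```

Skipping the censor trades certainty for speed: the answer is usually right because C holds rarely, which is the variable-precision behaviour described above.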
Abstract: The increasing importance of data streams arising in a
wide range of advanced applications has led to the extensive study of
mining frequent patterns. Mining data streams poses many new
challenges, among which are the one-scan nature, the unbounded
memory requirement, and the high arrival rate of data streams. In this
paper, we propose a new approach for mining itemsets on data
streams. Our approach, SFIDS, has been developed based on the FIDS
algorithm. The main aims were to keep some advantages of the
previous approach and resolve some of its drawbacks, and
consequently to improve run time and memory consumption. Our
approach has the following advantages: it uses a lattice-like data structure
for keeping frequent itemsets; it separates regions from each
other by deleting common nodes, which decreases the search
space, memory consumption, and run time; and finally, considering
the CPU constraint, when an increasing data arrival rate
overloads the system, SFIDS automatically detects this situation and
discards some of the unprocessed data. We guarantee that the error of the results
is bounded by a user-specified threshold, based on a probabilistic
technique. Final results show that the SFIDS algorithm attains
about a 50% run time improvement over the FIDS approach.