Abstract: In recent years, real-time spatial applications, such as
location-aware services and traffic monitoring, have become
increasingly important. Such applications create dynamic environments
in which both data and queries move continuously. As a result, a
tremendous amount of real-time spatial data is generated every day,
and the growth of this data volume appears to outpace advances in
our computing infrastructure. In real-time spatial Big Data, users
expect to receive the result of each query within a short time,
regardless of the system load. However, when a huge amount of
real-time spatial data is generated, system performance degrades
rapidly, especially in overload situations. To address this problem,
we propose using data partitioning as an optimization technique.
Traditional horizontal and vertical partitioning can improve system
performance and simplify data management, but they remain
insufficient for real-time spatial Big Data because they cannot
handle real-time and stream queries efficiently. Thus, in this paper,
we propose a novel data partitioning approach for real-time spatial
Big Data named VPA-RTSBD (Vertical Partitioning Approach for
Real-Time Spatial Big Data). This contribution is an implementation
of the Matching algorithm for traditional vertical partitioning.
First, we find the optimal attribute sequence using the Matching
algorithm. Then, we propose a new cost model for database
partitioning that keeps the amount of data in each partition more
balanced and guarantees parallel execution of the most frequent
queries. VPA-RTSBD aims to obtain a real-time partitioning scheme
and handles stream data. It improves query execution performance
by maximizing the degree of parallel execution, which improves QoS
(Quality of Service) in real-time spatial Big Data, especially with
a huge volume of stream data. We evaluate our contribution through
simulation experiments. The results show that the proposed algorithm
is both efficient and scalable and that it outperforms comparable
algorithms.
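The abstract does not define the Matching algorithm or the new cost model. As background only, vertical partitioning commonly starts from an attribute affinity matrix derived from the query workload. Attributes that are frequently accessed together become candidates for the same vertical fragment, and a clustering step such as the bond energy algorithm then orders them. The minimal sketch below builds that matrix under the assumption that each query is described by the set of attributes it reads and its access frequency. The attribute names and frequencies are hypothetical. This is a standard building block, not VPA-RTSBD itself.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical workload description: (attributes accessed by a query, query frequency).
Query = Tuple[Set[str], int]


def affinity_matrix(attributes: List[str], queries: Iterable[Query]) -> List[List[int]]:
    """Attribute affinity: aff[i][j] = total frequency of queries that access
    both attributes[i] and attributes[j].

    The diagonal aff[i][i] is the total frequency of queries using attributes[i].
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(attributes)}
    n = len(attributes)
    aff = [[0] * n for _ in range(n)]
    for accessed, freq in queries:
        for a in accessed:
            for b in accessed:
                aff[index[a]][index[b]] += freq
    return aff


if __name__ == "__main__":
    attrs = ["id", "lat", "lon", "speed"]  # hypothetical moving-object attributes
    workload = [
        ({"lat", "lon"}, 10),          # e.g. spatial range queries
        ({"id", "speed"}, 3),          # e.g. per-object speed lookups
        ({"lat", "lon", "speed"}, 2),  # e.g. traffic monitoring over an area
    ]
    for name, row in zip(attrs, affinity_matrix(attrs, workload)):
        print(name, row)
```

In this toy workload, `lat` and `lon` have the highest mutual affinity, so a vertical partitioner would tend to place them in the same fragment.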
Abstract: Predictive data analysis and modeling with machine learning techniques become challenging in the presence of too many explanatory variables, or features. Too many features are known not only to slow down machine learning algorithms but also to decrease model prediction accuracy. This study uses a housing dataset with 79 quantitative and qualitative features that describe various aspects people consider when buying a new house. The Boruta algorithm, which supports feature selection using a wrapper approach built around random forest, is used in this study. This feature selection process yields 49 confirmed features, which are then used to develop predictive random forest models. The study also explores five different data partitioning ratios, and their impact on model accuracy is captured using the coefficient of determination (R-squared) and root mean square error (RMSE).
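The two accuracy measures and the idea of a train/test partitioning ratio can be made concrete with a short stdlib-only sketch. The ratios and the toy data in the usage block are hypothetical, not the five ratios or the housing data from the study. A mean-price baseline stands in for the random forest so that the example runs without third-party libraries.

```python
import math
import random
from typing import Sequence, Tuple


def split_by_ratio(rows: Sequence, train_ratio: float, seed: int = 0) -> Tuple[list, list]:
    """Shuffle rows reproducibly and split them into (train, test) at train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: (living_area, price).
    data = [(a, 50.0 + 0.1 * a) for a in range(500, 3000, 100)]
    for ratio in (0.6, 0.7, 0.8):  # illustrative ratios only
        train, test = split_by_ratio(data, ratio)
        baseline = sum(p for _, p in train) / len(train)  # stand-in for a trained model
        y_true = [p for _, p in test]
        y_pred = [baseline] * len(test)
        print(ratio, round(rmse(y_true, y_pred), 2), round(r_squared(y_true, y_pred), 3))
```

A constant baseline yields an R-squared near or below zero. A useful model should push R-squared toward 1 and RMSE toward 0 on the held-out partition.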
Abstract: As the number of networked computers grows,
intrusion detection is an essential component in keeping networks
secure. Various approaches to intrusion detection are currently
in use, each with its own merits and demerits. This
paper presents our work to test and improve the performance of a
new class of decision tree, the c-fuzzy decision tree, for detecting
intrusions. The work also includes identifying the best candidate
feature subset for building an efficient c-fuzzy decision tree based
Intrusion Detection System (IDS). We investigated the usefulness of
the c-fuzzy decision tree for developing an IDS with data partitioned
by horizontal fragmentation. Empirical results indicate the
usefulness of our approach in developing an efficient IDS.
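Horizontal fragmentation splits a dataset row-wise: each record goes to exactly one fragment, and the union of the fragments restores the original data. The minimal sketch below shows this. The fragmentation key (`protocol`) and the record fields are hypothetical, since the abstract does not say which predicate the authors used. A classifier, such as a c-fuzzy decision tree, could then be trained per fragment; that step is not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Record = Dict[str, object]


def horizontal_fragments(
    records: List[Record], key: Callable[[Record], Hashable]
) -> Dict[Hashable, List[Record]]:
    """Split records row-wise: each record lands in exactly one fragment chosen by key(record)."""
    fragments: Dict[Hashable, List[Record]] = defaultdict(list)
    for record in records:
        fragments[key(record)].append(record)
    return dict(fragments)


def reconstruct(fragments: Dict[Hashable, List[Record]]) -> List[Record]:
    """The union of horizontal fragments restores the original rows (order may differ)."""
    return [r for frag in fragments.values() for r in frag]


if __name__ == "__main__":
    # Hypothetical connection records; field names are illustrative only.
    connections = [
        {"protocol": "tcp", "duration": 0, "label": "normal"},
        {"protocol": "udp", "duration": 2, "label": "normal"},
        {"protocol": "tcp", "duration": 5, "label": "attack"},
        {"protocol": "icmp", "duration": 0, "label": "attack"},
    ]
    frags = horizontal_fragments(connections, key=lambda r: r["protocol"])
    for name, rows in frags.items():
        print(name, len(rows))
```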
Abstract: Flexible macroblock ordering (FMO), adopted in the
H.264 standard, allows all macroblocks (MBs) in a frame to be
partitioned into separate groups of MBs called Slice Groups (SGs).
FMO not only supports error resilience but also controls the size of
video packets for different network types. However, it is well known
that adopting FMO increases the number of bits required to encode the frame.
In this paper, we propose a novel algorithm that can reduce the bitrate
overhead caused by FMO. In the proposed algorithm, all MBs
are grouped into SGs based on the similarity of their transform
coefficients. Experimental results show that our algorithm reduces
the bitrate compared with conventional FMO.
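A slice group map assigns each MB index to a slice group. The sketch below contrasts two such maps. The first is a conventional map: the dispersed, checkerboard-like pattern that the H.264 specification defines as slice group map type 1. The second is a hypothetical similarity-based grouping. It assumes each MB is summarized by one scalar derived from its transform coefficients (for example, the sum of absolute quantized coefficients), sorts MBs by that value, and cuts the order into roughly equal groups. The abstract does not give the authors' similarity measure, so this is an illustration of the idea, not their algorithm.

```python
from typing import List, Sequence


def dispersed_slice_group_map(pic_width_in_mbs: int, pic_height_in_mbs: int,
                              num_slice_groups: int) -> List[int]:
    """Conventional FMO 'dispersed' map (slice_group_map_type 1 in H.264)."""
    return [
        ((i % pic_width_in_mbs)
         + ((i // pic_width_in_mbs) * num_slice_groups) // 2) % num_slice_groups
        for i in range(pic_width_in_mbs * pic_height_in_mbs)
    ]


def similarity_slice_group_map(mb_activity: Sequence[float], num_slice_groups: int) -> List[int]:
    """Hypothetical grouping: MBs with similar coefficient activity share a slice group.

    mb_activity[i] is an assumed per-MB summary of its transform coefficients.
    MBs are ranked by that value and the ranking is cut into num_slice_groups
    contiguous, roughly equal-sized groups.
    """
    n = len(mb_activity)
    order = sorted(range(n), key=lambda i: mb_activity[i])
    sg_map = [0] * n
    for rank, mb in enumerate(order):
        sg_map[mb] = rank * num_slice_groups // n
    return sg_map


if __name__ == "__main__":
    print(dispersed_slice_group_map(4, 2, 2))
    print(similarity_slice_group_map([10, 0, 12, 1], 2))
```

The dispersed map deliberately separates neighboring MBs, which helps error concealment but weakens prediction between neighbors. Grouping MBs with similar coefficients is one plausible way to recover some of that coding efficiency.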
Abstract: The demand for higher-performance graphics
continues to grow because of the incessant desire for realism.
Meanwhile, rapid advances in fabrication technology have enabled
several processor cores to be built on a single die. Hence, it is
important to develop single-chip parallel architectures for such
data-intensive applications. In this paper, we propose an efficient
PIM architecture tailored for computer graphics, which requires a
large number of memory accesses. We then address two important tasks
necessary for maximally exploiting the parallelism provided by the
architecture, namely, partitioning and placement of graphics data,
which affect load balance and communication cost, respectively. Under
the constraint of uniform partitioning, we develop approaches for
optimal partitioning and placement that significantly reduce the search space.
We also present heuristics for identifying near-optimal placements,
since the search space for placement remains impractically large
despite our optimization. We then demonstrate the effectiveness of our
partitioning and placement approaches via analysis of example scenes.
Simulation results show considerable search-space reductions, and our
placement heuristics perform close to optimal: the average ratio of
the communication overhead of our heuristics to that of the optimal
placement was 1.05. Our uniform partitioning achieved average
load-balance ratios of 1.47 for geometry processing and 1.44 for
rasterization, which is reasonable.
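The abstract reports load-balance ratios without defining them. A common definition, assumed here, is the maximum per-processor load divided by the mean load, so 1.0 means perfect balance. The sketch below pairs that metric with one simple form of uniform partitioning: screen tiles dealt round-robin to processors. The tile workloads (for example, triangle counts per tile) are hypothetical inputs, and the authors' actual partitioning scheme may differ.

```python
from typing import List, Sequence


def uniform_interleaved_partition(tile_loads: Sequence[float], num_processors: int) -> List[float]:
    """Assign tiles round-robin (tile i -> processor i % P); return total load per processor."""
    loads = [0.0] * num_processors
    for i, load in enumerate(tile_loads):
        loads[i % num_processors] += load
    return loads


def load_balance_ratio(processor_loads: Sequence[float]) -> float:
    """Assumed metric: max load / mean load (1.0 is perfect balance, larger is worse)."""
    mean = sum(processor_loads) / len(processor_loads)
    return max(processor_loads) / mean


if __name__ == "__main__":
    # Hypothetical per-tile workloads, e.g. triangles falling in each screen tile.
    tiles = [12, 3, 8, 0, 5, 9, 7, 2]
    per_proc = uniform_interleaved_partition(tiles, 4)
    print(per_proc, round(load_balance_ratio(per_proc), 2))
```

Under this definition, a ratio of 1.47 would mean the busiest processor carries about 47% more work than the average one.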
Abstract: In this paper, we propose an algorithm to compute
initial cluster centers for K-means clustering. Data in a cell is
partitioned using a cutting plane that divides the cell into two smaller cells.
The plane is perpendicular to the data axis with the highest variance
and is designed to reduce the sum of squared errors of the two cells
as much as possible, while keeping the two cells as far apart as
possible. Cells are partitioned one at a time until the number of
cells equals the predefined number of clusters, K. The centers of
the K cells become the initial cluster centers for K-means. The
experimental results suggest that the proposed algorithm is effective
and converges to better clustering results than the random
initialization method. The research also indicates that the proposed
algorithm greatly improves the likelihood that every cluster
contains some data.
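The splitting procedure can be sketched in a few dozen lines. The abstract leaves two details open, so the sketch fills them with labeled assumptions. First, the cell to split next is the splittable cell with the largest sum of squared errors (SSE). Second, the cut position along the highest-variance axis is the one minimizing the combined SSE of the two halves. The paper's additional "keep the cells far apart" criterion is not modeled.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sse(points: Sequence[Point]) -> float:
    """Sum of squared distances from each point to the cell centroid."""
    dims = len(points[0])
    centroid = [fmean(p[d] for p in points) for d in range(dims)]
    return sum(sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in points)


def split_cell(points: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Cut a cell with a plane perpendicular to its highest-variance axis.

    Assumption: the cut minimizes the combined SSE of the two halves.
    """
    dims = len(points[0])
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    # Only cut between distinct coordinates so the cut is a genuine plane.
    cuts = [i for i in range(1, len(ordered)) if ordered[i - 1][axis] != ordered[i][axis]]
    if not cuts:
        cuts = [len(ordered) // 2]  # all points coincide; fall back to a median split
    best = min(cuts, key=lambda i: sse(ordered[:i]) + sse(ordered[i:]))
    return ordered[:best], ordered[best:]


def initial_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Split cells one at a time until there are k cells; return their centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    cells: List[List[Point]] = [list(points)]
    while len(cells) < k:
        # Assumption: split the splittable cell with the largest SSE next.
        target = max((c for c in cells if len(c) >= 2), key=sse)
        cells.remove(target)
        cells.extend(split_cell(target))
    dims = len(points[0])
    return [tuple(fmean(p[d] for p in cell) for d in range(dims)) for cell in cells]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0), (6.0, 9.0)]
    print(initial_centers(data, 3))
```

The returned centers would then seed a standard K-means run. Each center comes from a non-empty cell, which is consistent with the abstract's observation that the method makes empty clusters less likely.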
Abstract: The H.264/AVC standard is a highly efficient video
codec that provides high-quality video at low bit rates. Because it
employs advanced techniques, its computational complexity has
increased, which is a major problem for implementing real-time
encoders and decoders. Parallelism, which can be exploited on
multi-core systems, is one approach to this problem.
We analyze macroblock-level parallelism, which preserves the bit
rate while achieving high processor concurrency. To reduce
encoding time, we propose a dynamic data partition based on
macroblock regions. This data partition offers advantages in load
balancing and data communication overhead. Using it, the encoder
obtains a speed-up of more than 3.59x on a four-processor system. This
work can also be applied to other multimedia processing applications.
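Macroblock-level parallelism is limited by dependencies between neighboring MBs. The abstract does not state its dependency model or the details of its dynamic partition. The sketch below therefore assumes the widely used 2D-wavefront model, in which each MB depends on its left, top-left, top, and top-right neighbors. Under that model, MB (x, y) belongs to wave x + 2y, and all MBs in one wave can be encoded concurrently. The helper `ideal_speedup` is a hypothetical upper bound that assumes equal per-MB cost; it is not the paper's measurement.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MB = Tuple[int, int]  # (x, y) macroblock coordinates


def wavefronts(width_in_mbs: int, height_in_mbs: int) -> List[List[MB]]:
    """Group MBs into waves that can be encoded concurrently.

    Assumes MB (x, y) depends on (x-1, y), (x-1, y-1), (x, y-1) and (x+1, y-1);
    all of these fall in earlier waves when the wave index is x + 2*y.
    """
    waves: Dict[int, List[MB]] = defaultdict(list)
    for y in range(height_in_mbs):
        for x in range(width_in_mbs):
            waves[x + 2 * y].append((x, y))
    return [waves[t] for t in sorted(waves)]


def ideal_speedup(width_in_mbs: int, height_in_mbs: int, processors: int) -> float:
    """Upper bound on speed-up if every MB costs the same and each wave is
    spread evenly over the processors (ceil(len(wave) / P) steps per wave)."""
    steps = sum(-(-len(w) // processors) for w in wavefronts(width_in_mbs, height_in_mbs))
    return (width_in_mbs * height_in_mbs) / steps


if __name__ == "__main__":
    # Example: a CIF frame is 22 x 18 macroblocks.
    print(len(wavefronts(22, 18)), round(ideal_speedup(22, 18, 4), 2))
```

The short ramp-up and ramp-down waves at the frame edges are one reason that real speed-ups on four processors stay below 4x.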