Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

In recent years, real-time spatial applications, like
location-aware services and traffic monitoring, have become more
and more important. Such applications result dynamic environments
where data as well as queries are continuously moving. As a result,
there is a tremendous amount of real-time spatial data generated
every day. The growth of the data volume seems to outspeed the
advance of our computing infrastructure. For instance, in real-time
spatial Big Data, users expect to receive the results of each query
within a short time period without holding in account the load
of the system. But with a huge amount of real-time spatial data
generated, the system performance degrades rapidly especially in
overload situations. To solve this problem, we propose the use of
data partitioning as an optimization technique. Traditional horizontal
and vertical partitioning can increase the performance of the system
and simplify data management. But they remain insufficient for
real-time spatial Big data; they can’t deal with real-time and
stream queries efficiently. Thus, in this paper, we propose a novel
data partitioning approach for real-time spatial Big data named
VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial
Big data). This contribution is an implementation of the Matching
algorithm for traditional vertical partitioning. We find, firstly, the
optimal attribute sequence by the use of Matching algorithm. Then,
we propose a new cost model used for database partitioning, for
keeping the data amount of each partition more balanced limit and
for providing a parallel execution guarantees for the most frequent
queries. VPA-RTSBD aims to obtain a real-time partitioning scheme
and deals with stream data. It improves the performance of query
execution by maximizing the degree of parallel execution. This affects
QoS (Quality Of Service) improvement in real-time spatial Big Data
especially with a huge volume of stream data. The performance of
our contribution is evaluated via simulation experiments. The results
show that the proposed algorithm is both efficient and scalable, and
that it outperforms comparable algorithms.




References:
[1] A. Jindal J. Dittrich, Relax and let the database do the partitioning online.
In International Workshop on Business Intelligence for the Real-Time
Enterprise (pp. 65-80). Springer, Berlin, Heidelberg, 2011, September.
[2] C. Curino, E. Jones, Y. Zhang S. Madden, Schism: a workload-driven
approach to database replication and partitioning. Proceedings of the
VLDB Endowment, 3(1-2), 48-57, 2010.
[3] D. A. Shraddha Phansalkar, Transaction aware vertical partitioning of
database (TAVPD) for responsive OLTP applications in cloud data stores.
Journal of Theoretical and Applied Information Technology, 59(1), 2014.
[4] D. W. Comer S. Y. Philip, A vertical partitioning algorithm for
relational databases. In Data Engineering, 1987 IEEE Third International
Conference on (pp. 30-35). IEEE, 1987, February.
[5] D. W. Cornell P. S. Yu, An effective approach to vertical partitioning for
physical design of relational databases. IEEE Transactions on Software
engineering, 16(2), 248-258, 1990.
[6] G. Karypis V. Kumar, METIS–unstructured graph partitioning and sparse
matrix ordering system, version 2.0., 1995.
[7] J. A. Hoffer D. G. Severance, The use of cluster analysis in physical
data base design. In Proceedings of the 1st International Conference on
Very Large Data Bases (pp. 69-86). ACM, 1975, September.
[8] J. H. Son M. H. Kim, An adaptable vertical partitioning method in
distributed systems. Journal of Systems and Software, 73(3), 551-561,
2004.
[9] L. Rodrguez X. Li, A support-based vertical partitioning method
for database design. In Electrical Engineering Computing Science and
Automatic Control (CCE), 2011 8th International Conference on (pp.
1-6). IEEE, 2011, October.
[10] M. Guo H. Kang, The Implementation of Database Partitioning Based
on Streaming Framework. In Web Information Systems and Applications
Conference, 2016 13th(pp. 157-162). IEEE, 2016, September.
[11] M.F. Mokbel, X. Xiong, W.G. Aref, S.E. Hambrusch, S. Prabhakar
M.A. Hammad, PALACE: a query processor for handling real-time
spatio-temporal data streams, In Proceedings of the Thirtieth international
conference on Very large data bases-Volume 30, VLDB Endowment,
August, pp.1377-1380, 2004.
[12] M. Hammer B. Niamir, A heuristic approach to attribute partitioning.
In Proceedings of the 1979 ACM SIGMOD international conference on
Management of data (pp. 93-101). ACM, 1979, May.
[13] M. Liroz-Gistau, R. Akbarinia, E. Pacitti, F. Porton P. Valduriez,
Dynamic workload-based partitioning for large-scale databases. In
International Conference on Database and Expert Systems Applications
(pp. 183-190). Springer, Berlin, Heidelberg, 2012, September.
[14] M. V. Bhat A. Haupt, An efficient clustering algorithm. IEEE
Transactions on Systems, Man, and Cybernetics, (1), 61-64, 1976.
[15] P. A. Bernstein, I. Cseri, N. Dani, N. Ellis, A. Kalhan, G. Kakivaya, ...
T. Talius, Adapting microsoft SQL server for cloud computing. In Data
Engineering (ICDE), 2011 IEEE 27th International Conference on (pp.
1255-1263). IEEE, 2011, April.
[16] S. Agrawal, V. Narasayya B. Yang, Integrating vertical and horizontal
partitioning into automated physical database design. In Proceedings of
the 2004 ACM SIGMOD international conference on Management of
data (pp. 359-370). ACM, 2004, June.
[17] S. Ahirrao R. Ingle, Scalable transactions in cloud data stores. Journal
of Cloud Computing: Advances and Applications, SpringerOpen, 4:1-14,
2015.
[18] S. Hamdi, E. Bouazizi S. Faiz, A new qos management approach
in real-time gis with heterogeneous real-time geospatial data using a
feedback control scheduling. In Proceedings of the 19th International
Database Engineering Applications Symposium, pages 174179. ACM,
2015.
[19] S. Hamdi, E. Bouazizi S. Faiz, A Speculative Concurrency Control in
Real-Time Spatial Big Data Using Real-Time Nested Spatial Transactions
and Imprecise Computation. In Computer Systems and Applications
(AICCSA), 2017 IEEE/ACS 14th International Conference on (pp.
534-540). IEEE, 2017, October.
[20] S. Das, A. El Abbadi D. Agrawal, ElasTraS: An Elastic Transactional
Data Store in the Cloud. HotCloud, 9, 131-142, 2009.
[21] S. Navathe, S. Ceri, G. Wiederhold J. Dou, Vertical partitioning
algorithms for database design. ACM Transactions on Database Systems
(TODS), 9(4), 680-710, 1984.
[22] S. Navathe M. Ra, Vertical partitioning for database design: a graphical
algorithm. In ACM Sigmod Record(Vol. 18, No. 2, pp. 440-450). ACM,
1989, June.
[23] S. Papadomanolakis A. Ailamaki, An integer linear programming
approach to database design. In Data Engineering Workshop, 2007 IEEE
23rd International Conference on (pp. 442-449). IEEE, 2007, April.
[24] S. Phansalkar S. Ahirrao, Survey of data partitioning algorithms for big
data stores. In Parallel, Distributed and Grid Computing (PDGC), 2016
Fourth International Conference on (pp. 163-168). IEEE, 2016, December.
[25] TPC-DS: Transaction Performance Processing Council for Decision
Support-decision Support [online] http://www.tpc.org/tpcds/ [accessed,
April 30, 2014].
[26] W. W. Chu I. T. Ieong, A transaction-based approach to vertical
partitioning for relational database systems. IEEE Transactions on
Software Engineering, 19(8), 804-812, 1993.
[27] W. Zhao, Y. Cheng F. Rusu, Workload-Driven Vertical Partitioning for
Effective Query Processing over Raw Data., 2015.
[28] X. Lin, M. Orlowska Y. Zhang, A graph based cluster approach for
vertical partitioning in database design. Data Knowledge Engineering,
11(2), 151-169, 1993.