Enhanced Disk-Based Databases Towards Improved Hybrid In-Memory Systems

In-memory database systems are becoming popular
due to the availability and affordability of sufficiently large RAM and
processors in modern high-end servers with the capacity to manage
large in-memory database transactions. While fast and reliable in-memory
systems are still being developed to overcome cache misses,
CPU/IO bottlenecks and distributed transaction costs, disk-based data
stores still serve as the primary persistence layer. In addition, with the
recent growth in multi-tenant cloud applications and the associated
security concerns, many organizations weigh the trade-offs and
continue to require fast and reliable transaction processing from
disk-based database systems. For these organizations, the only way
to increase throughput is to improve the performance of disk-based
concurrency control. This warrants a hybrid database system that can
selectively apply enhanced disk-based data management within the
context of in-memory systems to improve overall throughput.
The general view is that in-memory systems substantially
outperform disk-based systems. We question this assumption and
examine how a modified variation of access invariance that we call
enhanced memory access (EMA) can be used to allow very high
levels of concurrency in the prefetching of data in disk-based
systems. We demonstrate how this prefetching in disk-based systems
can yield close to in-memory performance, which paves the way for
improved hybrid database systems. This paper proposes a novel EMA
technique and presents a comparative study between disk-based EMA
systems and in-memory systems running on hardware configurations
of equivalent power in terms of the number of processors and their
speeds. The experimental results substantiate that, when used in
conjunction with all concurrency control mechanisms, EMA can
increase the throughput of disk-based systems to levels quite close
to those achieved by in-memory systems. These promising results
show that enhanced disk-based systems help improve hybrid data
management within the broader context of in-memory systems.




