Approximate Range-Sum Queries over Data Cubes Using Cosine Transform

In this research, we propose to use the discrete cosine transform to approximate the cumulative distributions of data cube cells- values. The cosine transform is known to have a good energy compaction property and thus can approximate data distribution functions easily with small number of coefficients. The derived estimator is accurate and easy to update. We perform experiments to compare its performance with a well-known technique - the (Haar) wavelet. The experimental results show that the cosine transform performs much better than the wavelet in estimation accuracy, speed, space efficiency, and update easiness.


Keywords:


References:
[1] W. Acharya and P. Gibbons and V. Poosala, Aqua: A Fast Decision
Support System Using Approximate Query Answers, 1999, Proc. 25th
VLDB Conference.
[2] D. Barbara and M. Sullivan, Quasi-cubes: Exploiting approximation in
multi-dimensional databases, 1997, SIGMOD Record, 26, 12-17.
[3] C. Chan and Y. Ioannidis, Hierarchical cubes for range-sum queries,
1999, Proc. VLDB, 675-686.
[4] S. Geffner and D. Agrawal and A. Abbadi and T. Smith, Relative prefix
sums: an efficient approach for querying dynamic OLAP Data Cubes,
1999, Proc. ICDE, 328-335.
[5] S. Geffner and D. Agrawal and A. Abbadi, The dynamic data cubes,
2000, Proceeding of International Conference on Extending Database
Technology (EDBT), 237-253.
[6] J. Gray and A. Bosworth and A. Layman and H. Pirahesh, Data cube:
A relational aggregation operator generalizing group-by, cross-tab, and
sub-totals, 1996, Proc. ICDE Conference.
[7] C. Ho and R. Agrawal and N. Megiddo and R. Srikant, Range queries
in OLAP data cubes, 1997, Proc. ACM SIGMOD Conference, 73-88.
[8] L. Lakshmanan and J. Pei and J. Han, Quotient cube: How to summarize
the semantics of a data cube, 2002, Proc. 28th VLDB Conference, 528-
539.
[9] J. Lee and D. Kim and C. Chung, Multi-dimensional Selectivity Estimation
Using Compressed Histogram Information, 1999, ACM SIGMOD,
205-214.
[10] S. Li and S. Wang, Semi-closed cube: An effective approach to trading
off data cube size and query response time, Journal of Computer Science
and Technology, 20(3), 367-372.
[11] Y. Matias and J. Vitter and M. Wang, Wavelet-based histograms for
selectivity estimation, 1998, ACM SIGMOD Conference, 448-459.
[12] Y. Matias and J. Vitter and M. Wang, Dynamic Maintenance of Wavelet-
Based Histograms, 2000, Proc 26th VLDB Conference, 101-110.
[13] Y. Nievergelt, Wavelets Made Easy, 1999, Birkhauser.
[14] Y. Sismanis and N. Roussoupoulos and A. Deligiannakis and Y. Kotidis,
Dwarf: Shrinking the petacube, 2002, Proc. ACM SIGMOD Conference,
464-475.
[15] J. Vitter and M. Wang and B. Lyer, Data cube approximation and
histograms via wavelets, 1998, Proc. CIKM, 96-104.
[16] W. Wang and J. L. Feng, Condensed cube: An effective approach to
reducing data cube size, 2002, Proceedings of the 18th International
Conference on Data Engineering.
[17] G. Zipf, Human behavior and the principle of least effort, 1949, Addison-
Wesley.
[18] TPC, TPC benchmark D, decision support, 1995.
[19] BC, http://www.bls.census.gov/sipp/ ftp.html#sipp04, 2004.