Data Placement in Heterogeneous Storage of Short Videos

The overall service performance of an I/O-intensive system depends mainly on the workload placed on its storage system. In a heterogeneous storage environment, where storage elements from different vendors with different capacities and performance levels are combined, workload should be distributed according to each element's capability. This paper addresses the data placement problem for a short-video sharing website. The workload contributed by a video is estimated from the number of views and the lifetime span of existing videos in the same category. An experiment was conducted on 42,000 video titles over six weeks. The results showed that the proposed algorithm distributed workload and maintained load balance better than round-robin and random placement algorithms.
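The abstract does not give the estimation formula or the placement rule, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that a new video's expected workload equals the mean views-per-day of existing videos in its category, and that each storage element has a relative `capability` score. A greedy rule then assigns each video to the element whose load-to-capability ratio would be lowest after placement. All names (`StorageNode`, `category_rate`, `place`) are illustrative, not the paper's.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A storage element. `capability` is a hypothetical relative throughput score."""
    name: str
    capability: float
    load: float = 0.0
    videos: list = field(default_factory=list)


def category_rate(history):
    """Hypothetical estimator: mean views per day of existing videos, per category.

    `history` is a list of (category, views, lifetime_days) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of daily rates, count]
    for category, views, lifetime_days in history:
        totals[category][0] += views / max(lifetime_days, 1)  # guard against zero lifetime
        totals[category][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}


def place(video_id, category, rates, nodes):
    """Greedily assign a video to the node with the lowest post-placement load/capability.

    Unknown categories contribute zero expected load; ties go to the first node listed.
    """
    expected = rates.get(category, 0.0)
    best = min(nodes, key=lambda n: (n.load + expected) / n.capability)
    best.load += expected
    best.videos.append(video_id)
    return best.name
```

For example, with one element three times as capable as another, four videos from the same category end up split 3:1, so both elements reach the same utilization. A round-robin placement would instead split them 2:2 and leave the slower element at twice its proportional share.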
