A PIM (Processor-In-Memory) for Computer Graphics: Data Partitioning and Placement Schemes

The demand for higher-performance graphics continues to grow, driven by the pursuit of greater realism. At the same time, rapid advances in fabrication technology have made it possible to build several processor cores on a single die. It is therefore important to develop single-chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architecture tailored for computer graphics, which requires a large number of memory accesses. We then address two tasks necessary for maximally exploiting the parallelism the architecture provides: partitioning of graphics data, which affects load balance, and placement of graphics data, which affects communication cost. Under the constraint of uniform partitioning, we develop approaches for optimal partitioning and placement that significantly reduce the search space. Because the search space for placement remains impractically large despite this optimization, we also present heuristics for identifying near-optimal placements. We demonstrate the effectiveness of our partitioning and placement approaches through analysis of example scenes. Simulation results show considerable search-space reductions, and our placement heuristics perform close to optimal: the average ratio of communication overhead between our heuristics and the optimal placement was 1.05. Our uniform partitioning achieved an average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which we consider reasonable.
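
To make the load-balance figures above more concrete, the following is a minimal sketch of how a load-balance ratio could be measured for a uniform partition. It rests on assumptions the abstract does not state: the screen is split into an equal-sized grid of tiles, each primitive is represented by a single 2D point, and the ratio is taken as the maximum tile load divided by the mean tile load (so 1.0 means perfect balance). The function names `uniform_partition_loads` and `load_balance_ratio` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def uniform_partition_loads(points, width, height, tiles_x, tiles_y):
    """Count points per tile for a uniform tiles_x-by-tiles_y grid.

    Assumes every point lies inside [0, width] x [0, height].
    """
    loads = [0] * (tiles_x * tiles_y)
    for x, y in points:
        # Map the coordinate to a tile index; clamp so that points on the
        # right or bottom edge fall into the last tile.
        tx = min(int(x * tiles_x / width), tiles_x - 1)
        ty = min(int(y * tiles_y / height), tiles_y - 1)
        loads[ty * tiles_x + tx] += 1
    return loads


def load_balance_ratio(loads):
    """Assumed definition: maximum load divided by mean load."""
    return max(loads) / mean(loads)


# Illustrative data only: six points on an 8x8 screen, split into 2x2 tiles.
points = [(0, 0), (1, 1), (5, 1), (1, 5), (6, 6), (7, 7)]
loads = uniform_partition_loads(points, 8, 8, 2, 2)
ratio = load_balance_ratio(loads)
```

Under this assumed definition, a reported ratio such as 1.47 would mean that the most heavily loaded partition carries about 47% more work than the average partition.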



