3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor

With the increasing number of on-chip components and the critical requirement for processing power, the Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, conventional bus-based on-chip communication schemes suffer from very high communication delay and low scalability in large-scale systems. Network-on-Chip (NoC) has been proposed to relieve the bottleneck of parallel on-chip communication by applying different network topologies that separate the communication phase from the computation phase. Observing that the bandwidth between on-chip components and off-chip memory has become a critical problem even in NoC-based systems, we propose in this paper a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM), in which different layers are dedicated to different functions such as processors, cache, or memory. Results show that, with the proposed architecture, average link utilization is reduced by 10.25% for SPLASH-2 workloads. On average, the proposed design requires 1.12% fewer execution cycles than the traditional design.




