A Study of Recent Contribution on Simulation Tools for Network-on-Chip

The growth in the number of Intellectual Properties (IPs) or the number of cores on the same chip becomes a critical issue in System-on-Chip (SoC) due to the intra-communication problem between the chip elements. As a result, Network-on-Chip (NoC) has emerged as a system architecture to overcome intra-communication issues. This paper presents a study of recent contributions on simulation tools for NoC. Furthermore, an overview of NoC is covered as well as a comparison between some NoC simulators to help facilitate research in on-chip communication.

A High Level Implementation of a High Performance Data Transfer Interface for NoC

The distribution of a single global clock across a chip has become the major design bottleneck for high performance VLSI systems owing to the power dissipation, process variability and multicycle cross-chip signaling. A Network-on-Chip (NoC) architecture partitioned into several synchronous blocks has become a promising approach for attaining fine-grain power management at the system level. In a NoC architecture the communication between the blocks is handled asynchronously. To interface these blocks on a chip operating at different frequencies, an asynchronous FIFO interface is inevitable. However, these asynchronous FIFOs are not required if adjacent blocks belong to the same clock domain. In this paper, we have designed and analyzed a 16-bit asynchronous micropipelined FIFO of depth four, with the awareness of place and route on an FPGA device. We have used a commercially available Spartan 3 device and designed a high speed implementation of the asynchronous 4-phase micropipeline. The asynchronous FIFO implemented on the FPGA device shows 76 Mb/s throughput and a handshake cycle of 109 ns for write and 101.3 ns for read at the simulation under the worst case operating conditions (voltage = 0.95V) on a working chip at the room temperature.

Heuristic for Accelerating Run-Time Task Mapping in NoC-Based Heterogeneous MPSoCs

In this paper, we propose a new packing strategy to find a free resource for run-time mapping of application tasks to NoC-based Heterogeneous MPSoC. The proposed strategy minimizes the task mapping time in addition to placing the communicating tasks close to each other. To evaluate our approach, a comparative study is carried out for a platform containing single task supported PEs. Experiments show that our strategy provides better results when compared to latest dynamic mapping strategies reported in the literature.

FPGA Hardware Implementation and Evaluation of a Micro-Network Architecture for Multi-Core Systems

This paper presents the design, implementation and evaluation of a micro-network, or Network-on-Chip (NoC), based on a generic pipeline router architecture. The router is designed to efficiently support traffic generated by multimedia applications on embedded multi-core systems. It employs a simplest routing mechanism and implements the round-robin scheduling strategy to resolve output port contentions and minimize latency. A virtual channel flow control is applied to avoid the head-of-line blocking problem and enhance performance in the NoC. The hardware design of the router architecture has been implemented at the register transfer level; its functionality is evaluated in the case of the two dimensional Mesh/Torus topology, and performance results are derived from ModelSim simulator and Xilinx ISE 9.2i synthesis tool. An example of a multi-core image processing system utilizing the NoC structure has been implemented and validated to demonstrate the capability of the proposed micro-network architecture. To reduce complexity of the image compression and decompression architecture, the system use image processing algorithm based on classical discrete cosine transform with an efficient zonal processing approach. The experimental results have confirmed that both the proposed image compression scheme and NoC architecture can achieve a reasonable image quality with lower processing time.

An Address-Oriented Transmit Mechanism for GALS NoC

Since Network-on-Chip (NoC) uses network interfaces (NIs) to improve the design productivity, by now, there have been a few papers addressing the design and implementation of a NI module. However, none of them considered the difference of address encoding methods between NoC and the traditional bus-shared architecture. On the basis of this difference, in the paper, we introduce a transmit mechanism to solve such a problem for global asynchronous locally synchronous (GALS) NoC. Furthermore, we give the concrete implementation of the NI module in this transmit mechanism. Finally, we evaluate its performance and area overhead by a VHDL-based cycle-accurate RTL model and simulation results confirm the validity of this address-oriented transmit mechanism.

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

The success of an electronic system in a System-on- Chip is highly dependent on the efficiency of its interconnection network, which is constructed from routers and channels (the routers move data across the channels between nodes). Since neither classical bus based nor point to point architectures can provide scalable solutions and satisfy the tight power and performance requirements of future applications, the Network-on-Chip (NoC) approach has recently been proposed as a promising solution. Indeed, in contrast to the traditional solutions, the NoC approach can provide large bandwidth with moderate area overhead. The selected topology of the components interconnects plays prime rule in the performance of NoC architecture as well as routing and switching techniques that can be used. In this paper, we present two generic NoC architectures that can be customized to the specific communication needs of an application in order to reduce the area with minimal degradation of the latency of the system. An experimental study is performed to compare these structures with basic NoC topologies represented by 2D mesh, Butterfly-Fat Tree (BFT) and SPIN. It is shown that Cluster mesh (CMesh) and MinRoot schemes achieves significant improvements in network latency and energy consumption with only negligible area overhead and complexity over existing architectures. In fact, in the case of basic NoC topologies, CMesh and MinRoot schemes provides substantial savings in area as well, because they requires fewer routers. The simulation results show that CMesh and MinRoot networks outperforms MESH, BFT and SPIN in main performance metrics.

Flexible Wormhole-Switched Network-on-chip with Two-Level Priority Data Delivery Service

A synchronous network-on-chip using wormhole packet switching and supporting guaranteed-completion best-effort with low-priority (LP) and high-priority (HP) wormhole packet delivery service is presented in this paper. Both our proposed LP and HP message services deliver a good quality of service in term of lossless packet completion and in-order message data delivery. However, the LP message service does not guarantee minimal completion bound. The HP packets will absolutely use 100% bandwidth of their reserved links if the HP packets are injected from the source node with maximum injection. Hence, the service are suitable for small size messages (less than hundred bytes). Otherwise the other HP and LP messages, which require also the links, will experience relatively high latency depending on the size of the HP message. The LP packets are routed using a minimal adaptive routing, while the HP packets are routed using a non-minimal adaptive routing algorithm. Therefore, an additional 3-bit field, identifying the packet type, is introduced in their packet headers to classify and to determine the type of service committed to the packet. Our NoC prototypes have been also synthesized using a 180-nm CMOS standard-cell technology to evaluate the cost of implementing the combination of both services.

Pipelined Control-Path Effects on Area and Performance of a Wormhole-Switched Network-on-Chip

This paper presents design trade-off and performance impacts of the amount of pipeline phase of control path signals in a wormhole-switched network-on-chip (NoC). The numbers of the pipeline phase of the control path vary between two- and one-cycle pipeline phase. The control paths consist of the routing request paths for output selection and the arbitration paths for input selection. Data communications between on-chip routers are implemented synchronously and for quality of service, the inter-router data transports are controlled by using a link-level congestion control to avoid lose of data because of an overflow. The trade-off between the area (logic cell area) and the performance (bandwidth gain) of two proposed NoC router microarchitectures are presented in this paper. The performance evaluation is made by using a traffic scenario with different number of workloads under 2D mesh NoC topology using a static routing algorithm. By using a 130-nm CMOS standard-cell technology, our NoC routers can be clocked at 1 GHz, resulting in a high speed network link and high router bandwidth capacity of about 320 Gbit/s. Based on our experiments, the amount of control path pipeline stages gives more significant impact on the NoC performance than the impact on the logic area of the NoC router.

3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor

With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.

Speedup of Data Vortex Network Architecture

In this paper, 3X3 routing nodes are proposed to provide speedup and parallel processing capability in Data Vortex network architectures. The new design not only significantly improves network throughput and latency, but also eliminates the need for distributive traffic control mechanism originally embedded among nodes and the need for nodal buffering. The cost effectiveness is studied by a comparison study with the previously proposed 2- input buffered networks, and considerable performance enhancement can be achieved with similar or lower cost of hardware. Unlike previous implementation, the network leaves small probability of contention, therefore, the packet drop rate must be kept low for such implementation to be feasible and attractive, and it can be achieved with proper choice of operation conditions.