Abstract: The growth in the number of Intellectual Properties (IPs) or the number of cores on the same chip becomes a critical issue in System-on-Chip (SoC) due to the intra-communication problem between the chip elements. As a result, Network-on-Chip (NoC) has emerged as a system architecture to overcome intra-communication issues. This paper presents a study of recent contributions on simulation tools for NoC. Furthermore, an overview of NoC is covered as well as a comparison between some NoC simulators to help facilitate research in on-chip communication.
Abstract: The distribution of a single global clock across a chip
has become the major design bottleneck for high performance VLSI
systems owing to the power dissipation, process variability and multicycle
cross-chip signaling. A Network-on-Chip (NoC) architecture
partitioned into several synchronous blocks has become a promising
approach for attaining fine-grain power management at the system
level. In a NoC architecture the communication between the blocks is
handled asynchronously. To interface these blocks on a chip
operating at different frequencies, an asynchronous FIFO interface is
inevitable. However, these asynchronous FIFOs are not required if
adjacent blocks belong to the same clock domain. In this paper, we
have designed and analyzed a 16-bit asynchronous micropipelined
FIFO of depth four, with the awareness of place and route on an
FPGA device. We have used a commercially available Spartan 3
device and designed a high speed implementation of the
asynchronous 4-phase micropipeline. The asynchronous FIFO
implemented on the FPGA device shows 76 Mb/s throughput and a
handshake cycle of 109 ns for write and 101.3 ns for read at the
simulation under the worst case operating conditions (voltage =
0.95V) on a working chip at the room temperature.
Abstract: In this paper, we propose a new packing strategy to
find a free resource for run-time mapping of application tasks to
NoC-based Heterogeneous MPSoC. The proposed strategy minimizes
the task mapping time in addition to placing the communicating tasks
close to each other. To evaluate our approach, a comparative study is
carried out for a platform containing single task supported PEs.
Experiments show that our strategy provides better results when
compared to latest dynamic mapping strategies reported in the
literature.
Abstract: This paper presents the design, implementation and evaluation of a micro-network, or Network-on-Chip (NoC), based on a generic pipeline router architecture. The router is designed to efficiently support traffic generated by multimedia applications on embedded multi-core systems. It employs a simplest routing mechanism and implements the round-robin scheduling strategy to resolve output port contentions and minimize latency. A virtual channel flow control is applied to avoid the head-of-line blocking problem and enhance performance in the NoC. The hardware design of the router architecture has been implemented at the register transfer level; its functionality is evaluated in the case of the two dimensional Mesh/Torus topology, and performance results are derived from ModelSim simulator and Xilinx ISE 9.2i synthesis tool. An example of a multi-core image processing system utilizing the NoC structure has been implemented and validated to demonstrate the capability of the proposed micro-network architecture. To reduce complexity of the image compression and decompression architecture, the system use image processing algorithm based on classical discrete cosine transform with an efficient zonal processing approach. The experimental results have confirmed that both the proposed image compression scheme and NoC architecture can achieve a reasonable image quality with lower processing time.
Abstract: Since Network-on-Chip (NoC) uses network
interfaces (NIs) to improve the design productivity, by now, there
have been a few papers addressing the design and implementation of a
NI module. However, none of them considered the difference of
address encoding methods between NoC and the traditional
bus-shared architecture. On the basis of this difference, in the paper,
we introduce a transmit mechanism to solve such a problem for global
asynchronous locally synchronous (GALS) NoC. Furthermore, we
give the concrete implementation of the NI module in this transmit
mechanism. Finally, we evaluate its performance and area overhead
by a VHDL-based cycle-accurate RTL model and simulation results
confirm the validity of this address-oriented transmit mechanism.
Abstract: The success of an electronic system in a System-on- Chip is highly dependent on the efficiency of its interconnection network, which is constructed from routers and channels (the routers move data across the channels between nodes). Since neither classical bus based nor point to point architectures can provide scalable solutions and satisfy the tight power and performance requirements of future applications, the Network-on-Chip (NoC) approach has recently been proposed as a promising solution. Indeed, in contrast to the traditional solutions, the NoC approach can provide large bandwidth with moderate area overhead. The selected topology of the components interconnects plays prime rule in the performance of NoC architecture as well as routing and switching techniques that can be used. In this paper, we present two generic NoC architectures that can be customized to the specific communication needs of an application in order to reduce the area with minimal degradation of the latency of the system. An experimental study is performed to compare these structures with basic NoC topologies represented by 2D mesh, Butterfly-Fat Tree (BFT) and SPIN. It is shown that Cluster mesh (CMesh) and MinRoot schemes achieves significant improvements in network latency and energy consumption with only negligible area overhead and complexity over existing architectures. In fact, in the case of basic NoC topologies, CMesh and MinRoot schemes provides substantial savings in area as well, because they requires fewer routers. The simulation results show that CMesh and MinRoot networks outperforms MESH, BFT and SPIN in main performance metrics.
Abstract: A synchronous network-on-chip using wormhole packet switching
and supporting guaranteed-completion best-effort with low-priority (LP)
and high-priority (HP) wormhole packet delivery service is presented in
this paper. Both our proposed LP and HP message services deliver a good
quality of service in term of lossless packet completion and in-order message
data delivery. However, the LP message service does not guarantee minimal
completion bound. The HP packets will absolutely use 100% bandwidth of
their reserved links if the HP packets are injected from the source node with
maximum injection. Hence, the service are suitable for small size messages
(less than hundred bytes). Otherwise the other HP and LP messages, which
require also the links, will experience relatively high latency depending on the
size of the HP message. The LP packets are routed using a minimal adaptive
routing, while the HP packets are routed using a non-minimal adaptive routing
algorithm. Therefore, an additional 3-bit field, identifying the packet type,
is introduced in their packet headers to classify and to determine the type
of service committed to the packet. Our NoC prototypes have been also
synthesized using a 180-nm CMOS standard-cell technology to evaluate the
cost of implementing the combination of both services.
Abstract: This paper presents design trade-off and performance impacts of
the amount of pipeline phase of control path signals in a wormhole-switched
network-on-chip (NoC). The numbers of the pipeline phase of the control
path vary between two- and one-cycle pipeline phase. The control paths
consist of the routing request paths for output selection and the arbitration
paths for input selection. Data communications between on-chip routers are
implemented synchronously and for quality of service, the inter-router data
transports are controlled by using a link-level congestion control to avoid
lose of data because of an overflow. The trade-off between the area (logic
cell area) and the performance (bandwidth gain) of two proposed NoC router
microarchitectures are presented in this paper. The performance evaluation is
made by using a traffic scenario with different number of workloads under
2D mesh NoC topology using a static routing algorithm. By using a 130-nm
CMOS standard-cell technology, our NoC routers can be clocked at 1 GHz,
resulting in a high speed network link and high router bandwidth capacity
of about 320 Gbit/s. Based on our experiments, the amount of control path
pipeline stages gives more significant impact on the NoC performance than
the impact on the logic area of the NoC router.
Abstract: With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.
Abstract: In this paper, 3X3 routing nodes are proposed to
provide speedup and parallel processing capability in Data Vortex
network architectures. The new design not only significantly
improves network throughput and latency, but also eliminates the
need for distributive traffic control mechanism originally embedded
among nodes and the need for nodal buffering. The cost effectiveness
is studied by a comparison study with the previously proposed 2-
input buffered networks, and considerable performance enhancement
can be achieved with similar or lower cost of hardware. Unlike
previous implementation, the network leaves small probability of
contention, therefore, the packet drop rate must be kept low for such
implementation to be feasible and attractive, and it can be achieved
with proper choice of operation conditions.