MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

The success of an electronic system in a System-on- Chip is highly dependent on the efficiency of its interconnection network, which is constructed from routers and channels (the routers move data across the channels between nodes). Since neither classical bus based nor point to point architectures can provide scalable solutions and satisfy the tight power and performance requirements of future applications, the Network-on-Chip (NoC) approach has recently been proposed as a promising solution. Indeed, in contrast to the traditional solutions, the NoC approach can provide large bandwidth with moderate area overhead. The selected topology of the components interconnects plays prime rule in the performance of NoC architecture as well as routing and switching techniques that can be used. In this paper, we present two generic NoC architectures that can be customized to the specific communication needs of an application in order to reduce the area with minimal degradation of the latency of the system. An experimental study is performed to compare these structures with basic NoC topologies represented by 2D mesh, Butterfly-Fat Tree (BFT) and SPIN. It is shown that Cluster mesh (CMesh) and MinRoot schemes achieves significant improvements in network latency and energy consumption with only negligible area overhead and complexity over existing architectures. In fact, in the case of basic NoC topologies, CMesh and MinRoot schemes provides substantial savings in area as well, because they requires fewer routers. The simulation results show that CMesh and MinRoot networks outperforms MESH, BFT and SPIN in main performance metrics.

A Novel Implementation of Application Specific Instruction-set Processor (ASIP) using Verilog

The general purpose processors that are used in embedded systems must support constraints like execution time, power consumption, code size and so on. On the other hand an Application Specific Instruction-set Processor (ASIP) has advantages in terms of power consumption, performance and flexibility. In this paper, a 16-bit Application Specific Instruction-set processor for the sensor data transfer is proposed. The designed processor architecture consists of on-chip transmitter and receiver modules along with the processing and controlling units to enable the data transmission and reception on a single die. The data transfer is accomplished with less number of instructions as compared with the general purpose processor. The ASIP core operates at a maximum clock frequency of 1.132GHz with a delay of 0.883ns and consumes 569.63mW power at an operating voltage of 1.2V. The ASIP is implemented in Verilog HDL using the Xilinx platform on Virtex4.

Flexible Wormhole-Switched Network-on-chip with Two-Level Priority Data Delivery Service

A synchronous network-on-chip using wormhole packet switching and supporting guaranteed-completion best-effort with low-priority (LP) and high-priority (HP) wormhole packet delivery service is presented in this paper. Both our proposed LP and HP message services deliver a good quality of service in term of lossless packet completion and in-order message data delivery. However, the LP message service does not guarantee minimal completion bound. The HP packets will absolutely use 100% bandwidth of their reserved links if the HP packets are injected from the source node with maximum injection. Hence, the service are suitable for small size messages (less than hundred bytes). Otherwise the other HP and LP messages, which require also the links, will experience relatively high latency depending on the size of the HP message. The LP packets are routed using a minimal adaptive routing, while the HP packets are routed using a non-minimal adaptive routing algorithm. Therefore, an additional 3-bit field, identifying the packet type, is introduced in their packet headers to classify and to determine the type of service committed to the packet. Our NoC prototypes have been also synthesized using a 180-nm CMOS standard-cell technology to evaluate the cost of implementing the combination of both services.

Independent Spanning Trees on Systems-on-chip Hypercubes Routing

Independent spanning trees (ISTs) provide a number of advantages in data broadcasting. One can cite the use in fault tolerance network protocols for distributed computing and bandwidth. However, the problem of constructing multiple ISTs is considered hard for arbitrary graphs. In this paper we present an efficient algorithm to construct ISTs on hypercubes that requires minimum resources to be performed.

DWT Based Robust Watermarking Embed Using CRC-32 Techniques

As far as the latest technological improvements are concerned, digital systems more become popular than the past. Despite this growing demand to the digital systems, content copy and attack against the digital cinema contents becomes a serious problem. To solve the above security problem, we propose “traceable watermarking using Hash functions for digital cinema system. Digital Cinema is a great application for traceable watermarking since it uses watermarking technology during content play as well as content transmission. The watermark is embedded into the randomly selected movie frames using CRC-32 techniques. CRC-32 is a Hash function. Using it, the embedding position is distributed by Hash Function so that any party cannot break off the watermarking or will not be able to change. Finally, our experimental results show that proposed DWT watermarking method using CRC-32 is much better than the convenient watermarking techniques in terms of robustness, image quality and its simple but unbreakable algorithm.

Comparative Analysis of Transient-Fault Tolerant Schemes for Network on Chips

Network on a chip (NoC) has been proposed as a viable solution to counter the inefficiency of buses in the current VLSI on-chip interconnects. However, as the silicon chip accommodates more transistors, the probability of transient faults is increasing, making fault tolerance a key concern in scaling chips. In packet based communication on a chip, transient failures can corrupt the data packet and hence, undermine the accuracy of data communication. In this paper, we present a comparative analysis of transient fault tolerant techniques including end-to-end, node-by-node, and stochastic communication based on flooding principle.

Silicon-Waveguide Based Silicide Schottky- Barrier Infrared Detector for on-Chip Applications

We prove detailed analysis of a waveguide-based Schottky barrier photodetector (SBPD) where a thin silicide film is put on the top of a silicon-on-insulator (SOI) channel waveguide to absorb light propagating along the waveguide. Taking both the confinement factor of light absorption and the wall scanning induced gain of the photoexcited carriers into account, an optimized silicide thickness is extracted to maximize the effective gain, thereby the responsivity. For typical lengths of the thin silicide film (10-20 Ðçm), the optimized thickness is estimated to be in the range of 1-2 nm, and only about 50-80% light power is absorbed to reach the maximum responsivity. Resonant waveguide-based SBPDs are proposed, which consist of a microloop, microdisc, or microring waveguide structure to allow light multiply propagating along the circular Si waveguide beneath the thin silicide film. Simulation results suggest that such resonant waveguide-based SBPDs have much higher repsonsivity at the resonant wavelengths as compared to the straight waveguidebased detectors. Some experimental results about Si waveguide-based SBPD are also reported.

Pipelined Control-Path Effects on Area and Performance of a Wormhole-Switched Network-on-Chip

This paper presents design trade-off and performance impacts of the amount of pipeline phase of control path signals in a wormhole-switched network-on-chip (NoC). The numbers of the pipeline phase of the control path vary between two- and one-cycle pipeline phase. The control paths consist of the routing request paths for output selection and the arbitration paths for input selection. Data communications between on-chip routers are implemented synchronously and for quality of service, the inter-router data transports are controlled by using a link-level congestion control to avoid lose of data because of an overflow. The trade-off between the area (logic cell area) and the performance (bandwidth gain) of two proposed NoC router microarchitectures are presented in this paper. The performance evaluation is made by using a traffic scenario with different number of workloads under 2D mesh NoC topology using a static routing algorithm. By using a 130-nm CMOS standard-cell technology, our NoC routers can be clocked at 1 GHz, resulting in a high speed network link and high router bandwidth capacity of about 320 Gbit/s. Based on our experiments, the amount of control path pipeline stages gives more significant impact on the NoC performance than the impact on the logic area of the NoC router.

A Novel FIFO Design for Data Transfer in Mixed Timing Systems

In the current scenario, with the increasing integration densities, most system-on-chip designs are partitioned into multiple clock domains. In this paper, an asynchronous FIFO (First-in First-out pipeline) design is employed as a data transfer interface between two independent clock domains. Since the clocks on the either sides of the FIFO run at a different speed, the task to ensure the correct data transmission through this FIFO is manually performed. Firstly an existing asynchronous FIFO design is discussed and simulated. Gate-level simulation results depicted the flaw in existing design. In order to solve this problem, a novel modified asynchronous FIFO design is proposed. The results obtained from proposed design are in perfect accordance with theoretical expectations. The proposed asynchronous FIFO design outperforms the existing design in terms of accuracy and speed. In order to evaluate the performance of the FIFO designs presented in this paper, the circuits were implemented in 0.24µ TSMC CMOS technology and simulated at 2.5V using HSpice (© Avant! Corporation). The layout design of the proposed FIFO is also presented.

Extended Low Power Bus Binding Combined with Data Sequence Reordering

In this paper, we address the problem of reducing the switching activity (SA) in on-chip buses through the use of a bus binding technique in high-level synthesis. While many binding techniques to reduce the SA exist, we present yet another technique for further reducing the switching activity. Our proposed method combines bus binding and data sequence reordering to explore a wider solution space. The problem is formulated as a multiple traveling salesman problem and solved using simulated annealing technique. The experimental results revealed that a binding solution obtained with the proposed method reduces 5.6-27.2% (18.0% on average) and 2.6-12.7% (6.8% on average) of the switching activity when compared with conventional binding-only and hybrid binding-encoding methods, respectively.

Explicit Delay and Power Estimation Method for CMOS Inverter Driving on-Chip RLC Interconnect Load

The resistive-inductive-capacitive behavior of long interconnects which are driven by CMOS gates are presented in this paper. The analysis is based on the ¤Ç-model of a RLC load and is developed for submicron devices. Accurate and analytical expressions for the output load voltage, the propagation delay and the short circuit power dissipation have been proposed after solving a system of differential equations which accurately describe the behavior of the circuit. The effect of coupling capacitance between input and output and the short circuit current on these performance parameters are also incorporated in the proposed model. The estimated proposed delay and short circuit power dissipation are in very good agreement with the SPICE simulation with average relative error less than 6%.

An Innovational Intermittent Algorithm in Networks-On-Chip (NOC)

Every day human life experiences new equipments more automatic and with more abilities. So the need for faster processors doesn-t seem to finish. Despite new architectures and higher frequencies, a single processor is not adequate for many applications. Parallel processing and networks are previous solutions for this problem. The new solution to put a network of resources on a chip is called NOC (network on a chip). The more usual topology for NOC is mesh topology. There are several routing algorithms suitable for this topology such as XY, fully adaptive, etc. In this paper we have suggested a new algorithm named Intermittent X, Y (IX/Y). We have developed the new algorithm in simulation environment to compare delay and power consumption with elders' algorithms.

A Smart-Visio Microphone for Audio-Visual Speech Recognition “Vmike“

The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity to integrate two radically different information capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor in order to simplify the hardware integration difficulties. By using on-chip image processing, this smart sensor can calculate in real time the X/Y projections of the captured image. This on-chip projection reduces considerably the volume of the output data. This data-volume reduction permits a transmission of the condensed visual information via the same audio channel by using a stereophonic input available on most of the standard computation devices such as PC, PDA and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised by using standard 0.35um CMOS technology. A preliminary experiment gives encouraged results. Its efficiency will be further investigated in a large variety of applications such as biometrics, speech recognition in noisy environments, and vocal control for military or disabled persons, etc.

CMOS-Compatible Silicon Nanoplasmonics for On-Chip Integration

Although silicon photonic devices provide a significantly larger bandwidth and dissipate a substantially less power than the electronic devices, they suffer from a large size due to the fundamental diffraction limit and the weak optical response of Si. A potential solution is to exploit Si plasmonics, which may not only miniaturize the photonic device far beyond the diffraction limit, but also enhance the optical response in Si due to the electromagnetic field confinement. In this paper, we discuss and summarize the recently developed metal-insulator-Si-insulator-metal nanoplasmonic waveguide as well as various passive and active plasmonic components based on this waveguide, including coupler, bend, power splitter, ring resonator, MZI, modulator, detector, etc. All these plasmonic components are CMOS compatible and could be integrated with electronic and conventional dielectric photonic devices on the same SOI chip. More potential plasmonic devices as well as plasmonic nanocircuits with complex functionalities are also addressed.

Real-Time Digital Oscilloscope Implementation in 90nm CMOS Technology FPGA

This paper describes the design of a real-time audiorange digital oscilloscope and its implementation in 90nm CMOS FPGA platform. The design consists of sample and hold circuits, A/D conversion, audio and video processing, on-chip RAM, clock generation and control logic. The design of internal blocks and modules in 90nm devices in an FPGA is elaborated. Also the key features and their implementation algorithms are presented. Finally, the timing waveforms and simulation results are put forward.

Accurate Crosstalk Analysis for RLC On-Chip VLSI Interconnect

This work proposes an accurate crosstalk noise estimation method in the presence of multiple RLC lines for the use in design automation tools. This method correctly models the loading effects of non switching aggressors and aggressor tree branches using resistive shielding effect and realistic exponential input waveforms. Noise peak and width expressions have been derived. The results obtained are at good agreement with SPICE results. Results show that average error for noise peak is 4.7% and for the width is 6.15% while allowing a very fast analysis.

Long-Term On-Chip Storage and Release of Liquid Reagents for Diagnostic Lab-on-a-Chip Applications

A new concept for long-term reagent storage for Labon- a-Chip (LoC) devices is described. Here we present a polymer multilayer stack with integrated stick packs for long-term storage of several liquid reagents, which are necessary for many diagnostic applications. Stick packs are widely used in packaging industry for storing solids and liquids for long time. The storage concept fulfills two main requirements: First, a long-term storage of reagents in stick packs without significant losses and interaction with surroundings, second, on demand releasing of liquids, which is realized by pushing a membrane against the stick pack through pneumatic pressure. This concept enables long-term on-chip storage of liquid reagents at room temperature and allows an easy implementation in different LoC devices.

3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor

With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.

Robust Digital Cinema Watermarking

With the advent of digital cinema and digital broadcasting, copyright protection of video data has been one of the most important issues. We present a novel method of watermarking for video image data based on the hardware and digital wavelet transform techniques and name it as “traceable watermarking" because the watermarked data is constructed before the transmission process and traced after it has been received by an authorized user. In our method, we embed the watermark to the lowest part of each image frame in decoded video by using a hardware LSI. Digital Cinema is an important application for traceable watermarking since digital cinema system makes use of watermarking technology during content encoding, encryption, transmission, decoding and all the intermediate process to be done in digital cinema systems. The watermark is embedded into the randomly selected movie frames using hash functions. Embedded watermark information can be extracted from the decoded video data. For that, there is no need to access original movie data. Our experimental results show that proposed traceable watermarking method for digital cinema system is much better than the convenient watermarking techniques in terms of robustness, image quality, speed, simplicity and robust structure.

Fast and Efficient On-Chip Interconnection Modeling for High Speed VLSI Systems

Timing driven physical design, synthesis, and optimization tools need efficient closed-form delay models for estimating the delay associated with each net in an integrated circuit (IC) design. The total number of nets in a modern IC design has increased dramatically and exceeded millions. Therefore efficient modeling of interconnection is needed for high speed IC-s. This paper presents closed–form expressions for RC and RLC interconnection trees in current mode signaling, which can be implemented in VLSI design tool. These analytical model expressions can be used for accurate calculation of delay after the design clock tree has been laid out and the design is fully routed. Evaluation of these analytical models is several orders of magnitude faster than simulation using SPICE.