A Survey of Baseband Architecture for Software Defined Radio

This paper surveys recent works that propose baseband processor architectures for software defined radio. A classification of the different approaches is proposed. The performance of each architecture is also discussed in order to identify which approaches meet software-defined radio constraints.

Soliton Interaction in Multi-Core Optical Fiber: Application to WDM System

The analytical bright two-soliton solution of the three-coupled nonlinear Schrödinger (NLS) equations with variable coefficients in birefringent optical fiber is obtained by the Darboux transformation method. With a view to designing ultra-high-speed optical devices, soliton interaction and control in birefringent fiber are investigated. A Lax pair is constructed for the N-coupled NLS system through the AKNS method. Using the two-soliton solution, we demonstrate different interaction behaviors of solitons in birefringent fiber depending on the choice of control parameters. Our results show that interactions of optical solitons have specific applications such as the construction of logic gates, optical computing, soliton switching, and soliton amplification in wavelength division multiplexing (WDM) systems.

FPGA Hardware Implementation and Evaluation of a Micro-Network Architecture for Multi-Core Systems

This paper presents the design, implementation and evaluation of a micro-network, or Network-on-Chip (NoC), based on a generic pipelined router architecture. The router is designed to efficiently support traffic generated by multimedia applications on embedded multi-core systems. It employs a simple routing mechanism and implements a round-robin scheduling strategy to resolve output port contention and minimize latency. Virtual channel flow control is applied to avoid the head-of-line blocking problem and enhance performance in the NoC. The hardware design of the router architecture has been implemented at the register transfer level; its functionality is evaluated on the two-dimensional Mesh/Torus topology, and performance results are derived from the ModelSim simulator and the Xilinx ISE 9.2i synthesis tool. An example multi-core image processing system using the NoC structure has been implemented and validated to demonstrate the capability of the proposed micro-network architecture. To reduce the complexity of the image compression and decompression architecture, the system uses an image processing algorithm based on the classical discrete cosine transform with an efficient zonal processing approach. The experimental results confirm that both the proposed image compression scheme and the NoC architecture can achieve reasonable image quality with lower processing time.
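The abstract names round-robin scheduling for output port contention but gives no hardware details. The Python model below is a hedged behavioral sketch of a generic round-robin arbiter, not the paper's RTL. The class name, the five-port example (four mesh neighbors plus the local core) and the exact priority-rotation rule are assumptions.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Behavioral model (hypothetical): several input ports compete for one
    output port, and the arbiter grants at most one of them per cycle."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # Input index that has the highest priority in the next cycle.
        self.next_priority = 0

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the granted input index, or None if no input is requesting."""
        for offset in range(self.num_inputs):
            candidate = (self.next_priority + offset) % self.num_inputs
            if requests[candidate]:
                # The winner drops to lowest priority for the next cycle.
                self.next_priority = (candidate + 1) % self.num_inputs
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(num_inputs=5)  # e.g. N, E, S, W, local core
    requests = [True, False, True, False, True]  # inputs 0, 2, 4 always request
    print([arbiter.grant(requests) for _ in range(6)])  # [0, 2, 4, 0, 2, 4]
```

Rotating priority past the most recent winner means a waiting input sees at most `num_inputs - 1` grants go to others before it is served. This bounded wait is the fairness property that makes round-robin a common choice for router output ports.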

A Survey on Performance Tools for OpenMP

Advances in processor architecture, such as multi-core designs, increase the size and complexity of parallel computer systems. Several parallel languages can be used to run parallel programs on multi-core architectures. One of these is OpenMP, which is embedded in C/C++ or Fortran. Because of this new architecture and its complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application programs on multi-core systems. Performance analysis is the activity of collecting information about the execution characteristics of a program. Performance tools consist of at least three interfacing software layers: instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines which performance events are actually captured and how they are measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed by the performance tool. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display performance data.
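The three-layer structure described above can be illustrated with a minimal, hedged Python sketch. It does not use OpenMP or the API of any real tool. Each layer is a separate piece: a decorator marks measured regions (instrumentation), an object stores raw timings (measurement), and a function reduces them to a displayable summary (analysis). All names here (`instrument`, `Measurement`, `summarize`, `loop_kernel`) are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict, List


class Measurement:
    """Measurement layer: captures raw wall-clock timings per code region."""

    def __init__(self) -> None:
        self.events: Dict[str, List[float]] = defaultdict(list)

    def record(self, region: str, seconds: float) -> None:
        self.events[region].append(seconds)


MEASUREMENT = Measurement()


def instrument(region: str) -> Callable:
    """Instrumentation layer: defines the measured events, here the entry to
    and exit from a function, labelled with a region name."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                MEASUREMENT.record(region, time.perf_counter() - start)
        return wrapper
    return decorator


def summarize(measurement: Measurement) -> Dict[str, Dict[str, float]]:
    """Analysis layer: reduces raw events to per-region statistics."""
    return {
        region: {
            "calls": len(samples),
            "total_s": sum(samples),
            "mean_s": sum(samples) / len(samples),
        }
        for region, samples in measurement.events.items()
    }


@instrument("loop_kernel")
def loop_kernel(n: int) -> int:
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for size in (10_000, 100_000):
        loop_kernel(size)
    for region, stats in summarize(MEASUREMENT).items():
        print(f"{region}: {stats['calls']} calls, {stats['total_s']:.6f} s total")
```

Real OpenMP tools place the instrumentation layer in the compiler, the runtime, or a binary rewriter rather than in decorators. The separation of concerns between the three layers is the same.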

JConqurr - A Multi-Core Programming Toolkit for Java

With the popularity of multi-core and many-core architectures, there is a strong need for software frameworks that support parallel programming methodologies. In this paper we introduce JConqurr, an Eclipse toolkit that is easy to use and provides robust support for flexible parallel programming. JConqurr is a multi-core and many-core programming toolkit for Java that supports common parallel programming patterns, including task, data, divide-and-conquer and pipeline parallelism. The toolkit uses an annotation and directive mechanism to convert sequential code into parallel code. In addition, we propose a novel mechanism to achieve parallelism using graphics processing units (GPUs). Experiments with common parallelizable algorithms show that our toolkit can be used easily and efficiently to convert sequential code to parallel code, and that significant performance gains can be achieved.
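The abstract does not show JConqurr's annotations. As a hedged analogy only, the Python sketch below uses a hypothetical `data_parallel` decorator as a "directive": the per-element function is written sequentially, and the decorator turns a call on a list into a parallel map over a worker pool. This is not JConqurr's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def data_parallel(workers: int = 4) -> Callable:
    """Hypothetical directive: mark a per-element function so that calling it
    on a list maps it over the elements with a pool of worker threads."""
    def decorator(func: Callable[[T], R]) -> Callable[[List[T]], List[R]]:
        @wraps(func)
        def wrapper(items: List[T]) -> List[R]:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # map() preserves input order, so the result matches the
                # sequential loop [func(x) for x in items].
                return list(pool.map(func, items))
        return wrapper
    return decorator


@data_parallel(workers=4)
def square(x: int) -> int:
    return x * x


print(square([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

In CPython, threads only speed up work that releases the global interpreter lock. The sketch shows the annotation-driven transformation, not a performance result.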

Aspect based Reusable Synchronization Schemes

Concurrency and synchronization are becoming major issues as every new PC comes with a multi-core processor. A major original motivation for Object-Oriented Programming was easier reuse: encode your algorithm into a class, debug it thoroughly, and then reuse the class again and again. With concurrency and synchronization, however, this is often not possible. Thread-safety issues mean that synchronization constructs must be entangled with every class involved. We contribute a detailed literature review of issues and challenges in concurrent programming and present a methodology that uses the Aspect-Oriented paradigm to address this problem. Aspects allow us to extract the synchronization concerns as schemes that are “woven in” to the main code later. This allows the aspects to be tested and verified separately. Hence, the functional components can be woven with reusable synchronization schemes that are robust and scalable.
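The weaving idea can be sketched in Python without an aspect-oriented framework. The hedged example below uses a hypothetical weaver, `weave_single_threaded_execution`. It takes a functional class that contains no locking code and returns a subclass in which every public method runs under a per-object lock. The original class stays untouched, so it can still be tested on its own. The names and the subclass-based weaving are assumptions, not the paper's methodology.

```python
import inspect
import threading
from functools import wraps


def _synchronized(method):
    """Advice: run the method while holding the object's scheme lock."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._scheme_lock:
            return method(self, *args, **kwargs)
    return wrapper


def weave_single_threaded_execution(cls):
    """Hypothetical weaver: return a subclass of `cls` whose public methods
    run under a per-object reentrant lock; `cls` itself is not modified."""
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        # RLock so one synchronized method may call another without deadlock.
        self._scheme_lock = threading.RLock()
        original_init(self, *args, **kwargs)

    namespace = {"__init__": __init__}
    for name, attr in vars(cls).items():
        # Only plain functions defined on the class itself are woven;
        # static/class methods and underscore-prefixed names are skipped.
        if inspect.isfunction(attr) and not name.startswith("_"):
            namespace[name] = _synchronized(attr)
    return type("Synchronized" + cls.__name__, (cls,), namespace)


class Counter:
    """Functional component: contains no synchronization code at all."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        current = self.value  # read-modify-write is not atomic on its own
        self.value = current + 1


SynchronizedCounter = weave_single_threaded_execution(Counter)

if __name__ == "__main__":
    counter = SynchronizedCounter()

    def work() -> None:
        for _ in range(10_000):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)  # 80000
```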

CScheme in Traditional Concurrency Problems

CScheme, a concurrent programming paradigm based on the scheme concept, enables concurrency schemes to be constructed from smaller synchronization units through a GUI-based composer and later reused on other concurrency problems of a similar nature. This paradigm is particularly important in the multi-core environments prevalent today. In this paper, we demonstrate techniques to separate concurrency from functional code using the CScheme paradigm. We then illustrate how the CScheme methodology can be used to solve some traditional concurrency problems, namely the critical section problem and the readers-writers problem, using synchronization schemes such as the Single Threaded Execution Scheme and the Readers Writers Scheme.
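A readers-writers scheme allows many concurrent readers or a single writer. The abstract does not describe how CScheme composes it, so the hedged Python sketch below builds a generic writer-preferring version from a single condition variable, with no GUI composer. The class name `ReadersWritersScheme` and its `read()`/`write()` context managers are hypothetical.

```python
import threading
from contextlib import contextmanager


class ReadersWritersScheme:
    """Hypothetical scheme: many concurrent readers or one writer. New readers
    wait while a writer is waiting, so writers are not starved."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_readers = 0
        self._writer_active = False
        self._waiting_writers = 0

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_readers -= 1
                if self._active_readers == 0:
                    self._cond.notify_all()  # a waiting writer may proceed

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer_active or self._active_readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True
        try:
            yield
        finally:
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()  # wake readers and other writers


if __name__ == "__main__":
    scheme = ReadersWritersScheme()
    shared = {"value": 0}

    def writer() -> None:
        for _ in range(1000):
            with scheme.write():
                shared["value"] += 1

    def reader(seen: list) -> None:
        for _ in range(1000):
            with scheme.read():
                seen.append(shared["value"])

    seen: list = []
    threads = [threading.Thread(target=writer) for _ in range(2)]
    threads += [threading.Thread(target=reader, args=(seen,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared["value"], len(seen))  # 2000 2000
```

The functional code (`writer`, `reader`) only states which accesses are reads and which are writes. The scheme object holds all synchronization logic, which matches the separation the abstract describes.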

Splitting Modified Donor-Cell Schemes for Spectral Action Balance Equation

The spectral action balance equation is used to simulate short-crested, wind-generated waves in shallow water areas such as coastal regions and inland waters. The equation involves two spatial dimensions, wave direction, and wave frequency, and can be solved by the finite difference method. When this equation, in which the propagation velocity terms dominate, is discretized using central differences, stability problems occur if the grid spacing is chosen too coarse. In this paper, we introduce the splitting modified donor-cell scheme to avoid these stability problems and prove that it is consistent with the modified donor-cell scheme and has the same accuracy. The splitting modified donor-cell scheme splits the wave spectral action balance equation into four one-dimensional problems, each of which yields independent tridiagonal linear systems. These smaller systems can be solved simultaneously by direct or iterative methods, which is very fast on a multi-core computer.
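Each one-dimensional subproblem produces independent tridiagonal systems that can be solved concurrently by a direct method. The Thomas algorithm is a standard direct solver for such systems. The hedged sketch below pairs it with a thread pool to show the one-task-per-system structure. The function names and the thread-pool choice are illustrative assumptions, and the paper may use a different solver.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# (sub-diagonal a, main diagonal b, super-diagonal c, right-hand side d)
System = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system with the Thomas algorithm.

    a[0] and c[-1] are unused. Assumes the matrix is diagonally dominant,
    so the elimination is stable without pivoting.
    """
    n = len(b)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def solve_independent(systems: Sequence[System], workers: int = 4) -> List[List[float]]:
    """Solve independent tridiagonal systems concurrently, one task each.

    CPython threads show the task structure but give little CPU speed-up
    because of the GIL; processes or a compiled solver would be needed for that.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: thomas_solve(*s), systems))


if __name__ == "__main__":
    # [[2,-1,0],[-1,2,-1],[0,-1,2]] x = [1,0,1] has the exact solution x = [1,1,1].
    system = ([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0])
    print(solve_independent([system, system]))
```

The Thomas algorithm runs in O(n) per system. Because the split systems share no unknowns, they need no communication while they are being solved, which is what makes the multi-core execution straightforward.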

Dynamic Data Partition Algorithm for a Parallel H.264 Encoder

The H.264/AVC standard is a highly efficient video codec that provides high-quality video at low bit rates. Because it employs advanced techniques, its computational complexity has increased, and this complexity is the major problem in implementing a real-time encoder and decoder. Parallelism, which can be exploited on multi-core systems, is one approach to this problem. We analyze macroblock-level parallelism, which keeps the same bit rate while achieving high processor concurrency. To reduce the encoding time, a dynamic data partition based on macroblock regions is proposed. This data partition offers advantages in load balancing and reduces data communication overhead. Using the data partition, the encoder obtains a speed-up of more than 3.59x on a four-processor system. This work can be applied to other multimedia processing applications.
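The abstract does not describe how the macroblock regions are chosen. As a hedged illustration of the load-balancing idea only, the sketch below re-partitions macroblock rows before each frame using per-row costs measured on the previous frame. The function name `partition_rows`, the row granularity and the greedy prefix-sum rule are all assumptions. The sketch also ignores the intra-prediction and motion-vector dependencies between neighboring macroblocks, which a real macroblock-level parallel encoder must respect.

```python
from typing import List, Sequence


def partition_rows(row_costs: Sequence[float], processors: int) -> List[range]:
    """Split macroblock rows into `processors` contiguous regions whose summed
    costs are roughly equal (greedy prefix-sum cut).

    Requires at least as many rows as processors; every region gets >= 1 row.
    """
    if processors < 1 or len(row_costs) < processors:
        raise ValueError("need 1 <= processors <= number of rows")
    total = sum(row_costs)
    regions: List[range] = []
    start = 0
    accumulated = 0.0
    for p in range(1, processors):
        target = total * p / processors  # ideal cumulative cost at cut p
        limit = len(row_costs) - (processors - p)  # leave rows for the rest
        end = start
        # Always take one row, then keep adding rows while under the target.
        while end < limit and (end == start or accumulated + row_costs[end] <= target):
            accumulated += row_costs[end]
            end += 1
        regions.append(range(start, end))
        start = end
    regions.append(range(start, len(row_costs)))
    return regions


if __name__ == "__main__":
    # Hypothetical per-row encoding times (ms) from the previous frame, e.g. the
    # 9 macroblock rows of a QCIF (176x144) frame with more detail mid-frame.
    previous_frame_costs = [2.0, 2.0, 3.0, 6.0, 8.0, 6.0, 3.0, 2.0, 2.0]
    for cpu, rows in enumerate(partition_rows(previous_frame_costs, 4)):
        print(cpu, list(rows), sum(previous_frame_costs[i] for i in rows))
```

Recomputing the cut points every frame is what makes the partition dynamic: rows that became expensive in the last frame are spread over fewer of them per processor in the next.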