Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Speedups from mapping four real-life DSP applications on an embedded system-on-chip that couples coarsegrained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized by a 2-Dimensional Array of Processing Elements. A design flow for improving application-s performance is proposed. Critical software parts, called kernels, are accelerated on the Coarse-Grained Reconfigurable Array. The kernels are detected by profiling the source code. For mapping the detected kernels on the reconfigurable logic a prioritybased mapping algorithm has been developed. Two 4x4 array architectures, which differ in their interconnection structure among the Processing Elements, are considered. The experiments for eight different instances of a generic system show that important overall application speedups have been reported for the four applications. The performance improvements range from 1.86 to 3.67, with an average value of 2.53, compared with an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl-s law.

WAF: an Interface Web Agent Framework

A trend in agent community or enterprises is that they are shifting from closed to open architectures composed of a large number of autonomous agents. One of its implications could be that interface agent framework is getting more important in multi-agent system (MAS); so that systems constructed for different application domains could share a common understanding in human computer interface (HCI) methods, as well as human-agent and agent-agent interfaces. However, interface agent framework usually receives less attention than other aspects of MAS. In this paper, we will propose an interface web agent framework which is based on our former project called WAF and a Distributed HCI template. A group of new functionalities and implications will be discussed, such as web agent presentation, off-line agent reference, reconfigurable activation map of agents, etc. Their enabling techniques and current standards (e.g. existing ontological framework) are also suggested and shown by examples from our own implementation in WAF.

A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from events of real world so existence of associations, which are among the occurrence of these events in real world, among concepts of data streams is logical. Extraction of these hidden associations can be useful for prediction of subsequent concepts in concept shifting data streams. In this paper we present a new method for learning association among concepts of data stream and prediction of what the next concept will be. Knowing the next concept, an informed update of data model will be possible. The results of conducted experiments show that the proposed method is proper for classification of concept shifting data streams.

Performance Analysis of Digital Signal Processors Using SMV Benchmark

Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.

Identification of Complex Sense-antisense Gene's Module on 17q11.2 Associated with Breast Cancer Aggressiveness and Patient's Survival

Sense-antisense gene pair (SAGP) is a pair of two oppositely transcribed genes sharing a common region on a chromosome. In the mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGA) in which at least one gene could share loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in context of their role in human diseases including cancers. In this work we characterize the structural-functional properties of a cluster of 5 genes –TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199, termed TNFAIP1 / POLDIP2 module. This cluster is organized as CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data of two large cohorts (410 atients, in total) of breast cancer patients and patient survival data were used. For the both studied cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of genes of TNFAIP1/POLDIP2 module in breast cancer cell subtypes and (ii) significant associations of TNFAIP1/POLDIP2 CSAGA with amplification of the CSAGA region in breast cancer, (ii) cancer aggressiveness (e.g. genetic grades) and (iv) disease free patient-s survival. Moreover, gene pairs of this module demonstrate strong synergetic effect in the prognosis of time of breast cancer relapse. We suggest that TNFAIP1/ POLDIP2 cluster can be considered as a novel type of structural-functional gene modules in the human genome.

An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators - developed on FPGAs or ASICs - were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures depending on carry save adders and look up tables. Look up tables should be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both look up tables and pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than other architectures. It also has a better overall absolute time for a single operation.

Modular Workflow System for HPC Applications

Nowadays, HPC, Grid and Cloud systems are evolving very rapidly. However, the development of infrastructure solutions related to HPC is lagging behind. While the existing infrastructure is sufficient for simple cases, many computational problems have more complex requirements.Such computational experiments use different resources simultaneously to start a large number of computational jobs.These resources are heterogeneous. They have different purposes, architectures, performance and used software.Users need a convenient tool that allows to describe and to run complex computational experiments under conditions of HPC environment. This paper introduces a modularworkflow system called SEGL which makes it possible to run complex computational experiments under conditions of a real HPC organization. The system can be used in a great number of organizations, which provide HPC power. Significant requirements to this system are high efficiency and interoperability with the existing HPC infrastructure of the organization without any changes.

Cloud Computing Databases: Latest Trends and Architectural Concepts

The Economic factors are leading to the rise of infrastructures provides software and computing facilities as a service, known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses, and by reducing the cost of ownership over time. Such services are made available in a data center, using shared commodity hardware for computation and storage. There is a varied set of cloud services available today, including application services (salesforce.com), storage services (Amazon S3), compute services (Google App Engine, Amazon EC2) and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google-s Data store). These services represent a variety of reformations of data management architectures, and more are on the horizon.

A Software-Supported Methodology for Designing General-Purpose Interconnection Networks for Reconfigurable Architectures

Modern applications realized onto FPGAs exhibit high connectivity demands. Throughout this paper we study the routing constraints of Virtex devices and we propose a systematic methodology for designing a novel general-purpose interconnection network targeting to reconfigurable architectures. This network consists of multiple segment wires and SB patterns, appropriately selected and assigned across the device. The goal of our proposed methodology is to maximize the hardware utilization of fabricated routing resources. The derived interconnection scheme is integrated on a Virtex style FPGA. This device is characterized both for its high-performance, as well as for its low-energy requirements. Due to this, the design criterion that guides our architecture selections was the minimal Energy×Delay Product (EDP). The methodology is fully-supported by three new software tools, which belong to MEANDER Design Framework. Using a typical set of MCNC benchmarks, extensive comparison study in terms of several critical parameters proves the effectiveness of the derived interconnection network. More specifically, we achieve average Energy×Delay Product reduction by 63%, performance increase by 26%, reduction in leakage power by 21%, reduction in total energy consumption by 11%, at the expense of increase of channel width by 20%.

Memristor: The Missing Circuit Element and its Application

Memristor is also known as the fourth fundamental passive circuit element. When current flows in one direction through the device, the electrical resistance increases and when current flows in the opposite direction, the resistance decreases. When the current is stopped, the component retains the last resistance that it had, and when the flow of charge starts again, the resistance of the circuit will be what it was when it was last active. It behaves as a nonlinear resistor with memory. Recently memristors have generated wide research interest and have found many applications. In this paper we survey the various applications of memristors which include non volatile memory, nanoelectronic memories, computer logic, neuromorphic computer architectures low power remote sensing applications, crossbar latches as transistor replacements, analog computations and switches.

Concurrency without Locking in Parallel Hash Structures used for Data Processing

Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers or other traditional approaches in this paper we focus on alternative ways for making better use of modern multithreaded architectures and preparing hash tables for concurrent accesses. Hash structures will be used to demonstrate and compare two entirely different approaches (rule based cooperation and hardware synchronization support) to an efficient parallel implementation using traditional locks. Comparison includes implementation details, performance ranking and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment with special focus on the memory system and memory access characteristics.

Speedup of Data Vortex Network Architecture

In this paper, 3X3 routing nodes are proposed to provide speedup and parallel processing capability in Data Vortex network architectures. The new design not only significantly improves network throughput and latency, but also eliminates the need for distributive traffic control mechanism originally embedded among nodes and the need for nodal buffering. The cost effectiveness is studied by a comparison study with the previously proposed 2- input buffered networks, and considerable performance enhancement can be achieved with similar or lower cost of hardware. Unlike previous implementation, the network leaves small probability of contention, therefore, the packet drop rate must be kept low for such implementation to be feasible and attractive, and it can be achieved with proper choice of operation conditions.

Strongly Adequate Software Architecture

Components of a software system may be related in a wide variety of ways. These relationships need to be represented in software architecture in order develop quality software. In practice, software architecture is immensely challenging, strikingly multifaceted, extravagantly domain based, perpetually changing, rarely cost-effective, and deceptively ambiguous. This paper analyses relations among the major components of software systems and argues for using several broad categories for software architecture for assessment purposes: strongly adequate, weakly adequate and functionally adequate software architectures among other categories. These categories are intended for formative assessments of architectural designs.

MIMCA: A Modelling and Simulation Approach in Support of the Design and Construction of Manufacturing Control Systems Using Modular Petri net

A new generation of manufacturing machines so-called MIMCA (modular and integrated machine control architecture) capable of handling much increased complexity in manufacturing control-systems is presented. Requirement for more flexible and effective control systems for manufacturing machine systems is investigated and dimensioned-which highlights a need for improved means of coordinating and monitoring production machinery and equipment used to- transport material. The MIMCA supports simulation based on machine modeling, was conceived by the authors to address the issues. Essentially MIMCA comprises an organized unification of selected architectural frameworks and modeling methods, which include: NISTRCS, UMC and Colored Timed Petri nets (CTPN). The unification has been achieved; to support the design and construction of hierarchical and distributed machine control which realized the concurrent operation of reusable and distributed machine control components; ability to handle growing complexity; and support requirements for real- time control systems. Thus MIMCA enables mapping between 'what a machine should do' and 'how the machine does it' in a well-defined but flexible way designed to facilitate reconfiguration of machine systems.

A Security Analysis for Home Gateway Architectures

Providing Services at Home has become over the last few years a very dynamic and promising technological domain. It is likely to enable wide dissemination of secure and automated living environments. We propose a methodology for identifying threats to Services at Home Delivery systems, as well as a threat analysis of a multi-provider Home Gateway architecture. This methodology is based on a dichotomous positive/preventive study of the target system: it aims at identifying both what the system must do, and what it must not do. This approach completes existing methods with a synthetic view of potential security flaws, thus enabling suitable measures to be taken into account. Security implications of the evolution of a given system become easier to deal with. A prototype is built based on the conclusions of this analysis.

3D Quantum Numerical Simulation of Horizontal Rectangular Dual Metal Gate\Gate All Around MOSFETs

The integrity and issues related to electrostatic performance associated with scaling Si MOSFET bulk sub 10nm channel length promotes research in new device architectures such as SOI, double gate and GAA MOSFET. In this paper, we present some novel characteristic of horizontal rectangular gate\gate all around MOSFETs with dual metal of gate we obtained using SILVACO TCAD tools. We will also exhibit some simulation results we obtained relating to the influence of some parameters variation on our structure, that having a direct impact on their threshold voltage and drain current. In addition, our TFET showed reasonable ION/IOFF ratio of (104) and low drain induced barrier lowering (DIBL) of 39 mV/V.

Proposal of a Means for Reducing the Torque Variation on a Vertical-Axis Water Turbine by Increasing the Blade Number

This paper presents a means for reducing the torque variation during the revolution of a vertical-axis water turbine (VAWaterT) by increasing the blade number. For this purpose, twodimensional CFD analyses have been performed on a straight-bladed Darrieus-type rotor. After describing the computational model and the relative validation procedure, a complete campaign of simulations, based on full RANS unsteady calculations, is proposed for a three, four and five-bladed rotor architectures, characterized by a NACA 0025 airfoil. For each proposed rotor configuration, flow field characteristics are investigated at several values of tip speed ratio, allowing a quantification of the influence of blade number on flow geometric features and dynamic quantities, such as rotor torque and power. Finally, torque and power curves are compared for the three analyzed architectures, achieving a quantification of the effect of blade number on overall rotor performance.

Meta Model Based EA for Complex Optimization

Evolutionary Algorithms are population-based, stochastic search techniques, widely used as efficient global optimizers. However, many real life optimization problems often require finding optimal solution to complex high dimensional, multimodal problems involving computationally very expensive fitness function evaluations. Use of evolutionary algorithms in such problem domains is thus practically prohibitive. An attractive alternative is to build meta models or use an approximation of the actual fitness functions to be evaluated. These meta models are order of magnitude cheaper to evaluate compared to the actual function evaluation. Many regression and interpolation tools are available to build such meta models. This paper briefly discusses the architectures and use of such meta-modeling tools in an evolutionary optimization context. We further present two evolutionary algorithm frameworks which involve use of meta models for fitness function evaluation. The first framework, namely the Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model [14] reduces computation time by controlled use of meta-models (in this case approximate model generated by Support Vector Machine regression) to partially replace the actual function evaluation by approximate function evaluation. However, the underlying assumption in DAFHEA is that the training samples for the metamodel are generated from a single uniform model. This does not take into account uncertain scenarios involving noisy fitness functions. The second model, DAFHEA-II, an enhanced version of the original DAFHEA framework, incorporates a multiple-model based learning approach for the support vector machine approximator to handle noisy functions [15]. Empirical results obtained by evaluating the frameworks using several benchmark functions demonstrate their efficiency

Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture

Speech corpus is one of the major components in a Speech Processing System where one of the primary requirements is to recognize an input sample. The quality and details captured in speech corpus directly affects the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as a part of an ANN based Speech Recognition System which is exclusively designed to recognize isolated numerals of Assamese language- a major language in the North Eastern part of India. The work focuses on designing an optimal feature extraction block and a few ANN based cooperative architectures so that the performance of the Speech Recognition System can be improved.