Abstract: Speedups from mapping four real-life DSP
applications on an embedded system-on-chip that couples coarse-grained
reconfigurable logic with an instruction-set processor are
presented. The reconfigurable logic is realized by a 2-Dimensional
Array of Processing Elements. A design flow for improving an
application's performance is proposed. Critical software parts, called
kernels, are accelerated on the Coarse-Grained Reconfigurable
Array. The kernels are detected by profiling the source code. For
mapping the detected kernels on the reconfigurable logic, a priority-based
mapping algorithm has been developed. Two 4x4 array
architectures, which differ in their interconnection structure among
the Processing Elements, are considered. Experiments on eight
different instances of a generic system show significant overall
speedups for the four applications.
The performance improvements range from 1.86 to 3.67, with an
average value of 2.53, compared with an all-software execution.
These speedups are quite close to the maximum theoretical speedups
imposed by Amdahl's law.
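The bound from Amdahl's law mentioned above can be illustrated with a minimal sketch. The kernel fraction and kernel speedup used in the example are hypothetical values chosen for illustration; they are not figures from the paper.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is accelerated.

    kernel_fraction: share of the all-software execution time spent in kernels (0..1).
    kernel_speedup:  how many times faster the kernels run on the accelerator.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction):
    """Maximum theoretical speedup, reached as the kernel speedup grows without bound."""
    if kernel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: kernels take 70% of the time and run 10x faster.
    print(round(amdahl_speedup(0.7, 10.0), 2), round(amdahl_limit(0.7), 2))
```

The closer the achieved speedup is to `amdahl_limit`, the less there is to gain from accelerating the kernels further; the remaining time is dominated by the code left on the processor.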
Abstract: A trend in the agent community and in enterprises is a shift from closed to open architectures composed of a large number of autonomous agents. One implication is that the interface agent framework is becoming more important in multi-agent systems (MAS), so that systems constructed for different application domains can share a common understanding of human-computer interface (HCI) methods, as well as of human-agent and agent-agent interfaces. However, the interface agent framework usually receives less attention than other aspects of MAS. In this paper, we propose an interface web agent framework based on our former project, WAF, and a Distributed HCI template. A group of new functionalities and their implications are discussed, such as web agent presentation, off-line agent reference, and a reconfigurable activation map of agents. Their enabling techniques and current standards (e.g. existing ontological frameworks) are also suggested and shown by examples from our own implementation in WAF.
Abstract: Recent developments in storage technology and
networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate
decision making. Data streams are generated from real-world events, so it is logical that the associations among the occurrences of these events also exist among the concepts of the data streams.
Extracting these hidden associations can be useful for predicting subsequent concepts in concept-shifting data streams. In this paper we present a new method for learning associations among
the concepts of a data stream and predicting what the next concept will be. Knowing the next concept makes an informed update of the data model possible. The results of the conducted experiments show that the proposed method is suitable for classification of concept-shifting data
streams.
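The abstract does not describe how the associations are learned. As a rough illustration of the idea of predicting the next concept from past concept changes, the sketch below counts first-order transitions between concept labels. The class name, the labels and the counting scheme are all assumptions made for this example; this is not the paper's method.

```python
from collections import Counter, defaultdict


class ConceptTransitionModel:
    """Hypothetical helper: counts how often one concept follows another."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def observe(self, concept_sequence):
        # Record every (previous concept -> next concept) pair in the history.
        for prev, nxt in zip(concept_sequence, concept_sequence[1:]):
            self._counts[prev][nxt] += 1

    def predict_next(self, current):
        # Return the most frequent follower, or None if the concept is unseen.
        followers = self._counts.get(current)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = ConceptTransitionModel()
    model.observe(["A", "B", "A", "B", "A", "C"])  # illustrative concept labels
    print(model.predict_next("A"))
```

A prediction of this kind could be used to prepare the model for the expected concept before the shift is observed, which is the "informed update" the abstract refers to.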
Abstract: Unlike general-purpose processors, digital signal
processors (DSP processors) are strongly application-dependent. To
meet the needs for diverse applications, a wide variety of DSP
processors based on different architectures ranging from the
traditional to VLIW have been introduced to the market over the
years. The functionality, performance, and cost of these processors
vary over a wide range. In order to select a processor that meets the
design criteria for an application, processor performance is usually
the major concern for digital signal processing (DSP) application
developers. Performance data are also essential for the designers of
DSP processors to improve their design. Consequently, several DSP
performance benchmarks have been proposed over the past decade or
so. However, none of these benchmarks seems to include recent
DSP applications.
In this paper, we use a new benchmark that we recently developed
to compare the performance of popular DSP processors from Texas
Instruments and StarCore. The new benchmark is based on the
Selectable Mode Vocoder (SMV), a speech-coding program from the
recent third generation (3G) wireless voice applications. All
benchmark kernels are compiled by the compilers of the respective
DSP processors and run on their simulators. The weighted arithmetic
mean of clock cycles and the arithmetic mean of code size are used to
compare the performance of five DSP processors.
In addition, we studied how the performance of a processor is
affected by code structure, features of the processor architecture and
compiler optimization. The extensive experimental data gathered,
analyzed, and presented in this paper should be helpful for DSP
processor and compiler designers to meet their specific design goals.
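The two summary metrics named above can be sketched as follows. The kernel names, cycle counts, weights and code sizes are invented for illustration only; the abstract does not give the actual weights or measurements.

```python
from statistics import mean


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * v_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical per-kernel measurements for one processor.
    cycles = {"kernel_a": 12000, "kernel_b": 3000, "kernel_c": 800}
    # Hypothetical weights, e.g. how often each kernel runs per frame.
    weights = {"kernel_a": 1, "kernel_b": 4, "kernel_c": 10}
    code_size_bytes = {"kernel_a": 640, "kernel_b": 256, "kernel_c": 96}

    names = sorted(cycles)
    print(weighted_mean([cycles[n] for n in names], [weights[n] for n in names]))
    print(mean(code_size_bytes[n] for n in names))
```

Weighting the cycle counts lets frequently executed kernels dominate the score, whereas code size is averaged without weights because each kernel occupies memory once regardless of how often it runs.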
Abstract: A sense-antisense gene pair (SAGP) is a pair of oppositely transcribed genes sharing a common region on a chromosome. In mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGAs) in which at least one gene can share loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in the context of their role in human diseases, including cancers. In this work we characterize the structural-functional properties of a cluster of 5 genes (TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199), termed the TNFAIP1/POLDIP2 module. This cluster is organized as a CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data from two large cohorts of breast cancer patients (410 patients in total) and patient survival data were used. For both cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of the genes of the TNFAIP1/POLDIP2 module in breast cancer cell subtypes and (ii) significant associations of the TNFAIP1/POLDIP2 CSAGA with amplification of the CSAGA region in breast cancer, (iii) cancer aggressiveness (e.g. genetic grades) and (iv) disease-free patient survival. Moreover, gene pairs of this module demonstrate a strong synergistic effect in the prognosis of time to breast cancer relapse. We suggest that the TNFAIP1/POLDIP2 cluster can be considered a novel type of structural-functional gene module in the human genome.
Abstract: Modular multiplication is the basic operation
in most public key cryptosystems, such as RSA, DSA, ECC,
and DH key exchange. Unfortunately, very large operands
(on the order of 1024 or 2048 bits) must be used to provide
sufficient security strength. The use of such big numbers
dramatically slows down the whole cipher system, especially
when running on embedded processors.
So far, customized hardware accelerators - developed on
FPGAs or ASICs - were the best choice for accelerating
modular multiplication in embedded environments. On the
other hand, many algorithms have been developed to speed
up such operations. Examples are the Montgomery modular
multiplication and the interleaved modular multiplication
algorithms. Combining customized hardware with
an efficient algorithm is expected to provide a much faster
cipher system.
This paper introduces an enhanced architecture for computing
the modular multiplication of two large numbers X
and Y modulo a given modulus M. The proposed design is
compared with three previous architectures that depend on
carry-save adders and look-up tables. The look-up tables must
be loaded with a set of pre-computed values. Our proposed
architecture uses the same carry-save addition, but replaces
both look-up tables and pre-computations with an enhanced
version of sign detection techniques. The proposed architecture
supports higher frequencies than other architectures.
It also has a better overall absolute time for a single operation.
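For orientation, the textbook interleaved modular multiplication that the abstract names can be sketched in software as below. This is the plain bit-serial algorithm, assuming 0 <= X, Y < M; it does not model the carry-save adders, look-up tables or sign detection of the hardware architectures being compared.

```python
def interleaved_modmul(x, y, m):
    """Compute (x * y) mod m by scanning the bits of x from most to least significant.

    Each step doubles the partial result, conditionally adds y, and then
    reduces with at most two subtractions of m, so the partial result
    always stays below m between iterations.
    """
    if m <= 0 or not (0 <= x < m and 0 <= y < m):
        raise ValueError("require m > 0 and 0 <= x, y < m")
    p = 0
    for i in reversed(range(x.bit_length())):
        p <<= 1                  # P = 2 * P        (P < 2M)
        if (x >> i) & 1:
            p += y               # P = P + x_i * Y  (P < 3M)
        if p >= m:
            p -= m
        if p >= m:
            p -= m               # now P < M again
    return p


if __name__ == "__main__":
    modulus = (1 << 1024) - 105  # arbitrary 1024-bit modulus for the demo
    a, b = modulus - 1, modulus - 2
    print(interleaved_modmul(a, b, modulus) == (a * b) % modulus)
```

The two conditional subtractions per iteration are the comparison steps that are expensive on very wide operands, which is why hardware designs look for ways to avoid full-width comparisons.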
Abstract: Nowadays, HPC, Grid and Cloud systems are evolving
very rapidly. However, the development of infrastructure solutions
related to HPC is lagging behind. While the existing infrastructure is
sufficient for simple cases, many computational problems have more
complex requirements. Such computational experiments use different
resources simultaneously to start a large number of computational
jobs. These resources are heterogeneous. They have different
purposes, architectures, performance and installed software. Users need a
convenient tool that allows them to describe and run complex
computational experiments under the conditions of an HPC environment.
This paper introduces a modular workflow system called SEGL
which makes it possible to run complex computational experiments
under conditions of a real HPC organization. The system can be used
in a great number of organizations that provide HPC power.
Key requirements for this system are high efficiency and
interoperability with the existing HPC infrastructure of the
organization without any changes.
Abstract: Economic factors are leading to the rise of
infrastructures that provide software and computing facilities as a
service, known as cloud services or cloud computing. Cloud services
can provide efficiencies for application providers, both by limiting
up-front capital expenses, and by reducing the cost of ownership over
time. Such services are made available in a data center, using shared
commodity hardware for computation and storage. There is a varied
set of cloud services available today, including application services
(salesforce.com), storage services (Amazon S3), compute services
(Google App Engine, Amazon EC2) and data services (Amazon
SimpleDB, Microsoft SQL Server Data Services, Google's Data
store). These services represent a variety of reformations of data
management architectures, and more are on the horizon.
Abstract: The optimal bisection width of the r-dimensional
$N \times \cdots \times N$ grid is known to be $N^{r-1}$ when N is even, but when
N is odd, only approximate values are available. This paper
shows that the exact bisection width of the grid is
$\frac{N^r - 1}{N - 1}$ when N is odd.
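The two closed forms can be checked on very small grids by exhaustive search. The sketch below assumes the usual definition of a bisection (the two sides differ in size by at most one vertex) and a grid without wraparound edges; the function names are illustrative. The search is exponential, so it is only practical for a handful of tiny cases.

```python
from itertools import combinations, product


def grid_edges(n, r):
    """Edges of the r-dimensional n x ... x n grid (no wraparound)."""
    edges = []
    for v in product(range(n), repeat=r):
        for axis in range(r):
            if v[axis] + 1 < n:
                u = v[:axis] + (v[axis] + 1,) + v[axis + 1:]
                edges.append((v, u))
    return edges


def brute_force_bisection_width(n, r):
    """Minimum number of cut edges over all splits into floor/ceil halves."""
    vertices = list(product(range(n), repeat=r))
    edges = grid_edges(n, r)
    half = len(vertices) // 2
    best = len(edges)
    for subset in combinations(vertices, half):
        side = set(subset)
        cut = sum((u in side) != (v in side) for u, v in edges)
        best = min(best, cut)
    return best


def closed_form_bisection_width(n, r):
    """N^(r-1) for even N, (N^r - 1) / (N - 1) for odd N (N >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2 == 0:
        return n ** (r - 1)
    return (n ** r - 1) // (n - 1)


if __name__ == "__main__":
    for n, r in [(2, 2), (3, 1), (3, 2), (4, 2)]:
        print(n, r, brute_force_bisection_width(n, r), closed_form_bisection_width(n, r))
```

For odd N the formula equals $N^{r-1} + N^{r-2} + \cdots + 1$, so the 3 x 3 grid has bisection width 3 + 1 = 4 rather than 3.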
Abstract: Modern applications realized on FPGAs exhibit high connectivity demands. In this paper we study the routing constraints of Virtex devices and propose a systematic methodology for designing a novel general-purpose interconnection network targeting reconfigurable architectures. This network consists of multiple segment wires and SB patterns, appropriately selected and assigned across the device. The goal of the proposed methodology is to maximize the hardware utilization of fabricated routing resources. The derived interconnection scheme is integrated on a Virtex-style FPGA. This device is characterized both by its high performance and by its low energy requirements. For this reason, the design criterion that guided our architecture selections was the minimal Energy×Delay Product (EDP). The methodology is fully supported by three new software tools, which belong to the MEANDER Design Framework. Using a typical set of MCNC benchmarks, an extensive comparison study in terms of several critical parameters demonstrates the effectiveness of the derived interconnection network. More specifically, we achieve an average Energy×Delay Product reduction of 63%, a performance increase of 26%, a reduction in leakage power of 21%, and a reduction in total energy consumption of 11%, at the expense of a 20% increase in channel width.
Abstract: The memristor is also known as the fourth fundamental
passive circuit element. When current flows in one direction through
the device, the electrical resistance increases and when current flows
in the opposite direction, the resistance decreases. When the current
is stopped, the component retains the last resistance that it had, and
when the flow of charge starts again, the resistance of the circuit will
be what it was when it was last active. It behaves as a nonlinear
resistor with memory. Recently memristors have generated wide
research interest and have found many applications. In this paper we
survey the various applications of memristors, which include non-volatile
memory, nanoelectronic memories, computer logic,
neuromorphic computer architectures, low-power remote sensing
applications, crossbar latches as transistor replacements, analog
computations and switches.
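The qualitative behaviour described above (resistance rising with current in one direction, falling in the other, and being retained when no current flows) can be mimicked with a toy model. The class below is an illustrative assumption, not a physical device model: it changes resistance linearly with the charge that has passed and clamps it between two bounds, and all parameter values are invented.

```python
class ToyMemristor:
    """Toy model: resistance shifts in proportion to the charge passed through it."""

    def __init__(self, r_min=100.0, r_max=16000.0, r_init=8000.0, ohms_per_coulomb=1.0e6):
        self.r_min = r_min
        self.r_max = r_max
        self.resistance = r_init
        self.ohms_per_coulomb = ohms_per_coulomb  # hypothetical sensitivity

    def apply_current(self, current_amps, duration_s):
        # Charge = current * time. Positive charge raises the resistance,
        # negative charge lowers it, and zero current leaves it unchanged.
        charge = current_amps * duration_s
        updated = self.resistance + self.ohms_per_coulomb * charge
        self.resistance = min(self.r_max, max(self.r_min, updated))
        return self.resistance


if __name__ == "__main__":
    device = ToyMemristor()
    print(device.apply_current(1e-3, 1.0))   # forward current: resistance rises
    print(device.apply_current(0.0, 5.0))    # no current: last value is retained
    print(device.apply_current(-1e-3, 2.0))  # reverse current: resistance falls
```

The retained state between calls is what makes such a device usable as a non-volatile memory cell: the resistance itself encodes the stored value.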
Abstract: Various mechanisms providing mutual exclusion and
thread synchronization can be used to support parallel processing
within a single computer. Instead of using locks, semaphores, barriers
or other traditional approaches, in this paper we focus on alternative
ways for making better use of modern multithreaded architectures
and preparing hash tables for concurrent accesses. Hash structures
will be used to demonstrate and compare two entirely different
approaches (rule based cooperation and hardware synchronization
support) to an efficient parallel implementation using traditional
locks. The comparison includes implementation details, performance
ranking and scalability issues. We aim at understanding the effects
the parallelization schemes have on the execution environment with
special focus on the memory system and memory access
characteristics.
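As a point of reference for the lock-based baseline mentioned above, the sketch below shows a chained hash table protected by one lock per bucket. It is an illustrative assumption about what such a baseline could look like, not the implementation evaluated in the paper, and it does not show the rule-based or hardware-supported alternatives.

```python
import threading


class LockedHashTable:
    """Chained hash table with one traditional lock per bucket."""

    def __init__(self, num_buckets=16):
        self._buckets = [[] for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def increment(self, key, amount=1):
        i = self._index(key)
        with self._locks[i]:  # only threads hitting the same bucket contend
            for entry in self._buckets[i]:
                if entry[0] == key:
                    entry[1] += amount
                    return
            self._buckets[i].append([key, amount])

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            for entry in self._buckets[i]:
                if entry[0] == key:
                    return entry[1]
            return default


def run_demo(num_threads=4, increments=1000):
    """Several threads update a small set of shared keys concurrently."""
    table = LockedHashTable()

    def worker():
        for n in range(increments):
            table.increment(n % 8)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table


if __name__ == "__main__":
    result = run_demo()
    print([result.get(k) for k in range(8)])
```

Per-bucket locking keeps updates correct, but every access still pays for acquiring a lock, which is the overhead that lock-free alternatives aim to remove.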
Abstract: In this paper, 3X3 routing nodes are proposed to
provide speedup and parallel processing capability in Data Vortex
network architectures. The new design not only significantly
improves network throughput and latency, but also eliminates the
need for the distributive traffic control mechanism originally embedded
among nodes and the need for nodal buffering. The cost effectiveness
is studied by a comparison with the previously proposed 2-input
buffered networks, and considerable performance enhancement
can be achieved with similar or lower hardware cost. Unlike
the previous implementation, the network leaves a small probability of
contention; therefore, the packet drop rate must be kept low for such
an implementation to be feasible and attractive, which can be achieved
with a proper choice of operating conditions.
Abstract: Components of a software system may be related in a
wide variety of ways. These relationships need to be represented in
software architecture in order to develop quality software. In practice, software architecture is immensely challenging, strikingly
multifaceted, extravagantly domain based, perpetually changing,
rarely cost-effective, and deceptively ambiguous. This paper analyses
relations among the major components of software systems and
argues for using several broad categories for software architecture for
assessment purposes: strongly adequate, weakly adequate and
functionally adequate software architectures among other categories.
These categories are intended for formative assessments of
architectural designs.
Abstract: A new generation of manufacturing machines,
so-called MIMCA (modular and integrated machine control
architecture), capable of handling much increased complexity in
manufacturing control systems, is presented. The requirement for more
flexible and effective control systems for manufacturing machine
systems is investigated and dimensioned, which highlights a need for
improved means of coordinating and monitoring production
machinery and equipment used to transport material. MIMCA,
which supports simulation based on machine modeling, was conceived by
the authors to address these issues. Essentially, MIMCA comprises an
organized unification of selected architectural frameworks and
modeling methods, which include NIST-RCS, UMC and Colored
Timed Petri nets (CTPN). The unification has been achieved to
support the design and construction of hierarchical and distributed
machine control that realizes the concurrent operation of reusable
and distributed machine control components, to provide the ability to handle
growing complexity, and to support the requirements of real-time control
systems. Thus MIMCA enables mapping between 'what a machine
should do' and 'how the machine does it' in a well-defined but
flexible way designed to facilitate reconfiguration of machine
systems.
Abstract: Providing Services at Home has become over the last
few years a very dynamic and promising technological domain. It is
likely to enable wide dissemination of secure and automated living
environments. We propose a methodology for identifying threats to
Services at Home Delivery systems, as well as a threat analysis
of a multi-provider Home Gateway architecture. This methodology
is based on a dichotomous positive/preventive study of the target
system: it aims at identifying both what the system must do, and
what it must not do. This approach completes existing methods with
a synthetic view of potential security flaws, thus enabling suitable
measures to be taken into account. Security implications of the
evolution of a given system become easier to deal with. A prototype
is built based on the conclusions of this analysis.
Abstract: The integrity and electrostatic performance issues associated with scaling bulk Si MOSFETs to sub-10 nm channel lengths promote research into new device architectures such as SOI, double-gate and GAA MOSFETs. In this paper, we present some novel characteristics of horizontal rectangular gate/gate-all-around MOSFETs with a dual-metal gate, obtained using SILVACO TCAD tools. We also present simulation results on how the variation of some parameters of our structure directly affects its threshold voltage and drain current. In addition, our TFET showed a reasonable ION/IOFF ratio of $10^4$ and a low drain-induced barrier lowering (DIBL) of 39 mV/V.
Abstract: This paper presents a means for reducing the torque
variation during the revolution of a vertical-axis water turbine
(VAWaterT) by increasing the blade number. For this purpose, two-dimensional
CFD analyses have been performed on a straight-bladed
Darrieus-type rotor. After describing the computational model and
the relative validation procedure, a complete campaign of
simulations, based on full RANS unsteady calculations, is proposed
for three-, four- and five-bladed rotor architectures, characterized by
a NACA 0025 airfoil. For each proposed rotor configuration, flow
field characteristics are investigated at several values of tip speed
ratio, allowing a quantification of the influence of blade number on
flow geometric features and dynamic quantities, such as rotor torque
and power. Finally, torque and power curves are compared for the
three analyzed architectures, achieving a quantification of the effect
of blade number on overall rotor performance.
Abstract: Evolutionary Algorithms are population-based,
stochastic search techniques, widely used as efficient global
optimizers. However, many real-life optimization problems
require finding optimal solutions to complex, high-dimensional,
multimodal problems involving computationally very expensive
fitness function evaluations. Use of evolutionary algorithms in such
problem domains is thus practically prohibitive. An attractive
alternative is to build meta models or use an approximation of the
actual fitness functions to be evaluated. These meta models are orders
of magnitude cheaper to evaluate compared to the actual function
evaluation. Many regression and interpolation tools are available to
build such meta models. This paper briefly discusses the
architectures and use of such meta-modeling tools in an evolutionary
optimization context. We further present two evolutionary algorithm
frameworks which involve use of meta models for fitness function
evaluation. The first framework, namely the Dynamic Approximate
Fitness based Hybrid EA (DAFHEA) model [14] reduces
computation time by controlled use of meta-models (in this case
an approximate model generated by Support Vector Machine
regression) to partially replace the actual function evaluation by
approximate function evaluation. However, the underlying
assumption in DAFHEA is that the training samples for the meta-model
are generated from a single uniform model. This does not take
into account uncertain scenarios involving noisy fitness functions.
The second model, DAFHEA-II, an enhanced version of the original
DAFHEA framework, incorporates a multiple-model based learning
approach for the support vector machine approximator to handle
noisy functions [15]. Empirical results obtained by evaluating the
frameworks using several benchmark functions demonstrate their
efficiency.
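The general idea of partially replacing true fitness evaluations with a cheap approximation can be sketched as follows. This is not DAFHEA or DAFHEA-II: to keep the example dependency-free, a k-nearest-neighbour average stands in for the Support Vector Machine regression, the sphere function stands in for an expensive fitness function, and all names and parameter values are assumptions.

```python
import random


def sphere(x):
    """Stand-in for an expensive fitness function (lower is better)."""
    return sum(v * v for v in x)


def knn_estimate(archive, x, k=3):
    """Cheap surrogate: mean true fitness of the k nearest evaluated points."""
    nearest = sorted(
        archive,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x)),
    )[:k]
    return sum(f for _, f in nearest) / len(nearest)


def surrogate_assisted_ea(fitness, dim=3, pop_size=10, offspring=40,
                          generations=15, seed=0):
    rng = random.Random(seed)
    # The initial population is evaluated with the true function.
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    archive = [(x, fitness(x)) for x in population]
    true_evals = len(archive)
    initial_best = min(f for _, f in archive)

    for _ in range(generations):
        parents = [x for x, _ in sorted(archive, key=lambda rec: rec[1])[:pop_size]]
        candidates = [
            [v + rng.gauss(0, 0.5) for v in rng.choice(parents)]
            for _ in range(offspring)
        ]
        # Rank all offspring with the surrogate, then spend true evaluations
        # only on the most promising ones.
        candidates.sort(key=lambda x: knn_estimate(archive, x))
        for x in candidates[:pop_size]:
            archive.append((x, fitness(x)))
            true_evals += 1

    best_x, best_f = min(archive, key=lambda rec: rec[1])
    return best_x, best_f, true_evals, initial_best


if __name__ == "__main__":
    _, best, evals, start = surrogate_assisted_ea(sphere)
    print(start, best, evals)
```

With these settings 610 candidate solutions are generated but only 160 are evaluated with the true function; the rest are screened out by the surrogate.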
Abstract: A speech corpus is one of the major components in a
Speech Processing System where one of the primary requirements
is to recognize an input sample. The quality and details captured
in the speech corpus directly affect the precision of recognition. The
current work proposes a platform for speech corpus generation using
an adaptive LMS filter and LPC cepstrum, as a part of an ANN
based Speech Recognition System which is exclusively designed to
recognize isolated numerals of the Assamese language, a major language
in the North Eastern part of India. The work focuses on designing an
optimal feature extraction block and a few ANN based cooperative
architectures so that the performance of the Speech Recognition
System can be improved.
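The adaptive LMS filter mentioned above is built around a simple weight update, which the sketch below illustrates on a toy system identification task. The signal, the two-tap "unknown" system and the step size are invented for the example; this is not the speech pipeline of the paper.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Least mean squares adaptation.

    x: input samples, d: desired samples, mu: step size.
    Returns the final weights and the error at every step.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Newest sample first; samples before the start are treated as zero.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))           # filter output
        e = d[n] - y                                            # estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]     # LMS update
        errors.append(e)
    return w, errors


# Toy setup: recover the taps of a hypothetical 2-tap FIR system.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
unknown_taps = [0.5, -0.3]
d = [
    unknown_taps[0] * x[n] + (unknown_taps[1] * x[n - 1] if n >= 1 else 0.0)
    for n in range(len(x))
]
weights, errors = lms_filter(x, d, num_taps=2, mu=0.1)

if __name__ == "__main__":
    print(weights)
```

In a noise-reduction setting the same update is used, with the desired signal and the reference input chosen so that the error output is the cleaned speech; the step size trades convergence speed against stability.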