Abstract: Arithmetic operations over GF(2^m) are extensively used in error-correcting codes and public-key cryptography schemes. Finite field arithmetic includes addition, multiplication, division, and inversion. Addition can be implemented with an extremely simple circuit; the other operations are much more complex. Multiplication is the most important operation for cryptosystems such as the elliptic curve cryptosystem, since exponentiation, division, and multiplicative inversion can all be performed by iterated multiplication. In this paper, we present a parallel algorithm for Montgomery multiplication over finite fields using a redundant basis. Based on this multiplication algorithm, we also present an efficient semi-systolic multiplier over finite fields. The multiplier has lower space and time complexity than related multipliers: compared to the corresponding existing structures, it saves at least 5% in area, 50% in time, and 53% in area-time (AT) complexity. Accordingly, it is well suited to VLSI implementation and can readily serve as a basic component for computing complex finite field operations such as inversion and division.
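The abstract does not reproduce the algorithm itself. As background, the following is a minimal bit-serial sketch of Montgomery multiplication over GF(2^m) in an ordinary polynomial basis (not the redundant-basis formulation of the paper); polynomials are encoded as integer bit masks, and all function names are illustrative.

```python
def gf2_mul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_mod(a, f):
    """Reduce polynomial a modulo the modulus polynomial f."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

def mont_mul(a, b, f, m):
    """Montgomery product a * b * x^(-m) mod f, with Montgomery factor R = x^m."""
    t = 0
    for i in range(m):
        if (a >> i) & 1:   # add b when the i-th coefficient of a is 1
            t ^= b
        if t & 1:          # make t divisible by x (f has constant term 1)
            t ^= f
        t >>= 1            # exact division by x
    return t
```

Multiplying the Montgomery product by x^m and reducing modulo f recovers the ordinary product a*b mod f, which is how the sketch can be checked.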
Abstract: The full-search block-matching algorithm is widely used for hardware implementation of motion estimators in video compression. In this paper we propose a new architecture consisting of a 2D parallel processing unit and a 1D unit, both working in parallel. The proposed architecture reduces both data-access power and computational power, the main causes of power consumption in integer motion estimation, and completes the operations in nearly the same number of clock cycles as a 2D systolic array architecture. In this work the sum of absolute differences (SAD), the most repeated operation in block matching, is calculated in two steps. The first step calculates the SAD over alternate rows using the 2D parallel unit. If the SAD calculated by the parallel unit is less than the stored minimum SAD, the SAD of the remaining rows is calculated by the 1D unit. Early termination, which stops avoidable computations, is achieved with the alternate-rows method proposed in this paper and by finding a low initial SAD value based on motion vector prediction. Data reuse is applied to reference blocks in the same search area, which significantly reduces memory accesses.
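The two-step SAD computation described above can be sketched in software as follows; this is a behavioral model of the dataflow, not the hardware design, and the function names and the choice of even rows for the first step are illustrative assumptions.

```python
def sad_rows(cur, ref, rows):
    """Sum of absolute differences over the given row indices of two blocks."""
    return sum(abs(cur[r][c] - ref[r][c])
               for r in rows
               for c in range(len(cur[0])))

def sad_early_terminate(cur, ref, min_sad):
    """Two-step SAD with early termination.

    Step 1 (models the 2D parallel unit): SAD over alternate rows.
    Step 2 (models the 1D unit): SAD over the remaining rows, computed
    only if the partial SAD is still below the stored minimum.
    Returns the full SAD if it beats min_sad, else None (terminated).
    """
    n = len(cur)
    partial = sad_rows(cur, ref, range(0, n, 2))   # alternate (even) rows
    if partial >= min_sad:
        return None                                # early termination
    total = partial + sad_rows(cur, ref, range(1, n, 2))
    return total if total < min_sad else None
```

Because SAD is a sum of non-negative terms, the alternate-row partial sum is a lower bound on the full SAD, so discarding a candidate when the partial sum already exceeds the current minimum never changes the result.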
Abstract: This paper investigates possible optimizations of linear algebra problems that can be solved by parallel processing using special arrays called systolic arrays. We use several special types of transformations for designing these arrays and describe their characteristics. The main focus is on the advantages of systolic arrays in the parallel computation of matrix products, with particular attention to the design of a systolic array for matrix multiplication. Multiplication of large matrices requires considerable computational time, and its complexity is O(n^3). Many algorithms, both sequential and parallel, have been developed to minimize the computation time, and systolic arrays are well suited for this purpose. In this paper we show that using an appropriate transformation leads to more nearly optimal arrays for calculations of this type.
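The abstract does not give the array design itself. As an illustration of the general idea, the following is a minimal software simulation of a classic n-by-n systolic array for matrix multiplication, in which entries of A flow left to right, entries of B flow top to bottom with skewed injection, and PE(i, j) accumulates C[i][j]; the naming and the specific dataflow are illustrative, not the paper's design.

```python
def systolic_matmul(A, B):
    """Simulate an n x n systolic array computing C = A @ B.

    Row i of A enters the left edge delayed by i steps; column j of B
    enters the top edge delayed by j steps. Each PE multiplies its two
    inputs, accumulates, and forwards them to its right/lower neighbor.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]   # A values held between steps
    b_reg = [[0] * n for _ in range(n)]   # B values held between steps
    for t in range(3 * n - 2):            # total pipeline latency
        # update PEs in reverse order so each value moves one PE per step
        for i in range(n - 1, -1, -1):
            for j in range(n - 1, -1, -1):
                a_in = (a_reg[i][j - 1] if j > 0
                        else (A[i][t - i] if 0 <= t - i < n else 0))
                b_in = (b_reg[i - 1][j] if i > 0
                        else (B[t - j][j] if 0 <= t - j < n else 0))
                C[i][j] += a_in * b_in
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return C
```

The simulation makes the cost structure visible: the array uses n^2 multiply-accumulate cells but finishes in O(n) time steps (3n - 2 here), which is the advantage over the O(n^3) sequential triple loop.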