Parallel Vector Processing Using Multi Level Orbital DATA

Many applications use vector operations, applying a single
instruction to multiple data elements that map to different locations
in conventional memory. Transferring data from memory is limited
by access latency and bandwidth, which reduces the performance gain
of vector processing. We present a memory system that makes all of
its contents available to the processors in time, so that processors
need not access the memory: each location is forced to become
available to all processors at a specific time. The data move in
different orbits and become available to processors in higher orbits
at different times. We use this memory to apply parallel vector
operations to data streams at the first orbit level. Data processed in
the first level move to the upper orbit one data element at a time,
allowing a processor in that orbit to apply another vector operation.
This addresses the serial code limitations inherent in all parallel
applications and interleaves the serial work with the lower-level
vector operations.
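
The two-level scheme described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the round-robin schedule, the function names `orbit_schedule` and `run_two_level`, and the choice of one promoted element per first-level processor per cycle are all hypothetical and introduced here for illustration. Each first-level orbit is modeled as a list whose location `t` becomes visible at cycle `t`; the first-level processors apply the same vector operation in that cycle, and each result moves to the upper orbit, where a single processor applies a serial operation.

```python
from typing import Callable, List, Tuple


def orbit_schedule(n_locations: int, n_cycles: int) -> List[int]:
    """Return the location index that is visible at each cycle.

    Assumption: locations rotate round-robin, one location per cycle.
    """
    return [t % n_locations for t in range(n_cycles)]


def run_two_level(
    lower_orbits: List[List[int]],
    vector_op: Callable[[int], int],
    serial_op: Callable[[int, int], int],
    init: int,
) -> Tuple[List[int], int]:
    """Toy model of two orbit levels (hypothetical, for illustration only).

    lower_orbits: one equal-length data stream per first-level processor.
    vector_op:    operation every first-level processor applies in a cycle.
    serial_op:    operation the upper-orbit processor applies per element.
    """
    n_locations = len(lower_orbits[0])
    upper_orbit: List[int] = []  # elements promoted to the upper orbit
    acc = init
    for t in orbit_schedule(n_locations, n_locations):
        # Location t is visible now: all first-level processors work in
        # the same cycle, each on its own orbit.
        produced = [vector_op(orbit[t]) for orbit in lower_orbits]
        for value in produced:
            # Results move up one element at a time; the upper-orbit
            # processor consumes them while the lower level continues.
            upper_orbit.append(value)
            acc = serial_op(acc, value)
    return upper_orbit, acc


if __name__ == "__main__":
    upper, total = run_two_level(
        [[1, 2, 3], [4, 5, 6]],  # two first-level orbits
        lambda x: x * 2,         # vector operation: scale by 2
        lambda a, b: a + b,      # serial operation: running sum
        0,
    )
    print(upper, total)
```

In this model the serial reduction in the upper orbit is interleaved with the first-level vector work in each cycle, rather than waiting for the whole vector to finish.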
