Abstract: Large-scale computing infrastructures have been widely
developed with the core objective of providing a suitable platform
for high-performance and high-throughput computing. These systems
are designed to support complex, resource-intensive applications,
which are found in many scientific and industrial areas. Currently,
large-scale data-intensive applications are hindered by the high
latencies that result from accessing widely distributed data.
Recent work suggests that improving data locality is key to moving
efficiently towards exascale infrastructures, since solutions to this
problem aim to reduce both the bandwidth consumed by data transfers
and the overheads that arise from them. Several techniques attempt
to move computation closer to the data. In this survey we analyse the
different mechanisms that have been proposed to provide data locality
for large-scale high-performance and high-throughput systems. This
survey is intended to help the scientific computing community
understand the technical aspects and strategies regarding data
locality that have been reported in recent literature. To this end, we
present an overview of locality-oriented techniques, grouped into
four main categories: application development, task scheduling,
in-memory computing and storage platforms. Finally, we discuss
future research lines and synergies among these techniques.
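The task-scheduling category can be illustrated with a minimal sketch of a greedy locality-aware scheduler. It assumes a simplified model that the abstract does not specify: each task reads exactly one input block, every block is replicated on a known set of nodes, and each node offers a fixed number of task slots. The function `schedule_locality_aware` and its parameters are hypothetical and are not taken from any surveyed system; the sketch simply prefers a node that already holds the task's data and falls back to the least-loaded node (a remote read) only when no local slot is free.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def schedule_locality_aware(
    tasks: List[Tuple[str, str]],          # hypothetical: (task_id, input_block_id)
    block_locations: Dict[str, Set[str]],  # hypothetical: block_id -> nodes holding a replica
    nodes: List[str],
    slots_per_node: int,
) -> Tuple[Dict[str, str], int]:
    """Greedily assign tasks to nodes, preferring nodes that store the input block.

    Returns the task -> node assignment and the number of data-local assignments.
    """
    load = defaultdict(int)  # tasks already assigned to each node
    assignment: Dict[str, str] = {}
    local_count = 0
    for task_id, block_id in tasks:
        holders = block_locations.get(block_id, set())
        # Nodes that store a replica of the input block and still have a free slot.
        local = [n for n in nodes if n in holders and load[n] < slots_per_node]
        if local:
            node = min(local, key=lambda n: load[n])
            local_count += 1
        else:
            # No local slot available: fall back to the least-loaded node (remote read).
            free = [n for n in nodes if load[n] < slots_per_node]
            if not free:
                raise RuntimeError("no free slots left for task " + task_id)
            node = min(free, key=lambda n: load[n])
        assignment[task_id] = node
        load[node] += 1
    return assignment, local_count
```

A task whose block has no replica with a free slot, such as one reading an unreplicated block, ends up on the least-loaded node and would pay for a data transfer; the more such fallbacks occur, the more the scheduler loses the benefit that locality is meant to provide.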
Abstract: The demand for higher-performance graphics
continues to grow because of the incessant desire for realism.
At the same time, rapid advances in fabrication technology have
made it possible to build several processor cores on a single die.
Hence, it is important to develop single-chip parallel architectures
for such data-intensive applications. In this paper, we propose an
efficient processing-in-memory (PIM) architecture tailored for
computer graphics, which requires a large number of memory
accesses. We then address two tasks that are essential for fully
exploiting the parallelism provided by the architecture: partitioning
and placement of graphics data, which affect load balance and
communication cost, respectively. Under the constraint of uniform
partitioning, we develop approaches for optimal partitioning and
placement that significantly reduce the search space.
We also present heuristics for identifying near-optimal placements,
since the search space for placement remains impractically large
despite this reduction. We demonstrate the effectiveness of our
partitioning and placement approaches through analysis of example
scenes. Simulation results show considerable search-space reductions,
and our placement heuristics perform close to optimal: the average
ratio of communication overhead between our heuristics and the
optimal placement was 1.05. Our uniform partitioning achieved an
average load-balance ratio of 1.47 for geometry processing and 1.44
for rasterization, which is reasonable.
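The load-balance figures above can be made concrete with a minimal sketch, under assumptions the abstract does not state: the screen is split into equal-size tiles, tiles are dealt out to PIM processors in row-major round-robin order, each tile carries a known amount of work (for example, fragments to rasterize), and the load-balance ratio is taken to be the maximum per-processor load divided by the mean load, so that 1.0 means perfect balance. The functions `uniform_partition_load` and `load_balance_ratio` are hypothetical illustrations, not the paper's method.

```python
from typing import List


def uniform_partition_load(tile_work: List[List[int]], num_procs: int) -> List[int]:
    """Total work per processor when equal-size tiles are dealt out round-robin.

    Hypothetical assignment order: tiles are visited row by row and tile i
    goes to processor i % num_procs.
    """
    loads = [0] * num_procs
    index = 0
    for row in tile_work:  # row-major traversal of the tile grid
        for work in row:
            loads[index % num_procs] += work
            index += 1
    return loads


def load_balance_ratio(loads: List[int]) -> float:
    """Assumed definition: maximum load divided by mean load (1.0 = perfect balance)."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 1.0  # no work at all is treated as balanced
    return max(loads) / mean


loads = uniform_partition_load([[4, 0, 2, 2], [1, 3, 0, 4]], num_procs=4)
print(loads, load_balance_ratio(loads))
```

For this tile grid on four processors the per-processor loads are `[5, 3, 2, 6]`, giving a ratio of 6 / 4 = 1.5 under the assumed definition. With this simple interleaving, a grid whose width is a multiple of the processor count maps each tile column to the same processor, so scenes with detail concentrated in a few columns would balance poorly.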