A Parallel Approach for 3D-Variational Data Assimilation on GPUs in Ocean Circulation Models
This work is the first step in a broad research activity, carried
out in collaboration with the Euro-Mediterranean Center on Climate
Change, aimed at introducing scalable approaches in Ocean
Circulation Models. We discuss the design and implementation of
a parallel algorithm for solving the Variational Data Assimilation
(DA) problem on Graphics Processing Units (GPUs). The algorithm
is based on the fully scalable 3DVar DA model, previously proposed
by the authors, which uses a Domain Decomposition approach
(we refer to this model as the DD-DA model). We proceed with
an incremental porting process consisting of three distinct stages:
requirements and source code analysis, incremental development of
CUDA kernels, and testing and optimization. Experiments confirm the
theoretical performance analysis based on the so-called scale-up factor,
demonstrating that the DD-DA model can be suitably mapped onto
GPU architectures.
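
For readers unfamiliar with 3DVar, the problem mentioned above is usually posed as the minimization of a cost functional. The statement below is the standard textbook form, not a formula taken from this abstract. The symbols (background state x_b, observations y, observation operator H, covariance matrices B and R) are the conventional ones and are assumptions here. The subdomain version is only a schematic illustration of the Domain Decomposition idea: it omits any coupling or overlap terms that the DD-DA model may include.

```latex
% Standard 3DVar cost functional (textbook form; notation assumed, not from the abstract)
\[
  J(x) \;=\; \tfrac{1}{2}\,(x - x_b)^{T} B^{-1} (x - x_b)
       \;+\; \tfrac{1}{2}\,\bigl(y - H(x)\bigr)^{T} R^{-1} \bigl(y - H(x)\bigr),
  \qquad
  x^{a} \;=\; \arg\min_{x} J(x).
\]

% Schematic restriction to a subdomain \Omega_i of a decomposition
% \Omega = \bigcup_{i=1}^{p} \Omega_i (illustrative only; coupling terms omitted)
\[
  J_i(x_i) \;=\; \tfrac{1}{2}\,(x_i - x_{b,i})^{T} B_i^{-1} (x_i - x_{b,i})
       \;+\; \tfrac{1}{2}\,\bigl(y_i - H_i(x_i)\bigr)^{T} R_i^{-1} \bigl(y_i - H_i(x_i)\bigr),
  \qquad i = 1, \dots, p.
\]
```

In this reading, each local functional J_i can in principle be minimized independently of the others, which is the property that makes a decomposition of this kind a natural candidate for mapping onto parallel hardware such as GPUs.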
@article{"International Journal of Information, Control and Computer Sciences:69934",
  author = "Rossella Arcucci and Luisa D’Amore and Simone Celestino and Giuseppe Scotti and Giuliano Laccetti",
  title = "A Parallel Approach for 3D-Variational Data Assimilation on GPUs in Ocean Circulation Models",
  abstract = "This work is the first step in a broad research activity, carried out in collaboration with the Euro-Mediterranean Center on Climate Change, aimed at introducing scalable approaches in Ocean Circulation Models. We discuss the design and implementation of a parallel algorithm for solving the Variational Data Assimilation (DA) problem on Graphics Processing Units (GPUs). The algorithm is based on the fully scalable 3DVar DA model, previously proposed by the authors, which uses a Domain Decomposition approach (we refer to this model as the DD-DA model). We proceed with an incremental porting process consisting of three distinct stages: requirements and source code analysis, incremental development of CUDA kernels, and testing and optimization. Experiments confirm the theoretical performance analysis based on the so-called scale-up factor, demonstrating that the DD-DA model can be suitably mapped onto GPU architectures.",
  keywords = "Data Assimilation, Parallel Algorithm, GPU architectures, Ocean Models.",
  volume = "9",
  number = "5",
  pages = "1220-7",
}