Abstract: Biclustering is a method of two-dimensional data analysis. In recent years it has become possible to express this problem in terms of Boolean reasoning, for processing continuous, discrete, and binary data. The mathematical background of the approach — the proven ability to induce exact and inclusion-maximal biclusters fulfilling assumed criteria — is a strong advantage of the method. Unfortunately, the core of the method has quite high computational complexity. In this paper the basics of the Boolean reasoning approach to biclustering are presented, and in this context the problems of parallelizing the computation are raised.
Abstract: Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making capabilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and predicted variables; past occurrences are exploited to predict unknown outcomes. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in the case of large amounts of data. In fact, because of its volume, its nature (semi-structured or unstructured), and its variety, big data cannot be analyzed efficiently via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of computation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it for big data analysis. The major changes to this algorithm are presented, and then a version of the extended algorithm is defined in order to make it applicable to huge quantities of data.
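The abstract does not give the extended algorithm's details; as a rough illustration of why CART's split search parallelizes naturally, here is a minimal Python sketch that distributes the per-feature best-split search across worker processes. The function names, the Gini criterion, and the process pool are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: parallelizing CART's split search across features.
# Illustrative only, not the extended algorithm proposed in the paper.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_for_feature(args):
    """Scan one feature's candidate thresholds; return (impurity, feature, threshold)."""
    x, y, feature = args
    best = (np.inf, feature, None)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, feature, t)
    return best

def parallel_best_split(X, y):
    """Evaluate all features concurrently and keep the globally best split."""
    jobs = [(X[:, j], y, j) for j in range(X.shape[1])]
    with ProcessPoolExecutor() as pool:
        return min(pool.map(best_split_for_feature, jobs))
```

Because each feature's scan is independent, the same structure carries over to distributed settings where features (or data partitions) live on different nodes.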
Abstract: Urban flooding resulting from a sudden release of water due to dam-break or excessive rainfall is a serious environmental hazard that causes loss of human life and large economic losses. Anticipating floods before they occur could minimize human and economic losses through the implementation of appropriate protection, provision, and rescue plans. This work reports on the numerical modelling of flash flood propagation in urban areas after an excessive rainfall event or dam-break. A two-dimensional (2D) depth-averaged shallow water model is used with a refined unstructured grid of triangles for representing the urban area topography. The 2D shallow water equations are solved using a second-order well-balanced discontinuous Galerkin scheme. A theoretical test case and three flood events are described to demonstrate the potential benefits of the scheme: (i) wetting and drying in a parabolic basin; (ii) flash flood over a physical model of the urbanized Toce River valley in Italy; (iii) wave propagation along the Reyran river valley (France) following the Malpasset dam-break of 1959; and (iv) the dam-break flood of October 1982 at the town of Sumacarcel (Spain). The capability of the scheme is also verified against alternative models. Computational results compare well with recorded data and show that the scheme is at least as efficient as comparable second-order finite volume schemes, with notable efficiency speedup due to parallelization.
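For reference, the 2D depth-averaged shallow water equations that such schemes discretize can be written in the following standard conservative form, with h the water depth, (u, v) the depth-averaged velocities, g gravity, z_b the bed elevation, and S_f the friction terms; the exact source-term treatment of the well-balanced DG scheme is not specified in the abstract.

```latex
\frac{\partial}{\partial t}
\begin{pmatrix} h \\ hu \\ hv \end{pmatrix}
+ \frac{\partial}{\partial x}
\begin{pmatrix} hu \\ hu^2 + \tfrac{1}{2}gh^2 \\ huv \end{pmatrix}
+ \frac{\partial}{\partial y}
\begin{pmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2}gh^2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ -gh\,\partial_x z_b - S_{f,x} \\ -gh\,\partial_y z_b - S_{f,y} \end{pmatrix}
```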
Abstract: Even though past, current, and future trends suggest that multicore and cloud computing systems are increasingly ubiquitous, this class of parallel systems remains underutilized in general, and barely used for research on parallel Delaunay triangulation for parallel surface modeling and generation in particular. The performance of physical and cloud-hosted multicore machines at executing algorithms that implement various parallelization strategies for the incremental insertion Delaunay triangulation technique was evaluated. T-tests were run on the collected data to determine whether differences in various performance metrics (including execution time, speedup, and efficiency) were statistically significant. Results show that the physical machine is approximately twice as fast as the virtual machine at executing the same programs across the parallelization strategies. The results, which also furnish the scalability behavior of the parallelization strategies, show that some of the performance differences between these systems across different runs of the algorithms were statistically significant. A few pseudo-superlinear speedup values computed from the raw data are not true superlinear speedups: they arise from one particular way of computing speedups, and they disappear in favor of asymmetric speedups, which are the kind of speedups that actually occur in the experiments performed.
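A minimal sketch of the kind of comparison described here, computing speedup and efficiency from run times and applying a two-sample t-test. The timing arrays are made-up placeholders, and scipy's ttest_ind is one standard choice, not necessarily the authors' exact test.

```python
# Sketch: speedup/efficiency metrics and a significance test on run times.
# The timing numbers below are placeholders, not data from the paper.
import numpy as np
from scipy.stats import ttest_ind

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    return speedup(t_serial, t_parallel) / n_cores

# Repeated run times (seconds) on a physical and a virtual machine.
physical = np.array([12.1, 11.8, 12.4, 12.0, 11.9])
virtual  = np.array([24.3, 25.1, 23.8, 24.7, 24.5])

t_stat, p_value = ttest_ind(physical, virtual, equal_var=False)
print(f"speedup (4 cores, 40 s serial): {speedup(40.0, physical.mean()):.2f}")
print(f"efficiency: {efficiency(40.0, physical.mean(), 4):.2f}")
print(f"Welch t-test: t={t_stat:.2f}, p={p_value:.4f}")
```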
Abstract: Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around, as well as a cloud-based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing, with R Services on premises or in the cloud, users can apply R at scale without having to move their data around.
Abstract: About 1% of the world's population suffers from the hidden disability known as epilepsy, and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched using artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE), which implements the functionality of a complete neuron with variable accuracy, is presented. The proposed design has been tested taking into consideration the non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier, which uses only 16 logic elements on an Altera Cyclone IV FPGA, a bit-serial ALU, and a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in portable equipment suited to the needs of a single epileptic patient in his or her daily activities, to predict the occurrence of impending tonic-clonic seizures.
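The abstract does not detail the multiplier's internals; as a rough software illustration of the bit-serial idea (one bit of the multiplier consumed per cycle, with a shift-and-add accumulator), here is a hedged Python sketch.

```python
# Sketch: shift-and-add bit-serial multiplication, processing one bit
# of the multiplier per "cycle". Illustrative only; the paper's NPE is
# a hardware design whose internals are not specified in the abstract.
def bit_serial_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers one bit at a time."""
    accumulator = 0
    for cycle in range(width):
        if (b >> cycle) & 1:           # current serial bit of the multiplier
            accumulator += a << cycle  # add the shifted multiplicand
    return accumulator

assert bit_serial_multiply(13, 11) == 143
```

In hardware, the per-cycle work reduces to one AND gate, a shifter, and an adder, which is why such a multiplier fits in very few logic elements.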
Abstract: This paper describes the Message Passing Interface (MPI) implementation of the ADETRAN language and its evaluation on SX-ACE supercomputers. The ADETRAN language includes the pdo statement, which specifies data distribution and parallel computations, and the pass statement, which specifies the redistribution of arrays. Two methods for implementing the pass statement are discussed, and a performance evaluation using the Splitting-Up CG method is presented. The effectiveness of the parallelization is evaluated, and the advantage of one-dimensional distribution is empirically confirmed by the results of the experiments.
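The semantics of the pass statement are not given here; as a generic illustration of redistributing an array between two 1D block distributions, the following mpi4py sketch converts a row-block-distributed matrix into a column-block distribution with Alltoall. The matrix shape and the use of mpi4py are assumptions for illustration, not the paper's implementation.

```python
# Sketch: redistribute a matrix from row-block to column-block layout.
# Run with, e.g., `mpirun -n 4 python redist.py`. Illustrative only.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 8 * size                      # global matrix is n x n, divisible by size
rows = n // size                  # rows owned per rank (row-block distribution)
local = np.arange(rank * rows * n, (rank + 1) * rows * n,
                  dtype=np.float64).reshape(rows, n)

cols = n // size                  # columns per rank after redistribution
# Pack: chunk j contains the columns destined for rank j, contiguously.
sendbuf = np.concatenate([local[:, j * cols:(j + 1) * cols].copy().ravel()
                          for j in range(size)])
recvbuf = np.empty_like(sendbuf)
comm.Alltoall(sendbuf, recvbuf)

# Unpack: chunk i holds rank i's rows of our column block.
mine = np.vstack([recvbuf[i * rows * cols:(i + 1) * rows * cols].reshape(rows, cols)
                  for i in range(size)])   # shape (n, cols): column-block layout
```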
Abstract: Whether data has been well parallelized is an important factor in Solid-State Drive (SSD) performance. SSD parallelization is affected by the allocation scheme, which is directly connected to SSD performance. The representative allocation schemes are dynamic allocation and static allocation. Dynamic allocation is more adaptive in exploiting write-operation parallelism, while static allocation is better for read-operation parallelism. It is therefore hard to select the appropriate allocation scheme when the workload mixes read and write operations. We simulated several mixed data patterns and analyzed the results to help guide the choice of scheme for better performance. The results show that static allocation is more suitable when the data arrival interval is long enough for prior operations to finish and the environment is continuously read-intensive; dynamic allocation performs best on write performance and random data patterns.
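A toy model of the trade-off: static allocation fixes the channel as a function of the address (so later sequential reads spread across channels), while dynamic allocation picks an idle channel at write time (maximizing write throughput). The following Python sketch is a made-up illustration, not the simulation environment used in the paper.

```python
# Toy model: static vs. dynamic channel allocation for writes.
# Entirely illustrative; not the paper's simulator.
NUM_CHANNELS = 4

def static_channel(logical_page: int) -> int:
    """Static allocation: the channel is a fixed function of the address,
    so reads of consecutive pages later spread over all channels."""
    return logical_page % NUM_CHANNELS

def dynamic_channel(busy_until: list, now: float) -> int:
    """Dynamic allocation: pick the channel that frees up first,
    maximizing write parallelism regardless of the address."""
    return min(range(NUM_CHANNELS), key=lambda c: max(busy_until[c], now))
```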
Abstract: The main goal of this article is to describe the online flood monitoring and prediction system Floreon+, developed primarily for the Moravian-Silesian region in the Czech Republic, and the basic process it uses for running automatic rainfall-runoff and hydrodynamic simulations along with their calibration and uncertainty modeling. Executing such a process sequentially takes a long time, which is not acceptable in the online scenario, so the use of a high-performance computing environment is proposed for all parts of the process to shorten their duration. Finally, a case study on the Ostravice River catchment is presented that shows the actual durations and the gain from the parallel implementation.
Abstract: Scale Invariant Feature Transform (SIFT) has been widely applied, but extracting SIFT features is complicated and time-consuming. In this paper, to meet the demands of real-time applications, SIFT is parallelized and optimized on a cluster system; the resulting implementation is named pSIFT. Redundant storage and communication are used for boundary data to improve performance, and before the feature descriptors are computed, data reallocation is adopted to keep the load balanced in pSIFT. Experimental results show that pSIFT achieves good speedup and scalability.
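The boundary-data redundancy mentioned here is commonly realized as halo (ghost) regions: each node stores a copy of the rows adjacent to its image strip so that filtering near strip edges needs no mid-computation communication. A minimal numpy illustration of extracting a strip with halos follows; the strip layout and halo width are assumptions, not pSIFT's exact boundary handling.

```python
# Sketch: redundant "halo" rows around an image strip so that a local
# filter of radius r can run without further communication.
import numpy as np

def strip_with_halo(image: np.ndarray, k: int, n_strips: int, r: int):
    """Return strip k of `image` extended by r redundant rows on each side."""
    h = image.shape[0] // n_strips
    top = max(0, k * h - r)
    bottom = min(image.shape[0], (k + 1) * h + r)
    return image[top:bottom]
```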
Abstract: A strip domain decomposition parallel algorithm for a fast direct Poisson solver is presented on a 3D Cartesian staggered grid. The parallel algorithm follows the principles of the sequential algorithm for the fast direct Poisson solver. Both Dirichlet and Neumann boundary conditions are addressed. Several test cases are likewise addressed in order to shed light on the accuracy and efficiency of the strip domain parallelization algorithm. The current implementation shows very high efficiency when dealing with large grid meshes of up to 3.6 × 10^9 cells under a massively parallel approach, which demonstrates that the proposed algorithm is ready for massively parallel computing.
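The abstract does not spell the solver out; a classical fast direct Poisson solver for Dirichlet conditions diagonalizes the discrete Laplacian with discrete sine transforms, and a strip decomposition then mainly needs transposes between ranks. Below is a minimal single-process sketch of the sequential kernel such an algorithm parallelizes, in 2D for brevity; the grid size and the use of scipy are assumptions.

```python
# Sketch: sequential fast direct Poisson solver with DST-I (Dirichlet BCs),
# the kind of kernel a strip decomposition parallelizes. 2D simplification
# of the paper's 3D solver; illustrative only.
import numpy as np
from scipy.fft import dstn, idstn

n = 127                       # interior points per direction
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
X, Y = np.meshgrid(x, x, indexing="ij")
f = -2 * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)   # right-hand side

# Eigenvalues of the 1D second-difference operator in the sine basis.
k = np.arange(1, n + 1)
lam = (2 * np.cos(np.pi * k / (n + 1)) - 2) / h**2

f_hat = dstn(f, type=1)
u_hat = f_hat / (lam[:, None] + lam[None, :])   # solve in spectral space
u = idstn(u_hat, type=1)

exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
print("max error:", np.abs(u - exact).max())    # second-order accurate
```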
Abstract: The Partitioned Global Address Space (PGAS) programming paradigm offers ease of use in expressing parallelism through a global shared address space, while emphasizing performance by providing locality awareness through the partitioning of this address space. Interest in PGAS programming languages is therefore growing, and many new languages have emerged and are becoming ubiquitously available on nearly all modern parallel architectures. Recently, new parallel machines with multiple cores have been designed to target high-performance applications. Most efforts have gone into benchmarking, and there are few examples of real high-performance applications running on multicore machines. In this paper, we present and evaluate a parallelization technique for implementing a local DNA sequence alignment algorithm using a PGAS-based language, UPC (Unified Parallel C), on a chip multithreading architecture, the UltraSPARC T1.
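The abstract does not name the alignment algorithm; the standard choice for local DNA alignment is Smith-Waterman, whose dynamic-programming matrix is typically parallelized along anti-diagonals because all cells on one anti-diagonal are independent. A minimal serial Python sketch of the recurrence follows; the scoring parameters are assumptions.

```python
# Sketch: Smith-Waterman local alignment scores. Cells on the same
# anti-diagonal (i + j = const) depend only on earlier anti-diagonals,
# which is what wavefront parallelizations exploit. Scoring values
# are illustrative assumptions.
def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match/mismatch
                          H[i - 1][j] + gap,     # deletion
                          H[i][j - 1] + gap)     # insertion
            best = max(best, H[i][j])
    return best

assert smith_waterman_score("ACACACTA", "AGCACACA") > 0
```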
Abstract: In this paper we describe the design and implementation of a parallel algorithm for data assimilation with the ensemble Kalman filter (EnKF) for the oil reservoir history matching problem. The use of a large number of observations from time-lapse seismic leads to a large turnaround time for the analysis step, in addition to the time-consuming simulations of the realizations. For efficient parallelization it is important to consider parallel computation at the analysis step. Our experiments show that parallelizing the analysis step in addition to the forecast step scales well, exploiting the same set of resources with some additional effort.
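For context, the EnKF analysis step that dominates when there are many observations updates each forecast ensemble member as follows; this is a standard textbook formulation, and the paper's exact variant is not specified in the abstract.

```latex
x_i^a = x_i^f + P^f H^\top \left( H P^f H^\top + R \right)^{-1} \left( d_i - H x_i^f \right),
\qquad
P^f = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i^f - \bar{x}^f \right) \left( x_i^f - \bar{x}^f \right)^{\!\top}
```

Here H is the observation operator, R the observation-error covariance, d_i the perturbed observations, and N the ensemble size; parallelizing the analysis step amounts to distributing these matrix products across processors.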
Abstract: In today's new technology era, clusters have become a necessity for modern computing and data applications, since many applications take a long time (even days or months) for computation. Although parallelization speeds up computation, the time required for many applications can still be long. Thus, the reliability of the cluster becomes a very important issue, and the implementation of a fault-tolerance mechanism becomes essential. The difficulty of designing a fault-tolerant cluster system increases with the variety of possible failures. The most important point is that an algorithm which handles a simple failure in a system must also tolerate more severe failures. In this paper, we implement the concept of a watchdog timer in a parallel environment to take care of failures. The implementation of a simple algorithm in our project helps us take care of different types of failures; consequently, we found that the reliability of the cluster improves.
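The paper's algorithm is not given in the abstract; as a generic illustration of the watchdog-timer idea (a monitor that declares a node failed if its heartbeat does not arrive within a timeout), here is a hedged Python sketch.

```python
# Sketch: a software watchdog that marks a worker as failed when its
# heartbeat stops arriving. Generic illustration only, not the paper's
# cluster implementation.
import time

class Watchdog:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_beat = {}          # worker id -> time of last heartbeat

    def heartbeat(self, worker: str) -> None:
        """Called by (or on behalf of) a worker to reset its timer."""
        self.last_beat[worker] = time.monotonic()

    def failed_workers(self) -> list:
        """Workers whose timer expired; candidates for task re-execution."""
        now = time.monotonic()
        return [w for w, t in self.last_beat.items()
                if now - t > self.timeout]

wd = Watchdog(timeout=5.0)
wd.heartbeat("node-1")
assert wd.failed_workers() == []     # still within the timeout window
```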
Abstract: Many studies have shown that parallelization decreases efficiency [1], [2]. There are many reasons for this loss. This paper investigates those which appear in the context of parallel data integration. Integration processes generally cannot be allocated to packages of identical size (i.e. tasks of identical complexity). The reason for this is unknown heterogeneous input data, which results in variable task lengths. Process delay is determined by the slowest processing node, and it has a detrimental effect on the total processing time. With a real-world example, this study will show that while process delay does initially increase with the introduction of more nodes, it ultimately decreases again after a certain point. The example will make use of the cloud computing platform Hadoop and be run inside Amazon's EC2 compute cloud. A stochastic model will be set up which can explain this effect.
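A minimal Monte Carlo illustration of the quantity studied: total time is the maximum over nodes of the sums of random task lengths, and process delay is the gap between the slowest node and the average. The task-length distribution and counts below are made-up assumptions; this is not the paper's stochastic model and need not reproduce its non-monotonic curve.

```python
# Sketch: Monte Carlo estimate of process delay (slowest node minus average)
# when tasks of heterogeneous random length are spread over p nodes.
import numpy as np

rng = np.random.default_rng(0)

def expected_delay(p: int, n_tasks: int = 960, trials: int = 2000) -> float:
    delays = []
    for _ in range(trials):
        lengths = rng.exponential(1.0, size=n_tasks)     # variable task lengths
        node_load = lengths.reshape(p, -1).sum(axis=1)   # p equal-count packages
        delays.append(node_load.max() - node_load.mean())
    return float(np.mean(delays))

for p in (2, 4, 8, 16, 32):
    print(p, round(expected_delay(p), 2))
```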
Abstract: In a previous work, we presented the numerical solution of the two-dimensional second-order telegraph partial differential equation discretized by the centred and rotated five-point finite difference discretizations, namely the explicit group (EG) and explicit decoupled group (EDG) iterative methods, respectively. In this paper, we utilize a domain decomposition algorithm on these group schemes to divide the tasks involved in solving the same equation. The objective of this study is to describe the development of the parallel group iterative schemes under the OpenMP programming environment as a way to reduce the computational cost of the solution processes using multicore technologies. A detailed performance analysis of the parallel implementations of the point and group iterative schemes is reported and discussed.
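For reference, one common form of the two-dimensional second-order telegraph equation treated by such schemes is the following, with constant damping coefficients α and β; the notation is assumed, as the paper's exact coefficients are not given in the abstract.

```latex
\frac{\partial^2 u}{\partial t^2}
+ 2\alpha \frac{\partial u}{\partial t}
+ \beta^2 u
= \frac{\partial^2 u}{\partial x^2}
+ \frac{\partial^2 u}{\partial y^2}
+ f(x, y, t)
```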
Abstract: Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms has rarely exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm's computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].
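For reference, the Q3 score quoted here is simply the percentage of residues whose three-state secondary-structure label (helix/strand/coil) is predicted correctly; a minimal sketch, with placeholder sequences:

```python
# Sketch: Q3 accuracy, the fraction of residues whose 3-state
# secondary-structure label (H/E/C) is predicted correctly.
def q3(predicted: str, observed: str) -> float:
    assert len(predicted) == len(observed)
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)

assert q3("HHHECC", "HHHCCC") == 100.0 * 5 / 6
```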
Abstract: A one-step conservative level set method, combined with a global mass correction method, is developed in this study to simulate incompressible two-phase flows. The present framework does not need to solve the conservative level set scheme in two separate steps, and the global mass can be exactly conserved; the present method is therefore more efficient than the two-step conservative level set scheme. Dispersion-relation-preserving schemes are utilized for the advection terms. The pressure Poisson equation solver is ported to GPU computation using the pCDR library developed by the National Center for High-Performance Computing, Taiwan. SMP parallelization is used to accelerate the rest of the calculations. Three benchmark problems were solved for the performance evaluation. Good agreement with the reference solutions is demonstrated for all the investigated problems.
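Global mass correction is typically a small shift or rescaling of the level set field so that the discrete total mass matches its initial value. The following Python sketch shows one common variant (a uniform shift found by bisection) as an illustration, not the paper's formulation.

```python
# Sketch: global mass correction for a conservative level set field phi
# (values in [0, 1]); shifts phi uniformly so the total "mass"
# (sum times cell volume) matches the target. One common variant,
# assumed here for illustration.
import numpy as np

def correct_mass(phi: np.ndarray, target_mass: float, dv: float) -> np.ndarray:
    lo, hi = -1.0, 1.0
    for _ in range(60):                        # bisection on the shift c
        c = 0.5 * (lo + hi)
        mass = np.clip(phi + c, 0.0, 1.0).sum() * dv
        if mass < target_mass:
            lo = c                             # need a larger shift
        else:
            hi = c
    return np.clip(phi + 0.5 * (lo + hi), 0.0, 1.0)
```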
Abstract: The deterministic quantum transfer-matrix (QTM) technique and its mathematical background are presented. This important tool in computational physics can be applied to a class of real physical low-dimensional magnetic systems described by the Heisenberg Hamiltonian, which includes macroscopic molecular-based spin chains, small magnetic clusters embedded in supramolecules, and other interesting compounds. Using QTM, the spin degrees of freedom are accurately taken into account, yielding the thermodynamical functions at finite temperatures. In order to test the application of the susceptibility calculations in a parallel environment, the speed-up and efficiency of the parallelization are analyzed on our SGI Origin 3800 platform with p = 128 processor units. Using Message Passing Interface (MPI) system libraries, we find a parallel efficiency of 94% for p = 128, which makes our application highly scalable.
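For reference, the quoted figures use the standard definitions of speed-up and parallel efficiency; with T(p) the run time on p processors, the reported efficiency implies the following speed-up:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
E(128) = 0.94 \;\Rightarrow\; S(128) \approx 120
```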
Abstract: Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers, or other traditional approaches, in this paper we focus on alternative ways of making better use of modern multithreaded architectures and preparing hash tables for concurrent access. Hash structures are used to demonstrate and compare two entirely different approaches (rule-based cooperation and hardware synchronization support) against an efficient parallel implementation using traditional locks. The comparison includes implementation details, performance ranking, and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment, with special focus on the memory system and memory access characteristics.
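As a point of reference for the traditional lock-based baseline such comparisons start from, here is a minimal lock-striped hash table sketch in Python; the stripe count and structure are assumptions, and the paper's alternatives (rule-based cooperation and hardware synchronization support) replace these locks entirely.

```python
# Sketch: the traditional lock-based baseline - a hash table whose
# buckets are protected by a small array of stripe locks, so threads
# touching different stripes do not serialize. Illustrative only.
import threading

class StripedHashTable:
    def __init__(self, n_buckets: int = 64, n_stripes: int = 8):
        # n_stripes divides n_buckets, so all keys in one bucket
        # always map to the same stripe lock.
        self.buckets = [[] for _ in range(n_buckets)]
        self.locks = [threading.Lock() for _ in range(n_stripes)]

    def _stripe(self, key) -> threading.Lock:
        return self.locks[hash(key) % len(self.locks)]

    def put(self, key, value) -> None:
        with self._stripe(key):
            bucket = self.buckets[hash(key) % len(self.buckets)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)   # overwrite existing entry
                    return
            bucket.append((key, value))

    def get(self, key, default=None):
        with self._stripe(key):
            for k, v in self.buckets[hash(key) % len(self.buckets)]:
                if k == key:
                    return v
            return default
```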