Development of an Automatic Calibration Framework for Hydrologic Modelling Using Approximate Bayesian Computation

Hydrologic models are increasingly used as tools to predict stormwater quantity and quality from urban catchments. However, due to a range of practical issues, most models produce gross errors in simulating complex hydraulic and hydrologic systems. Difficulty in finding a robust approach for model calibration is one of the main issues. Though automatic calibration techniques are available, they are rarely used in common commercial hydraulic and hydrologic modelling software e.g. MIKE URBAN. This is partly due to the need for a large number of parameters and large datasets in the calibration process. To overcome this practical issue, a framework for automatic calibration of a hydrologic model was developed in R platform and presented in this paper. The model was developed based on the time-area conceptualization. Four calibration parameters, including initial loss, reduction factor, time of concentration and time-lag were considered as the primary set of parameters. Using these parameters, automatic calibration was performed using Approximate Bayesian Computation (ABC). ABC is a simulation-based technique for performing Bayesian inference when the likelihood is intractable or computationally expensive to compute. To test the performance and usefulness, the technique was used to simulate three small catchments in Gold Coast. For comparison, simulation outcomes from the same three catchments using commercial modelling software, MIKE URBAN were used. The graphical comparison shows strong agreement of MIKE URBAN result within the upper and lower 95% credible intervals of posterior predictions as obtained via ABC. Statistical validation for posterior predictions of runoff result using coefficient of determination (CD), root mean square error (RMSE) and maximum error (ME) was found reasonable for three study catchments. The main benefit of using ABC over MIKE URBAN is that ABC provides a posterior distribution for runoff flow prediction, and therefore associated uncertainty in predictions can be obtained. In contrast, MIKE URBAN just provides a point estimate. Based on the results of the analysis, it appears as though ABC the developed framework performs well for automatic calibration.

A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data

We present probabilistic multinomial Dirichlet classification model for multidimensional data and Gaussian process priors. Here, we have considered efficient computational method that can be used to obtain the approximate posteriors for latent variables and parameters needed to define the multiclass Gaussian process classification model. We first investigated the process of inducing a posterior distribution for various parameters and latent function by using the variational Bayesian approximations and important sampling method, and next we derived a predictive distribution of latent function needed to classify new samples. The proposed model is applied to classify the synthetic multivariate dataset in order to verify the performance of our model. Experiment result shows that our model is more accurate than the other approximation methods.

Variational EM Inference Algorithm for Gaussian Process Classification Model with Multiclass and Its Application to Human Action Classification

In this paper, we propose the variational EM inference algorithm for the multi-class Gaussian process classification model that can be used in the field of human behavior recognition. This algorithm can drive simultaneously both a posterior distribution of a latent function and estimators of hyper-parameters in a Gaussian process classification model with multiclass. Our algorithm is based on the Laplace approximation (LA) technique and variational EM framework. This is performed in two steps: called expectation and maximization steps. First, in the expectation step, using the Bayesian formula and LA technique, we derive approximately the posterior distribution of the latent function indicating the possibility that each observation belongs to a certain class in the Gaussian process classification model. Second, in the maximization step, using a derived posterior distribution of latent function, we compute the maximum likelihood estimator for hyper-parameters of a covariance matrix necessary to define prior distribution for latent function. These two steps iteratively repeat until a convergence condition satisfies. Moreover, we apply the proposed algorithm with human action classification problem using a public database, namely, the KTH human action data set. Experimental results reveal that the proposed algorithm shows good performance on this data set.

An Automatic Bayesian Classification System for File Format Selection

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Choosing between the Regression Correlation, the Rank Correlation, and the Correlation Curve

This paper presents a rank correlation curve. The traditional correlation coefficient is valid for both continuous variables and for integer variables using rank statistics. Since the correlation coefficient has already been established in rank statistics by Spearman, such a calculation can be extended to the correlation curve. This paper presents two survey questions. The survey collected non-continuous variables. We will show weak to moderate correlation. Obviously, one question has a negative effect on the other. A review of the qualitative literature can answer which question and why. The rank correlation curve shows which collection of responses has a positive slope and which collection of responses has a negative slope. Such information is unavailable from the flat, ”first-glance” correlation statistics.

FEM Models of Glued Laminated Timber Beams Enhanced by Bayesian Updating of Elastic Moduli

Two finite element (FEM) models are presented in this paper to address the random nature of the response of glued timber structures made of wood segments with variable elastic moduli evaluated from 3600 indentation measurements. This total database served to create the same number of ensembles as was the number of segments in the tested beam. Statistics of these ensembles were then assigned to given segments of beams and the Latin Hypercube Sampling (LHS) method was called to perform 100 simulations resulting into the ensemble of 100 deflections subjected to statistical evaluation. Here, a detailed geometrical arrangement of individual segments in the laminated beam was considered in the construction of two-dimensional FEM model subjected to in fourpoint bending to comply with the laboratory tests. Since laboratory measurements of local elastic moduli may in general suffer from a significant experimental error, it appears advantageous to exploit the full scale measurements of timber beams, i.e. deflections, to improve their prior distributions with the help of the Bayesian statistical method. This, however, requires an efficient computational model when simulating the laboratory tests numerically. To this end, a simplified model based on Mindlin’s beam theory was established. The improved posterior distributions show that the most significant change of the Young’s modulus distribution takes place in laminae in the most strained zones, i.e. in the top and bottom layers within the beam center region. Posterior distributions of moduli of elasticity were subsequently utilized in the 2D FEM model and compared with the original simulations.

Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics

In this paper, Bayesian online inference in models of data series are constructed by change-points algorithm, which separated the observed time series into independent series and study the change and variation of the regime of the data with related statistical characteristics. variation of statistical characteristics of time series data often represent separated phenomena in the some dynamical system, like a change in state of brain dynamical reflected in EEG signal data measurement or a change in important regime of data in many dynamical system. In this paper, prediction algorithm for studying change point location in some time series data is simulated. It is verified that pattern of proposed distribution of data has important factor on simpler and smother fluctuation of hazard rate parameter and also for better identification of change point locations. Finally, the conditions of how the time series distribution effect on factors in this approach are explained and validated with different time series databases for some dynamical system.

An Optimal Bayesian Maintenance Policy for a Partially Observable System Subject to Two Failure Modes

In this paper, we present a new maintenance model for a partially observable system subject to two failure modes, namely a catastrophic failure and a failure due to the system degradation. The system is subject to condition monitoring and the degradation process is described by a hidden Markov model. A cost-optimal Bayesian control policy is developed for maintaining the system. The control problem is formulated in the semi-Markov decision process framework. An effective computational algorithm is developed, illustrated by a numerical example.

Spatial Time Series Models for Rice and Cassava Yields Based On Bayesian Linear Mixed Models

This paper proposes a linear mixed model (LMM) with spatial effects to forecast rice and cassava yields in Thailand at the same time. A multivariate conditional autoregressive (MCAR) model is assumed to present the spatial effects. A Bayesian method is used for parameter estimation via Gibbs sampling Markov Chain Monte Carlo (MCMC). The model is applied to the rice and cassava yields monthly data which have been extracted from the Office of Agricultural Economics, Ministry of Agriculture and Cooperatives of Thailand. The results show that the proposed model has better performance in most provinces in both fitting part and validation part compared to the simple exponential smoothing and conditional auto regressive models (CAR) from our previous study.

Forecasting Models for Steel Demand Uncertainty Using Bayesian Methods

 A forecasting model for steel demand uncertainty in Thailand is proposed. It consists of trend, autocorrelation, and outliers in a hierarchical Bayesian frame work. The proposed model uses a cumulative Weibull distribution function, latent first-order autocorrelation, and binary selection, to account for trend, time-varying autocorrelation, and outliers, respectively. The Gibbs sampling Markov Chain Monte Carlo (MCMC) is used for parameter estimation. The proposed model is applied to steel demand index data in Thailand. The root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) criteria are used for model comparison. The study reveals that the proposed model is more appropriate than the exponential smoothing method.

Spatio-Temporal Analysis and Mapping of Malaria in Thailand

This paper proposes a GLMM with spatial and temporal effects for malaria data in Thailand. A Bayesian method is used for parameter estimation via Gibbs sampling MCMC. A conditional autoregressive (CAR) model is assumed to present the spatial effects. The temporal correlation is presented through the covariance matrix of the random effects. The malaria quarterly data have been extracted from the Bureau of Epidemiology, Ministry of Public Health of Thailand. The factors considered are rainfall and temperature. The result shows that rainfall and temperature are positively related to the malaria morbidity rate. The posterior means of the estimated morbidity rates are used to construct the malaria maps. The top 5 highest morbidity rates (per 100,000 population) are in Trat (Q3, 111.70), Chiang Mai (Q3, 104.70), Narathiwat (Q4, 97.69), Chiang Mai (Q2, 88.51), and Chanthaburi (Q3, 86.82). According to the DIC criterion, the proposed model has a better performance than the GLMM with spatial effects but without temporal terms.

Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition

Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology. This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data. Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables. In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization. The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.

On the Parameter of the Burr Type X under Bayesian Principles

A comprehensive Bayesian analysis has been carried out in the context of informative and non-informative priors for the shape parameter of the Burr type X distribution under different symmetric and asymmetric loss functions. Elicitation of hyperparameter through prior predictive approach is also discussed. Also we derive the expression for posterior predictive distributions, predictive intervals and the credible Intervals. As an illustration, comparisons of these estimators are made through simulation study.

Effect of Progressive Type-I Right Censoring on Bayesian Statistical Inference of Simple Step–Stress Acceleration Life Testing Plan under Weibull Life Distribution

This paper discusses the effects of using progressive Type-I right censoring on the design of the Simple Step Accelerated Life testing using Bayesian approach for Weibull life products under the assumption of cumulative exposure model. The optimization criterion used in this paper is to minimize the expected pre-posterior variance of the Pth percentile time of failures. The model variables are the stress changing time and the stress value for the first step. A comparison between the conventional and the progressive Type-I right censoring is provided. The results have shown that the progressive Type-I right censoring reduces the cost of testing on the expense of the test precision when the sample size is small. Moreover, the results have shown that using strong priors or large sample size reduces the sensitivity of the test precision to the censoring proportion. Hence, the progressive Type-I right censoring is recommended in these cases as progressive Type-I right censoring reduces the cost of the test and doesn't affect the precision of the test a lot. Moreover, the results have shown that using direct or indirect priors affects the precision of the test.

A Human Activity Recognition System Based On Sensory Data Related to Object Usage

Sensor-based Activity Recognition systems usually accounts which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and takes the decision based on that. However, the information about the sensors which are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach where the sensory data related to both usage and non-usage of objects are utilized to make the classification of activities. Experimental results also show the promising performance of the proposed method.

Sequential Partitioning Brainbow Image Segmentation Using Bayesian

This paper proposes a data-driven, biology-inspired neural segmentation method of 3D drosophila Brainbow images. We use Bayesian Sequential Partitioning algorithm for probabilistic modeling, which can be used to detect somas and to eliminate crosstalk effects. This work attempts to develop an automatic methodology for neuron image segmentation, which nowadays still lacks a complete solution due to the complexity of the image. The proposed method does not need any predetermined, risk-prone thresholds, since biological information is inherently included inside the image processing procedure. Therefore, it is less sensitive to variations in neuron morphology; meanwhile, its flexibility would be beneficial for tracing the intertwining structure of neurons.

On Bayesian Analysis of Failure Rate under Topp Leone Distribution using Complete and Censored Samples

The article is concerned with analysis of failure rate (shape parameter) under the Topp Leone distribution using a Bayesian framework. Different loss functions and a couple of noninformative priors have been assumed for posterior estimation. The posterior predictive distributions have also been derived. A simulation study has been carried to compare the performance of different estimators. A real life example has been used to illustrate the applicability of the results obtained. The findings of the study suggest  that the precautionary loss function based on Jeffreys prior and singly type II censored samples can effectively be employed to obtain the Bayes estimate of the failure rate under Topp Leone distribution.

A Face-to-Face Education Support System Capable of Lecture Adaptation and Q&A Assistance Based On Probabilistic Inference

Keys to high-quality face-to-face education are ensuring flexibility in the way lectures are given, and providing care and responsiveness to learners. This paper describes a face-to-face education support system that is designed to raise the satisfaction of learners and reduce the workload on instructors. This system consists of a lecture adaptation assistance part, which assists instructors in adapting teaching content and strategy, and a Q&A assistance part, which provides learners with answers to their questions. The core component of the former part is a “learning achievement map", which is composed of a Bayesian network (BN). From learners- performance in exercises on relevant past lectures, the lecture adaptation assistance part obtains information required to adapt appropriately the presentation of the next lecture. The core component of the Q&A assistance part is a case base, which accumulates cases consisting of questions expected from learners and answers to them. The Q&A assistance part is a case-based search system equipped with a search index which performs probabilistic inference. A prototype face-to-face education support system has been built, which is intended for the teaching of Java programming, and this approach was evaluated using this system. The expected degree of understanding of each learner for a future lecture was derived from his or her performance in exercises on past lectures, and this expected degree of understanding was used to select one of three adaptation levels. A model for determining the adaptation level most suitable for the individual learner has been identified. An experimental case base was built to examine the search performance of the Q&A assistance part, and it was found that the rate of successfully finding an appropriate case was 56%.

Integrating Low and High Level Object Recognition Steps by Probabilistic Networks

In pattern recognition applications the low level segmentation and the high level object recognition are generally considered as two separate steps. The paper presents a method that bridges the gap between the low and the high level object recognition. It is based on a Bayesian network representation and network propagation algorithm. At the low level it uses hierarchical structure of quadratic spline wavelet image bases. The method is demonstrated for a simple circuit diagram component identification problem.