Abstract: This paper presents an approach for easy creation and
classification of institutional risk profiles supporting endangerment
analysis of file formats. The main contribution of this work is the
employment of data mining techniques to support set up of the most
important risk factors. Subsequently, risk profiles employ risk factors
classifier and associated configurations to support digital preservation
experts with a semi-automatic estimation of endangerment group
for file format risk profiles. Our goal is to make use of an expert
knowledge base, accuired through a digital preservation survey
in order to detect preservation risks for a particular institution.
Another contribution is support for visualisation of risk factors for
a requried dimension for analysis. Using the naive Bayes method,
the decision support system recommends to an expert the matching
risk profile group for the previously selected institutional risk profile.
The proposed methods improve the visibility of risk factor values
and the quality of a digital preservation process. The presented
approach is designed to facilitate decision making for the preservation
of digital content in libraries and archives using domain expert
knowledge and values of file format risk profiles. To facilitate
decision-making, the aggregated information about the risk factors
is presented as a multidimensional vector. The goal is to visualise
particular dimensions of this vector for analysis by an expert and
to define its profile group. The sample risk profile calculation and
the visualisation of some risk factor dimensions is presented in the
evaluation section.
Abstract: This paper is to compare the parameter estimation of
the mean in normal distribution by Maximum Likelihood (ML),
Bayes, and Markov Chain Monte Carlo (MCMC) methods. The ML
estimator is estimated by the average of data, the Bayes method is
considered from the prior distribution to estimate Bayes estimator,
and MCMC estimator is approximated by Gibbs sampling from
posterior distribution. These methods are also to estimate a parameter
then the hypothesis testing is used to check a robustness of the
estimators. Data are simulated from normal distribution with the true
parameter of mean 2, and variance 4, 9, and 16 when the sample
sizes is set as 10, 20, 30, and 50. From the results, it can be seen
that the estimation of MLE, and MCMC are perceivably different
from the true parameter when the sample size is 10 and 20 with
variance 16. Furthermore, the Bayes estimator is estimated from the
prior distribution when mean is 1, and variance is 12 which showed
the significant difference in mean with variance 9 at the sample size
10 and 20.
Abstract: This paper presents an approach for the classification of
an unstructured format description for identification of file formats.
The main contribution of this work is the employment of data mining
techniques to support file format selection with just the unstructured
text description that comprises the most important format features for
a particular organisation. Subsequently, the file format indentification
method employs file format classifier and associated configurations to
support digital preservation experts with an estimation of required file
format. Our goal is to make use of a format specification knowledge
base aggregated from a different Web sources in order to select file
format for a particular institution. Using the naive Bayes method,
the decision support system recommends to an expert, the file format
for his institution. The proposed methods facilitate the selection of
file format and the quality of a digital preservation process. The
presented approach is meant to facilitate decision making for the
preservation of digital content in libraries and archives using domain
expert knowledge and specifications of file formats. To facilitate
decision-making, the aggregated information about the file formats is
presented as a file format vocabulary that comprises most common
terms that are characteristic for all researched formats. The goal is to
suggest a particular file format based on this vocabulary for analysis
by an expert. The sample file format calculation and the calculation
results including probabilities are presented in the evaluation section.
Abstract: Searching similar documents and document
management subjects have important place in text mining. One of the
most important parts of similar document research studies is the
process of classifying or clustering the documents. In this study, a
similar document search approach that includes discussion of out the
case of belonging to multiple categories (multiple categories
problem) has been carried. The proposed method that based on Fuzzy
Similarity Classification (FSC) has been compared with Rocchio
algorithm and naive Bayes method which are widely used in text
mining. Empirical results show that the proposed method is quite
successful and can be applied effectively. For the second stage,
multiple categories vector method based on information of categories
regarding to frequency of being seen together has been used.
Empirical results show that achievement is increased almost two
times, when proposed method is compared with classical approach.
Abstract: This paper considers inference under progressive type II censoring with a compound Rayleigh failure time distribution. The maximum likelihood (ML), and Bayes methods are used for estimating the unknown parameters as well as some lifetime parameters, namely reliability and hazard functions. We obtained Bayes estimators using the conjugate priors for two shape and scale parameters. When the two parameters are unknown, the closed-form expressions of the Bayes estimators cannot be obtained. We use Lindley.s approximation to compute the Bayes estimates. Another Bayes estimator has been obtained based on continuous-discrete joint prior for the unknown parameters. An example with the real data is discussed to illustrate the proposed method. Finally, we made comparisons between these estimators and the maximum likelihood estimators using a Monte Carlo simulation study.
Abstract: Among various HLM techniques, the Multivariate Hierarchical Linear Model (MHLM) is desirable to use, particularly when multivariate criterion variables are collected and the covariance structure has information valuable for data analysis. In order to reflect prior information or to obtain stable results when the sample size and the number of groups are not sufficiently large, the Bayes method has often been employed in hierarchical data analysis. In these cases, although the Markov Chain Monte Carlo (MCMC) method is a rather powerful tool for parameter estimation, Procedures regarding MCMC have not been formulated for MHLM. For this reason, this research presents concrete procedures for parameter estimation through the use of the Gibbs samplers. Lastly, several future topics for the use of MCMC approach for HLM is discussed.
Abstract: This research is aimed to compare the percentages of correct classification of Empirical Bayes method (EB) to Classical method when data are constructed as near normal, short-tailed and long-tailed symmetric, short-tailed and long-tailed asymmetric. The study is performed using conjugate prior, normal distribution with known mean and unknown variance. The estimated hyper-parameters obtained from EB method are replaced in the posterior predictive probability and used to predict new observations. Data are generated, consisting of training set and test set with the sample sizes 100, 200 and 500 for the binary classification. The results showed that EB method exhibited an improved performance over Classical method in all situations under study.