Abstract: This paper presents an approach for the straightforward creation
and classification of institutional risk profiles supporting the
endangerment analysis of file formats. The main contribution of this work
is the employment of data mining techniques to identify the most
important risk factors. Risk profiles then employ a risk factor
classifier and associated configurations to support digital preservation
experts with a semi-automatic estimation of the endangerment group
for file format risk profiles. Our goal is to make use of an expert
knowledge base, acquired through a digital preservation survey,
in order to detect preservation risks for a particular institution.
Another contribution is support for the visualisation of risk factors
along a required analysis dimension. Using the naive Bayes method,
the decision support system recommends to an expert the matching
risk profile group for the previously selected institutional risk profile.
The proposed methods improve the visibility of risk factor values
and the quality of a digital preservation process. The presented
approach is designed to facilitate decision making for the preservation
of digital content in libraries and archives using domain expert
knowledge and values of file format risk profiles. To facilitate
decision-making, the aggregated information about the risk factors
is presented as a multidimensional vector. The goal is to visualise
particular dimensions of this vector for analysis by an expert and
to define its profile group. A sample risk profile calculation and
the visualisation of selected risk factor dimensions are presented in the
evaluation section.
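The naive Bayes step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes risk profiles are numeric risk-factor vectors and uses made-up group labels ("low", "high") purely for demonstration.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate per-class prior, mean, and variance for each risk-factor dimension."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-9  # smoothed to avoid zero
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (math.log(n / len(samples)), means, variances)
    return model

def predict(model, x):
    """Return the endangerment group with the highest posterior log-probability."""
    def log_post(cls):
        log_prior, means, variances = model[cls]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_post)

# Hypothetical training profiles: three risk-factor dimensions each.
model = fit_gaussian_nb(
    [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.15, 0.2, 0.1],
     [0.9, 0.8, 0.9], [0.8, 0.9, 0.85], [0.95, 0.8, 0.9]],
    ["low", "low", "low", "high", "high", "high"])
group = predict(model, [0.2, 0.15, 0.1])  # recommended group for a new profile
```

An expert would review `group` as a recommendation rather than accept it automatically, matching the semi-automatic workflow the abstract describes.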
Abstract: This paper presents the open science philosophy and paradigm of scientific research, and how it can transform classical research and innovation approaches. Open science is the practice of providing free and unrestricted online access to the products of scholarly research. It advocates for immediate, unrestricted online access to published, peer-reviewed research in digital format. Open science research is made available for free in perpetuity and includes guidelines and/or licenses that communicate how researchers and readers can share and re-use the digital content. The emergence of open science has changed the scholarly research and publishing landscape, making research more broadly accessible to academic and non-academic audiences alike. Consequently, the open science philosophy and its practice are discussed to cover all aspects of cyberscience in the context of research and innovation excellence for the benefit of global society.
Abstract: The purpose of this research is to improve the convenience of waiting for trains at level crossings and stations, and to prevent accidents resulting from forcible entry into level crossings, by providing level crossing users and passengers with information that tells them when the next train will pass through or arrive. In this paper, we propose methods for estimating train operation using an average value method, a variable response smoothing method, and an exponential smoothing method, on the basis of open data that, although of low accuracy, distributes operation schedules in real time. We then examined the accuracy of the estimations. The results showed that the application of an exponential smoothing method is valid.
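The exponential smoothing method named above has a standard one-line recurrence; the sketch below applies it to illustrative delay observations (the field names and numbers are invented for demonstration, not taken from the paper's data).

```python
def exponential_smoothing(observations, alpha=0.3):
    """Single exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    Returns the smoothed estimate after processing all observations."""
    s = observations[0]          # initialise with the first observation
    for x in observations[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

# Hypothetical delays (seconds) of recent trains from a real-time feed;
# the smoothed value serves as the estimate for the next train.
estimate = exponential_smoothing([120, 90, 150, 110], alpha=0.5)
```

A larger `alpha` weights recent observations more heavily, which suits noisy but frequently updated real-time schedule feeds.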
Abstract: Big Data has attracted a lot of attention in many fields for analyzing research issues based on large volumes of source data. Electronic Toll Collection (ETC) is one of the Intelligent Transportation System (ITS) applications in Taiwan, used to record the starting point, end point, distance, and travel time of vehicles on the national freeway. This study, taking advantage of ETC big data combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. The ETC dataset, part of the government's open data, is voluminous, complete, and frequently updated. One may recall that living areas have traditionally been delimited by location, population, area, and subjective consciousness. However, these factors cannot appropriately reflect people's movement paths in daily life. In this study, the concept of a "Living Area" is replaced by an "Influence Range" to capture its dynamics and variation with time and the purposes of activities. This study applies data mining with Python and Excel, and visualizes the number of trips with GIS, to explore the influence range of Tainan City and the purposes of trips, and to discuss the living areas as currently delimited. It creates a dialogue between the concepts of "Central Place Theory" and "Living Area", presents a new point of view, and integrates the application of big data, urban planning, and transportation. The findings will be valuable for resource allocation and land apportionment in spatial planning.
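One plausible building block of such an analysis is aggregating trip counts by destination to delimit an influence range. The sketch below is an assumption about the kind of Python data mining involved — the function, threshold, and place names are illustrative, not from the study.

```python
from collections import Counter

def influence_range(trips, origin, min_share=0.05):
    """Given ETC trip records as (origin_gantry, destination_gantry) pairs,
    return destinations receiving at least `min_share` of all trips that
    start from `origin` -- a rough proxy for the origin city's influence range."""
    dests = Counter(d for o, d in trips if o == origin)
    total = sum(dests.values())
    return {d: n / total for d, n in dests.items() if n / total >= min_share}

# Hypothetical trip records; shares below the threshold are excluded.
trips = ([("Tainan", "Kaohsiung")] * 6 + [("Tainan", "Chiayi")] * 3
         + [("Tainan", "Taipei")] + [("Taipei", "Hsinchu")] * 5)
rng = influence_range(trips, "Tainan", min_share=0.2)
```

Recomputing `rng` per time slice (commute hours vs. weekends) would expose the dynamic, purpose-dependent variation the abstract emphasises.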
Abstract: Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically drawn on rigorous empirical studies by research institutes, as well as on less reliable immediate surveys/polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is no longer why people behaved in a certain way for the last ten years, but why they are behaving this way now, whether this behaviour is undesirable, and how we can have an immediate impact to promote change. Big Data analytics relies heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.
Abstract: The Margin-Based Principle was proposed a long
time ago, and it has been proved, both theoretically and practically,
that this principle can reduce structural risk and improve
classification performance. Meanwhile, the feed-forward neural network is
a traditional classifier that is currently very popular in deeper
architectures. However, the training algorithm of feed-forward neural
networks is derived from the Widrow-Hoff Principle, which
minimizes the squared error. In this paper, we propose
a new training algorithm for feed-forward neural networks based
on the Margin-Based Principle, which can effectively improve the
accuracy and generalization ability of neural network classifiers
with fewer labelled samples and a flexible network architecture. We have
conducted experiments on four UCI open datasets and achieved good results,
as expected. In conclusion, our model can handle sparsely
labelled, high-dimensional datasets with high accuracy, while
migrating from a conventional ANN training method to ours is easy and
requires almost no extra work.
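The margin-based objective contrasted with squared error above can be illustrated in its simplest form: subgradient descent on the hinge loss max(0, 1 − y(w·x + b)). This is a generic single-layer sketch of the principle, not the paper's multi-layer algorithm; data, learning rate, and regularisation strength are illustrative.

```python
def train_hinge(samples, labels, lr=0.1, epochs=100, lam=0.01):
    """Train a linear classifier by subgradient descent on the hinge loss
    max(0, 1 - y * (w.x + b)) -- the margin-based objective that replaces
    the squared error of Widrow-Hoff training. Labels must be in {-1, +1}."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # sample inside the margin: push the boundary away
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # sample safely outside: only regularise (shrink w)
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data, labelled -1 / +1.
samples = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]
w, b = train_hinge(samples, labels)
```

Because the loss is zero for samples already beyond the margin, training concentrates on the hard, boundary-adjacent examples — the source of the better generalisation the abstract claims for margin-based objectives.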
Abstract: Governments collect and produce large amounts of
data. Increasingly, governments worldwide have started to implement
open data initiatives and also launch open data portals to enable the
release of these data in open and reusable formats. Therefore, a large
number of open data repositories, catalogues and portals have been
emerging in the world. The greater availability of interoperable and
linkable open government data catalyzes secondary use of such data,
so they can be used for building useful applications which leverage
their value, allow insight, provide access to government services, and
support transparency. The efficient development of successful open
data portals makes it necessary to evaluate them systematically, in order
to understand them better, assess the various types of value they
generate, and identify the improvements required to increase this
value. Thus, the attention of this paper is directed particularly to the
field of open data portals. The main aim of this paper is to compare
the selected open data portals on the national level using content
analysis and propose a new evaluation framework, which further
improves the quality of these portals. It also establishes a set of
considerations for involving businesses and citizens in creating e-services
and applications that leverage the datasets available from
these portals.
Abstract: This paper describes the architectural design
considerations for building a new class of application, a Personal
Knowledge Integrator, and a particular example, a Knowledge Theatre.
It then supports this description by describing a scenario of a child
acquiring knowledge and how this process could be augmented by
the proposed architecture and design of a Knowledge Theatre. David
Merrill's “First Principles of Instruction” are kept in focus to provide
a background against which to view the learning potential.
Abstract: SeqWord Gene Island Sniffer, a new program for
the identification of mobile genetic elements in sequences of bacterial
chromosomes, is presented. This program is based on the analysis of
oligonucleotide usage variations in DNA sequences. 3,518 mobile genetic
elements were identified in 637 bacterial genomes and further analyzed
by sequence similarity and the functionality of encoded proteins. The
results of this study are stored in an open database
(http://anjie.bi.up.ac.za/geidb/geidbhome.php). The developed computer
program and the database provide information valuable for further
investigation of the distribution of mobile genetic elements and
virulence factors among bacteria. The program is available for download
at www.bi.up.ac.za/SeqWord/sniffer/index.html.