Abstract: Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically drawn on the rigorous methodologies of empirical studies by research institutes, as well as on less reliable rapid surveys and polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is no longer why people behaved a certain way over the last ten years, but why they are behaving that way now, whether that behaviour is undesirable, and how we can have an immediate impact in promoting change. Big data analytics rely heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.
Abstract: Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold: it represents a big opportunity for maximizing revenue streams. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.
Abstract: Recently, many users have begun to frequently share
their opinions on diverse issues using various social media. Therefore,
numerous governments have attempted to establish or improve
national policies according to the public opinions captured from
various social media. In this paper, we indicate several limitations of
the traditional approaches to analyzing public opinion on science and
technology and provide an alternative methodology to overcome these
limitations. First, we distinguish between the science and technology
analysis phase and the social issue analysis phase to reflect the fact that
public opinion can be formed only when a certain science and
technology is applied to a specific social issue. Next, we successively
apply a start list and a stop list to acquire clarified and interesting
results. Finally, to identify the most appropriate documents that fit
with a given subject, we develop a new logical filter concept that
consists of not only mere keywords but also a logical relationship
among the keywords. This study then analyzes the possibilities for the
practical use of the proposed methodology through its application to
discover core issues and public opinions from 1,700,886 documents
comprising SNS, blogs, news, and discussions.
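The logical filter described above can be pictured as a boolean expression over keyword tests rather than a flat keyword list. The following Python sketch only illustrates that idea under stated assumptions; the function names and example keywords are our own, not the authors' implementation.

```python
# Minimal sketch of a "logical filter": a boolean expression over keyword
# tests, not a bare keyword list. All names and keywords are illustrative.

def contains(keyword):
    """Return a predicate testing whether a document mentions a keyword."""
    return lambda doc: keyword in doc.lower()

def all_of(*preds):   # logical AND over sub-filters
    return lambda doc: all(p(doc) for p in preds)

def any_of(*preds):   # logical OR over sub-filters
    return lambda doc: any(p(doc) for p in preds)

def none_of(*preds):  # logical NOT over sub-filters
    return lambda doc: not any(p(doc) for p in preds)

# Example: keep documents mentioning nuclear power AND a safety-related
# social issue, but NOT sports (a common false-positive source).
nuclear_safety_filter = all_of(
    contains("nuclear"),
    any_of(contains("safety"), contains("radiation")),
    none_of(contains("baseball")),
)

docs = [
    "Public concern over nuclear safety is rising.",
    "The nuclear-themed baseball team won again.",
]
print([d for d in docs if nuclear_safety_filter(d)])
# -> ['Public concern over nuclear safety is rising.']
```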
Abstract: In this paper, a simple and effective user administration view for computing cluster systems is implemented in order to provide user-friendly configuration and monitoring of distributed application executions. The user view, the administrator view, and an internal control module create an illusory management environment for better system usability. The architecture, properties, and performance of the system, along with a comparison with other cluster management software, are briefly discussed.
Abstract: Recently, numerous documents including large
volumes of unstructured data and text have been created because of the
rapid increase in the use of social media and the Internet. Usually,
these documents are categorized for the convenience of users. However,
the accuracy of manual categorization is not guaranteed, and such
categorization requires a large amount of time and incurs huge costs.
Many studies on automatic categorization have been conducted to help
mitigate the limitations of manual categorization. Unfortunately, most
of these methods cannot be applied to categorize complex documents
with multiple topics because they work on the assumption that
individual documents can be categorized into single categories only.
Therefore, to overcome this limitation, some studies have attempted to
categorize each document into multiple categories. However, the
learning processes employed in these studies require training on
multi-categorized document sets. These methods therefore cannot be
applied to the multi-categorization of most documents unless
multi-categorized training sets are provided for the traditional
multi-categorization algorithms. To overcome this limitation, in this
study, we review our novel methodology for extending the category of a
single-categorized document to multiple categories, and then
introduce a survey-based verification scenario for estimating the
accuracy of our automatic categorization methodology.
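One common way to extend a classifier trained on single-categorized documents to multi-category assignment is to attach every category whose predicted probability clears a threshold. The Python sketch below illustrates that general technique only; it is not the authors' methodology, and all data and parameters are assumptions.

```python
# Sketch: train on single-labeled documents, then assign every category
# whose predicted probability exceeds a threshold (illustrative only).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = [
    "the striker scored a late goal",        # sports
    "the match ended in a penalty shootout", # sports
    "parliament passed the budget bill",     # politics
    "the senator proposed a new tax law",    # politics
]
train_labels = ["sports", "sports", "politics", "politics"]

vec = TfidfVectorizer()
X = vec.fit_transform(train_docs)
clf = LogisticRegression().fit(X, train_labels)

def multi_categorize(doc, threshold=0.4):
    """Return every category whose probability exceeds the threshold."""
    probs = clf.predict_proba(vec.transform([doc]))[0]
    return [c for c, p in zip(clf.classes_, probs) if p >= threshold]

# A mixed-topic document can now receive more than one category.
print(multi_categorize("the senator attended the match and discussed the tax law"))
```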
Abstract: In this paper, we introduce an NLG application for the automatic creation of ready-to-publish texts from big data. The resulting fully automatically generated news stories closely resemble the style in which a human writer would draw up such a story. Topics include soccer games, stock exchange market reports, and weather forecasts. Each generated text is unique. Ready-to-publish stories written by a computer application can help humans to quickly grasp the outcomes of big data analyses, save journalists time-consuming pre-formulation, and cater to rather small audiences by offering stories that would otherwise not exist.
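The abstract does not disclose the application's pipeline, but the simplest form of data-to-text NLG it alludes to can be sketched as template selection plus slot filling. The Python example below is a toy illustration under that assumption, with invented template strings and match data.

```python
# Toy sketch of template-based data-to-text generation: pick a template,
# fill slots from structured match data. All templates/data are invented.

import random

TEMPLATES = [
    "{home} beat {away} {hs}-{as_} in front of {crowd} fans.",
    "{home} secured a {hs}-{as_} win over {away}, watched by {crowd} spectators.",
]

def soccer_report(match):
    # Choosing among templates varies the phrasing of each generated text.
    tpl = random.choice(TEMPLATES)
    return tpl.format(**match)

match = {"home": "FC Example", "away": "United Sample",
         "hs": 3, "as_": 1, "crowd": 41500}
print(soccer_report(match))
```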
Abstract: The system for analyzing and eliciting public grievances serves its main purpose of receiving and processing all sorts of complaints from the public and responding to users. As the number of complaints grows, the complaint data becomes big data, which is difficult to store and process. The proposed system uses HDFS to store the big data and MapReduce to process it. The concept of a cache was applied in the system to provide immediate response and timely action using big data analytics. The cache-enabled big data approach improves the response time of the system. The unstructured data provided by the users is handled efficiently through the MapReduce algorithm. Complaints are processed in the order of the hierarchy of authority. The drawbacks of the traditional database system used in the existing system are overcome by our system through a cache-enabled Hadoop Distributed File System. MapReduce framework code can potentially leak sensitive data through the computation process. We propose a system that adds noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If a complaint is not processed within ample time, it is automatically forwarded to the higher authority, which ensures accountability in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user's mail id, which serves as proof. The system's reports serve as essential data when making important decisions based on legislation.
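Adding noise to reduce-phase output, as described above, is the style of protection made precise by differential privacy: perturbing released aggregates so individual records cannot be inferred. The Python sketch below illustrates that general idea with Laplace noise; the function names, data, and epsilon value are our assumptions, not the paper's implementation.

```python
# Sketch: aggregate complaint counts per key (a reduce step), then perturb
# each released total with Laplace noise so single records are masked.
# Illustrative only; not the paper's MapReduce code.

import random
from collections import defaultdict

def reduce_complaint_counts(mapped_pairs, epsilon=1.0):
    """Sum counts per department, then add Laplace(1/epsilon) noise
    to each total before releasing it."""
    totals = defaultdict(int)
    for department, count in mapped_pairs:   # reduce: aggregate by key
        totals[department] += count
    noisy = {}
    for department, total in totals.items():
        # Difference of two Exp(epsilon) draws is Laplace with scale 1/epsilon.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        noisy[department] = max(0, round(total + noise))
    return noisy

pairs = [("water", 1), ("roads", 1), ("water", 1), ("power", 1)]
print(reduce_complaint_counts(pairs))
```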
Abstract: This paper presents a novel algorithm for the secure, reliable and flexible transmission of big data in two-hop wireless networks using a cooperative jamming scheme. Two-hop wireless networks consist of source, relay and destination nodes. Big data has to be transmitted from source to relay and from relay to destination with security deployed at the physical layer. The cooperative jamming scheme enables transmission of big data in a more secure manner by protecting it from eavesdroppers and malicious nodes at unknown locations. The novel algorithm, which ensures secure and energy-balanced transmission of big data, includes selecting the data transmission region, segmenting the selected region, determining the probability ratio for each node type (capture, non-capture and eavesdropper) in every segment, and evaluating the probability using binary-based evaluation. If the transmission is secure, the two-hop transmission of big data resumes; otherwise, the attackers are countered by the cooperative jamming scheme and the data is transmitted over the two hops.
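The per-segment decision logic can be pictured as computing node-type ratios per segment and making a binary secure/insecure call. The Python sketch below is only a schematic reading of the abstract; the node classification, threshold, and data structures are assumptions, and the paper's exact probability-ratio evaluation is not reproduced here.

```python
# Schematic sketch: per segment, compare the eavesdropper fraction to a
# threshold and choose plain two-hop relaying or jamming first.
# Threshold and labels are illustrative assumptions.

def segment_ratios(nodes):
    """nodes: list of 'capture', 'non-capture' or 'eavesdropper' labels.
    Returns the fraction of each node type in the segment."""
    n = len(nodes)
    return {t: nodes.count(t) / n for t in ("capture", "non-capture", "eavesdropper")}

def plan_transmission(segments, eavesdrop_threshold=0.2):
    """Binary evaluation per segment: secure -> relay directly, else jam first."""
    plan = []
    for seg in segments:
        ratios = segment_ratios(seg)
        secure = ratios["eavesdropper"] < eavesdrop_threshold
        plan.append("two-hop relay" if secure else "cooperative jamming + relay")
    return plan

segments = [
    ["capture", "non-capture", "capture"],                  # no eavesdroppers
    ["capture", "eavesdropper", "non-capture", "capture"],  # 25% eavesdroppers
]
print(plan_transmission(segments))
# -> ['two-hop relay', 'cooperative jamming + relay']
```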
Abstract: Big Data and analytics have gained huge momentum
in recent years. Big Data feeds into the field of Learning Analytics
(LA) that may allow academic institutions to better understand the
learners’ needs and proactively address them. Hence, it is important
to have an understanding of Big Data and its applications. The
purpose of this descriptive paper is to provide an overview of Big
Data, the technologies used in Big Data, and some of the applications
of Big Data in education. Additionally, it discusses some of the
concerns related to Big Data and current research trends. While Big
Data can provide big benefits, it is important that institutions
understand their own needs, infrastructure, resources, and limitations
before jumping on the Big Data bandwagon.
Abstract: This paper describes the problem of building secure computational services over encrypted information in cloud computing without decrypting the encrypted data; it therefore addresses the aspiration for a computational encryption model that could enhance the security of big data with respect to the privacy, confidentiality and availability of users' data. The cryptographic model applied for the computational processing of the encrypted data is the fully homomorphic encryption scheme. We contribute a theoretical presentation of high-level computational processes, based on number theory and algebra, that can easily be integrated and leveraged in cloud computing, together with detailed theoretical mathematical foundations for fully homomorphic encryption models. This contribution supports the full implementation of a cryptographic security algorithm for big data analytics.
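As a flavour of computing on ciphertexts without decryption, the Python toy below follows the well-known "homomorphic encryption over the integers" construction (the DGHV scheme): a bit m is encrypted as c = p*q + 2*r + m for a secret odd integer p, and decrypted as (c mod p) mod 2, so adding or multiplying ciphertexts XORs or ANDs the hidden bits. This is an illustrative sketch with toy-sized, insecure parameters, not the scheme presented in the paper.

```python
# Toy DGHV-style encryption of single bits (illustrative, NOT secure).

import random

p = 100003  # secret key: a large odd integer (toy size)

def encrypt(m, p):
    q = random.randint(10**6, 10**7)
    r = random.randint(1, 50)          # noise, kept much smaller than p
    return p * q + 2 * r + m

def decrypt(c, p):
    return (c % p) % 2                 # works while the noise stays below p

c0, c1 = encrypt(0, p), encrypt(1, p)
print(decrypt(c0 + c1, p))  # 1 : adding ciphertexts = XOR of the bits
print(decrypt(c0 * c1, p))  # 0 : multiplying ciphertexts = AND of the bits
```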
Abstract: This study aims to investigate the possibility of crime
prevention through CCTV by analyzing the appropriateness of the
CCTV location, i.e., whether it is installed in the hotspots of crime-prone
areas, and exploring the crime prevention effect and the displacement effect.
The real crime and CCTV locations of the case city were converted into
the spatial data by using GIS. The data was analyzed by hotspot
analysis and the weighted displacement quotient (WDQ). As for study
methods, the study first analyzed existing relevant studies, based on big
data from 1800 to 2014, to identify the trends of CCTV and crime research
and to understand the relation between CCTV and crime. Second, it
investigated the current situation of nationwide CCTVs and analyzed the
guidelines for CCTV installation and operation in order to draw out the
problems and implications of CCTV use. Third, it investigated the crime
occurrence in the case areas and the current spatial situation of CCTV
installation, and analyzed the appropriateness and
effectiveness of CCTV installation to suggest a rational installation of
CCTV and the strategic direction of crime prevention. The results
demonstrate that there was no significant effect in the installation of
CCTV on crime prevention in the case area. This indicates that CCTV
should be installed and managed in a more scientific way reflecting
local crime situations. Methods of spatial analysis such as GIS, which
can evaluate the effect of CCTV installation, and methods of economic
analysis such as cost-benefit analysis should be developed. In addition,
these methods should be distributed to local governments across the
nation for the appropriate installation and operation of CCTV. This
study intended to find a design guideline for
meaningful in that it will contribute to the creation of a safe city.
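The weighted displacement quotient used above can be computed from crime counts in the treatment, buffer and control areas before and after installation, following the standard Bowers and Johnson formulation. The short Python sketch below illustrates that formula; the variable names and example counts are our own.

```python
# Sketch of the weighted displacement quotient (WDQ). A, B and C are crime
# counts in the treatment (CCTV) area, the surrounding buffer area and a
# control area, before (t0) and after (t1) installation. Example data invented.

def wdq(A_t0, A_t1, B_t0, B_t1, C_t0, C_t1):
    """Positive values suggest diffusion of benefit into the buffer zone;
    negative values suggest spatial displacement of crime."""
    displacement  = B_t1 / C_t1 - B_t0 / C_t0   # buffer change, control-adjusted
    direct_effect = A_t1 / C_t1 - A_t0 / C_t0   # treatment change, control-adjusted
    return displacement / direct_effect

# Crime fell in the CCTV area (60 -> 40) but rose in the buffer (30 -> 38):
print(round(wdq(A_t0=60, A_t1=40, B_t0=30, B_t1=38, C_t0=100, C_t1=100), 2))
# -> -0.4, i.e. part of the reduction appears displaced into the buffer
```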
Abstract: Governments collect and produce large amounts of
data. Increasingly, governments worldwide have started to implement
open data initiatives and also launch open data portals to enable the
release of these data in open and reusable formats. Therefore, a large
number of open data repositories, catalogues and portals have been
emerging in the world. The greater availability of interoperable and
linkable open government data catalyzes secondary use of such data,
so they can be used for building useful applications which leverage
their value, allow insight, provide access to government services, and
support transparency. The efficient development of successful open
data portals makes it necessary to evaluate them systematically, in order
to understand them better and assess the various types of value they
generate, and identify the required improvements for increasing this
value. Thus, the attention of this paper is directed particularly to the
field of open data portals. The main aim of this paper is to compare
the selected open data portals at the national level using content
analysis and propose a new evaluation framework, which further
improves the quality of these portals. It also establishes a set of
considerations for involving businesses and citizens in creating e-services
and applications that leverage the datasets available from
these portals.
Abstract: Large-scale data stream analysis has become one of
the important business and research priorities lately. Social networks
like Twitter and other micro-blogging platforms hold an enormous
amount of data that is large in volume, velocity and variety.
Extracting valuable information and trends out of these data would
aid in better understanding and decision-making. Multiple analysis
techniques have been deployed for English content; however, one of the
languages that produces a large amount of data over social networks,
and yet is least analyzed, is Arabic. This paper is a survey of the
research efforts to analyze Arabic content on Twitter, focusing on the
tools and methods used to extract sentiment from that content.
Abstract: This paper seeks to analyse the benefits of big data
and, more importantly, the challenges it poses to privacy and data
protection. First, the nature of big data is briefly deliberated before
presenting its potential in the present day. Afterwards, the issue of
privacy and data protection is highlighted, before the challenges of
upholding privacy and data protection in big data are discussed. In
conclusion, the paper will put forward the debate
on the adequacy of the existing legal framework in protecting
personal data in the era of big data.
Abstract: Over the past era, many efforts and studies have been carried out to develop proficient tools for performing various tasks on big data. Recently, big data has received a lot of publicity, and for good reason. Because these collections of datasets are large and complex, they are difficult to process with traditional data processing applications. This concern makes it all the more imperative to produce suitable tools for big data. The main aim of big data analytics is to apply advanced analytic techniques to very large, heterogeneous datasets, whose sizes range from terabytes to zettabytes and whose types are diverse, such as structured or unstructured and batch or streaming. Big data approaches are useful for data sets whose size or type is beyond the capability of traditional relational databases to capture, manage and process with low latency. The resulting challenges have thus led to the emergence of powerful big data tools. In this survey, a varied collection of big data tools is presented and compared with respect to their salient features.
Abstract: Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept for selecting appropriate classifiers based on the features and qualities of data sources, by comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and with Greenplum in-database analytic tools.
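The selection procedure the paper describes, training several classifiers on the same labelled corpus and picking the best per data source, can be sketched briefly. The paper's models were built in R and Greenplum; the Python analogue below, with toy data, only illustrates the comparison loop, not the paper's five classifiers.

```python
# Sketch: fit several off-the-shelf sentiment classifiers on one labelled
# corpus and compare their scores. Toy data; illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

docs = ["great product, works perfectly", "terrible, broke after a day",
        "absolutely loved this movie", "worst film I have ever seen",
        "fast shipping and nice quality", "waste of money, do not buy"]
labels = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

X = TfidfVectorizer().fit_transform(docs)
classifiers = {"logistic": LogisticRegression(),
               "naive_bayes": MultinomialNB(),
               "linear_svm": LinearSVC()}

# With a real corpus one would score on held-out data; training accuracy is
# shown only to keep the sketch self-contained.
for name, clf in classifiers.items():
    clf.fit(X, labels)
    print(name, clf.score(X, labels))
```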
Abstract: As big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze the patterns that affect the final test results using die-map-based clustering. Much research has been conducted using die data from the semiconductor test process. However, such analysis has limitations, as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than the existing die data. The study consists of three phases. In the first phase, a die map is created from the fail bit data in each sub-area of the die. In the second phase, clustering using the map data is performed. In the third phase, patterns that affect the final test result are identified. Finally, the proposed three phases are applied to actual industrial data, and the experimental results show the potential for field application.
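The three-phase framework can be sketched compactly: build a per-die fail-bit map over sub-areas, cluster the maps, then relate clusters to final test outcomes. The Python sketch below uses synthetic data and k-means as the clustering step; the grid size, cluster count and labels are assumptions, not the study's actual data or algorithm.

```python
# Sketch of the three phases with synthetic data (illustrative only).

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Phase 1: each die is a 4x4 grid of fail-bit counts, flattened to a vector.
edge_fail   = rng.poisson(5, (20, 4, 4)); edge_fail[:, :, 0] += 30      # left-edge fails
center_fail = rng.poisson(5, (20, 4, 4)); center_fail[:, 1:3, 1:3] += 30  # center fails
die_maps = np.concatenate([edge_fail, center_fail]).reshape(40, -1)

# Phase 2: cluster the die maps by fail-bit pattern.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(die_maps)

# Phase 3: relate clusters to final test outcomes (toy labels); a cluster with
# a markedly higher fail rate points to a pattern worth investigating.
final_fail = np.array([1] * 20 + [0] * 20)  # pretend edge-fail dies fail final test
for c in (0, 1):
    print(f"cluster {c}: final-test fail rate = {final_fail[clusters == c].mean():.2f}")
```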
Abstract: Big data has the potential to improve the quality of services; enable the infrastructure that businesses depend on to adapt continually and efficiently; improve the performance of employees; help organizations better understand customers; and reduce liability risks. The analytics and marketing models of fixed and mobile operators are falling short in combating churn and declining revenue per user. Big Data presents new methods to reverse this trend and improve profitability. The benefits of Big Data and next-generation networks, however, extend well beyond improved customer relationship management. Next-generation networks are in a prime position to monetize rich supplies of customer information, while being mindful of legal and privacy issues. As data assets are transformed into new revenue streams, they will become integral to high performance.
Abstract: The importance of logistics has changed enormously in the last few decades. While logistics was formerly one of the core functions of most companies, logistics, or at least parts of its functions, is nowadays outsourced to external logistics service providers under contract. As a result of this shift, new business models such as the fourth-party logistics provider have emerged, which designs, plans and monitors the resulting logistics networks. This new business model and topics such as Synchromodality or Big Data impose new requirements on the underlying IT, which cannot be met with conventional concepts and approaches.
In this paper, the challenges of logistics network monitoring are outlined using a scenario. The most common layers in a logical multilayered architecture for an information system are used to point out the arising challenges for IT. In addition, initial appropriate solution approaches are introduced.