Abstract: A data warehouse (DW) is a system which has value and role for decision-making by querying. Queries to DW are critical regarding to their complexity and length. They often access millions of tuples, and involve joins between relations and aggregations. Materialized views are able to provide the better performance for DW queries. However, these views have maintenance cost, so materialization of all views is not possible. An important challenge of DW environment is materialized view selection because we have to realize the trade-off between performance and view maintenance. Therefore, in this paper, we introduce a new approach aimed to solve this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of Multiple View Processing Plan (MVPP). Our experiments show that 2PO outperform the original algorithms in terms of query processing cost and view maintenance cost.
Abstract: Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.
Abstract: Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.
Abstract: In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases; rearranging the input unsorted array in place, resulting segments that are ordered relative to each other but whose elements are yet to be sorted. The first phase requires linear time, while, in the second phase, elements of each segment are sorted inplace in the order of z log (z), where z is the size of the segment, and O(1) auxiliary storage. The algorithm performs, in the worst case, for an array of size n, an O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements of this algorithm, it is of practical interest, because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and required number of moves are presented, along with the auxiliary storage requirements of the proposed algorithm.
Abstract: Since supply chains highly impact the financial
performance of companies, it is important to optimize and analyze
their Key Performance Indicators (KPI). The synergistic combination
of Particle Swarm Optimization (PSO) and Monte Carlo simulation is
applied to determine the optimal reorder point of warehouses in
supply chains. The goal of the optimization is the minimization of the
objective function calculated as the linear combination of holding and
order costs. The required values of service levels of the warehouses
represent non-linear constraints in the PSO. The results illustrate that
the developed stochastic simulator and optimization tool is flexible
enough to handle complex situations.
Abstract: Selecting the data modeling technique for an
information system is determined by the objective of the resultant
data model. Dimensional modeling is the preferred modeling
technique for data destined for data warehouses and data mining,
presenting data models that ease analysis and queries which are in
contrast with entity relationship modeling. The establishment of data
warehouses as components of information system landscapes in
many organizations has subsequently led to the development of
dimensional modeling. This has been significantly more developed
and reported for the commercial database management systems as
compared to the open sources thereby making it less affordable for
those in resource constrained settings. This paper presents
dimensional modeling of HIV patient information using open source
modeling tools. It aims to take advantage of the fact that the most
affected regions by the HIV virus are also heavily resource
constrained (sub-Saharan Africa) whereas having large quantities of
HIV data. Two HIV data source systems were studied to identify
appropriate dimensions and facts these were then modeled using two
open source dimensional modeling tools. Use of open source would
reduce the software costs for dimensional modeling and in turn make
data warehousing and data mining more feasible even for those in
resource constrained settings but with data available.
Abstract: Data Warehouses (DWs) are repositories which contain the unified history of an enterprise for decision support. The data must be Extracted from information sources, Transformed and integrated to be Loaded (ETL) into the DW, using ETL tools. These tools focus on data movement, where the models are only used as a means to this aim. Under a conceptual viewpoint, the authors want to innovate the ETL process in two ways: 1) to make clear compatibility between models in a declarative fashion, using correspondence assertions and 2) to identify the instances of different sources that represent the same entity in the real-world. This paper presents the overview of the proposed framework to model the ETL process, which is based on the use of a reference model and perspective schemata. This approach provides the designer with a better understanding of the semantic associated with the ETL process.
Abstract: The main aim of Supply Chain Management (SCM) is
to produce, distribute, logistics and deliver goods and equipment in
right location, right time, right amount to satisfy costumers, with
minimum time and cost waste. So implementing techniques that
reduce project time and cost, and improve productivity and
performance is very important. Emerging technologies such as the
Radio Frequency Identification (RFID) are now making it possible to
automate supply chains in a real time manner and making them more
efficient than the simple supply chain of the past for tracing and
monitoring goods and products and capturing data on movements of
goods and other events. This paper considers concepts, components
and RFID technology characteristics by concentration of warehouse
and inventories management. Additionally, utilization of RFID in the
role of improving information management in supply chain is
discussed. Finally, the facts of installation and this technology-s
results in direction with warehouse and inventory management and
business development will be presented.
Abstract: The paper gives the pilot results of the project that is
oriented on the use of data mining techniques and knowledge
discoveries from production systems through them. They have been
used in the management of these systems. The simulation models of
manufacturing systems have been developed to obtain the necessary
data about production. The authors have developed the way of
storing data obtained from the simulation models in the data
warehouse. Data mining model has been created by using specific
methods and selected techniques for defined problems of production
system management. The new knowledge has been applied to
production management system. Gained knowledge has been tested
on simulation models of the production system. An important benefit
of the project has been proposal of the new methodology. This
methodology is focused on data mining from the databases that store
operational data about the production process.
Abstract: This paper proposes a modeling methodology for the
development of data analysis solution. The Author introduce the
approach to address data warehousing issues at the at enterprise level.
The methodology covers the process of the requirements eliciting and
analysis stage as well as initial design of data warehouse. The paper
reviews extended business process model, which satisfy the needs of
data warehouse development. The Author considers that the use of
business process models is necessary, as it reflects both enterprise
information systems and business functions, which are important for
data analysis. The Described approach divides development into
three steps with different detailed elaboration of models. The
Described approach gives possibility to gather requirements and
display them to business users in easy manner.
Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor. However, there is a lack
of effective analysis tools to discover hidden relationships and trends
in data. In fact, valuable knowledge can be discovered from
application of data mining techniques in healthcare system. In this
study, a proficient methodology for the extraction of significant
patterns from the Coronary Heart Disease warehouses for heart
attack prediction, which unfortunately continues to be a leading cause
of mortality in the whole world, has been presented. For this purpose,
we propose to enumerate dynamically the optimal subsets of the
reduced features of high interest by using rough sets technique
associated to dynamic programming. Therefore, we propose to
validate the classification using Random Forest (RF) decision tree to
identify the risky heart disease cases. This work is based on a large
amount of data collected from several clinical institutions based on
the medical profile of patient. Moreover, the experts- knowledge in
this field has been taken into consideration in order to define the
disease, its risk factors, and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated based
on set of benchmark techniques applied in this classification problem.
Abstract: Current trends in manufacturing are characterized by
production broadening, innovation cycle shortening, and the products
having a new shape, material and functions. The production strategy
focused on time needed change from the traditional functional
production structure to flexible manufacturing cells and lines.
Production by automated manufacturing system (AMS) is one of the
most important manufacturing philosophies in the last years. The
main goals of the project we are involved in lies on building a
laboratory in which will be located a flexible manufacturing system
consisting of at least two production machines with NC control
(milling machines, lathe). These machines will be linked to a
transport system and they will be served by industrial robots. Within
this flexible manufacturing system a station for the quality control
consisting of a camera system and rack warehouse will be also
located. The design, analysis and improvement of this manufacturing
system, specially with a special focus on the communication among
devices constitute the main aims of this paper. The key determining
factors for the manufacturing system design are: the product, the
production volume, the used machines, the disposable manpower, the
disposable infrastructure and the legislative frame for the specific
cases.
Abstract: The radio frequency identification (RFID) is a
technology for automatic identification of items, particularly in
supply chain, but it is becoming increasingly important for industrial
applications. Unlike barcode technology that detects the optical
signals reflected from barcode labels, RFID uses radio waves to
transmit the information from an RFID tag affixed to the physical
object. In contrast to today most often use of this technology in
warehouse inventory and supply chain, the focus of this paper is an
overview of the structure of RFID systems used by RFID technology
and it also presents a solution based on the application of RFID for
brand authentication, traceability and tracking, by implementing a
production management system and extending its use to traders.
Abstract: Business rules and data warehouse are concepts and
technologies that impact a wide variety of organizational tasks. In
general, each area has evolved independently, impacting application
development and decision-making. Generating knowledge from data
warehouse is a complex process. This paper outlines an approach to
ease import of information and knowledge from a data warehouse
star schema through an inference class of business rules. The paper
utilizes the Oracle database for illustrating the working of the
concepts. The star schema structure and the business rules are stored
within a relational database. The approach is explained through a
prototype in Oracle-s PL/SQL Server Pages.
Abstract: With the extensive inclusion of document, especially
text, in the business systems, data mining does not cover the full
scope of Business Intelligence. Data mining cannot deliver its impact
on extracting useful details from the large collection of unstructured
and semi-structured written materials based on natural languages.
The most pressing issue is to draw the potential business intelligence
from text. In order to gain competitive advantages for the business, it
is necessary to develop the new powerful tool, text mining, to expand
the scope of business intelligence.
In this paper, we will work out the strong points of text mining in
extracting business intelligence from huge amount of textual
information sources within business systems. We will apply text
mining to each stage of Business Intelligence systems to prove that
text mining is the powerful tool to expand the scope of BI. After
reviewing basic definitions and some related technologies, we will
discuss the relationship and the benefits of these to text mining. Some
examples and applications of text mining will also be given. The
motivation behind is to develop new approach to effective and
efficient textual information analysis. Thus we can expand the scope
of Business Intelligence using the powerful tool, text mining.
Abstract: The Informational Infrastructures of small and medium-sized manufacturing enterprises are relatively poor, there are serious shortages of capitals which can be invested in informatization construction, computer hardware and software resources, and human resources. To address the informatization issue in small and medium-sized manufacturing enterprises, and enable them to the application of advanced management thinking and enhance their competitiveness, the paper establish a manufacturing-oriented small and medium-sized enterprises informatization platform based on the ASP business intelligence technology, which effectively improves the scientificity of enterprises decision and management informatization.
Abstract: Today's business environment requires that companies have access to highly relevant information in a matter of seconds.
Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by
star schemas. Dimensional modeling is already recognized as a
leading industry standard in the field of data warehousing although
several drawbacks and pitfalls were reported. This paper focuses on
the analysis of another data warehouse modeling technique - the
anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show
information about performance of queries executed on database
schemas structured according to principles of each database modeling
technique.
Abstract: A Data Warehouses is a repository of information
integrated from source data. Information stored in data warehouse is
the form of materialized in order to provide the better performance
for answering the queries. Deciding which appropriated views to be
materialized is one of important problem. In order to achieve this
requirement, the constructing search space close to optimal is a
necessary task. It will provide effective result for selecting view to be
materialized. In this paper we have proposed an approach to reoptimize
Multiple View Processing Plan (MVPP) by using global
common subexpressions. The merged queries which have query
processing cost not close to optimal would be rewritten. The
experiment shows that our approach can help to improve the total
query processing cost of MVPP and sum of query processing cost
and materialized view maintenance cost is reduced as well after views
are selected to be materialized.
Abstract: A data warehouse (DW) is a system which has value and role for decision-making by querying. Queries to DW are critical regarding to their complexity and length. They often access millions of tuples, and involve joins between relations and aggregations. Materialized views are able to provide the better performance for DW queries. However, these views have maintenance cost, so materialization of all views is not possible. An important challenge of DW environment is materialized view selection because we have to realize the trade-off between performance and view maintenance cost. Therefore, in this paper, we introduce a new approach aimed at solve this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of Multiple View Processing Plan (MVPP). Our experiments show that our method provides a further improvement in term of query processing cost and view maintenance cost.
Abstract: The data exchanged on the Web are of different nature
from those treated by the classical database management systems;
these data are called semi-structured data since they do not have a
regular and static structure like data found in a relational database;
their schema is dynamic and may contain missing data or types.
Therefore, the needs for developing further techniques and
algorithms to exploit and integrate such data, and extract relevant
information for the user have been raised. In this paper we present
the system OSIX (Osiris based System for Integration of XML
Sources). This system has a Data Warehouse model designed for the
integration of semi-structured data and more precisely for the
integration of XML documents. The architecture of OSIX relies on
the Osiris system, a DL-based model designed for the representation
and management of databases and knowledge bases. Osiris is a viewbased
data model whose indexing system supports semantic query
optimization. We show that the problem of query processing on a
XML source is optimized by the indexing approach proposed by
Osiris.