Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Application of Building Information Modeling in Energy Management of Individual Departments Occupying University Facilities

To assist individual departments within universities in their energy management tasks, this study explores the application of Building Information Modeling in establishing the ‘BIM based Energy Management Support System’ (BIM-EMSS). The BIM-EMSS consists of six components: (1) sensors installed for each occupant and each equipment, (2) electricity sub-meters (constantly logging lighting, HVAC, and socket electricity consumptions of each room), (3) BIM models of all rooms within individual departments’ facilities, (4) data warehouse (for storing occupancy status and logged electricity consumption data), (5) building energy management system that provides energy managers with various energy management functions, and (6) energy simulation tool (such as eQuest) that generates real time 'standard energy consumptions' data against which 'actual energy consumptions' data are compared and energy efficiency evaluated. Through the building energy management system, the energy manager is able to (a) have 3D visualization (BIM model) of each room, in which the occupancy and equipment status detected by the sensors and the electricity consumptions data logged are displayed constantly; (b) perform real time energy consumption analysis to compare the actual and standard energy consumption profiles of a space; (c) obtain energy consumption anomaly detection warnings on certain rooms so that energy management corrective actions can be further taken (data mining technique is employed to analyze the relation between space occupancy pattern with current space equipment setting to indicate an anomaly, such as when appliances turn on without occupancy); and (d) perform historical energy consumption analysis to review monthly and annually energy consumption profiles and compare them against historical energy profiles. The BIM-EMSS was further implemented in a research lab in the Department of Architecture of NTUST in Taiwan and implementation results presented to illustrate how it can be used to assist individual departments within universities in their energy management tasks.

On-line Control of the Natural and Anthropogenic Safety in Krasnoyarsk Region

This paper presents an approach of on-line control of the state of technosphere and environment objects based on the integration of Data Warehouse, OLAP and Expert systems technologies. It looks at the structure and content of data warehouse that provides consolidation and storage of monitoring data. There is a description of OLAP-models that provide a multidimensional analysis of monitoring data and dynamic analysis of principal parameters of controlled objects. The authors suggest some criteria of emergency risk assessment using expert knowledge about danger levels. It is demonstrated now some of the proposed solutions could be adopted in territorial decision making support systems. Operational control allows authorities to detect threat, prevent natural and anthropogenic emergencies and ensure a comprehensive safety of territory.

A Review: Comparative Study of Diverse Collection of Data Mining Tools

There have been a lot of efforts and researches undertaken in developing efficient tools for performing several tasks in data mining. Due to the massive amount of information embedded in huge data warehouses maintained in several domains, the extraction of meaningful pattern is no longer feasible. This issue turns to be more obligatory for developing several tools in data mining. Furthermore the major aspire of data mining software is to build a resourceful predictive or descriptive model for handling large amount of information more efficiently and user friendly. Data mining mainly contracts with excessive collection of data that inflicts huge rigorous computational constraints. These out coming challenges lead to the emergence of powerful data mining technologies. In this survey a diverse collection of data mining tools are exemplified and also contrasted with the salient features and performance behavior of each tool.

Coalescing Data Marts

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

A Conceptual Query-Driven Design Framework for Data Warehouse

Data warehouse is a dedicated database used for querying and reporting. Queries in this environment show special characteristics such as multidimensionality and aggregation. Exploiting the nature of queries, in this paper we propose a query driven design framework. The proposed framework is general and allows a designer to generate a schema based on a set of queries.

Evaluation of Risk Attributes Driven by Periodically Changing System Functionality

Modeling of the distributed systems allows us to represent the whole its functionality. The working system instance rarely fulfils the whole functionality represented by model; usually some parts of this functionality should be accessible periodically. The reporting system based on the Data Warehouse concept seams to be an intuitive example of the system that some of its functionality is required only from time to time. Analyzing an enterprise risk associated with the periodical change of the system functionality, we should consider not only the inaccessibility of the components (object) but also their functions (methods), and the impact of such a situation on the system functionality from the business point of view. In the paper we suggest that the risk attributes should be estimated from risk attributes specified at the requirements level (Use Case in the UML model) on the base of the information about the structure of the model (presented at other levels of the UML model). We argue that it is desirable to consider the influence of periodical changes in requirements on the enterprise risk estimation. Finally, the proposition of such a solution basing on the UML system model is presented.

The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making

Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.

A Data Warehouse System to Help Assist Breast Cancer Screening in Diagnosis, Education and Research

Early detection of breast cancer is considered as a major public health issue. Breast cancer screening is not generalized to the entire population due to a lack of resources, staff and appropriate tools. Systematic screening can result in a volume of data which can not be managed by present computer architecture, either in terms of storage capabilities or in terms of exploitation tools. We propose in this paper to design and develop a data warehouse system in radiology-senology (DWRS). The aim of such a system is on one hand, to support this important volume of information providing from multiple sources of data and images and for the other hand, to help assist breast cancer screening in diagnosis, education and research.

Application of Exact String Matching Algorithms towards SMILES Representation of Chemical Structure

Bioinformatics and Cheminformatics use computer as disciplines providing tools for acquisition, storage, processing, analysis, integrate data and for the development of potential applications of biological and chemical data. A chemical database is one of the databases that exclusively designed to store chemical information. NMRShiftDB is one of the main databases that used to represent the chemical structures in 2D or 3D structures. SMILES format is one of many ways to write a chemical structure in a linear format. In this study we extracted Antimicrobial Structures in SMILES format from NMRShiftDB and stored it in our Local Data Warehouse with its corresponding information. Additionally, we developed a searching tool that would response to user-s query using the JME Editor tool that allows user to draw or edit molecules and converts the drawn structure into SMILES format. We applied Quick Search algorithm to search for Antimicrobial Structures in our Local Data Ware House.

Improved Data Warehousing: Lessons Learnt from the Systems Approach

Data warehousing success is not high enough. User dissatisfaction and failure to adhere to time frames and budgets are too common. Most traditional information systems practices are rooted in hard systems thinking. Today, the great systems thinkers are forgotten by information systems developers. A data warehouse is still a system and it is worth investigating whether systems thinkers such as Churchman can enhance our practices today. This paper investigates data warehouse development practices from a systems thinking perspective. An empirical investigation is done in order to understand the everyday practices of data warehousing professionals from a systems perspective. The paper presents a model for the application of Churchman-s systems approach in data warehouse development.

Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse

A data warehouse (DW) is a system which has value and role for decision-making by querying. Queries to DW are critical regarding to their complexity and length. They often access millions of tuples, and involve joins between relations and aggregations. Materialized views are able to provide the better performance for DW queries. However, these views have maintenance cost, so materialization of all views is not possible. An important challenge of DW environment is materialized view selection because we have to realize the trade-off between performance and view maintenance. Therefore, in this paper, we introduce a new approach aimed to solve this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of Multiple View Processing Plan (MVPP). Our experiments show that 2PO outperform the original algorithms in terms of query processing cost and view maintenance cost.

Issues and Architecture for Supporting Data Warehouse Queries in Web Portals

Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.

Decision Support System Based on Data Warehouse

Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.

A Novel In-Place Sorting Algorithm with O(n log z) Comparisons and O(n log z) Moves

In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases; rearranging the input unsorted array in place, resulting segments that are ordered relative to each other but whose elements are yet to be sorted. The first phase requires linear time, while, in the second phase, elements of each segment are sorted inplace in the order of z log (z), where z is the size of the segment, and O(1) auxiliary storage. The algorithm performs, in the worst case, for an array of size n, an O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements of this algorithm, it is of practical interest, because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and required number of moves are presented, along with the auxiliary storage requirements of the proposed algorithm.

Dimensional Modeling of HIV Data Using Open Source

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Using Perspective Schemata to Model the ETL Process

Data Warehouses (DWs) are repositories which contain the unified history of an enterprise for decision support. The data must be Extracted from information sources, Transformed and integrated to be Loaded (ETL) into the DW, using ETL tools. These tools focus on data movement, where the models are only used as a means to this aim. Under a conceptual viewpoint, the authors want to innovate the ETL process in two ways: 1) to make clear compatibility between models in a declarative fashion, using correspondence assertions and 2) to identify the instances of different sources that represent the same entity in the real-world. This paper presents the overview of the proposed framework to model the ETL process, which is based on the use of a reference model and perspective schemata. This approach provides the designer with a better understanding of the semantic associated with the ETL process.

Towards Development of Solution for Business Process-Oriented Data Analysis

This paper proposes a modeling methodology for the development of data analysis solution. The Author introduce the approach to address data warehousing issues at the at enterprise level. The methodology covers the process of the requirements eliciting and analysis stage as well as initial design of data warehouse. The paper reviews extended business process model, which satisfy the needs of data warehouse development. The Author considers that the use of business process models is necessary, as it reflects both enterprise information systems and business functions, which are important for data analysis. The Described approach divides development into three steps with different detailed elaboration of models. The Described approach gives possibility to gather requirements and display them to business users in easy manner.

Business Rules for Data Warehouse

Business rules and data warehouse are concepts and technologies that impact a wide variety of organizational tasks. In general, each area has evolved independently, impacting application development and decision-making. Generating knowledge from data warehouse is a complex process. This paper outlines an approach to ease import of information and knowledge from a data warehouse star schema through an inference class of business rules. The paper utilizes the Oracle database for illustrating the working of the concepts. The star schema structure and the business rules are stored within a relational database. The approach is explained through a prototype in Oracle-s PL/SQL Server Pages.