Virtual Science Hub: An Open Source Platform to Enrich Science Teaching

This paper presents the Virtual Science Hub platform. It is an open source platform that combines a social network, an e-learning authoring tool, a videoconference service and a learning object repository for science teaching enrichment. These four main functionalities fit very well together. The platform was released in April 2012 and since then it has not stopped growing. Finally we present the results of the surveys conducted and the statistics gathered to validate this approach.

Adaptive WiFi Fingerprinting for Location Approximation

WiFi has become an essential technology that is widely used nowadays. It is famous due to its convenience to be used with mobile devices. This is especially true for Internet users worldwide that use WiFi connections. There are many location based services that are available nowadays which uses Wireless Fidelity (WiFi) signal fingerprinting. A common example that is gaining popularity in this era would be Foursquare. In this work, the WiFi signal would be used to estimate the user or client’s location. Similar to GPS, fingerprinting method needs a floor plan to increase the accuracy of location estimation. Still, the factor of inconsistent WiFi signal makes the estimation defer at different time intervals. Given so, an adaptive method is needed to obtain the most accurate signal at all times. WiFi signals are heavily distorted by external factors such as physical objects, radio frequency interference, electrical interference, and environmental factors to name a few. Due to these factors, this work uses a method of reducing the signal noise and estimation using the Nearest Neighbour based on past activities of the signal to increase the signal accuracy up to more than 80%. The repository yet increases the accuracy by using Artificial Neural Network (ANN) pattern matching. The repository acts as the server cum support of the client side application decision. Numerous previous works has adapted the methods of collecting signal strengths in the repository over the years, but mostly were just static. In this work, proposed solutions on how the adaptive method is done to match the signal received to the data in the repository are highlighted. With the said approach, location estimation can be done more accurately. Adaptive update allows the latest location fingerprint to be stored in the repository. Furthermore, any redundant location fingerprints are removed and only the updated version of the fingerprint is stored in the repository. How the location estimation of the user can be predicted would be highlighted more in the proposed solution section. After some studies on previous works, it is found that the Artificial Neural Network is the most feasible method to deploy in updating the repository and making it adaptive. The Artificial Neural Network functions are to do the pattern matching of the WiFi signal to the existing data available in the repository.

A Distance Function for Data with Missing Values and Its Application

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our  experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Development of Knowledge Portal using Open Source Tools: A Case Study of FIIT, UNISEL

Knowledge sharing culture contributes to a positive working environment. Currently, there is no platform for the Faculty of Industrial Information Technology (FIIT), Unisel academic staff to share knowledge among them. As it is done manually, the sharing process is through common meeting or by any offline discussions. There is no repository for future retrieval. However, with open source solution the development of knowledge based application may reduce the cost tremendously. In this paper we discuss about the domain on which this knowledge portal is being developed and also the deployment of open source tools such as JOOMLA, PHP programming language and MySQL. This knowledge portal is evidence that open source tools also reliable in developing knowledge based portal. These recommendations will be useful to the open source community to produce more open source products in future.

Comparative Evaluation of Color-Based Video Signatures in the Presence of Various Distortion Types

The robustness of color-based signatures in the presence of a selection of representative distortions is investigated. Considered are five signatures that have been developed and evaluated within a new modular framework. Two signatures presented in this work are directly derived from histograms gathered from video frames. The other three signatures are based on temporal information by computing difference histograms between adjacent frames. In order to obtain objective and reproducible results, the evaluations are conducted based on several randomly assembled test sets. These test sets are extracted from a video repository that contains a wide range of broadcast content including documentaries, sports, news, movies, etc. Overall, the experimental results show the adequacy of color-histogram-based signatures for video fingerprinting applications and indicate which type of signature should be preferred in the presence of certain distortions.

Exploiting Self-Adaptive Replication Management on Decentralized Tuple Space

Decentralized Tuple Space (DTS) implements tuple space model among a series of decentralized hosts and provides the logical global shared tuple repository. Replication has been introduced to promote performance problem incurred by remote tuple access. In this paper, we propose a replication approach of DTS allowing replication policies self-adapting. The accesses from users or other nodes are monitored and collected to contribute the decision making. The replication policy may be changed if the better performance is expected. The experiments show that this approach suitably adjusts the replication policies, which brings negligible overhead.

Modeling of Reusability of Object Oriented Software System

Automatic reusability appraisal is helpful in evaluating the quality of developed or developing reusable software components and in identification of reusable components from existing legacy systems; that can save cost of developing the software from scratch. But the issue of how to identify reusable components from existing systems has remained relatively unexplored. In this research work, structural attributes of software components are explored using software metrics and quality of the software is inferred by different Neural Network based approaches, taking the metric values as input. The calculated reusability value enables to identify a good quality code automatically. It is found that the reusability value determined is close to the manual analysis used to be performed by the programmers or repository managers. So, the developed system can be used to enhance the productivity and quality of software development.

The Variation of Software Development Productivity 1995-2005

Software development has experienced remarkable progress in the past decade. However, due to the rising complexity and magnitude of the project the development productivity has not been consistently improved. By analyzing the latest ISBSG data repository with 4106 projects, we discovered that software development productivity has actually undergone irregular variations between the years 1995 and 2005. Considering the factors significant to the productivity, we found its variations are primarily caused by the variations of average team size and the unbalanced uses of the less productive language 3GL.

The Factors Significant to Software Development Productivity

The past decade has seen enormous growth in the amount of software produced. However, given the ever increasing complexity of the software being developed and the concomitant rise in the typical project size, managers are becoming increasingly aware of the importance of issues that influence the productivity levels of the project teams involved. By analyzing the latest release of ISBSG data repository, we report on the factors found to significantly influence the productivity among which average team size and language type are the two most essential ones. Building on this we present an original model for evaluating the potential productivity during the project planning stage.

Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Clustering categorical data is more complicated than the numerical clustering because of its special properties. Scalability and memory constraint is the challenging problem in clustering large data set. This paper presents an incremental algorithm to cluster the categorical data. Frequencies of attribute values contribute much in clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and its cardinalities. The proposed measures and the algorithm are experimented with the data sets from UCI data repository. Results prove that the proposed method generates better clusters than the existing one.

Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.

The Development of a Narrative Management System: Storytelling in Knowledge Management

This paper presents a narrative management system for organizations to capture organization's tacit knowledge through stories. The intention of capturing tacit knowledge is to address the problem that comes with the mobility of workforce in organisation. Storytelling in knowledge management context is seen as a powerful management tool to communicate tacit knowledge in organization. This narrative management system is developed firstly to enable uploading of many types of knowledge sharing stories, from general to work related-specific stories and secondly, each video has comment functionality where knowledge users can post comments to other knowledge users. The narrative management system allows the stories to browse, search and view by the users. In the system, stories are stored in a video repository. Stories that were produced from this framework will improve learning, knowledge transfer facilitation and tacit knowledge quality in an organization.

Semantic Web as an Enabling Technology for Better e-Services Addoption

E-services have significantly changed the way of doing business in recent years. We can, however, observe poor use of these services. There is a large gap between supply and actual eservices usage. This is why we started a project to provide an environment that will encourage the use of e-services. We believe that only providing e-service does not automatically mean consumers would use them. This paper shows the origins of our project and its current position. We discuss the decision of using semantic web technologies and their potential to improve e-services usage. We also present current knowledge base and its real-world classification. In the paper, we discuss further work to be done in the project. Current state of the project is promising.

Analysis of a Population of Diabetic Patients Databases with Classifiers

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Policy Management Framework for Managing Enterprise Policies

Policy management in organizations became rising issue in the last decade. It’s because of today’s regulatory requirements in the organizations. To manage policies in large organizations is an imperative work. However, major challenges facing organizations in the last decade is managing all the policies in the organization and making them an active documents rather than simple (inactive) documents stored in computer hard drive or on a shelf. Because of this challenge, organizations need policy management program. This policy management program can be either manual or automated. This paper presents suggestions towards managing policies in organizations. As well as possible policy management solution or program to be utilized, manual or automated. The research first examines the models and frameworks used for managing policies from various perspectives in the literature of the research area/domain. At the end of this paper, a policy management framework is proposed for managing enterprise policies effectively and in a simplified manner.

An Investigation on the Variation of Software Development Productivity

The productivity of software development is one of the major concerns for project managers. Given the increasing complexity of the software being developed and the concomitant rise in the typical project size, the productivity has not consistently improved. By analyzing the latest release of ISBSG data repository with 4106 projects ever developed, we report on the factors found to significantly influence productivity, and present an original model for the estimation of productivity during project design. We further illustrate that software development productivity has experienced irregular variations between the years 1995 and 2005. Considering the factors significant to productivity, we found its variations are primarily caused by the variations of average team size for the development and the unbalanced use of the less productive development language 3GL.

The Design of the HL7 RIM-based Sharing Components for Clinical Information Systems

The American Health Level Seven (HL7) Reference Information Model (RIM) consists of six back-bone classes that have different specialized attributes. Furthermore, for the purpose of enforcing the semantic expression, there are some specific mandatory vocabulary domains have been defined for representing the content values of some attributes. In the light of the fact that it is a duplicated effort on spending a lot of time and human cost to develop and modify Clinical Information Systems (CIS) for most hospitals due to the variety of workflows. This study attempts to design and develop sharing RIM-based components of the CIS for the different business processes. Therefore, the CIS contains data of a consistent format and type. The programmers can do transactions with the RIM-based clinical repository by the sharing RIM-based components. And when developing functions of the CIS, the sharing components also can be adopted in the system. These components not only satisfy physicians- needs in using a CIS but also reduce the time of developing new components of a system. All in all, this study provides a new viewpoint that integrating the data and functions with the business processes, it is an easy and flexible approach to build a new CIS.

Unsupervised Feature Selection Using Feature Density Functions

Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. In this paper, we propose a new unsupervised feature selection method which will remove redundant features from the original feature space by the use of probability density functions of various features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several datasets derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both classification accuracy and the number of selected features.

Multi-Agent Systems for Intelligent Clustering

Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.

A Specification-Based Approach for Retrieval of Reusable Business Component for Software Reuse

Software reuse can be considered as the most realistic and promising way to improve software engineering productivity and quality. Automated assistance for software reuse involves the representation, classification, retrieval and adaptation of components. The representation and retrieval of components are important to software reuse in Component-Based on Software Development (CBSD). However, current industrial component models mainly focus on the implement techniques and ignore the semantic information about component, so it is difficult to retrieve the components that satisfy user-s requirements. This paper presents a method of business component retrieval based on specification matching to solve the software reuse of enterprise information system. First, a business component model oriented reuse is proposed. In our model, the business data type is represented as sign data type based on XML, which can express the variable business data type that can describe the variety of business operations. Based on this model, we propose specification match relationships in two levels: business operation level and business component level. In business operation level, we use input business data types, output business data types and the taxonomy of business operations evaluate the similarity between business operations. In the business component level, we propose five specification matches between business components. To retrieval reusable business components, we propose the measure of similarity degrees to calculate the similarities between business components. Finally, a business component retrieval command like SQL is proposed to help user to retrieve approximate business components from component repository.