Forecasting Fraudulent Financial Statements using Data Mining

This paper explores the effectiveness of machine learning techniques in detecting firms that issue fraudulent financial statements (FFS) and identifies factors associated with FFS. To this end, a number of experiments were conducted using representative learning algorithms, which were trained on a data set of 164 fraud and non-fraud Greek firms from the period 2001-2002. Deciding which particular method to choose is a complicated problem. A good alternative to choosing only one method is to create a hybrid forecasting system that incorporates a number of possible solution methods as components (an ensemble of classifiers). For this purpose, we have implemented a hybrid decision support system that combines the representative algorithms using a stacking variant methodology and achieves better performance than any of the simple and ensemble methods examined. To sum up, this study indicates that the investigation of financial information can be used in the identification of FFS and underlines the importance of financial ratios.

An Integrative Bayesian Approach to Supporting the Prediction of Protein-Protein Interactions: A Case Study in Human Heart Failure

Recent years have seen a growing trend towards the integration of multiple information sources to support large-scale prediction of protein-protein interaction (PPI) networks in model organisms. Despite advances in computational approaches, the combination of multiple “omic” datasets representing the same type of data, e.g. different gene expression datasets, has not been rigorously studied. Furthermore, there is a need to further investigate the inference capability of powerful approaches, such as fully connected Bayesian networks, in the context of the prediction of PPI networks. This paper addresses these limitations by proposing a Bayesian approach to integrate multiple datasets, some of which encode the same type of “omic” data, to support the identification of PPI networks. The case study reported involved the combination of three gene expression datasets relevant to human heart failure (HF). In comparison with two traditional methods, the Naive Bayesian and maximum likelihood ratio approaches, the proposed technique can accurately identify known PPIs and can be applied to infer potentially novel interactions.
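For orientation, the Naive Bayesian baseline named above is commonly formulated as follows: under a conditional-independence assumption, the posterior odds of an interaction equal the prior odds multiplied by the likelihood ratio contributed by each dataset. The sketch below illustrates only that baseline, not the proposed integrative approach. The function names, the prior odds and the likelihood ratios are hypothetical values chosen for illustration.

```python
import math

def naive_bayes_posterior_odds(prior_odds, likelihood_ratios):
    """Combine per-dataset likelihood ratios, assuming the datasets are
    conditionally independent given the interaction status."""
    # Work in log space so that many small ratios do not underflow.
    log_odds = math.log(prior_odds)
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return math.exp(log_odds)

def odds_to_probability(odds):
    """Convert odds in favour of an interaction to a probability."""
    return odds / (1.0 + odds)

# Hypothetical inputs: prior odds of 1:600 that a random protein pair
# interacts, and one likelihood ratio per gene expression dataset.
prior = 1 / 600
ratios = [10.0, 6.0, 20.0]

odds = naive_bayes_posterior_odds(prior, ratios)
probability = odds_to_probability(odds)
print(f"posterior odds = {odds:.3f}, probability = {probability:.3f}")
```

The independence assumption is exactly what becomes questionable when several datasets encode the same type of data, which is the situation the abstract sets out to address.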

The Development of Decision Support System for Waste Management; a Review

Most Decision Support Systems (DSS) constructed for waste management (WM) are not widely marketed and lack practical applications. This is due to the number of variables and the complexity of the mathematical models, which include the assumptions and constraints required in decision making. The approach taken by many researchers in DSS modelling is to isolate a few key factors that have a significant influence on the DSS. This segmented approach does not provide a thorough understanding of the complex relationships among the many elements involved. The various elements in constructing the DSS must be integrated and optimized in order to produce a viable model that is marketable and has practical application. A DSS model used to assist decision makers should be integrated with GIS, be able to give robust predictions despite the inherent uncertainties of waste generation and the plethora of waste characteristics, and give an optimal allocation of the waste stream among recycling, incineration, landfill and composting.

Emotional Intelligence and Retention: The Moderating Role of Job Involvement

The main aim of the current study was to examine the effect of emotional intelligence on retention. The study also aimed to analyze the role of job involvement as a moderator of the effect of emotional intelligence on retention. Emotional intelligence, job involvement and retention were measured using data gathered from 241 employees working for hotel and tourism corporations listed on the Amman Stock Exchange in Jordan. Hierarchical regression analyses were used to test the three main hypotheses. Results indicated that retention was related to emotional intelligence. Moreover, the study yielded support for the claim that job involvement had a moderating effect on the relationship between emotional intelligence and retention.

Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set

The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for SVDD to be applicable in real-world applications, ease of use is crucial. The results of SVDD are strongly determined by the choice of the regularisation parameter C and the kernel parameter of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD based solely on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.

Further Investigations on Higher Mathematics Scores for Chinese University Students

Recently, X. Ge and J. Qian investigated some relations between higher mathematics scores and calculus scores (resp. linear algebra scores, probability statistics scores) for Chinese university students. Based on rough-set theory, they established an information system S = (U, C ∪ D, V, f). In this information system, the higher mathematics score was taken as the decision attribute, and the calculus score, linear algebra score and probability statistics score were taken as condition attributes. They investigated the importance of each condition attribute with respect to the decision attribute and the strength with which each condition attribute supports the decision attribute. In this paper, we investigate this issue further. Based on the above information system S = (U, C ∪ D, V, f), we analyze the decision rules between condition and decision granules. For each x ∈ U, we obtain the support (resp. strength, certainty factor, coverage factor) of the decision rule C →_x D, where C →_x D is the decision rule induced by x in S = (U, C ∪ D, V, f). The results of this paper give a new analysis of higher mathematics scores for Chinese university students, which can further help Chinese university students to raise their higher mathematics scores in the Chinese graduate student entrance examination.
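The four quantities attached to a decision rule can be illustrated with a small sketch. It assumes the standard rough-set definitions: with C(x) and D(x) denoting the condition and decision granules of x, the support is |C(x) ∩ D(x)|, the strength is the support divided by |U|, the certainty factor is the support divided by |C(x)|, and the coverage factor is the support divided by |D(x)|. The toy table, the attribute names and the discretised score levels are hypothetical and are not data from the paper.

```python
from fractions import Fraction

def rule_measures(universe, condition_attrs, decision_attrs, x):
    """Support, strength, certainty and coverage of the rule induced by x."""
    def granule(attrs):
        # All objects that agree with x on every attribute in attrs.
        key = tuple(universe[x][a] for a in attrs)
        return {u for u, row in universe.items()
                if tuple(row[a] for a in attrs) == key}

    c_x = granule(condition_attrs)   # condition granule C(x)
    d_x = granule(decision_attrs)    # decision granule D(x)
    support = len(c_x & d_x)
    return {
        "support": support,
        "strength": Fraction(support, len(universe)),
        "certainty": Fraction(support, len(c_x)),
        "coverage": Fraction(support, len(d_x)),
    }

# Hypothetical decision table with discretised scores.
students = {
    "s1": {"calculus": "high", "algebra": "high", "probability": "high", "higher_math": "high"},
    "s2": {"calculus": "high", "algebra": "high", "probability": "high", "higher_math": "high"},
    "s3": {"calculus": "high", "algebra": "high", "probability": "high", "higher_math": "low"},
    "s4": {"calculus": "low", "algebra": "low", "probability": "high", "higher_math": "low"},
    "s5": {"calculus": "low", "algebra": "high", "probability": "low", "higher_math": "high"},
}
C = ["calculus", "algebra", "probability"]
D = ["higher_math"]

measures = rule_measures(students, C, D, "s1")
print(measures)
```

For the rule induced by s1, three students share its condition values and three share its decision value, with two students in both granules, so the certainty and coverage factors are both 2/3 and the strength is 2/5.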

Knowledge Management in Cross-Organizational Networks as Illustrated by One of the Largest European ICT Associations: A Case Study of the “METORA

In networks, it is mainly small and medium-sized businesses that benefit from the knowledge, experience and solutions offered by experts from industry and science, or from the exchange with practitioners. Associations which focus, among other things, on networking, information and knowledge transfer and which are interested in supporting such cooperation are especially well suited to provide such networks and the appropriate web platforms. Using METORA as an example – a project developed and run by the Federal Association for Information Economy, Telecommunications and New Media e.V. (BITKOM) for the Federal Ministry of Economics and Technology (BMWi) – this paper discusses how associations and other network organizations can achieve this task and what conditions they have to consider.

Automatic Choice of Topics for Seminars by Clustering Students According to Their Profile

The new framework in which higher education is immersed involves a complete change in the way lecturers must teach and students must learn. Whereas the lecturer was the main character in traditional education, the essential goal now is to increase the students' participation in the process. Thus, one of the main tasks of lecturers in this new context is to design activities of different kinds in order to encourage such participation. Seminars are one of the activities included in this environment. They are active sessions that make it possible to explore specific topics in depth in support of other activities. They are characterized by features such as favoring interaction between students and lecturers and improving the students' communication skills. Hence, planning and organizing strategic seminars that help students acquire knowledge and abilities is a great challenge for lecturers. This paper proposes a method that uses Artificial Intelligence techniques to obtain student profiles from their marks and preferences. The goal of building such profiles is twofold. First, it facilitates the task of splitting the students into different groups, each with similar preferences and learning difficulties. Second, it makes it easy to select suitable candidate topics for the seminars. The results obtained can either confirm what the lecturers observed during the course or provide a clue for reconsidering methodological strategies in certain topics.

An Anomaly Detection Approach to Detect Unexpected Faults in Recordings from Test Drives

In the automotive industry, test drives are conducted during the development of new vehicle models or as part of the quality assurance of series-production vehicles. The communication on the in-vehicle network, data from external sensors, and internal data from the electronic control units are recorded by automotive data loggers during the test drives. The recordings are used for fault analysis. Since the resulting data volume is tremendous, manually analysing each recording in great detail is not feasible. This paper proposes to use machine learning to support domain experts by sparing them from examining irrelevant data and instead pointing them to the relevant parts of the recordings. The underlying idea is to learn the normal behaviour from available recordings, i.e. a training set, and then to autonomously detect unexpected deviations and report them as anomalies. The one-class support vector machine “support vector data description” is utilised to calculate distances of feature vectors. SVDDSUBSEQ is proposed as a novel approach that allows subsequences in multivariate time series data to be classified. The approach makes it possible to detect unexpected faults without modelling effort, as is shown with experimental results on recordings from test drives.

The Implementation of Good Manufacturing Practice in Polycarbonate Film Industry

This study reports the implementation of Good Manufacturing Practice (GMP) in a polycarbonate film processing plant. The implementation of GMP began with the creation of a multidisciplinary team. It was carried out in four steps: conduct a gap assessment, create a gap closure plan, close the gaps, and follow up the GMP implementation. The basis for the gap assessment is the guideline for GMP for plastic materials and articles intended for Food Contact Material (FCM), which was edited by Plastic Europe. The results of the GMP implementation in this study showed 100% completion of the gap assessment. The key success factors for implementing GMP in the production process are the commitment, intention and support of top management.

An Automation of Check Focusing on CRUD for Requirements Analysis Model in UML

A key to the success of high-quality software development is to define a valid and feasible requirements specification. We have proposed a method of model-driven requirements analysis using the Unified Modeling Language (UML). The main feature of our method is to automatically generate a Web user interface mock-up from a UML requirements analysis model so that we can confirm the validity of the input/output data for each page and of the page transitions in the system by directly operating the mock-up. This paper proposes a support method to check the validity of a data life cycle by using the model checking tool “UPPAAL”, focusing on CRUD (Create, Read, Update and Delete). Exhaustive checking improves the quality of requirements analysis models, which are validated by the customers through the automatically generated mock-up. The effectiveness of our method is discussed through a case study of requirements modeling for two small projects: a library management system and a sales support system for textbooks in a university.

On the Performance of Information Criteria in Latent Segment Models

Despite the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segment structures, and it is desirable that the selection criteria used for this purpose are effective. In order to select among several information criteria that may support the selection of the correct number of segments, we conduct a simulation study. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of the segmentation variables and the performance of eleven information criteria. The criterion AIC3 shows better performance (it indicates the correct number of segments in the simulated structure more often) for mixtures of multinomial segmentation base variables.
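To make the selection step concrete, the sketch below computes four widely used information criteria from a fitted model's log-likelihood and picks the number of segments that minimises a chosen criterion. It assumes the usual definitions (AIC = -2 log L + 2k, AIC3 = -2 log L + 3k, BIC = -2 log L + k ln n, CAIC = -2 log L + k (ln n + 1)) and the usual parameter count for a latent class model with categorical variables. The abstract does not list its eleven criteria, so these four are only examples, and the log-likelihood values, sample size and category counts are hypothetical.

```python
import math

def n_free_parameters(n_segments, categories_per_variable):
    """Free parameters of a latent class model: the mixing proportions
    plus, for each segment, the multinomial probabilities of each variable."""
    per_segment = sum(c - 1 for c in categories_per_variable)
    return (n_segments - 1) + n_segments * per_segment

def information_criteria(log_likelihood, k, n):
    """Return several penalised-likelihood criteria (smaller is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * k,
        "AIC3": deviance + 3 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
    }

def select_n_segments(log_likelihoods, n, categories, criterion="AIC3"):
    """Pick the number of segments that minimises the given criterion."""
    scores = {
        s: information_criteria(ll, n_free_parameters(s, categories), n)[criterion]
        for s, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)

# Hypothetical maximised log-likelihoods for models with 1 to 4 segments,
# fitted to n = 500 cases on four categorical variables.
fits = {1: -2100.0, 2: -1980.0, 3: -1950.0, 4: -1945.0}
categories = [3, 3, 2, 2]

best = select_n_segments(fits, n=500, categories=categories, criterion="AIC3")
print("segments selected by AIC3:", best)
```

With these made-up numbers, the small gain in log-likelihood from three to four segments does not offset the penalty for seven additional parameters, so the three-segment model is selected.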

A Training Model for Successful Implementation of Enterprise Resource Planning

It is well recognized that one feature of a successful company is its ability to align its business goals with its information and communication technology platform. Enterprise Resource Planning (ERP) systems contribute to better performance by integrating various business functions and providing support for information flows. However, the complexity of these technological systems is known to prevent business users from exploiting ERP systems efficiently. This paper aims to investigate the role of training in improving the usage of ERP systems. To this end, we designed a survey instrument and administered it to employees of a Norwegian multinational global provider of technology solutions. Based on the analysis of the collected data, we have delineated a training model that could be of high relevance for both researchers and practitioners as a step towards a better understanding of ERP system implementation.

Injuries Related to Kitesurfing

Participation in sporting activities can lead to injury. Sport injuries have been widely studied in many sports, including the more extreme categories of aquatic board sports. Kitesurfing is a relatively new water surface action sport and has not yet been widely studied in terms of injuries and stress on the body. The aim of this study was to find out which injuries are most common among kitesurfing participants, where they occur, and what causes them. Injuries were studied using an international open web questionnaire (n=206). The results showed that many respondents reported injuries, in total 251 injuries, to the knee (24%), ankle (17%), trunk (16%) and shoulders (10%), often sustained while doing jumps and tricks (40%). Among the reported injuries were joint injuries (n=101), muscle/tendon damage (n=47), wounds and cuts (n=36) and bone fractures (n=28). Environmental factors and equipment can also influence the risk of injury, or the extent of injury in a hazardous situation. In conclusion, the information from this retrospective study supports earlier studies in terms of the prevalence and site of injuries. This information should be used to build a foundation of knowledge about the sport for the development of physical training applications and for product development.

Structural Funds of Polish Agriculture

The research objective of the project and article “The impact of Structural Funds on the growth of competitiveness of Polish agriculture" is to assess competitiveness of regions in Poland from the perspective of Polish agriculture by analysing the efficiency of the use of Structural Funds, the economic procedure of their distribution and the regulatory and organisational framework under the Rural Development Programme (RDP). It must be stressed that defining the scope of research in the above manner limits the analysis only to the part of Structural Funds directed to support Polish agriculture.

On Methodologies for Analysing Sickness Absence Data: An Insight into a New Method

Sickness absence represents a major economic and social issue. Analysis of sick leave data is a recurrent challenge to analysts because of the complexity of the data structure, which is often time dependent, highly skewed and clumped at zero. Statistical inference that ignores these features is likely to be inefficient and misguided. Traditional approaches do not address these problems. In this study, we discuss model methodologies in terms of statistical techniques for addressing the difficulties with sick leave data. We also introduce and demonstrate a new method by performing a longitudinal assessment of long-term absenteeism, using as a working example a large registration dataset available from the Helsinki Health Study for municipal employees in Finland during the period 1990-1999. We present a comparative study on model selection and a critical analysis of the temporal trends, occurrence and degree of long-term sickness absences among municipal employees. The strengths of this working example include the large sample size over a long follow-up period, which provides strong evidence in support of the new model. Our main goal is to propose a way to select an appropriate model and to introduce a new methodology for analysing sickness absence data, as well as to demonstrate the model's applicability to complicated longitudinal data.

A Study on the Differential Diagnostic Model for Newborn Hearing Loss Screening

According to the statistics, the prevalence of congenital hearing loss in Taiwan is approximately six per thousand; furthermore, one in a thousand infants has severe hearing impairment. Hearing ability during infancy has a significant impact on the future development of children's oral expression, language maturity, cognitive performance, educational ability and social behavior. Although most children born with hearing impairment have sensorineural hearing loss, almost every child still retains some residual hearing. If provided in time with a hearing aid or cochlear implant (a bionic ear) in addition to hearing and speech training, even severely hearing-impaired children can still learn to talk. On the other hand, those who are not diagnosed, and are thus unable to begin hearing and speech rehabilitation in a timely manner, might lose an important opportunity to live a complete and healthy life. Eventually, the lack of hearing and speaking ability will affect the development of both mental and physical functions, intelligence, and social adaptability. This problem will not only cause lifelong, irreparable harm to the hearing-impaired child, but also create a heavy burden for the family and society. Therefore, it is necessary to establish a computer-assisted predictive model that can accurately detect and help diagnose newborn hearing loss, so that early interventions can be provided in time and waste of medical resources can be avoided. This study uses records from the neonatal database of the case hospital and compares two analysis methods: using a support vector machine (SVM) alone for prediction, and using logistic regression to screen factors before prediction with the SVM. The results indicate that prediction accuracy is as high as 96.43% when the factors are screened and selected through logistic regression. Hence, the model constructed in this study can be of real help to physicians in clinical diagnosis and beneficial to early intervention for newborn hearing impairment.

Weka Based Desktop Data Mining as Web Service

Data mining is the process of sifting through large volumes of data, analyzing the data from different perspectives and summarizing it into useful information. One of the widely used desktop applications for data mining is the Weka tool, a collection of machine learning algorithms implemented in Java and released as open source under the GNU General Public License (GPL). A web service is a software system designed to support interoperable machine-to-machine interaction over a network using SOAP messages. Unlike a desktop application, a web service is easy to upgrade, deliver and access, and does not occupy any memory on the user's system. Keeping in mind the advantages of a web service over a desktop application, in this paper we demonstrate how this Java-based desktop data mining application can be implemented as a web service to support data mining across the internet.

Finding Pareto Optimal Front for the Multi-Mode Time, Cost Quality Trade-off in Project Scheduling

Project managers are ultimately responsible for the overall characteristics of a project, i.e. they should deliver the project on time, at minimum cost and with maximum quality. It is vital for any manager to decide on a trade-off between these conflicting objectives, and managers would benefit from any scientific decision support tool. Our work tries to determine a set of optimal solutions (rather than a single optimal solution) from which the project manager can select the preferred one to run the project. In this paper, the project scheduling problem denoted as (1,T|cpm,disc,mu|curve:quality,time,cost) is studied. The problem is multi-objective, and the purpose is to find the Pareto optimal front of time, cost and quality of a project (curve:quality,time,cost) whose activities belong to a start-to-finish activity relationship network (cpm) and can be done in different possible modes (mu), which are non-continuous or discrete (disc); each mode has a different cost, time and quality. The project is constrained by a non-renewable resource, i.e. money (1,T). Because the problem is NP-hard, a meta-heuristic is developed to solve it, based on FastPGA, a version of the genetic algorithm specially adapted to solve multi-objective problems. A sample project with 30 activities is generated and then solved by the proposed method.
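The notion of a Pareto optimal front used above can be illustrated with a brute-force filter over candidate schedules. This sketch is not FastPGA and does not search the space of activity modes; it only shows which candidates count as non-dominated when time and cost are minimised and quality is maximised. The candidate tuples are hypothetical.

```python
def dominates(a, b):
    """True if schedule a is at least as good as b on every objective and
    strictly better on at least one. Tuples are (time, cost, quality)."""
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep every solution that no other solution dominates (O(n^2) scan)."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical candidate schedules: (duration in days, cost, quality score).
schedules = [
    (30, 1000, 0.80),
    (25, 1200, 0.85),
    (30, 1100, 0.75),  # dominated by (30, 1000, 0.80)
    (40, 900, 0.90),
    (25, 1300, 0.85),  # dominated by (25, 1200, 0.85)
]

front = pareto_front(schedules)
print(front)
```

The quadratic scan is only practical for small candidate sets. A meta-heuristic is needed in the setting of the abstract because the number of mode combinations grows exponentially with the number of activities.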

On Combining Support Vector Machines and Fuzzy K-Means in Vision-based Precision Agriculture

One important objective in Precision Agriculture is to minimize the volume of herbicides applied to the fields through the use of site-specific weed management systems. In order to reach this goal, two major factors need to be considered: 1) the similar spectral signature, shape and texture of weeds and crops; 2) the irregular distribution of the weeds within the crop field. This paper outlines an automatic computer vision system for the detection and differential spraying of Avena sterilis, a noxious weed growing in cereal crops. The proposed system involves two processes: image segmentation and decision making. Image segmentation combines basic image processing techniques in order to extract cells from the image as the low-level units. Each cell is described by two area-based attributes measuring the relations between the crops and the weeds. From these attributes, a hybrid decision making approach determines whether or not a cell must be sprayed. The hybrid approach uses the Support Vector Machines and Fuzzy k-Means methods, combined through fuzzy aggregation theory. This constitutes the main finding of this paper. The performance of the method is compared against other available strategies.
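As an illustration of the Fuzzy k-Means component only, the sketch below computes the degree to which a cell belongs to each cluster, using the standard fuzzy c-means membership formula u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from the cell to centre i and m > 1 is the fuzziness exponent. The cluster centres, the attribute values and the choice m = 2 are hypothetical. The abstract does not specify the SVM output or the fuzzy aggregation operator that combines the two methods, so neither is shown.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership of a point in each cluster under the fuzzy c-means rule.

    The memberships are non-negative and sum to 1. m > 1 controls how
    soft the assignment is.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centres: share the full
        # membership equally among those centres.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical cluster centres in the space of the two area-based
# attributes: one for cells to leave unsprayed, one for cells to spray.
centers = [(0.0, 0.0), (1.0, 1.0)]
cell = (0.25, 0.25)

u = fuzzy_memberships(cell, centers)
print([round(v, 3) for v in u])
```

In this example the cell is three times closer to the first centre than to the second, which with m = 2 gives memberships of 0.9 and 0.1.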