Forecasting Fraudulent Financial Statements using Data Mining

This paper examines how effectively machine learning techniques detect firms that issue fraudulent financial statements (FFS) and identifies factors associated with FFS. To this end, we conducted a number of experiments with representative learning algorithms, trained on a data set of 164 fraud and non-fraud Greek firms from the period 2001-2002. Choosing a single method is a difficult problem. A good alternative is to build a hybrid forecasting system that incorporates several candidate methods as components (an ensemble of classifiers). For this purpose, we implemented a hybrid decision support system that combines the representative algorithms using a variant of stacking and achieves better performance than any of the simple and ensemble methods examined. In sum, the study indicates that the analysis of financial information can be used to identify FFS and underlines the importance of financial ratios.
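The general idea of stacking can be illustrated with a minimal sketch. It is not the system described in this paper: the data are synthetic (two hypothetical ratios, `ratio_a` and `ratio_b`, standing in for real financial ratios), the three base learners (nearest centroid, 3-nearest-neighbour, one-feature decision stump) were chosen only for brevity, and the meta-learner is a simple lookup table over the base learners' out-of-fold predictions rather than the stacking variant used in the study.

```python
import random
from collections import Counter, defaultdict

# Synthetic toy data (an assumption for illustration, not the Greek firm data set).
# Each row is (ratio_a, ratio_b); label 1 = fraud, 0 = non-fraud.
def make_data(n, seed=0):
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        centre = (0.7, 0.3) if y else (0.3, 0.6)
        rows.append(tuple(rng.gauss(c, 0.12) for c in centre))
        labels.append(y)
    return rows, labels

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

# Base learner 1: assign the class whose centroid is closest.
def fit_centroid(X, y):
    cents = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return lambda x: min(cents, key=lambda c: sq_dist(x, cents[c]))

# Base learner 2: majority label among the k nearest training rows.
def fit_knn(X, y, k=3):
    def predict(x):
        near = sorted(zip(X, y), key=lambda p: sq_dist(x, p[0]))[:k]
        return Counter(t for _, t in near).most_common(1)[0][0]
    return predict

# Base learner 3: threshold one feature halfway between the class means.
def fit_stump(X, y, feature=0):
    mean = lambda c: sum(x[feature] for x, t in zip(X, y) if t == c) / y.count(c)
    m0, m1 = mean(0), mean(1)
    thr, hi = (m0 + m1) / 2, int(m1 > m0)
    return lambda x: hi if x[feature] > thr else 1 - hi

BASE = [fit_centroid, fit_knn, fit_stump]

def fit_stacking(X, y, folds=5):
    # Level 0: collect out-of-fold predictions so the meta-learner never
    # sees a base prediction made on a row the base model was trained on.
    meta_rows = [None] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        held_out = [i for i in range(len(X)) if i % folds == f]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in BASE]
        for i in held_out:
            meta_rows[i] = tuple(m(X[i]) for m in models)
    # Level 1: lookup table from each pattern of base predictions
    # to the true labels observed for that pattern.
    table = defaultdict(Counter)
    for row, t in zip(meta_rows, y):
        table[row][t] += 1
    final = [fit(X, y) for fit in BASE]  # refit base learners on all data

    def predict(x):
        row = tuple(m(x) for m in final)
        votes = table.get(row)
        if votes:
            return votes.most_common(1)[0][0]
        return Counter(row).most_common(1)[0][0]  # unseen pattern: plain vote
    return predict

X, y = make_data(120)
X_test, y_test = make_data(60, seed=1)
model = fit_stacking(X, y)
acc = sum(model(x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"hold-out accuracy: {acc:.2f}")
```

The point of the sketch is the two-level structure: the combiner is trained on predictions the base learners made for rows they had not seen, which lets it learn when to trust each component instead of weighting them equally.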



