Data Organization Before Learning Multi-Entity Bayesian Network Structure

The objective of our work is to develop a new approach for discovering knowledge from a large mass of data. The result of applying this approach is an expert system that serves as a diagnostic tool for a phenomenon related to a large information system. We first recall the general problem of learning Bayesian network structure from data and suggest a way to reduce its complexity by using data organization and optimization methods. We then propose a new heuristic for learning the structure of Multi-Entity Bayesian Networks. We apply our approach to biological facts concerning complex hereditary illnesses, for which the biology literature identifies the responsible variables. Finally, we conclude with the limits reached by this work.




