Usama Fayyad
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
fayyad@microsoft.com
Gregory Piatetsky-Shapiro
GTE Laboratories, MS 44
Waltham, MA 02154, USA
gps@gte.com
Padhraic Smyth
Information and Computer Science
University of California
Irvine, CA 92717-3425, USA
smyth@ics.uci.edu
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases (KDD). We describe the links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues, and conclude with an analysis of the challenges facing practitioners in the field.
Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. Knowledge Discovery and Data Mining: Toward a Unifying Framework, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press: Menlo Park, CA. August 1996.
This paper is available in PostScript format.
David M. Chickering
Microsoft Research
e-mail: dmax@microsoft.com
David Heckerman
Microsoft Research
e-mail: heckerma@microsoft.com
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and by Cheeseman and Stutz (1995); these approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations, treating the Laplace approximation as the most accurate standard. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that the Cheeseman-Stutz (CS) measure is the most accurate.
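For orientation, the BIC/MDL and Cheeseman-Stutz (CS) scores compared in the paper take roughly the following form (notation is ours, not the paper's). With \hat{\theta} the ML/MAP parameters found by EM, d the number of free parameters in the network, and N the number of cases,

\log p(D \mid S) \;\approx\; \log p(D \mid \hat{\theta}, S) - \frac{d}{2}\log N \qquad \text{(BIC/MDL)}

\log p(D \mid S) \;\approx\; \log p(D' \mid S) + \log p(D \mid \hat{\theta}, S) - \log p(D' \mid \hat{\theta}, S) \qquad \text{(CS)}

where D' is the completed data set obtained by filling in the hidden variables with their expected values under \hat{\theta}, so that \log p(D' \mid S) has a closed form.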
Chickering, D.M., Heckerman, D., Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network, in E. Horvitz and F. Jensen, eds., Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: San Francisco, CA. August 1996, pp. 158-168.
This paper is available in PostScript format.
Padhraic Smyth
Information and Computer Science
University of California
Irvine, CA 92717-3425, USA
smyth@ics.uci.edu
Finding the ``right" number of clusters, $k$, for a data set is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood-ratios, penalized likelihoods, and Bayesian techniques are among the more popular techniques. In this paper a new cross-validated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on {\it Monte Carlo cross-validation} (MCCV) is introduced. The algorithm permits the data analyst to judge if there is strong evidence for a particular k, or perhaps weaker evidence over a sub-range of k values. Experimental results with Gaussian mixtures on real and simulated data suggest that MCCV provides genuine insight into cluster structure. v-fold cross-validation appears inferior to the penalized likelihood method (BIC), a Bayesian algorithm (AutoClass v2.0), and the new MCCV algorithm. Overall, MCCV and AutoClass appear the most reliable of the methods. MCCV provides the data-miner with a useful data-driven tool which complements the fully Bayesian approach: in particular, the method is conceptually simpler to adapt to more complex clustering models beyond Gaussian mixtures.
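As a rough illustration of the cross-validated likelihood idea (a sketch, not the paper's implementation), the following Python fragment scores candidate values of k by repeatedly holding out a random test set, fitting a Gaussian mixture on the remainder, and averaging the held-out log-likelihood; scikit-learn's GaussianMixture is assumed here as a stand-in mixture estimator.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def mccv_scores(X, k_values, n_splits=20, test_fraction=0.5, seed=0):
    """Monte Carlo cross-validated log-likelihood for each candidate k (illustrative sketch)."""
    rng = np.random.RandomState(seed)
    scores = {k: [] for k in k_values}
    for _ in range(n_splits):
        train, test = train_test_split(X, test_size=test_fraction,
                                       random_state=rng.randint(2**31 - 1))
        for k in k_values:
            gmm = GaussianMixture(n_components=k, n_init=3,
                                  random_state=rng.randint(2**31 - 1)).fit(train)
            scores[k].append(gmm.score(test))   # mean held-out log-likelihood per point
    return {k: (np.mean(v), np.std(v)) for k, v in scores.items()}

# Strong evidence for a particular k shows up as a clear peak in the mean score;
# weaker evidence appears as a plateau over a sub-range of k values.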
Smyth, P. Clustering using Monte Carlo Cross-Validation, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press: Menlo Park, CA. August 1996.
This paper is available in PostScript format.
David M. Chickering
Microsoft Research
e-mail: dmax@microsoft.com
Approaches to learning Bayesian networks from data typically combine a scoring function with a heuristic search procedure. Given a Bayesian network structure, many of the scoring functions derived in the literature return a score for the entire equivalence class to which the structure belongs. When using such a scoring function, it is appropriate for the heuristic search algorithm to search over equivalence classes of Bayesian networks as opposed to individual structures. We present the general formulation of a search space for which the states of the search correspond to equivalence classes of structures. Using this space, any one of a number of heuristic search algorithms can easily be applied. We compare greedy search performance in the proposed search space to greedy search performance in a search space for which the states correspond to individual Bayesian network structures.
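For background (our illustration, not code from the paper): two DAGs belong to the same equivalence class exactly when they have the same skeleton and the same v-structures, which is straightforward to check. A minimal sketch in Python, with each DAG given as a dictionary mapping a node to its set of parents:

def skeleton(parents):
    """Undirected edge set of a DAG given as {node: set of parents}."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    """Colliders a -> c <- b whose endpoints a and b are not adjacent."""
    skel = skeleton(parents)
    vs = set()
    for c, ps in parents.items():
        ordered = sorted(ps)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                a, b = ordered[i], ordered[j]
                if frozenset((a, b)) not in skel:
                    vs.add((frozenset((a, b)), c))
    return vs

def same_equivalence_class(p1, p2):
    return skeleton(p1) == skeleton(p2) and v_structures(p1) == v_structures(p2)

# X -> Y -> Z and X <- Y <- Z are equivalent; the collider X -> Y <- Z is not.
chain1 = {"X": set(), "Y": {"X"}, "Z": {"Y"}}
chain2 = {"Z": set(), "Y": {"Z"}, "X": {"Y"}}
collider = {"X": set(), "Z": set(), "Y": {"X", "Z"}}
assert same_equivalence_class(chain1, chain2)
assert not same_equivalence_class(chain1, collider)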
Chickering, D.M., Learning Equivalence Classes of Bayesian Network Structures, in E. Horvitz and F. Jensen, eds., Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: San Francisco, CA. August 1996, pp. 150-157.
This paper is available in PostScript format.
Nir Friedman
Dept. of Computer Science
Stanford University
Gates Building, 1A
Stanford, CA 94305-9010, USA
e-mail: nir@cs.stanford.edu
Phone: (415) 725-8782
FAX : (415) 725-1449
Moises Goldszmidt
SRI International
333 Ravenswood Way, EK329
Menlo Park, CA 94025, USA
e-mail: moises@erg.sri.com
Phone: (415) 859-4319
FAX : (415) 859-4812
In this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs) that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that the learning curves characterizing the procedure that exploits the local structure converge faster than those of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require fewer parameters.
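To make the local-structure idea concrete (a hypothetical illustration, not the paper's code): a full CPT for a binary node with n binary parents needs 2^n parameters, while a tree-structured CPT needs only one parameter per leaf it actually distinguishes. For example:

# Hypothetical node Alarm with binary parents (Burglary, Earthquake, Vacation).
# A full CPT stores one parameter per parent configuration: 2**3 = 8 parameters.
full_cpt_parameters = 2 ** 3

# A tree-structured CPT that tests Burglary first, ignores Vacation entirely,
# and ignores Earthquake whenever Burglary = 1 needs only 3 leaves (3 parameters).
tree_cpt = [
    ({"Burglary": 1}, 0.95),                    # P(Alarm=1 | Burglary=1)
    ({"Burglary": 0, "Earthquake": 1}, 0.30),   # P(Alarm=1 | Burglary=0, Earthquake=1)
    ({"Burglary": 0, "Earthquake": 0}, 0.01),   # P(Alarm=1 | Burglary=0, Earthquake=0)
]

def lookup(tree_cpt, assignment):
    """Return P(Alarm = 1 | parents) from the first leaf consistent with the assignment."""
    for condition, p in tree_cpt:
        if all(assignment[var] == val for var, val in condition.items()):
            return p
    raise KeyError("no matching leaf")

print(full_cpt_parameters, len(tree_cpt))                                  # 8 vs. 3
print(lookup(tree_cpt, {"Burglary": 1, "Earthquake": 0, "Vacation": 1}))   # 0.95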
Friedman, N., Goldszmidt, M., Learning Bayesian Networks with Local Structure, in E. Horvitz and F. Jensen, eds., Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: San Francisco, CA. August 1996, pp. 252-262.
This paper is available in PostScript format.
Ron Musick
Lawrence Livermore National Laboratory
P.O. Box 808, L-306
Livermore, CA 94551
rmusick@llnl.gov
Belief networks are a powerful tool for knowledge discovery that provide concise, understandable probabilistic models of data. There are methods grounded in probability theory to incrementally update the relationships described by the belief network when new information is seen, to perform complex inferences over any set of variables in the data, to incorporate domain expertise and prior knowledge into the model, and to automatically learn the model from data. This paper concentrates on part of the belief network induction problem, that of learning the quantitative structure (the conditional probabilities), given the qualitative structure. In particular, the current practice of rote learning the probabilities in belief networks can be significantly improved upon. We advance the idea of applying any learning algorithm to the task of conditional probability learning in belief networks, discuss potential benefits, and show results of applying neural networks and other algorithms to a medium-sized car insurance belief network. The results demonstrate improvements of 10 to 100% in model error rates over the current approaches.
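A minimal sketch of the idea of replacing rote (frequency-counting) estimation of a node's conditional probabilities with a learned model; here scikit-learn's logistic regression stands in for the neural networks and other algorithms used in the paper, and the data are made up for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Synthetic data: three binary parents and a binary child whose probability
# depends mainly on the first parent.
parents = rng.randint(0, 2, size=(1000, 3))
child = (rng.rand(1000) < 0.2 + 0.6 * parents[:, 0]).astype(int)

# Rote learning: an independent frequency estimate for every parent configuration.
def rote_cpt(parents, child):
    cpt = {}
    for config in {tuple(int(v) for v in row) for row in parents}:
        mask = np.all(parents == config, axis=1)
        cpt[config] = child[mask].mean()
    return cpt

# Learned CPT: one model shared across configurations, which smooths sparse cells.
model = LogisticRegression().fit(parents, child)

print(rote_cpt(parents, child)[(1, 0, 0)])
print(model.predict_proba([[1, 0, 0]])[0, 1])   # P(child = 1 | parents = 1, 0, 0)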
Musick, R. Rethinking the Learning of Belief Network Probabilities, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press: Menlo Park, CA. August 1996.
This paper is not yet available online.
Kathryn Blackmond Laskey
Department of Systems Engineering
George Mason University
Fairfax, VA 22030-4444 USA
e-mail: klaskey@gmu.edu
Phone: (703) 993-1644
FAX: (703) 993-1706
Laura Martignon
Max Planck Institute für Psychologische Forschung
Center for Adaptive Behavior and Cognition
80802 München, Germany
e-mail: laura@mpipf-muenchen.mpg.de
Phone: 49-89-38-602-253
FAX: 49-89-38-602-252
This paper presents a Bayesian approach to learning the connectivity structure of a group of neurons from data on configuration frequencies. A major objective of the research is to provide statistical tools for detecting changes in firing patterns with changing stimuli. Our framework is not restricted to the well-understood case of pair interactions, but generalizes the Boltzmann machine model to allow for higher order interactions. The paper applies a Markov Chain Monte Carlo Model Composition (MC3) algorithm to search over connectivity structures and uses Laplace's method to approximate posterior probabilities of structures. Performance of the methods was tested on synthetic data. The models were also applied to data obtained by Vaadia on multi-unit recordings of several neurons in the visual cortex of a rhesus monkey in two different attentional states. Results confirmed the experimenters' conjecture that different attentional states were associated with different interaction structures.
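The MC3 component can be pictured as a Metropolis walk over interaction structures: propose toggling a single interaction term, score both structures by their (Laplace-approximated) log posterior, and accept with the usual Metropolis probability. A rough sketch, in which log_posterior is a placeholder for the structure score and candidate_terms is a list of possible interaction terms; neither is supplied by the paper:

import math
import random

def mc3_search(candidate_terms, log_posterior, n_steps=10000, seed=0):
    """Metropolis walk over sets of interaction terms (a sketch of MC3-style search).

    log_posterior(structure) should return the approximate log posterior of a
    structure, e.g. computed with Laplace's method; it is a placeholder here.
    """
    rnd = random.Random(seed)
    current = frozenset()                    # start from the independence model
    current_score = log_posterior(current)
    visits = {}                              # visit counts estimate posterior mass
    for _ in range(n_steps):
        term = rnd.choice(candidate_terms)   # symmetric proposal: toggle one term
        proposal = current ^ frozenset([term])
        proposal_score = log_posterior(proposal)
        if math.log(rnd.random()) < proposal_score - current_score:
            current, current_score = proposal, proposal_score
        visits[current] = visits.get(current, 0) + 1
    return visits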
Laskey, K., Martignon, L., Bayesian Learning of Loglinear Models for Neural Connectivity, in E. Horvitz and F. Jensen, eds., Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: San Francisco, CA. August 1996, pp. 373-380.
This paper is available in PostScript format.
Paul E. Stolorz
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109
pauls@aig.jpl.nasa.gov
Philip C. Chew
University of Pennsylvania
Philadelphia, PA
chew@upenn5.hep.upenn.edu
The Monte Carlo method is recognized as a useful tool in learning and probabilistic inference methods common to many data mining problems. Generalized Hidden Markov Models and Bayes nets are especially popular applications. However, the presence of multiple modes in many relevant integrands and summands often renders the method slow and cumbersome. Recent mean-field alternatives designed to speed things up have been inspired by experience gleaned from physics. The current work adopts an approach very similar in spirit, but focuses instead upon dynamic programming notions as a basis for producing systematic Monte Carlo improvements. The idea is to approximate a given model by a dynamic programming-style decomposition, which then forms a scaffold upon which to build successively more accurate Monte Carlo approximations. Dynamic programming ideas alone fail to account for non-local structure, while standard Monte Carlo methods essentially ignore all structure. However, suitably crafted hybrids can successfully exploit the strengths of each method, resulting in algorithms that combine speed with accuracy. The approach relies on the presence of significant "local" information in the problem at hand. This turns out to be a plausible assumption for many important applications. Example calculations are presented, and the overall strengths and weaknesses of the approach are discussed.
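One simple way to read the scaffold idea (our illustration, not the authors' algorithm): let a tractable, dynamic-programming-style approximation q(x) serve as the proposal in an independence Metropolis-Hastings sampler for the intractable target p(x), so that the approximation does most of the work and the Monte Carlo step only corrects it. The functions passed in below are placeholders:

import math
import random

def independence_mh(log_p, log_q, sample_q, n_samples, seed=0):
    """Independence Metropolis-Hastings using a tractable approximation q as the proposal.

    log_p: unnormalized log density of the full model (the target).
    log_q, sample_q: log density and sampler of the tractable approximation,
    e.g. a chain-structured (dynamic-programming) decomposition; placeholders here.
    """
    rnd = random.Random(seed)
    x = sample_q(rnd)
    log_w = log_p(x) - log_q(x)              # importance weight of the current state
    samples = []
    for _ in range(n_samples):
        y = sample_q(rnd)
        log_w_y = log_p(y) - log_q(y)
        # Accept with probability min(1, w(y) / w(x)); the better q captures the
        # local structure of p, the closer the acceptance rate is to one.
        if math.log(rnd.random()) < log_w_y - log_w:
            x, log_w = y, log_w_y
        samples.append(x)
    return samples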
Stolorz, P.E. and Chew, P.C. Harnessing Graphical Structure in Markov Chain Monte Carlo Learning, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press: Menlo Park, CA. August 1996.
This paper is not yet available online.