{"title": "Active Inference in Concept Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 45, "page_last": 51, "abstract": null, "full_text": "Active inference in concept learning \n\nJonathan D. Nelson \nDepartment of Cognitive Science \nUniversity of California, San Diego \nLa Jolla, CA 92093-0515 \njnelson@cogsci.ucsd.edu \n\nJavier R. Movellan \nDepartment of Cognitive Science \nUniversity of California, San Diego \nLa Jolla, CA 92093-0515 \nmovellan@inc.ucsd.edu \n\nAbstract \n\nPeople are active experimenters, not just passive observers, \nconstantly seeking new information relevant to their goals. A \nreasonable approach to active information gathering is to ask \nquestions and conduct experiments that maximize the expected \ninformation gain, given current beliefs (Fedorov 1972, MacKay \n1992, Oaksford & Chater 1994). In this paper we present results \non an exploratory experiment designed to study people's active \ninformation gathering behavior on a concept learning task \n(Tenenbaum 2000). The results of the experiment are analyzed in \nterms of the expected information gain of the questions asked by \nsubjects. \n\nIn scientific inquiry and in everyday life, people seek out information relevant to \nperceptual and cognitive tasks. Scientists perform experiments to uncover causal \nrelationships; people saccade to informative areas of visual scenes, turn their head \ntowards surprising sounds, and ask questions to understand the meaning of concepts. \n\nConsider a person learning a foreign language, who notices that a particular word, \n\"tikos,\" is used for baby moose, baby penguins, and baby cheetahs. Based on those \nexamples, he or she may attempt to discover what tikos really means. Logically, \nthere are an infinite number of possibilities. For instance, tikos could mean baby \nanimals, or simply animals, or even baby animals and antique telephones. 
Yet a \nfew examples are enough for human learners to form strong intuitions about what \nmeanings are most likely. \n\nSuppose you can point to a baby duck, an adult duck, or an antique telephone, to \ninquire whether that object is \"tikos.\" Your goal is to figure out what \"tikos\" \nmeans. Which question would you ask? Why? When the goal is to learn as much as \npossible about a set of concepts, a reasonable strategy is to choose those questions \nwhich maximize the expected information gain, given current beliefs (Fedorov \n1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present \npreliminary results on an experiment designed to quantify the information value of \nthe questions asked by subjects on a concept learning task. \n\n1.1 Tenenbaum's number concept task \n\nTenenbaum (2000) developed a Bayesian model of number concept learning. The \nmodel describes the intuitive beliefs shared by humans about simple number \nconcepts, and how those beliefs change as new information is obtained, in terms of \nsubjective probabilities. Suppose a subject has been told that the number 16 is \nconsistent with some unknown number concept. With its current parameters, the \nmodel predicts that the subjective probability that the number 8 will also be \nconsistent with that concept is about 0.35. Tenenbaum (2000) included both \nmathematical and interval concepts in his number concept space. Interval concepts \nwere sets of numbers between n and m, where 1 ≤ n ≤ 100 and n ≤ m ≤ 100, such as \nnumbers between 5 and 8, and numbers between 10 and 35. There were 33 \nmathematical concepts: odd numbers, even numbers, square numbers, cube \nnumbers, prime numbers, multiples of n (3 ≤ n ≤ 12), powers of n (2 ≤ n ≤ 10), and \nnumbers ending in n (1 ≤ n ≤ 9). 
Tenenbaum conducted a number concept learning \nexperiment with 8 subjects and found a correlation of 0.99 between the average \nprobability judgments made by subjects and the model predictions. To evaluate \nhow well Tenenbaum's model described our population of subjects, we replicated \nhis study, with 81 subjects. We obtained a correlation of .87 between model \npredictions and average subject responses. Based on these results we decided to \nextend Tenenbaum's experiment, and allow subjects to actively ask questions about \nnumber concepts, instead of just observing examples given to them. We used \nTenenbaum's model to obtain estimates of the subjective probabilities of the \ndifferent concepts given the examples at hand. We hypothesized that the questions \nasked by subjects would have high information value, when information value was \ncalculated according to the probability estimates produced by Tenenbaum's model. \n\n1.2 \n\nInfomax sampling \n\nConsider the following problem. A subject is given examples of numbers that are \nconsistent with a particular concept, but is not told the concept itself. Then the \nsubject is allowed to pick a number, to test whether it follows the same concept as \nthe examples given. For example, the subject may be given the numbers 2, 6 and 4 \nas examples of the underlying concept and she may then choose to ask whether the \nnumber 8 is also consistent with the concept. Her goal is to guess the correct \nconcept. \n\nWe formalize the problem using standard probabilistic notation: random variables \nare represented with capital letters and specific values taken by those variables are \nrepresented with small letters. The random variable C represents the correct concept \non a given trial. Notation of the form \"C=c\" is shorthand for the event that the \nrandom variable C takes the specific value c. We represent the examples given to \nthe subjects by the random vector X. 
The subject's beliefs about which concepts are \nprobable prior to the presentation of any examples are represented by the probability \nfunction P(C = c). The subject's beliefs after the examples are presented are \nrepresented by P(C = c | X = x). For example, if c is the concept even numbers \nand x the numbers \"2, 6, 4\", then P(C = c | X = x) represents subjects' posterior \nprobability that the correct concept is even numbers, given that 2, 6, and 4 are \npositive examples of that concept. The binary random variable Y_n represents \nwhether the number n is a member of the correct concept. For example, Y_8 = 1 \nrepresents the event that 8 is an element of the correct concept, and Y_8 = 0 the event \nthat 8 is not. In our experiment subjects are allowed to ask a question of the form \n\"is the number n an element of the concept?\". This is equivalent to finding the value \ntaken by the random variable Y_n, in our formalism. \n\nWe evaluate how good a question is in terms of the information about the correct \nconcept expected for that question, given the example vector X = x. The expected \ninformation gain for the question \"Is the number n an element of the concept?\" is \ngiven by the following formula: \n\n$$I(C, Y_n \\mid X = x) = H(C \\mid X = x) - H(C \\mid Y_n, X = x),$$ \n\nwhere H(C | X = x) is the uncertainty (entropy) about the concept C given the \nexample numbers in x, \n\n$$H(C \\mid X = x) = -\\sum_{c} P(C = c \\mid X = x) \\log_2 P(C = c \\mid X = x),$$ \n\nand \n\n$$H(C \\mid Y_n, X = x) = -\\sum_{c \\in C} P(C = c \\mid X = x) \\sum_{v=0}^{1} P(Y_n = v \\mid C = c, X = x) \\log_2 P(C = c \\mid Y_n = v, X = x)$$ \n\nis the uncertainty about C given the active question Y_n and the example vector x. \nWe consider only binary questions, of the form \"is n consistent with the concept?\" \nso the maximum information value of any question in our experiment is one bit. \nNote how information gain is relative to a probability model P of the subjects' \ninternal beliefs. Here we approximate these subjective probabilities using \nTenenbaum's (2000) number concept model. 
\n\nAn information-maximizing strategy (infomax) prescribes asking the question with \nthe highest expected information gain, i.e., the question that minimizes the expected \nentropy, over all concepts. Another strategy of interest is confirmatory sampling, \nwhich consists of asking questions whose answers are most likely to confirm current \nbeliefs. In other domains it has been proposed that subjects have a bias to use \nconfirmatory strategies regardless of their information value (Klayman & Ha 1987, \nPopper 1959, Wason 1960). Thus, it is interesting to see whether people use a \nconfirmatory strategy on our concept learning task and to evaluate how informative \nsuch a strategy would be. \n\n2 Human sampling in the number concept game \n\nTwenty-nine undergraduate students, recruited from Cognitive Science Department \nclasses at the University of California, San Diego, participated in the experiment. 1 \nSubjects gave informed consent, and received either partial course credit for \nrequired study participation, or extra course credit, for their participation. The \nexperiment began with the following instructions: \n\nOften it is possible to have a good idea about the state of the world, without \ncompletely knowing it. People often learn from examples, and this study explores \nhow people do so. In this experiment, you will be given examples of a hidden \nnumber rule. These examples will be randomly chosen from the numbers between 1 \nand 100 that follow the rule. The true rule will remain hidden, however. Then you \nwill be able to test an additional number, to see if it follows that same hidden rule. \nFinally, you will be asked to give your best estimation of what the true hidden rule \nis, and the chances that you are right. For instance, if the true hidden rule were \n\"multiples of 11\", you might see the examples 22 and 66. 
If you thought the rule \nwere \"multiples of 11\", but also possibly \"even numbers\", you could test a number \nof your choice, between 1 and 100, to see if it also follows the rule. \n\n1 Full stimuli are posted at http://hci.ucsd.edul-jnelson/pages/study.html \n\nOn each trial subjects first saw a set of examples from the correct concept. For \ninstance, if the concept were even numbers, subjects might see the numbers \"2, 6, 4\" \nas examples. Subjects were then given the opportunity to test a number of their \nchoice. Subjects were given feedback on whether the number they tested was an \nelement of the correct concept. \n\nWe wrote a computer program that uses the probability estimates provided by \nTenenbaum's (2000) model to compute the information value of any possible question \nin the number concept task. We used this program to evaluate the information value \nof the questions asked by subjects, the questions asked by an infomax strategy, the \nquestions asked by a confirmatory strategy, and the questions asked by a random \nsampling strategy. The infomax strategy was determined as described above. The \nrandom strategy consisted of randomly testing a number between 1 and 100 with \nequal probability. The confirmatory strategy consisted of testing the number \n(excluding the examples) that had the highest posterior probability, as given by \nTenenbaum's model, of being consistent with the correct concept. \n\n3 Results \n\nResults for nine representative trials are discussed. The trials are grouped into three \ntypes, according to the posterior beliefs of Tenenbaum's model, after the example \nnumbers have been seen. The average information value of subjects' questions, and \nof each simulated sampling strategy, are given in Table 1. The specific questions \nsubjects asked are considered in Sections 3.1-3.3. 
\n\nTrial type                         Examples        Subjects  Infomax  Confirmation  Random \nSingle example, high uncertainty   16              0.70      0.97     0.97          0.35 \nSingle example, high uncertainty   60              0.72      1.00     1.00          0.54 \nSingle example, high uncertainty   37              0.73      1.00     1.00          0.52 \nMultiple example, low uncertainty  16, 8, 2, 64    0.00      0.01     0.00          0.00 \nMultiple example, low uncertainty  60, 80, 10, 30  0.06      0.32     0.00          0.04 \nMultiple example, low uncertainty  81, 25, 4, 36   0.00      0.00     0.00          0.00 \nInterval                           16, 23, 19, 20  0.47      1.00     0.00          0.17 \nInterval                           60, 51, 57, 55  0.37      0.99     0.00          0.20 \nInterval                           81, 98, 96, 93  0.31      1.00     0.00          0.14 \n\nTable 1. Information value, as assessed using the subjective probabilities in \nTenenbaum's number concept model, of several sampling strategies. Information \nvalue is measured in bits. \n\n3.1 Single example, high uncertainty trials \n\nOn these trials Tenenbaum's model is relatively uncertain about the correct concepts \nand gives some probability to many concepts. Interestingly, the confirmatory \nstrategy is identical to the infomax strategy on each of these trials, suggesting that a \nconfirmatory sampling strategy may be optimal under conditions of high \nuncertainty. Consider the trial with the example number 16. On this trial, the \nconcepts powers of 4, powers of 2, and square numbers each have moderate \nposterior probability (.28, .14, and .09, respectively). \n\nThese trials provided the best qualitative agreement between infomax predictions \nand subjects' sampling behavior. Unfortunately the results are inconclusive since on \nthese trials both infomax and confirmatory strategy make the same predictions. On \nthe trial with the example number 16, subjects' modal response (8 of 29 subjects) \nwas to test the number 4. This was actually the most informative question, \naccording to Tenenbaum's model. 
Several subjects (8 of 29) tested other square \nnumbers, such as 49, 36, or 25, which also have high information value, relative to \nTenenbaum's number concept model (Figure 1). Subjects' questions also had a high \ninformation value on the trial with the example number 37, and the trial with the \nexample number 60. \n\nFigure 1. Information value of sampling each number, in bits, given that the \nnumber 16 is consistent with the correct concept. \n\n3.2 Multiple example, low uncertainty trials \n\nOn these trials Tenenbaum's model gives a single concept very high posterior \nprobability. When there is little or no information value in any question, infomax \nmakes no particular predictions regarding which questions are best. Most subjects \ntested numbers that were consistent with the most likely concept, but not \nspecifically given as examples. This behavior matches the confirmatory strategy. \n\nOn the trial with the examples 81, 25, 4, and 36, the model gave probability 1.00 to \nthe concept square numbers. On this trial, the most commonly tested numbers were \n49 (11 of 29 subjects) and 9 (4 of 29 subjects). No sample is expected to be \ninformative on this trial, because overall uncertainty is so low. \n\nOn the trial with the example numbers 60, 80, 10, and 30, the model gave \nprobability .94 to the concept multiples of 10, and probability .06 to the concept \nmultiples of 5. On this trial, infomax tested odd multiples of 5, such as 15, each of \nwhich had expected information gain of 0.3 bits. The confirmatory strategy tested \nnon-example multiples of 10, such as 50, and had an information value of 0 bits. 
\nMost subjects (17 of 29) followed the confirmatory strategy; some subjects (5 of 29) \nfollowed the infomax strategy. \n\n3.3 Interval trials \n\nIt is desirable to consider situations in which: (1) the questions asked by the \ninfomax strategy are different from the questions asked by the confirmatory strategy, \nand (2) the choice of questions matters, because some questions have high \ninformation value. Trials in which the correct concept is an interval of numbers \nprovide such situations. Consider the trial with the example numbers 16, 23, 19, \nand 20. On this trial, and the other \"interval\" trials, the concept learning model is \ncertain that the correct concept is of the form numbers between m and n, because the \nobserved examples rule out all the other concepts. However, the model is not \ncertain of the precise endpoints, such as whether the concept is numbers between 16 \nand 23, or numbers between 16 and 24, etc. Infomax tests numbers near to, but \noutside of, the range spanned by the examples, such as 14 or 26, in this example \n(see Figure 2 at left). \n\nWhat do subjects do? Two patterns of behavior, each observed on all three interval \ntrials, can be distinguished. The first is to test numbers outside of, but near to, the \nrange of observed examples. On the trial with example numbers between 16 and 23, \na total of 15 of 29 subjects tested numbers between 10-15, or 24-30. This behavior \nis qualitatively similar to infomax. \n\nThe second pattern of behavior, which is shown by about one third of the subjects, \nconsists of testing (non-example) numbers within the range spanned by the observed \nexamples. If one is certain that the concept at hand is an interval then asking about \nnumbers within the range spanned by the observed examples provides no \ninformation (Figure 2 at left). Yet some subjects consistently ask about these \nnumbers. 
Based on this surprising result, we went back to the results of Experiment \n1, and reanalyzed the accuracy of Tenenbaum's model on trials in which the model \ngave high probability to interval concepts. We found that on such trials the model \nsignificantly deviated from the subjects' beliefs. In particular, subjects gave \nprobability lower than one that non-example numbers within the range spanned by \nobserved examples were consistent with the true concept. The model, however, \ngives all numbers within the range spanned by the examples probability 1. See \nFigure 2, at right, and note the difference between subjective probabilities (points) \nand the model's estimate of these probabilities (solid line). We hypothesize that the \napparent uninformativeness of the questions asked by subjects on these trials is due \nto imperfections in the current version of Tenenbaum's model, and are working to \nimprove the model's descriptive accuracy, to test this hypothesis. \n\nFigure 2. Information value, relative to Tenenbaum's model, of sampling each \nnumber, given the example numbers 16, 23, 19, and 20 (left). In this case the model \nis certain that the correct concept is some interval of numbers; thus, it is not \ninformative to ask questions about numbers within the range spanned by the \nexamples. At right, the probability that each number is consistent with the correct \nconcept. Our subjects' mean probability rating is denoted with points (n = 81, from \nour first experiment). Tenenbaum's model's approximation of these probabilities is \ndenoted with the solid line. \n\n4 Discussion \n\nThis paper presents exploratory work in progress that attempts to analyze active \ninference from the point of view of the rational approach to cognition (Anderson, \n1990; Oaksford and Chater, 1994). 
\n\nFirst we performed a large scale replication of Tenenbaum's number concept \nexperiment (Tenenbaum, 2000), in which subjects estimated the probability that \neach of several test numbers was consistent with the same concept as some \nexample numbers. We found a correlation of 0.87 between our subjects' average \nprobability estimates and the probabilities predicted by Tenenbaum's model. We \nthen extended Tenenbaum's experiment by allowing subjects to ask questions about \nthe concepts at hand. Our goal was to evaluate the information value of the \nquestions asked by subjects. We found that in some situations, a simple \nconfirmatory strategy maximizes information gain. We also found that the current \nversion of Tenenbaum's number concept model has significant imperfections, which \nlimit its ability to estimate the informativeness of subjects' questions. We expect \nthat modifications to Tenenbaum's model will enable infomax to predict sampling \nbehavior in the number concept domain. We are performing simulations to explore \nthis point. We are also working to generalize the infomax analysis of active \ninference to more complex and natural problems. \n\nAcknowledgments \n\nWe thank Josh Tenenbaum, Gedeon Deak, Jeff Elman, Iris Ginzburg, Craig \nMcKenzie, and Terry Sejnowski for their ideas; and Kent Wu and Dan Bauer for \ntheir help in this research. The first author was partially supported by a Pew \ngraduate fellowship during this research. \n\nReferences \n\nAnderson, J. R. (1990). The adaptive character of thought. New Jersey: Erlbaum. \n\nFedorov, V. V. (1972). Theory of optimal experiments. New York: Academic Press. \n\nKlayman, J.; Ha, Y. (1987). Confirmation, disconfirmation, and information in \nhypothesis testing. Psychological Review, 94, 211-228. \n\nMacKay, D. J. C. (1992). Information-based objective functions for active data \nselection. Neural Computation, 4, 590-604. \n\nOaksford, M.; Chater, N. (1994). 
A rational analysis of the selection task as optimal \ndata selection. Psychological Review, 101, 608-631. \n\nPopper, K. R. (1959). The logic of scientific discovery. London: Hutchinson. \n\nTenenbaum, J. B. (2000). Rules and similarity in concept learning. In Advances in \nNeural Information Processing Systems, 12, Solla, S. A., Leen, T. K., Mueller, K.-R. \n(eds.), 59-65. \n\nWason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. \nQuarterly Journal of Experimental Psychology, 12, 129-140. \n", "award": [], "sourceid": 1931, "authors": [{"given_name": "Jonathan", "family_name": "Nelson", "institution": null}, {"given_name": "Javier", "family_name": "Movellan", "institution": null}]}