The Active Learning Procedure

A.2.3 The RBF Total Output Uncertainty Measure

The total output uncertainty cost function is simply the expected EISD between $g$ and its new estimate $\hat{g}_{n+1}$, if the learner samples next at $\vec{x}_{n+1}$. The cost function is given by Equation 4.7. We rewrite the expression below in terms of our RBF model parameters:

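$$U(\hat{g}_{n+1} \mid D_n, \vec{x}_{n+1}) = \int_{y_{n+1} \in \Re} P(y_{n+1} \mid \vec{x}_{n+1}, D_n)\; E_{\mathcal{F}}\big[\delta(\hat{\vec{a}}, \vec{a}) \mid D_n \cup (\vec{x}_{n+1}, y_{n+1})\big]\; dy_{n+1} \qquad \text{(A.28)}$$

Here the EISD that results from sampling next at $\vec{x}_{n+1}$ is weighted by $P(y_{n+1} \mid \vec{x}_{n+1}, D_n)$ and averaged over all possible values of $y_{n+1}$ at $\vec{x}_{n+1}$. Recall from Equation A.22, however, that for this RBF concept class, the EISD between $g$ and its estimate $\hat{g}$ depends only on the input values $\vec{x}_i$ in $D_n$ and not on the observed $y_i$ values. This means that $E_{\mathcal{F}}[\delta(\hat{\vec{a}}, \vec{a}) \mid D_n \cup (\vec{x}_{n+1}, y_{n+1})]$, the new EISD resulting from sampling next at $\vec{x}_{n+1}$, does not depend on $y_{n+1}$! Equation A.28 can therefore be further simplified, which leads to the following closed-form expression for the total output uncertainty cost function, given also in Equation 4.17:

$$\begin{aligned}
U(\hat{g}_{n+1} \mid D_n, \vec{x}_{n+1}) &= E_{\mathcal{F}}\big[\delta(\hat{\vec{a}}, \vec{a}) \mid D_n \cup (\vec{x}_{n+1}, y_{n+1})\big] \int_{y_{n+1} \in \Re} P(y_{n+1} \mid \vec{x}_{n+1}, D_n)\; dy_{n+1} \\
&= E_{\mathcal{F}}\big[\delta(\hat{\vec{a}}, \vec{a}) \mid D_n \cup (\vec{x}_{n+1}, y_{n+1})\big] \\
&= |\Sigma_{n+1} A| \;\propto\; |\Sigma_{n+1}|
\end{aligned} \qquad \text{(A.29)}$$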

Here, $\Sigma_{n+1}$ has exactly the same form as $\Sigma_n$ in Equation A.22, and depends only on the polynomial function class priors $\Sigma_{\mathcal{F}}$, the $K$ fixed Gaussian RBF kernels $\{G_i(\cdot) \mid i = 1, \ldots, K\}$, the output noise variance $\sigma_s^2$ and the data input locations $\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n, \vec{x}_{n+1}\}$.
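The selection rule implied by Equation A.29 can be illustrated with a small sketch. Equation A.22 is not reproduced in this section, so the code assumes the standard Bayesian linear-Gaussian form $\Sigma_n^{-1} = \Sigma_{\mathcal{F}}^{-1} + \frac{1}{\sigma_s^2}\sum_i \vec{G}(\vec{x}_i)\vec{G}(\vec{x}_i)^T$; that form, the restriction to scalar inputs, and all function and parameter names below are illustrative assumptions rather than the thesis's own implementation. The sketch works with the precision matrix $\Sigma^{-1}$, since $|\Sigma_{n+1}| = 1 / |\Sigma_{n+1}^{-1}|$.

```python
import math

def rbf_features(x, centers, width):
    """Evaluate K fixed Gaussian kernels G_i at a scalar input x (hypothetical helper)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det

def posterior_precision(prior_precision, xs, centers, width, noise_var):
    """ASSUMED form: inv(Sigma_n) = inv(Sigma_F) + (1/sigma_s^2) * sum_i G(x_i) G(x_i)^T."""
    p = [row[:] for row in prior_precision]
    for x in xs:
        g = rbf_features(x, centers, width)
        for i in range(len(g)):
            for j in range(len(g)):
                p[i][j] += g[i] * g[j] / noise_var
    return p

def uncertainty(prior_precision, xs, x_next, centers, width, noise_var):
    """|Sigma_{n+1}| for a candidate x_next; note that no y value is needed."""
    p = posterior_precision(prior_precision, list(xs) + [x_next],
                            centers, width, noise_var)
    return 1.0 / determinant(p)

def next_sample(prior_precision, xs, candidates, centers, width, noise_var):
    """Pick the candidate input location that minimizes |Sigma_{n+1}|."""
    return min(candidates,
               key=lambda x: uncertainty(prior_precision, xs, x,
                                         centers, width, noise_var))

if __name__ == "__main__":
    centers = [0.0, 1.0]                 # two fixed kernel centers (illustrative)
    prior = [[1.0, 0.0], [0.0, 1.0]]     # inv(Sigma_F), taken as the identity here
    chosen = next_sample(prior, [0.0], [0.0, 0.5, 1.0], centers, 0.5, 0.1)
    print("next sample location:", chosen)
```

Because the cost depends only on input locations, the learner can rank candidate sample points before observing any new output value, which is the practical consequence of the $y_{n+1}$-independence noted above.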

From the document: Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, A.I.T.R. No. January, (pages 184-195)