Wednesday, June 15, 2011


Bayes Rule

Bayes rule, given below, relates two conditional probabilities. It is one of the most common and powerful ingredients of algorithms in computer vision, as well as in many other fields:

    P(A|B) = P(B|A) P(A) / P(B)

In any problem, we may want to predict A (it may be a discrete class label or some continuous value) given the observed data B (e.g. our image, or some features computed from it). What Bayes rule allows us to do is frame this problem in terms of B given A and the prior (often called marginal) probabilities of A and B. These distributions can often be learned from training data, allowing us to make the prediction.
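
As a toy illustration (all numbers below are invented), suppose A is "the image contains a chair" and B is "a particular feature fired in the image". A minimal Python sketch of the prediction:

    # Toy Bayes-rule computation; every number here is made up for illustration.
    p_chair = 0.1                  # prior P(A): fraction of images with a chair
    p_feat_given_chair = 0.8       # likelihood P(B|A)
    p_feat_given_no_chair = 0.2    # likelihood P(B|not A)

    # Marginal P(B), by summing the joint over both values of A.
    p_feat = (p_feat_given_chair * p_chair
              + p_feat_given_no_chair * (1.0 - p_chair))

    # Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
    p_chair_given_feat = p_feat_given_chair * p_chair / p_feat
    print(p_chair_given_feat)      # ~0.31: observing B raises P(A) from 0.10 to 0.31

Observing the feature roughly triples our belief that a chair is present, even though the feature alone is far from conclusive.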

Bayes rule also helps us understand the difference between a generative and a discriminative learner. If we somehow learn the probability P(A|B) directly, then what we have is a discriminative classifier that predicts values of A given B. If instead we learn the probabilities P(B|A) and P(A), then we have the joint distribution P(A,B) = P(B|A)P(A), and hence a generative classifier.
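
To make this concrete, below is a minimal generative-classifier sketch with invented training data: P(A) and P(B|A) are estimated by counting, and prediction takes the argmax over A of P(B|A)P(A) (the marginal P(B) is the same for every A, so it cancels in the argmax):

    import numpy as np

    # Invented training data: one binary feature (B) per example, binary label (A).
    labels   = np.array([0, 0, 0, 1, 1])
    features = np.array([0, 1, 0, 1, 1])

    classes = np.unique(labels)
    prior = {a: np.mean(labels == a) for a in classes}                    # P(A)
    lik = {a: {b: np.mean(features[labels == a] == b) for b in (0, 1)}    # P(B|A)
           for a in classes}

    def predict(b):
        # argmax over A of the joint P(A, B) = P(B|A) P(A)
        return max(classes, key=lambda a: lik[a][b] * prior[a])

    print(predict(1))  # -> 1

Because the sketch stores P(B|A) and P(A), it models the full joint distribution and could even generate plausible (A, B) pairs, which is exactly what "generative" refers to.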

Friday, January 28, 2011

Books for computer vision

Although almost anything can be found online, including everything about computer vision, a single coherent source is nice to have. A good book is such a source :). Below are some of the books that are commonly recommended, and quite famous, for various areas of computer vision.

  1. Introduction and basics: "Computer Vision: A Modern Approach" - David A. Forsyth and Jean Ponce
  2. Geometry: "Multiple View Geometry in Computer Vision" - Richard Hartley & Andrew Zisserman
  3. Geometry: "Three-Dimensional Computer Vision" - Olivier Faugeras
  4. Reference: "Computer Vision: Algorithms and Applications" - Richard Szeliski (Online free version available on website)
Since we often resort to machine learning tools for vision problems, it would be unfair not to recommend some machine-learning-related books:
  1. Good reference: "Pattern Recognition and Machine Learning" - Christopher M. Bishop
  2. Great textbook: "Machine Learning" - Tom M. Mitchell
  3. Graphical Models: "Probabilistic Graphical Models: Principles and Techniques" - Daphne Koller and Nir Friedman 
More often than not, we will end up in situations that require solving complex mathematical problems. One common case is optimizing some cost function to determine optimal parameter values. Although we generally resort to some toolbox in such situations (e.g. MATLAB's "fminsearch()"), the following are some books that do a great job of explaining the machinery behind these tools (a toy example follows the list):
  1. Optimization: "Convex Optimization" - Stephen Boyd and Lieven Vandenberghe (Online free version)
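
As a toy counterpart to fminsearch(), the sketch below uses SciPy's Nelder-Mead simplex method (the same derivative-free method fminsearch() implements) to fit a line to a few invented points by minimizing a squared-error cost:

    import numpy as np
    from scipy.optimize import minimize

    # Invented data points lying roughly on the line y = 2x + 1.
    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = np.array([1.1, 2.9, 5.2, 6.8])

    def cost(params):
        # Squared-error cost of the line y = m*x + c against the data.
        m, c = params
        return np.sum((m * xs + c - ys) ** 2)

    result = minimize(cost, x0=[0.0, 0.0], method="Nelder-Mead")
    print(result.x)  # fitted (m, c), roughly (1.94, 1.09)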

Monday, January 17, 2011

Learning VRML

Working in vision, one will almost certainly run into creating 3D models. VRML (Virtual Reality Modelling Language) is one option; although outdated, it is still in use. An excellent tutorial can be found here.
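
To give a flavour of the language, here is a minimal VRML 2.0 file describing a red sphere; any conforming viewer should display it:

    #VRML V2.0 utf8
    Shape {
      appearance Appearance {
        material Material { diffuseColor 1 0 0 }   # red surface
      }
      geometry Sphere { radius 1 }
    }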

For Windows there are plenty of free VRML viewers, but I had difficulty finding a good one for Linux: MeshLab was buggy and did not support the standard properly, and many others were a nightmare to compile. Finally, I settled on FreeWRL, and this log proved priceless during the installation.

VRML is an old standard, and there are several newer solutions in the works for 3D modelling, although I haven't tried them. X3D is the "official" successor of VRML, though I haven't found a good viewer for it yet. O3D is another option for creating 3D graphics: it is a JavaScript API for rendering graphics in the browser (it now conforms to WebGL).

Wednesday, November 24, 2010

Decision Trees

A great resource for learning about decision trees is http://decisiontrees.net/

Tuesday, October 26, 2010

Datasets

Let's say one has come up with an algorithm that tackles a computer vision problem; let's assume it is a novel chair detector. To prove that the detector actually works, you need to show quantitative and qualitative results (a scoring sketch appears at the end of this post). Testing and evaluating any such computer vision algorithm requires a dataset. The following are the two options one could take at this point:

  1. Compile a completely new dataset
    • Pros
      • Existing datasets may not represent or cover the scenarios in which the algorithm is applicable; a new dataset expands this horizon.
    • Cons
      • A lot of work is needed to compile a dataset.
      • Even more work is needed to compile a good dataset. Ideally it should improve on existing datasets, eliminating some (or all) of their shortcomings while expanding into new areas, and it should not suffer from latent biases that skew the results.
      • It is hard to compare the algorithm against state-of-the-art or earlier results from other groups, as those were not run on the new dataset.
  2. Use an existing dataset
    • Pros
      • Other results to compare against: other groups will have used the dataset, and their published results can be used to benchmark the new algorithm.
      • Being an existing, well-used dataset means there will be fewer problems; many wrinkles (e.g. in the data annotations) will already have been ironed out, making it more reliable.
    • Cons
      • May not cover the new scenarios to which the new algorithm applies, or may be biased against them.

Whichever option you choose, you need to produce reproducible results. To ensure the usefulness of your work, it is beneficial to provide code that makes it easy for others to reproduce the results as well. I would recommend referring to Dataset Issues in Object Recognition by Ponce et al. for a comprehensive study of the issues involved.
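
For the quantitative side, detections are commonly matched to ground-truth boxes by intersection-over-union (IoU), as in the PASCAL VOC protocol. The sketch below is a simplified illustration, assuming axis-aligned (x1, y1, x2, y2) boxes and the conventional 0.5 IoU threshold; it ignores detection confidence scores and ranking:

    # Simplified, PASCAL-style scoring of a detector; boxes are (x1, y1, x2, y2).

    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def precision_recall(detections, ground_truth, thresh=0.5):
        # Greedy matching: each ground-truth box may be claimed at most once.
        matched, tp = set(), 0
        for det in detections:
            for i, gt in enumerate(ground_truth):
                if i not in matched and iou(det, gt) >= thresh:
                    matched.add(i)
                    tp += 1
                    break
        precision = tp / len(detections) if detections else 0.0
        recall = tp / len(ground_truth) if ground_truth else 0.0
        return precision, recall

    dets = [(10, 10, 50, 50), (60, 60, 90, 90)]   # invented detections
    gts  = [(12, 12, 48, 48)]                     # invented ground truth
    print(precision_recall(dets, gts))            # (0.5, 1.0)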