Spatial Pyramid Matching for Scene Classification

CMU16-720, Computer Vision

September 2020


Given a set of images, the goal was to determine the location of the scenes using Spatial Pyramid Matching. This representation is based off of the bag of visual words approach.


I started by applying a filter bank to each our images to tease out the high frequency signals. The filters consisted of  variations of the Gaussian and Gaussian-Laplace filters. From here, I took samples of pixels from the filter responses and passed it to a K-mean clustering algorithm to generate a 'visual words' dictionary.  This was done over thousands of training images to generate a model where I could create word maps describing each scene image. Finally I developed a recognition model using the spatial pyramid matching technique which mapped each test image to the closest historgram describing the image.


To the right are some examples of the word maps generated using this model. I was able to achieve an accuracy of 65.5% using this image classification model. 


  • linked_in
  • github logo