Spatial Pyramid Matching for Scene Classification

CMU 16-720, Computer Vision

September 2020

Abstract: 

Given a set of images, the goal was to identify the type of scene (the kind of location) shown in each one using Spatial Pyramid Matching. This representation builds on the bag-of-visual-words approach.
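For reference, the pyramid match kernel from the original spatial pyramid matching paper (Lazebnik, Schmid and Ponce, CVPR 2006) compares two images X and Y at pyramid levels l = 0, ..., L. Level l divides the image into a 2^l x 2^l grid of cells, and H^l_X concatenates the visual-word histograms of those cells. This write-up does not state which levels or weights were used, so treat the following as background rather than the exact configuration of this project.

```latex
\mathcal{I}\left(H^l_X, H^l_Y\right) = \sum_{i} \min\left(H^l_X(i),\, H^l_Y(i)\right)

\kappa^L(X, Y) = \frac{1}{2^L}\,\mathcal{I}\left(H^0_X, H^0_Y\right)
  + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}}\,\mathcal{I}\left(H^l_X, H^l_Y\right)
```

Finer levels receive larger weights because matches found in smaller cells are more spatially precise.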

Process: 

I started by applying a filter bank to each of the images to characterize their local texture and structure. The filters consisted of variations of the Gaussian and Laplacian of Gaussian filters. Next, I sampled pixels from the filter responses and passed them to a K-means clustering algorithm to generate a dictionary of 'visual words'. This was done over thousands of training images, producing a dictionary with which I could create a word map for each scene image, labeling every pixel with its nearest visual word. Finally, I developed a recognition system using the spatial pyramid matching technique: each test image was described by a histogram of visual words and classified by finding the closest training histogram.
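The pipeline above can be sketched in NumPy/SciPy as follows. This is a minimal illustration under stated assumptions, not the code used for this project: grayscale input, filter scales of 1, 2 and 4, `alpha` sampled pixels per image, a three-level pyramid with the Lazebnik et al. level weights, and nearest-neighbour classification using histogram intersection as the similarity. All function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

# Assumed filter scales; the write-up only says "variations of" each filter.
SCALES = (1.0, 2.0, 4.0)


def filter_responses(img):
    """Gaussian and Laplacian-of-Gaussian responses, shape (H, W, 2*len(SCALES))."""
    img = np.asarray(img, dtype=float)  # assumes a grayscale image
    responses = []
    for s in SCALES:
        responses.append(gaussian_filter(img, s))
        responses.append(gaussian_laplace(img, s))
    return np.stack(responses, axis=-1)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the returned (k, F) centres are the visual words."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]  # fancy indexing makes a copy
    for _ in range(iters):
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centres[j] = X[labels == j].mean(axis=0)
    return centres


def build_dictionary(images, k, alpha=50, seed=0):
    """Sample `alpha` random pixels' responses per image (assumed value), then cluster."""
    rng = np.random.default_rng(seed)
    samples = []
    for img in images:
        r = filter_responses(img)
        r = r.reshape(-1, r.shape[-1])
        samples.append(r[rng.choice(len(r), alpha, replace=False)])
    return kmeans(np.concatenate(samples), k, seed=seed)


def word_map(img, dictionary):
    """Label every pixel with the index of its nearest visual word."""
    r = filter_responses(img)
    dists = ((r[..., None, :] - dictionary) ** 2).sum(-1)  # shape (H, W, k)
    return dists.argmin(axis=-1)


def spm_histogram(wmap, k, levels=3):
    """Weighted word histograms over 2^l x 2^l grids (assumed Lazebnik weights), L1-normalised."""
    H, W = wmap.shape
    L = levels - 1
    feats = []
    for l in range(levels):
        cells = 2 ** l
        weight = 2.0 ** -L if l == 0 else 2.0 ** (l - L - 1)
        for rows in np.array_split(np.arange(H), cells):
            for cols in np.array_split(np.arange(W), cells):
                cell = wmap[np.ix_(rows, cols)]
                feats.append(weight * np.bincount(cell.ravel(), minlength=k))
    hist = np.concatenate(feats).astype(float)
    return hist / hist.sum()


def classify(test_hist, train_hists, train_labels):
    """Assumed rule: label of the nearest training image under histogram intersection."""
    sims = np.minimum(test_hist, train_hists).sum(axis=1)
    return train_labels[int(sims.argmax())]
```

In use, one would build the dictionary from the training images with `build_dictionary`, convert every image to `spm_histogram(word_map(img, dictionary), k)`, stack the training histograms into one array, and call `classify` for each test image. The hand-written K-means keeps the sketch self-contained; a library implementation would be considerably faster on thousands of training images.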

Results: 

To the right are some examples of the word maps generated by this model. The resulting scene classifier achieved an accuracy of 65.5%.