Detecting Whales: Kaggle Right Whale Recognition Challenge

Fork on Github
Kaggle’s NOAA Right Whale Recognition Challenge aims to develop an algorithm to identify individuals of Right Whales, which are critically endangered. It is a great chance to study machine learning and digital image processing although looks to me as a really hard challenge. Anyway I’ve developed this method to detect the whale in the photograph and I’m releasing it in a hope that it may help others.

It takes advantage of the fact that most pictures are pretty plain, with almost all of the area covered by water, and have a smaller region of interest which corresponds to the whale, so the histogram for most of the image will be similar except on the region of interest. The algorithm looks recursively to subimages that have an HSV histogram not similar to the original image’s histogram, marking those regions in white and else on black. Then searches for the biggest continuous region using contours and places a bounding box around it, assuming it’s the whale. The image is called “extract” and is saved along the black & white mask.

Check the code in Github. Uses Python 2.7 and OpenCV 3.0.

Original Image:
whale

Whale found:
whale

Areas found mask:
whale

ROI Mask:
whale

ROI Extract:
whale