Knowing both the Field of View (FoV) of a camera’s lens and the dimensions of the object we’d like to measure (Region of Interest, ROI) seems like more than enough to get a distance.
Note, opencv has an extensive suite of actual calibration tools and utilities here.
…But without calibration or much forethought, could rough measurements of known objects even be usable? Some notes from a math challenged individual:
git clone https://github.com/Jesssullivan/misc-roi-distance-notes && cd misc-roi-distance-notes
Most webcams don’t really provide a Field of View much greater than ~50 degrees- this is the value of a MacBook Pro’s webcam for instance. Here’s the plan to get a Focal Length value from Field of View:
So, thinking along the lines of similar triangles:
- Camera angle forms the angle between the hypotenuse side (one edge of the FoV angle) and the adjacent side
- Dimension is the opposite side of the triangle we are using to measure with.
- ^ This makes up the first of two "similar triangles"
- Then, we start measuring: First, calculate the opposite ROI Dimension using the arbitrary Focal Length value we calculated from the first triangle- then, plug in the Actual ROI Dimensions.
- Now the adjacent side of this ROI triangle should hopefully be length, in the the units of ROI’s Actual Dimension.
source a fresh venv to fiddle from:
python3 -m venv distance_venv
# depends are imutils & opencv-contrib-python:
pip3 install -r requirements.txt
The opencv people provide a bunch of prebuilt Haar cascade models, so let’s just snag one of them to experiment. Here’s one to detect human faces, we’ve all got one of those:
wget https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_alt2.xml -O ./haar/haarcascade_frontalface_alt2.xml
Of course, an actual thing with fixed dimensions would be better, like a stop sign!
Let’s try to calculate the distance as the difference between an actual dimension of the object with a detected dimension- here’s the plan:
YMMV, but YOLO:
# `python3 measure.py`
from cv2 import cv2
DFOV_DEGREES = 50 # such as average laptop webcam horizontal field of view
KNOWN_ROI_MM = 240 # say, height of a human head
# image source:
cap = cv2.VideoCapture(0)
cascade = cv2.CascadeClassifier('./haar/haarcascade_frontalface_alt2.xml')
# Capture & resize a single image:
_, image = cap.read()
image = cv2.resize(image, (0, 0), fx=.7, fy=0.7, interpolation=cv2.INTER_NEAREST)
# Convert to greyscale while processing:
gray_conv = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray_conv, (7, 7), 0)
# get image dimensions:
gray_width = gray.shape
gray_height = gray.shape
focal_value = (gray_height / 2) / math.tan(math.radians(DFOV_DEGREES / 2))
# run detector:
result = cascade.detectMultiScale(gray)
for x, y, h, w in result:
dist = KNOWN_ROI_MM * focal_value / h
dist_in = dist / 25.4
# update display:
cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.putText(image, 'Distance:' + str(round(dist_in)) + ' Inches',
(5, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
cv2.imshow('face detection', image)
if cv2.waitKey(1) == ord('q'):
run demo with: