I started by building an entire pipeline end to end so I could look at the resulting 3D points. This makes it easier to see if something is going wrong in one of the steps.
I tried to retrieve the intrinsic parameters of my smartphone’s camera from the image metadata, but I was getting weird results. I then tried to measure them manually, but someone in the lab reminded me that smartphones have auto-focus, so the parameters change between shots. To avoid messing with my phone, I simply used another camera that was available and not being used by anyone in the lab (an Orbbec Astra Pro) and obtained its intrinsic parameters using a chessboard and a ROS calibration tool.
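For reference, this is roughly what that calibration amounts to if done directly with OpenCV instead of the ROS tool. It is only a sketch: the chessboard pattern size (9×6 inner corners), the square size, and the image folder are placeholders, not what I actually used.

```python
import glob

import cv2
import numpy as np

# Assumed chessboard: 9x6 inner corners, 25 mm squares (adjust to the actual board).
pattern_size = (9, 6)
square_size = 0.025  # metres

# 3D coordinates of the corners in the board's own frame (z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
image_size = None

for path in glob.glob("calib/*.png"):  # assumed folder of calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue
    # Refine corner locations to sub-pixel accuracy.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    obj_points.append(objp)
    img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix K:\n", K)
print("Distortion coefficients:", dist.ravel())
```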
I used SIFT once more to get the matching points from two viewpoints:
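A minimal sketch of that matching step (the image filenames and the 0.75 ratio-test threshold are just placeholders):

```python
import cv2
import numpy as np

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # assumed filenames
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors with FLANN and keep matches that pass Lowe's ratio test.
matcher = cv2.FlannBasedMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Pixel coordinates of the matched keypoints in each view.
pts1 = np.float64([kp1[m.queryIdx].pt for m in good])
pts2 = np.float64([kp2[m.trainIdx].pt for m in good])
```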
Then I found the Essential matrix → Rotation and Translation → Projection matrices.
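A sketch of that chain, assuming `pts1` and `pts2` are the matched points from the previous step and `K` is the calibrated intrinsic matrix:

```python
import cv2
import numpy as np

# pts1, pts2: Nx2 matched pixel coordinates; K: 3x3 intrinsic matrix from calibration.
E, inlier_mask = cv2.findEssentialMat(
    pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)

# Decompose E into the relative rotation and a unit-length translation.
_, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)

# Projection matrices: first camera at the origin, second camera at [R | t].
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
```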
With the matching points and projection matrices, I triangulated and plotted the points (ignoring the distortion parameters) and got this:
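The triangulation itself is only a few lines, continuing from the projection matrices above (the 3D scatter plot is just one way to visualise the result):

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

# cv2.triangulatePoints expects 2xN arrays and returns 4xN homogeneous coordinates.
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # convert to Euclidean 3D points, one row per point

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(pts3d[:, 0], pts3d[:, 1], pts3d[:, 2], s=2)
plt.show()
```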
But the Essential matrix is defined only up to scale, so the translation between the cameras (and therefore the whole reconstruction) has no absolute scale. (The illustration shows the fundamental matrix, but the essential matrix comes from the fundamental matrix once the intrinsic parameters are known: E = K^T F K.)
Source: https://www.youtube.com/watch?v=izpYAwJ0Hlw&list=PL2zRqk16wsdoCCLpou-dGo7QQNks1Ppzo&index=10
OpenCV has a function that handles this by setting the distance between the cameras to 1, which means all reconstructed distances are expressed relative to the camera baseline. We can recover the real scale factor by having an object of known dimensions in the image.
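A sketch of that idea, assuming we can identify two reconstructed points whose real-world separation we know (the indices and the 0.21 m length below are made up for illustration):

```python
import numpy as np

# Suppose points i and j in the reconstruction correspond to two marks on an
# object whose real separation is known (e.g. the short edge of an A4 sheet, 0.21 m).
known_length = 0.21
i, j = 10, 42
measured_length = np.linalg.norm(pts3d[i] - pts3d[j])

scale = known_length / measured_length
pts3d_metric = pts3d * scale  # point cloud now in metres
baseline = scale              # camera baseline (previously 1) in metres
```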
Using the distortion coefficients, I got weird results, and I haven’t found out yet what I did wrong:
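For comparison, one common way to bring the distortion coefficients in (only a sketch; not necessarily what I did or where the bug is) is to undistort the matched pixel coordinates first and then run the rest of the pipeline as if the camera were distortion-free:

```python
import cv2
import numpy as np

# K and dist come from the calibration; pts1, pts2 from the matching step.
# With P=K the output stays in pixel coordinates of an undistorted image.
pts1_u = cv2.undistortPoints(pts1.reshape(-1, 1, 2), K, dist, P=K).reshape(-1, 2)
pts2_u = cv2.undistortPoints(pts2.reshape(-1, 1, 2), K, dist, P=K).reshape(-1, 2)

# From here on, use pts1_u / pts2_u with findEssentialMat, recoverPose and
# triangulatePoints exactly as before, treating the distortion as zero.
```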
After re-calibrating the camera with more images (and still ignoring the distortion parameters), the results are apparently better:
Then I experimented with bending the paper like this: