### Kinect Resolution

There are many examples of uses of the Microsoft Kinect, but I have not found a description of the resolution of the sensor.

The starting point is the raw depth, which is natively expressed over 11 bits (2048 values) and can be mapped to a physical depth by means of an equation. In the following, upper-case (X,Y,Z) are the coordinates in image space, while lower-case (x,y,z) are the coordinates in real space, relative to the Kinect reference frame and expressed in mm. Two similar equations have been proposed online, one based on the function 1/Z, the other on tan(Z).

z_l = 1000 ./ (Z * -0.00307 + 3.3309);

z_t = 1000 * 0.1236 .* tan(Z / 2842.5 + 1.1863);
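As a quick way to compare the two mappings, here is a minimal Python sketch of the same formulas. The function names `depth_inverse_mm` and `depth_tan_mm` are mine, chosen only for illustration, and the constants are copied from the equations above.

```python
import math


def depth_inverse_mm(raw):
    """Raw depth value -> distance in mm, using the 1/Z model."""
    return 1000.0 / (raw * -0.00307 + 3.3309)


def depth_tan_mm(raw):
    """Raw depth value -> distance in mm, using the tan(Z) model."""
    return 1000.0 * 0.1236 * math.tan(raw / 2842.5 + 1.1863)


# Print both estimates side by side for a few raw values.
for raw in (400, 600, 800, 1000):
    print(raw, round(depth_inverse_mm(raw)), round(depth_tan_mm(raw)))
```

Over this range the two models should stay within a few percent of each other, which is why either one can be used for the resolution estimate.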

Given this mapping, we approximate the resolution by computing the distance between points separated by one unit of raw depth. In MATLAB this is simply: z_ld = [0, diff(z_l)];
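The same finite difference can be sketched in plain Python, assuming the 1/Z model and raw values from 1 to 1024. The helper name `depth_inverse_mm` and the lookup of the step nearest to 2 m are illustrative additions, not part of the original MATLAB script.

```python
def depth_inverse_mm(raw):
    """Raw depth value -> distance in mm, using the 1/Z model."""
    return 1000.0 / (raw * -0.00307 + 3.3309)


raw_values = range(1, 1025)
z_l = [depth_inverse_mm(raw) for raw in raw_values]

# Equivalent of MATLAB's [0, diff(z_l)]: distance gained per unit of raw depth.
z_ld = [0.0] + [after - before for before, after in zip(z_l, z_l[1:])]

# Look up the depth step of the sample closest to 2 m.
index_2m = min(range(len(z_l)), key=lambda i: abs(z_l[i] - 2000.0))
print("depth:", z_l[index_2m], "mm, step:", z_ld[index_2m], "mm")
```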

The following figure shows the resolution based on the two functions, limiting the raw depth to 1024, which corresponds to about 5 meters.

In reality we are more interested in the resolution as a function of distance, as follows:

The resolution along the X and Y axes depends instead on the projective transformation. Considering X and Y as image coordinates in the VGA format (640x480), and moving the origin to the center of the image:

x = (X - 3.3931e+02) * z / 5.9421e+02

y = (Y - 2.4274e+02) * z / 5.9421e+02
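A minimal Python sketch of this back-projection, using the constants above. The names `CENTER_X`, `CENTER_Y`, `FOCAL` and `pixel_to_mm` are hypothetical, and I assume the same focal length on both axes, as in the equations.

```python
CENTER_X = 339.31  # image center along X, in pixels
CENTER_Y = 242.74  # image center along Y, in pixels
FOCAL = 594.21     # focal length, in pixels


def pixel_to_mm(X, Y, z):
    """Image coordinates (X, Y) plus depth z in mm -> (x, y, z) in mm."""
    x = (X - CENTER_X) * z / FOCAL
    y = (Y - CENTER_Y) * z / FOCAL
    return x, y, z


# A pixel at the image center maps onto the optical axis.
print(pixel_to_mm(339.31, 242.74, 2000.0))
# Moving one pixel to the right shifts x by z / FOCAL.
print(pixel_to_mm(340.31, 242.74, 2000.0))
```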

From this the resolution along X and Y can be expressed as:

rx = z / 5.9421e+02;

ry = z / 5.9421e+02;

Resulting in the following figure of both Z and XY resolutions:

This agrees with the known figures for the resolution at 2 m: about 10 mm for Z and about 3 mm for X and Y.
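The 2 m figures can be checked with a short Python sketch. For Z, I use the derivative of the 1/Z model instead of a finite difference: since z = 1000 / (Z * -0.00307 + 3.3309), one raw unit moves z by roughly 0.00307 * z^2 / 1000. The function names are illustrative.

```python
def lateral_resolution_mm(z_mm):
    """Size of one pixel along X or Y at depth z_mm, from rx = z / 594.21."""
    return z_mm / 594.21


def depth_resolution_mm(z_mm):
    """Approximate depth step per raw unit at depth z_mm (1/Z model)."""
    # dz/dZ = 0.00307 * z^2 / 1000, obtained by differentiating the mapping.
    return 0.00307 * z_mm ** 2 / 1000.0


print("XY resolution at 2 m:", lateral_resolution_mm(2000.0), "mm")
print("Z resolution at 2 m:", depth_resolution_mm(2000.0), "mm")
```

With these constants the Z step at 2 m comes out slightly above 10 mm, and the XY step slightly above 3 mm.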

The resolution can also be approximated by polynomial fitting:

rz = p1*z^3 + p2*z^2 + p3*z + p4

rxy = q1*z + q2

With p1 = -8.9997e-012, p2 = 3.069e-006, p3 = 3.6512e-006, p4 = -0.0017512, and q1 = 0.0016829, q2 = 9.1461e-016.

These functions could be used to estimate the error in computing velocity by directly tracking single points.
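A minimal Python sketch that evaluates the two fits, assuming z is given in mm as in the rest of the post. The helper `evaluate_polynomial` and the tuple names are mine.

```python
# Coefficients from the fit, highest power first.
RZ_COEFFICIENTS = (-8.9997e-12, 3.069e-06, 3.6512e-06, -0.0017512)
RXY_COEFFICIENTS = (0.0016829, 9.1461e-16)


def evaluate_polynomial(coefficients, z):
    """Evaluate a polynomial with Horner's rule, highest power first."""
    result = 0.0
    for coefficient in coefficients:
        result = result * z + coefficient
    return result


def rz(z_mm):
    """Fitted depth resolution in mm at depth z_mm."""
    return evaluate_polynomial(RZ_COEFFICIENTS, z_mm)


def rxy(z_mm):
    """Fitted X/Y resolution in mm at depth z_mm."""
    return evaluate_polynomial(RXY_COEFFICIENTS, z_mm)


print("rz at 2 m:", rz(2000.0), "mm")
print("rxy at 2 m:", rxy(2000.0), "mm")
```

Note that q1 is essentially 1 / 594.21 and p2 is close to 0.00307 / 1000, so the fits mostly restate the closed-form expressions.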
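As a rough illustration of that idea, here is a Python sketch under strong assumptions: quantization is the only error source, each position is off by at most half a resolution step, and frames arrive at 30 Hz. The function name and the default frame rate are assumptions of this example.

```python
def velocity_error_bound_mm_s(resolution_mm, frame_rate_hz=30.0):
    """Worst-case velocity error from differencing two quantized positions."""
    # Each position may be off by half a step, so their difference may be
    # off by one full step; dividing by the frame time gives mm/s.
    return resolution_mm * frame_rate_hz


# Lateral tracking at 2 m, using rx = z / 594.21.
print(velocity_error_bound_mm_s(2000.0 / 594.21), "mm/s")
# Depth tracking with a 10 mm step.
print(velocity_error_bound_mm_s(10.0), "mm/s")
```

In practice sensor noise and tracking jitter add to this, so the bound only describes the quantization part of the error.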

Please contact me for errors or suggestions

**OpenGL Rendering**

Related to the above transformation, it is worth mentioning how to use the Kinect data inside a 3D application in OpenGL. One approach is to extract the world coordinates from the Kinect data and then render them as points in model space, allowing a generic camera. The other is to create a model and projection transformation that matches that of the Kinect and then blit the color and depth information, mapping the 12-bit depth to the depth buffer. The question is which OpenGL projection camera allows this. Unfortunately there is an offset between the Kinect RGB and depth cameras, and it is necessary to calibrate them for alignment. Next time I will add the projection matrix.
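As a rough sketch of the second approach (not the calibrated matrix promised above), a pinhole camera can be turned into an OpenGL-style projection matrix as follows. This assumes the Kinect frame has x to the right, y down and z forward, ignores the RGB/depth offset and lens distortion, and uses hypothetical near and far planes of 400 mm and 8000 mm. The function names are mine.

```python
def kinect_projection_matrix(fx, fy, cx, cy, width, height, near, far):
    """Row-major 4x4 projection matrix built from pinhole intrinsics.

    OpenGL stores matrices column-major, so transpose before uploading.
    """
    return [
        [2.0 * fx / width, 0.0, 1.0 - 2.0 * cx / width, 0.0],
        [0.0, 2.0 * fy / height, 2.0 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near),
         -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project_to_pixel(matrix, x, y, z, width, height):
    """Project a Kinect-frame point (mm) and return (X, Y, ndc_depth)."""
    # Assumed Kinect frame (y down, z forward) -> OpenGL eye frame
    # (y up, looking down the negative z axis).
    eye = (x, -y, -z, 1.0)
    clip = [sum(m * e for m, e in zip(row, eye)) for row in matrix]
    ndc_x, ndc_y, ndc_z = (clip[i] / clip[3] for i in range(3))
    # Viewport transform with the pixel origin at the top-left corner.
    X = (ndc_x + 1.0) * width / 2.0
    Y = (1.0 - ndc_y) * height / 2.0
    return X, Y, ndc_z


matrix = kinect_projection_matrix(
    594.21, 594.21, 339.31, 242.74, 640, 480, near=400.0, far=8000.0)
print(project_to_pixel(matrix, 100.0, -50.0, 2000.0, 640, 480))
```

The projected pixel should match X = x * 594.21 / z + 339.31 and Y = y * 594.21 / z + 242.74, the inverse of the equations used earlier for the X and Y resolution.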

**Update:** this IROS 2011 paper presents some tests of camera registration compared with a Vicon system. This work is associated with another on extrinsic calibration.

**Update 2:** details about calibration and a full course on the topic can be found in this TUM Course. Another interesting work shows how to compute a view-dependent 3D projection based on Kinect data.

**Update 3:** more recently a formal paper came out on this: Khoshelham, K., & Elberink, S. O. (2012). Accuracy and resolution of Kinect depth data for indoor mapping applications. *Sensors*, *12*(2), 1437-1454 (PDF)