3D Computational Imaging with Single-Pixel Detectors


Science  17 May 2013:
Vol. 340, Issue 6134, pp. 844-847
DOI: 10.1126/science.1234454

Cheap Pix

Three-dimensional (3D) images can be captured by, for example, holographic imaging or stereoimaging techniques. To avoid using expensive optical components that are limited to specialized bands of wavelengths, Sun et al. (p. 844; see the Perspective by Faccio and Leach) projected pulses of randomly textured light onto an object. They were able to reconstruct an image of the 3D object by detecting the reflected light with several photodetectors without any need for lenses. The patterned light beams can thus in principle be substituted for light sources of any wavelength.


Computational imaging enables retrieval of the spatial information of an object with the use of single-pixel detectors. By projecting a series of known random patterns and measuring the backscattered intensity, it is possible to reconstruct a two-dimensional (2D) image. We used several single-pixel detectors in different locations to capture the 3D form of an object. From each detector we derived a 2D image that appeared to be illuminated from a different direction, even though only a single digital projector was used for illumination. From the shading of the images, the surface gradients could be derived and the 3D object reconstructed. We compare our result to that obtained from a stereophotogrammetric system using multiple cameras. Our simplified approach to 3D imaging can readily be extended to nonvisible wavebands.

Computational imaging based on projected patterns is an alternative technique to conventional imaging and removes the need for a spatially resolving detector. Instead, this form of computational imaging infers the scene by correlating the known spatial information of a changing incident light field with the total reflected (or transmitted) intensity. For example, two copies of a randomly generated light field can be made with a beam splitter; one copy of the light field interacts with the object and a non–spatially resolving detector, and the other copy is recorded with a camera. Aggregating the correlations between the two detectors yields an image even though the light striking the camera has never interacted with the object. This phenomenon, called ghost imaging, has been demonstrated in both the quantum and classical regimes (1–9).

Such imaging systems can be simplified by using a device capable of generating computer-programmable random light fields, which obviates the requirement for the beam splitter and the camera because knowledge of the light field is held in the computer memory. This type of system was initially called computational ghost imaging (10) but is similar in approach to more standard computational imaging systems, which use projected light patterns [albeit usually highly structured (11, 12)]. We also note that the use of projected patterns is related to the field of single-pixel cameras (13), where the programmable component is used to filter the detected, rather than illuminating, light.

In both single-pixel cameras and computational imaging systems, inverting the known random patterns and the measured intensities is a computational problem. A number of sophisticated algorithms have been developed over the years to improve the signal-to-noise ratio (SNR) for different systems (14, 15), but with appropriate normalization (16) a simple iterative algorithm was adopted for this experiment.

Previous experiments in single-pixel computational imaging (9, 14, 16) were restricted to relatively small (less than 10 cm) two-dimensional (2D) images, mainly of 2D template objects or 2D outlines of 3D objects. In the present work, we capture the 3D spatial form of an object by using several single-pixel detectors in different locations. A 2D image is derived from each detector but appears as if it is illuminated differently from the others. Comparing the shading information in the images allows the surface gradient and hence the 3D form of the surface to be reconstructed.

The experimental setup (Fig. 1) consists of a digital light projector to illuminate objects with random binary light patterns, four spatially separated single-pixel photodetectors to measure the intensity of the reflected light, an analog-to-digital converter to digitize the photodetector signals, and a computer to generate the random speckle pattern as well as perform 3D reconstructions of the test object.

Fig. 1 Experimental setup used for 3D surface reconstructions.

The light projector illuminates the object (head) with computer-generated random binary speckle patterns. The light reflected from the object is collected on four spatially separated single-pixel photodetectors. The signals from the photodetectors are measured and used to reconstruct a computational image for each photodetector.

The digital light projector comprises a red, green, and blue light-emitting diode illumination source and a digital micromirror device (DMD) to generate the structured illumination (see supplementary materials). Note that the large operational bandwidth of the DMD (300 nm to 2 μm) enables the use of this technique at other wavelengths that are potentially unsuitable for existing imaging technologies.

The patterns projected onto the object are randomly distributed binary patterns with a black-to-white ratio of 1:1. The life-sized mannequin head is positioned about 1 m from the lens so that it fits within the projected pattern. Four spatially separated single-pixel photodetectors are positioned in the plane of the lens, separated by 500 mm and each pointing toward a common point on the object to record the backscattered light. For every binary pattern projected, the corresponding object intensity is measured by each photodetector, and the data are fed to a computer algorithm.

DMD-based projectors create color images by displaying 24 binary images (bit planes) per frame in quick succession. By alternating between a binary pattern and its inverse in subsequent bit planes, we can demodulate the measured signal at the frequency of the bit plane projection (1440 Hz) to isolate the backreflected signal from light sources at other frequencies such as room lighting. Because the speckle pattern has equal numbers of black and white pixels, the measured signals for each pattern can be normalized; this has been shown to improve the SNR of the final reconstruction (16).
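The background rejection described above can be illustrated with a simplified sketch. It is not the authors' demodulation scheme: it assumes a constant ambient term and an idealized detector, and simply subtracts the reading for the inverse pattern from the reading for the pattern. The names `balanced_pattern` and `differential_signal` are hypothetical.

```python
import numpy as np

def balanced_pattern(shape, rng):
    """Random binary pattern with equal numbers of black (0) and white (1) pixels."""
    n = int(np.prod(shape))
    flat = np.zeros(n)
    flat[: n // 2] = 1.0
    return rng.permutation(flat).reshape(shape)

def differential_signal(obj, pattern, ambient=0.0):
    """Detector reading for a pattern minus the reading for its inverse.

    `obj` is a hypothetical reflectivity map; `ambient` models constant
    background light (an assumption), which cancels in the difference.
    """
    pattern = np.asarray(pattern, dtype=float)
    s_pos = (pattern * obj).sum() + ambient          # pattern displayed
    s_neg = ((1.0 - pattern) * obj).sum() + ambient  # inverse displayed
    return s_pos - s_neg
```

In this toy model the difference equals the object weighted by a pattern taking the values +1 and -1, so any light that does not change between the two bit planes drops out.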

In all iterative techniques, a 2D representation of the object is reconstructed by averaging the product of the measured photodetector signal and the incident pattern over many patterns. A sequence of $M$ binary patterns, $P_i(x, y)$, is reflected from the object, giving a sequence of measured signals $S_i$. The 2D reconstruction $I(x, y)$, which provides an estimate of the object, can be stated as

$$I(x, y) = \big\langle \left(S_i - \langle S_i \rangle\right)\left(P_i(x, y) - \langle P_i(x, y) \rangle\right) \big\rangle \tag{1}$$

where angle brackets denote an ensemble average over the $M$ iterations, $\langle \cdot \rangle = (1/M)\sum_i$.
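The correlation average of Eq. 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the `simulate` helper, the independent random pixels (rather than exactly balanced patterns), and the noise-free bucket detector are all assumptions made for the example.

```python
import numpy as np

def reconstruct(patterns, signals):
    """Eq. 1: average of (S_i - <S>) * (P_i(x, y) - <P(x, y)>) over M patterns."""
    patterns = np.asarray(patterns, dtype=float)  # shape (M, H, W)
    signals = np.asarray(signals, dtype=float)    # shape (M,)
    ds = signals - signals.mean()
    dp = patterns - patterns.mean(axis=0)
    # Sum the products over the pattern index, then divide by M.
    return np.tensordot(ds, dp, axes=(0, 0)) / len(signals)

def simulate(obj, n_patterns, seed=0):
    """Hypothetical noise-free single-pixel measurement of a reflectivity map."""
    rng = np.random.default_rng(seed)
    patterns = rng.integers(0, 2, size=(n_patterns, *obj.shape)).astype(float)
    # The detector records only the total backscattered intensity per pattern.
    signals = (patterns * obj).sum(axis=(1, 2))
    return patterns, signals

if __name__ == "__main__":
    obj = np.zeros((8, 8))
    obj[2:6, 2:6] = 1.0
    patterns, signals = simulate(obj, 5000)
    image = reconstruct(patterns, signals)
    print(np.corrcoef(image.ravel(), obj.ravel())[0, 1])
```

Under these assumptions the estimate converges to the object scaled by the pattern variance, with residual noise that falls as the number of patterns grows.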

Using Eq. 1, we obtain 2D reconstructions of the object for each of the four photodetectors. Because all the images are derived from the same set of projected patterns, the (x, y) locations of the features in each image are identical. However, the intensity distribution in each image is different, because the apparent lighting of the object is dependent on the location of the detector used to record the backscattered light (optical imaging systems are reciprocal; see supplementary materials). Thus, in contrast to imaging systems based on multiple cameras, the perspective of the single-pixel detector does not render geometrical distortion to the object being imaged.

Depth information of a scene is normally lost in a 2D image, but there are instances where it can be inferred using a technique called "shape from shading" (SFS) (17, 18). From a single image with one source of illumination, this method relies on the shading caused by geometrical features to reveal the depth of the scene. Many SFS methods assume that the object exhibits uniform Lambertian (matte) reflectance and that a single light source is located at infinity, such that the incoming lighting vector is constant across the surface of the object. The test objects used in our experiment exhibit Lambertian reflectance (see supplementary materials) and so this assumption is valid. An alternative technique, called photometric stereo (19, 20), adopts the same assumptions but uses multiple images, each with a different illumination and taken from the same viewpoint, similar to the types of images retrieved by our 3D computational imaging system.

The intensity of a pixel $(x, y)$ in the image obtained from the $i$th detector can be expressed as

$$I_i(x, y) = I_s \alpha \,(\hat{d}_i \cdot \hat{n}) \tag{2}$$

where $I_s$ is the source intensity, $\alpha$ is the surface reflectivity, $\hat{d}_i$ is the unit detector vector pointing from the object to the detector, and $\hat{n}$ is the surface normal unit vector for the object. Thus, for $N$ images, we can write Eq. 2 as

$$\mathbf{I}(x, y) = I_s \alpha \,(\mathbf{D} \cdot \hat{n}) \tag{3}$$

where $\mathbf{D}$ is an array containing the unit detector vectors and $\mathbf{I}$ is an array containing the corresponding image intensities. For any pixel $(x, y)$, the unit surface normal is

$$\hat{n} = \frac{1}{I_s \alpha} \left(\mathbf{D}^{-1} \cdot \mathbf{I}\right) \tag{4}$$

and the surface albedo (reflectivity) is

$$\alpha = \left| \mathbf{D}^{-1} \cdot \mathbf{I} \right| \tag{5}$$

with the image intensities expressed in units of the source intensity $I_s$, so that Eq. 5 follows from Eq. 4 and the unit length of $\hat{n}$. From these surface normals, calculated for each pixel, it is possible to determine the gradient between adjacent pixels, from which we obtain the surface geometry by integration. Because we record four images, the problem becomes overconstrained; the surface normals represent only two degrees of freedom per pixel. We can thus remove our assumption of uniform reflectivity and recover an estimate of the surface albedo $\alpha$ at the same time as finding the object’s shape.
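A per-pixel solution of Eqs. 3 to 5 can be sketched as follows. Because four detectors give a non-square array D, this sketch substitutes a least-squares solve for the inverse written in Eq. 4; that substitution, the function names, and the sign convention for the gradients (surface normal proportional to (-dz/dx, -dz/dy, 1)) are assumptions, not details taken from the paper.

```python
import numpy as np

def photometric_stereo(images, detector_dirs, source_intensity=1.0):
    """Recover unit normals and albedo from N images with known detector vectors.

    images: (N, H, W) intensities; detector_dirs: (N, 3) vectors from object
    to detector (normalized here). Returns normals (3, H, W) and albedo (H, W).
    """
    images = np.asarray(images, dtype=float)
    D = np.asarray(detector_dirs, dtype=float)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    n, h, w = images.shape
    # Least-squares solve D g = I for g = I_s * alpha * n_hat at every pixel.
    g, *_ = np.linalg.lstsq(D, images.reshape(n, -1), rcond=None)
    mag = np.linalg.norm(g, axis=0)
    albedo = mag / source_intensity
    # Normalize g to unit length; pixels with zero signal keep a zero normal.
    normals = np.divide(g, mag, out=np.zeros_like(g), where=mag > 0)
    return normals.reshape(3, h, w), albedo.reshape(h, w)

def gradients_from_normals(normals, eps=1e-9):
    """Surface gradients (dz/dx, dz/dy), assuming n_hat ~ (-dz/dx, -dz/dy, 1)."""
    nx, ny, nz = normals
    nz = np.where(np.abs(nz) < eps, eps, nz)  # avoid division by zero
    return -nx / nz, -ny / nz
```

With four images and three unknowns per pixel, the least-squares solve uses the redundancy noted above, which is what allows the albedo to be estimated alongside the normal.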

The gradients are integrated to find the shape of the object, starting at the center and working outward. The height of the surface at a given point can be estimated from a nearest-neighbor point using the height of that point and the gradient of the surface. Because the center of each pixel is associated with the measured gradient data, we use the mean of the gradient at the nearest-neighbor pixel and the gradient at the pixel to be evaluated. For each point where the height is being estimated, the value used is the mean of the estimates from each nearest neighbor.
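The outward integration can be sketched as a breadth-first sweep from the central pixel. This is an illustrative sketch only: it assumes a full rectangular grid with unit pixel spacing, four-connected neighbors, and gradient arrays `p` (dz/dx along columns) and `q` (dz/dy along rows); the visiting order and the function name are hypothetical.

```python
from collections import deque
import numpy as np

STEPS = ((0, 1), (0, -1), (1, 0), (-1, 0))  # (row, column) offsets

def integrate_outward(p, q):
    """Integrate gradients into heights, starting at the center (height 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    h, w = p.shape
    z = np.full((h, w), np.nan)  # NaN marks pixels without a height yet
    start = (h // 2, w // 2)
    z[start] = 0.0
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) != start:
            estimates = []
            for dr, dc in STEPS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not np.isnan(z[rr, cc]):
                    # Mean of the gradient at the neighbor and at this pixel.
                    mean_p = 0.5 * (p[rr, cc] + p[r, c])
                    mean_q = 0.5 * (q[rr, cc] + q[r, c])
                    # Step from the neighbor to this pixel is (-dr, -dc).
                    estimates.append(z[rr, cc] - dc * mean_p - dr * mean_q)
            # Height is the mean of the estimates from solved neighbors.
            z[r, c] = np.mean(estimates)
        for dr, dc in STEPS:
            nb = (r + dr, c + dc)
            if 0 <= nb[0] < h and 0 <= nb[1] < w and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return z
```

For a planar surface with constant gradients, every neighbor estimate agrees and the sweep returns the plane exactly, measured relative to the central pixel.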

After the integration has been performed to provide an initial estimate of the object’s shape, an optimization step refines the shape, where the cost function is the sum of the squared differences between the gradients of the reconstructed surface and the gradients recovered from the photometric stereo measurement. As described above, it is possible to estimate the height of a pixel on the basis of the height and gradient at each of its neighboring pixels. Our simple optimization works one pixel at a time, each iteration setting the pixel’s height such that it matches the mean estimate from each of its nearest neighbors. In the case of a pixel that is surrounded by other pixels with height estimates, this corresponds to setting the Laplacian of the reconstructed height field equal to the Laplacian calculated from the measured gradient data. In the case of pixels at the edge of the object, this is equivalent to assuming that the gradient measured perpendicular to the edge of the object is accurate. These two criteria are suggested by Horn (20). Because the optimal value for any given pixel can be calculated quickly (in both cases it is a linear operation), millions of iterations could be carried out in a few minutes, corresponding to approximately 100 passes over each pixel on average.
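The pixel-by-pixel refinement can be sketched as a simple relaxation loop. As before, this is a sketch under assumptions rather than the authors' implementation: a full rectangular grid, unit pixel spacing, four-connected neighbors, a fixed raster visiting order, and the hypothetical names `relax_heights` and `gradient_cost`.

```python
import numpy as np

STEPS = ((0, 1), (0, -1), (1, 0), (-1, 0))  # (row, column) offsets

def relax_heights(z0, p, q, n_passes=100):
    """Repeatedly set each height to the mean estimate from its neighbors."""
    z = np.array(z0, dtype=float)
    h, w = z.shape
    for _ in range(n_passes):
        for r in range(h):
            for c in range(w):
                estimates = []
                for dr, dc in STEPS:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        mean_p = 0.5 * (p[rr, cc] + p[r, c])
                        mean_q = 0.5 * (q[rr, cc] + q[r, c])
                        # Height implied by the neighbor's height and gradient.
                        estimates.append(z[rr, cc] - dc * mean_p - dr * mean_q)
                z[r, c] = sum(estimates) / len(estimates)
    return z

def gradient_cost(z, p, q):
    """Sum of squared differences between surface and measured gradients."""
    dzdx = z[:, 1:] - z[:, :-1]
    dzdy = z[1:, :] - z[:-1, :]
    px = 0.5 * (p[:, 1:] + p[:, :-1])
    qy = 0.5 * (q[1:, :] + q[:-1, :])
    return float(((dzdx - px) ** 2).sum() + ((dzdy - qy) ** 2).sum())
```

Each single-pixel update is the exact minimizer of this cost with respect to that pixel's height, so the cost never increases from one update to the next. Edge pixels simply average over the neighbors that exist.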

Once the algorithm had been appropriately calibrated by imaging flat and spherical surfaces (with the same surface material and reflective properties), accounting for changes of the lighting vector for different pixels across the object plane, the system was tested for objects with geometric complexity. One object investigated was a life-size white polystyrene mannequin head, with approximate dimensions 190 mm × 160 mm × 250 mm. Using the 2D images shown in Fig. 2, we calculated for each pixel the reflectivity, the surface normals, the surface gradient, and the estimated depth as prescribed by our model. A standard 3D graphic package was then used to visualize this profile, overlaid with the reflectivity data, as illustrated in Fig. 3.

Fig. 2 Source images from the four single-pixel detectors from 1000 to 1 million iterations.

The images from each photodetector are reconstructed using an iterative algorithm (described in the text). The spatial information in each image is identical; however, the apparent illumination source is determined by the location of the relevant photodetector, indicated underneath. No postprocessing has been applied to the images. The scale refers to the relative intensity of the images (in arbitrary units, 0 to 255).

Fig. 3 3D reconstruction of the object.

Rendered views of the reconstructed facial surface derived by integration of the surface normal data and overlaid with the reflectivity data (see movie S1).

To quantify the accuracy of our approach, we compared the 3D reconstruction of the test object with a 3D image captured from a stereophotogrammetric camera system. This latter system uses a matching algorithm on the 2D images from multiple cameras to recover the distance map of an object from the cameras. The accuracy of this system with facial shapes is well documented (21) to have a root mean square error (RMSE) on the order of 1 mm for central facial locations, but the error can rise substantially (2 cm) at side locations where the surface normals are close to perpendicular to the line of sight.

To compare the facial profiles measured by the two systems, we characterized the shapes according to well-defined facial locations (22): nose tip, mouth corners, etc. Figure 4 shows two sets of 21 such anatomical landmarks superimposed on these facial images by a trained observer. After lateral and angular registration and subsequent depth scaling, the RMSE of our 3D computational imager is found to be slightly below 4 mm. In common with camera-based stereophotogrammetry, the observed error is greater toward the edge of the object, around the ears and upper forehead. The increased error is a consequence of the projected pattern expanding at greater depth and highlights one limitation of the system. Our approach to minimizing this effect is to use a lens with suitable focal length for projection (see supplementary material).
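A landmark-based comparison of this kind can be sketched as a rigid registration followed by an RMSE. The sketch below uses the standard Kabsch (SVD) alignment as a stand-in for the registration step and omits the depth scaling; the choice of algorithm, the RMSE definition (root of the mean squared 3D distance between paired landmarks), and the function names are assumptions, not the procedure reported here.

```python
import numpy as np

def rigid_align(source, target):
    """Rotate and translate `source` (K, 3) onto `target` (K, 3) by least squares."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Cross-covariance of the centered landmark sets.
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (source - src_c) @ R.T + tgt_c

def landmark_rmse(a, b):
    """Root mean square of the 3D distances between paired landmarks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))
```

Given two arrays of 21 paired landmark coordinates, `landmark_rmse(rigid_align(ours, reference), reference)` would return a single figure in the same units as the coordinates.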

Fig. 4 Comparison between computational imaging and a stereophotogrammetric imaging system.

Computational imaging (green) and stereophotogrammetric (blue) reconstructions of the mannequin head, from frontal (A) and profile (B) viewpoints, are shown with anatomical landmarks (color-coded green and blue, respectively) added.

Beyond showing that high-quality images of real-life objects can be captured using a single-pixel photodetector, our experiment demonstrates that by using a small number of single-pixel detectors, computational imaging methods can yield 3D images. An important difference between our technique and the multiple-camera approach is that a single projector determines the spatial resolution of the system, removing issues of pixel alignment associated with multiple cameras. Furthermore, reversing the fundamental imaging process allows for the use of simpler, less expensive detectors. The operational bandwidth of the system is limited not by the efficiency of a pixelated imaging detector but instead by the reflectivity of the DMD used for light projection, whose efficiency extends well beyond the visible spectrum. The development of such technology—for example, the use of a broadband white light source—could enable computational imaging systems to become a cheaper alternative for applications in 3D and multispectral imaging.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S3

Movie S1

References and Notes

  1. Acknowledgments: Supported by the Royal Society and the Wolfson Foundation (M.J.P.) and by the UK Engineering and Physical Sciences Research Council. B.S. and M.P.E. thank D. Giovannini for useful discussions. All authors contributed equally to this work and to the writing of the manuscript.