With the high prevalence of obesity, diabetes and other chronic diseases, the study of caloric and nutritional intake has become increasingly important(1,2). Besides proper food choice, control of portion size is the most decisive factor in controlling intake. Traditionally, food portion size is measured in terms of either weight or volume. Although weight can be measured precisely using a weighing scale, this is inconvenient because the scale must be placed at, or carried to, the eating site. Volumetric portion size measurement, on the other hand, is traditionally conducted by self-estimation with reference to a common object (e.g. a cup, a spoon or a fist). In some cases, a set of descriptive terms (e.g. small, medium or large) is used instead of a quantitative volumetric value. Although these intuitive approaches are easy for people to learn when self-monitoring their intake, they are clearly subjective and inaccurate. In addition, numerous studies have shown that people tend to underreport their intake(3,4).
Recently, advances in microelectronics and mobile technology have led to an imaging approach to portion size measurement. A food image acquired by a cell phone or a wearable device can be used to measure food volume quantitatively, based on a mathematical transformation of coordinates, expressed in matrices, between image pixels and real-world coordinates(5,6). Several computational methods have been developed(7,8), including those based on wireframe shape models(9), structured light(10) and depth maps(11,12). Although this advanced imaging approach holds promise for objective portion size quantification, its accuracy is currently much lower than that of the weight-based approach. As a result, it is often necessary to assess volumetric accuracy before a large-scale dietary study is conducted. Unfortunately, this accuracy cannot be assessed easily because the true volume of food, which serves as the ‘gold standard’, is difficult to obtain. Traditionally, the water displacement method is used to measure the true volume. However, foods that disintegrate or are damaged in water cannot be measured unless they are properly sealed, and the sealing process frequently alters the volume if the food is compressible. Although the water in this method can be replaced by certain plant seeds (e.g. rapeseeds or millets), sealing is still required for foods containing liquid, and the degree of seed compression, which is influenced by a variety of physical factors(13), introduces a new source of error. More advanced methods have been reported using CT/MRI scans(14) and a gas comparison pycnometer(15). However, these methods are expensive, and they do not measure the same kind of volume as captured in a photograph, which shows only the surface of food, not its interior. Currently, the lack of accurate food volume measurement is a significant stumbling block in image-based dietary assessment.
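For reference, the pixel-to-world transformation mentioned above is commonly expressed through the standard pinhole camera model, in which a world point is projected to a pixel via the intrinsic matrix K and the extrinsic rotation and translation [R | t]. This is a generic formulation given for orientation only; the specific models used in the cited systems may differ.

```latex
% Pinhole projection of a world point (X, Y, Z) to a pixel (u, v); s is a scale factor,
% K the camera intrinsic matrix, and [R | t] the extrinsic rotation and translation.
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = K \, [\, R \mid t \,]
    \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
```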
Technological advances in dietetics and nutrition science face another significant challenge: updating existing food databases. A food database for dietary assessment outputs values of calories and nutrients based on inputs of food names and portion sizes. Many currently used large databases were established over decades, and some food entries do not provide volumetric measures(7). Updating these databases with volumetric measures is thus necessary to support the new imaging technology. Clearly, this conversion cannot be accomplished properly without an accurate means to determine the true volume of food.
In this work, we present an electronic instrument, which we call a VD meter, to measure both food volume and density (VD). The VD meter is composed of four main modules: a mechanical module supporting a turntable and a weighing sensor; a camera module for image acquisition by an array of cameras with illuminating lights; an electronic module for power supply and system control/coordination; and a data processing module performing image calibration and 3D surface reconstruction.
The rest of the paper is organised as follows. The structural design of the VD meter is described first. The subsequent section highlights our algorithms for image calibration and reconstruction, with an emphasis on a new mathematical model that mimics the physical properties of the electric field. We use this model to estimate the food surface from a 3D point cloud and then estimate the volume and density. Our experiments and data analysis are presented next. The paper concludes after a discussion of several important constraints in food volume measurement.
Structural and hardware design
A cross-sectional view (in the vertical direction) of the VD meter is illustrated in Fig. 1. The mechanical module contains a turntable rotating precisely at a constant speed of 0·625 rpm, driven by a step motor through a transmission system. A high-precision load sensor installed under the turntable serves as the weighing sensor. The camera module of the VD meter contains an arc-shaped stationary support fitted with a set of high-quality cameras (Type MER-132-30GM; Daheng Group, Inc.) with a resolution of 1292 × 964 pixels, a frame rate of 30 frames per second and a pixel size of 3·75 × 3·75 μm. In our experiment, we used three cameras (installed on the arc support and marked with green borders), and the imaging rate was sixty-four images per turntable rotation. The cameras are properly angled towards the turntable. The arc support also carries a set of white LED lights aimed at the turntable, each located midway between neighbouring cameras. The electronic module within the VD meter provides power to the other units, handles data interfaces (for the cameras, the weighing sensor and an external computer) and coordinates functions among the other system components. Additionally, a data processing module, consisting of a set of software, is loaded on both the microcomputer within the VD meter and a desktop computer connected to the VD meter. The assembled system is shown in Fig. 2a.
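As a quick sanity check on the acquisition timing implied by these figures (an illustrative calculation based only on the stated rotation speed and imaging rate, not part of the VD meter software):

```python
# Illustrative calculation of the acquisition geometry; all input values are from the text.
rpm = 0.625                       # turntable speed (revolutions per minute)
images_per_rotation = 64          # images captured by each camera per full rotation
num_cameras = 3

seconds_per_rotation = 60.0 / rpm                                  # 96 s per 360-degree turn
capture_interval_s = seconds_per_rotation / images_per_rotation    # 1.5 s between exposures
angular_step_deg = 360.0 / images_per_rotation                     # 5.625 degrees per exposure
total_images = images_per_rotation * num_cameras                   # 192 images per measurement

print(seconds_per_rotation, capture_interval_s, angular_step_deg, total_images)
```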
During measurement, food, with or without a plate, is placed on top of the turntable. The control unit has an option to tare the plate weight automatically to obtain the net weight of the food. As the turntable rotates through an entire cycle (360°), the set of cameras synchronously takes images at multiple positions, forming an imaging surface shaped like a mesh dome food cover (Fig. 2b). Specifically, sixty-four images are taken by each camera, so a total of 64 × 3 = 192 images are obtained by the three cameras of the VD meter for each food measurement. Since the rotation speed of the turntable and the locations/orientations of the cameras are known, all picture-taking points (small red cameras in Fig. 2b) on the ‘mesh dome’ are known and distributed regularly on the dome, minimising the likelihood of occlusion. With shadowless illumination provided by the white LEDs, the VD meter provides an imaging platform for high-accuracy food image reconstruction. We note that the size of the food measurable by the VD meter is limited by the turntable size and the space within the imaging area; however, we intentionally designed both to be sufficiently large (Fig. 2a) for common foods.
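The regular distribution of picture-taking points can be illustrated as follows: because the food rotates while the cameras stay fixed, each camera corresponds, in the food's frame of reference, to sixty-four virtual viewpoints spaced evenly around the turntable axis. The sketch below shows only this idea; the camera positions used are hypothetical placeholders, not the actual VD meter geometry.

```python
# Sketch of the 'mesh dome' of picture-taking points in the food's frame of reference.
import numpy as np

def dome_viewpoints(camera_positions, images_per_rotation=64):
    """Return an (n_cameras * images_per_rotation, 3) array of virtual viewpoints."""
    angles = 2.0 * np.pi * np.arange(images_per_rotation) / images_per_rotation
    viewpoints = []
    for cam in camera_positions:
        for a in angles:
            # Rotation about the vertical (turntable) axis by the capture angle.
            rot_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                              [np.sin(a),  np.cos(a), 0.0],
                              [0.0,        0.0,       1.0]])
            viewpoints.append(rot_z @ np.asarray(cam, dtype=float))
    return np.vstack(viewpoints)

# Hypothetical camera positions on the arc support (x, y, z in millimetres).
cameras = [(300, 0, 150), (250, 0, 300), (150, 0, 400)]
dome = dome_viewpoints(cameras)   # 3 cameras x 64 captures = 192 virtual viewpoints
print(dome.shape)                 # (192, 3)
```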
System calibration and volume estimation based on 3D point cloud
System calibration
In order to establish correspondences between image pixels and real-world coordinates, the imaging system within the VD meter must be calibrated. This calibration is required only once, as long as the system is not re-adjusted or repositioned. The calibration is performed using an algorithm (more information is provided in Supplementary material S1) with a checkerboard sheet placed on the turntable, as shown in Fig. 2.
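For illustration, a minimal checkerboard-based intrinsic calibration can be sketched with OpenCV as shown below. This is a generic example, not the specific algorithm of Supplementary material S1, and the board dimensions and file names are hypothetical.

```python
# Generic checkerboard calibration sketch (OpenCV); board size, square size and image
# file names are hypothetical placeholders.
import cv2
import numpy as np

pattern = (9, 6)                 # inner corners per row and column of the checkerboard
square_mm = 10.0                 # physical size of one square

# 3D template of the board corners in the board's own frame (z = 0 plane).
board = np.zeros((pattern[0] * pattern[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points, image_size = [], [], None
for fname in ["calib_00.png", "calib_01.png", "calib_02.png"]:   # placeholder file names
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(board)
        img_points.append(corners)
        image_size = gray.shape[::-1]                            # (width, height)

# Recover the intrinsic matrix, distortion coefficients and per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("reprojection RMS (pixels):", rms)
```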
Point cloud construction from multi-view images
The purpose of this part of the algorithm is to compute a 3D cloud of food surface points from the 192 2D images taken in different views. While the details of the algorithm can be found in Supplementary material S2, here we highlight the key procedures. First, a set of common points (called feature points) observable in multiple neighbouring images is selected automatically. Then, these feature points are registered one-to-one across as many images as possible. Next, the 3D relationships of these feature points are obtained and utilised to calculate a 3D cloud of points in real-world coordinates (in a physical unit, e.g. millimetres), as illustrated in Fig. 3. Finally, outlier points due to noise are identified and removed from the point cloud.
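As a simplified illustration of these procedures, the sketch below detects and matches feature points between two neighbouring views and triangulates the matches into 3D. It uses a generic ORB detector for brevity (not the detector combination described later), and the projection matrices P1 and P2 are assumed to come from the system calibration; the full multi-view registration is described in Supplementary material S2.

```python
# Two-view illustration of point cloud construction: detect, match, triangulate.
import cv2
import numpy as np

def two_view_points(img1, img2, P1, P2):
    """img1, img2: grayscale images; P1, P2: 3x4 projection matrices from calibration."""
    orb = cv2.ORB_create(nfeatures=2000)               # generic detector for this sketch
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)

    pts1 = np.float32([k1[m.queryIdx].pt for m in matches]).T   # 2 x N pixel coordinates
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches]).T

    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # 4 x N homogeneous coordinates
    return (X_h[:3] / X_h[3]).T                         # N x 3 points in world units
```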
Volume estimation from point cloud
The next major computational procedure is to estimate food volume from the point cloud. Traditionally, this estimation is performed using the convex hull method, in which the surface is assumed to be locally convex(16,17). However, this method is problematic for food volume estimation because foods, as a special class of 3D objects, often have locally concave surfaces. When the convex hull method is applied to a concave surface, the estimated volume tends to be larger than the true volume. In order to solve this significant problem, we present two methods for food volume estimation: a simple sliced point cloud method and a robust estimation method using a new electric field-based physical model (Fig. 4). These methods are described in detail in Supplementary material S3.
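For orientation, the sketch below implements the convex hull baseline and one plausible reading of a slice-based estimator (summing per-slice 2D hull areas multiplied by the slice thickness); the electric field method itself is defined in Supplementary material S3 and is not reproduced here.

```python
# Baseline volume estimators for comparison purposes only.
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_volume(points):
    """points: (N, 3) array; overestimates volume where the surface is locally concave."""
    return ConvexHull(points).volume

def sliced_volume(points, n_slices=18):
    """Cut the cloud into horizontal slices and sum slice_area * slice_thickness.
    Assumes each slice contains enough non-collinear points to form a 2D hull."""
    z = points[:, 2]
    edges = np.linspace(z.min(), z.max(), n_slices + 1)
    thickness = edges[1] - edges[0]
    volume = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        sl = points[(z >= lo) & (z <= hi), :2]           # project slice onto the x-y plane
        if len(sl) >= 3:
            volume += ConvexHull(sl).volume * thickness  # for 2D points, .volume is the area
    return volume
```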
Experiments
To evaluate the performance of the VD meter, we conducted two experimental studies, one using computer-synthesised 3D objects and the other using real food samples. Since food density is equal to mass divided by volume, and the mass measurement is at least two orders of magnitude more accurate (when a high-quality digital load sensor is used) than the volumetric measurement, the error in mass measurement can be ignored. As a result, the accuracy of the volumetric measurement is equivalent to the accuracy of the density measurement, and we therefore needed only to perform experiments for the volumetric case. We used computer synthesis in the first study because, in this case, the true volumes of the 3D objects are precisely known, and this approach allows us to evaluate our algorithm performance for specific concave surfaces. In the second study, real-world food samples with a variety of shapes, colours and textures were utilised to evaluate the VD meter performance. In both studies, we compared the volume estimation accuracies of different methods, including the convex hull method, the slice-based method and the electric field method. For the last method, we conducted an additional study in which we changed the point cloud density and added noise and outliers to evaluate its robustness.
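The equivalence between volumetric and density accuracy follows from standard error propagation for independent relative errors:

```latex
% For \rho = m / V with independent relative errors in mass and volume:
\frac{\delta\rho}{\rho}
  = \sqrt{\left(\frac{\delta m}{m}\right)^{2} + \left(\frac{\delta V}{V}\right)^{2}}
  \;\approx\; \frac{\delta V}{V}
\quad \text{when } \frac{\delta m}{m} \ll \frac{\delta V}{V}.
```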
Experiments on synthetic object models
Two 3D volumetric models were synthesised computationally, as shown in Fig. 5. We applied the convex hull method, the slice-based method and the electric field method to each point cloud for volume estimation and compared their results against the ground truth volumes of the synthesised models. We performed the slice-based method twice, with N = 10 and N = 18, where N represents the number of slices.
Table 1 compares the three methods quantitatively, with both the N = 10 and N = 18 cases listed for the slice-based method. It can be observed that the convex hull method tends to distort regions where the object surface is locally concave. As a result, the estimated volumes are larger than the true volumes (compare Figs 5 and 6). In contrast, in our electric field-based method, the points of the point cloud were fitted well by the free particles even in concave regions (Fig. 7).
Compared with the convex hull method, the accuracy of the electric field method is higher. For the slice-based method, estimation accuracy tends to increase as the number of slices increases. However, this method has a significant problem: it is difficult to determine a number of slices N applicable to different point cloud models, so N needs to be adjusted manually for a specific food. Moreover, if the number of slices is too large, the number of points in each slice is reduced, which tends to affect estimation accuracy negatively.
For the best-performing electric field method, we conducted additional experiments to evaluate its robustness by changing the density of the original point cloud. For each object model, we compared three point cloud densities, as shown in Table 2. It can be observed that the estimation error remains <1 % regardless of the number of points in the cloud. During 3D reconstruction, noisy outliers arise from calibration error, matching error and other sources of error, and these outliers are often difficult to remove. In order to evaluate the electric field method in the presence of noise and outliers, we added white Gaussian noise (mean = 0, standard deviation = 1) and outliers (randomly generated from a uniform distribution in the range [0·05, 5]) separately to each point cloud and performed volume estimation. The results (Table 3) indicate that the electric field method maintained similar performance in the presence of noise and outliers.
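For clarity, the sketch below shows one way perturbations of the stated form could be injected into a point cloud; the number of outliers added and the random seed are assumptions, and the exact protocol is the authors'.

```python
# Illustrative injection of Gaussian noise and uniform outliers into a point cloud.
import numpy as np

rng = np.random.default_rng(0)   # fixed seed chosen for reproducibility of the sketch

def add_gaussian_noise(points, sigma=1.0):
    """Perturb every coordinate with zero-mean Gaussian noise (standard deviation = sigma)."""
    return points + rng.normal(0.0, sigma, size=points.shape)

def add_uniform_outliers(points, n_outliers=100, low=0.05, high=5.0):
    """Append spurious points drawn from a uniform distribution on [low, high]."""
    outliers = rng.uniform(low, high, size=(n_outliers, 3))
    return np.vstack([points, outliers])
```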
Experiments on real food samples
Six foods, purchased from a cafeteria and food stores, were used to study the real-world performance of the VD meter. For each food, we used the VD meter and applied the electric field method to estimate the volume. For the same food, we also obtained its 3D point cloud using a laser scanner(18), which is considered close to the ground-truth 3D shape of the food. The electric field method was also applied to the 3D point cloud from the laser scanner to estimate the volume, in order to check the accuracy of the 3D point cloud produced by the VD meter. Furthermore, we also used the slice-based method, with both N = 20 and N = 50 implemented. Finally, water displacement (manual measurement) was adopted as the baseline for comparison, as it is a traditional method to obtain food volume that does not rely on 3D points.
We first performed system calibration to establish correspondences between image pixels and real-world coordinates using a commercial, high-precision checkerboard. Then, 192 images (taken by the three cameras) of each food in different views were produced by the VD meter. Next, feature points were detected using a combination of Harris(19), features from accelerated segment test (FAST)(20) and speeded-up robust features (SURF)(21) detectors, as described in Supplementary material S2. The threshold value Tmin was experimentally chosen to be 3. Then, the point cloud of each shape model was computed according to Eq. (7) in the Supplementary material. To remove noise and outliers, we filtered the raw point clouds using Kp = 10. After filtering, we applied the electric field method to reconstruct the surface of each food and estimate its volume.
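As an illustration of the Kp-based cleaning step, the sketch below shows one common form of a k-nearest-neighbour statistical outlier filter; the distance threshold rule is an assumption, and the exact filter used by the VD meter is defined in Supplementary material S2.

```python
# Generic k-nearest-neighbour statistical outlier filter (sketch only).
import numpy as np
from scipy.spatial import cKDTree

def knn_outlier_filter(points, kp=10, std_ratio=2.0):
    """Remove points whose mean distance to their kp nearest neighbours is unusually large."""
    tree = cKDTree(points)
    # Query kp + 1 neighbours because the nearest neighbour of each point is itself.
    dists, _ = tree.query(points, k=kp + 1)
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```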
In contrast to the VD meter, the number of points obtained by the laser scanner was very large (usually on the order of 10^5). As a result, estimating the volume directly from the raw point cloud data was time-consuming. Since the density of the point cloud has only a limited effect on volume estimation accuracy, the original laser point cloud was down-sampled by 80 % to accelerate the computation before volume estimation.
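A minimal sketch of random down-sampling is shown below; it reads ‘down-sampled by 80 %’ as discarding 80 % of the points (keeping 20 %), which is an assumption about the wording.

```python
# Random down-sampling of a dense point cloud (assumes 80 % of points are discarded).
import numpy as np

def downsample(points, keep_fraction=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n_keep = int(len(points) * keep_fraction)
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]
```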
Table 4 lists the names of the real foods (column 1), results of the water displacement method, which was used as the baseline for comparison (column 2), results of the slice-based method with corresponding errors (columns 3 and 4 for N = 20; columns 5 and 6 for N = 50), results of the image-based electric field method with corresponding errors (columns 7 and 8) and results of the laser-based electric field method with corresponding errors (columns 9 and 10). All estimated volumes are given in cubic centimetres. The number of 3D points in the real food study ranged from 7000 to 20 000.
Several important observations can be made from Table 4. First, our electric field-based method achieved better performance than the slice-based method. Although a best number of slices can be found to optimise volume estimation, the optimal number differed between foods. Second, the electric field method achieved satisfactory performance across different food shapes for both image-based and laser-based point clouds.
Figure 8 shows the food images tested (first row) and the corresponding experimental results using the image-based (second row) and laser-based (third row) electric field methods. In the bottom two rows, red and green points represent, respectively, the food point clouds (i.e. particles in the PCP set) and the final positions of the free particles in the NCP set. It can be seen that the green points represent the surfaces of the red food point clouds well.
All the experiments were performed using a computer equipped with an Intel Core i7 3·5 GHz CPU and 16 GB RAM. The time required for each measurement depends on the complexity of the food shape and the number of points reconstructed to estimate the volume. In our analysis, the typical time ranged from 3 to 5 min using MATLAB. We expect the duration to be shortened substantially with a more efficient implementation, at the cost of additional programming effort.
Discussion
Since the VD meter measures food volume and density, it is necessary to discuss the physical definitions of these quantities and several important challenges regarding their measurement. Unlike the numerous incompressible objects in the physical world whose volumetric measures have no ambiguity, foods do not have a unique definition of volume. For example, the volume of an apple or a cup of coffee is well defined. However, the volume of a bowl of rice or a plate of salad is ambiguous because these compressible objects are porous with connected air spaces and, as a result, the boundary between food and air is uncertain. Strictly speaking, the volumetric measurement of this type of food is not a deterministic quantity, but rather a probability distribution. In order to mitigate this fundamental problem, three kinds of food volume (and density) have been defined(22): bulk volume, apparent volume and net volume. Bulk volume is the volume defined by the container (e.g. a box of cereal); apparent volume is defined by a hypothetical enclosure covering the food; and net volume, which cannot be measured using a camera, is the volume of the net matter excluding all spaces. Despite these three definitions, food volume is still an ambiguous quantity. A bowl of non-liquid food has both bulk and apparent volumes (e.g. the top and bottom parts of the food in a bowl fit the definitions of apparent and bulk volumes, respectively). A compressible food in a bowl or plate does not have precisely the same volume if measured twice. This phenomenon is due partially to the redistribution of food elements and the variation in the ‘tightness’ of the hypothetical enclosure. Similarly, the concept of ‘spaces’ cannot be accurately defined for the net volume because new gaps between matter or particles always appear as we descend to a finer scale of observation. Therefore, we must accept the ambiguity in food volume (and density) and treat these quantities with a certain level of uncertainty (we emphasise, again, that they are probability distributions). As a result, an overly high requirement for ‘accuracy’ in the volume of a compressible food is unnecessary and misleading. By the same token, volume and density measurements using the VD meter are ‘accurate’ only relative to the average physical state of foods being served in the real world, including their usual water content, temperature, surrounding pressure and commonly accepted containers. Moreover, the VD meter measures only the average density of the entire food, not its local density.
Conclusion
In this paper, we presented a new instrument, the VD meter, to measure both food volume and density. This instrument contains a number of hardware and software modules, including a turntable, a weighing sensor, an array of cameras, an array of illumination lights, electronic circuitry for system control and a set of software performing the computations. We also presented a new algorithm to estimate the 3D surface from a point cloud based on a physical model that governs the motion of charged particles in an electric field. This model produces a 3D food surface that can have both convex and concave local regions. Our experiments using both synthesised and real foods indicate that the electric field method outperforms the reference methods for food volume estimation. The VD meter presented in this paper provides a new tool for both portion size estimation in dietary studies and the improvement of food databases.
Acknowledgements
Acknowledgements: The authors would like to acknowledge all the participants for their significant contributions to this research study, as well as Hao Ma for collecting experimental data. Financial support: This work was supported in part by the State’s Key Project of Research and Development Plan in China (H. Z., D. Y., M. S., W. J., grant no. 2016YFE0108100), the National Institutes of Health grants (R01CA165255, R21CA172864 and R56DK113819) and the Bill & Melinda Gates Foundation (contract ID OPP1171395). Conflicts of interest: None. Authorship: D. Y., X. H. and H. Z. were responsible for camera calibration, image collection and analysis. D. Y., X. H., H. Z., W. Y. and M. S. contributed to the algorithm for data analysis. D. Y., X. H., H. Z., W. Y., Z. M. and M. S. contributed to the final drafting and editing of the manuscript. Ethics of human subject participation: The experiments in this research measured the volume and density of food and did not involve any human subjects.
Supplementary material
For supplementary material accompanying this article visit https://doi.org/10.1017/S136898002000275X