Identification of foods in the breakfast and determination of nutritional facts using deep convolution neural networks

Food recognition plays a crucial role in various fields, including healthcare, nutrition, and the food industry. It involves identifying different types of foods or dishes from images, videos, or other data sources. In healthcare, food recognition helps individuals monitor their daily food intake and manage their diet. It also assists dietitians and nutritionists in creating personalized meal plans based on patients' nutritional requirements and preferences. This article focuses on the development of software that can recognize food products and predict their nutritional facts. The software retrieves essential nutritional facts, namely fat, carbohydrate, protein, and energy, for the recognized food products and compiles them into a comprehensive list. For each of the 20 food products, 36 images were obtained, giving a total of 720 food images. To validate the trained models, six images of each food product were set aside for external validation. The remaining images were used to train three deep learning models, GoogleNet, ResNet-50, and Inception-v3, in MATLAB. All three models achieved an average accuracy above 98% in both training and validation. Although there were no significant differences in accuracy among the algorithms, GoogleNet stood out when prediction accuracy and training time were considered together. The validated deep learning models were then used to develop software for food recognition and nutritional value determination. The results indicate that the developed software can reliably identify foods and provide their corresponding nutritional facts. This software holds significant potential for application in the nutrition and dietetics field and can be particularly useful in healthcare settings for monitoring the dietary intake of patients with chronic diseases such as diabetes, heart disease, or obesity.
The system can track the types and quantities of foods consumed, offering personalized feedback to patients and healthcare providers.


INTRODUCTION
A healthy diet is important for good health and adequate nutrient intake, and it helps protect against numerous chronic noncommunicable diseases, such as heart disease, diabetes, and cancer (Ruthsatz & Candeias, 2020). Consumers worldwide have become considerably more aware of healthy eating, including tracking calories and nutritional facts (Miller & Cassady, 2015). Accurate prediction and tracking of dietary caloric and nutrient intake are important for evaluating the effectiveness of weight loss interventions and for maintaining a healthy lifestyle (Sawamoto et al., 2017). To be conscious consumers and make informed choices about what we eat, we should know the attributes, nutritional facts, and calories of the food products we consume. Knowing the features of food is also important for checking food quality and safety (Miller & Cassady, 2015).
Modern techniques such as electronic noses (Seesaard et al., 2022), computer vision (Tarlak et al., 2016a, 2016b), and spectroscopy (Habibi & Khosravi-Darani, 2017) have been widely used to assess food quality and attributes, but a fast, easy, accurate, and automatic method is still needed in daily life. Recently, the accuracy and effectiveness of food intake reporting systems have been improved by applying pattern recognition and image processing methods to automatically classify and distinguish food items (Allegra et al., 2020; Jiang et al., 2020). In these systems, databases of the nutritional facts and calories of food products are used to produce a daily food consumption report, but the consumed food product must first be identified and classified. Classifying food images is challenging because of numerous factors, including the presence of multiple food classes on a single plate and the variation in texture within the same type of food (Boushey et al., 2017; Yang et al., 2010).
Image-based analysis of food products can be divided into four main phases: food detection, food classification (recognition), weight estimation from food volume, and determination of nutritional facts and calories. Image identification and recognition accuracy has recently improved with advances in image processing and object detection, machine learning, and especially deep learning and its implementation in convolutional neural networks (CNNs). This has led to growing interest in the image analysis of food products.
Deep learning is a branch of machine learning that trains computers to do what comes naturally to humans: learn from experience (Alzubi et al., 2018). Deep learning uses neural networks to learn useful feature representations directly from data. Neural networks combine multiple nonlinear processing layers, using simple elements that operate in parallel and are inspired by biological nervous systems (Prieto et al., 2016). Deep learning models can achieve state-of-the-art accuracy in object classification, sometimes exceeding human-level performance. CNNs have shown better prediction performance for food recognition than traditional machine learning approaches. Studies have reported improved performance by modifying the AlexNet model to create a deep CNN for the Food-101 dataset (Bossard et al., 2014; Krizhevsky et al., 2012). CNNs have also been applied to food recognition and identification on a dataset of 10 food classes, reaching a detection accuracy of 73.7% (Kawano & Yanai, 2014). Another study retrained the AlexNet model on two different datasets and achieved maximum accuracies of 78.8% and 67.6% (Kawano & Yanai, 2015). Together, these results show that CNNs provide better prediction ability than the conventional machine learning approaches applied in previous studies (Shirmard et al., 2022; Subhi & Ali, 2018).
The implementation of food recognition technology in healthcare can help individuals monitor their daily food consumption and manage their diet effectively. Moreover, it can help dietitians and nutritionists devise customized meal plans tailored to their patients' specific nutritional needs and preferences. However, to the best of our knowledge, there is no research focusing on the nutritional facts and calories of the food products that we consume. Therefore, it is valuable to develop software that tracks the types and amounts of foods consumed and provides personalized feedback to patients and healthcare providers.
The primary aim of this article was to develop software using the deep learning algorithms GoogleNet, ResNet-50, and Inception-v3 in MATLAB. With this software, the food products commonly consumed in a Turkish breakfast can be recognized and their nutritional facts predicted.

MATERIAL AND METHODS
There are five main steps in this study (Figure 1):
• food images were obtained;
• nutritional facts were gathered;
• the collected food images were divided into training and testing data;
• the deep learning algorithms were trained on the food images;
• the software was developed in MATLAB.
Detailed information about these five main steps is given in the following subsections.

Food images
A traditional Turkish breakfast comprises a rich menu, but some food products, such as olives, white cheese, eggs, and black tea, are indispensable. Vegetables such as carrot, cucumber, eggplant, green pepper, potato, tomato, and zucchini are also eaten at breakfast, as are some fruits such as apple, peach, and grape. For each of the 20 food products, 36 food images were obtained, giving a total of 720 food images. The images were collected from the Internet by searching for each product's name in search engines. A sample of the food images used is given in Figure 2.

Nutritional facts and calories
The nutritional facts and calories of the food products were obtained from the FatSecret food nutrition database (FatSecret, 2023) by searching for each food by name. Fat (g), carbohydrate (g), protein (g), and energy (kcal) per specified serving size were collected for each of the 20 food products.
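The collected per-serving values could be stored and compiled into a list along the lines of the Python sketch below. The `NutritionFacts` record, its field names, and the `total_intake` helper are hypothetical, and the numbers are dummy values for illustration, not FatSecret data. Scaling by a number of servings is an assumption of this sketch, since only per-serving values were collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NutritionFacts:
    """Nutritional facts for one serving of a food product (hypothetical record type)."""
    fat_g: float
    carbohydrate_g: float
    protein_g: float
    energy_kcal: float


def total_intake(items):
    """Sum nutritional facts over (NutritionFacts, servings) pairs, e.g. one day's meals."""
    fat = carbohydrate = protein = energy = 0.0
    for facts, servings in items:
        fat += facts.fat_g * servings
        carbohydrate += facts.carbohydrate_g * servings
        protein += facts.protein_g * servings
        energy += facts.energy_kcal * servings
    return NutritionFacts(fat, carbohydrate, protein, energy)


# Dummy numbers for illustration only; real values would come from the database.
food_a = NutritionFacts(fat_g=1.0, carbohydrate_g=2.0, protein_g=3.0, energy_kcal=40.0)
food_b = NutritionFacts(fat_g=0.5, carbohydrate_g=0.0, protein_g=1.0, energy_kcal=10.0)
print(total_intake([(food_a, 2), (food_b, 1)]))
# NutritionFacts(fat_g=2.5, carbohydrate_g=4.0, protein_g=7.0, energy_kcal=90.0)
```

A frozen dataclass keeps each looked-up record immutable, so a daily total is always a new record rather than an edited database entry.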

Division of food images
To develop the recognition software, a collection of food product images was gathered. Each food product had 36 different images, giving a total of 720 food images for training and testing. Of these, 600 images (30 per food product) were randomly selected for training, and the remaining 120 images (6 per food product) were used only for testing.
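The per-class random split described above (30 training and 6 testing images for each of the 20 products) can be sketched in Python as follows. The function name, the file-name pattern, and the fixed seed are hypothetical; in MATLAB a comparable split is commonly done with `splitEachLabel` on an `imageDatastore`, although the text does not state which function was used.

```python
import random


def split_per_class(images_by_class, n_train=30, n_test=6, seed=0):
    """Randomly split each class's images into training and testing subsets.

    images_by_class maps a class name to a list of image identifiers;
    each class must contain exactly n_train + n_test images.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, images in images_by_class.items():
        if len(images) != n_train + n_test:
            raise ValueError(f"{label}: expected {n_train + n_test} images, got {len(images)}")
        shuffled = images[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        train += [(img, label) for img in shuffled[:n_train]]
        test += [(img, label) for img in shuffled[n_train:]]
    return train, test


# Hypothetical file names: 20 classes x 36 images.
data = {f"class{c:02d}": [f"class{c:02d}_{i:02d}.jpg" for i in range(36)] for c in range(1, 21)}
train, test = split_per_class(data)
print(len(train), len(test))  # 600 120
```

Splitting inside each class, rather than over the pooled 720 images, guarantees that every food product contributes exactly 6 test images.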

Deep learning algorithms
The food images were used to train the deep learning networks GoogleNet, ResNet-50, and Inception-v3, which are 22, 50, and 48 layers deep, respectively, with the Deep Learning Toolbox in MATLAB. The pretrained networks provided by the MATLAB functions googlenet, resnet50, and inceptionv3 were adapted by modifying the network layers, including the last fully connected layer, so that the output matches the number of classes (20 food products) in our dataset.
After training, each network was evaluated on the testing dataset with the classify function, which returns a predicted label for each image.
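Replacing the last fully connected layer amounts to mapping the network's pooled feature vector to one score per food product; a softmax then turns the scores into probabilities, and the predicted label is the most probable class. The pure-Python sketch below illustrates only these final steps. `predict_label` is a hypothetical helper, not MATLAB's `classify`, and the weights, which training would learn, are made-up numbers.

```python
import math


def fully_connected(features, weights, biases):
    """One score per class: score[k] = sum_j weights[k][j] * features[j] + biases[k]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(features, weights, biases, class_names):
    """Return (most probable class name, its probability)."""
    probs = softmax(fully_connected(features, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Toy example: 3 classes and a 2-element feature vector (hypothetical numbers).
# In the study the new layer would have 20 rows, one per food product.
label, p = predict_label(
    features=[1.0, 2.0],
    weights=[[0.1, 0.0], [0.0, 1.0], [0.5, 0.5]],
    biases=[0.0, 0.0, 0.0],
    class_names=["apple", "bagel", "carrot"],
)
print(label, round(p, 3))  # bagel 0.569
```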

Software development
Software was developed to recognize the food products consumed in a Turkish breakfast and to predict their nutritional facts and calories. Its interface is illustrated in Figure 3. The software is available in a GitHub repository (Tarlak & Yucel, 2023).

Evaluation of training and validation process
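For classification problems, classifier performance is typically defined in terms of the confusion matrix associated with the classifier. From the entries of this matrix, the average accuracy, error rate, precision, recall, and Fscore can be computed with Equations 1-5, respectively, by macro-averaging over the classes (Sokolova & Lapalme, 2009):

$$\text{Average accuracy} = \frac{1}{l}\sum_{i=1}^{l}\frac{tp_i + tn_i}{tp_i + fn_i + fp_i + tn_i} \tag{1}$$

$$\text{Error rate} = \frac{1}{l}\sum_{i=1}^{l}\frac{fp_i + fn_i}{tp_i + fn_i + fp_i + tn_i} \tag{2}$$

$$\text{Precision}_M = \frac{1}{l}\sum_{i=1}^{l}\frac{tp_i}{tp_i + fp_i} \tag{3}$$

$$\text{Recall}_M = \frac{1}{l}\sum_{i=1}^{l}\frac{tp_i}{tp_i + fn_i} \tag{4}$$

$$\text{Fscore}_M = \frac{2 \cdot \text{Precision}_M \cdot \text{Recall}_M}{\text{Precision}_M + \text{Recall}_M} \tag{5}$$

where $tp_i$, $tn_i$, $fp_i$, and $fn_i$ are the numbers of true-positive, true-negative, false-positive, and false-negative predictions for class $i$, and $l$ is the number of evaluated classes.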
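The macro-averaged metrics of Sokolova and Lapalme (2009) can be computed directly from a confusion matrix. The Python sketch below is an illustration under the row = actual, column = predicted convention used in this study; the function name and the rule of scoring an empty denominator as 0 are assumptions of this sketch, not part of the original software.

```python
def macro_metrics(cm):
    """Macro-averaged accuracy, error rate, precision, recall, and Fscore (in %).

    cm[i][j] counts samples of actual class i predicted as class j
    (rows = actual, columns = predicted).
    """
    l = len(cm)
    n = sum(sum(row) for row in cm)  # tp_i + fn_i + fp_i + tn_i equals n for every class i
    acc = err = prec = rec = 0.0
    for i in range(l):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # actual class i, predicted as another class
        fp = sum(cm[r][i] for r in range(l)) - tp  # predicted as class i, actually another class
        tn = n - tp - fn - fp
        acc += (tp + tn) / n
        err += (fp + fn) / n
        # Assumption: a class with an empty denominator contributes 0.
        prec += tp / (tp + fp) if tp + fp else 0.0
        rec += tp / (tp + fn) if tp + fn else 0.0
    acc, err, prec, rec = (100 * v / l for v in (acc, err, prec, rec))
    fscore = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "error_rate": err, "precision": prec,
            "recall": rec, "fscore": fscore}


# Toy 2-class example: one sample of the second class is predicted as the first.
print(macro_metrics([[2, 0], [1, 1]]))
```

Note that the Fscore is taken from the macro-averaged precision and recall, not averaged per class.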

RESULTS AND DISCUSSION
The food images were used to train GoogleNet, ResNet-50, and Inception-v3, which are 22, 50, and 48 layers deep, respectively, with the Deep Learning Toolbox in MATLAB. The food products commonly consumed at breakfast were used: apple, bagel, carrot, cucumber, egg, eggplant, fermented sausage, grape, green pepper, honey, mint, olive, omelet, parsley, peach, potato, tea, tomato, white cheese, and zucchini. For each food product, 30 images were used for training, so 600 food images were employed in the training process.
Deep CNNs based on GoogleNet, ResNet-50, and Inception-v3 were applied. In terms of elapsed training time, GoogleNet was the fastest network to train, followed by Inception-v3 and ResNet-50. Elapsed time varies between computers; on a computer with an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz (1.19 GHz), training GoogleNet, Inception-v3, and ResNet-50 took about 30, 180, and 210 min, respectively. GoogleNet therefore trained about seven times faster than ResNet-50 (210/30) and six times faster than Inception-v3 (180/30). According to the Deep Learning Toolbox in MATLAB, the file sizes (storage on disk) of GoogleNet, ResNet-50, and Inception-v3 are 27, 96, and 89 MB, respectively. These differences likely arise from the networks' complexity and architecture, which makes GoogleNet advantageous in training time when the classification task is not too complex.
In machine learning, and specifically in statistical classification, a confusion matrix, also known as an error matrix (Stehman, 1997), is a table layout that visualizes the performance of an algorithm, typically a supervised learning one (in unsupervised learning, it is usually called a matching matrix). Each row of the matrix represents the instances in an actual class, whereas each column represents the instances in a predicted class, or vice versa; both variants are found in the literature (Stehman, 1997). The former convention was used in this study. The confusion matrix is a special kind of contingency table, with two dimensions ("actual" and "predicted") and identical sets of classes in both dimensions (each combination of dimension and class is a variable in the contingency table).
For the evaluation of the training process of GoogleNet, ResNet-50, and Inception-v3, the confusion matrices were obtained and are given in Figure 4. In this figure, blue markers indicate correct predictions, whereas other colors show the number of errors for each class. For instance, GoogleNet made five incorrect predictions in the training process, one sample each in classes 1, 5, 8, 11, and 15. Class codes follow the alphabetical order of the food product names: apple, bagel, carrot, cucumber, egg, eggplant, fermented sausage, grape, green pepper, honey, mint, olive, omelet, parsley, peach, potato, tea, tomato, white cheese, and zucchini are coded 1-20, respectively. This means that GoogleNet made only one incorrect prediction out of 30 samples for each of apple, egg, grape, honey, and parsley, and the other 29 predictions for these 5 food products were correct. For the other 15 food products, all training predictions were correct (Figure 4).
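The alphabetical class coding and the row = actual, column = predicted layout can be sketched as follows. The helper names are hypothetical; with Python's default string ordering, the 20 names sort into exactly the 1-20 order listed above.

```python
FOODS = [
    "apple", "bagel", "carrot", "cucumber", "egg", "eggplant",
    "fermented sausage", "grape", "green pepper", "honey", "mint", "olive",
    "omelet", "parsley", "peach", "potato", "tea", "tomato",
    "white cheese", "zucchini",
]


def class_codes(names):
    """Assign codes 1..l in alphabetical order of the class names."""
    return {name: code for code, name in enumerate(sorted(names), start=1)}


def confusion_matrix(actual, predicted, codes):
    """Rows = actual class, columns = predicted class (0-based index = code - 1)."""
    l = len(codes)
    cm = [[0] * l for _ in range(l)]
    for a, p in zip(actual, predicted):
        cm[codes[a] - 1][codes[p] - 1] += 1
    return cm


codes = class_codes(FOODS)
print(codes["apple"], codes["cucumber"], codes["zucchini"])  # 1 4 20
cm = confusion_matrix(["apple", "apple", "egg"], ["apple", "egg", "egg"], codes)
print(cm[0][0], cm[0][4], cm[4][4])  # 1 1 1
```

Off-diagonal cells such as `cm[0][4]` are exactly the colored error cells of Figure 4: here one apple image predicted as egg.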
ResNet-50 and Inception-v3 made 7 and 5 incorrect predictions, respectively, out of the 600 training samples. Thus, GoogleNet and Inception-v3 made the same number of correct and incorrect predictions, while ResNet-50 made more errors than both. As statistical evaluation metrics, the average accuracy, error rate, precision, recall, and Fscore were calculated and are presented in Table 1. The average accuracies of GoogleNet, ResNet-50, and Inception-v3 were 99.92%, 99.88%, and 99.92%, respectively, showing that GoogleNet and Inception-v3 were more successful than ResNet-50 in training. The other metrics (error rate, precision, recall, and Fscore) also confirmed that GoogleNet and Inception-v3 performed equally well in training and better than ResNet-50 (Table 1).
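The training accuracies above follow from the error counts alone. Under the average-accuracy definition of Sokolova and Lapalme (2009), each misclassification counts once as a false negative (for the actual class) and once as a false positive (for the predicted class), and every class's denominator is the total number of samples N, so average accuracy = 1 - 2E/(lN) for E errors and l classes. The quick check below uses a hypothetical helper name.

```python
def macro_accuracy_from_errors(errors, n_classes, n_samples):
    """Average (macro) accuracy in % from the total number of misclassifications.

    In single-label classification each error is one fn (actual class) and one
    fp (predicted class), so sum_i (fp_i + fn_i) = 2 * errors, while
    tp_i + fn_i + fp_i + tn_i = n_samples for every class i.
    """
    return 100 * (1 - 2 * errors / (n_classes * n_samples))


for name, errors in [("GoogleNet", 5), ("ResNet-50", 7), ("Inception-v3", 5)]:
    print(name, round(macro_accuracy_from_errors(errors, 20, 600), 2))
# GoogleNet 99.92
# ResNet-50 99.88
# Inception-v3 99.92
```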
CNNs have been employed for food recognition in the past few years and have outperformed traditional machine learning approaches. Bossard et al. (2014) modified the structure of the AlexNet model reported by Krizhevsky et al. (2012) and created a deep convolutional neural network for the images in the Food-101 dataset; this modification greatly improved prediction performance. Kawano and Yanai (2014) also used a convolutional neural network for food recognition and identification on a dataset of 10 food classes. Their results showed the strong performance of the convolutional neural network compared with traditional techniques, with a detection accuracy of 73.7%. Kawano and Yanai (2015) retrained the AlexNet model on two datasets, UEC-FOOD-100 and UEC-FOOD-256, and obtained maximum accuracies of 78.8% and 67.6%, respectively. Compared with these previous studies, all three CNNs used in this study provided better prediction ability, although the datasets differ.
For the assessment of the validation process of GoogleNet, ResNet-50, and Inception-v3, the confusion matrices are shown in Figure 5. Blue markers in Figure 5 indicate correct predictions, while other colors indicate the number of errors for each class. Each network made 12 incorrect predictions out of the 120 independent food images (6 × 20), although the networks misclassified different images. For example, GoogleNet made four incorrect predictions for class code 4 (cucumber), whereas ResNet-50 and Inception-v3 made only two incorrect predictions for cucumber.
The statistical evaluation results (average accuracy, error rate, precision, recall, and Fscore) are summarized in Table 2. The average accuracy and error rate of GoogleNet were 98.84% and 1.16%, respectively, while those of both ResNet-50 and Inception-v3 were 99.01% and 0.99%. Thus, ResNet-50 and Inception-v3 provided better prediction performance than GoogleNet in the validation process. Precision, recall, and Fscore differed slightly between ResNet-50 and Inception-v3; for ResNet-50, they were 90.98%, 90.00%, and 90.49%, respectively.
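The ResNet-50 values are internally consistent with the 12 validation errors reported above. Because each class has 6 test images, macro recall reduces to the fraction of correctly classified images, (120 - 12)/120 = 90.00%, and the Fscore is the harmonic mean of precision and recall. A quick check, with hypothetical helper names:

```python
def fscore(precision, recall):
    """Harmonic mean of precision and recall (beta = 1)."""
    return 2 * precision * recall / (precision + recall)


def balanced_macro_recall(errors, n_samples):
    """Macro recall in % when every class has the same number of samples."""
    return 100 * (n_samples - errors) / n_samples


recall = balanced_macro_recall(12, 120)  # 6 test images x 20 classes
print(recall, round(fscore(90.98, recall), 2))  # 90.0 90.49
```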
Although no significant differences in accuracy were observed among the algorithms, GoogleNet offered the best combination of prediction accuracy and training time. The validated deep learning algorithms were used to develop software for food recognition and nutritional value determination. The results confirm that the software reliably identifies food items and provides their corresponding nutritional information. The software has substantial potential for use in the nutrition and dietetics field, particularly in healthcare settings, to monitor the dietary intake of individuals with chronic conditions such as diabetes, heart disease, or obesity. By tracking the types and quantities of foods consumed, the system can offer personalized feedback to patients and healthcare providers, facilitating effective dietary management.

CONCLUSION
Food recognition is a crucial aspect of various fields, including healthcare, nutrition, and the food industry. This article highlights the importance of developing software that can accurately recognize different types of foods and predict their nutritional values. Using deep learning algorithms trained on 600 labeled images, the developed software reached an average accuracy above 98% in recognizing food products, whose nutritional facts are then retrieved from a nutrition database. The software has considerable potential in healthcare, as it can assist in monitoring the dietary intake of patients with chronic diseases and in providing personalized feedback to patients and healthcare providers. With this software, individuals can gain greater control over their diet and make informed decisions about their food choices, ultimately contributing to a healthier lifestyle.

Figure 1. The flowchart of the steps followed in this study.
tp_i: number of true positives for class i; tn_i: number of true negatives for class i; fp_i: number of false positives for class i; fn_i: number of false negatives for class i; l: number of evaluated classes.

Figure 3. Visual design of food determination software developed in the study.

Figure 4. Confusion matrix to evaluate the classification capability of deep learning algorithms: (a) GoogleNet, (b) ResNet-50, and (c) Inception-v3 for the training process.