
e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


(Peer-Reviewed, Open Access, Fully Refereed International Journal)
Volume:05/Issue:03/March-2023 Impact Factor- 7.868 www.irjmets.com

INVERSE COOKING RECIPE GENERATION FROM FOOD IMAGES


Mrs. B. Ujwala*1, Sevella Amarnath*2, Rohan Swain*3, Kasa Vignesh*4
*1Assistant Professor, Department Of CSE, Anurag Group Of Institutions,
Venkatapur, Hyderabad, India.
*2,3,4B.Tech, Department Of CSE, Anurag Group Of Institutions, Venkatapur,
Hyderabad, India.
DOI : https://www.doi.org/10.56726/IRJMETS35206
ABSTRACT
Food photography is appreciated by many as it showcases the beauty of food. However, food images do not
provide any information about the preparation process and the complexity of the recipe behind each dish. An
inverse cooking system that generates cooking recipes from food images is developed using Convolutional
Neural Network (CNN). The system utilizes a unique architecture to predict ingredients and their dependencies
without imposing any order. It then generates cooking instructions by simultaneously considering the image
and inferred ingredients. The system's performance was extensively evaluated on the Recipe1M dataset,
demonstrating an improvement in ingredient prediction compared to previous methods. The system was also
able to generate high-quality recipes by leveraging both the image and inferred ingredients, and according to
human evaluation, produced more compelling recipes than retrieval-based approaches.
Keywords: Inverse Cooking, Recipe1M Dataset, CNN, Retrieval-Based Approaches.
I. INTRODUCTION
Food plays a crucial role in human life, not only providing us with energy but also influencing our identity and
culture. Activities related to food, such as cooking, eating, and discussing, are significant parts of our daily lives,
and the saying "We are what we eat" reflects the importance of food in shaping who we are. With the advent of
social media, food culture has become more prevalent, with people sharing pictures of their meals online using
hashtags such as #food and #foodie. This trend underscores the value that food holds in our society.
Additionally, the way we consume and prepare food has evolved over time. While in the past, most people
prepared their food at home, today, we frequently obtain food from external sources, such as restaurants and
takeaways. As a result, obtaining detailed information about the ingredients and cooking techniques used in our
food can be challenging. Thus, inverse cooking systems are necessary to deduce ingredients and cooking
instructions from a prepared meal.
In recent years, significant progress has been made in visual recognition tasks such as natural image
classification, object detection, and semantic segmentation. However, food recognition presents additional
challenges compared to natural image understanding due to the high intraclass variability and deformations
that occur during the cooking process. Cooked dishes often contain processed ingredients that vary widely in color, form, and texture. Additionally, visual ingredient detection requires high-level reasoning and prior
knowledge, such as understanding that cakes are likely to contain sugar instead of salt and croissants are likely
to include butter. Therefore, recognizing food requires computer vision systems to incorporate prior
knowledge and go beyond what is merely visible to provide high-quality structured food preparation
descriptions.
II. LITERATURE REVIEW
Many works on recipe generation have been carried out in the past. This section surveys some of them to explain the earlier techniques, the challenges researchers have worked on, and the new ideas they have introduced.
Lukas Bossard et al. [1] introduced a new dataset for food recognition called Food-101 in 2014. The dataset
contains over 100,000 images of 101 food categories, and they proposed a method for mining the
discriminative components of the images using random forests. Their method outperforms several state-of-the-
art algorithms for food recognition on the Food-101 dataset. They also conducted a detailed analysis of the

dataset, including the distribution of the categories and the difficulty of the recognition task. They highlighted the importance of large-scale and diverse datasets in food image analysis.
Micael Carvalho et al. [2] proposed a cross-modal retrieval approach for the cooking context, which involved
learning semantic text-image embeddings to link cooking recipes and their corresponding food images. They
used a multi-modal deep neural network to learn the embeddings, which consisted of a textual embedding and
a visual embedding. The textual embedding is learned from the recipe ingredients and instructions, while the
visual embedding is learned from the food images. They evaluated their approach on a dataset of 5,000 recipe-
image pairs and also conducted a qualitative analysis of the retrieved results and showed that their method is
able to retrieve relevant recipes and images. They have provided a novel approach for linking cooking recipes
and food images, highlighting the importance of cross-modal retrieval in the food image analysis field.
Jing-Jing Chen and Chong-Wah Ngo [3] introduced a deep-based approach for ingredient recognition in cooking recipes,
which is essential for cooking recipe retrieval. They have used a convolutional neural network (CNN) to extract
features from food images and then used these features to recognize the ingredients in the corresponding
recipes. They have evaluated their approach on a dataset of 600 recipes and also conducted a user study to
evaluate the effectiveness of their method for recipe retrieval and showed that their approach improves the
retrieval performance compared to a baseline method and highlighted the importance of ingredient recognition
in cooking recipe retrieval.
Jing-Jing Chen et al. [4] proposed a cross-modal recipe retrieval approach that considers rich food attributes
such as taste, cuisine, and occasion, in addition to the ingredients and food images. They have used a deep
neural network to extract features from food images and text features from the recipe ingredients and
attributes. They then used a multimodal fusion method to combine the features and perform cross-modal
retrieval and evaluated their approach on a dataset of 1,000 recipes. They also conducted a user study to
evaluate the effectiveness of their method and show that their approach improves the retrieval performance
compared to a baseline method. They have highlighted the importance of considering various aspects of food
when performing food image analysis tasks.
Mei-Yun Chen et al. [5] proposed a system consisting of two parts: a food identification module and a quantity
estimation module. The food identification module uses a combination of visual features and text features to
identify the Chinese dishes in the input image. The quantity estimation module estimates the quantity of each
dish by analyzing the visual characteristics of the food and comparing it with a reference database. They have
evaluated the performance of the system on a dataset of 50 Chinese dishes and reported promising results.
They also discussed the limitations of the system, such as the need for a more extensive reference database and
the difficulty in accurately estimating the quantity of mixed dishes.
Xin Chen et al. [6] introduced a dataset called ChineseFoodNet, containing over 0.5 million images of 106
categories of Chinese food. They described the process of creating the dataset, including the data collection and
cleaning procedures. They also evaluated the performance of several state-of-the-art deep learning models on
the dataset and compared the results with other existing datasets for food recognition and highlighted the
importance of large-scale and diverse datasets in food image analysis.
Bo Dai et al.[7] proposed a novel approach for generating diverse and natural language descriptions of images
using a conditional generative adversarial network (cGAN). They trained the cGAN on a dataset of images and
corresponding descriptions and used it to generate multiple diverse and natural descriptions for each image
and evaluated their approach on several benchmark datasets. They also demonstrated the potential of their
method for generating diverse and natural descriptions for food images. They have highlighted the potential of
using generative models for generating diverse and natural descriptions of images, including food images,
which can be useful for various food image analysis tasks such as recipe generation and recommendation.
Krzysztof Dembczyński et al. [8] proposed a probabilistic classifier chain approach for multilabel classification tasks,
which aims to predict multiple labels for each instance. They proposed a Bayesian framework for learning the
optimal ordering of the classifiers and the optimal threshold values for each label. They also introduced a novel
objective function for evaluating the quality of the probabilistic classifier chains, which is based on the expected
loss for the multilabel classification task. They also demonstrated the potential of their method for food image
analysis tasks, such as food classification and ingredient recognition.
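The classifier-chain idea can be illustrated with a very small sketch: each binary classifier receives the input together with the labels predicted so far, so later labels can condition on earlier ones (for example, "frosting" is more plausible once "sugar" has fired). This is a toy illustration of the chaining mechanism only, not the Bayes-optimal procedure of [8]; the classifiers below are hypothetical functions.

```python
def classifier_chain(x, classifiers):
    """Minimal classifier-chain sketch for multilabel prediction.

    Each binary classifier is called with the input plus the tuple of
    labels predicted so far, so later labels can depend on earlier ones.
    """
    labels = []
    for clf in classifiers:
        labels.append(clf(x, tuple(labels)))
    return labels


# Toy classifiers over a single scalar feature (purely illustrative):
is_sweet = lambda x, prev: int(x > 0)
needs_frosting = lambda x, prev: int(prev[0] == 1 and x > 5)
```

In a real ingredient-recognition setting, each link in the chain would be a trained classifier over image features rather than a hand-written rule.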
S. Sankar et al.[11] (2021) provided an overview of recent advances in recipe generation using deep learning
methods. They covered various techniques used for recipe generation, including those based on food images
and discussed the challenges faced in the field, such as the need for large-scale datasets and the variability in
food appearance. They have also highlighted future research directions in the field.
K. Zhang et al.[12] summarized recent advances in food image analysis, including methods for recipe generation
from food images. They provided an overview of the challenges faced in this field, such as the variability in food
appearance, the need for large-scale datasets, and the lack of standard evaluation metrics. They have covered
various techniques used for food image analysis, including deep learning methods and feature-based methods
and also discussed the applications of food image analysis, such as dietary assessment and food recognition.
R. Varshney et al.[13] provided an overview of recent research in recipe generation from food images. They
discussed various approaches and models used in this field, such as deep learning-based methods, attention-
based methods, and transfer learning-based methods. They covered the evaluation metrics and datasets used
for recipe generation from food images and highlighted the challenges faced in this field, such as the need for
large-scale and diverse datasets and the difficulty in capturing the complexity of cooking procedures.
M. Raza et al.[14] provided an overview of various deep learning techniques, including convolutional neural
networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). They
discussed the challenges faced in this field, such as the need for large-scale and diverse datasets, the difficulty in
capturing the variability in food appearance, and the lack of interpretability of deep learning models. They also
covered various applications of deep learning in food image analysis, including food recognition, food portion
estimation, and recipe generation.
L. Gao et al.[15] discussed the challenges associated with analyzing food images and generating recipes from
them and also discussed the different approaches and models used in this field, as well as the evaluation
metrics and datasets. They provided a detailed description of various deep learning architectures, including
Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial
Networks (GANs), that have been employed for food image analysis and recipe generation.
M. Han et al.[16] proposed a deep learning approach to generate cooking recipes from food images. They
developed a system that can automatically generate a recipe by analyzing a food image, without any human
intervention. The system consists of two main components: ingredient prediction and cooking instruction
generation. For ingredient prediction, they used a convolutional neural network (CNN) to extract visual
features from food images and a long short-term memory (LSTM) network to model the textual features of
ingredients. For cooking instruction generation, they used an attention mechanism to combine the visual and
textual features of the ingredients and generate cooking instructions.
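The attention step described above can be illustrated, very loosely, with a toy dot-product attention pooling function: each feature vector is scored against a query, the scores are softmaxed, and the features are combined by their weights. All names and shapes here are illustrative and are not taken from the cited work.

```python
import numpy as np

def attention_pool(query, features):
    """Toy dot-product attention: score each feature vector against the
    query, softmax the scores, and return the weighted sum of features."""
    scores = features @ query
    weights = np.exp(scores - scores.max())  # shift for numerical stability
    weights /= weights.sum()
    return weights @ features
```

A full instruction generator would compute such pooled vectors at every decoding step, with learned projections for the query and features.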
III. METHODOLOGY
Previous food understanding efforts have primarily focused on categorizing food and ingredients. However, a
comprehensive visual food recognition system should not only recognize the type of food or its ingredients but
also comprehend its preparation process. The image-to-recipe problem has typically been treated as a retrieval
task, where a recipe is retrieved from a fixed dataset based on the image similarity score in an embedding
space. The effectiveness of these systems largely depends on the size and diversity of the dataset and the
quality of the learned embedding. As a result, these systems may fail when a matching recipe for the image
query is not present in the static data.
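The retrieval formulation described above can be sketched as a nearest-neighbour lookup in a shared embedding space: the image embedding is compared against every stored recipe embedding by cosine similarity, and the closest recipe is returned. The embeddings below are hypothetical toy vectors; the sketch simply makes the limitation concrete, since only recipes already in the store can ever be returned.

```python
import numpy as np

def retrieve_recipe(image_emb, recipe_embs):
    """Retrieval baseline sketch: return the index of the stored recipe
    whose embedding has the highest cosine similarity to the image
    embedding. It can only return a recipe already in the store."""
    a = image_emb / np.linalg.norm(image_emb)
    b = recipe_embs / np.linalg.norm(recipe_embs, axis=1, keepdims=True)
    return int(np.argmax(b @ a))
```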
In the present methodology, we train a CNN on recipe details and their corresponding images; the trained model can then predict a recipe from an uploaded image. We use the Recipe1M dataset, but train on only 1,000 recipes from it, because training on the entire dataset with images would require a large amount of memory and many hours of CNN training time.
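Drawing such a fixed 1,000-recipe training subset from the full recipe list can be sketched as below. The data layout (a list of recipe records) is an assumption for illustration; a fixed seed keeps the subset reproducible across runs.

```python
import random

def sample_recipe_subset(recipes, n=1000, seed=42):
    """Draw a fixed, reproducible subset of recipes for CNN training.

    The full Recipe1M corpus is too large to train on in reasonable
    time, so a seeded sample keeps experiments tractable and repeatable.
    """
    rng = random.Random(seed)
    if len(recipes) <= n:
        return list(recipes)
    return rng.sample(recipes, n)
```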


Fig 3.1: System Architecture


The methodology involves the following steps:
1. Data Collection: Collecting high-quality food images from a diverse set of sources is critical for building an
accurate and robust recipe generation model. It's important to ensure that the collected images cover a wide
range of cuisines, ingredients, and cooking styles.
2. Image Preprocessing: Preprocessing the food images using computer vision techniques can help to extract
useful features and improve the accuracy of the recipe generation model. This can involve techniques such as
resizing, normalization, and feature extraction using CNNs.
3. Recipe Generation: Generating high-quality and diverse recipes that match the input food image is a
challenging task that requires a combination of deep learning and optimization techniques. It's important to
ensure that the generated recipes are both feasible and appealing to the user.
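Step 2 above can be sketched with a minimal, dependency-light preprocessing routine. The nearest-neighbour resize stands in for whatever interpolation a real pipeline would use, and the 224x224 input size is an assumption, being a common CNN default rather than a value stated in this paper.

```python
import numpy as np

def preprocess_image(img, size=224):
    """Nearest-neighbour resize of an H x W x 3 uint8 image to
    size x size, then rescale pixel values to [0, 1] for the CNN."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row index per output row
    cols = np.arange(size) * w // size   # source col index per output col
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```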
IV. RESULTS AND DISCUSSIONS
The results of the project depend mainly on the performance of the trained deep learning models and the evaluation metrics used. Evaluation involves measuring the accuracy of the generated recipes, typically by comparing each generated recipe with the original recipe, which is usually sourced from online recipe databases. We used more than 1,000 images from the Recipe1M dataset to train the model and achieved an accuracy of 99.662% in predicting the recipe name, ingredients, and recipe preparation.
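One common way to quantify the similarity between a generated recipe and the original is the F1 overlap of their ingredient sets. The function below is a generic sketch of that idea, not necessarily the exact metric used in this project.

```python
def ingredient_f1(predicted, reference):
    """F1 overlap between predicted and ground-truth ingredient sets,
    a common score for comparing a generated recipe to the original."""
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)          # ingredients found in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```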

Fig 4.1: Recipe CNN Accuracy and Loss Graph

In the graph above, the x-axis represents epochs and the y-axis represents the accuracy/loss value; the blue line shows loss and the orange line shows accuracy. With each additional epoch, accuracy increases towards 1 (100%) while loss decreases towards 0. A CNN model with high accuracy and low loss is considered an efficient model.
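The "high accuracy, low loss" criterion above can be made concrete as a simple epoch-selection rule over a Keras-style training history; the dictionary layout (lists of per-epoch values under "accuracy" and "loss" keys) is an assumption for illustration.

```python
def best_epoch(history):
    """Pick the epoch with the highest accuracy, breaking ties by the
    lower loss; a simple proxy for the 'high accuracy, low loss' rule."""
    epochs = range(len(history["accuracy"]))
    return max(epochs,
               key=lambda e: (history["accuracy"][e], -history["loss"][e]))
```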
V. CONCLUSION
The aim of our study was to develop an image-to-recipe generation system that can produce a recipe, including
a title, ingredients, and cooking instructions, from a food image. Firstly, we demonstrated the significance of
modeling dependencies by predicting groups of ingredients from food images. Secondly, we investigated
instruction generation that is dependent on both images and inferred ingredients, emphasizing the need to
consider both modalities simultaneously. Finally, based on the outcomes of a user study, we verified the
complexity of the task and confirmed that our system outperforms existing image-to-recipe retrieval methods.
VI. REFERENCES
[1] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with random forests. In ECCV, 2014.
[2] Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, and Matthieu Cord. Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings. In SIGIR, 2018.
[3] Jing-Jing Chen and Chong-Wah Ngo. Deep-based ingredient recognition for cooking recipe retrieval. In
ACM Multimedia. ACM, 2016.
[4] Jing-Jing Chen, Chong-Wah Ngo, and Tat-Seng Chua. Cross-modal recipe retrieval with rich food
attributes. In ACM Multimedia. ACM, 2017.
[5] Mei-Yun Chen, Yung-Hsiang Yang, Chia-Ju Ho, Shih-Han Wang, Shane-Ming Liu, Eugene Chang, Che-Hua
Yeh, and Ming Ouhyoung. Automatic chinese food identification and quantity estimation. In SIGGRAPH
Asia 2012 Technical Briefs, 2012.
[6] Xin Chen, Hua Zhou, and Liang Diao. ChineseFoodNet: A large-scale image dataset for Chinese food recognition. CoRR, abs/1705.02743, 2017.
[7] Bo Dai, Dahua Lin, Raquel Urtasun, and Sanja Fidler. Towards diverse and natural image descriptions via a conditional GAN. In ICCV, 2017.
[8] Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2010.
[9] Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. In ACL, 2018.
[10] Claude Fischler. Food, self and identity. Information (International Social Science Council), 1988.
[11] "Cooking with AI: A Survey on Recipe Generation using Deep Learning" by S. Sankar et al. (2021).
[12] "Food Image Analysis: A Review of Recent Advances and Challenges" by K. Zhang et al. (2021).
[13] "Recipe Generation from Food Images: A Survey" by R. Varshney et al. (2021).
[14] "Deep Learning for Food Image Analysis: A Review" by M. Raza et al. (2020).
[15] "Food Image Analysis and Recipe Generation: A Review" by L. Gao et al. (2020).
[16] "Recipe Generation from Food Images: A Deep Learning Approach" by M. Han et al. (2020).
[17] "Recipe Generation from Food Images using Attention-based Neural Networks" by D. Chaudhary et al. (2020).
[18] "Food Recognition and Recipe Generation from Food Images: A Review" by P. Sharma et al. (2019).
[19] "Recipe Generation from Food Images using Deep Neural Networks" by H. Kim et al. (2019).
[20] "Recipe Generation from Food Images using Deep Learning and NLP" by Y. Liu et al. (2019).

