Logo del repository
  1. Home
 
Opzioni

2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project

Bianco R.
•
Coluccia S.
•
Marinoni M.
altro
Parpinel M.
2025
  • journal article

Periodico
NUTRIENTS
Abstract
Background/Objectives: Deep learning (DL) has shown strong potential in analyzing food images, but few studies have directly predicted mass, energy, and macronutrient content from images. In addition to the importance of high-quality data, differences in country-specific food composition databases (FCDBs) can hinder model generalization. Methods: We assessed the performance of several standard DL models using four ground truth datasets derived from Nutrition5k—the largest image–nutrition dataset with ~5000 complex US cafeteria dishes. In light of developing an Italian dietary assessment tool, these datasets varied by FCDB alignment (Italian vs. US) and data curation (ingredient–mass correction and frame filtering on the test set). We evaluated combinations of four feature extractors [ResNet-50 (R50), ResNet-101 (R101), InceptionV3 (IncV3), and Vision Transformer-B-16 (ViT-B-16)] with two regression networks (2+1 and 2+2), using IncV3_2+2 as the benchmark. Descriptive statistics (percentages of agreement, unweighted Cohen’s kappa, and Bland–Altman plots) and standard regression metrics were used to compare predicted and ground truth nutritional composition. Dishes mispredicted by ≥7 algorithms were analyzed separately. Results: R50, R101, and ViT-B-16 consistently outperformed the benchmark across all datasets. Specifically, when replacing it with these top algorithms, reductions in median Mean Absolute Percentage Errors were 6.2% for mass, 6.4% for energy, 12.3% for fat, and 33.1% and 40.2% for protein and carbohydrates. Ingredient–mass correction substantially improved prediction metrics (6–42% when considering the top algorithms), while frame filtering had a more limited effect (<3%). Performance was consistently poor across most models for complex salads, chicken-based or eggs-based dishes, and Western-inspired breakfasts. Conclusions: The R101 and ViT-B-16 architectures will be prioritized in future analyses, where ingredient–mass correction and automated frame filtering methods will be considered.
DOI
10.3390/nu17132196
WOS
WOS:001527154500001
Archivio
https://hdl.handle.net/11390/1309918
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-105010658752
https://ricerca.unityfvg.it/handle/11390/1309918
Diritti
open access
Soggetti
  • 2D prediction of nutr...

  • deep learning

  • dietary assessment to...

  • energy prediction

  • food image recognitio...

  • frame filtering

  • macronutrients predic...

  • mass prediction

  • Nutrition5k

  • portion size correcti...

google-scholar
Get Involved!
  • Source Code
  • Documentation
  • Slack Channel
Make it your own

DSpace-CRIS can be extensively configured to meet your needs. Decide which information need to be collected and available with fine-grained security. Start updating the theme to match your nstitution's web identity.

Need professional help?

The original creators of DSpace-CRIS at 4Science can take your project to the next level, get in touch!

Realizzato con Software DSpace-CRIS - Estensione mantenuta e ottimizzata da 4Science

  • Impostazioni dei cookie
  • Informativa sulla privacy
  • Accordo con l'utente finale
  • Invia il tuo Feedback