
A Comparative Study of Early Fusion and Multimodal Siamese Neural Network in Food Classification

EasyChair Preprint 15687

11 pages · Date: January 8, 2025

Abstract

The shift from traditional diets to those high in fat and calories has led to an increase in obesity and related health issues, emphasizing the need for accurate dietary monitoring. Automated systems utilizing artificial intelligence (AI) have emerged as promising tools for providing personalized dietary advice through food classification. With the growing volume of food-related content on social media, including images and accompanying text, there is an increasing need to leverage multimodal data for more accurate predictions. This study compares two fusion techniques for food classification: early fusion and the multimodal Siamese Neural Network (mSNN). Using the UPMCFood-101 dataset, which includes images and text descriptions across 101 food categories, the analysis focuses on three classes: bread pudding, chicken wings, and waffles. Although the study is limited to these three classes, neither architecture is specific to them, which suggests that both models could potentially be applied to other food categories. The early fusion model demonstrated strong generalization, achieving an overall accuracy of 0.960. In contrast, the mSNN, trained with 72,000 pairs, achieved a peak accuracy of 0.976, outperforming the early fusion model in precision, recall, and accuracy, particularly when each class's reference database contained fewer image-text pairs. However, the mSNN's accuracy declined with larger databases because outliers skewed the average distance calculations. These findings suggest that while the mSNN is more accurate with smaller databases, the early fusion model provides better generalization.
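The abstract attributes the mSNN's decline on larger databases to outliers skewing the average distance calculations. The sketch below illustrates that failure mode under stated assumptions: it assumes the trained Siamese branch has already mapped each image-text pair to an embedding vector, and that a query is assigned to the class whose stored embeddings have the smallest mean Euclidean distance to it. The function names, the 2-D embeddings, and their values are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def average_distance(query: Vector, references: List[Vector]) -> float:
    """Mean Euclidean distance from a query embedding to every stored embedding of one class."""
    return sum(math.dist(query, ref) for ref in references) / len(references)


def classify_by_average_distance(query: Vector, database: Dict[str, List[Vector]]) -> str:
    """Return the class label whose reference embeddings are, on average, closest to the query."""
    return min(database, key=lambda label: average_distance(query, database[label]))


# Hypothetical 2-D embeddings standing in for the Siamese network's output.
database = {
    "waffles": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "chicken_wings": [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1)],
}
query = (0.2, 0.2)
print(classify_by_average_distance(query, database))  # waffles

# A single outlying reference embedding inflates the "waffles" mean distance
# (about 0.24 -> about 1.88), so the query is now closer on average to "chicken_wings" (about 1.18).
database["waffles"].append((5.0, 5.0))
print(classify_by_average_distance(query, database))  # chicken_wings
```

Because the mean weights every stored embedding equally, one distant reference can outweigh several close ones, and the more references a class database holds, the more chances there are for such outliers to appear.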

Keyphrases: Multimodal Siamese Neural Network, early fusion, multimodal

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15687,
  author    = {Kanokporn Sintarasirikulchai and Akarin Phaibulpanich and Seksan Kiatsupaibul},
  title     = {A Comparative Study of Early Fusion and Multimodal Siamese Neural Network in Food Classification},
  howpublished = {EasyChair Preprint 15687},
  year      = {EasyChair, 2025}}