Evaluating LLMs Capabilities at Natural Language to Logic Translation: A Preliminary Investigation

Brunello A., Ferrarese R., Geatti L., [more authors], Saccomanno N.
2025
  • conference object

Abstract
Translating natural language (NL) into logical formalisms such as First-Order Logic (FOL) has long been a challenge across multiple disciplines, including mathematics, computer science, and education. Traditional computational linguistics methods have struggled with this task because of the complexity and ambiguity of natural language. However, advances in Natural Language Processing (NLP), particularly the introduction of Large Language Models (LLMs), have opened up new possibilities for tackling this challenge. Despite their potential, a systematic approach to evaluating the performance of LLMs in NL-to-FOL translation is still lacking. In this study, we take a first step towards filling this gap. We examine a large dataset based on students’ efforts in formalizing natural language statements from the book “Language, Proof, and Logic”. Based on this dataset, we propose a preliminary evaluation pipeline to assess LLM performance in NL-to-FOL translation tasks, considering both syntactic and semantic aspects. We then apply this pipeline to evaluate two recent LLMs, Meta’s Llama 3.1 (8B) and Google DeepMind’s Gemma 2 (9B). Our findings validate the proposed approach, revealing key similarities and differences between LLM-generated and student-produced formulas, and provide insights into the current capabilities of LLMs in this domain.
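To make the distinction between syntactic and semantic evaluation concrete, here is a minimal sketch. It is not the authors' pipeline: the tuple encoding of formulas, the restriction to unary predicates, and the helper names (`is_sentence`, `holds`, `agree_on_small_models`) are all assumptions made for illustration. The syntactic check tests that a candidate is a well-formed sentence with no free variables; the semantic check compares a candidate with a reference translation on every interpretation over small finite domains. Agreement on small models is only a necessary condition for logical equivalence, while a disagreement is a genuine counterexample.

```python
from itertools import product

# Hypothetical encoding (an assumption, not taken from the paper):
# ("pred", name, var), ("not", f), ("and", f, g), ("or", f, g),
# ("implies", f, g), ("forall", var, f), ("exists", var, f)

def free_vars(f):
    """Return the set of free variables; raise on a malformed formula."""
    op = f[0]
    if op == "pred":
        return {f[2]}
    if op == "not":
        return free_vars(f[1])
    if op in ("and", "or", "implies"):
        return free_vars(f[1]) | free_vars(f[2])
    if op in ("forall", "exists"):
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown operator: {op!r}")

def is_sentence(f):
    """Syntactic check: the formula is well formed and closed."""
    try:
        return not free_vars(f)
    except (ValueError, IndexError, TypeError):
        return False

def predicates(f):
    """Collect the predicate names used in a well-formed formula."""
    op = f[0]
    if op == "pred":
        return {f[1]}
    if op == "not":
        return predicates(f[1])
    if op in ("forall", "exists"):
        return predicates(f[2])
    return predicates(f[1]) | predicates(f[2])

def holds(f, domain, interp, env):
    """Evaluate f in a finite model; interp maps predicate name -> extension."""
    op = f[0]
    if op == "pred":
        return env[f[2]] in interp[f[1]]
    if op == "not":
        return not holds(f[1], domain, interp, env)
    if op == "and":
        return holds(f[1], domain, interp, env) and holds(f[2], domain, interp, env)
    if op == "or":
        return holds(f[1], domain, interp, env) or holds(f[2], domain, interp, env)
    if op == "implies":
        return (not holds(f[1], domain, interp, env)) or holds(f[2], domain, interp, env)
    if op == "forall":
        return all(holds(f[2], domain, interp, {**env, f[1]: d}) for d in domain)
    if op == "exists":
        return any(holds(f[2], domain, interp, {**env, f[1]: d}) for d in domain)
    raise ValueError(f"unknown operator: {op!r}")

def agree_on_small_models(f, g, max_size=3):
    """Semantic check: same truth value in every model with 1..max_size elements."""
    names = sorted(predicates(f) | predicates(g))
    for n in range(1, max_size + 1):
        domain = range(n)
        # Every subset of the domain is a possible extension of a unary predicate.
        subsets = [frozenset(d for d, keep in zip(domain, bits) if keep)
                   for bits in product([False, True], repeat=n)]
        for extents in product(subsets, repeat=len(names)):
            interp = dict(zip(names, extents))
            if holds(f, domain, interp, {}) != holds(g, domain, interp, {}):
                return False
    return True

# "Every cube is small": a reference translation, an equivalent paraphrase,
# and a common mistranslation that uses conjunction instead of implication.
gold = ("forall", "x", ("implies", ("pred", "Cube", "x"), ("pred", "Small", "x")))
paraphrase = ("not", ("exists", "x",
              ("and", ("pred", "Cube", "x"), ("not", ("pred", "Small", "x")))))
wrong = ("forall", "x", ("and", ("pred", "Cube", "x"), ("pred", "Small", "x")))
```

In this sketch `paraphrase` passes both checks against `gold`, whereas `wrong` is syntactically valid but is rejected semantically: in a model with no cubes, `gold` is vacuously true and `wrong` is false. A full evaluation would need predicates of higher arity, constants, and a theorem prover or model finder rather than brute-force enumeration.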
Archive
https://hdl.handle.net/11390/1300826
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85216652388
https://ricerca.unityfvg.it/handle/11390/1300826
Rights
open access
license:creative commons
license uri:http://creativecommons.org/licenses/by/4.0/
Subjects
  • First Order Logic
  • Formal Method
  • Large Language Model
  • Natural Language Proc...
  • Translation