Apple Researchers Question AI’s Formal Reasoning Capabilities in Mathematics, Find LLM Responding Differently to Same Question

A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question.

IANS Oct 12, 2024 11:10 AM IST

New Delhi, October 12: A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Literature suggests that the reasoning process in LLMs is probabilistic pattern-matching rather than formal reasoning.

Although LLMs can match more abstract reasoning patterns, they fall short of true logical reasoning. Small changes in input tokens can drastically alter model outputs, indicating a strong token bias and suggesting that these models are highly sensitive and fragile. “Additionally, in tasks requiring the correct selection of multiple tokens, the probability of arriving at an accurate answer decreases exponentially with the number of tokens or steps involved, underscoring their inherent unreliability in complex reasoning scenarios,” said Apple researchers in their paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.”
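The exponential decay described in the quote can be illustrated with a small sketch. It assumes, purely for illustration, that each reasoning step is correct independently with the same probability; the function name and the 0.95 figure are hypothetical and do not come from the paper. Under that assumption, the chance of getting every one of n steps right is the per-step probability raised to the power n.

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps steps are correct.

    Assumes (for illustration only) that steps succeed independently,
    each with the same probability p_step.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 95 percent.
    for n in (1, 5, 10, 20):
        print(n, round(chain_success_probability(0.95, n), 3))
```

With a hypothetical per-step accuracy of 95 percent, the chance of a fully correct chain falls to roughly 60 percent after 10 steps and about 36 percent after 20.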

The ‘GSM8K’ benchmark is widely used to assess the mathematical reasoning of models on grade-school level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, the researchers conducted a large-scale study on several state-of-the-art open and closed models.

“To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions,” the authors wrote. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. “Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question,” the researchers said, adding that overall, “our work provides a more nuanced understanding of LLMs’ capabilities and limitations in mathematical reasoning.”
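The idea of generating many instantiations of one question from a symbolic template can be sketched as follows. This is a minimal illustration, not the researchers' implementation: the template wording, the names, the value ranges and the function names are all invented here. Each generated question has the same underlying structure, so a model that reasons formally should answer every variant correctly.

```python
import random
from typing import List, Tuple

# Hypothetical template: placeholders stand in for names and numbers.
TEMPLATE = (
    "{name} has {x} apples. {name} buys {y} more bags with {z} apples "
    "in each bag. How many apples does {name} have now?"
)
NAMES = ["Sophie", "Arjun", "Mei", "Lucas"]  # invented for illustration


def generate_instance(rng: random.Random) -> Tuple[str, int]:
    """Fill the template with random values and compute the ground truth."""
    name = rng.choice(NAMES)
    x = rng.randint(2, 20)
    y = rng.randint(2, 9)
    z = rng.randint(2, 12)
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    answer = x + y * z  # same arithmetic structure for every instance
    return question, answer


def generate_instances(n: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Generate n (question, answer) pairs reproducibly from a seed."""
    rng = random.Random(seed)
    return [generate_instance(rng) for _ in range(n)]


if __name__ == "__main__":
    for question, answer in generate_instances(3, seed=1):
        print(question, "->", answer)
```

Comparing a model's accuracy across several such generated sets is one way to observe the kind of variance the researchers describe.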

(The above story first appeared on LatestLY on Oct 12, 2024 11:10 AM IST. For more news and updates on politics, world, sports, entertainment and lifestyle, log on to our website latestly.com).
