Apple Researchers Question AI’s Formal Reasoning Capabilities in Mathematics, Find LLM Responding Differently to Same Question

A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question.

Technology | IANS | Oct 12, 2024 11:10 AM IST

New Delhi, October 12: A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question. The literature suggests that the reasoning process in LLMs is probabilistic pattern-matching rather than formal reasoning.

Although LLMs can match more abstract reasoning patterns, they fall short of true logical reasoning. Small changes in input tokens can drastically alter model outputs, indicating a strong token bias and suggesting that these models are highly sensitive and fragile. “Additionally, in tasks requiring the correct selection of multiple tokens, the probability of arriving at an accurate answer decreases exponentially with the number of tokens or steps involved, underscoring their inherent unreliability in complex reasoning scenarios,” said Apple researchers in their paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.”
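The exponential decline described in the quote can be illustrated with a simple model. The sketch below is not taken from the paper: it assumes, purely for illustration, that every step in a chain is correct independently with the same probability, so that a chain of n steps is fully correct with probability p to the power n. The function name and the 0.95 per-step accuracy are hypothetical.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumption (illustrative only): steps succeed independently,
    each with the same probability `per_step_accuracy`.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 95 percent.
    for steps in (1, 5, 10, 20, 50):
        probability = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps: {probability:.3f}")
```

Under this toy assumption, even a model that gets each individual step right 95 percent of the time completes a 10-step chain correctly only about 60 percent of the time.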

The ‘GSM8K’ benchmark is widely used to assess the mathematical reasoning of models on grade-school level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, the researchers conducted a large-scale study on several state-of-the-art open and closed models.

“To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions,” the authors wrote. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. “Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question,” the researchers said, adding that overall, “our work provides a more nuanced understanding of LLMs’ capabilities and limitations in mathematical reasoning.”
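The idea of generating many instantiations of one question from a symbolic template can be sketched roughly as follows. This is not the researchers' code or one of their templates: the question text, the list of names, the value ranges and the function names are all hypothetical, chosen only to show how a single template yields different surface forms whose correct answers are computed from the same formula.

```python
import random

# Hypothetical template; the placeholders are filled in per instantiation.
TEMPLATE = (
    "{name} has {x} apples and buys {y} bags with {z} apples each. "
    "How many apples does {name} have now?"
)

# Hypothetical pool of names to vary the surface form of the question.
NAMES = ["Sophie", "Liam", "Aisha", "Mateo"]


def instantiate(rng: random.Random) -> tuple[str, int]:
    """Fill the template with random values and compute the true answer."""
    name = rng.choice(NAMES)
    x = rng.randint(2, 20)   # apples already owned
    y = rng.randint(2, 9)    # number of bags bought
    z = rng.randint(2, 12)   # apples per bag
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    answer = x + y * z       # same formula for every instantiation
    return question, answer


def generate_variants(count: int, seed: int = 0) -> list[tuple[str, int]]:
    """Generate `count` instantiations, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [instantiate(rng) for _ in range(count)]


if __name__ == "__main__":
    for question, answer in generate_variants(3, seed=42):
        print(question, "->", answer)
```

Because every variant has a known ground-truth answer, a model's accuracy can be measured separately on each generated set, and differences between sets would show up as the kind of variance the researchers report.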

(The above story first appeared on LatestLY on Oct 12, 2024 11:10 AM IST.)
