OK, you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?
Just use Wolfram Alpha instead
Use Wolfram Alpha for mathematics
Short answer: no.
Long answer: they are still (mostly) statistics-based and can't do real math. You can use the answers from LLMs as a starting point, but you have to rigorously verify the answers they give.
The whole "two r's in strawberry" thing is enough of an argument for me. If things like that happen at such a low level, it's completely impossible that they won't make mistakes with problems that are exponentially more complicated than that.
Would you trust six mathematicians who claimed to have solved a problem by intuition, but couldn’t prove it?
That’s not how mathematics works: if you have to “trust” the answer, it isn’t even math.
I've used LLMs quite a few times to find partial derivatives / gradient functions for me, and I know they're correct because I plug them into a gradient descent algorithm and it works. I would never trust anything an LLM gives blindly, no matter how advanced it is, but in this particular case I could actually test the output since it's something I was implementing in an algorithm, so if it didn't work I would know immediately.
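For anyone curious, the same kind of check can be done directly, without running full gradient descent: compare the LLM-supplied analytic gradient against a finite-difference approximation of the objective. A minimal sketch, with the objective and its gradient made up purely for illustration:

```python
import numpy as np

def f(x):
    # Example objective (assumed for illustration): x0^2 + 3*x0*x1 + sin(x1)
    return x[0]**2 + 3 * x[0] * x[1] + np.sin(x[1])

def grad_f(x):
    # The analytic gradient as an LLM might derive it; this is what we verify.
    return np.array([2 * x[0] + 3 * x[1],
                     3 * x[0] + np.cos(x[1])])

def numerical_grad(func, x, eps=1e-6):
    # Central finite differences: needs only the objective, no symbolic math.
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return g

x = np.array([1.3, -0.7])
# True only if the derived gradient matches the numerical one at this point.
print(np.allclose(grad_f(x), numerical_grad(f, x), atol=1e-4))
```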
That's rad, dude. I wish I knew how to do that. Hey, dude, I imagined a cosmological model that fits the data with two fewer parameters than the standard model. Planck data. I've checked the numbers, but I don't have the credentials. I need somebody to check it out. Here it is, with a verbal explanation of the model, on Academia.edu. It's way easier to listen first before looking. I don't want recognition or anything, just for someone to review it. It's a short paper. https://youtu.be/_l8SHVeua1Y
Even using a calculator or Wolfram Alpha or similar tools, I don't trust the answer unless it passes a few sanity checks. Frequently I am the source of the error, and no LLM can compensate for that.
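One example of such a sanity check: plug the claimed answer back into the original problem and look at the residual. A tiny sketch, with both the equation and the tool's answer assumed for illustration:

```python
def residual(x):
    # Assumed example problem: x^3 - 2x - 5 = 0
    return x**3 - 2*x - 5

claimed = 2.0945514815  # the answer some tool handed back
# A true root drives the residual to ~0; a wrong answer won't.
print(abs(residual(claimed)) < 1e-6)  # True
```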
It checked out. But is all six getting the same answer still likely incorrect?
Why would I bother?
Calculators exist, logic exists, so no… LLMs are a laughably bad fit for directly doing math. They are bullshit engines; they cannot "store" a value without fundamentally exposing it to their hallucinating tendencies, which is the worst property a calculator could possibly have.
Why would I bother?
Because you want to have a single interface that accepts natural-language input and gives answers.
That doesn't mean that using an LLM as a calculator is a reasonable approach, though a larger system that incorporates an LLM might be. But I think that the goal is very understandable.

I have Maxima, a symbolic math package, on my smartphone and computers. It's quite competent at just about any sort of mathematical problem a typical person might want to do, and it costs nothing. But… you do need to learn something about the package to be able to use it. By contrast, you don't have to learn much of anything beyond what a typical member of the public already knows to use a prompt that accepts natural-language input. And that barrier is enough that most people won't use it.
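Maxima's syntax is its own thing, but just to make the point concrete, here's roughly the same kind of thing in Python with SymPy (the specific problems are made up for illustration). The answers are exact symbolic results, not statistical guesses:

```python
import sympy as sp

x = sp.symbols('x')

# A definite integral, solved exactly: the Gaussian integral -> sqrt(pi)
print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))

# Roots of a quadratic, solved exactly: [2, 3]
print(sp.solve(x**2 - 5*x + 6, x))
```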
It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred runs each, same numbers.
This is a really weird premise. Doing the same thing on 6 models is just not worth it, especially when Wolfram Alpha exists and is far more trustworthy and speedy.
If the LLMs are part of a modern framework, I would expect them to be calling out to Wolfram Alpha (or a similar specialized math solver) via an API to get the answer for you, for that matter.
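A rough sketch of that pattern, with every name hypothetical (ask_llm and solver_query are stand-ins, not any real SDK): the LLM only translates the question into a formal query, and a deterministic engine does the actual computation. The stand-ins return canned strings so the flow can be traced end to end:

```python
import json

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-model API call. Returns a canned
    # tool request so the example runs without any external service.
    return json.dumps({"tool": "wolfram", "query": "solve x^2 - 5x + 6 = 0"})

def solver_query(query: str) -> str:
    # Hypothetical stand-in for a Wolfram Alpha (or other CAS) API call.
    return "x = 2 or x = 3"

def answer(question: str) -> str:
    # 1. The LLM's real strength: natural language -> formal query.
    request = json.loads(ask_llm(question))
    # 2. The math itself happens in a deterministic solver, not the LLM.
    return solver_query(request["query"])

print(answer("What are the roots of x squared minus five x plus six?"))
```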
Finally, an intelligent comment. So many comments in here don't realize most LLMs are bundled with calculators that just do the math.
Anti-AI sentiment is extremely strong in every part of the Fediverse I’ve seen so far, usually my comments get downvoted heavily even when I’m just describing factual details of how it works. I expect a lot of people simply don’t bother after a while.
Here’s an interesting post that gives a pretty good quick summary of when an LLM may be a good tool.
Here’s one key:
Machine learning is amazing if:
- The problem is too hard to write a rule-based system for or the requirements change sufficiently quickly that it isn’t worth writing such a thing and,
- The value of a correct answer is much higher than the cost of an incorrect answer.
The second of these is really important.
So if your math problem is unsolvable by conventional tools, or sufficiently complex that designing an expression is more effort than the answer is worth… AND ALSO it’s more valuable to have an answer than it is to have a correct answer (there is no real cost for being wrong), THEN go ahead and trust it.
If it is important that the answer is correct, or if another tool can be used, then you’re better off without the LLM.
The bottom line is that the LLM is not making a calculation. It could end up with the right answer. Different models could end up with the same answer. It’s very unclear how much underlying technology is shared between models anyway.
For example, if the problem is something like, "Here is all of our sales data and market indicators for the past 5 years. Project how much of each product we should stock in the next quarter." Sure, an LLM may come acceptably close to a professional analysis.
If the problem is like "given these bridge schematics, what grade steel do we need in the central pylon?" then, well, you are probably going to be testifying in front of Congress one day.
How trustworthy the answer is depends on knowing where the answers come from, which is unknowable. If the probability of the answer being generated from the original problem is high because it occurred in many different places in the training data, then maybe it's correct. Or maybe everyone who came up with the answer is wrong in the same way, and that's why there is so much correlation. Or perhaps the probability match is simply because lots of math problems tend toward similar answers.
The core issue is that the LLM is not thinking or reasoning about the problem itself, so trusting it with anything amounts to assuming it's more likely to be right than wrong. In some areas that's a safe assumption; in others it's a terrible one.
I'm a little confused after listening to a podcast with… damn, I can't remember his name. He's English. They call him the godfather of AI. A pioneer.
Well, he believes that GPT-2 through GPT-4 were major breakthroughs in artificial intelligence. He specifically said ChatGPT is intelligent, that some type of reasoning is taking place, and that the end of humanity could come anywhere from a year to 50 years away. This is the fellow who imagined a neural net mapped on the human brain, and he says it is doing much more. Who should I listen to? He didn't say some hidden AI. HE SAID CHATGPT. HONESTLY, NO OFFENSE. I JUST DON'T UNDERSTAND THIS EPIC SCENARIO ON ONE SIDE AND TOTALLY NOTHING ON THE OTHER.
No, once I tried to do binary calculations with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.
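For anyone wondering what that safety net looks like, a minimal sketch: a few fixed known cases around the function in question (binary_add here is a made-up stand-in for the LLM-produced code):

```python
def binary_add(a: str, b: str) -> str:
    # Stand-in for the code under test; imagine an LLM wrote this.
    return bin(int(a, 2) + int(b, 2))[2:]

# Fixed known cases: if the generated code is wrong, these fail immediately.
assert binary_add("0", "0") == "0"
assert binary_add("1", "1") == "10"
assert binary_add("1011", "110") == "10001"  # 11 + 6 = 17
print("all binary_add checks passed")
```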
But if you gave the problem to all the top models and got the same answer, is it still likely incorrect? I checked 6. I checked a bunch of times, from different accounts. I was testing it, seeing whether it's possible, and I wanted others' opinions. I actually checked over a hundred times for each model and got the same numbers.