
Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?

All you're doing is regurgitating the same BS everyone knows about the nature of science, statistics, and sample sizes, all of which is blindingly obvious to anyone.

I am presenting to you a valid dilemma that your big brain is skipping over because of your blind loyalty to science. You do not need a scientific study to tell you the sky is blue. A court doesn't need a thousand witnesses to convict a murderer. Have you ever wondered why?

Do you need me to give you 1,000 IQ questions to verify that you are an intelligent being? Do you need to give your mother or father those same tests to verify their intelligence? You have common sense, right? You can talk to your mom and basically bypass science to verify her intelligence, right?

Why the hell, all of a sudden, do you need a rigorous scientific study to verify the intellectual abilities of ChatGPT? You're so smart that you can know your mother is intelligent without assessing her with 5,000 IQ questions, but for ChatGPT you suddenly need the raw power of scientific rigor to come to any conclusion? You can't just talk to it yourself and draw your own conclusion?

Bro. You're irrational. ChatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test to verify her intelligence.

Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the kind of common sense I described above. This was way before anything Einstein proposed was verified by "stats".

Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.



> Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?

Again, you're continuing to compare apples and oranges from different professions and applying that here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us not only that it is non-deterministic, but that it cannot be trusted at all; in that context, it always requires humans to check it, and how reliable it is needs many more experiments and scientific replication by others to attest to it.
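
To make "non-deterministic" concrete, here is a minimal sketch (pure Python with numpy; the logits are made up for illustration, not taken from any real model): an LLM picks each token by sampling from a probability distribution, so at temperature > 0 the same prompt can yield a different answer on every run.

    import numpy as np

    def sample_token(logits, temperature=0.8):
        """Sample one token index from softmax(logits / temperature)."""
        rng = np.random.default_rng()
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    # Hypothetical next-token logits for one and the same diagnostic prompt.
    logits = [2.1, 1.9, 0.3]
    print([sample_token(logits) for _ in range(10)])
    # A typical run: [0, 1, 0, 0, 1, 0, 2, 1, 0, 0] -- same input, varying output.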

This is exactly why medical professionals in this case will laugh at your question. For clinicians to use one study, with a sample as low as 1 or 30, as the truth and basis for whether a medical device is reliable, especially an AI as a medical device, is beyond ridiculous.

> All you're doing is regurgitating the same BS everyone knows about the nature of science, statistics, and sample sizes, all of which is blindingly obvious to anyone.

So why aren't you able to understand that, then? Your mistake was to begin by comparing the standards of evidence in the legal profession and the medical profession, and to use that as a flawed analogy to claim that 'reliability' means the same thing in both. I'm afraid you only confused yourself.

> I am presenting to you a valid dilemma...

Which again is irrelevant and beside the point. Everything after that comes back to what I've already said about transparent explainability: it is known that LLMs and AI models such as ChatGPT cannot reason or explain their decisions transparently, and thus need examination and further reproduction by others due to their unpredictable behaviour.

Such a system used for medical advice is quite frankly unsatisfactory for physicians and other medical professionals. Just because it worked for someone does not mean it is reliable and works for everyone else. Hence my asking you for a significantly larger sample size and more clinical research papers in which ChatGPT is used.

> Bro. You're irrational. ChatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test to verify her intelligence.

I'm not the one drawing wild conclusions about reliability from one study and then suggesting that we can replace all doctors with ChatGPT, as if one anecdote showing it was correct for one person means it is also correct for others, which, given its unpredictability, is beyond illogical. You clearly are doing just that.

As long as it is a black-box AI model, physicians and medical professionals will always scrutinise its unpredictable nature and reliability, rather than trusting whatever diagnosis it gives as the truth, as you do.

> Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the kind of common sense I described above. This was way before anything Einstein proposed was verified by "stats".

What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is also reliable? There are different methods of showing this reliability, as I have already explained, and your bringing that up is an irrelevant distraction.

> Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.

The entire point IS conclusively showing reliability, and the ONE paper with a low sample size that you used as the basis of that claim is laughably insufficient for clinicians to draw an overall conclusion that ChatGPT is more reliable than doctors, to the point where it is safe for medical advice or could completely replace doctors (which isn't going to happen anyway).


> Again, you're continuing to compare apples and oranges from different professions and applying that here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us not only that it is non-deterministic, but that it cannot be trusted at all; in that context, it always requires humans to check it, and how reliable it is needs many more experiments and scientific replication by others to attest to it.

Where's the science on this? I need reams and reams of hard data and several scientific papers to prove this, because nothing exists in reality until there's a scientific paper written about it.

No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience, and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.

You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.

You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who for 10 years failed to diagnose a simple issue? You think anybody cares about them laughing?

> What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is also reliable? There are different methods of showing this reliability, as I have already explained, and your bringing that up is an irrelevant distraction.

It's relevant. You're just excited, thinking the conversation is heading in some strange direction where ultimate statistical rigor is the only valid topic of conversation.

I bring up Einstein to show you that we can talk about well-believed and highly esteemed ideas that have zero statistical verification, and that doing so is valid from the standpoint of scientists and "professionals".

I'm saying we don't need that level of rigor to talk about things that involve common sense.

Science has weaknesses. The first is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven to be true. Statistics does not have the ability to prove anything. In the end you're still speculating with science.

> The entire point IS conclusively showing reliability, and the ONE paper with a low sample size that you used as the basis of that claim is laughably insufficient for clinicians to draw an overall conclusion that ChatGPT is more reliable than doctors, to the point where it is safe for medical advice or could completely replace doctors (which isn't going to happen anyway).

And I'm saying your entire point is wrong. My point is right. You need to follow my point, which is this:

I can come to very real conclusions about ChatGPT and about LLMs without resorting to science and statistical samples to verify statements, in the same way you can make conclusions about your mom and her status as an intelligent being.

Also, I never said ChatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.

The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.


> Where's the science on this? I need reams and reams of hard data and several scientific papers to prove this, because nothing exists in reality until there's a scientific paper written about it.

You tell me, since I've already asked you to find another paper with a larger sample size, yet clearly you're struggling to find one, having judged the paper you used by its headline rather than actually reading it and its limitations.

> No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience, and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.

Perhaps you need to look up what the whole point of explainability is in LLMs, and why clinicians and physicians refer to these systems as untrustworthy black-box systems whose output cannot be trusted and still needs human medical professionals to check it.

> You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.

Except that the difference is humans can be held to account and transparently explain themselves when something goes wrong. An AI cannot transparently reason nor explain itself other than repeating and rewording its own response, and it can't identify its own errors even when you point them out.

Non-determinism in LLMs is completely relevant because of the opaqueness of how LLMs make their decisions. Hence, if an AI misdiagnoses a patient and lacks the transparent reasoning to show why it went wrong, that tells clinicians it is untrustworthy. Showing that 17 doctors couldn't diagnose a patient and ChatGPT could in ONE case does not mean it is 'reliable'.

Clinicians are interested in larger sample sizes in trials before making a judgement on the overall error rate and the reliability of a medical device.
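
To put numbers on that, a rough sketch (pure Python, the standard Wilson score interval; the case counts are hypothetical): one correct diagnosis out of one attempt is statistically compatible with a true accuracy anywhere from about 21% to 100%, while 270 correct out of 300 pins it down to roughly 86-93%.

    import math

    def wilson_interval(successes, n, z=1.96):
        """95% Wilson score confidence interval for a binomial proportion."""
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return centre - half, centre + half

    print(wilson_interval(1, 1))      # ~(0.21, 1.00): one case is uninformative
    print(wilson_interval(270, 300))  # ~(0.86, 0.93): a trial-sized sample is not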

Everything else you mentioned is irrelevant.

> You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who for 10 years failed to diagnose a simple issue? You think anybody cares about them laughing?

I'm still laughing at you for showing ONE clinical example and proclaiming it conclusive proof that LLMs can be used for medical advice and to completely replace all doctors. You realize that they can give an incorrect diagnosis at random? The still-unanswered question is how effective it is over a large number of cases and a proper sample size, i.e. trials. Not one.

> Science has weaknesses. The first is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven to be true. Statistics does not have the ability to prove anything. In the end you're still speculating with science.

Once again, as you have admitted already, one anecdote does not show that something is reliable. That is precisely why medical trials exist: to test how reliable a system is, instead of releasing something untested on the strength of a single paper, which, going by your own assumptions, is what you seem to believe should happen.

> And I'm saying your entire point is wrong. My point is right. You need to follow my point, which is this:

Nope. You believe your opinion is 'right' on the strength of ONE anecdote and a single study that barely scratches the surface. Whereas, ever since the beginning of the deep neural networks that LLMs are built on, they have fundamentally been black-box systems: using them for diagnosis is unexplainable to clinicians, and showing clinicians isolated examples is unconvincing. Again, what about the number of cases, over a larger sample size, in which it gives the incorrect diagnosis rather than the correct one?

Do you not realize why ChatGPT and others have a disclaimer saying they CANNOT be used for giving medical advice?

> Also, I never said ChatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.

Given that LLMs frequently hallucinate and are opaque systems, they will always need human doctors to check that their decisions are not incorrect. Fully replacing doctors with opaque AI systems is, given that fact, wild speculation, and even if it happens, people will trust humans more than an unattended AI system or a hypothetical AI-only system in which no one is held to account when the AI makes a mistake.

> The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.

One case study of ChatGPT getting one diagnosis right does not tell us how reliable it is over a larger sample size, nor how many other cases it would diagnose incorrectly at scale, which is what clinicians look for to establish its effectiveness.


First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point: it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.

You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans. Let me clue you in: medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.

On the topic of accountability, you act as though it's an exclusively human trait. Let me burst that bubble for you. Accountability can be programmed, designed, and regulated into an AI system. Humans wrote the laws that hold people accountable; who's to say we can't draft a new legal framework for AI? The goal isn't to mimic human accountability but to surpass it, creating a system that not only learns from its mistakes but also minimizes them to an extent that humans cannot.

You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.

As for the anecdote about the 17 doctors? Don't trivialize that. It's not just a point of failure for those specific doctors; it's a symptom of a flawed and fallible system. To argue that AI can't replace doctors because of one paper or anecdote is to entirely miss the point: we're not talking about the technology of today but of the technology of tomorrow. AI is on a path to becoming more reliable, more accountable, and more efficient than human medical professionals.

So yes, my point is that AI doesn't just have the potential to supplement human roles; it has the potential to replace them. Not today, maybe not tomorrow, but eventually. And it's not because AI is perfect; it's because it has the potential to be better, to continually improve in ways and at speeds that humans can't match.

We're not just dabbling in speculation here; we're tapping into a future that's hurtling toward us. If you're not prepared for it, you're not just standing in the way of progress; you're standing on the tracks. Prepare to get run over.

I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it was a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of ChatGPT. It allows us to speculate realistically without the need for science.


> First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point:

This is a case about reliability, which requires an abundance of evidence across many parameters, including a larger sample size, about which my question remains unanswered. Showing me one data point does not remotely establish that LLMs are reliable for this use-case, especially for medical professionals.

> it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.

Serious high-risk use-cases (legal, financial, medical, transportation, etc.) all require a reliability case to earn human trust. That demands extensive evidence, research, etc. of the system working reliably, whereas you have shown only one data point, from which professionals cannot draw any conclusion about reliability at all.

> You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans.

We're talking about clinicians: a high-risk profession in which, as I have already explained, it is almost certain that LLMs cannot fully replace everyone. As long as a human needs to check their outputs, that will remain the case by default.

> medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.

That isn't the point. Clinicians have used other tools which are far more transparent than deep neural networks / LLMs, and the massive disadvantage of LLMs has always been their inability to transparently show their decision process and explain themselves.

There is a significant difference between the explainability of an LLM and that of typical machine learning methods which don't use neural networks, and it has been known for decades that clinicians have very low trust in using such systems unattended, or at all. Hence the back-pedalling into disclaimers about never using these systems for medical, financial, or legal advice.
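
To illustrate that gap, a toy sketch (scikit-learn, with made-up triage features; nothing here is a real clinical model): a classical decision tree's entire decision process can be printed as human-readable rules that a clinician can audit, which is exactly what an LLM's billions of weights cannot give you.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical features: [fever, joint_pain] -> 0 = condition A, 1 = condition B.
    X = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y = [0, 1, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    # The full decision process, as rules anyone can read and check:
    print(export_text(tree, feature_names=["fever", "joint_pain"]))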

> Accountability can be programmed, designed, and regulated into an AI system....

Like what? So-called 'guardrails', which are found to be broken all the time? At least with human doctors, even if something goes wrong, there is always someone who is held to account and can explain exactly what the issue was and what happened.

The fact that these AI systems still require a human to supervise them defeats the point of trusting them to fully replace all human doctors, given their frequent failure to explain themselves transparently whenever one needs to understand their decisions.

> You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.

It is a fatal flaw, made worse by choosing the wrong AI system for the intended use-case, and not every problem can be solved with an LLM, including social problems that need human interaction. Whereas humans are able to reason and explain their decision process, LLMs have no concept of such a thing, even if their creators claim otherwise.

This is fundamental and by design in LLMs and related systems. Everything beyond that is speculative, or even science fiction.

> I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it was a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of ChatGPT. It allows us to speculate realistically without the need for science.

Self-driving cars that are meant to drive as well as or better than a human in all conditions are a science-fiction pipe dream (yes, they are). The designers of such autonomous systems already know this, and regulators trust them even less, not allowing any system without human intervention onto the roads.

Reliability accounts for the worst case (including failures, near misses, etc.), and it makes zero sense, and is irresponsible, for regulators and professionals to take one data point of the system working, dismiss the hundreds of failures, and then conclude that the AI system is reliable in all cases.



