Artificial intelligence is no longer the talk of the future; it's a tool of the present. Not only patients but also physicians are using AI and various other resources for clinical decision-making. That was the underlying need for testing the existing LLMs, as well as the tools that can be used for clinical decision-making.
So what we did was select four LLMs (ChatGPT 5.2, Gemini, Claude, and Grok) plus two tools, ASCO AI and OpenEvidence. We are right now at ASCO 2026, and ASCO has a new AI-based tool where you can enter patient information, without any PHI, and it will give you a guideline-based approach. OpenEvidence is similar: you put in a question, and it gives you a guideline- or literature-based answer on what you should do next.
What we wanted to see was whether they're accurate. So we measured concordance with thoracic experts from the University of Miami Sylvester, MD Anderson, and the University of Alabama. We wrote six case scenarios for the first-line setting and six for the second-line setting, with four answer options for each. We fed a prompt to all six tools, the four LLMs and the two tools, asking: if this question were given to 100 U.S.-based oncologists, how often would each answer be chosen? We specified U.S.-based oncologists so that differences in access to treatment wouldn't change what the question meant. Then we asked the U.S. thoracic experts as well, to see what the concordance was. We measured concordance using tau, and we also looked at divergence as part of the statistical analysis.
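The transcript doesn't specify which tau or which divergence measure was used. As a minimal sketch of the idea only, assuming each tool and the expert panel report how many of 100 oncologists would pick each of the four options, Kendall's tau-a (an assumption) scores whether two sources order the options the same way, and Jensen-Shannon divergence (also an assumption) measures how far apart the two answer distributions are. The function names and all numbers below are hypothetical and illustrative, not study data.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / all pairs; ties count as neither."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1  # concordant: both sources order options i and j the same way
        elif product < 0:
            score -= 1  # discordant: the sources disagree on which option is preferred
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so 0..1) between two answer distributions."""
    p_total, q_total = sum(p), sum(q)
    p = [v / p_total for v in p]  # normalize counts to proportions
    q = [v / q_total for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i = m_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical "out of 100 oncologists" answers for options A-D in one case.
expert_panel = [55, 25, 15, 5]
model_answer = [50, 30, 5, 15]
print(round(kendall_tau_a(expert_panel, model_answer), 3))  # prints 0.667
print(round(js_divergence(expert_panel, model_answer), 3))
```

Tau-a runs from -1 (opposite ordering) to 1 (identical ordering), while the divergence is 0 only when the two distributions match exactly, so the two numbers capture different kinds of disagreement.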
Interestingly, what we found was that in the frontline setting for stage IV EGFR-positive non-small cell lung cancer, concordance for ASCO AI and ChatGPT was 80%, which is really good. We would always hope for 100%, but 80% is not bad to begin with, especially since the ASCO AI tool has only been around for a few months. Similarly, divergence among the LLMs themselves was very low in the first-line setting. However, in the second-line setting, it kind of fell apart. The highest concordance we saw there was with Gemini, at around 0.49, which is about 50%. That's like a flip of a coin: the answer could be right, or it could be wrong. The second-line setting also has a lot of nuances, because the data are constantly being updated, and the choice depends heavily on the resistance mechanism, on what liquid biopsy was done, and on what treatment was given before. So I think AI models still need to get better for the second line, but in the first-line setting they're really helpful.
The main conclusion we would like people to take from this is that AI is here right now. We can use it as a tool to help us with clinical decision-making. We cannot rely on it 100%, but it's definitely better than the conventional literature review. We can start implementing it cautiously, alongside our own clinical knowledge, to make the best precision-medicine-oriented clinical decisions for lung cancer patients.
This transcript is AI-generated. While we strive for accuracy, please verify this copy with the video.