
AI accuracy in thyroid ultrasound accelerates

March 25, 2026 | By Doug Brunk

After repeated software updates, an artificial intelligence (AI)-assisted thyroid ultrasound system improved its agreement with an experienced endocrine surgeon from 72.7% to 92.6% overall (and to 97% in the most recent patients), according to a retrospective study published in a Springer Nature journal.

For the study, corresponding author Thomas J. Musholt, MD, of the section of endocrine surgery in the department of general, visceral and transplantation surgery at University Medical Center Mainz, Germany, and co-investigators evaluated the learning curve of the PIUR tUS Infinity 3D ultrasound software in a real-world endocrine surgery clinic. From March 2023 through October 2024, 243 patients were assessed in 2 phases.

First phase

In the first phase (March 2023–June 2024), 110 patients with 176 thyroid nodules underwent AI-assisted ultrasound performed by surgeons-in-training. Each examination was then repeated independently by an experienced endocrine surgeon, whose ACR Thyroid Imaging Reporting & Data System (TI-RADS) score served as the reference standard. That assessment was revised only if postoperative histopathology showed it to be incorrect.

In this initial cohort, the AI system assigned the same TI-RADS category as the expert in 128 of 176 nodules (72.7%; P = .0001; Cohen’s kappa = 0.6203). Small differences of 1 to 3 points occurred in 36 nodules (20.5%). Larger, clinically relevant differences (>3 points) were seen in 12 nodules (6.8%).
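Cohen's kappa adjusts raw percent agreement for the agreement two raters would reach by chance alone, using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The sketch below is a minimal illustration of plain unweighted kappa. It assumes that variant: the article does not say which kappa the investigators computed (ordinal scales such as TI-RADS are sometimes scored with a weighted kappa). The function name `cohens_kappa` and the ratings are hypothetical toy data, not study data, so the reported values of 0.6203 and 0.8955 cannot be reproduced from it.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used a single, identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical TI-RADS categories for 10 nodules (toy data, not from the study).
expert = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4]
ai = [3, 4, 3, 5, 2, 3, 4, 4, 3, 4]

agreement = sum(a == b for a, b in zip(expert, ai)) / len(expert)
print(f"raw agreement: {agreement:.0%}")        # 80%
print(f"kappa: {cohens_kappa(expert, ai):.2f}")  # 0.71
```

On these made-up labels, raw agreement is 80% but kappa is only about 0.71, because part of that agreement would be expected even if the two raters assigned categories at random with the same overall frequencies.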

Early errors most often involved nodules with microcalcifications or echogenic foci (36/176 [20%]). The software also had difficulty in patients with autoimmune thyroiditis, where inflamed, inhomogeneous tissue made it harder to distinguish true nodules from background changes.

Second phase

After several rounds of software updates that incorporated validated clinical data, the investigators repeated the analysis. In the second phase, 133 patients with 227 nodules were evaluated. Overall agreement with the expert rose to 210 of 227 nodules (92.6%). Small differences were seen in 11 nodules (4.8%), and clinically relevant differences in 6 (2.6%). Agreement improved considerably (Cohen's kappa = 0.8955; P = .087).

Performance was highest in the final 23 patients (33 nodules), in whom the AI system agreed with the expert in 32 of 33 nodules (97%), with only 1 relevant discrepancy (3%). Across sequential patient groups, concordance rose from 63% to 73% in earlier cohorts to 97% in the final group.

Updated software reanalysis

When the original 110 patients were reanalyzed using the updated software, the system identified 194 nodules compared with 176 initially. The improvement reflected better detection and separation of nodules within conglomerates. No nodule was scored less accurately with the updated version.

Histopathologic results were available for 57 patients who underwent surgery. Findings included 19 papillary thyroid carcinomas, 2 medullary thyroid carcinomas, 1 poorly differentiated or anaplastic carcinoma, and 35 benign lesions. Three papillary microcarcinomas were found among nodules initially classified as TI-RADS 1–3. TI-RADS 4 and 5 nodules included both malignant tumors and benign conditions such as toxic nodules and Hashimoto thyroiditis.

The 6 major discrepancies that remained after the software updates mostly involved thyroiditis, hemorrhagic cysts, or intrathyroidal thymus.

The researchers acknowledged several limitations of the study, including its retrospective design, use of a single expert as the reference standard for most nodules, and lack of histopathology for nonsurgical cases. They also noted that incorporating locally corrected data into the software updates may have improved performance specifically at their institution.

“AI-assisted ultrasound systems are already a solid tool in the diagnosis of true thyroid nodules in non-inflamed tissue,” they wrote. Although the software “cannot yet replace an experienced clinician or endocrine surgeon in complex or unusual cases,” they concluded that its rapid improvement with iterative training is “indeed encouraging.”

The researchers disclosed that the PIUR tUS Infinity system is on loan to University Medical Center Mainz for research and teaching purposes.

AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology, Inc. (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE.
