Tether Launches Local Medical AI for Smartphones and Wearables
Tether’s AI research group has released QVAC MedPsy, a new medical language model series that runs directly on smartphones, wearables, and other devices with limited processing power. The models achieve performance that rivals or exceeds much larger systems, all while keeping data local and private.
Most current medical AI relies on large models hosted on remote servers, which requires sending sensitive patient data through the cloud. This approach creates privacy risks and compliance headaches, especially as the healthcare AI market grows from $36 billion today toward $500 billion by 2033.
QVAC MedPsy challenges the assumption that better AI requires bigger models. A 1.7-billion-parameter version scored 62.62 across seven medical benchmarks, outperforming Google’s MedGemma-1.5-4B-it by 11.42 points despite being less than half its size. On real-world clinical tests like HealthBench Hard, the 1.7B model even surpassed MedGemma 27B, a model nearly sixteen times larger. The 4-billion-parameter version scored 70.54, beating models almost seven times its size, including MedGemma-27B-text.
The performance gains come from a new post-training process that combines broad medical supervision, high-value clinical reasoning data, and reinforcement learning focused on harder medical cases. The models also cut inference costs significantly. The 4B model generates responses in about 909 tokens versus 2,953 for comparable systems, a 3.2x reduction. The smaller 1.7B model uses around 1,110 tokens versus 1,901, a 1.7x reduction.
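The reduction multipliers follow directly from the reported token counts; a quick sketch (using only the averages quoted above) shows the arithmetic:

```python
# Average tokens per response reported for each QVAC MedPsy size,
# paired with the figure quoted for comparable systems.
reported = {
    "QVAC MedPsy 4B": (909, 2953),
    "QVAC MedPsy 1.7B": (1110, 1901),
}

for name, (ours, baseline) in reported.items():
    reduction = baseline / ours  # how many times fewer tokens per response
    print(f"{name}: {reduction:.1f}x fewer tokens per response")
# 2953 / 909 ≈ 3.2x and 1901 / 1110 ≈ 1.7x, matching the stated reductions.
```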
Local Deployment and Privacy
The models are released in quantized GGUF formats designed for local deployment. The Q4_K_M versions are around 1.2 GB for the 1.7B model and 2.6 GB for the 4B model. In testing, these compressed versions retained most benchmark performance while being practical for mobile and edge environments.
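Those file sizes are roughly what 4-bit-class quantization predicts. A back-of-envelope sketch, assuming about 5 effective bits per weight for Q4_K_M (an approximation; some tensors are kept at higher precision, so real GGUF files run somewhat larger):

```python
def gguf_size_gb(params: float, bits_per_weight: float = 5.0) -> float:
    """Rough on-disk size of a quantized GGUF file in GB.

    Assumes ~5 effective bits per weight for Q4_K_M-style quantization;
    embeddings and a few layers are typically stored at higher precision,
    so actual files are a bit larger than this estimate.
    """
    return params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

for name, params in [("1.7B", 1.7e9), ("4B", 4.0e9)]:
    print(f"{name}: ~{gguf_size_gb(params):.1f} GB estimated")
```

The estimates (~1.1 GB and ~2.5 GB) land just under the reported 1.2 GB and 2.6 GB, consistent with the higher-precision tensors GGUF files retain.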
This shift could change where medical AI is used. Systems that previously required external processing can now support clinicians in hospitals, on mobile devices, or anywhere connectivity, latency, or privacy constraints make cloud-based models impractical. It removes one of the main barriers to healthcare AI adoption: the need to move sensitive data outside controlled environments.
Tether CEO Paolo Ardoino said the focus was on improving efficiency at the model level rather than scaling up size. The 1.7B model outperformed larger systems like MedGemma-4B, and the 4B model exceeded results from models nearly seven times its size while using roughly a third as many tokens per response. This combination reduces compute requirements, latency, and cost, allowing the models to run on standard hardware without remote infrastructure.
Redefining AI Infrastructure
For the past decade, AI progress has been tied to cloud-based compute. QVAC MedPsy points toward efficiency, locality, and privacy defining performance. If these gains hold in real-world deployments, they could reshape medical AI infrastructure, shifting the advantage toward local systems with lower cost, lower latency, and greater control over sensitive data.
More details are available at https://qvac.tether.io/models/