azure
3186 TopicsOptimizing Azure Healthcare Multimodal AI Models for Intel CPU Architecture
Alexander Mehmet Ersoy, Principal Product Manager, Microsoft HLS AI Abhishek Khowala, Principal AI Engineer, Intel Ravi Panchumarthy, AI Framework Engineer, Intel Srinarayan Srikanthan, AI Framework Engineer, Intel Ekaterina Aidova, AI Frameworks Engineer, Intel Alberto Santamaria-Pang, Principal Applied Data Scientist, Microsoft HLS AI and Adjunct Faculty at Johns Hopkins Medicine, Microsoft Peter Lee, Applied Scientist, Microsoft HLS AI and Adjunct Assistant Professor at Vanderbilt University Ivan Tarapov, Sr. Director, Microsoft HLS AI Pradeep Sakhamoori, Sr. SW Engineer, Microsoft The Rise of Multimodal AI in Healthcare The healthcare sector is witnessing a surge in the adoption of multimodal AI models, which are crucial for applications ranging from diagnostics to personalized treatment plans. These models combine data from various sources such as medical images, patient records, and genomic data to provide comprehensive insights. Microsoft’s Azure AI Foundry's Model Catalog of multimodal healthcare foundation models is at the forefront of this change. Models recently launched (such as MedImageInsights, MedImageParse, CXRReportGen [8], and many others) are designed to help healthcare organizations rapidly build and deploy AI solutions tailored to their specific needs, while minimizing the extensive compute and data requirements typically associated with building multimodal models from scratch. Real-World Examples from our industry partners regarding the adoption of multimodal AI models are highlighted in the article “Unlocking next-generation AI capabilities with healthcare AI models”. Challenges and Opportunities in Hardware Optimization As models get more complex, which is the case with the foundation model trend, the demands on the hardware rise. While GPUs remain the platform of choice for minimizing the model execution times, CPUs present substantial optimization possibilities, especially for inference workloads. We believe that providing a framework for efficient CPU-based environments holds a huge potential for many production scenarios where speed can be traded off for cost savings. With multimodal healthcare AI, the complexity of handling different data modalities and ensuring efficient inference requires innovative solutions and collaboration between industry leaders. Companies are increasingly looking towards hardware-specific optimizations to enhance model efficiency and reduce latency while keeping costs at bay. Intel, with its robust suite of AI tools and extensions for frameworks like PyTorch, is pioneering this optimization effort. For instance, the Intel® Distribution of OpenVINO™ toolkit has been instrumental in accelerating the development of computer vision and deep learning applications in healthcare [1]. You can learn about our recent collaboration with Intel on AI optimizations to advance medical innovations in the article "Empower Medical Innovations: Intel Accelerates PadChest & fMRI Models on Microsoft Azure* Machine Learning”. The demand for AI applications in healthcare is rapidly increasing. Multimodal AI models, which can process and analyze complex datasets, are essential for tasks such as early disease detection, treatment planning, and patient monitoring. While optimizing these models to perform efficiently on specific hardware is important, it is not necessarily a barrier to adoption. Models optimized with CUDA for Nvidia GPUs often deliver optimal performance and run faster than on any other hardware. However, the benefit of using CPUs lies in the tradeoff they offer. You can choose to optimize for speed by running your model on a GPU and optimizing for it in PyTorch, or you can optimize for cost by sacrificing speed. This is the proposition here: the option to run the model slower with an accessible CPU, which can be advantageous in scenarios where speed is not the primary concern, but access to GPU hardware is. The Intel® oneAPI Deep Neural Network Library (oneDNN) have proven effective in reducing GPU requirement burden and accelerating time to market for AI solutions [2]. Both Intel® Extension for PyTorch (IPEX) and OpenVINO utilize the Intel® oneDNN to accelerate deep learning operations, taking advantage of underlying hardware features. IPEX optimizes existing PyTorch workflows with minimal code changes. OpenVINO provides cross-platform deep learning optimization for deployment flexibility. In this blog post, a custom deployment was implemented using CXRReportGen along with both IPEX and OpenVINO optimizations, demonstrating how these techniques can support different deployment scenarios and technical requirements. This optimization is accessible through Azure's compute services and Intel's technology. Benchmarking and Performance Acceleration To address these challenges, our new collaboration with Intel focuses on leveraging Intel’s advanced AI tools and hardware capabilities to optimize multimodal AI models for greater healthcare access. By utilizing Intel's Extension for PyTorch and other optimization techniques, we aim to optimize CPUs for best model run time speed. While this may slightly degrade performance, the main benefit is addressing the problem of GPU hardware scarcity. This partnership not only underscores the importance of hardware-specific optimizations but also sets a new standard for AI model deployment in real-world healthcare applications. Both IPEX and OpenVINO are built on a common foundation - Intel® oneDNN which is a high-performance library designed specifically for deep learning applications and optimized for Intel architecture. oneDNN leverages specialized hardware instructions available in Intel processors such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) [3] on Intel CPUs as well as Intel XeMatrix Extensions (XMX) AI engines on Intel discrete GPUs. Figure 1: OneDNN Library IPEX [4] extends PyTorch* with the latest performance optimizations for Intel hardware [5]. It leverages oneDNN under the hood to provide optimized implementations of key operations. This allows developers to stay within their existing PyTorch code with minimal changes - making it an excellent choice for teams already comfortable with the PyTorch ecosystem who want to quickly optimize their models for Intel hardware. import torch ############## import ipex ############### import intel_extension_for_pytorch as ipex model = Model() model.eval() ############## Optimize with IPEX ############### model = ipex.optimize(model, dtype=torch.bfloat16) # Continue with inference as normal Figure 2. Intel Extension for PyTorch The Intel® Distribution of OpenVINO™ toolkit is a powerful solution for optimizing and deploying deep learning models across a wide range of Intel hardware [6]. Like IPEX, it leverages oneDNN under the hood, but takes a different approach - offering cross-platform optimization and flexible deployment options. OpenVINO supports two main workflows: a convenience workflow, where you run models directly with minimal setup, and a performance workflow, recommended for production, where models are first converted offline into the OpenVINO Intermediate Representation (IR). This one-time conversion step enables highly optimized inference and allows the final application to remain lightweight and efficient. Here’s a simple example using OpenVINO for inference with a pre-converted IR model. Refer to OpenVINO Notebooks repo for more samples: import openvino as ov core = ov.Core() ############## Load the OpenVINO IR model ############### compiled_model = core.compile_model("model.xml", "CPU") ############## Run inference ################### infer_request = compiled_model.create_infer_request() results = infer_request.infer({input_tensor_name: input_tensor}) Figure 3: OpenVINO toolkit Overview. IPEX and OpenVINO are supported in all Intel architectures. However, for optimal performance, Intel recommends using instances powered by 4th Gen Intel® Xeon® Scalable processors or newer, which feature AMX and other hardware acceleration capabilities, such as Azure’s v6-series (e.g., Standard_E48s_v6) [7]. Results We conducted a detailed performance benchmark by using CXRReportGen, a state-of-the-art foundation model designed to generate a list of radiological findings from chest X-rays, over Standard_E48s_v6 hardware (48 vCPUs, 248 GiB RAM) with and without IPEX and OpenVINO optimization. We realized up to 70% improvement in CXRReportGen foundation model run time when applying optimizations with IPEX and similarly substantial gains using OpenVINO, compared to the non-optimized baseline on the same CPU hardware. This significant improvement highlights the potential of leveraging Intel's performance optimizations to make critical healthcare AI models more cost-efficient and accessible. Such advancements enable healthcare providers to deploy advanced diagnostic tools even in resource-constrained environments, ultimately improving patient care and operational efficiency. SKU Run Type (100 Runs) Mean Run Time (seconds) Standard Deviation of Run Time (seconds) Standard_E48s_v6 (48 vCPUs, 348 GiB RAM) No Optimization 22.47 0.1061 Standard_E48s_v6 (48 vCPUs, 348 GiB RAM) IPEX 8.21 0.2375 Standard_E48s_v6 (48 vCPUs, 348 GiB RAM) OpenVINO 7.01 0.0569 Table 1: Performance Comparison of CXRReportGen Model Across 100 Runs with CPU. Future Prospects and Innovations Our benchmarks with Intel optimizations with both IPEX and OpenVINO show great potential on decreasing the model run time of our foundation models and increasing scalability via CPU. This optimization positions Intel CPUs as a viable deployment. This not only increases deployment options but also offers opportunities to reduce cloud costs with CPU-based instances and even consider deploying these workflows on existing compute headroom at the edge. For custom deployments, the setup described in this blog post is now available on the provided compute instances in Azure and with optimization software from Intel. So that developers can optimize inference workloads while taking advantage of large memory pools available via CPU and use towards handling large batch workloads. Our advancements with Intel in model runtime optimizations are considered to be available in the Azure AI model catalogs. Please stay tuned for further updates. As we continue to innovate and optimize, the potential for AI to transform healthcare and improve patient outcomes becomes increasingly attainable. We are now more equipped than ever to making it easier for our partners and customers to create connected experiences at every point of care, empower their healthcare workforce, and unlock the value from their data using data standards that are important to the healthcare industry. References [1] Intel OpenVINO Optimizes Deep Learning Performance for Healthcare Imaging [2] Accelerating Healthcare Diagnostics with Intel oneAPI and AI Tools [3] Intel Advanced Matrix Extensions [4] Intel Extension for Pytorch [5] Accelerate with Intel Extension to PyTorch [6] Intel Accelerates PadChest and fMRI Models on Azure ML [7] Azure’s first 5th Gen Intel® Xeon® processor instances are now available and we're excited! [8] CxrReportGen Model Card in Azure AI Foundry The healthcare AI models in Azure AI Foundry are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.Understanding Compliance Between Commercial, Government, DoD & Secret Offerings - July 2025 Update
Understanding compliance between Commercial, Government, DoD & Secret Offerings: There remains much confusion as to what service supports what standards best. If you have CMMC, DFARS, ITAR, FedRAMP, CJIS, IRS and other regulatory requirements and you are trying to understand what service is the best fit for your organization then you should read this article.47KViews5likes7CommentsAccelerate cloud migrations with confidence - Join the Dr. Migrate Webinar
Join us for a special Teams Live Event: Solution and Services Assessments with Dr. Migrate Eligible Microsoft Partners can now request access to Dr Migrate licenses to run customer environment assessments with customers across segments. Dr Migrate is an AI-Assisted Cloud Migration platform that helps customers optimize their end-to-end migration and modernization planning— from secure discovery to executive-ready business case and wave plans migration and modernization roadmaps. With the new Dr Migrate Collector (DMC), Dr Migrate delivers a comprehensive environment assessment without requiring intrusive agents or compromising customer data security. 🎓 Join this session to learn more about Azure Accelerate and how tools like Azure Migrate and Dr Migrate can help you have actionable data and deep solution insights from a customer's IT environment. In this session you’ll learn how to get your teams trained and certified, and how to request a Dr Migrate license, an alternative solution available to partners eligible for support through Azure Accelerate. Tues, Aug 5, 2025 - 9:00am-10:00am MDT Tues, Aug 5, 2025 - 6:00pm-7:00pm MDT *Dr Migrate is available to Azure Expert MSP, Infrastructure & Database, Kubernetes, Migrate to Enterprise Applications, SAP, Analytics, DW Migration, Build AI Apps, Accelerate Developer Productivity, and AI Platform specialized partners.72Views0likes0CommentsCampusSphere: Building the Future of Campus AI with Microsoft's Agentic Framework
Project Overview We are a team of Imperial College Students committed to improving campus life through innovative multi-agent solutions. CampusSphere leverages Microsoft Azure AI capabilities to automate core university campus services. We created an end-to-end solution that allows both students and staff to access a multi-agent framework for room/gym booking, attendance tracking, calendar management, IoT monitoring and more. 🔭 Our Initial Vision: Reimagining Campus Technology When our team at Imperial College London embarked on the CampusSphere project as part of Microsoft's Agentic Campus initiative, we had one clear ambition: to create an intelligent campus ecosystem that would fundamentally change how students, faculty, and staff interact with university services. The inspiration came from a simple observation—despite living in an age of advanced AI, campus technology remained frustratingly fragmented. Students juggled multiple portals for course registration, room booking, dining services, and academic support. Faculty members navigated separate systems for teaching, research, and administrative tasks. The result? Countless hours wasted on mundane navigation tasks that could be better spent on learning, teaching, and innovation. Our vision was ambitious: create a single, intelligent interface that could understand natural language, anticipate user needs, and seamlessly integrate with existing campus infrastructure. We didn't just want to build another campus app—we wanted to demonstrate how Microsoft's agentic AI technologies could create a truly intelligent campus companion. 🧠 Enter CampusSphere CampusSphere is an intelligent campus assistant made up of multiple AI agents, each with a specific domain of expertise — all communicating seamlessly through a centralized architecture. Think of it as a digital concierge for campus life, where your calendar, attendance, IoT data, and facility bookings are coordinated by specialized GPT-powered agents. Here’s what we built: TriageAgent – the brain of the system, using Retrieval-Augmented Generation (RAG) to understand user intent CalendarAgent – handles scheduling, bookings, and reminders AttendanceAgent – tracks check-ins automatically IoTAgent – monitors real-time sensor data from classrooms and labs GymAgent – manages access and reservations for sports facilities 30+ MCP Tools – perform SQL queries, scrape web data, and connect with external APIs All of this is built on Microsoft Azure AI, Semantic Kernel, and Model Context Protocol (MCP) — making it scalable, secure, and lightning fast. 🖥️ The Tech Stack Our Azure-powered architecture showcases a modular and scalable approach to real-time data processing and intelligent agent coordination. The frontend is built using React with a Vite development server, providing a fast and responsive user interface. When users submit a prompt, it travels to a Flask backend server acting as the Triage agent, which intelligently delegates tasks to a FastAPI agent service. This FastAPI service asynchronously communicates with individual agents and handles responses efficiently. Complex queries are routed to MCP Tools, which interact with the CosmosDB-powered Campus Database. Simultaneously, real-time synthetic IoT data is pushed into the database via Azure Function Apps and Azure IoT Hub. Authentication is securely managed: users log in through the frontend, receive a token from the database API server, and use it for authorized access to MCP services, with permissions enforced based on user roles using our custom MCP server implementation. This robust architecture enables seamless integration, real-time data flow, and secure multi-agent collaboration across Azure services. Our system leverages a multi-agent architecture designed to intelligently coordinate task execution across specialized services. At the core is the TriageAgent, which uses Retrieval-Augmented Generation (RAG) to interpret user prompts, enrich them with relevant context, and determine the optimal response path. Based on the nature of the request, it may handle the response directly, seek clarification, or delegate tasks to specific agents via FastAPI. Each specialized agent has a clearly defined role: AttendanceAgent: Interfaces with CosmosDB-backed FastAPI endpoints to check student attendance, using filters like event name, student ID, or date. IoTAgent: Monitors room conditions (e.g., temperature, CO₂ levels) and flags anomalies using real-time data from Azure IoT Hub, processed via FastAPI. CalendarAgent: Handles scheduling, availability checks, and event creation by querying or updating CosmosDB through FastAPI. Future integration with Microsoft Graph API is planned for direct calendar syncing. Gym Slot Agent: Checks available times for gym sessions using dedicated MCP tools. The triage agent serves as the orchestrator, breaking down complex requests (like "Book a gym session") into subtasks. It consults relevant agents (e.g., calendar and gym slot agents), merges results, and then confirms the final action with the user. This distributed and asynchronous workflow reduces backend load and enhances both responsiveness and reliability of the system. 🔮 What’s Next? Integrating CampusSphere with live systems via Microsoft OAuth is crucial for enhancing its capabilities. This integration will grant the agent authenticated access to a wider range of student data, moving beyond synthetic datasets. This expanded access to real-world information will enable deeply personalized advice, such as tailored course selection, scholarship recommendations, event suggestions, and deadline reminders, transforming CampusSphere into a sophisticated, proactive personal assistant. 🤝Meet the Team Behind CampusSphere Our success stemmed from a diverse team of innovators who brought together expertise from multiple domains: Benny Liu - https://www.linkedin.com/in/zong-benny-liu-393a4621b/ Lucas Ng - https://www.linkedin.com/in/lucas-ng-11b317203/ Lu Ju - https://www.linkedin.com/in/lu-ju/ Bruno Duaso - https://www.linkedin.com/in/bruno-duaso-jimeno-744464262/ Martim Coutinho - https://www.linkedin.com/in/martim-pereira-coutinho-116308233/ Krischad Pourpongpan - https://www.linkedin.com/in/krischadpua/ Yixu Pan - https://www.linkedin.com/in/yixu-pan/ Our collaborative approach enabled us to create a sophisticated agentic AI system that demonstrates the powerful potential of Microsoft's AI technologies in educational environments. 🧑💻 Project Repository: GitHub - Imperial-Microsoft-Agentic-Campus/CampusSphere Contribute to Imperial-Microsoft-Agentic-Campus/CampusSphere development by creating an account on GitHub. github.com Have questions about implementing similar solutions at your institution? Connect with our team members on LinkedIn—we're always excited to share knowledge and collaborate on innovative campus technology projects. 📚Get Started with Microsoft's AI Tools Ready to explore the technologies that made CampusSphere possible? Here are essential resources: Microsoft Semantic Kernel: The core framework for building AI agent orchestration systems. Learn how to create, coordinate, and manage multiple AI agents working together seamlessly. AI Agents for Beginners: A comprehensive guide to understanding and building AI agents from the ground up. Perfect for getting started with agentic AI development. Model Context Protocol (MCP): Learn about the protocol that enables secure connections between AI models and external tools and services—essential for building integrated AI systems. Windows AI Toolkit: Microsoft's toolkit for developing AI applications on Windows, providing local AI model development capabilities and deployment tools. Azure Container Apps: Understand how to deploy and scale containerized AI applications in the cloud, perfect for hosting multi-agent systems. Azure Cosmos DB Security: Essential security practices for managing data in AI applications, covering encryption, access control, and compliance.135Views2likes0CommentsAugust Calendar is here!
🌟 Community Spirit? CHECKED! 🌍 Amazing Members & Audiences? DOUBLE CHECK! 🎤 Phenomenal Speakers Locked In? CHECKED! 🚀 Global Live Sessions? YOU BET! The stage is set. The excitement is real. It’s that time again, time to ignite the community with another monthly calendar! 🔥✨ We’ve lined up a powerhouse of sessions packed with world-class content, covering the best of Microsoft, from Coding, Cloud, Migration, Data, Security, AI, and so much more! 💻☁️🔐🤖 But wait, that’s not all! For the first time ever, we’ve smashed through time zones! No matter where you are in the world, you can tune in LIVE and learn from extraordinary speakers sharing their insights, experiences, and passion. 🌏⏰ What do you need to do? It’s easy: 👉 Register for the sessions 👉 Mark your calendar 👉 Grab your coffee, tea, or ice-cold soda 👉 Join us and soak up the knowledge! We believe in what makes this community truly special, and that’s YOU. Let’s set August on fire together! 🔥 Are you ready to be inspired, to grow, and to connect with Microsoft Learn family? Don’t miss out, August is YOUR month! 💥🙌 📢 Shehan Perera 📖 Let There Be Cloud-Native Endpoints 📅 5 Aug 2025 (19:00 AEST) (11:00 CEST) 📢Shahab Ganji 📖 Create a Tic Tac Toe game and learn about Event Sourcing 📅 8 Aug 2025 18:00 CEST 📢 Ronak Vachhani 📖Data Intelligence at Your Fingertips: Fabric’s AI Functions & Data Agents 📅16 Aug 2025 (16:00 AEST) (08:00 CEST) 📢Laïla Bougriâ 📖Change is inevitable: versioning event-driven systems 📅22 Aug 2025 18:00 CEST 📢AJ Bajada 📖Revolutionising DevOps with AI: From Pipelines to Deployment 📅28 Aug 2025 (19:30 AEST) (11:30 CEST) 📢James Eastham 📖So, You Want To Build An Event Driven System? 📅29 Aug 2025 17:00 CEST58Views0likes0CommentsCurrent Issue with Private Plans on Marketplace Subscriptions
We typically create private plans for customers on our transactable public offer in the Marketplace, tailored to specific commercial terms and custom features. However, over the past few weeks, we've observed that newly created private plans are not being reflected in the subscriptions managed by Cloud Solution Providers (CSPs). To address this, we're exploring alternative approaches to ensure plan visibility and alignment with CSP-managed subscriptions.Solved46Views0likes2CommentsIntroducing Azure AI Models: The Practical, Hands-On Course for Real Azure AI Skills
Hello everyone, Today, I’m excited to share something close to my heart. After watching so many developers, including myself—get lost in a maze of scattered docs and endless tutorials, I knew there had to be a better way to learn Azure AI. So, I decided to build a guide from scratch, with a goal to break things down step by step—making it easy for beginners to get started with Azure, My aim was to remove the guesswork and create a resource where anyone could jump in, follow along, and actually see results without feeling overwhelmed. Introducing Azure AI Models Guide. This is a brand new, solo-built, open-source repo aimed at making Azure AI accessible for everyone—whether you’re just getting started or want to build real, production-ready apps using Microsoft’s latest AI tools. The idea is simple: bring all the essentials into one place. You’ll find clear lessons, hands-on projects, and sample code in Python, JavaScript, C#, and REST—all structured so you can learn step by step, at your own pace. I wanted this to be the resource I wish I’d had when I started: straightforward, practical, and friendly to beginners and pros alike. It’s early days for the project, but I’m excited to see it grow. If you’re curious.. Check out the repo at https://github.com/DrHazemAli/Azure-AI-Models Your feedback—and maybe even your contributions—will help shape where it goes next!274Views1like4Comments