TensorRT-LLM brings local AI computing to NVIDIA GPUs


What you need to know

  • TensorRT-LLM is adding support for OpenAI’s Chat API on desktops and laptops with RTX GPUs that have at least 8GB of VRAM.
  • Users can process LLM queries faster and locally, without uploading their data to the cloud.
  • NVIDIA pairs this with “Retrieval-Augmented Generation” (RAG), allowing more bespoke LLM use cases.

During Microsoft’s Ignite conference today, NVIDIA announced an update to TensorRT-LLM, which launched in October. There are two main announcements: TensorRT-LLM is gaining support for LLM APIs, specifically OpenAI’s Chat API, the most widely used at this point, and NVIDIA has improved TensorRT-LLM’s per-token performance on its GPUs.
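For readers who want a sense of what Chat API compatibility means in practice, here is a minimal sketch: the standard OpenAI Python client is pointed at a local endpoint instead of OpenAI’s servers, so existing Chat API code keeps working against a model running on the GPU. The base URL, port, and model name below are illustrative assumptions, not NVIDIA’s documented defaults.

```python
# Minimal sketch: reuse the standard OpenAI client against a local,
# OpenAI-compatible endpoint. The URL and model name are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local TensorRT-LLM wrapper
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-llm",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, applications written for the cloud Chat API can switch to local inference without restructuring their code.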

There is also a third, quite interesting announcement: NVIDIA is including Retrieval-Augmented Generation (RAG) with TensorRT-LLM. RAG lets an LLM draw its knowledge from an external, local data source rather than relying on anything online, a highly requested AI feature.
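To make the RAG pattern concrete, here is a minimal, framework-agnostic sketch: embed a set of local documents, retrieve the one most similar to a query, and prepend it to the prompt before sending it to the model. The embed() function is a placeholder for a real embedding model, and nothing here reflects NVIDIA’s specific implementation.

```python
# Minimal RAG sketch (illustrative only; not NVIDIA's implementation).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

documents = [
    "Q3 sales rose 12% year over year.",
    "The device requires at least 8GB of VRAM.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str) -> str:
    """Return the document whose embedding is most similar to the query."""
    q = embed(query)
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return documents[int(np.argmax(scores))]

query = "How much VRAM does it need?"
context = retrieve(query)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the local LLM, e.g. via the Chat API above.
print(prompt)
```

Because both retrieval and generation happen on the local machine, the documents being searched never leave it.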

What is TensorRT-LLM?

TensorRT-LLM is NVIDIA’s open-source library for accelerating large language model inference on its GPUs, including consumer RTX cards. Launched in October, it compiles models into optimized TensorRT engines so that queries run faster and entirely on local hardware.
