Dec 17, 2025

Remotely configure a fully local RAG system on an NVIDIA DGX Spark and Mac Studio

Job / Advertisement Description

We need to configure a fully local Retrieval-Augmented Generation (RAG) system on an NVIDIA DGX Spark and a Mac Studio, without using cloud services. This includes deploying local LLMs, such as Mistral or Llama, on GPU/ARM platforms; integrating internal data, such as PDF documents and knowledge bases, as well as external APIs; supporting local vector storage; and providing chat functionality with the ability to retrain on user data. Your task is to select and configure an appropriate technology stack, such as Ollama/LM Studio or TensorRT-LLM with LangChain/LlamaIndex, and to ensure the system runs smoothly on Linux and macOS.

This assignment requires experience working with local LLMs and deploying them to GPU and ARM platforms, experience with vector databases and tools such as LangChain or LlamaIndex, and DevOps/MLOps skills for maintaining the system on Linux and macOS.
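The retrieval side of the system described above can be sketched in a few lines. This is a toy illustration, not the implementation: the bag-of-words "embedding" and in-memory `VectorStore` class are stand-ins I introduce here for a real local embedding model (for example one served by Ollama) and a real local vector database; the sample documents are hypothetical.

```python
# Toy sketch of a local RAG retrieval loop.
# embed() is a stand-in for a real local embedding model;
# VectorStore is a stand-in for a real local vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase term-frequency counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text: str):
        self.docs.append((text, embed(text)))

    def top_k(self, query: str, k: int = 2):
        # Rank stored documents by cosine similarity to the query.
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("Ollama serves local LLMs such as Mistral and Llama.")
store.add("Chroma and LanceDB are examples of local vector databases.")
store.add("The cafeteria opens at nine.")

context = store.top_k("Which local LLMs can Ollama run?")
# In the real system, this prompt would be sent to a locally served
# model (e.g. via Ollama's HTTP API) rather than printed.
prompt = "Answer using this context:\n" + "\n".join(context)
print(context[0])
```

In the production stack, the same three steps remain: embed documents with a local model, store the vectors locally, and retrieve the top matches to build the prompt for the locally served LLM.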