Project Overview:
We are looking for an experienced Machine Learning Engineer specializing in audio processing and deep learning. The goal is to design, train, and deploy a high-performance AMD (Answering Machine Detection) model for telephony, using an existing dataset of approximately 67,000 labeled audio samples. The model must operate in real time with low latency and integrate into our existing calling infrastructure (Drachtio / Asterisk / FreeSWITCH / Vicidial).
Mission Responsibilities:
Analyze and preprocess the existing dataset (cleaning, balancing, train/val/test split; see the split sketch after this list)
Extract and normalize audio features such as Mel spectrograms, MFCCs, and STFT (see the feature-extraction sketch after this list)
Design and train a CNN/CRNN model for AMD classification (Human / Voicemail / Silence / Fax / Other if needed), as sketched below
Optimize the model for real-time inference (see the export sketch after this list)
Build a reproducible training pipeline (PyTorch or TensorFlow)
Evaluate the model with solid metrics (per-class F1 score, precision/recall, confusion matrix), as in the evaluation sketch below
Test and validate performance using real call samples and streaming input
Develop an inference microservice (REST/gRPC/WebSocket) for integration with backend systems; a REST sketch follows this list
Integrate the solution into our current telephony infrastructure (Drachtio / Asterisk / FreeSWITCH / Vicidial)
Provide technical documentation and handover
Optional: Implement a continuous training pipeline for incremental model improvement
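The sketches below illustrate several of the steps above; all file names, parameter values, and helper names are assumptions for illustration, not project specifications.

A minimal sketch of the train/val/test split, assuming the dataset is described by a pandas manifest CSV with hypothetical path and label columns:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    manifest = pd.read_csv("amd_manifest.csv")   # hypothetical manifest of audio paths + labels
    train_df, holdout_df = train_test_split(
        manifest, test_size=0.2, stratify=manifest["label"], random_state=42)
    val_df, test_df = train_test_split(
        holdout_df, test_size=0.5, stratify=holdout_df["label"], random_state=42)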
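A minimal feature-extraction sketch with torchaudio, assuming 8 kHz mono telephony audio; n_fft, hop_length, and n_mels are illustrative values, not tuned settings:

    import torch
    import torchaudio

    SAMPLE_RATE = 8000                           # typical telephony rate; adjust to the dataset
    _mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE, n_fft=512, hop_length=128, n_mels=64)
    _to_db = torchaudio.transforms.AmplitudeToDB()

    def waveform_to_features(waveform: torch.Tensor, sr: int) -> torch.Tensor:
        """Log-Mel features, normalized per utterance (zero mean, unit variance)."""
        if sr != SAMPLE_RATE:
            waveform = torchaudio.functional.resample(waveform, sr, SAMPLE_RATE)
        mel = _to_db(_mel(waveform))             # (channels, n_mels, frames)
        return (mel - mel.mean()) / (mel.std() + 1e-6)

    def file_to_features(path: str) -> torch.Tensor:
        return waveform_to_features(*torchaudio.load(path))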
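A minimal CRNN sketch in PyTorch for the class set above (Human / Voicemail / Silence / Fax / Other); layer sizes are illustrative and would need tuning on the real dataset:

    import torch
    import torch.nn as nn

    class AMDCrnn(nn.Module):
        def __init__(self, n_mels: int = 64, n_classes: int = 5):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=128,
                              batch_first=True, bidirectional=True)
            self.fc = nn.Linear(256, n_classes)

        def forward(self, x):                        # x: (batch, 1, n_mels, frames)
            f = self.cnn(x)                          # (batch, 64, n_mels/4, frames/4)
            f = f.permute(0, 3, 1, 2).flatten(2)     # (batch, frames/4, 64 * n_mels/4)
            out, _ = self.gru(f)
            return self.fc(out[:, -1])               # class logits from the last time step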
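One common route to low-latency real-time inference is exporting the trained network to TorchScript or ONNX; a sketch assuming the model is a trained AMDCrnn and the input shape matches the feature step above:

    import torch
    import torch.nn as nn

    def export_for_realtime(model: nn.Module, n_mels: int = 64, frames: int = 200) -> None:
        """Trace to TorchScript and export ONNX; file names are illustrative."""
        model.eval()
        example = torch.randn(1, 1, n_mels, frames)          # dummy input matching the feature shape
        torch.jit.trace(model, example).save("amd_model.ts")
        torch.onnx.export(model, example, "amd_model.onnx",  # alternative: serve with ONNX Runtime
                          opset_version=17,
                          input_names=["features"], output_names=["logits"])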
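A sketch of the evaluation step with scikit-learn, assuming y_true and y_pred are integer label arrays produced on the held-out test split:

    from sklearn.metrics import classification_report, confusion_matrix

    LABELS = ["Human", "Voicemail", "Silence", "Fax", "Other"]

    def evaluate(y_true, y_pred) -> None:
        # Per-class precision/recall/F1 plus the raw confusion matrix
        print(classification_report(y_true, y_pred, target_names=LABELS, digits=3))
        print(confusion_matrix(y_true, y_pred))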
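A minimal REST inference sketch with FastAPI; the TorchScript file name, label list, and the waveform_to_features helper (from the feature sketch above) are assumptions for illustration:

    import io

    import torch
    import torchaudio
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    LABELS = ["Human", "Voicemail", "Silence", "Fax", "Other"]
    model = torch.jit.load("amd_model.ts").eval()            # hypothetical TorchScript artifact

    @app.post("/classify")
    async def classify(audio: UploadFile):
        # Decode the uploaded audio (recent torchaudio accepts file-like objects)
        waveform, sr = torchaudio.load(io.BytesIO(await audio.read()))
        features = waveform_to_features(waveform, sr)        # helper from the feature sketch, assumed mono
        with torch.no_grad():
            probs = torch.softmax(model(features.unsqueeze(0)), dim=-1)[0]
        return {"label": LABELS[int(probs.argmax())], "confidence": float(probs.max())}

Such a service could be run locally with, for example, uvicorn amd_service:app (module name illustrative); a gRPC or WebSocket variant for streaming input would wrap the same model call.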
Required Skills:
Strong experience in deep learning applied to audio
Knowledge of CNN/CRNN architectures for classification
Solid background in speech/audio signal processing
Expertise in PyTorch or TensorFlow
Experience generating audio features (Mel spectrograms, MFCCs, STFT)
Ability to optimize models for real-time inference on GPU/CPU
Experience deploying ML models in production (Docker, APIs, inference servers)
Experience with streaming audio or RTP real-time processing
English or French communication skills
Preferred (not mandatory, but a strong advantage):
Previous AMD or call-center related project experience
Knowledge of Asterisk, FreeSWITCH, Drachtio, WebRTC
Experience with Whisper, Vosk, Wav2Vec, or similar speech models
Experience with large-scale deployment and optimization
Deliverables:
Fully documented and clean source code
Trained AMD model with evaluation report
Production-ready inference service (API)
Performance metrics and confusion matrix
Documentation for deployment and usage