Project Overview:
We are looking for an experienced Machine Learning Engineer specializing in audio processing and deep learning. The goal is to design, train, and deploy a high-performance AMD (Answering Machine Detection) model for telephony, using an existing dataset of approximately 67,000 labeled audio samples. The model must operate in real time with low latency and integrate into our existing calling infrastructure (Drachtio / Asterisk / FreeSWITCH / Vicidial).
Mission Responsibilities:
Analyze and preprocess the existing dataset (cleaning, balancing, train/val/test split; see the split sketch after this list)
Extract and normalize audio features such as Mel spectrograms, MFCCs, and STFT (see the feature-extraction sketch after this list)
Design and train a CNN/CRNN model for AMD classification (Human / Voicemail / Silence / Fax / Other if needed), as sketched below
Optimize the model for real-time inference (see the export sketch after this list)
Build a reproducible training pipeline (PyTorch or TensorFlow)
Evaluate the model with solid metrics (per-class F1 score, precision/recall, confusion matrix), as in the evaluation sketch below
Test and validate performance using real call samples and streaming input
Develop an inference microservice (REST/gRPC/WebSocket) for integration with backend systems; a REST sketch follows this list
Integrate the solution into our current telephony infrastructure (Drachtio / Asterisk / FreeSWITCH / Vicidial)
Provide technical documentation and handover
Optional: Implement a continuous training pipeline for incremental model improvement
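The sketches below illustrate several of the steps above; all file names, parameter values, and helper names are assumptions for illustration, not project specifications.

A minimal sketch of the train/val/test split, assuming the dataset is described by a pandas manifest CSV with hypothetical path and label columns:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    manifest = pd.read_csv("amd_manifest.csv")   # hypothetical manifest of audio paths + labels
    train_df, holdout_df = train_test_split(
        manifest, test_size=0.2, stratify=manifest["label"], random_state=42)
    val_df, test_df = train_test_split(
        holdout_df, test_size=0.5, stratify=holdout_df["label"], random_state=42)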
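A minimal feature-extraction sketch with torchaudio, assuming 8 kHz mono telephony audio; n_fft, hop_length, and n_mels are illustrative values, not tuned settings:

    import torch
    import torchaudio

    SAMPLE_RATE = 8000                           # typical telephony rate; adjust to the dataset
    _mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE, n_fft=512, hop_length=128, n_mels=64)
    _to_db = torchaudio.transforms.AmplitudeToDB()

    def waveform_to_features(waveform: torch.Tensor, sr: int) -> torch.Tensor:
        """Log-Mel features, normalized per utterance (zero mean, unit variance)."""
        if sr != SAMPLE_RATE:
            waveform = torchaudio.functional.resample(waveform, sr, SAMPLE_RATE)
        mel = _to_db(_mel(waveform))             # (channels, n_mels, frames)
        return (mel - mel.mean()) / (mel.std() + 1e-6)

    def file_to_features(path: str) -> torch.Tensor:
        return waveform_to_features(*torchaudio.load(path))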
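A minimal CRNN sketch in PyTorch for the class set above (Human / Voicemail / Silence / Fax / Other); layer sizes are illustrative and would need tuning on the real dataset:

    import torch
    import torch.nn as nn

    class AMDCrnn(nn.Module):
        def __init__(self, n_mels: int = 64, n_classes: int = 5):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=128,
                              batch_first=True, bidirectional=True)
            self.fc = nn.Linear(256, n_classes)

        def forward(self, x):                        # x: (batch, 1, n_mels, frames)
            f = self.cnn(x)                          # (batch, 64, n_mels/4, frames/4)
            f = f.permute(0, 3, 1, 2).flatten(2)     # (batch, frames/4, 64 * n_mels/4)
            out, _ = self.gru(f)
            return self.fc(out[:, -1])               # class logits from the last time step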
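One common route to low-latency real-time inference is exporting the trained network to TorchScript or ONNX; a sketch assuming the model is a trained AMDCrnn and the input shape matches the feature step above:

    import torch
    import torch.nn as nn

    def export_for_realtime(model: nn.Module, n_mels: int = 64, frames: int = 200) -> None:
        """Trace to TorchScript and export ONNX; file names are illustrative."""
        model.eval()
        example = torch.randn(1, 1, n_mels, frames)          # dummy input matching the feature shape
        torch.jit.trace(model, example).save("amd_model.ts")
        torch.onnx.export(model, example, "amd_model.onnx",  # alternative: serve with ONNX Runtime
                          opset_version=17,
                          input_names=["features"], output_names=["logits"])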
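A sketch of the evaluation step with scikit-learn, assuming y_true and y_pred are integer label arrays produced on the held-out test split:

    from sklearn.metrics import classification_report, confusion_matrix

    LABELS = ["Human", "Voicemail", "Silence", "Fax", "Other"]

    def evaluate(y_true, y_pred) -> None:
        # Per-class precision/recall/F1 plus the raw confusion matrix
        print(classification_report(y_true, y_pred, target_names=LABELS, digits=3))
        print(confusion_matrix(y_true, y_pred))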
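A minimal REST inference sketch with FastAPI; the TorchScript file name, label list, and the waveform_to_features helper (from the feature sketch above) are assumptions for illustration:

    import io

    import torch
    import torchaudio
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    LABELS = ["Human", "Voicemail", "Silence", "Fax", "Other"]
    model = torch.jit.load("amd_model.ts").eval()            # hypothetical TorchScript artifact

    @app.post("/classify")
    async def classify(audio: UploadFile):
        # Decode the uploaded audio (recent torchaudio accepts file-like objects)
        waveform, sr = torchaudio.load(io.BytesIO(await audio.read()))
        features = waveform_to_features(waveform, sr)        # helper from the feature sketch, assumed mono
        with torch.no_grad():
            probs = torch.softmax(model(features.unsqueeze(0)), dim=-1)[0]
        return {"label": LABELS[int(probs.argmax())], "confidence": float(probs.max())}

Such a service could be run locally with, for example, uvicorn amd_service:app (module name illustrative); a gRPC or WebSocket variant for streaming input would wrap the same model call.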
Required Skills:
Strong experience in deep learning applied to audio
Knowledge of CNN/CRNN architectures for classification
Solid background in speech/audio signal processing
Expertise in PyTorch or TensorFlow
Experience generating audio features (Mel spectrograms, MFCCs, STFT)
Ability to optimize models for real-time inference on GPU/CPU
Experience deploying ML models in production (Docker, APIs, inference servers)
Experience with streaming audio or RTP real-time processing
English or French communication skills
Preferred (not mandatory, but a strong advantage):
Previous AMD or call-center related project experience
Knowledge of Asterisk, FreeSWITCH, Drachtio, WebRTC
Experience with Whisper, Vosk, Wav2Vec, or similar speech models
Experience with large-scale deployment and optimization
Deliverables:
Fully documented and clean source code
Trained AMD model with evaluation report
Production-ready inference service (API)
Performance metrics and confusion matrix
Documentation for deployment and usage