Aravind Sundaresan — Infrastructure Engineer
7+ years building distributed systems and platform infrastructure. Microsoft R&D India (Software Engineer II, 2021–Present). Ex-Amazon (Software Engineer — Test & Device Infrastructure, 2017–2023). OMG Labs (Technical Lead, 2016–2017). Based in Hyderabad, open to relocation.
Work Experience
Microsoft R&D India — Software Engineer II (2021–Present): Distributed metadata platform across 17,000+ microservices. Redis adaptive concurrency control. Azure BCDR active-passive. ARM64 Windows Validation Pipeline. RPA automation reducing UI defects ~40%.
Amazon — Software Engineer, Test & Device Infrastructure (2017–2023): Java scheduler for 50+ Alexa physical devices 24/7. ADB log diagnostic tooling adopted by 7+ teams. 98%+ platform uptime. 1,000+ QA hours saved per year.
OMG Labs — Technical Lead (2016–2017): Subscription-based customer acquisition models. Managed e-commerce tools and end-to-end strategy. Amazon Launchpad startup selection.
Open-Source Projects
ServiceScope: AI-native blast-radius analysis for Python microservices using AST walking and local LLM inference. Python, FastAPI, Celery, PostgreSQL, Neo4j, Ollama, Redis, Docker.
ACO — Adaptive Compute Orchestrator: Predictive job scheduler with LSTM per-node predictors and ant-colony optimisation. Python, FastAPI, PyTorch, Asyncio. 202 tests passing.
Clairvoyant — Predictive SJF Scheduler for LLM Inference
Eliminates Head-of-Line Blocking in serial LLM backends via ML-driven Shortest-Job-First scheduling.
Serial inference backends (Ollama, llama.cpp) dispatch requests first-come, first-served (FCFS), so short requests queue behind long ones and short-request P50 latency degrades sharply under burst load. Clairvoyant is a Go HTTP sidecar proxy that predicts output token length in 0.029ms using an ONNX-exported XGBoost model, reorders requests via a min-heap priority queue with starvation protection, and reduces short-request P50 latency by 70–76% on RTX 4090 and 68.1% on Apple M1. Validated across 7 public LLM datasets with 62–96% ranking accuracy.
- Metrics: 0.029ms prediction latency | 70–76% P50 reduction (RTX 4090) | 68.1% P50 reduction (Apple M1) | 62–96% ranking accuracy
- Stack: Go, Python, XGBoost, ONNX Runtime, vLLM, OpenAI-compatible, Kubernetes
- Open Source: Contributor to the vLLM v1 core scheduler: PR #41952 (preemption ordering) · PR #44773 (per-request preemption histogram)
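The scheduling idea described above (shortest predicted job first, with a guard against starvation) can be illustrated with a minimal sketch. This is written in Python for brevity and is not Clairvoyant's Go implementation; the names `Request`, `SJFQueue`, and `max_wait`, and the choice of a wait-time threshold as the starvation rule, are all assumptions made for illustration.

```python
import heapq
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    req_id: str            # assumed unique per request
    predicted_tokens: int  # hypothetical: output length from the predictor
    arrival: float         # seconds, from any monotonic clock


class SJFQueue:
    """Shortest-job-first queue; a request waiting >= max_wait jumps ahead."""

    def __init__(self, max_wait: float):
        self.max_wait = max_wait
        self._heap = []             # (predicted_tokens, seq, Request)
        self._fifo = deque()        # same requests in arrival order
        self._done = set()          # ids dispatched via the *other* structure
        self._seq = itertools.count()  # tie-breaker so Requests are never compared

    def push(self, req: Request) -> None:
        heapq.heappush(self._heap, (req.predicted_tokens, next(self._seq), req))
        self._fifo.append(req)

    def pop(self, now: float):
        # Discard FIFO entries that were already dispatched from the heap.
        while self._fifo and self._fifo[0].req_id in self._done:
            self._done.discard(self._fifo.popleft().req_id)
        if not self._fifo:
            return None
        oldest = self._fifo[0]
        # Starvation protection: serve the oldest request once it has waited too long.
        if now - oldest.arrival >= self.max_wait:
            self._fifo.popleft()
            self._done.add(oldest.req_id)  # its heap entry is skipped lazily later
            return oldest
        # Otherwise serve the request with the smallest predicted length.
        while self._heap:
            _, _, req = heapq.heappop(self._heap)
            if req.req_id in self._done:
                self._done.discard(req.req_id)  # stale entry, already served
                continue
            self._done.add(req.req_id)  # its FIFO entry is skipped lazily later
            return req
        return None


if __name__ == "__main__":
    q = SJFQueue(max_wait=5.0)
    q.push(Request("long", 900, arrival=0.0))
    q.push(Request("short", 20, arrival=1.0))
    print(q.pop(now=2.0).req_id)  # the short request overtakes the long one
    print(q.pop(now=2.5).req_id)
```

Keeping a FIFO view next to the heap makes the starvation check O(1) per dispatch; other designs, such as aging the priority key over time, would serve the same purpose.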
Links: arXiv:2606.07248 | GitHub | HuggingFace | Edge Pipeline | Blog
Core Skills
Distributed Systems · Event-Driven Architecture · Concurrency Control · Traffic Shaping · Observability · CI/CD Platforms · Device Orchestration · Platform Reliability · LLM Inference Infrastructure · C# · Java · Python · Go · Azure (Service Bus, Cosmos DB, Functions, ADX) · AWS (EC2, Lambda) · Redis · PostgreSQL · Neo4j · FastAPI · Celery · Docker · Alembic
Contact
Email: aravindsundaresan099@gmail.com · LinkedIn: linkedin.com/in/aravind-sundaresan · GitHub: github.com/Aravind0403 · Substack: aravindsundaresan.substack.com