Appendix D: Career Development
This appendix combines two essential resources for your AI engineering career: Learning Paths to guide your study through the textbook based on your background and goals, and Interview Preparation to help you succeed in AI/ML engineering interviews at all levels.
Part I: Learning Paths
This section provides guided reading orders through the textbook based on your background and goals. Each path includes the recommended sequence, time estimates, and key focus areas.
How to Use These Paths
- Choose your path based on current role and goal
- Follow the sequence - chapters build on each other
- Do the exercises - reading alone isn’t enough
- Build something - apply knowledge to a real project
- Revisit as needed - use as reference later
Path 1: New to AI/ML (SWE Background)
For: Software engineers with no ML experience
Time: 4-6 weeks (10-15 hours/week)
Goal: Foundational understanding to contribute to AI projects
Week 1: Understanding LLMs
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: FOUNDATIONS │
└─────────────────────────────────────────────────────────────────────────────┘
Day 1-2: Chapter 5 - LLM/NLP Foundations
Focus: Tokenization, embeddings, attention mechanism
Skip: Mathematical details (return later if needed)
Day 3-4: Chapter 6 - Prompt Engineering
Focus: Structured outputs, few-shot learning
Exercise: Try 10 different prompting patterns on a real task
Day 5: Appendix A - Glossary
Review all terms, note unfamiliar ones
Key Concepts to Master:
- How tokenization works
- What embeddings represent
- Why attention matters
- Basic prompt patterns
Week 2: Building RAG Systems
Day 1-3: Chapter 7 - RAG Systems Deep Dive
Focus: Chunking, embeddings, basic retrieval
Skip: GraphRAG, advanced patterns (return to these after completing this path)
Day 4-5: Hands-on Practice
Build a simple RAG system with:
- 100 documents
- Basic chunking
- Vector search
- Simple prompt template
Key Concepts to Master:
- Chunking strategies
- Vector similarity search
- Context assembly
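To make these three concepts concrete, here is a minimal sketch of the Week 2 exercise that uses only the Python standard library. It makes several simplifying assumptions:

- A toy bag-of-words vector stands in for a real embedding model.
- Chunks are word-based and overlap.
- Search is a brute-force cosine scan.
- Context goes into a simple prompt template.

Every function name here (`chunk_text`, `embed`, `search`, `assemble_prompt`) is hypothetical. In a real system you would replace `embed` with an embedding model and the list scan with a vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents: list[str], chunk_size: int = 50, overlap: int = 10):
    """Return a list of (doc_id, chunk, vector) entries."""
    index = []
    for doc_id, doc in enumerate(documents):
        for chunk in chunk_text(doc, chunk_size, overlap):
            index.append((doc_id, chunk, embed(chunk)))
    return index


def search(index, query: str, k: int = 3):
    """Brute-force vector search: score every chunk, keep the top k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), doc_id, chunk) for doc_id, chunk, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def assemble_prompt(index, question: str, k: int = 3) -> str:
    """Context assembly: join retrieved chunks, tagged with their source, into the prompt."""
    hits = search(index, question, k)
    context = "\n---\n".join(f"[doc {doc_id}] {chunk}" for _, doc_id, chunk in hits)
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [
    "Refunds are processed within 5 business days after the returned item arrives.",
    "Our office is closed on public holidays. Support is available by email.",
]
index = build_index(docs, chunk_size=20, overlap=5)
prompt = assemble_prompt(index, "How long do refunds take?", k=1)
```

Overlap is there so that a sentence straddling a chunk boundary can still be retrieved whole from at least one chunk. Chunk size trades retrieval precision against how much context each hit carries.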
Week 3: Deployment Basics
Day 1-2: Chapter 9 - LLM Deployment
Focus: API integration, basic serving concepts
Skip: GPU optimization (return to it later)
Day 3-4: Chapter 14 - Backend Engineering for AI
Focus: Testing, debugging LLM apps
Exercise: Write tests for your Week 2 RAG system
Day 5: Review and consolidate
Complete practical exercises
Week 4: Evaluation and Quality
Day 1-3: Chapter 15 - MLOps & Evaluation
Focus: Basic metrics, simple evaluation pipelines
Skip: Complex A/B testing (return when needed)
Day 4-5: Build an evaluation pipeline for your RAG system
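A useful first stage for this pipeline is measuring retrieval on its own, before judging the generated answers. The sketch below assumes two things:

- A handful of hand-labeled (query, relevant document ID) pairs.
- A `retrieve(query, k)` function that returns ranked document IDs.

The fake retriever and the IDs are illustrative only. Hit rate@k asks whether the right document appears at all. MRR@k also rewards ranking it near the top.

```python
from typing import Callable


def evaluate_retrieval(retrieve: Callable[[str, int], list[str]],
                       labeled_queries: list[tuple[str, str]], k: int = 5) -> dict[str, float]:
    """Hit rate@k and MRR@k over (query, relevant_doc_id) pairs.

    `retrieve(query, k)` must return document IDs ranked best-first.
    """
    if not labeled_queries:
        raise ValueError("need at least one labeled query")
    hits, reciprocal_rank_sum = 0, 0.0
    for query, relevant_id in labeled_queries:
        results = retrieve(query, k)[:k]
        if relevant_id in results:
            hits += 1
            reciprocal_rank_sum += 1.0 / (results.index(relevant_id) + 1)
    n = len(labeled_queries)
    return {f"hit_rate@{k}": hits / n, f"mrr@{k}": reciprocal_rank_sum / n}


# Illustrative stand-in for a real retriever (e.g. the vector search from Week 2).
FAKE_RESULTS = {
    "refund policy": ["doc_refunds", "doc_shipping"],
    "office hours": ["doc_shipping", "doc_hours"],
    "warranty length": ["doc_shipping", "doc_refunds"],
}


def fake_retrieve(query: str, k: int) -> list[str]:
    return FAKE_RESULTS[query][:k]


labels = [
    ("refund policy", "doc_refunds"),     # found at rank 1
    ("office hours", "doc_hours"),        # found at rank 2
    ("warranty length", "doc_warranty"),  # missed
]
metrics = evaluate_retrieval(fake_retrieve, labels, k=2)
# hit_rate@2 = 2/3, mrr@2 = (1 + 1/2 + 0) / 3 = 0.5
```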
Week 5: Expanding Knowledge
Day 1-2: Chapter 8 - Agentic Systems
Focus: Tool use, basic agent patterns
Day 3-4: Chapter 16 - Security
Focus: Prompt injection basics
Day 5: Part II of this appendix (Interview Prep - Entry to Mid-Level section)
Week 6: Integration Project
Build a complete application using what you’ve learned:
- Document Q&A system with evaluation
- Include basic security measures
- Write comprehensive tests
Path 1 Completion Checklist
□ Can explain how an LLM generates text
□ Can write effective prompts for different tasks
□ Can build a basic RAG system
□ Can evaluate RAG quality
□ Understand basic security concerns
□ Built at least one working application
Path 2: Backend Engineer Adding AI
For: Experienced backend engineers integrating AI into existing systems
Time: 3-4 weeks (15-20 hours/week)
Goal: Productionize AI features in backend services
Week 1: Practical LLM Integration
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: INTEGRATION PATTERNS │
└─────────────────────────────────────────────────────────────────────────────┘
Day 1: Chapter 6 - Prompt Engineering
Focus: Structured outputs, caching strategies
Day 2-3: Chapter 9 - LLM Deployment
Focus: API integration, rate limiting, error handling
Exercise: Build a robust LLM client with retries
Day 4-5: Chapter 14 - Backend Engineering for AI
Focus: Integration patterns, testing strategies
Key Patterns to Implement:
- Retry with exponential backoff
- Response caching
- Structured output parsing
- Graceful degradation
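Here is a minimal standard-library sketch of three of these patterns:

- Retries with capped exponential backoff and jitter.
- Response caching.
- A fallback response for graceful degradation.

`TransientLLMError`, `CachedLLMClient`, and the fake model functions are hypothetical names. A real client would map the provider SDK's rate-limit and timeout exceptions onto the retryable case. It would also typically cache only deterministic requests (for example, temperature 0).

```python
import hashlib
import random
import time
from typing import Callable


class TransientLLMError(Exception):
    """Hypothetical stand-in for retryable failures (rate limits, timeouts, 5xx)."""


def call_with_retries(call: Callable[[], str], max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry `call` on transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientLLMError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide how to degrade
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out retries from many clients
    raise AssertionError("unreachable")


class CachedLLMClient:
    def __init__(self, call_model: Callable[[str], str], fallback: str,
                 sleep: Callable[[float], None] = time.sleep):
        self._call_model = call_model
        self._fallback = fallback
        self._sleep = sleep
        self._cache: dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        try:
            result = call_with_retries(lambda: self._call_model(prompt), sleep=self._sleep)
        except TransientLLMError:
            return self._fallback  # graceful degradation; fallbacks are not cached
        self._cache[key] = result
        return result


# Usage with fake models: one fails twice before succeeding, one is always down.
calls = {"count": 0}


def flaky_model(prompt: str) -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientLLMError("rate limited")
    return f"echo: {prompt}"


def always_down(prompt: str) -> str:
    raise TransientLLMError("service unavailable")


def no_wait(seconds: float) -> None:
    """Skip real sleeps in this demo."""


client = CachedLLMClient(flaky_model, fallback="Please try again later.", sleep=no_wait)
first = client.complete("hello")   # succeeds on the third attempt
second = client.complete("hello")  # cache hit: flaky_model is not called again
degraded = CachedLLMClient(always_down, "Please try again later.", sleep=no_wait).complete("hello")
```

Structured output parsing, the remaining pattern, would sit between the model call and the cache: validate the response before storing it.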
Week 2: RAG for Production
Day 1-2: Chapter 7 - RAG Systems (focus on production sections)
Focus: Vector databases, scaling, caching
Day 3-4: Chapter 31 - Reliability Engineering
Focus: SLOs for AI, graceful degradation
Day 5: Design a production RAG architecture
Include: caching, monitoring, fallbacks
Week 3: Security and Scale
Day 1-2: Chapter 16 - Security
Focus: Defense-in-depth, input validation
Exercise: Implement security layers
Day 3-4: Chapter 25 - System Design at Scale
Focus: AI-specific scaling patterns
Day 5: Chapter 32 - Cost Engineering
Focus: Cost optimization, attribution
Week 4: Agentic Systems
Day 1-3: Chapter 8 - Agentic Systems
Focus: Tool use, safety constraints
Exercise: Build a tool-using agent
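At its core, a tool-using agent is a loop. The model proposes either a tool call or a final answer, the application validates and executes the call, and the observation is fed back to the model. The sketch below uses a scripted stand-in for the model and hypothetical tools (`get_order_status`, `delete_account`). It applies four safety constraints, which illustrate the idea rather than provide a complete defense:

- An explicit tool registry.
- Argument validation.
- Blocking side-effecting tools unless approved.
- A step limit.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    required_args: frozenset[str]
    read_only: bool = True


class ToolRegistry:
    def __init__(self, allow_side_effects: bool = False):
        self._tools: dict[str, Tool] = {}
        self._allow_side_effects = allow_side_effects

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def execute(self, tool_call_json: str) -> str:
        """Validate a model-proposed call like {"tool": ..., "args": {...}} before running it."""
        try:
            call = json.loads(tool_call_json)
        except json.JSONDecodeError:
            return "error: tool call is not valid JSON"
        if (not isinstance(call, dict) or not isinstance(call.get("tool"), str)
                or not isinstance(call.get("args", {}), dict)):
            return "error: malformed tool call"
        tool = self._tools.get(call["tool"])
        if tool is None:
            return f"error: unknown tool {call['tool']!r}"  # only registered tools can run
        args = call.get("args", {})
        if set(args) != tool.required_args:
            return f"error: {tool.name} expects arguments {sorted(tool.required_args)}"
        if not tool.read_only and not self._allow_side_effects:
            return f"error: {tool.name} has side effects and needs human approval"
        return str(tool.func(**args))


def run_agent(model_step, registry: ToolRegistry, user_message: str, max_steps: int = 5) -> str:
    transcript = [f"user: {user_message}"]
    for _ in range(max_steps):  # the step limit guards against runaway loops
        action = model_step(transcript)
        if action["type"] == "final":
            return action["content"]
        transcript.append(f"tool: {registry.execute(action['content'])}")
    return "stopped: step limit reached"


def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM: call one tool, then answer from its output."""
    if len(transcript) == 1:
        call = {"tool": "get_order_status", "args": {"order_id": "A17"}}
        return {"type": "tool", "content": json.dumps(call)}
    return {"type": "final", "content": "Order status: " + transcript[-1].removeprefix("tool: ")}


registry = ToolRegistry()
registry.register(Tool("get_order_status", lambda order_id: "shipped", frozenset({"order_id"})))
registry.register(Tool("delete_account", lambda user_id: "deleted", frozenset({"user_id"}),
                       read_only=False))
answer = run_agent(scripted_model, registry, "Where is order A17?")
blocked = registry.execute(json.dumps({"tool": "delete_account", "args": {"user_id": "u1"}}))
```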
Day 4-5: Integration project
Add AI feature to an existing backend service
Path 2 Completion Checklist
□ Can integrate LLM APIs with proper error handling
□ Can design production RAG architecture
□ Can implement AI security measures
□ Understand cost implications of AI features
□ Can build and deploy tool-using agents
□ Integrated AI into at least one backend service
Path 3: Senior to Staff Transition
For: Senior engineers preparing for Staff-level AI responsibilities
Time: 6-8 weeks (15-20 hours/week)
Goal: Lead AI system design and cross-team initiatives
Phase 1: Technical Depth (Weeks 1-3)
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: DEEP TECHNICAL KNOWLEDGE │
└─────────────────────────────────────────────────────────────────────────────┘
Week 1:
- Chapter 5: LLM Foundations (deep read, including math)
- Chapter 27: Performance Engineering (full chapter)
- Appendix C: Read 3-5 foundational papers
Week 2:
- Chapter 7: RAG Systems (including GraphRAG, advanced patterns)
- Chapter 30: Data Architecture for AI
- Exercise: Design a feature store
Week 3:
- Chapter 8: Agentic Systems (full depth)
- Chapter 16: Security (including agentic security)
- Chapter 9: Deployment (optimization focus)
Phase 2: Architecture and Scale (Weeks 4-5)
Week 4:
- Chapter 25: System Design at Scale
- Chapter 31: Reliability Engineering
- Exercise: Design a system handling 1M requests/day
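A good first step for this exercise is converting the daily volume into the rates the design must support. The sketch below assumes the following, all illustrative and all to be replaced with measured values:

- A 3x peak-to-average ratio.
- 2 seconds of end-to-end latency per request.
- 1,800 tokens per request.

On average, 1M requests/day is only about 11.6 requests/second. The design pressure therefore tends to come from peaks, concurrency, and token volume rather than from the raw request rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capacity_estimate(requests_per_day: float, peak_to_average: float = 3.0,
                      latency_s: float = 2.0, tokens_per_request: int = 1_800) -> dict[str, float]:
    """Back-of-envelope sizing; every default is an illustrative assumption."""
    average_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = average_rps * peak_to_average
    # Little's law: requests in flight = arrival rate x time each request spends in the system
    peak_in_flight = peak_rps * latency_s
    return {
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "peak_in_flight": peak_in_flight,
        "tokens_per_day": requests_per_day * tokens_per_request,
    }


estimate = capacity_estimate(1_000_000)
# average ≈ 11.6 req/s, peak ≈ 34.7 req/s, ≈ 69 requests in flight, 1.8B tokens/day
```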
Week 5:
- Chapter 26: Technical Decision Making
- Chapter 32: Cost Engineering
- Appendix G: Study ADR examples
- Exercise: Write 3 ADRs for hypothetical decisions
Phase 3: Leadership Skills (Weeks 6-7)
Week 6:
- Chapter 21: Deepening Technical Expertise
- Chapter 28: Research-to-Production
- Chapter 29: Cross-Team Technical Leadership
- Appendix C: Identify 2-3 papers to implement
Week 7:
- Chapter 23: Technical Communication
- Chapter 24: Mentorship Foundations
- Exercise: Write a design doc for an AI system
- Exercise: Create a learning plan for a junior engineer
Phase 4: Capstone (Week 8)
- Review Part II of this appendix (Interview Prep - Staff level)
- Complete one capstone project from Appendix E
- Present design to peers for feedback
Path 3 Key Deliverables
□ 3 ADRs written for AI decisions
□ 1 system design document
□ 1 design doc reviewed by peers
□ 2-3 papers summarized with implementation notes
□ 1 capstone project completed
□ Can explain tradeoffs in major AI architecture decisions
Path 4: Interview Preparation (2-Week Intensive)
For: Engineers preparing for AI/ML engineering interviews
Time: 2 weeks (25-30 hours/week)
Goal: Pass technical interviews for AI engineering roles
Week 1: Core Knowledge
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: FUNDAMENTALS │
└─────────────────────────────────────────────────────────────────────────────┘
Day 1:
Morning: Chapter 5 - LLM Foundations (3 hours)
Afternoon: Practice explaining transformers, attention
Day 2:
Morning: Chapter 7 - RAG Systems (3 hours)
Afternoon: Implement a basic RAG system from scratch
Day 3:
Morning: Chapter 9 - Deployment (2 hours)
Afternoon: Chapter 27 - Performance (2 hours, focus on inference)
Day 4:
Morning: Chapter 15 - Evaluation (2 hours)
Afternoon: Practice evaluation questions
Day 5:
Morning: Chapter 8 - Agentic Systems (3 hours)
Afternoon: Practice system design scenarios
Week 2: System Design + Practice
Day 1:
Morning: Chapter 25 - System Design at Scale
Afternoon: Practice design: "Design a RAG system for 10M documents"
Day 2:
Morning: Chapter 16 - Security
Afternoon: Practice design: "Add AI to an existing product"
Day 3:
Morning: Chapter 32 - Cost Engineering
Afternoon: Practice: Cost estimation exercises
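Most cost estimation exercises reduce to tokens per request × price per token × volume. The function below sketches that arithmetic. The prices, token counts, and cache hit rate in the example are hypothetical placeholders, not any provider's actual rates. Input and output tokens are priced separately because many APIs bill them at different rates.

```python
def monthly_llm_cost(requests_per_day: float, input_tokens: int, output_tokens: int,
                     input_price_per_million: float, output_price_per_million: float,
                     cache_hit_rate: float = 0.0, days: int = 30) -> float:
    """Estimate monthly API spend; requests served from cache are treated as free."""
    cost_per_request = (input_tokens * input_price_per_million
                        + output_tokens * output_price_per_million) / 1_000_000
    billable_requests = requests_per_day * days * (1 - cache_hit_rate)
    return billable_requests * cost_per_request


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
baseline = monthly_llm_cost(100_000, 1_500, 300, 3.0, 15.0)
with_cache = monthly_llm_cost(100_000, 1_500, 300, 3.0, 15.0, cache_hit_rate=0.3)
# baseline ≈ $27,000/month; a 30% cache hit rate brings it to ≈ $18,900/month
```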
Day 4:
Full day: Part II of this appendix (Interview Preparation)
Do all practice questions
Day 5:
Mock interviews with peers
Review weak areas
Interview Topics Checklist
Fundamentals:
□ Explain transformers and attention
□ Compare embedding models
□ Describe tokenization approaches
RAG:
□ Design a RAG pipeline
□ Discuss chunking tradeoffs
□ Explain hybrid search
□ Handle evaluation
Deployment:
□ Compare vLLM vs TGI
□ Explain batching strategies
□ Discuss quantization tradeoffs
System Design:
□ Design AI search system
□ Design AI chatbot at scale
□ Design agent with tools
□ Handle failure scenarios
Evaluation:
□ Design evaluation pipeline
□ Discuss LLM-as-judge
□ Explain A/B testing for AI
Security:
□ Explain prompt injection
□ Design defense-in-depth
□ Agent safety patterns
Path 5: ML Engineer Deepening LLM Knowledge
For: ML engineers with classical ML background, new to LLMs
Time: 3-4 weeks (15-20 hours/week)
Goal: Bridge from traditional ML to LLM engineering
Week 1: LLM-Specific Concepts
Day 1-2: Chapter 5 - LLM Foundations
Compare to your existing ML knowledge
Focus: What's different from traditional ML?
Day 3-4: Chapter 6 - Prompt Engineering
This is new - no training loop equivalent
Day 5: Chapter 14 - Backend Engineering
Focus: Fine-tuning section
Compare: How is this different from traditional training?
Week 2: Retrieval and Data
Day 1-2: Chapter 7 - RAG Systems
Your embedding/similarity search knowledge transfers!
Day 3-4: Chapter 30 - Data Architecture
Focus: Training-serving skew (familiar concept, LLM specifics)
Day 5: Chapter 15 - MLOps & Evaluation
Focus: LLM-specific evaluation challenges
Week 3: Production Systems
Day 1-2: Chapter 9 - Deployment
Focus: How LLM serving differs from traditional model serving
Day 3-4: Chapter 27 - Performance Engineering
Your GPU knowledge helps! Focus on LLM-specific optimizations
Day 5: Chapter 8 - Agentic Systems
New paradigm - models as reasoning engines
Week 4: Advanced Topics
Day 1-2: Chapter 25 - System Design
Apply your architecture knowledge to LLM systems
Day 3-4: Chapter 16 - Security
New attack vectors specific to LLMs
Day 5: Review and gap-fill
Key Differences to Understand
| Traditional ML | LLM Engineering |
|---|---|
| Training is central | Prompting is central |
| Feature engineering | Context engineering |
| Model versioning | Prompt versioning |
| Batch inference common | Real-time generation |
| Training data curation | RAG corpus curation |
| Metrics well-defined | Metrics often subjective |
Path 6: Tech Lead Establishing AI Practice
For: Tech leads setting up AI capabilities for their team
Time: 4 weeks (10-15 hours/week)
Goal: Establish team processes and make sound technical decisions
Week 1: Foundation Knowledge
Day 1: Chapter 5 - LLM Foundations (skim for concepts)
Day 2: Chapter 6 - Prompt Engineering (practical focus)
Day 3: Chapter 7 - RAG Systems (architecture focus)
Day 4: Chapter 9 - Deployment (infrastructure decisions)
Day 5: Appendix B - Tools overview
Week 2: Making Decisions
Day 1-2: Chapter 26 - Technical Decision Making
Study ADR process
Day 3: Appendix G - Architecture Decision Records
Review all example ADRs
Day 4-5: Write ADRs for your team's AI decisions:
- Which LLM to use
- Self-hosted vs API
- Vector database selection
Week 3: Process and Quality
Day 1-2: Chapter 15 - MLOps & Evaluation
Focus: What processes does your team need?
Day 3: Chapter 16 - Security
Focus: What policies to establish?
Day 4-5: Chapter 20 - Responsible AI
Focus: Governance processes
Week 4: Team Development
Day 1: Chapter 24 - Mentorship
How to grow AI skills on your team
Day 2: Chapter 23 - Technical Communication
Standards for design docs
Day 3: Part I of this appendix (Learning Paths)
Create learning paths for team members
Day 4-5: Create team documentation:
- AI coding standards
- Review checklist
- Learning resources
Deliverables for Your Team
□ Technology decisions documented (ADRs)
□ Security guidelines established
□ Evaluation standards defined
□ Learning paths for team members
□ Design doc template for AI features
□ Code review checklist for AI code
Quick Reference: Chapter by Topic
By Role/Level
| Topic | AI Engineer II | Senior | Staff+ |
|---|---|---|---|
| LLM Basics | Ch 5, 6 | Ch 5, 6 | Ch 5 (deep) |
| RAG | Ch 7 (basics) | Ch 7 (full) | Ch 7 + GraphRAG |
| Agents | Ch 8 (basics) | Ch 8 (full) | Ch 8 + security |
| Deployment | Ch 9 (API) | Ch 9 (full) | Ch 9 + Ch 27 |
| Evaluation | Ch 15 (basics) | Ch 15 (full) | Ch 15 + custom |
| Architecture | — | Ch 25 (read) | Ch 25, 26, 30 |
| Leadership | — | Ch 22, 23 | Ch 29 |
By Topic
| Topic | Primary Chapters | Supporting |
|---|---|---|
| Building RAG | 7 | 5, 6, 9 |
| Building Agents | 8 | 6, 16 |
| Production ML | 9, 15, 27 | 25, 31 |
| System Design | 25, 30 | 9, 27, 31 |
| Security | 16 | 8, 20 |
| Career Growth | 21-24 | 29 |
| Interview Prep | Part II | 5, 7, 9, 25 |
Tracking Your Progress
Use this tracker as you work through your chosen path:
Chapter Progress:
□ Ch 5: LLM/NLP Foundations [___________]
□ Ch 6: Prompt Engineering [___________]
□ Ch 7: RAG Systems [___________]
□ Ch 8: Agentic Systems [___________]
□ Ch 9: LLM Deployment [___________]
□ Ch 10: Orchestration Frameworks [___________]
□ Ch 11: Observability/Guardrails [___________]
□ Ch 12: Cloud AI Providers [___________]
□ Ch 13: Multi-Cloud Patterns [___________]
□ Ch 14: Backend Engineering [___________]
□ Ch 15: MLOps & Evaluation [___________]
□ Ch 16: Security [___________]
□ Ch 17: Vision & Document AI [___________]
□ Ch 18: Audio & Speech [___________]
□ Ch 19: Video & Multimodal RAG [___________]
□ Ch 20: Responsible AI [___________]
□ Ch 21: Technical Expertise [___________]
□ Ch 22: Project Ownership [___________]
□ Ch 23: Technical Communication [___________]
□ Ch 24: Mentorship [___________]
□ Ch 25: System Design [___________]
□ Ch 26: Decision Making [___________]
□ Ch 27: Performance [___________]
□ Ch 28: Research-to-Production [___________]
□ Ch 29: Technical Leadership [___________]
□ Ch 30: Data Architecture [___________]
□ Ch 31: Reliability [___________]
□ Ch 32: Cost Engineering [___________]
Projects Completed:
□ Basic RAG system
□ Agent with tools
□ Evaluation pipeline
□ Production deployment
□ Capstone project
Skills Verified:
□ Can explain core concepts
□ Can implement from scratch
□ Can debug common issues
□ Can design systems
□ Can lead discussions
Part II: Interview Preparation
This section provides guidance for AI/ML engineering interviews at various levels. It covers what to expect, how to prepare, and sample questions with frameworks for answering.
Interview Process Overview
Typical AI Engineering Interview Loop
| Stage | Duration | Focus Areas |
|---|---|---|
| Phone Screen | 30-60 min | ML basics, experience, motivation |
| Technical Coding | 45-60 min | DS&A, ML-specific problems, code quality |
| System Design | 45-60 min | Architecture, scale concerns, trade-offs |
| Behavioral & Culture | 30-45 min | Past work, leadership, collaboration |
Preparation by Level
ML Engineer / AI Engineer (Entry to Mid-Level)
Coding Focus: Strong fundamentals, clean implementation
ML Fundamentals: Know the basics cold
- How do common algorithms work (linear regression, decision trees, neural networks)?
- What is overfitting and how do you prevent it?
- Explain gradient descent
- What metrics do you use for classification vs. regression?
- How do you handle imbalanced data?
Practical Experience: Demonstrate you can ship ML
- Tell me about an ML project you’ve worked on
- What challenges did you face? How did you debug them?
- How did you evaluate your model?
System Design: Basic ML pipelines
- Design a spam classifier
- Build a recommendation system for a small app
- Create a sentiment analysis API
Senior ML Engineer (4-7 years)
Coding Focus: Efficiency and ML-specific implementations
- Implement attention mechanism
- Write efficient data pipeline code
- Optimize inference bottlenecks
ML Depth: Strong understanding of at least one area
- Deep dive on transformers, training dynamics, or retrieval
- Trade-offs between different approaches
- When to use what technique
System Design: Production-grade systems
- Design a real-time recommendation system at scale
- Build a RAG system for customer support
- Create an LLM-powered search feature
Leadership: Technical decision-making
- How do you evaluate build vs. buy?
- Tell me about a technical decision you influenced
- How do you balance quality and velocity?
Staff Engineer (7+ years)
Technical Strategy: Architecture and direction
- How would you design an ML platform?
- What’s your approach to technical roadmapping?
- How do you evaluate new technologies?
Cross-Functional Leadership: Influence and alignment
- Tell me about a time you drove alignment across teams
- How do you communicate technical concepts to non-technical stakeholders?
- Describe a cross-team initiative you led
System Design: Complex, multi-team systems
- Design a company-wide feature platform
- Architect an LLM infrastructure for multiple products
- Build a real-time ML serving system handling millions of QPS
Technical Judgment: Deep trade-off analysis
- When would you not use ML?
- How do you decide between approaches?
- What’s your framework for technical debt?
Core Technical Areas
Coding Questions
Data Structures & Algorithms (Standard)
- Arrays, linked lists, trees, graphs
- Sorting, searching, dynamic programming
- Time and space complexity
ML-Specific Coding
- Implement softmax, cross-entropy loss
- Write a simple neural network forward pass
- Process and clean a dataset
- Implement k-means clustering
- Build a simple recommendation algorithm
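Interviewers who ask for softmax and cross-entropy often probe numerical stability. In this minimal pure-Python sketch, subtracting the maximum logit before exponentiating leaves softmax mathematically unchanged but avoids overflow. Computing cross-entropy through log-sum-exp avoids taking the log of a probability that has underflowed to zero.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits), via log-sum-exp."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[target]


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive exp(1000) would overflow
loss = cross_entropy([2.0, 0.0], target=0)
```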
Sample Questions:
- Implement a function to compute cosine similarity between two vectors
- Write efficient code to find the k most similar items given embeddings
- Implement a simple tokenizer (word-level or BPE-style)
- Write a function to compute precision, recall, and F1
- Implement stratified train/test split
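Two of these questions, precision/recall/F1 and a stratified split, are short enough to sketch with the standard library alone. Both functions are illustrative:

- `precision_recall_f1` assumes a binary task with a designated positive label. It reports 0.0 when a ratio is undefined, which is one common convention.
- `stratified_split` shuffles each class separately, so both splits keep roughly the original class ratio.

```python
import random
from collections import defaultdict


def precision_recall_f1(y_true: list, y_pred: list, positive=1) -> tuple[float, float, float]:
    """Binary precision, recall, and F1; undefined ratios are reported as 0.0."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def stratified_split(labels: list, test_fraction: float = 0.2, seed: int = 0):
    """Return (train_indices, test_indices), preserving class proportions in each split."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    train, test = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # per-class test count
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])  # each 2/3
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```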
Framework for Coding Interviews:
1. Clarify requirements (5 min)
   - Input/output format
   - Edge cases
   - Scale expectations
2. Design approach (5 min)
   - Explain your strategy
   - Discuss trade-offs
   - Get interviewer buy-in
3. Implement (25 min)
   - Write clean code
   - Handle edge cases
   - Test as you go
4. Test and optimize (5 min)
   - Walk through examples
   - Discuss optimizations
   - Consider extensions
ML System Design
Framework for ML System Design:
1. Clarify Requirements (5-10 min)
├── Business objective
├── Scale (users, requests, data)
├── Latency requirements
├── Data availability
└── Success metrics
2. High-Level Architecture (10 min)
├── Data pipeline
├── Feature engineering
├── Model selection
├── Training infrastructure
└── Serving system
3. Deep Dives (20-30 min)
├── Model choice and trade-offs
├── Feature design
├── Handling scale
├── Evaluation strategy
└── Failure modes
4. Production Considerations (10 min)
├── Monitoring and observability
├── A/B testing
├── Iterative improvement
└── Cost considerations
Common ML System Design Questions:
- Design a recommendation system for [Netflix/Spotify/E-commerce]
  - Candidate generation vs. ranking
  - Cold start handling
  - Real-time vs. batch
  - Evaluation metrics
- Design a search ranking system
  - Query understanding
  - Retrieval (BM25, dense, hybrid)
  - Ranking model
  - Online/offline evaluation
- Design a RAG-based chatbot for customer support
  - Document processing and chunking
  - Embedding and retrieval
  - Generation with guardrails
  - Fallback handling
- Design a content moderation system
  - Multi-class classification
  - Handling edge cases
  - Human-in-the-loop
  - False positive/negative trade-offs
- Design a fraud detection system
  - Real-time vs. batch
  - Feature engineering
  - Handling class imbalance
  - Concept drift
- Design an LLM serving infrastructure
  - Model selection and routing
  - Caching strategies
  - Rate limiting
  - Cost optimization
ML Fundamentals
Topics to Know Deeply:
Supervised Learning
- Linear/logistic regression
- Decision trees, random forests
- Gradient boosting (XGBoost, LightGBM)
- Neural networks
- Bias-variance trade-off
- Regularization techniques
Deep Learning
- Backpropagation
- Activation functions
- Batch normalization, layer normalization
- Attention mechanism
- Transformer architecture
- Training dynamics (learning rate, batch size)
NLP/LLM Specific
- Tokenization (BPE, WordPiece)
- Word embeddings (Word2Vec, FastText)
- Transformer architecture in detail
- Pre-training objectives (MLM, CLM)
- Fine-tuning approaches (full fine-tuning, PEFT methods such as LoRA)
- Prompting and in-context learning
Evaluation
- Classification metrics (precision, recall, F1, AUC-ROC)
- Regression metrics (MSE, MAE, R²)
- Ranking metrics (NDCG, MRR, MAP)
- A/B testing methodology
- Statistical significance
Common Questions:
- Explain the attention mechanism. Why does it work?
- What’s the difference between batch norm and layer norm?
- How does dropout work? Why does it help?
- What is the vanishing gradient problem? How do you address it?
- Explain the transformer architecture. What are the key components?
- What’s the difference between encoder-only, decoder-only, and encoder-decoder?
- How does RLHF work? What are alternatives?
- What is prompt injection and how do you defend against it?
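For the attention questions, it helps to be able to write the core computation, softmax(QKᵀ/√d_k)V, in a few lines. This NumPy sketch shows a single head with an optional causal mask. For brevity it uses the input directly as queries, keys, and values. A real transformer layer first applies learned projection matrices and runs several heads in parallel.

```python
import numpy as np


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                                 causal: bool = False):
    """Single-head attention: softmax(q @ k.T / sqrt(d_k)) @ v.

    Shapes: q (n_q, d_k), k (n_k, d_k), v (n_k, d_v) -> output (n_q, d_v).
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # scaling keeps score variance from growing with d_k
    if causal:
        # Position i may only attend to positions <= i.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens with model dimension 8
output, weights = scaled_dot_product_attention(x, x, x, causal=True)
```

Each output row is a weighted average of value vectors, and the weights come from query-key similarity. With the causal mask, the first token can only attend to itself.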
Behavioral Interview Preparation
Common Behavioral Questions
Past Projects
- Tell me about a technically challenging ML project
- Describe a time when an ML approach didn’t work. What did you do?
- Walk me through how you improved model performance on a difficult problem
Collaboration
- Tell me about a time you disagreed with a teammate on a technical approach
- How do you work with non-ML engineers?
- Describe a cross-functional project you led
Leadership
- How do you mentor junior engineers?
- Tell me about a time you had to make a difficult technical decision
- How do you prioritize when you have competing demands?
Growth
- What’s a technical area where you’ve grown significantly?
- Tell me about a failure and what you learned
- How do you stay current with ML developments?
STAR Framework
Structure behavioral answers using STAR:
Situation: Brief context (2-3 sentences)
Task: Your specific responsibility
Action: What YOU did (most important part)
Result: Quantified outcome and learnings
Example:
Question: Tell me about a time an ML project failed
Situation: “We were building a recommendation system for a new feature, and three weeks before launch, we realized the model wasn’t performing well on our key metric.”
Task: “As the ML lead, I was responsible for diagnosing the issue and getting us back on track for launch.”
Action: “I did three things: First, I analyzed our evaluation data and found we had significant data leakage in our offline metrics—our offline numbers looked great but didn’t reflect real performance. Second, I designed a quick A/B test to validate, which confirmed the issue. Third, I proposed a simpler baseline model that we could ship on time, with plans to iterate.”
Result: “We launched with the simpler model, which still improved our metric by 8% over the control. The experience led me to establish better evaluation practices for the team, including mandatory online/offline correlation checks. The follow-up version, with proper evaluation, achieved 15% improvement.”
Sample Interview Questions by Topic
LLM and Prompting
- How would you improve a prompt that’s giving inconsistent results?
- What’s the difference between zero-shot, few-shot, and fine-tuning?
- How would you evaluate an LLM-based feature?
- Explain chain-of-thought prompting. When would you use it?
- How do you handle hallucinations in production?
RAG Systems
- Walk me through designing a RAG system end-to-end
- How do you evaluate retrieval quality?
- When would you use dense vs. sparse retrieval?
- How do you handle documents that are too long for context?
- What are failure modes of RAG and how do you address them?
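When dense and sparse retrieval are combined in hybrid search, one common way to merge their results is reciprocal rank fusion (RRF). RRF uses only ranks, so the BM25 and embedding scores never need to be calibrated against each other. In this minimal sketch, `k=60` is the constant commonly used in practice, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_results = ["d3", "d1", "d2"]   # sparse retrieval: strong on exact keywords
dense_results = ["d1", "d4", "d3"]  # dense retrieval: strong on paraphrases
fused = reciprocal_rank_fusion([bm25_results, dense_results])
# d1 and d3 appear in both lists, so they rise to the top: ['d1', 'd3', 'd4', 'd2']
```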
Training and Fine-Tuning
- When would you fine-tune vs. prompt an LLM?
- Explain LoRA. Why does it work?
- How do you prevent catastrophic forgetting during fine-tuning?
- What’s the difference between SFT, RLHF, and DPO?
- How do you decide on training data quantity and quality?
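For the LoRA question, the core idea fits in a small forward-pass sketch. The pretrained weight W stays frozen, and the layer learns a low-rank update scaled by alpha/r, so the output is xWᵀ + (alpha/r)·xAᵀBᵀ. Because B is initialized to zero, the adapted layer starts out identical to the base layer, and after training the update can be merged back into W. This NumPy sketch uses hypothetical names, is not the API of any fine-tuning library, and omits the training loop.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                 # frozen, shape (d_out, d_in)
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, zero-initialized
        self.scaling = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the update never forms a full d_out x d_in matrix
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into W for inference with no extra latency."""
        return self.weight + self.scaling * self.B @ self.A


rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32))
x = rng.normal(size=(2, 32))
layer = LoRALinear(W, rank=4, alpha=8.0)
before_training = layer(x)                 # equals x @ W.T because B starts at zero
layer.B = rng.normal(size=layer.B.shape)   # stand-in for a training update
after_training = layer(x)
```

For this 16×32 layer with rank 4, the adapter trains 4·(32+16) = 192 parameters instead of 512. The savings grow with layer size.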
Infrastructure and Scale
- How would you reduce inference latency for an LLM service?
- Explain the trade-offs of different quantization approaches
- How would you handle 10x traffic increase?
- Design a caching strategy for an LLM application
- How do you monitor an ML system in production?
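One way to ground a discussion of quantization trade-offs is the simplest scheme: symmetric per-tensor int8. Each weight is stored as an 8-bit integer, and the whole tensor shares one float scale. This illustrative NumPy sketch cuts float32 storage by 4x and bounds the per-weight rounding error by about half the scale. Practical schemes typically use per-channel or per-group scales, so that a few outlier weights do not inflate the scale, and the error, for everything else.

```python
import numpy as np


def quantize_int8_symmetric(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-tensor symmetric int8: weights ≈ scale * q with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8_symmetric(weights)
max_error = float(np.max(np.abs(dequantize_int8(q, scale) - weights)))
compression = weights.nbytes / q.nbytes  # 4 bytes per float32 vs 1 byte per int8
```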
Data and Features
- How do you handle training-serving skew?
- What is a feature store and when would you use one?
- How do you detect and handle data drift?
- Explain the importance of point-in-time correctness
- How do you version ML data and models?
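For the drift question, one widely used and easy-to-explain statistic is the population stability index (PSI). PSI compares the binned distribution of a feature in production against a reference window, such as the training data. In this stdlib sketch:

- Bin edges come from the reference range.
- A small epsilon avoids log(0) for empty bins.
- The thresholds in the final comment are a commonly cited rule of thumb, not a universal standard.

```python
import bisect
import math
import random


def population_stability_index(reference: list[float], current: list[float],
                               bins: int = 10) -> float:
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    low, high = min(reference), max(reference)
    # Interior bin edges over the reference range; out-of-range values land in the end bins.
    edges = [low + (high - low) * i / bins for i in range(1, bins)]

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            counts[bisect.bisect_right(edges, value)] += 1
        return [max(count / len(values), 1e-6) for count in counts]

    ref_props, cur_props = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_props, cur_props))


rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]
psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
# Rule of thumb: < 0.1 little shift, 0.1-0.25 moderate, > 0.25 significant
```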
Interview Day Tips
Before the Interview
- Review the company’s ML products and published work
- Refresh fundamentals (don’t assume you remember)
- Prepare 2-3 projects to discuss in depth
- Practice explaining technical concepts aloud
- Get good sleep
During the Interview
For Coding:
- Think aloud—silence is concerning
- Start with brute force, then optimize
- Test your code before declaring done
- Ask clarifying questions
For System Design:
- Drive the conversation—don’t wait for prompts
- Draw diagrams (even in virtual interviews)
- Discuss trade-offs explicitly
- Connect to business requirements
For Behavioral:
- Be specific—vague answers don’t score well
- Focus on YOUR contributions
- Include quantified results when possible
- Be honest about failures—they want self-awareness
Questions to Ask Interviewers
Good questions demonstrate genuine interest:
- What ML infrastructure do you use?
- How do you evaluate ML model success?
- What’s the biggest technical challenge the team faces?
- How do you balance research and production?
- What does the on-call look like for ML systems?
Level-Specific Preparation Checklists
Entry-Level (0-3 years)
Mid-Level (3-5 years)
Senior (5-8 years)
Staff+ (8+ years)
Resources for Preparation
Books
- “Designing Machine Learning Systems” (Chip Huyen)
- “Machine Learning Engineering” (Andriy Burkov)
- “Building Machine Learning Powered Applications” (Emmanuel Ameisen)
Courses
- Stanford CS229 (ML fundamentals)
- Stanford CS224N (NLP)
- fast.ai (Practical deep learning)
Practice
- LeetCode for coding
- ML system design practice with peers
- Mock interviews (Pramp, interviewing.io)
Stay Current
- Papers With Code
- ML blogs (OpenAI, Anthropic, Google AI)
- r/MachineLearning
Detailed Answer Examples
ML System Design Example: Design a Recommendation System
Here’s how to structure your answer for a recommendation system design question:
Opening (Clarify Requirements): “Before I dive in, let me make sure I understand the requirements. Is this for a new feature or an existing system? What’s our scale in terms of users and items? What’s the latency budget for serving recommendations? Do we have historical user data already?”
Assume answers: 10M users, 1M items, 100ms latency, we have 6 months of click/purchase data
High-Level Architecture: “I’d structure this as a two-stage system: candidate generation to narrow from 1M items to ~1000, then a ranking model to select the top 10-20.
For candidate generation:
- Item-based collaborative filtering for items similar to user’s history
- User embedding similarity to find items liked by similar users
- Popularity baseline to ensure coverage
For ranking:
- Features: user features, item features, cross features (user-item interactions)
- Model: Gradient boosted trees initially, can upgrade to neural ranking later
- Training: Point-wise learning on click/purchase labels
Serving architecture:
- Pre-computed candidate sets refreshed hourly
- Online ranking with cached user/item features
- Feature store for real-time features (recent clicks)”
Deep Dives (interviewer may ask about specific areas):
“How do you handle cold start?” “For new users: popularity-based recommendations, quickly incorporating their first few interactions. For new items: content-based features (category, description embedding), explore/exploit to get initial signals.”
“How do you evaluate?” “Offline: We’d look at precision@k, recall@k, NDCG. But offline metrics can be misleading, so we’d run A/B tests measuring click-through rate and ultimately conversion. We’d also track diversity and novelty metrics to ensure we’re not just showing popular items.”
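To back up that answer, it helps to know how these offline metrics are computed. This minimal sketch assumes recommendations are a ranked list of unique item IDs and relevance labels come from held-out interactions. The graded labels in the example are hypothetical. This version of NDCG uses linear gains; some formulations use 2^rel - 1 instead.

```python
import math


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    return sum(item in relevant for item in recommended[:k]) / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / len(relevant)


def ndcg_at_k(recommended: list[str], relevance: dict[str, float], k: int) -> float:
    """NDCG with linear gains; items missing from `relevance` have gain 0."""
    dcg = sum(relevance.get(item, 0.0) / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1) for rank, gain in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]
relevance = {"a": 3, "c": 2, "e": 1}  # hypothetical grading, e.g. purchase=3, click=2, view=1
p3 = precision_at_k(recommended, set(relevance), 3)  # 2/3
r3 = recall_at_k(recommended, set(relevance), 3)     # 2/3
n3 = ndcg_at_k(recommended, relevance, 3)            # ≈ 0.84
```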
“What about serving latency?” “The 100ms budget is tight. I’d pre-compute as much as possible: user embeddings daily, candidate sets hourly. The ranking model would be a lightweight scorer. We’d use caching for frequently requested user recommendations. If needed, we can trade off ranking model complexity for speed.”
Behavioral Answer Example: Technical Disagreement
Question: “Tell me about a time you disagreed with a teammate on a technical approach.”
Using STAR:
Situation: “On my last team, we were building an LLM-powered search feature. The senior engineer on the project wanted to fine-tune a model specifically for our domain, while I believed a RAG approach with a general model would be better.”
Task: “I needed to advocate for my approach while maintaining a collaborative relationship and being open to being wrong.”
Action: “First, I made sure I deeply understood their reasoning—they were concerned about domain-specific vocabulary that a general model might not handle well. That was a valid concern.
I then proposed we run a quick experiment: two days to prototype both approaches on a subset of queries. I wrote up clear evaluation criteria before we ran the experiment so we’d have objective measures.
The results showed RAG actually performed better on most queries, but fine-tuning won on queries with technical jargon. So we ended up with a hybrid: RAG for most queries with a fine-tuned reranker for technical terms.
Throughout, I made sure to frame it as ‘finding the best approach’ rather than ‘proving I was right.’ When the hybrid emerged, we both felt ownership.”
Result: “The hybrid approach improved our search relevance by 20% over the baseline. More importantly, we established a pattern of data-driven decision making that the team used for subsequent technical debates. The other engineer and I became close collaborators afterward.”
Coding Answer Example: Implement Cosine Similarity
Question: “Implement a function to compute cosine similarity between two vectors.”
Initial Clarification: “Are these dense vectors? Any constraints on length? Should I handle edge cases like zero vectors?”
Solution:
```python
import math
from typing import List


def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Compute cosine similarity between two vectors.

    Args:
        vec1: First vector
        vec2: Second vector (must be same length)

    Returns:
        Cosine similarity in range [-1, 1]; 0.0 if either vector has zero magnitude

    Raises:
        ValueError: If vectors have different lengths or are empty
    """
    if len(vec1) != len(vec2):
        raise ValueError("Vectors must have same length")
    if len(vec1) == 0:
        raise ValueError("Vectors cannot be empty")

    # Compute dot product and magnitudes
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    magnitude1 = math.sqrt(sum(a * a for a in vec1))
    magnitude2 = math.sqrt(sum(b * b for b in vec2))

    # Handle zero vectors
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0  # Or could raise error depending on requirements

    return dot_product / (magnitude1 * magnitude2)


# Test cases
def test_cosine_similarity():
    # Identical vectors should have similarity 1
    assert abs(cosine_similarity([1, 0], [1, 0]) - 1.0) < 1e-9
    # Orthogonal vectors should have similarity 0
    assert abs(cosine_similarity([1, 0], [0, 1]) - 0.0) < 1e-9
    # Opposite vectors should have similarity -1
    assert abs(cosine_similarity([1, 0], [-1, 0]) - (-1.0)) < 1e-9
    # General case: 32 / (sqrt(14) * sqrt(77)) ≈ 0.9746
    assert abs(cosine_similarity([1, 2, 3], [4, 5, 6]) - 0.9746) < 0.001
    # Zero vector returns 0.0 instead of dividing by zero
    assert cosine_similarity([0, 0], [1, 1]) == 0.0
    print("All tests passed!")


test_cosine_similarity()
```

Optimization Discussion: “If we’re computing many similarities, we’d want to use NumPy for vectorization. We could also pre-compute magnitudes if vectors are reused. For very high-dimensional vectors at scale, we’d consider approximate methods like LSH.”
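To make the optimization discussion concrete, here is a minimal sketch, assuming NumPy is available, that precomputes the norms of the stored vectors once and then scores a query against all of them with a single matrix-vector product. The helper names `normalize_rows` and `top_k_cosine` are hypothetical, chosen for illustration; they are not part of NumPy or any retrieval library.

```python
import numpy as np


def normalize_rows(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm; all-zero rows stay zero."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Replace zero norms with 1.0 so zero rows are left unchanged (similarity 0.0)
    safe_norms = np.where(norms == 0, 1.0, norms)
    return matrix / safe_norms


def top_k_cosine(query: np.ndarray, normalized_db: np.ndarray, k: int):
    """Return (indices, scores) of the k stored vectors most similar to `query`.

    Assumes `normalized_db` was produced by normalize_rows, so cosine
    similarity reduces to one matrix-vector product.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    q_norm = np.linalg.norm(query)
    if q_norm == 0:
        scores = np.zeros(normalized_db.shape[0])
    else:
        scores = normalized_db @ (query / q_norm)
    k = min(k, scores.shape[0])
    # argpartition selects the top k in linear time; then sort only those k
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


# Usage: stored-vector norms are computed once, then reused for every query
db = normalize_rows(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
indices, top_scores = top_k_cosine(np.array([1.0, 0.0]), db, k=2)
print(indices, top_scores)  # indices 0 and 2; scores 1.0 and about 0.707
```

This is still an exhaustive scan, so each query costs O(n · d); at very large n, an approximate index such as the LSH approach mentioned in the answer would replace the full matrix product.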
Common Mistakes to Avoid
In Coding Interviews
- Starting to code too quickly: Spend time understanding and planning
- Not testing: Always walk through examples before saying “done”
- Silence: Keep talking about your thought process
- Ignoring edge cases: Empty inputs, negative numbers, etc.
- Over-engineering: Start simple, optimize if asked
In System Design
- Not clarifying requirements: Don’t assume—ask
- Jumping to details: Start with high-level architecture
- Not discussing trade-offs: Every choice has pros and cons
- Forgetting operations: Monitoring, debugging, iteration
- Ignoring scale: Think about 10x and 100x scenarios
In ML Fundamentals
- Memorizing without understanding: Can’t handle follow-up questions
- Not connecting to practice: Know when to use what
- Overcomplicating: Simple explanations are better
- Ignoring limitations: Every technique has failure modes
In Behavioral
- Being too vague: Specific stories are convincing
- Forgetting the result: STAR requires outcomes
- Taking all credit: Acknowledge the team (“we” is OK), but make YOUR specific role clear
- Only successes: Failure stories show self-awareness
Mock Interview Script
Practice with a friend using this structure:
Round 1: Coding (45 min)
- 5 min: Introductions
- 35 min: Coding problem (e.g., “Implement k-nearest neighbors”)
- 5 min: Questions
Round 2: ML Design (45 min)
- 5 min: Introductions
- 35 min: Design problem (e.g., “Design a content recommendation system”)
- 5 min: Questions
Round 3: ML Depth (30 min)
- 5 min: Introductions
- 20 min: Deep technical discussion (e.g., “Explain transformers in detail”)
- 5 min: Questions
Round 4: Behavioral (30 min)
- 5 min: Introductions
- 20 min: Behavioral questions
- 5 min: Questions
Feedback Session (15 min)
- What went well
- Areas for improvement
- Specific suggestions
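For the Round 1 example prompt, a brute-force k-nearest-neighbors classifier is a reasonable first answer before discussing optimizations. This is a minimal sketch, assuming Euclidean distance and majority voting; the function name `knn_predict` and its signature are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_points: List[Sequence[float]],
    train_labels: List[str],
    query: Sequence[float],
    k: int,
) -> str:
    """Predict a label for `query` by majority vote of its k nearest neighbors.

    Brute force: n distance computations of cost O(d), plus O(n log k) selection.
    """
    if not train_points or len(train_points) != len(train_labels):
        raise ValueError("Need a non-empty training set with one label per point")
    if k < 1:
        raise ValueError("k must be at least 1")

    # math.dist computes Euclidean distance (Python 3.8+)
    distances = ((math.dist(p, query), i) for i, p in enumerate(train_points))
    nearest = heapq.nsmallest(k, distances)  # returned sorted, nearest first

    # Counter.most_common orders equal counts by first occurrence, so a tied
    # vote goes to the label whose member is closest to the query
    votes = Counter(train_labels[i] for _, i in nearest)
    return votes.most_common(1)[0][0]


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, [0.5, 0.5], k=3))  # a
print(knn_predict(points, labels, [5.5, 5.0], k=3))  # b
```

Natural follow-ups in the interview are feature scaling (distance is sensitive to units), choosing k, and replacing the linear scan with a KD-tree or an approximate index for large datasets.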
Week-by-Week Preparation Plan
4 Weeks Before Interview
Week 1: Foundations
- Review ML fundamentals (algorithms, evaluation)
- Start coding practice (2 problems/day)
- List your top 3-4 projects with details
Week 2: System Design
- Study ML system design patterns
- Practice 2-3 design problems
- Review company’s ML products/papers
Week 3: Deep Dives
- Pick 1-2 areas to go deep
- Read relevant papers
- Prepare to explain at multiple levels
Week 4: Polish
- Mock interviews (at least 2)
- Behavioral story preparation
- Rest and mental preparation
Day Before
- Review your project stories
- Light review of fundamentals (don’t cram)
- Prepare questions for interviewers
- Confirm interview logistics (time, location or video link)
- Get good sleep
Day Of
- Eat well, stay hydrated
- Arrive/log in early
- Have paper/pen ready
- Deep breaths between rounds
- Stay positive regardless of how rounds feel
Final Advice
Be Yourself: Authentic enthusiasm for ML comes through. If you’re genuinely interested in the field, let that show.
Admit Uncertainty: “I don’t know, but here’s how I’d approach finding out” is better than pretending to know.
Show Growth Mindset: Talk about what you’ve learned, not just what you know.
Practice: Interview skills improve with practice. Do mock interviews.
Learn from Rejections: Each interview is a learning opportunity. Ask for feedback when possible.
Good luck with your interviews!
Company-Specific Preparation
Big Tech (Google, Meta, Amazon, Microsoft)
Focus Areas:
- Heavy emphasis on coding (expect 2+ rounds)
- System design at scale (billions of users)
- Leadership principles (especially Amazon)
- Past project deep dives
Preparation Tips:
- Practice with LeetCode medium/hard problems
- Study their published research papers
- Understand their ML infrastructure (TensorFlow, PyTorch, internal tools)
- Prepare stories that map to leadership principles
AI-First Companies (OpenAI, Anthropic, Cohere)
Focus Areas:
- Deep technical understanding of LLMs
- Research awareness and curiosity
- Safety and alignment considerations
- Novel problem-solving ability
Preparation Tips:
- Read their recent papers and blog posts
- Understand current LLM limitations
- Have opinions on open research questions
- Think about safety implications
ML-Heavy Startups
Focus Areas:
- End-to-end ownership
- Scrappiness and velocity
- Breadth across ML stack
- Business impact awareness
Preparation Tips:
- Understand their product and ML use cases
- Prepare for “wear many hats” scenarios
- Emphasize shipping and iteration
- Show flexibility in approaches
Traditional Tech with ML Teams
Focus Areas:
- Production ML experience
- Working with non-ML teams
- Integrating ML into existing systems
- Business metric focus
Preparation Tips:
- Understand their core business
- Prepare for less ML-specific coding
- Emphasize collaboration and communication
- Focus on practical impact over research novelty
Handling Difficult Interview Situations
When You Don’t Know the Answer
Good response: “I’m not sure about that specific detail, but here’s how I’d approach figuring it out…” or “I haven’t worked with that directly, but based on my understanding of similar systems…”
Why it works: Shows intellectual honesty and problem-solving approach
When You Make a Mistake
Good response: “Actually, let me reconsider that—I think I made an error. The correct approach is…”
Why it works: Self-correction shows strong fundamentals and self-awareness
When the Problem Is Unclear
Good response: “Let me make sure I understand the requirements. Are we optimizing for X or Y? What constraints do we have?”
Why it works: Clarification is expected and shows good engineering practice
When You’re Stuck
Good response: “I’m thinking about a few different approaches. Let me talk through what I’m considering…” Then explain your thought process.
Why it works: Interviewers want to see how you think, not just the answer
When the Interviewer Seems Unengaged
Good response: Stay focused and professional. Ask if they have any questions or want you to go deeper on any area.
Why it works: Some interviewers are naturally reserved; don’t let it affect your performance
Post-Interview
After Each Round
- Take brief notes on questions asked
- Note any areas where you felt weak
- Don’t obsess—move on to the next round
After the Interview Day
- Send thank you notes (optional but good practice)
- Reflect on what went well and what to improve
- Don’t second-guess your performance excessively
If You Get an Offer
- Ask for detailed leveling and compensation breakdown
- Negotiate thoughtfully (it’s expected)
- Ask about team matching and projects
If You Get Rejected
- Ask for feedback (may or may not be provided)
- Reflect honestly on areas to improve
- Many successful candidates were rejected at least once before getting an offer
- Most companies allow re-applying after 6-12 months
Additional Practice Questions
LLM Deep Dive Questions
- Walk me through the transformer architecture component by component.
- How does self-attention differ from cross-attention?
- Explain the difference between pre-training and fine-tuning objectives.
- What are the key challenges in training large language models?
- How would you debug a model that’s producing inconsistent outputs?
- Explain RLHF at a technical level. What are alternatives?
- What’s the relationship between perplexity and quality?
- How do you think about the trade-off between model size and cost?
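For the perplexity question, it helps to state the definition precisely: perplexity is the exponential of the average per-token negative log-likelihood on held-out text. A minimal sketch, assuming you already have the natural-log probabilities the model assigned to each observed token (that input format is an assumption for illustration):

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum_i log p(token_i | context)).

    `token_log_probs` holds natural-log probabilities the model assigned to
    each observed token in a held-out text.
    """
    if not token_log_probs:
        raise ValueError("Need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that gives every observed token probability 1/4 is, on average,
# as uncertain as a uniform choice among 4 tokens:
print(perplexity([math.log(0.25)] * 10))  # close to 4.0
```

Lower perplexity means better next-token prediction, but it is only comparable across models that share a tokenizer and evaluation set, and it says nothing direct about instruction following or factual accuracy. That gap is the core of a good answer on how perplexity relates to quality.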
Production ML Questions
- How do you ensure your offline metrics correlate with online performance?
- Describe your approach to ML model monitoring.
- How do you handle model drift in production?
- What’s your framework for deciding when to retrain a model?
- How do you debug a model that’s performing poorly in production?
- Describe a production ML system you’ve built or maintained.
- How do you balance technical debt and feature velocity in ML systems?
- What are the most common causes of ML system failures you’ve seen?
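For the drift questions, one commonly used check (alongside others such as the Kolmogorov-Smirnov test) is the Population Stability Index, which compares a feature's binned distribution in production with a reference window. This is a minimal sketch, assuming NumPy, quantile bins taken from the reference data, and a small epsilon to avoid log(0); the function name and its defaults are hypothetical, and the thresholds in the comment are a widely quoted rule of thumb, not a standard.

```python
import numpy as np


def population_stability_index(
    reference: np.ndarray, current: np.ndarray, n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (cur_pct - ref_pct) * ln(cur_pct / ref_pct).

    Rule of thumb often quoted: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift.
    """
    if reference.size == 0 or current.size == 0:
        raise ValueError("Both samples must be non-empty")
    # Interior bin edges from reference quantiles (roughly equal reference mass);
    # np.unique collapses duplicate edges for discrete-valued features
    quantiles = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    inner_edges = np.unique(quantiles[1:-1])
    n_out = len(inner_edges) + 1

    # Values below the first edge go to bin 0, above the last to the final bin,
    # so out-of-range production values are still counted
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference, side="right"), minlength=n_out)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current, side="right"), minlength=n_out)

    ref_pct = ref_counts / ref_counts.sum() + eps
    cur_pct = cur_counts / cur_counts.sum() + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


baseline = np.arange(1000, dtype=float)
print(population_stability_index(baseline, baseline))        # 0.0: no shift
print(population_stability_index(baseline, baseline + 500))  # large: strong shift
```

PSI only flags that an input distribution moved; deciding whether to retrain still depends on whether the shift actually hurts the online metrics the model serves.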
Leadership and Strategy Questions (for Senior+)
- How do you evaluate whether to use ML for a problem?
- Describe how you’ve influenced technical direction outside your team.
- How do you balance investment in ML infrastructure vs. applications?
- Tell me about a time you had to make a difficult technical trade-off.
- How do you stay current with ML developments while being productive?
- What’s your approach to building high-performing ML teams?
- How do you communicate ML capabilities and limitations to stakeholders?
- What do you see as the most important trends in AI engineering?