Appendix D: Career Development

This appendix combines two essential resources for your AI engineering career: Learning Paths to guide your study through the textbook based on your background and goals, and Interview Preparation to help you succeed in AI/ML engineering interviews at all levels.

Part I: Learning Paths

This section provides guided reading orders through the textbook based on your background and goals. Each path includes the recommended sequence, time estimates, and key focus areas.

How to Use These Paths

Choose your path based on your current role and goal
Follow the sequence: chapters build on each other
Do the exercises: reading alone isn’t enough
Build something: apply what you learn to a real project
Revisit as needed: use the chapters as a reference later

Path 1: New to AI/ML (SWE Background)

For: Software engineers with no ML experience
Time: 4-6 weeks (10-15 hours/week)
Goal: Foundational understanding to contribute to AI projects

Week 1: Understanding LLMs

┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: FOUNDATIONS                                                          │
└─────────────────────────────────────────────────────────────────────────────┘

Day 1-2: Chapter 5 - LLM/NLP Foundations
         Focus: Tokenization, embeddings, attention mechanism
         Skip: Mathematical details (return later if needed)

Day 3-4: Chapter 6 - Prompt Engineering
         Focus: Structured outputs, few-shot learning
         Exercise: Try 10 different prompting patterns on a real task

Day 5:   Appendix A - Glossary
         Review all terms, note unfamiliar ones

Key Concepts to Master:

How tokenization works
What embeddings represent
Why attention matters
Basic prompt patterns

Week 2: Building RAG Systems

Day 1-3: Chapter 7 - RAG Systems Deep Dive
         Focus: Chunking, embeddings, basic retrieval
         Skip: GraphRAG, advanced patterns (come back in Week 5)

Day 4-5: Hands-on Practice
         Build a simple RAG system with:
         - 100 documents
         - Basic chunking
         - Vector search
         - Simple prompt template

Key Concepts to Master:

Chunking strategies
Vector similarity search
Context assembly
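
As a starting point for the Week 2 hands-on practice, here is a minimal sketch of one basic chunking strategy: fixed-size word windows with overlap. The function name `chunk_words` and the default sizes are illustrative assumptions, not values prescribed by Chapter 7; production systems often chunk by tokens, sentences, or document structure instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to `chunk_size` words, overlapping by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk. The cost is a larger index, which is one of the chunking tradeoffs worth measuring in your own system.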

Week 3: Deployment Basics

Day 1-2: Chapter 9 - LLM Deployment
         Focus: API integration, basic serving concepts
         Skip: GPU optimization (defer to later)

Day 3-4: Chapter 14 - Backend Engineering for AI
         Focus: Testing, debugging LLM apps
         Exercise: Write tests for your Week 2 RAG system

Day 5:   Review and consolidate
         Complete practical exercises

Week 4: Evaluation and Quality

Day 1-3: Chapter 15 - MLOps & Evaluation
         Focus: Basic metrics, simple evaluation pipelines
         Skip: Complex A/B testing (return when needed)

Day 4-5: Build an evaluation pipeline for your RAG system

Week 5: Expanding Knowledge

Day 1-2: Chapter 8 - Agentic Systems
         Focus: Tool use, basic agent patterns

Day 3-4: Chapter 16 - Security
         Focus: Prompt injection basics

Day 5:   Part II of this appendix (Interview Prep - AI Engineer II section)

Week 6: Integration Project

Build a complete application using what you’ve learned:

Document Q&A system with evaluation
Include basic security measures
Write comprehensive tests

Path 1 Completion Checklist

□ Can explain how an LLM generates text
□ Can write effective prompts for different tasks
□ Can build a basic RAG system
□ Can evaluate RAG quality
□ Understand basic security concerns
□ Built at least one working application

Path 2: Backend Engineer Adding AI

For: Experienced backend engineers integrating AI into existing systems
Time: 3-4 weeks (15-20 hours/week)
Goal: Productionize AI features in backend services

Week 1: Practical LLM Integration

┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: INTEGRATION PATTERNS                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Day 1:   Chapter 6 - Prompt Engineering
         Focus: Structured outputs, caching strategies

Day 2-3: Chapter 9 - LLM Deployment
         Focus: API integration, rate limiting, error handling
         Exercise: Build a robust LLM client with retries

Day 4-5: Chapter 14 - Backend Engineering for AI
         Focus: Integration patterns, testing strategies

Key Patterns to Implement:

Retry with exponential backoff
Response caching
Structured output parsing
Graceful degradation
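
The first pattern in the list above, retry with exponential backoff, can be sketched as follows. This is a minimal illustration under stated assumptions: `retry_with_backoff`, its parameters, and the choice of `TimeoutError` and `ConnectionError` as retryable errors are hypothetical. Real LLM client libraries raise their own exception types, so substitute whichever transient errors your client documents.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds or `max_attempts` attempts have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay ceiling doubles each attempt and is capped at max_delay.
            # Sleeping a random fraction of it ("full jitter") spreads out retries.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice that keeps the helper testable without real waiting. Errors outside `retry_on` (for example, a validation error caused by a bad request) propagate immediately, since retrying them would not help.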

Week 2: RAG for Production

Day 1-2: Chapter 7 - RAG Systems (focus on production sections)
         Focus: Vector databases, scaling, caching

Day 3-4: Chapter 31 - Reliability Engineering
         Focus: SLOs for AI, graceful degradation

Day 5:   Design a production RAG architecture
         Include: caching, monitoring, fallbacks

Week 3: Security and Scale

Day 1-2: Chapter 16 - Security
         Focus: Defense-in-depth, input validation
         Exercise: Implement security layers

Day 3-4: Chapter 25 - System Design at Scale
         Focus: AI-specific scaling patterns

Day 5:   Chapter 32 - Cost Engineering
         Focus: Cost optimization, attribution

Week 4: Agentic Systems

Day 1-3: Chapter 8 - Agentic Systems
         Focus: Tool use, safety constraints
         Exercise: Build a tool-using agent

Day 4-5: Integration project
         Add AI feature to an existing backend service

Path 2 Completion Checklist

□ Can integrate LLM APIs with proper error handling
□ Can design production RAG architecture
□ Can implement AI security measures
□ Understand cost implications of AI features
□ Can build and deploy tool-using agents
□ Integrated AI into at least one backend service

Path 3: Senior to Staff Transition

For: Senior engineers preparing for Staff-level AI responsibilities
Time: 6-8 weeks (15-20 hours/week)
Goal: Lead AI system design and cross-team initiatives

Phase 1: Technical Depth (Weeks 1-3)

┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: DEEP TECHNICAL KNOWLEDGE                                            │
└─────────────────────────────────────────────────────────────────────────────┘

Week 1:
  - Chapter 5: LLM Foundations (deep read, including math)
  - Chapter 27: Performance Engineering (full chapter)
  - Appendix C: Read 3-5 foundational papers

Week 2:
  - Chapter 7: RAG Systems (including GraphRAG, advanced patterns)
  - Chapter 30: Data Architecture for AI
  - Exercise: Design a feature store

Week 3:
  - Chapter 8: Agentic Systems (full depth)
  - Chapter 16: Security (including agentic security)
  - Chapter 9: Deployment (optimization focus)

Phase 2: Architecture and Scale (Weeks 4-5)

Week 4:
  - Chapter 25: System Design at Scale
  - Chapter 31: Reliability Engineering
  - Exercise: Design a system handling 1M requests/day

Week 5:
  - Chapter 26: Technical Decision Making
  - Chapter 32: Cost Engineering
  - Appendix G: Study ADR examples
  - Exercise: Write 3 ADRs for hypothetical decisions

Phase 3: Leadership Skills (Weeks 6-7)

Week 6:
  - Chapter 21: Deepening Technical Expertise
  - Chapter 28: Research-to-Production
  - Chapter 29: Cross-Team Technical Leadership
  - Appendix C: Identify 2-3 papers to implement

Week 7:
  - Chapter 23: Technical Communication
  - Chapter 24: Mentorship Foundations
  - Exercise: Write a design doc for an AI system
  - Exercise: Create a learning plan for a junior engineer

Phase 4: Capstone (Week 8)

- Review Part II of this appendix (Interview Prep - Staff level)
- Complete one capstone project from Appendix E
- Present design to peers for feedback

Path 3 Key Deliverables

□ 3 ADRs written for AI decisions
□ 1 system design document
□ 1 design doc reviewed by peers
□ 2-3 papers summarized with implementation notes
□ 1 capstone project completed
□ Can explain tradeoffs in major AI architecture decisions

Path 4: Interview Preparation (2-Week Intensive)

For: Engineers preparing for AI/ML engineering interviews
Time: 2 weeks (25-30 hours/week)
Goal: Pass technical interviews for AI engineering roles

Week 1: Core Knowledge

┌─────────────────────────────────────────────────────────────────────────────┐
│ WEEK 1: FUNDAMENTALS                                                         │
└─────────────────────────────────────────────────────────────────────────────┘

Day 1:
  Morning: Chapter 5 - LLM Foundations (3 hours)
  Afternoon: Practice explaining transformers, attention

Day 2:
  Morning: Chapter 7 - RAG Systems (3 hours)
  Afternoon: Implement a basic RAG system from scratch

Day 3:
  Morning: Chapter 9 - Deployment (2 hours)
  Afternoon: Chapter 27 - Performance (2 hours, focus on inference)

Day 4:
  Morning: Chapter 15 - Evaluation (2 hours)
  Afternoon: Practice evaluation questions

Day 5:
  Morning: Chapter 8 - Agentic Systems (3 hours)
  Afternoon: Practice system design scenarios

Week 2: System Design + Practice

Day 1:
  Morning: Chapter 25 - System Design at Scale
  Afternoon: Practice design: "Design a RAG system for 10M documents"

Day 2:
  Morning: Chapter 16 - Security
  Afternoon: Practice design: "Add AI to an existing product"

Day 3:
  Morning: Chapter 32 - Cost Engineering
  Afternoon: Practice: Cost estimation exercises

Day 4:
  Full day: Part II of this appendix (Interview Preparation)
  Do all practice questions

Day 5:
  Mock interviews with peers
  Review weak areas

Interview Topics Checklist

Fundamentals:
□ Explain transformers and attention
□ Compare embedding models
□ Describe tokenization approaches

RAG:
□ Design a RAG pipeline
□ Discuss chunking tradeoffs
□ Explain hybrid search
□ Handle evaluation

Deployment:
□ Compare vLLM vs TGI
□ Explain batching strategies
□ Discuss quantization tradeoffs

System Design:
□ Design AI search system
□ Design AI chatbot at scale
□ Design agent with tools
□ Handle failure scenarios

Evaluation:
□ Design evaluation pipeline
□ Discuss LLM-as-judge
□ Explain A/B testing for AI

Security:
□ Explain prompt injection
□ Design defense-in-depth
□ Agent safety patterns

Path 5: ML Engineer Deepening LLM Knowledge

For: ML engineers with classical ML background, new to LLMs
Time: 3-4 weeks (15-20 hours/week)
Goal: Bridge from traditional ML to LLM engineering

Week 1: LLM-Specific Concepts

Day 1-2: Chapter 5 - LLM Foundations
         Compare to your existing ML knowledge
         Focus: What's different from traditional ML?

Day 3-4: Chapter 6 - Prompt Engineering
         This is new - no training loop equivalent

Day 5:   Chapter 14 - Backend Engineering
         Focus: Fine-tuning section
         Compare: How is this different from traditional training?

Week 2: Retrieval and Data

Day 1-2: Chapter 7 - RAG Systems
         Your embedding/similarity search knowledge transfers!

Day 3-4: Chapter 30 - Data Architecture
         Focus: Training-serving skew (familiar concept, LLM specifics)

Day 5:   Chapter 15 - MLOps & Evaluation
         Focus: LLM-specific evaluation challenges

Week 3: Production Systems

Day 1-2: Chapter 9 - Deployment
         Focus: How LLM serving differs from traditional model serving

Day 3-4: Chapter 27 - Performance Engineering
         Your GPU knowledge helps! Focus on LLM-specific optimizations

Day 5:   Chapter 8 - Agentic Systems
         New paradigm - models as reasoning engines

Week 4: Advanced Topics

Day 1-2: Chapter 25 - System Design
         Apply your architecture knowledge to LLM systems

Day 3-4: Chapter 16 - Security
         New attack vectors specific to LLMs

Day 5:   Review and gap-fill

Key Differences to Understand

Traditional ML	LLM Engineering
Training is central	Prompting is central
Feature engineering	Context engineering
Model versioning	Prompt versioning
Batch inference common	Real-time generation
Training data curation	RAG corpus curation
Metrics well-defined	Metrics often subjective

Path 6: Tech Lead Establishing AI Practice

For: Tech leads setting up AI capabilities for their team
Time: 4 weeks (10-15 hours/week)
Goal: Establish team processes and make sound technical decisions

Week 1: Foundation Knowledge

Day 1: Chapter 5 - LLM Foundations (skim for concepts)
Day 2: Chapter 6 - Prompt Engineering (practical focus)
Day 3: Chapter 7 - RAG Systems (architecture focus)
Day 4: Chapter 9 - Deployment (infrastructure decisions)
Day 5: Appendix B - Tools overview

Week 2: Making Decisions

Day 1-2: Chapter 26 - Technical Decision Making
         Study ADR process

Day 3:   Appendix G - Architecture Decision Records
         Review all example ADRs

Day 4-5: Write ADRs for your team's AI decisions:
         - Which LLM to use
         - Self-hosted vs API
         - Vector database selection

Week 3: Process and Quality

Day 1-2: Chapter 15 - MLOps & Evaluation
         Focus: What processes does your team need?

Day 3:   Chapter 16 - Security
         Focus: What policies to establish?

Day 4-5: Chapter 20 - Responsible AI
         Focus: Governance processes

Week 4: Team Development

Day 1:   Chapter 24 - Mentorship
         How to grow AI skills on your team

Day 2:   Chapter 23 - Technical Communication
         Standards for design docs

Day 3:   This appendix (Learning Paths section)
         Create learning paths for team members

Day 4-5: Create team documentation:
         - AI coding standards
         - Review checklist
         - Learning resources

Deliverables for Your Team

□ Technology decisions documented (ADRs)
□ Security guidelines established
□ Evaluation standards defined
□ Learning paths for team members
□ Design doc template for AI features
□ Code review checklist for AI code

Quick Reference: Chapter by Topic

By Role/Level

Topic	AI Engineer II	Senior	Staff+
LLM Basics	Ch 5, 6	Ch 5, 6	Ch 5 (deep)
RAG	Ch 7 (basics)	Ch 7 (full)	Ch 7 + GraphRAG
Agents	Ch 8 (basics)	Ch 8 (full)	Ch 8 + security
Deployment	Ch 9 (API)	Ch 9 (full)	Ch 9 + Ch 27
Evaluation	Ch 15 (basics)	Ch 15 (full)	Ch 15 + custom
Architecture	—	Ch 25 (read)	Ch 25, 26, 30
Leadership	—	Ch 22, 23	Ch 29

By Topic

Topic	Primary Chapters	Supporting
Building RAG	7	5, 6, 9
Building Agents	8	6, 16
Production ML	9, 15, 27	25, 31
System Design	25, 30	9, 27, 31
Security	16	8, 20
Career Growth	21-24	29
Interview Prep	Part II	5, 7, 9, 25

Tracking Your Progress

Use this tracker as you work through your chosen path:

Chapter Progress:
□ Ch 5:  LLM/NLP Foundations      [___________]
□ Ch 6:  Prompt Engineering       [___________]
□ Ch 7:  RAG Systems              [___________]
□ Ch 8:  Agentic Systems          [___________]
□ Ch 9:  LLM Deployment           [___________]
□ Ch 10: Orchestration Frameworks [___________]
□ Ch 11: Observability/Guardrails [___________]
□ Ch 12: Cloud AI Providers       [___________]
□ Ch 13: Multi-Cloud Patterns     [___________]
□ Ch 14: Backend Engineering      [___________]
□ Ch 15: MLOps & Evaluation       [___________]
□ Ch 16: Security                 [___________]
□ Ch 17: Vision & Document AI     [___________]
□ Ch 18: Audio & Speech           [___________]
□ Ch 19: Video & Multimodal RAG   [___________]
□ Ch 20: Responsible AI           [___________]
□ Ch 21: Technical Expertise      [___________]
□ Ch 22: Project Ownership        [___________]
□ Ch 23: Technical Communication  [___________]
□ Ch 24: Mentorship               [___________]
□ Ch 25: System Design            [___________]
□ Ch 26: Decision Making          [___________]
□ Ch 27: Performance              [___________]
□ Ch 28: Research-to-Production   [___________]
□ Ch 29: Technical Leadership     [___________]
□ Ch 30: Data Architecture        [___________]
□ Ch 31: Reliability              [___________]
□ Ch 32: Cost Engineering         [___________]

Projects Completed:
□ Basic RAG system
□ Agent with tools
□ Evaluation pipeline
□ Production deployment
□ Capstone project

Skills Verified:
□ Can explain core concepts
□ Can implement from scratch
□ Can debug common issues
□ Can design systems
□ Can lead discussions

Part II: Interview Preparation

This section provides guidance for AI/ML engineering interviews at various levels. It covers what to expect, how to prepare, and sample questions with frameworks for answering.

Interview Process Overview

Typical AI Engineering Interview Loop

Stage	Duration	Focus Areas
Phone Screen	30-60 min	ML basics, experience, motivation
Technical Coding	45-60 min	DS&A, ML-specific problems, code quality
System Design	45-60 min	Architecture, scale concerns, trade-offs
Behavioral & Culture	30-45 min	Past work, leadership, collaboration

Preparation by Level

ML Engineer / AI Engineer (Entry to Mid-Level)

Coding Focus: Strong fundamentals, clean implementation

ML Fundamentals: Know the basics cold
- How do common algorithms work (linear regression, decision trees, neural networks)?
- What is overfitting and how do you prevent it?
- Explain gradient descent
- What metrics do you use for classification vs. regression?
- How do you handle imbalanced data?

Practical Experience: Demonstrate you can ship ML
- Tell me about an ML project you’ve worked on
- What challenges did you face? How did you debug them?
- How did you evaluate your model?

System Design: Basic ML pipelines
- Design a spam classifier
- Build a recommendation system for a small app
- Create a sentiment analysis API

Senior ML Engineer (4-7 years)

Coding Focus: Efficiency and ML-specific implementations
- Implement attention mechanism
- Write efficient data pipeline code
- Optimize inference bottlenecks

ML Depth: Strong understanding of at least one area
- Deep dive on transformers, training dynamics, or retrieval
- Trade-offs between different approaches
- When to use what technique

System Design: Production-grade systems
- Design a real-time recommendation system at scale
- Build a RAG system for customer support
- Create an LLM-powered search feature

Leadership: Technical decision-making
- How do you evaluate build vs. buy?
- Tell me about a technical decision you influenced
- How do you balance quality and velocity?

Staff Engineer (7+ years)

Technical Strategy: Architecture and direction
- How would you design an ML platform?
- What’s your approach to technical roadmapping?
- How do you evaluate new technologies?

Cross-Functional Leadership: Influence and alignment
- Tell me about a time you drove alignment across teams
- How do you communicate technical concepts to non-technical stakeholders?
- Describe a cross-team initiative you led

System Design: Complex, multi-team systems
- Design a company-wide feature platform
- Architect an LLM infrastructure for multiple products
- Build a real-time ML serving system handling millions of QPS

Technical Judgment: Deep trade-off analysis
- When would you not use ML?
- How do you decide between approaches?
- What’s your framework for technical debt?

Core Technical Areas

Coding Questions

Data Structures & Algorithms (Standard)
- Arrays, linked lists, trees, graphs
- Sorting, searching, dynamic programming
- Time and space complexity

ML-Specific Coding
- Implement softmax, cross-entropy loss
- Write a simple neural network forward pass
- Process and clean a dataset
- Implement k-means clustering
- Build a simple recommendation algorithm
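
For the softmax and cross-entropy items, a minimal pure-Python sketch might look like the following. The function names and the single-example signature are illustrative assumptions; an interviewer may instead ask for a batched NumPy or PyTorch version. The point worth showing is numerical stability: subtracting the maximum logit before exponentiating avoids overflow.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    if not logits:
        raise ValueError("logits cannot be empty")
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy loss for one example, given raw logits and the true class index."""
    if not 0 <= target < len(logits):
        raise ValueError("target index out of range")
    shift = max(logits)
    # log-sum-exp trick: log(sum(exp(x))) computed without overflow
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    # Equivalent to -log(softmax(logits)[target]), without forming the probabilities
    return log_sum_exp - logits[target]
```

Computing the loss directly from logits, rather than taking the log of the softmax output, avoids `log(0)` when a probability underflows.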

Sample Questions:

Implement a function to compute cosine similarity between two vectors
Write efficient code to find the k most similar items given embeddings
Implement a simple tokenizer (word-level or BPE-style)
Write a function to compute precision, recall, and F1
Implement stratified train/test split

Framework for Coding Interviews:

1. Clarify requirements (5 min)
   - Input/output format
   - Edge cases
   - Scale expectations

2. Design approach (5 min)
   - Explain your strategy
   - Discuss trade-offs
   - Get interviewer buy-in

3. Implement (25 min)
   - Write clean code
   - Handle edge cases
   - Test as you go

4. Test and optimize (5 min)
   - Walk through examples
   - Discuss optimizations
   - Consider extensions

ML System Design

Framework for ML System Design:

1. Clarify Requirements (5-10 min)
   ├── Business objective
   ├── Scale (users, requests, data)
   ├── Latency requirements
   ├── Data availability
   └── Success metrics

2. High-Level Architecture (10 min)
   ├── Data pipeline
   ├── Feature engineering
   ├── Model selection
   ├── Training infrastructure
   └── Serving system

3. Deep Dives (20-30 min)
   ├── Model choice and trade-offs
   ├── Feature design
   ├── Handling scale
   ├── Evaluation strategy
   └── Failure modes

4. Production Considerations (10 min)
   ├── Monitoring and observability
   ├── A/B testing
   ├── Iterative improvement
   └── Cost considerations

Common ML System Design Questions:

1. Design a recommendation system for [Netflix/Spotify/E-commerce]
   - Candidate generation vs. ranking
   - Cold start handling
   - Real-time vs. batch
   - Evaluation metrics

2. Design a search ranking system
   - Query understanding
   - Retrieval (BM25, dense, hybrid)
   - Ranking model
   - Online/offline evaluation

3. Design a RAG-based chatbot for customer support
   - Document processing and chunking
   - Embedding and retrieval
   - Generation with guardrails
   - Fallback handling

4. Design a content moderation system
   - Multi-class classification
   - Handling edge cases
   - Human-in-the-loop
   - False positive/negative trade-offs

5. Design a fraud detection system
   - Real-time vs. batch
   - Feature engineering
   - Handling class imbalance
   - Concept drift

6. Design an LLM serving infrastructure
   - Model selection and routing
   - Caching strategies
   - Rate limiting
   - Cost optimization

ML Fundamentals

Topics to Know Deeply:

Supervised Learning
- Linear/logistic regression
- Decision trees, random forests
- Gradient boosting (XGBoost, LightGBM)
- Neural networks
- Bias-variance trade-off
- Regularization techniques

Deep Learning
- Backpropagation
- Activation functions
- Batch normalization, layer normalization
- Attention mechanism
- Transformer architecture
- Training dynamics (learning rate, batch size)

NLP/LLM Specific
- Tokenization (BPE, WordPiece)
- Word embeddings (Word2Vec, FastText)
- Transformer architecture in detail
- Pre-training objectives (MLM, CLM)
- Fine-tuning approaches (full fine-tuning, PEFT methods such as LoRA)
- Prompting and in-context learning

Evaluation
- Classification metrics (precision, recall, F1, AUC-ROC)
- Regression metrics (MSE, MAE, R²)
- Ranking metrics (NDCG, MRR, MAP)
- A/B testing methodology
- Statistical significance

Common Questions:

Explain the attention mechanism. Why does it work?
What’s the difference between batch norm and layer norm?
How does dropout work? Why does it help?
What is the vanishing gradient problem? How do you address it?
Explain the transformer architecture. What are the key components?
What’s the difference between encoder-only, decoder-only, and encoder-decoder?
How does RLHF work? What are alternatives?
What is prompt injection and how do you defend against it?
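
For the attention questions in the list above, it helps to be able to write the core computation from memory. The following is a minimal single-head sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)V, in pure Python. It deliberately omits masking, multiple heads, and learned projections, and the names `attention` and `Matrix` are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output row is the attention-weighted average of the value vectors
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output
```

When all keys are identical, the weights are uniform and the output is the plain average of the values. When one key matches the query far better than the others, the output approaches that key's value. That behavior is a useful way to explain attention as a soft, differentiable lookup.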

Behavioral Interview Preparation

Common Behavioral Questions

Past Projects
- Tell me about a technically challenging ML project
- Describe a time when an ML approach didn’t work. What did you do?
- Walk me through how you improved model performance on a difficult problem

Collaboration
- Tell me about a time you disagreed with a teammate on a technical approach
- How do you work with non-ML engineers?
- Describe a cross-functional project you led

Leadership
- How do you mentor junior engineers?
- Tell me about a time you had to make a difficult technical decision
- How do you prioritize when you have competing demands?

Growth
- What’s a technical area where you’ve grown significantly?
- Tell me about a failure and what you learned
- How do you stay current with ML developments?

STAR Framework

Structure behavioral answers using STAR:

Situation: Brief context (2-3 sentences)
Task: Your specific responsibility
Action: What YOU did (most important part)
Result: Quantified outcome and learnings

Example:

Question: Tell me about a time an ML project failed

Situation: “We were building a recommendation system for a new feature, and three weeks before launch, we realized the model wasn’t performing well on our key metric.”

Task: “As the ML lead, I was responsible for diagnosing the issue and getting us back on track for launch.”

Action: “I did three things: First, I analyzed our evaluation data and found we had significant data leakage in our offline metrics—our offline numbers looked great but didn’t reflect real performance. Second, I designed a quick A/B test to validate, which confirmed the issue. Third, I proposed a simpler baseline model that we could ship on time, with plans to iterate.”

Result: “We launched with the simpler model, which still improved our metric by 8% over the control. The experience led me to establish better evaluation practices for the team, including mandatory online/offline correlation checks. The follow-up version, with proper evaluation, achieved 15% improvement.”

Sample Interview Questions by Topic

LLM and Prompting

How would you improve a prompt that’s giving inconsistent results?
What’s the difference between zero-shot, few-shot, and fine-tuning?
How would you evaluate an LLM-based feature?
Explain chain-of-thought prompting. When would you use it?
How do you handle hallucinations in production?

RAG Systems

Walk me through designing a RAG system end-to-end
How do you evaluate retrieval quality?
When would you use dense vs. sparse retrieval?
How do you handle documents that are too long for context?
What are failure modes of RAG and how do you address them?

Training and Fine-Tuning

When would you fine-tune vs. prompt an LLM?
Explain LoRA. Why does it work?
How do you prevent catastrophic forgetting during fine-tuning?
What’s the difference between SFT, RLHF, and DPO?
How do you decide on training data quantity and quality?

Infrastructure and Scale

How would you reduce inference latency for an LLM service?
Explain the trade-offs of different quantization approaches
How would you handle 10x traffic increase?
Design a caching strategy for an LLM application
How do you monitor an ML system in production?

Data and Features

How do you handle training-serving skew?
What is a feature store and when would you use one?
How do you detect and handle data drift?
Explain the importance of point-in-time correctness
How do you version ML data and models?

Interview Day Tips

Before the Interview

Review the company’s ML products and published work
Refresh fundamentals (don’t assume you remember)
Prepare 2-3 projects to discuss in depth
Practice explaining technical concepts aloud
Get good sleep

During the Interview

For Coding:

Think aloud—silence is concerning
Start with brute force, then optimize
Test your code before declaring done
Ask clarifying questions

For System Design:

Drive the conversation—don’t wait for prompts
Draw diagrams (even in virtual interviews)
Discuss trade-offs explicitly
Connect to business requirements

For Behavioral:

Be specific—vague answers don’t score well
Focus on YOUR contributions
Include quantified results when possible
Be honest about failures—they want self-awareness

Questions to Ask Interviewers

Good questions demonstrate genuine interest:

What ML infrastructure do you use?
How do you evaluate ML model success?
What’s the biggest technical challenge the team faces?
How do you balance research and production?
What does the on-call look like for ML systems?

Resources for Preparation

Books

“Designing Machine Learning Systems” (Chip Huyen)
“Machine Learning Engineering” (Andriy Burkov)
“Building Machine Learning Powered Applications” (Emmanuel Ameisen)

Courses

Stanford CS229 (ML fundamentals)
Stanford CS224N (NLP)
fast.ai (Practical deep learning)

Practice

LeetCode for coding
ML system design practice with peers
Mock interviews (Pramp, interviewing.io)

Stay Current

Papers With Code
ML blogs (OpenAI, Anthropic, Google AI)
r/MachineLearning

Detailed Answer Examples

ML System Design Example: Design a Recommendation System

Here’s how to structure your answer for a recommendation system design question:

Opening (Clarify Requirements): “Before I dive in, let me make sure I understand the requirements. Is this for a new feature or an existing system? What’s our scale in terms of users and items? What’s the latency budget for serving recommendations? Do we have historical user data already?”

Assume the answers are: 10M users, 1M items, a 100 ms latency budget, and 6 months of historical click/purchase data.

High-Level Architecture: “I’d structure this as a two-stage system: candidate generation to narrow from 1M items to ~1000, then a ranking model to select the top 10-20.

For candidate generation:

Item-based collaborative filtering for items similar to user’s history
User embedding similarity to find items liked by similar users
Popularity baseline to ensure coverage

For ranking:

Features: user features, item features, cross features (user-item interactions)
Model: Gradient boosted trees initially, can upgrade to neural ranking later
Training: Point-wise learning on click/purchase labels

Serving architecture:

Pre-computed candidate sets refreshed hourly
Online ranking with cached user/item features
Feature store for real-time features (recent clicks)”

Deep Dives (interviewer may ask about specific areas):

“How do you handle cold start?” “For new users: popularity-based recommendations, quickly incorporating their first few interactions. For new items: content-based features (category, description embedding), explore/exploit to get initial signals.”

“How do you evaluate?” “Offline: We’d look at precision@k, recall@k, NDCG. But offline metrics can be misleading, so we’d run A/B tests measuring click-through rate and ultimately conversion. We’d also track diversity and novelty metrics to ensure we’re not just showing popular items.”

“What about serving latency?” “The 100ms budget is tight. I’d pre-compute as much as possible: user embeddings daily, candidate sets hourly. The ranking model would be a lightweight scorer. We’d use caching for frequently requested user recommendations. If needed, we can trade off ranking model complexity for speed.”
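
If the interviewer probes the offline metrics named in the evaluation answer, being able to write them down helps. Below is a minimal sketch of precision@k and NDCG@k. The function names are illustrative, and the NDCG version assumes `gains` holds the relevance of every candidate item in ranked order, so that sorting it gives the ideal ordering. Other formulations use exponential gains (2^rel - 1) or compute the ideal from a separate list of all relevant items.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked_items: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending-gain) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Unlike precision@k, NDCG rewards putting the most relevant items first: a list with its single relevant item at rank 2 scores 1/log2(3), roughly 0.63, rather than 1.0.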

Behavioral Answer Example: Technical Disagreement

Question: “Tell me about a time you disagreed with a teammate on a technical approach.”

Using STAR:

Situation: “On my last team, we were building an LLM-powered search feature. The senior engineer on the project wanted to fine-tune a model specifically for our domain, while I believed a RAG approach with a general model would be better.”

Task: “I needed to advocate for my approach while maintaining a collaborative relationship and being open to being wrong.”

Action: “First, I made sure I deeply understood their reasoning—they were concerned about domain-specific vocabulary that a general model might not handle well. That was a valid concern.

I then proposed we run a quick experiment: two days to prototype both approaches on a subset of queries. I wrote up clear evaluation criteria before we ran the experiment so we’d have objective measures.

The results showed RAG actually performed better on most queries, but fine-tuning won on queries with technical jargon. So we ended up with a hybrid: RAG for most queries with a fine-tuned reranker for technical terms.

Throughout, I made sure to frame it as ‘finding the best approach’ rather than ‘proving I was right.’ When the hybrid emerged, we both felt ownership.”

Result: “The hybrid approach improved our search relevance by 20% over the baseline. More importantly, we established a pattern of data-driven decision making that the team used for subsequent technical debates. The other engineer and I became close collaborators afterward.”

Coding Answer Example: Implement Cosine Similarity

Question: “Implement a function to compute cosine similarity between two vectors.”

Initial Clarification: “Are these dense vectors? Any constraints on length? Should I handle edge cases like zero vectors?”

Solution:

import math
from typing import List

def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Compute cosine similarity between two vectors.

    Args:
        vec1: First vector
        vec2: Second vector (must be same length)

    Returns:
        Cosine similarity in range [-1, 1]

    Raises:
        ValueError: If vectors have different lengths or are empty
    """
    if len(vec1) != len(vec2):
        raise ValueError("Vectors must have same length")

    if len(vec1) == 0:
        raise ValueError("Vectors cannot be empty")

    # Compute dot product and magnitudes
    dot_product = sum(a * b for a, b in zip(vec1, vec2))

    magnitude1 = math.sqrt(sum(a * a for a in vec1))
    magnitude2 = math.sqrt(sum(b * b for b in vec2))

    # Handle zero vectors
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0  # Or could raise error depending on requirements

    return dot_product / (magnitude1 * magnitude2)


# Test cases
def test_cosine_similarity():
    # Identical vectors should have similarity 1
    assert abs(cosine_similarity([1, 0], [1, 0]) - 1.0) < 1e-9

    # Orthogonal vectors should have similarity 0
    assert abs(cosine_similarity([1, 0], [0, 1]) - 0.0) < 1e-9

    # Opposite vectors should have similarity -1
    assert abs(cosine_similarity([1, 0], [-1, 0]) - (-1.0)) < 1e-9

    # General case
    assert abs(cosine_similarity([1, 2, 3], [4, 5, 6]) - 0.9746) < 0.001

    print("All tests passed!")

test_cosine_similarity()

Optimization Discussion: “If we’re computing many similarities, we’d use NumPy to vectorize the computation. If vectors are reused, we could pre-compute their magnitudes, or normalize them once so that cosine similarity reduces to a dot product. For nearest-neighbor search over very large collections, we’d consider approximate methods like LSH.”
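
The idea of pre-computing magnitudes can be sketched in plain Python: normalize each stored vector once, so that every later query needs only dot products. This is a sketch under stated assumptions, not a definitive implementation. `SimilarityIndex`, `normalize`, and `top_k` are hypothetical names, the search is an exact linear scan (not LSH), and a NumPy version would replace the inner loops with a single matrix-vector product.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector stays all zeros."""
    magnitude = math.sqrt(sum(x * x for x in vec))
    if magnitude == 0:
        return [0.0] * len(vec)
    return [x / magnitude for x in vec]


class SimilarityIndex:
    """Hypothetical helper that normalizes stored vectors once, up front."""

    def __init__(self, vectors: Sequence[Sequence[float]]) -> None:
        if not vectors:
            raise ValueError("Index needs at least one vector")
        self._dim = len(vectors[0])
        if any(len(v) != self._dim for v in vectors):
            raise ValueError("All vectors must have the same length")
        self._unit_vectors = [normalize(v) for v in vectors]

    def top_k(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (index, cosine similarity) pairs, best first."""
        if len(query) != self._dim:
            raise ValueError("Query length must match the indexed vectors")
        unit_query = normalize(query)
        # With unit vectors on both sides, cosine similarity is a dot product.
        scores = (
            (sum(q * x for q, x in zip(unit_query, vec)), idx)
            for idx, vec in enumerate(self._unit_vectors)
        )
        best = heapq.nlargest(k, scores)
        return [(idx, score) for score, idx in best]
```

Using `heapq.nlargest` avoids sorting every score when only the top k are needed, which is a natural follow-up to mention if the interviewer asks about finding the k most similar items.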

Common Mistakes to Avoid

In Coding Interviews

Starting to code too quickly: Spend time understanding and planning
Not testing: Always walk through examples before saying “done”
Silence: Keep talking about your thought process
Ignoring edge cases: Empty inputs, negative numbers, etc.
Over-engineering: Start simple, optimize if asked

In System Design

Not clarifying requirements: Don’t assume—ask
Jumping to details: Start with high-level architecture
Not discussing trade-offs: Every choice has pros and cons
Forgetting operations: Monitoring, debugging, iteration
Ignoring scale: Think about 10x and 100x scenarios

In ML Fundamentals

Memorizing without understanding: You won’t be able to handle follow-up questions
Not connecting to practice: Know when to use what
Overcomplicating: Simple explanations are better
Ignoring limitations: Every technique has failure modes

In Behavioral

Being too vague: Specific stories are convincing
Forgetting the result: STAR requires outcomes
Blurring your contribution: Crediting the team with “we” is fine, but make YOUR role explicit
Only successes: Failure stories show self-awareness

Mock Interview Script

Practice with a friend using this structure:

Round 1: Coding (45 min)

5 min: Introductions
35 min: Coding problem (e.g., “Implement k-nearest neighbors”)
5 min: Questions

Round 2: ML Design (45 min)

5 min: Introductions
35 min: Design problem (e.g., “Design a content recommendation system”)
5 min: Questions

Round 3: ML Depth (30 min)

5 min: Introductions
20 min: Deep technical discussion (e.g., “Explain transformers in detail”)
5 min: Questions

Round 4: Behavioral (30 min)

5 min: Introductions
20 min: Behavioral questions
5 min: Questions

Feedback Session (15 min)

What went well
Areas for improvement
Specific suggestions

Week-by-Week Preparation Plan

4 Weeks Before Interview

Week 1: Foundations

Review ML fundamentals (algorithms, evaluation)
Start coding practice (2 problems/day)
List your top 3-4 projects with details

Week 2: System Design

Study ML system design patterns
Practice 2-3 design problems
Review the company’s ML products and papers

Week 3: Deep Dives

Pick 1-2 areas to go deep
Read relevant papers
Prepare to explain at multiple levels

Week 4: Polish

Mock interviews (at least 2)
Behavioral story preparation
Rest and mental preparation

Day Before

Review your project stories
Light review of fundamentals (don’t cram)
Prepare questions for interviewers
Lay out interview logistics
Get good sleep

Day Of

Eat well, stay hydrated
Arrive/log in early
Have paper/pen ready
Deep breaths between rounds
Stay positive regardless of how rounds feel

Final Advice

Be Yourself: Authentic enthusiasm for ML comes through. If you’re genuinely interested in the field, let that show.

Admit Uncertainty: “I don’t know, but here’s how I’d approach finding out” is better than pretending to know.

Show Growth Mindset: Talk about what you’ve learned, not just what you know.

Practice: Interview skills improve with practice. Do mock interviews.

Learn from Rejections: Each interview is a learning opportunity. Ask for feedback when possible.

Good luck with your interviews!

Company-Specific Preparation

Big Tech (Google, Meta, Amazon, Microsoft)

Focus Areas:

Heavy emphasis on coding (expect 2+ rounds)
System design at scale (billions of users)
Leadership principles (especially Amazon)
Past project deep dives

Preparation Tips:

Practice with LeetCode medium/hard problems
Study their published research papers
Understand their ML infrastructure (TensorFlow, PyTorch, internal tools)
Prepare stories that map to leadership principles

AI-First Companies (OpenAI, Anthropic, Cohere)

Focus Areas:

Deep technical understanding of LLMs
Research awareness and curiosity
Safety and alignment considerations
Novel problem-solving ability

Preparation Tips:

Read their recent papers and blog posts
Understand current LLM limitations
Have opinions on open research questions
Think about safety implications

ML-Heavy Startups

Focus Areas:

End-to-end ownership
Scrappiness and velocity
Breadth across ML stack
Business impact awareness

Preparation Tips:

Understand their product and ML use cases
Prepare for “wear many hats” scenarios
Emphasize shipping and iteration
Show flexibility in approaches

Traditional Tech with ML Teams

Focus Areas:

Production ML experience
Working with non-ML teams
Integrating ML into existing systems
Business metric focus

Preparation Tips:

Understand their core business
Prepare for less ML-specific coding
Emphasize collaboration and communication
Focus on practical impact over research novelty

Handling Difficult Interview Situations

When You Don’t Know the Answer

Good response: “I’m not sure about that specific detail, but here’s how I’d approach figuring it out…” or “I haven’t worked with that directly, but based on my understanding of similar systems…”

Why it works: Shows intellectual honesty and problem-solving approach

When You Make a Mistake

Good response: “Actually, let me reconsider that—I think I made an error. The correct approach is…”

Why it works: Self-correction shows strong fundamentals and self-awareness

When the Problem is Unclear

Good response: “Let me make sure I understand the requirements. Are we optimizing for X or Y? What constraints do we have?”

Why it works: Clarification is expected and shows good engineering practice

When You’re Stuck

Good response: “I’m thinking about a few different approaches. Let me talk through what I’m considering…” Then explain your thought process.

Why it works: Interviewers want to see how you think, not just the answer

When the Interviewer Seems Unengaged

Good response: Stay focused and professional. Ask if they have any questions or want you to go deeper on any area.

Why it works: Some interviewers are naturally reserved; don’t let it affect your performance

Post-Interview

After Each Round

Take brief notes on questions asked
Note any areas where you felt weak
Don’t obsess—move on to the next round

After the Interview Day

Send thank-you notes (optional but good practice)
Reflect on what went well and what to improve
Don’t second-guess your performance excessively

If You Get an Offer

Ask for detailed leveling and compensation breakdown
Negotiate thoughtfully (it’s expected)
Ask about team matching and projects

If You Get Rejected

Ask for feedback (may or may not be provided)
Reflect honestly on areas to improve
Many successful candidates were rejected at least once before landing an offer
Many companies allow reapplying after a waiting period, often 6-12 months

Additional Practice Questions

LLM Deep Dive Questions

Walk me through the transformer architecture component by component.
How does self-attention differ from cross-attention?
Explain the difference between pre-training and fine-tuning objectives.
What are the key challenges in training large language models?
How would you debug a model that’s producing inconsistent outputs?
Explain RLHF at a technical level. What are alternatives?
What’s the relationship between perplexity and quality?
How do you think about the trade-off between model size and cost?

Production ML Questions

How do you ensure your offline metrics correlate with online performance?
Describe your approach to ML model monitoring.
How do you handle model drift in production?
What’s your framework for deciding when to retrain a model?
How do you debug a model that’s performing poorly in production?
Describe a production ML system you’ve built or maintained.
How do you balance technical debt and feature velocity in ML systems?
What are the most common causes of ML system failures you’ve seen?

Leadership and Strategy Questions (for Senior+)

How do you evaluate whether to use ML for a problem?
Describe how you’ve influenced technical direction outside your team.
How do you balance investment in ML infrastructure vs. applications?
Tell me about a time you had to make a difficult technical trade-off.
How do you stay current with ML developments while being productive?
What’s your approach to building high-performing ML teams?
How do you communicate ML capabilities and limitations to stakeholders?
What do you see as the most important trends in AI engineering?

# Appendix D: Career Development {.unnumbered} This appendix combines two essential resources for your AI engineering career: **Learning Paths** to guide your study through the textbook based on your background and goals, and **Interview Preparation** to help you succeed in AI/ML engineering interviews at all levels. --- # Part I: Learning Paths This section provides guided reading orders through the textbook based on your background and goals. Each path includes the recommended sequence, time estimates, and key focus areas. --- ## How to Use These Paths 1. **Choose your path** based on current role and goal 2. **Follow the sequence** - chapters build on each other 3. **Do the exercises** - reading alone isn't enough 4. **Build something** - apply knowledge to a real project 5. **Revisit as needed** - use as reference later --- ## Path 1: New to AI/ML (SWE Background) **For**: Software engineers with no ML experience **Time**: 4-6 weeks (10-15 hours/week) **Goal**: Foundational understanding to contribute to AI projects ### Week 1: Understanding LLMs ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ WEEK 1: FOUNDATIONS │ └─────────────────────────────────────────────────────────────────────────────┘ Day 1-2: Chapter 5 - LLM/NLP Foundations Focus: Tokenization, embeddings, attention mechanism Skip: Mathematical details (return later if needed) Day 3-4: Chapter 6 - Prompt Engineering Focus: Structured outputs, few-shot learning Exercise: Try 10 different prompting patterns on a real task Day 5: Appendix A - Glossary Review all terms, note unfamiliar ones ``` **Key Concepts to Master**: - How tokenization works - What embeddings represent - Why attention matters - Basic prompt patterns ### Week 2: Building RAG Systems ``` Day 1-3: Chapter 7 - RAG Systems Deep Dive Focus: Chunking, embeddings, basic retrieval Skip: GraphRAG, advanced patterns (come back in Week 5) Day 4-5: Hands-on Practice Build a simple RAG system with: - 100 
documents - Basic chunking - Vector search - Simple prompt template ``` **Key Concepts to Master**: - Chunking strategies - Vector similarity search - Context assembly ### Week 3: Deployment Basics ``` Day 1-2: Chapter 9 - LLM Deployment Focus: API integration, basic serving concepts Skip: GPU optimization (defer to later) Day 3-4: Chapter 14 - Backend Engineering for AI Focus: Testing, debugging LLM apps Exercise: Write tests for your Week 2 RAG system Day 5: Review and consolidate Complete practical exercises ``` ### Week 4: Evaluation and Quality ``` Day 1-3: Chapter 15 - MLOps & Evaluation Focus: Basic metrics, simple evaluation pipelines Skip: Complex A/B testing (return when needed) Day 4-5: Build an evaluation pipeline for your RAG system ``` ### Week 5: Expanding Knowledge ``` Day 1-2: Chapter 8 - Agentic Systems Focus: Tool use, basic agent patterns Day 3-4: Chapter 16 - Security Focus: Prompt injection basics Day 5: Part II of this appendix (Interview Prep - AI Engineer II section) ``` ### Week 6: Integration Project Build a complete application using what you've learned: - Document Q&A system with evaluation - Include basic security measures - Write comprehensive tests ### Path 1 Completion Checklist ``` □ Can explain how an LLM generates text □ Can write effective prompts for different tasks □ Can build a basic RAG system □ Can evaluate RAG quality □ Understand basic security concerns □ Built at least one working application ``` --- ## Path 2: Backend Engineer Adding AI **For**: Experienced backend engineers integrating AI into existing systems **Time**: 3-4 weeks (15-20 hours/week) **Goal**: Productionize AI features in backend services ### Week 1: Practical LLM Integration ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ WEEK 1: INTEGRATION PATTERNS │ └─────────────────────────────────────────────────────────────────────────────┘ Day 1: Chapter 6 - Prompt Engineering Focus: Structured outputs, caching strategies 
Day 2-3: Chapter 9 - LLM Deployment Focus: API integration, rate limiting, error handling Exercise: Build a robust LLM client with retries Day 4-5: Chapter 14 - Backend Engineering for AI Focus: Integration patterns, testing strategies ``` **Key Patterns to Implement**: - Retry with exponential backoff - Response caching - Structured output parsing - Graceful degradation ### Week 2: RAG for Production ``` Day 1-2: Chapter 7 - RAG Systems (focus on production sections) Focus: Vector databases, scaling, caching Day 3-4: Chapter 31 - Reliability Engineering Focus: SLOs for AI, graceful degradation Day 5: Design a production RAG architecture Include: caching, monitoring, fallbacks ``` ### Week 3: Security and Scale ``` Day 1-2: Chapter 16 - Security Focus: Defense-in-depth, input validation Exercise: Implement security layers Day 3-4: Chapter 25 - System Design at Scale Focus: AI-specific scaling patterns Day 5: Chapter 32 - Cost Engineering Focus: Cost optimization, attribution ``` ### Week 4: Agentic Systems ``` Day 1-3: Chapter 8 - Agentic Systems Focus: Tool use, safety constraints Exercise: Build a tool-using agent Day 4-5: Integration project Add AI feature to an existing backend service ``` ### Path 2 Completion Checklist ``` □ Can integrate LLM APIs with proper error handling □ Can design production RAG architecture □ Can implement AI security measures □ Understand cost implications of AI features □ Can build and deploy tool-using agents □ Integrated AI into at least one backend service ``` --- ## Path 3: Senior to Staff Transition **For**: Senior engineers preparing for Staff-level AI responsibilities **Time**: 6-8 weeks (15-20 hours/week) **Goal**: Lead AI system design and cross-team initiatives ### Phase 1: Technical Depth (Weeks 1-3) ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ PHASE 1: DEEP TECHNICAL KNOWLEDGE │ └─────────────────────────────────────────────────────────────────────────────┘ Week 1: - Chapter 5: LLM 
Foundations (deep read, including math) - Chapter 27: Performance Engineering (full chapter) - Appendix C: Read 3-5 foundational papers Week 2: - Chapter 7: RAG Systems (including GraphRAG, advanced patterns) - Chapter 30: Data Architecture for AI - Exercise: Design a feature store Week 3: - Chapter 8: Agentic Systems (full depth) - Chapter 16: Security (including agentic security) - Chapter 9: Deployment (optimization focus) ``` ### Phase 2: Architecture and Scale (Weeks 4-5) ``` Week 4: - Chapter 25: System Design at Scale - Chapter 31: Reliability Engineering - Exercise: Design a system handling 1M requests/day Week 5: - Chapter 26: Technical Decision Making - Chapter 32: Cost Engineering - Appendix G: Study ADR examples - Exercise: Write 3 ADRs for hypothetical decisions ``` ### Phase 3: Leadership Skills (Weeks 6-7) ``` Week 6: - Chapter 21: Deepening Technical Expertise - Chapter 28: Research-to-Production - Chapter 29: Cross-Team Technical Leadership - Appendix C: Identify 2-3 papers to implement Week 7: - Chapter 23: Technical Communication - Chapter 24: Mentorship Foundations - Exercise: Write a design doc for an AI system - Exercise: Create a learning plan for a junior engineer ``` ### Phase 4: Capstone (Week 8) ``` - Review Part II of this appendix (Interview Prep - Staff level) - Complete one capstone project from Appendix E - Present design to peers for feedback ``` ### Path 3 Key Deliverables ``` □ 3 ADRs written for AI decisions □ 1 system design document □ 1 design doc reviewed by peers □ 2-3 papers summarized with implementation notes □ 1 capstone project completed □ Can explain tradeoffs in major AI architecture decisions ``` --- ## Path 4: Interview Preparation (2-Week Intensive) **For**: Engineers preparing for AI/ML engineering interviews **Time**: 2 weeks (25-30 hours/week) **Goal**: Pass technical interviews for AI engineering roles ### Week 1: Core Knowledge ``` ┌─────────────────────────────────────────────────────────────────────────────┐ 
│ WEEK 1: FUNDAMENTALS │ └─────────────────────────────────────────────────────────────────────────────┘ Day 1: Morning: Chapter 5 - LLM Foundations (3 hours) Afternoon: Practice explaining transformers, attention Day 2: Morning: Chapter 7 - RAG Systems (3 hours) Afternoon: Implement a basic RAG system from scratch Day 3: Morning: Chapter 9 - Deployment (2 hours) Afternoon: Chapter 27 - Performance (2 hours, focus on inference) Day 4: Morning: Chapter 15 - Evaluation (2 hours) Afternoon: Practice evaluation questions Day 5: Morning: Chapter 8 - Agentic Systems (3 hours) Afternoon: Practice system design scenarios ``` ### Week 2: System Design + Practice ``` Day 1: Morning: Chapter 25 - System Design at Scale Afternoon: Practice design: "Design a RAG system for 10M documents" Day 2: Morning: Chapter 16 - Security Afternoon: Practice design: "Add AI to an existing product" Day 3: Morning: Chapter 32 - Cost Engineering Afternoon: Practice: Cost estimation exercises Day 4: Full day: Part II of this appendix (Interview Preparation) Do all practice questions Day 5: Mock interviews with peers Review weak areas ``` ### Interview Topics Checklist ``` Fundamentals: □ Explain transformers and attention □ Compare embedding models □ Describe tokenization approaches RAG: □ Design a RAG pipeline □ Discuss chunking tradeoffs □ Explain hybrid search □ Handle evaluation Deployment: □ Compare vLLM vs TGI □ Explain batching strategies □ Discuss quantization tradeoffs System Design: □ Design AI search system □ Design AI chatbot at scale □ Design agent with tools □ Handle failure scenarios Evaluation: □ Design evaluation pipeline □ Discuss LLM-as-judge □ Explain A/B testing for AI Security: □ Explain prompt injection □ Design defense-in-depth □ Agent safety patterns ``` --- ## Path 5: ML Engineer Deepening LLM Knowledge **For**: ML engineers with classical ML background, new to LLMs **Time**: 3-4 weeks (15-20 hours/week) **Goal**: Bridge from traditional ML to LLM engineering ### Week 
1: LLM-Specific Concepts ``` Day 1-2: Chapter 5 - LLM Foundations Compare to your existing ML knowledge Focus: What's different from traditional ML? Day 3-4: Chapter 6 - Prompt Engineering This is new - no training loop equivalent Day 5: Chapter 14 - Backend Engineering Focus: Fine-tuning section Compare: How is this different from traditional training? ``` ### Week 2: Retrieval and Data ``` Day 1-2: Chapter 7 - RAG Systems Your embedding/similarity search knowledge transfers! Day 3-4: Chapter 30 - Data Architecture Focus: Training-serving skew (familiar concept, LLM specifics) Day 5: Chapter 15 - MLOps & Evaluation Focus: LLM-specific evaluation challenges ``` ### Week 3: Production Systems ``` Day 1-2: Chapter 9 - Deployment Focus: How LLM serving differs from traditional model serving Day 3-4: Chapter 27 - Performance Engineering Your GPU knowledge helps! Focus on LLM-specific optimizations Day 5: Chapter 8 - Agentic Systems New paradigm - models as reasoning engines ``` ### Week 4: Advanced Topics ``` Day 1-2: Chapter 25 - System Design Apply your architecture knowledge to LLM systems Day 3-4: Chapter 16 - Security New attack vectors specific to LLMs Day 5: Review and gap-fill ``` ### Key Differences to Understand | Traditional ML | LLM Engineering | |----------------|-----------------| | Training is central | Prompting is central | | Feature engineering | Context engineering | | Model versioning | Prompt versioning | | Batch inference common | Real-time generation | | Training data curation | RAG corpus curation | | Metrics well-defined | Metrics often subjective | --- ## Path 6: Tech Lead Establishing AI Practice **For**: Tech leads setting up AI capabilities for their team **Time**: 4 weeks (10-15 hours/week) **Goal**: Establish team processes and make sound technical decisions ### Week 1: Foundation Knowledge ``` Day 1: Chapter 5 - LLM Foundations (skim for concepts) Day 2: Chapter 6 - Prompt Engineering (practical focus) Day 3: Chapter 7 - RAG Systems 
(architecture focus) Day 4: Chapter 9 - Deployment (infrastructure decisions) Day 5: Appendix B - Tools overview ``` ### Week 2: Making Decisions ``` Day 1-2: Chapter 26 - Technical Decision Making Study ADR process Day 3: Appendix G - Architecture Decision Records Review all example ADRs Day 4-5: Write ADRs for your team's AI decisions: - Which LLM to use - Self-hosted vs API - Vector database selection ``` ### Week 3: Process and Quality ``` Day 1-2: Chapter 15 - MLOps & Evaluation Focus: What processes does your team need? Day 3: Chapter 16 - Security Focus: What policies to establish? Day 4-5: Chapter 20 - Responsible AI Focus: Governance processes ``` ### Week 4: Team Development ``` Day 1: Chapter 24 - Mentorship How to grow AI skills on your team Day 2: Chapter 23 - Technical Communication Standards for design docs Day 3: This appendix (Learning Paths section) Create learning paths for team members Day 4-5: Create team documentation: - AI coding standards - Review checklist - Learning resources ``` ### Deliverables for Your Team ``` □ Technology decisions documented (ADRs) □ Security guidelines established □ Evaluation standards defined □ Learning paths for team members □ Design doc template for AI features □ Code review checklist for AI code ``` --- ## Quick Reference: Chapter by Topic ### By Role/Level | Topic | AI Engineer II | Senior | Staff+ | |-------|-------------|--------|--------| | LLM Basics | Ch 5, 6 | Ch 5, 6 | Ch 5 (deep) | | RAG | Ch 7 (basics) | Ch 7 (full) | Ch 7 + GraphRAG | | Agents | Ch 8 (basics) | Ch 8 (full) | Ch 8 + security | | Deployment | Ch 9 (API) | Ch 9 (full) | Ch 9 + Ch 27 | | Evaluation | Ch 15 (basics) | Ch 15 (full) | Ch 15 + custom | | Architecture | — | Ch 25 (read) | Ch 25, 26, 30 | | Leadership | — | Ch 22, 23 | Ch 29 | ### By Topic | Topic | Primary Chapters | Supporting | |-------|-----------------|------------| | Building RAG | 7 | 5, 6, 9 | | Building Agents | 8 | 6, 16 | | Production ML | 9, 15, 27 | 25, 31 | | 
System Design | 25, 30 | 9, 27, 31 | | Security | 16 | 8, 20 | | Career Growth | 21-24 | 29 | | Interview Prep | Part II | 5, 7, 9, 25 | --- ## Tracking Your Progress Use this tracker as you work through your chosen path: ``` Chapter Progress: □ Ch 5: LLM/NLP Foundations [___________] □ Ch 6: Prompt Engineering [___________] □ Ch 7: RAG Systems [___________] □ Ch 8: Agentic Systems [___________] □ Ch 9: LLM Deployment [___________] □ Ch 10: Orchestration Frameworks [___________] □ Ch 11: Observability/Guardrails [___________] □ Ch 12: Cloud AI Providers [___________] □ Ch 13: Multi-Cloud Patterns [___________] □ Ch 14: Backend Engineering [___________] □ Ch 15: MLOps & Evaluation [___________] □ Ch 16: Security [___________] □ Ch 17: Vision & Document AI [___________] □ Ch 18: Audio & Speech [___________] □ Ch 19: Video & Multimodal RAG [___________] □ Ch 20: Responsible AI [___________] □ Ch 21: Technical Expertise [___________] □ Ch 22: Project Ownership [___________] □ Ch 23: Technical Communication [___________] □ Ch 24: Mentorship [___________] □ Ch 25: System Design [___________] □ Ch 26: Decision Making [___________] □ Ch 27: Performance [___________] □ Ch 28: Research-to-Production [___________] □ Ch 29: Technical Leadership [___________] □ Ch 30: Data Architecture [___________] □ Ch 31: Reliability [___________] □ Ch 32: Cost Engineering [___________] Projects Completed: □ Basic RAG system □ Agent with tools □ Evaluation pipeline □ Production deployment □ Capstone project Skills Verified: □ Can explain core concepts □ Can implement from scratch □ Can debug common issues □ Can design systems □ Can lead discussions ``` --- # Part II: Interview Preparation This section provides guidance for AI/ML engineering interviews at various levels. It covers what to expect, how to prepare, and sample questions with frameworks for answering. 
--- ## Interview Process Overview ### Typical AI Engineering Interview Loop | Stage | Duration | Focus Areas | |-------|----------|-------------| | **Phone Screen** | 30-60 min | ML basics, experience, motivation | | **Technical Coding** | 45-60 min | DS&A, ML-specific problems, code quality | | **System Design** | 45-60 min | Architecture, scale concerns, trade-offs | | **Behavioral & Culture** | 30-45 min | Past work, leadership, collaboration | : Typical AI Engineering Interview Loop --- ## Preparation by Level ### ML Engineer / AI Engineer (Entry to Mid-Level) **Coding Focus**: Strong fundamentals, clean implementation **ML Fundamentals**: Know the basics cold - How do common algorithms work (linear regression, decision trees, neural networks)? - What is overfitting and how do you prevent it? - Explain gradient descent - What metrics do you use for classification vs. regression? - How do you handle imbalanced data? **Practical Experience**: Demonstrate you can ship ML - Tell me about an ML project you've worked on - What challenges did you face? How did you debug them? - How did you evaluate your model? **System Design**: Basic ML pipelines - Design a spam classifier - Build a recommendation system for a small app - Create a sentiment analysis API ### Senior ML Engineer (4-7 years) **Coding Focus**: Efficiency and ML-specific implementations - Implement attention mechanism - Write efficient data pipeline code - Optimize inference bottlenecks **ML Depth**: Strong understanding of at least one area - Deep dive on transformers, training dynamics, or retrieval - Trade-offs between different approaches - When to use what technique **System Design**: Production-grade systems - Design a real-time recommendation system at scale - Build a RAG system for customer support - Create an LLM-powered search feature **Leadership**: Technical decision-making - How do you evaluate build vs. buy? 
- Tell me about a technical decision you influenced - How do you balance quality and velocity? ### Staff Engineer (7+ years) **Technical Strategy**: Architecture and direction - How would you design an ML platform? - What's your approach to technical roadmapping? - How do you evaluate new technologies? **Cross-Functional Leadership**: Influence and alignment - Tell me about a time you drove alignment across teams - How do you communicate technical concepts to non-technical stakeholders? - Describe a cross-team initiative you led **System Design**: Complex, multi-team systems - Design a company-wide feature platform - Architect an LLM infrastructure for multiple products - Build a real-time ML serving system handling millions of QPS **Technical Judgment**: Deep trade-off analysis - When would you not use ML? - How do you decide between approaches? - What's your framework for technical debt? --- ## Core Technical Areas ### Coding Questions **Data Structures & Algorithms** (Standard) - Arrays, linked lists, trees, graphs - Sorting, searching, dynamic programming - Time and space complexity **ML-Specific Coding** - Implement softmax, cross-entropy loss - Write a simple neural network forward pass - Process and clean a dataset - Implement k-means clustering - Build a simple recommendation algorithm **Sample Questions**: 1. Implement a function to compute cosine similarity between two vectors 2. Write efficient code to find the k most similar items given embeddings 3. Implement a simple tokenizer (word-level or BPE-style) 4. Write a function to compute precision, recall, and F1 5. Implement stratified train/test split **Framework for Coding Interviews**: ``` 1. Clarify requirements (5 min) - Input/output format - Edge cases - Scale expectations 2. Design approach (5 min) - Explain your strategy - Discuss trade-offs - Get interviewer buy-in 3. Implement (25 min) - Write clean code - Handle edge cases - Test as you go 4. 
Test and optimize (5 min)
   - Walk through examples
   - Discuss optimizations
   - Consider extensions
```

### ML System Design

**Framework for ML System Design**:

```
1. Clarify Requirements (5-10 min)
   ├── Business objective
   ├── Scale (users, requests, data)
   ├── Latency requirements
   ├── Data availability
   └── Success metrics

2. High-Level Architecture (10 min)
   ├── Data pipeline
   ├── Feature engineering
   ├── Model selection
   ├── Training infrastructure
   └── Serving system

3. Deep Dives (20-30 min)
   ├── Model choice and trade-offs
   ├── Feature design
   ├── Handling scale
   ├── Evaluation strategy
   └── Failure modes

4. Production Considerations (10 min)
   ├── Monitoring and observability
   ├── A/B testing
   ├── Iterative improvement
   └── Cost considerations
```

**Common ML System Design Questions**:

1. **Design a recommendation system** for [Netflix/Spotify/E-commerce]
   - Candidate generation vs. ranking
   - Cold start handling
   - Real-time vs. batch
   - Evaluation metrics

2. **Design a search ranking system**
   - Query understanding
   - Retrieval (BM25, dense, hybrid)
   - Ranking model
   - Online/offline evaluation

3. **Design a RAG-based chatbot** for customer support
   - Document processing and chunking
   - Embedding and retrieval
   - Generation with guardrails
   - Fallback handling

4. **Design a content moderation system**
   - Multi-class classification
   - Handling edge cases
   - Human-in-the-loop
   - False positive/negative trade-offs

5. **Design a fraud detection system**
   - Real-time vs. batch
   - Feature engineering
   - Handling class imbalance
   - Concept drift

6. **Design an LLM serving infrastructure**
   - Model selection and routing
   - Caching strategies
   - Rate limiting
   - Cost optimization

### ML Fundamentals

**Topics to Know Deeply**:

**Supervised Learning**
- Linear/logistic regression
- Decision trees, random forests
- Gradient boosting (XGBoost, LightGBM)
- Neural networks
- Bias-variance trade-off
- Regularization techniques

**Deep Learning**
- Backpropagation
- Activation functions
- Batch normalization, layer normalization
- Attention mechanism
- Transformer architecture
- Training dynamics (learning rate, batch size)

**NLP/LLM Specific**
- Tokenization (BPE, WordPiece)
- Word embeddings (Word2Vec, FastText)
- Transformer architecture in detail
- Pre-training objectives (MLM, CLM)
- Fine-tuning approaches (full fine-tuning; parameter-efficient methods such as LoRA)
- Prompting and in-context learning

**Evaluation**
- Classification metrics (precision, recall, F1, AUC-ROC)
- Regression metrics (MSE, MAE, R²)
- Ranking metrics (NDCG, MRR, MAP)
- A/B testing methodology
- Statistical significance

**Common Questions**:

1. Explain the attention mechanism. Why does it work?
2. What's the difference between batch norm and layer norm?
3. How does dropout work? Why does it help?
4. What is the vanishing gradient problem? How do you address it?
5. Explain the transformer architecture. What are the key components?
6. What's the difference between encoder-only, decoder-only, and encoder-decoder?
7. How does RLHF work? What are alternatives?
8. What is prompt injection and how do you defend against it?

---

## Behavioral Interview Preparation

### Common Behavioral Questions

**Past Projects**
- Tell me about a technically challenging ML project
- Describe a time when an ML approach didn't work. What did you do?
- Walk me through how you improved model performance on a difficult problem

**Collaboration**
- Tell me about a time you disagreed with a teammate on a technical approach
- How do you work with non-ML engineers?
- Describe a cross-functional project you led

**Leadership**
- How do you mentor junior engineers?
- Tell me about a time you had to make a difficult technical decision
- How do you prioritize when you have competing demands?

**Growth**
- What's a technical area where you've grown significantly?
- Tell me about a failure and what you learned
- How do you stay current with ML developments?

### STAR Framework

Structure behavioral answers using STAR:

```
Situation: Brief context (2-3 sentences)
Task: Your specific responsibility
Action: What YOU did (most important part)
Result: Quantified outcome and learnings
```

**Example**: *Question: Tell me about a time an ML project failed*

**Situation**: "We were building a recommendation system for a new feature, and three weeks before launch, we realized the model wasn't performing well on our key metric."

**Task**: "As the ML lead, I was responsible for diagnosing the issue and getting us back on track for launch."

**Action**: "I did three things: First, I analyzed our evaluation data and found we had significant data leakage in our offline metrics; our offline numbers looked great but didn't reflect real performance. Second, I designed a quick A/B test to validate, which confirmed the issue. Third, I proposed a simpler baseline model that we could ship on time, with plans to iterate."

**Result**: "We launched with the simpler model, which still improved our metric by 8% over the control. The experience led me to establish better evaluation practices for the team, including mandatory online/offline correlation checks. The follow-up version, with proper evaluation, achieved 15% improvement."

---

## Sample Interview Questions by Topic

### LLM and Prompting

1. How would you improve a prompt that's giving inconsistent results?
2. What's the difference between zero-shot, few-shot, and fine-tuning?
3. How would you evaluate an LLM-based feature?
4. Explain chain-of-thought prompting. When would you use it?
5. How do you handle hallucinations in production?

### RAG Systems

1. Walk me through designing a RAG system end-to-end
2. How do you evaluate retrieval quality?
3. When would you use dense vs. sparse retrieval?
4. How do you handle documents that are too long for context?
5. What are failure modes of RAG and how do you address them?

### Training and Fine-Tuning

1. When would you fine-tune vs. prompt an LLM?
2. Explain LoRA. Why does it work?
3. How do you prevent catastrophic forgetting during fine-tuning?
4. What's the difference between SFT, RLHF, and DPO?
5. How do you decide on training data quantity and quality?

### Infrastructure and Scale

1. How would you reduce inference latency for an LLM service?
2. Explain the trade-offs of different quantization approaches
3. How would you handle 10x traffic increase?
4. Design a caching strategy for an LLM application
5. How do you monitor an ML system in production?

### Data and Features

1. How do you handle training-serving skew?
2. What is a feature store and when would you use one?
3. How do you detect and handle data drift?
4. Explain the importance of point-in-time correctness
5. How do you version ML data and models?

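
When answering the point-in-time correctness question above, a small concrete example often helps. The sketch below is illustrative only: it assumes a hypothetical in-memory feature log keyed by entity id, and the names `FeatureLog`, `feature_as_of`, and `build_training_rows` are invented for this example rather than taken from any feature store API. The idea is that each training row may only use the feature value that was already known at the label's timestamp; reading the latest value instead would leak future information into training.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory feature log: entity id -> [(timestamp, value), ...].
# Assumption: each list is sorted by timestamp in ascending order.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def feature_as_of(log: FeatureLog, entity_id: str, as_of: int) -> Optional[float]:
    """Return the latest value recorded at or before `as_of`, or None if none exists."""
    history = log.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    # bisect_right returns the number of entries with timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(
    labels: List[Tuple[str, int, int]], log: FeatureLog
) -> List[Tuple[str, int, Optional[float], int]]:
    """Join each (entity_id, label_time, label) with the feature value known at label_time."""
    return [
        (entity_id, label_time, feature_as_of(log, entity_id, label_time), label)
        for entity_id, label_time, label in labels
    ]


if __name__ == "__main__":
    log = {"user_1": [(100, 1.0), (200, 3.0), (300, 7.0)]}
    labels = [("user_1", 250, 1), ("user_1", 50, 0)]
    for row in build_training_rows(labels, log):
        print(row)
```

In this toy log, the label at time 250 is joined with the value written at time 200 (3.0), not the later value 7.0, and the label at time 50 gets `None` because no feature existed yet. In an interview, the follow-up is usually how to do the same join at scale and how to handle missing values.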
---

## Interview Day Tips

### Before the Interview

- Review the company's ML products and published work
- Refresh fundamentals (don't assume you remember)
- Prepare 2-3 projects to discuss in depth
- Practice explaining technical concepts aloud
- Get good sleep

### During the Interview

**For Coding**:
- Think aloud; silence is concerning
- Start with brute force, then optimize
- Test your code before declaring done
- Ask clarifying questions

**For System Design**:
- Drive the conversation; don't wait for prompts
- Draw diagrams (even in virtual interviews)
- Discuss trade-offs explicitly
- Connect to business requirements

**For Behavioral**:
- Be specific; vague answers don't score well
- Focus on YOUR contributions
- Include quantified results when possible
- Be honest about failures; interviewers want self-awareness

### Questions to Ask Interviewers

Good questions demonstrate genuine interest:
- What ML infrastructure do you use?
- How do you evaluate ML model success?
- What's the biggest technical challenge the team faces?
- How do you balance research and production?
- What does the on-call look like for ML systems?

---

## Level-Specific Preparation Checklists

### Entry-Level (0-3 years)

- [ ] Strong coding fundamentals (LeetCode medium level)
- [ ] Can implement basic ML algorithms from scratch
- [ ] Understands bias-variance trade-off
- [ ] Knows common metrics and when to use them
- [ ] Can explain a project end-to-end
- [ ] Basic understanding of neural networks

### Mid-Level (3-5 years)

- [ ] Efficient coding with ML libraries
- [ ] Deep understanding of at least one ML area
- [ ] Can design small-to-medium ML systems
- [ ] Experience with production ML challenges
- [ ] Can discuss trade-offs thoughtfully
- [ ] Good collaboration and communication

### Senior (5-8 years)

- [ ] Expert in specific ML domain
- [ ] Can design complex ML systems
- [ ] Strong production ML experience
- [ ] Leadership and mentoring experience
- [ ] Technical decision-making track record
- [ ] Cross-functional collaboration skills

### Staff+ (8+ years)

- [ ] Technical strategy and vision
- [ ] Multi-team impact and influence
- [ ] Deep expertise AND broad knowledge
- [ ] Executive communication skills
- [ ] Organizational design thinking
- [ ] Business and technical alignment

---

## Resources for Preparation

### Books

- "Designing Machine Learning Systems" (Chip Huyen)
- "Machine Learning Engineering" (Andriy Burkov)
- "Building Machine Learning Powered Applications" (Emmanuel Ameisen)

### Courses

- Stanford CS229 (ML fundamentals)
- Stanford CS224N (NLP)
- fast.ai (Practical deep learning)

### Practice

- LeetCode for coding
- ML system design practice with peers
- Mock interviews (Pramp, interviewing.io)

### Stay Current

- Papers With Code
- ML blogs (OpenAI, Anthropic, Google AI)
- r/MachineLearning

---

## Detailed Answer Examples

### ML System Design Example: Design a Recommendation System

Here's how to structure your answer for a recommendation system design question:

**Opening (Clarify Requirements)**:

"Before I dive in, let me make sure I understand the requirements. Is this for a new feature or an existing system? What's our scale in terms of users and items? What's the latency budget for serving recommendations? Do we have historical user data already?"

*Assume answers: 10M users, 1M items, 100ms latency, we have 6 months of click/purchase data*

**High-Level Architecture**:

"I'd structure this as a two-stage system: candidate generation to narrow from 1M items to ~1000, then a ranking model to select the top 10-20.

For candidate generation:
- Item-based collaborative filtering for items similar to user's history
- User embedding similarity to find items liked by similar users
- Popularity baseline to ensure coverage

For ranking:
- Features: user features, item features, cross features (user-item interactions)
- Model: Gradient boosted trees initially, can upgrade to neural ranking later
- Training: Point-wise learning on click/purchase labels

Serving architecture:
- Pre-computed candidate sets refreshed hourly
- Online ranking with cached user/item features
- Feature store for real-time features (recent clicks)"

**Deep Dives** (interviewer may ask about specific areas):

*"How do you handle cold start?"*

"For new users: popularity-based recommendations, quickly incorporating their first few interactions. For new items: content-based features (category, description embedding), explore/exploit to get initial signals."

*"How do you evaluate?"*

"Offline: We'd look at precision@k, recall@k, NDCG. But offline metrics can be misleading, so we'd run A/B tests measuring click-through rate and ultimately conversion. We'd also track diversity and novelty metrics to ensure we're not just showing popular items."

*"What about serving latency?"*

"The 100ms budget is tight. I'd pre-compute as much as possible: user embeddings daily, candidate sets hourly. The ranking model would be a lightweight scorer. We'd use caching for frequently requested user recommendations. If needed, we can trade off ranking model complexity for speed."

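
The offline metrics named in the evaluation answer above (precision@k, recall@k, NDCG) are easy to state but worth being able to write down on a whiteboard. The following is a minimal sketch assuming binary relevance (an item is either relevant or not) and a single user's ranked list; the function names are illustrative and not taken from any particular library. Graded-relevance NDCG and averaging across users are left out for brevity.

```python
import math
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled with relevant items (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of this ranking divided by DCG of an ideal ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), and so on.
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # An ideal ranking places as many relevant items as possible at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]
    relevant = {"a", "c", "f"}
    print(precision_at_k(ranked, relevant, 3))
    print(recall_at_k(ranked, relevant, 3))
    print(ndcg_at_k(ranked, relevant, 3))
```

For the toy inputs, two of the top three items are relevant, so precision@3 and recall@3 are both 2/3, while NDCG@3 is lower than 1 because the second relevant item sits at position 3 rather than position 2. Being able to explain that difference is usually what the interviewer is probing for.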
### Behavioral Answer Example: Technical Disagreement

**Question**: "Tell me about a time you disagreed with a teammate on a technical approach."

**Using STAR**:

**Situation**: "On my last team, we were building an LLM-powered search feature. The senior engineer on the project wanted to fine-tune a model specifically for our domain, while I believed a RAG approach with a general model would be better."

**Task**: "I needed to advocate for my approach while maintaining a collaborative relationship and being open to being wrong."

**Action**: "First, I made sure I deeply understood their reasoning: they were concerned about domain-specific vocabulary that a general model might not handle well. That was a valid concern.

I then proposed we run a quick experiment: two days to prototype both approaches on a subset of queries. I wrote up clear evaluation criteria before we ran the experiment so we'd have objective measures.

The results showed RAG actually performed better on most queries, but fine-tuning won on queries with technical jargon. So we ended up with a hybrid: RAG for most queries with a fine-tuned reranker for technical terms.

Throughout, I made sure to frame it as 'finding the best approach' rather than 'proving I was right.' When the hybrid emerged, we both felt ownership."

**Result**: "The hybrid approach improved our search relevance by 20% over the baseline. More importantly, we established a pattern of data-driven decision making that the team used for subsequent technical debates. The other engineer and I became close collaborators afterward."

### Coding Answer Example: Implement Cosine Similarity

**Question**: "Implement a function to compute cosine similarity between two vectors."

**Initial Clarification**: "Are these dense vectors? Any constraints on length? Should I handle edge cases like zero vectors?"

**Solution**:

```python
import math
from typing import List


def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Compute cosine similarity between two vectors.

    Args:
        vec1: First vector
        vec2: Second vector (must be same length)

    Returns:
        Cosine similarity in range [-1, 1]

    Raises:
        ValueError: If vectors have different lengths or are empty
    """
    if len(vec1) != len(vec2):
        raise ValueError("Vectors must have same length")
    if len(vec1) == 0:
        raise ValueError("Vectors cannot be empty")

    # Compute dot product and magnitudes
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    magnitude1 = math.sqrt(sum(a * a for a in vec1))
    magnitude2 = math.sqrt(sum(b * b for b in vec2))

    # Handle zero vectors
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0  # Or could raise error depending on requirements

    return dot_product / (magnitude1 * magnitude2)


# Test cases
def test_cosine_similarity():
    # Identical vectors should have similarity 1
    assert abs(cosine_similarity([1, 0], [1, 0]) - 1.0) < 1e-9

    # Orthogonal vectors should have similarity 0
    assert abs(cosine_similarity([1, 0], [0, 1]) - 0.0) < 1e-9

    # Opposite vectors should have similarity -1
    assert abs(cosine_similarity([1, 0], [-1, 0]) - (-1.0)) < 1e-9

    # General case
    assert abs(cosine_similarity([1, 2, 3], [4, 5, 6]) - 0.9746) < 0.001

    print("All tests passed!")


test_cosine_similarity()
```

**Optimization Discussion**: "If we're computing many similarities, we'd want to use NumPy for vectorization. We could also pre-compute magnitudes if vectors are reused. For very high-dimensional vectors at scale, we'd consider approximate methods like LSH."

---

## Common Mistakes to Avoid

### In Coding Interviews

1. **Starting to code too quickly**: Spend time understanding and planning
2. **Not testing**: Always walk through examples before saying "done"
3. **Silence**: Keep talking about your thought process
4. **Ignoring edge cases**: Empty inputs, negative numbers, etc.
5. **Over-engineering**: Start simple, optimize if asked

### In System Design

1. **Not clarifying requirements**: Don't assume; ask
2. **Jumping to details**: Start with high-level architecture
3. **Not discussing trade-offs**: Every choice has pros and cons
4. **Forgetting operations**: Monitoring, debugging, iteration
5. **Ignoring scale**: Think about 10x and 100x scenarios

### In ML Fundamentals

1. **Memorizing without understanding**: Can't handle follow-up questions
2. **Not connecting to practice**: Know when to use what
3. **Overcomplicating**: Simple explanations are better
4. **Ignoring limitations**: Every technique has failure modes

### In Behavioral

1. **Being too vague**: Specific stories are convincing
2. **Forgetting the result**: STAR requires outcomes
3. **Taking all credit**: "We" is OK, but explain YOUR role
4. **Only successes**: Failure stories show self-awareness

---

## Mock Interview Script

Practice with a friend using this structure:

**Round 1: Coding (45 min)**
- 5 min: Introductions
- 35 min: Coding problem (e.g., "Implement k-nearest neighbors")
- 5 min: Questions

**Round 2: ML Design (45 min)**
- 5 min: Introductions
- 35 min: Design problem (e.g., "Design a content recommendation system")
- 5 min: Questions

**Round 3: ML Depth (30 min)**
- 5 min: Introductions
- 20 min: Deep technical discussion (e.g., "Explain transformers in detail")
- 5 min: Questions

**Round 4: Behavioral (30 min)**
- 5 min: Introductions
- 20 min: Behavioral questions
- 5 min: Questions

**Feedback Session (15 min)**
- What went well
- Areas for improvement
- Specific suggestions

---

## Week-by-Week Preparation Plan

### 4 Weeks Before Interview

**Week 1: Foundations**
- Review ML fundamentals (algorithms, evaluation)
- Start coding practice (2 problems/day)
- List your top 3-4 projects with details

**Week 2: System Design**
- Study ML system design patterns
- Practice 2-3 design problems
- Review company's ML products/papers

**Week 3: Deep Dives**
- Pick 1-2 areas to go deep
- Read relevant papers
- Prepare to explain at multiple levels

**Week 4: Polish**
- Mock interviews (at least 2)
- Behavioral story preparation
- Rest and mental preparation

### Day Before

- Review your project stories
- Light review of fundamentals (don't cram)
- Prepare questions for interviewers
- Lay out interview logistics
- Get good sleep

### Day Of

- Eat well, stay hydrated
- Arrive/log in early
- Have paper/pen ready
- Deep breaths between rounds
- Stay positive regardless of how rounds feel

---

## Final Advice

**Be Yourself**: Authentic enthusiasm for ML comes through. If you're genuinely interested in the field, let that show.

**Admit Uncertainty**: "I don't know, but here's how I'd approach finding out" is better than pretending to know.

**Show Growth Mindset**: Talk about what you've learned, not just what you know.

**Practice**: Interview skills improve with practice. Do mock interviews.

**Learn from Rejections**: Each interview is learning. Ask for feedback when possible.

Good luck with your interviews!

---

## Company-Specific Preparation

### Big Tech (Google, Meta, Amazon, Microsoft)

**Focus Areas**:
- Heavy emphasis on coding (expect 2+ rounds)
- System design at scale (billions of users)
- Leadership principles (especially Amazon)
- Past project deep dives

**Preparation Tips**:
- Practice with LeetCode medium/hard problems
- Study their published research papers
- Understand their ML infrastructure (TensorFlow, PyTorch, internal tools)
- Prepare stories that map to leadership principles

### AI-First Companies (OpenAI, Anthropic, Cohere)

**Focus Areas**:
- Deep technical understanding of LLMs
- Research awareness and curiosity
- Safety and alignment considerations
- Novel problem-solving ability

**Preparation Tips**:
- Read their recent papers and blog posts
- Understand current LLM limitations
- Have opinions on open research questions
- Think about safety implications

### ML-Heavy Startups

**Focus Areas**:
- End-to-end ownership
- Scrappiness and velocity
- Breadth across ML stack
- Business impact awareness

**Preparation Tips**:
- Understand their product and ML use cases
- Prepare for "wear many hats" scenarios
- Emphasize shipping and iteration
- Show flexibility in approaches

### Traditional Tech with ML Teams

**Focus Areas**:
- Production ML experience
- Working with non-ML teams
- Integrating ML into existing systems
- Business metric focus

**Preparation Tips**:
- Understand their core business
- Prepare for less ML-specific coding
- Emphasize collaboration and communication
- Focus on practical impact over research novelty

---

## Handling Difficult Interview Situations

### When You Don't Know the Answer

**Good response**: "I'm not sure about that specific detail, but here's how I'd approach figuring it out..." or "I haven't worked with that directly, but based on my understanding of similar systems..."

**Why it works**: Shows intellectual honesty and problem-solving approach

### When You Make a Mistake

**Good response**: "Actually, let me reconsider that. I think I made an error. The correct approach is..."

**Why it works**: Self-correction shows strong fundamentals and self-awareness

### When the Problem is Unclear

**Good response**: "Let me make sure I understand the requirements. Are we optimizing for X or Y? What constraints do we have?"

**Why it works**: Clarification is expected and shows good engineering practice

### When You're Stuck

**Good response**: "I'm thinking about a few different approaches. Let me talk through what I'm considering..." Then explain your thought process.

**Why it works**: Interviewers want to see how you think, not just the answer

### When the Interviewer Seems Unengaged

**Good response**: Stay focused and professional. Ask if they have any questions or want you to go deeper on any area.

**Why it works**: Some interviewers are naturally reserved; don't let it affect your performance

---

## Post-Interview

### After Each Round

- Take brief notes on questions asked
- Note any areas where you felt weak
- Don't obsess; move on to the next round

### After the Interview Day

- Send thank you notes (optional but good practice)
- Reflect on what went well and what to improve
- Don't second-guess your performance excessively

### If You Get an Offer

- Ask for detailed leveling and compensation breakdown
- Negotiate thoughtfully (it's expected)
- Ask about team matching and projects

### If You Get Rejected

- Ask for feedback (may or may not be provided)
- Reflect honestly on areas to improve
- Many successful candidates were rejected before
- Most companies allow re-applying after 6-12 months

---

## Additional Practice Questions

### LLM Deep Dive Questions

1. Walk me through the transformer architecture component by component.
2. How does self-attention differ from cross-attention?
3. Explain the difference between pre-training and fine-tuning objectives.
4. What are the key challenges in training large language models?
5. How would you debug a model that's producing inconsistent outputs?
6. Explain RLHF at a technical level. What are alternatives?
7. What's the relationship between perplexity and quality?
8. How do you think about the trade-off between model size and cost?

### Production ML Questions

1. How do you ensure your offline metrics correlate with online performance?
2. Describe your approach to ML model monitoring.
3. How do you handle model drift in production?
4. What's your framework for deciding when to retrain a model?
5. How do you debug a model that's performing poorly in production?
6. Describe a production ML system you've built or maintained.
7. How do you balance technical debt and feature velocity in ML systems?
8. What are the most common causes of ML system failures you've seen?

### Leadership and Strategy Questions (for Senior+)

1. How do you evaluate whether to use ML for a problem?
2. Describe how you've influenced technical direction outside your team.
3. How do you balance investment in ML infrastructure vs. applications?
4. Tell me about a time you had to make a difficult technical trade-off.
5. How do you stay current with ML developments while being productive?
6. What's your approach to building high-performing ML teams?
7. How do you communicate ML capabilities and limitations to stakeholders?
8. What do you see as the most important trends in AI engineering?