Appendix B: Tool & Framework Reference

This appendix provides a reference guide to the major tools and frameworks used in AI engineering as of 2026. The landscape evolves rapidly—use this as a starting point and verify current versions and alternatives.

Last Updated: May 2026

Review Frequency: This appendix should be reviewed quarterly, as the AI tooling landscape changes significantly every 3-6 months.

Emerging Tools and Trends (2025-2026)

Before diving into established tools, here are the notable emerging developments:

Agentic Frameworks

MCP (Model Context Protocol): Anthropic’s protocol for standardized tool integration with LLMs. Growing ecosystem of MCP-compatible tools and servers.

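# MCP is becoming the standard for tool integration
# Simplified sketch of the tools an MCP server might advertise;
# each tool describes its arguments with a JSON Schema under "inputSchema"
{
    "name": "filesystem",
    "version": "1.0.0",
    "tools": [
        {
            "name": "read_file",
            "description": "Read contents of a file",
            "inputSchema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        }
    ]
}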
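
To make the idea concrete, here is a minimal sketch of how a host application might check a tool call against a declared tool description before dispatching it. It does not use the MCP SDK or the wire protocol; `TOOLS`, `HANDLERS`, and `call_tool` are hypothetical names, and the check covers only required, typed arguments.

```python
import os
import tempfile

# Hypothetical in-memory registry; a real MCP server advertises tools over the protocol.
TOOLS = {
    "read_file": {
        "description": "Read contents of a file",
        "required": {"path": str},
    }
}


def read_file(path: str) -> str:
    """Return the text content of the file at `path`."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


HANDLERS = {"read_file": read_file}


def call_tool(name: str, arguments: dict) -> str:
    """Validate arguments against the tool description, then dispatch."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    for field, expected_type in TOOLS[name]["required"].items():
        if field not in arguments:
            raise ValueError(f"missing argument: {field}")
        if not isinstance(arguments[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    return HANDLERS[name](**arguments)


# Usage: write a scratch file, then read it back through the dispatcher.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "note.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    demo_result = call_tool("read_file", {"path": demo_path})
```

Validating before dispatch keeps malformed model output from reaching the handler, which is the main reason tool descriptions carry a schema at all.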

Claude Code, Cursor, Windsurf: AI-powered coding assistants that go beyond autocomplete to full agentic coding. These tools can:

Edit multiple files simultaneously
Run terminal commands
Understand full repository context
Execute multi-step coding tasks

CrewAI, AutoGen, LangGraph: Multi-agent orchestration frameworks for complex workflows.

Inference Optimization

SGLang: Alternative to vLLM with focus on structured generation and complex prompting patterns. Worth evaluating for applications with heavy structured output needs.

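Medusa, EAGLE: Speculative decoding methods with reported speedups of roughly 2-3x. Exact verification schemes preserve the target model's outputs; relaxed acceptance rules (such as Medusa's typical acceptance) trade a small amount of fidelity for speed.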
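
The core draft-and-verify loop behind speculative decoding can be sketched with toy models. This is a simplified greedy variant, not Medusa or EAGLE themselves: `target_next` and `draft_next` are hypothetical stand-ins for a large and a small model, and sampling-based decoding would need a rejection-sampling acceptance rule that is not shown.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(context: List[int]) -> int:
    """Toy 'large' model: the next token is a fixed function of the context."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[int]) -> int:
    """Toy 'small' model: agrees with the target except at every fourth position."""
    token = target_next(context)
    return (token + 1) % 50 if len(context) % 4 == 0 else token


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Return the decoded tokens and the number of target verification passes."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks every proposed position; a real system does this
        #    in one batched forward pass, which is where the speedup comes from.
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target_next(tokens + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # All k proposals matched, so the target's next token comes for free.
            accepted.append(target_next(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:goal], passes


reference = greedy_decode(target_next, [1, 2, 3], 20)
spec_tokens, target_passes = speculative_decode([1, 2, 3], 20)
```

Because every kept token is the one the target model would have chosen, the output matches plain greedy decoding while using fewer target passes than tokens generated.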

FP8 Quantization: Native support in H100/H200 GPUs enabling faster inference with minimal quality loss compared to FP16.
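
To give a feel for the precision involved, the following sketch simulates rounding a value to the E4M3 flavor of FP8 (4 exponent bits, 3 mantissa bits, largest finite value 448) in pure Python. It is an illustration under those assumptions, not how GPU kernels are implemented; real deployments also apply per-tensor or per-channel scaling, which is omitted here, and `quantize_e4m3` is a hypothetical helper name.

```python
import math

E4M3_MAX = 448.0                 # largest finite E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6      # smallest normal value
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing between subnormal values


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in simulated E4M3, saturating at the max."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    magnitude = min(abs(x), E4M3_MAX)  # saturate instead of overflowing
    if magnitude < E4M3_MIN_NORMAL:
        steps = round(magnitude / E4M3_SUBNORMAL_STEP)
        return sign * steps * E4M3_SUBNORMAL_STEP
    # frexp gives magnitude = mantissa * 2**exponent with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(magnitude)
    # One implicit bit plus three stored bits leaves four significant bits.
    rounded = round(mantissa * 16) / 16
    return sign * min(math.ldexp(rounded, exponent), E4M3_MAX)


def relative_error(x: float) -> float:
    """Relative rounding error introduced by the simulated quantization."""
    return abs(quantize_e4m3(x) - x) / abs(x)
```

With only four significant bits, in-range normal values carry a relative rounding error of at most about 6%, which is why FP8 inference relies on careful scaling to keep quality loss small.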

Development Tools

LiteLLM: Unified API across 100+ LLM providers. Increasingly standard for multi-provider applications.

from litellm import completion

# Same interface for any provider
response = completion(
    model="gpt-5",  # or "claude-opus-4-8", "anthropic/claude-sonnet-4-6", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)
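
One reason a unified interface matters is that provider fallback becomes a few lines of code. The sketch below shows the pattern with an injected completion function so that it runs offline; `complete_with_fallback` and `fake_completion` are hypothetical names, and the only assumption about the real client is that it accepts `model` and `messages` keyword arguments as in the call above. Routing features built into your client library, where available, may be preferable.

```python
from typing import Any, Callable, Dict, List, Sequence

Message = Dict[str, str]


def complete_with_fallback(
    completion_fn: Callable[..., Any],
    models: Sequence[str],
    messages: List[Message],
) -> Any:
    """Try each model in order and return the first successful response."""
    errors = []
    for model in models:
        try:
            return completion_fn(model=model, messages=messages)
        except Exception as exc:  # broad on purpose: provider error types vary
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_completion(model: str, messages: List[Message]) -> str:
    """Offline stand-in for a provider call (hypothetical behavior)."""
    if model == "primary-model":
        raise TimeoutError("simulated outage")
    return f"[{model}] echo: {messages[-1]['content']}"


reply = complete_with_fallback(
    fake_completion,
    ["primary-model", "backup-model"],
    [{"role": "user", "content": "Hello!"}],
)
```

In a real application the injected function would be the provider client's completion call, and the model list would name a primary model and one or more backups.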

promptfoo: Prompt testing and evaluation framework gaining adoption for systematic prompt engineering.

Weights & Biases Prompts: Prompt versioning and experimentation tracking.

Evaluation and Safety

HuggingFace Lighteval: Standardized evaluation harness for LLM benchmarks.

Inspect AI: Open-source framework from the UK AI Security Institute (AISI) for LLM evaluations, including safety evaluations.

Patronus AI, Galileo, Langfuse: LLM observability and evaluation platforms. Patronus AI and Galileo are commercial; Langfuse is open source with a hosted offering.

LLM Inference Engines

vLLM

Purpose: High-throughput LLM inference engine

Key Features:

PagedAttention for efficient memory management
Continuous batching for high throughput
Support for most popular open-source LLMs
OpenAI-compatible API

When to Use: Production LLM serving requiring high throughput

Installation:

pip install vllm

Example:

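from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E")  # loads the model weights
outputs = llm.generate(["Hello, how are you?"], SamplingParams(temperature=0.7))
for output in outputs:
    print(output.outputs[0].text)  # first completion for each prompt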

Resources: https://docs.vllm.ai
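
The throughput benefit of continuous batching, listed among the key features above, can be seen in a toy simulation. This sketch counts decode steps only and is not vLLM's scheduler; the request lengths are made-up illustration data, and real engines also account for prefill cost and KV-cache memory.

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Each fixed batch runs until its longest request has finished."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Finished requests free their slot at once, and waiting requests fill it."""
    waiting = deque(lengths)
    running: List[int] = []  # tokens still to generate for each active request
    steps = 0
    while waiting or running:
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1  # one decode step yields one token per active request
        running = [remaining - 1 for remaining in running if remaining > 1]
    return steps


# Hypothetical output lengths (in tokens) for eight queued requests.
request_lengths = [2, 10, 3, 9, 1, 8, 2, 10]
static_steps = static_batching_steps(request_lengths, batch_size=4)
continuous_steps = continuous_batching_steps(request_lengths, batch_size=4)
```

With these lengths the static schedule needs 20 decode steps and the continuous one 15, because short requests no longer hold a slot idle while the longest request in their batch finishes.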

Text Generation Inference (TGI)

Purpose: Hugging Face’s production inference server

Key Features:

Optimized for Hugging Face models
Flash Attention support
Quantization support (AWQ, GPTQ, bitsandbytes)
Prometheus metrics built-in

When to Use: Hugging Face model deployment, especially in containers

Docker Launch:

docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
    --model-id meta-llama/Llama-4-Scout-17B-16E

Resources: https://huggingface.co/docs/text-generation-inference

Ollama

Purpose: Local LLM inference for development

Key Features:

Simple installation and model management
Runs on consumer hardware
Good for development and testing
REST API compatible with OpenAI format

When to Use: Local development, prototyping, privacy-sensitive applications

Installation:

curl -fsSL https://ollama.com/install.sh | sh
ollama run llama4

Resources: https://ollama.com

TensorRT-LLM

Purpose: NVIDIA’s optimized LLM inference

Key Features:

Maximum performance on NVIDIA GPUs
INT8/FP8 quantization
Multi-GPU support
Inflight batching

When to Use: Production serving requiring maximum GPU efficiency

Resources: https://github.com/NVIDIA/TensorRT-LLM

ML Frameworks

PyTorch

Purpose: Primary deep learning framework

Key Features:

Dynamic computation graphs
Extensive ecosystem
Strong GPU support
Industry standard for research and production

When to Use: Almost all deep learning development

Installation:

pip install torch torchvision torchaudio

Resources: https://pytorch.org

Hugging Face Transformers

Purpose: Pre-trained model library and utilities

Key Features:

Thousands of pre-trained models
Unified API for different architectures
Training utilities (Trainer, plus integration with the separate PEFT library)
Integration with Hugging Face Hub

When to Use: Working with transformer models

Installation:

pip install transformers

Example:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

Resources: https://huggingface.co/docs/transformers

LangChain

Purpose: Framework for LLM application development

Key Features:

Chain composition for complex workflows
Built-in prompts and output parsers
Tool and agent support
Memory and context management

When to Use: Building LLM applications with complex workflows

Installation:

pip install langchain langchain-openai

Example:

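from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5")  # reads OPENAI_API_KEY from the environment
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm  # pipe the formatted prompt into the model
result = chain.invoke({"text": "Long document..."})
print(result.content)  # the reply is a message object; .content holds the text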

Resources: https://python.langchain.com

LlamaIndex

Purpose: Data framework for LLM applications

Key Features:

Document loading and indexing
Various retrieval strategies
Query engines and chat engines
Strong RAG support

When to Use: RAG systems and document-based applications

Installation:

pip install llama-index

Resources: https://docs.llamaindex.ai

Vector Databases

Pinecone

Purpose: Managed vector database

Key Features:

Fully managed service
Low-latency queries at scale
Metadata filtering
Hybrid search support

When to Use: Production vector search without infrastructure management

Installation:

pip install pinecone

Resources: https://docs.pinecone.io

Weaviate

Purpose: Open-source vector database

Key Features:

Self-hosted or cloud
GraphQL API
Built-in vectorization modules
Hybrid search

When to Use: Self-hosted vector search with flexibility

Resources: https://weaviate.io/developers/weaviate

Milvus

Purpose: Open-source vector database for scale

Key Features:

Designed for billion-scale
Multiple index types
Kubernetes-native
GPU acceleration

When to Use: Large-scale vector search requiring fine control

Resources: https://milvus.io

Qdrant

Purpose: High-performance vector database

Key Features:

Rust-based for performance
Rich filtering
Quantization support
Simple deployment

When to Use: Performance-critical vector search

Installation:

docker run -p 6333:6333 qdrant/qdrant

Resources: https://qdrant.tech/documentation

Chroma

Purpose: Lightweight embedding database

Key Features:

Simple API
Embedded or client-server mode
Good for prototyping
Python-native

When to Use: Prototyping, small-scale applications

Installation:

pip install chromadb

Resources: https://docs.trychroma.com
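
All of the vector databases above answer the same basic query: find the stored vectors most similar to a query vector, optionally restricted by metadata. The sketch below shows that query as a brute-force scan in pure Python. `TinyVectorStore` is a hypothetical class for illustration; production systems replace the linear scan with approximate nearest-neighbor indexes.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Brute-force vector search with exact-match metadata filtering."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Vector, Dict[str, str]]] = []

    def add(self, item_id: str, vector: Vector, metadata: Dict[str, str]) -> None:
        self._rows.append((item_id, vector, metadata))

    def query(
        self,
        vector: Vector,
        top_k: int = 3,
        where: Optional[Dict[str, str]] = None,
    ) -> List[Tuple[str, float]]:
        # Filter first, then score only the rows that pass the filter.
        candidates = [
            row for row in self._rows
            if where is None or all(row[2].get(k) == v for k, v in where.items())
        ]
        scored = [(item_id, cosine_similarity(vector, vec)) for item_id, vec, _ in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("a", [1.0, 0.0], {"lang": "en"})
store.add("b", [0.9, 0.1], {"lang": "de"})
store.add("c", [0.0, 1.0], {"lang": "en"})

unfiltered_ids = [item_id for item_id, _ in store.query([1.0, 0.05], top_k=2)]
english_ids = [item_id for item_id, _ in store.query([1.0, 0.05], top_k=2, where={"lang": "en"})]
```

The scan costs time linear in the number of stored vectors, which is acceptable for prototypes and is the cost that dedicated index types are designed to avoid.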

Feature Stores

Feast

Purpose: Open-source feature store

Key Features:

Cloud-agnostic
Online and offline stores
Point-in-time joins
Python SDK

When to Use: Feature management with flexibility

Installation:

pip install feast

Resources: https://feast.dev
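
Point-in-time joins, listed among the key features above, prevent label leakage by attaching to each training event only the feature value that was known at that moment. The following is a minimal pure-Python sketch of the idea, not Feast's API; `point_in_time_lookup`, the integer timestamps, and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Feature history per entity: a list of (timestamp, value) sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def point_in_time_lookup(log: FeatureLog, entity_id: str, event_ts: int) -> Optional[float]:
    """Return the latest feature value recorded at or before event_ts, if any."""
    history = log.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx else None


def build_training_rows(
    events: List[Tuple[str, int, int]],
    log: FeatureLog,
) -> List[Tuple[str, int, int, Optional[float]]]:
    """Attach to each (entity, timestamp, label) event the feature value known at that time."""
    return [
        (entity, ts, label, point_in_time_lookup(log, entity, ts))
        for entity, ts, label in events
    ]


feature_log: FeatureLog = {"user_1": [(10, 0.2), (20, 0.5), (30, 0.9)]}
label_events = [("user_1", 25, 1), ("user_1", 5, 0), ("user_1", 30, 1)]
training_rows = build_training_rows(label_events, feature_log)
```

The event at timestamp 25 receives the value written at 20 rather than the later value from 30, which is exactly the guarantee a naive latest-value join would break.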

Tecton

Purpose: Enterprise feature platform

Key Features:

Real-time feature serving
Streaming feature computation
Enterprise governance
Managed service

When to Use: Enterprise ML with real-time features

Resources: https://www.tecton.ai

Experiment Tracking

MLflow

Purpose: ML lifecycle management

Key Features:

Experiment tracking
Model registry
Model serving
Open source

When to Use: End-to-end ML lifecycle management

Installation:

pip install mlflow

Example:

import mlflow

# The context manager ends the run even if an exception is raised
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)

Resources: https://mlflow.org

Weights & Biases (W&B)

Purpose: Experiment tracking and visualization

Key Features:

Beautiful visualizations
Team collaboration
Hyperparameter sweeps
Model registry

When to Use: Team-based ML development

Installation:

pip install wandb

Resources: https://docs.wandb.ai

Data Processing

Apache Spark

Purpose: Distributed data processing

Key Features:

Large-scale data processing
SQL, streaming, ML libraries
Widely deployed
Databricks integration

When to Use: Large-scale batch processing

Resources: https://spark.apache.org

Databricks

Purpose: Unified analytics platform

Key Features:

Managed Spark
Unity Catalog
MLflow integration
Feature store

When to Use: Enterprise data and ML platform

Resources: https://docs.databricks.com

Ray

Purpose: Distributed computing framework

Key Features:

Simple distributed Python
ML-focused libraries (Ray Train, Ray Serve)
Scales from laptop to cluster
Good for hyperparameter tuning

When to Use: Distributed ML training and serving

Installation:

pip install "ray[default]"  # quotes keep shells such as zsh from expanding the brackets

Resources: https://docs.ray.io

Orchestration

Airflow

Purpose: Workflow orchestration

Key Features:

DAG-based workflows
Extensive operator library
Monitoring and alerting
Industry standard

When to Use: Batch ML pipeline orchestration

Installation:

pip install apache-airflow

Resources: https://airflow.apache.org
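
The DAG model behind workflow orchestrators reduces to dependency-ordered execution. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) rather than Airflow itself; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical ML pipeline: each task maps to the set of tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "featurize": {"validate"},
    "train": {"featurize"},
    "evaluate": {"train"},
    "report": {"evaluate", "validate"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return a task order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_dag(dag: Dict[str, Set[str]]) -> bool:
    """A workflow is schedulable only if its dependency graph has no cycles."""
    try:
        execution_order(dag)
    except CycleError:
        return False
    return True


order = execution_order(pipeline)
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering, but rejecting cyclic graphs and running tasks only after their upstream tasks succeed is the core contract.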

Prefect

Purpose: Modern workflow orchestration

Key Features:

Python-native
Less boilerplate than Airflow (flows are ordinary Python functions)
Cloud or self-hosted
Good debugging

When to Use: Python-centric teams wanting simpler orchestration

Installation:

pip install prefect

Resources: https://docs.prefect.io

Kubeflow

Purpose: ML on Kubernetes

Key Features:

Kubernetes-native ML
Pipelines, notebooks, training
Model serving (KServe)
Multi-tenancy

When to Use: Kubernetes-based ML platforms

Resources: https://www.kubeflow.org

Monitoring & Observability

Prometheus + Grafana

Purpose: Metrics collection and visualization

Key Features:

Industry standard metrics
Rich visualization
Alerting
Large ecosystem

When to Use: Infrastructure and application monitoring

Resources: https://prometheus.io, https://grafana.com

Datadog

Purpose: Cloud monitoring platform

Key Features:

Unified monitoring
ML monitoring capabilities
Log management
APM

When to Use: Enterprise observability

Resources: https://docs.datadoghq.com

Arize AI

Purpose: ML observability platform

Key Features:

Model monitoring
Drift detection
Explainability
Embedding analysis

When to Use: Production ML monitoring

Resources: https://docs.arize.com
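
Drift detection usually comes down to comparing a production feature distribution against a training baseline. One common statistic is the Population Stability Index (PSI), sketched below in pure Python. This is a generic illustration, not Arize's implementation; the bin edges, the floor value, and the sample data are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each [edges[i], edges[i+1]) bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) or 1  # values outside the edges are ignored
    return [count / total for count in counts]


def population_stability_index(
    baseline: Sequence[float],
    production: Sequence[float],
    edges: Sequence[float],
    floor: float = 1e-6,
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    # Flooring empty bins avoids taking the logarithm of zero.
    expected = [max(f, floor) for f in bin_fractions(baseline, edges)]
    actual = [max(f, floor) for f in bin_fractions(production, edges)]
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


bin_edges = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
shifted_scores = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.95, 0.7]

psi_same = population_stability_index(baseline_scores, baseline_scores, bin_edges)
psi_shifted = population_stability_index(baseline_scores, shifted_scores, bin_edges)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, though the right threshold depends on the feature and on the binning.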

Evaluation

RAGAS

Purpose: RAG evaluation framework

Key Features:

RAG-specific metrics
Faithfulness, answer relevancy, context precision and recall
LLM-based evaluation
Easy integration

When to Use: Evaluating RAG systems

Installation:

pip install ragas

Resources: https://docs.ragas.io

DeepEval

Purpose: LLM evaluation framework

Key Features:

Multiple evaluation metrics
Test-driven development for LLMs
CI/CD integration
Benchmarking

When to Use: Systematic LLM evaluation

Installation:

pip install deepeval

Resources: https://docs.confident-ai.com

API Providers

OpenAI

Models: GPT-5.5, GPT-5.4, DALL-E, Whisper

Strengths: Leading capability, extensive ecosystem, reliable API

Pricing: Pay-per-token, varies by model

Resources: https://platform.openai.com

Anthropic

Models: Claude Opus 4.8, Sonnet 4.6, Haiku 4.5

Strengths: Strong reasoning, long context, safety focus

Pricing: Pay-per-token

Resources: https://docs.anthropic.com

Google (Vertex AI)

Models: Gemini family, specialized models (Imagen, Veo, embedding models)

Strengths: Multimodal, GCP integration, enterprise features

Pricing: Pay-per-token or per-character, depending on the model

Resources: https://cloud.google.com/vertex-ai

Amazon Bedrock

Models: Claude, Llama, Titan, Stable Diffusion

Strengths: AWS integration, multiple model providers

Pricing: Pay-per-token/image

Resources: https://docs.aws.amazon.com/bedrock

Cohere

Models: Command, Embed, Rerank

Strengths: Enterprise focus, strong embeddings

Pricing: Pay-per-token/embedding

Resources: https://docs.cohere.com

Development Tools

LangSmith

Purpose: LLM application development platform

Key Features:

Tracing and debugging
Testing and evaluation
Monitoring
Prompt management

When to Use: LangChain application development

Resources: https://docs.smith.langchain.com

OpenLLMetry

Purpose: Open-source LLM observability

Key Features:

OpenTelemetry-based
Multiple framework integrations
Vendor-agnostic
Trace visualization

When to Use: LLM observability with OpenTelemetry

Installation:

pip install traceloop-sdk

Resources: https://github.com/traceloop/openllmetry

Fine-Tuning & Training

PEFT (Parameter-Efficient Fine-Tuning)

Purpose: Efficient fine-tuning of large models

Key Features:

LoRA, QLoRA, adapters
Minimal additional parameters
Works with Transformers
Memory efficient

When to Use: Fine-tuning large models with limited resources

Installation:

pip install peft

Example:

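from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Any causal LM works here; "gpt2" is just a small placeholder model
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # rank of the low-rank update matrices
    lora_alpha=32,    # scaling factor (the update is scaled by lora_alpha / r)
    lora_dropout=0.1
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # reports trainable vs. total parameters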

Resources: https://huggingface.co/docs/peft
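
The arithmetic behind LoRA's small parameter footprint can be worked through directly. In the sketch below, a frozen weight matrix W (d_out x d_in) is augmented with a low-rank update B·A scaled by lora_alpha / r. The helpers are hypothetical and use plain Python lists rather than the PEFT library; the 4096 x 4096 layer size used in the usage note is an assumed example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so LoRA trains r * (d_in + d_out) values."""
    return r * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, r: int) -> float:
    """Share of a full d_out x d_in weight matrix that LoRA actually trains."""
    return lora_trainable_params(d_in, d_out, r) / (d_in * d_out)


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, lora_alpha: float, r: int) -> Vector:
    """Compute y = W x + (lora_alpha / r) * B (A x) without forming B A explicitly."""
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Tiny worked example: 2 x 2 frozen weight with a rank-1 adapter.
y = lora_forward(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[1.0, 1.0]],
    B=[[0.5], [0.0]],
    x=[2.0, 3.0],
    lora_alpha=2.0,
    r=1,
)
```

For a 4096 x 4096 projection with r=8, `trainable_fraction` gives 1/256, so under 0.4% of that layer's weights are trained, which is where the memory savings come from.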

Axolotl

Purpose: Streamlined LLM fine-tuning

Key Features:

YAML-based configuration
Multiple training techniques
Dataset preprocessing
Good defaults

When to Use: Quick LLM fine-tuning without deep expertise

Resources: https://github.com/axolotl-ai-cloud/axolotl

DeepSpeed

Purpose: Deep learning optimization library

Key Features:

ZeRO memory optimization
Mixed precision training
Model parallelism
Microsoft-backed

When to Use: Training large models across multiple GPUs

Installation:

pip install deepspeed

Resources: https://www.deepspeed.ai
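
A back-of-the-envelope calculation shows what ZeRO's memory optimization buys. The sketch below follows the usual accounting for mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state) and partitions each component across GPUs according to the ZeRO stage. It covers model states only; activations, buffers, and fragmentation are ignored, and `zero_memory_gb` is a hypothetical helper, not a DeepSpeed function.

```python
def zero_memory_gb(params_billion: float, n_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GB for mixed-precision Adam."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights = 2.0     # fp16 parameters, bytes per parameter
    grads = 2.0       # fp16 gradients, bytes per parameter
    optimizer = 12.0  # fp32 master weights plus Adam momentum and variance
    if stage >= 1:
        optimizer /= n_gpus  # stage 1 partitions optimizer state
    if stage >= 2:
        grads /= n_gpus      # stage 2 also partitions gradients
    if stage >= 3:
        weights /= n_gpus    # stage 3 also partitions the parameters
    # 1e9 parameters times bytes per parameter, divided by 1e9 bytes per GB.
    return params_billion * (weights + grads + optimizer)


# Example: a 7.5B-parameter model on 64 GPUs.
memory_by_stage = {stage: zero_memory_gb(7.5, 64, stage) for stage in range(4)}
```

For this example the per-GPU figure falls from 120 GB with no partitioning to under 2 GB at stage 3, at the price of extra communication to gather parameters when they are needed.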

FSDP (Fully Sharded Data Parallel)

Purpose: PyTorch distributed training

Key Features:

Built into PyTorch
Memory-efficient training
Simpler than DeepSpeed for some cases
Native PyTorch integration

When to Use: Distributed training with PyTorch

Resources: https://pytorch.org/docs/stable/fsdp.html

Data Quality & Validation

Great Expectations

Purpose: Data quality validation

Key Features:

Expectation-based testing
Data documentation
CI/CD integration
Rich ecosystem

When to Use: Data quality in pipelines

Installation:

pip install great_expectations

Example:

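import great_expectations as gx

context = gx.get_context()
# batch_request must be built for your own datasource first (not shown here).
# This validator-style workflow comes from older releases; check the docs for your version.
validator = context.get_validator(batch_request=batch_request)
validator.expect_column_values_to_not_be_null("user_id")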

Resources: https://docs.greatexpectations.io

Pandera

Purpose: Statistical data validation

Key Features:

DataFrame validation
Statistical checks
Schema inference
Pandas/Polars support

When to Use: DataFrame validation in Python

Installation:

pip install pandera

Resources: https://pandera.readthedocs.io

DVC (Data Version Control)

Purpose: Version control for ML data and models

Key Features:

Git-like data versioning
Pipeline reproducibility
Cloud storage backends
Experiment tracking

When to Use: Versioning data and models alongside code

Installation:

pip install dvc

Resources: https://dvc.org

Testing & Quality

pytest

Purpose: Python testing framework

Key Features:

Simple and powerful
Fixtures and parameterization
Plugin ecosystem
Industry standard

When to Use: All Python testing

Installation:

pip install pytest

Resources: https://docs.pytest.org

hypothesis

Purpose: Property-based testing

Key Features:

Automatic test case generation
Edge case discovery
Shrinking failing examples
Useful for data processing

When to Use: Testing data processing and transformations

Installation:

pip install hypothesis

Resources: https://hypothesis.readthedocs.io

Locust

Purpose: Load testing

Key Features:

Python-based test scripts
Distributed testing
Real-time metrics
Web UI

When to Use: Load testing inference endpoints

Installation:

pip install locust

Resources: https://locust.io

Security

Guardrails AI

Purpose: LLM output validation

Key Features:

Structured output enforcement
Content filtering
Type validation
Retry mechanisms

When to Use: Ensuring LLM outputs meet requirements

Installation:

pip install guardrails-ai

Resources: https://docs.guardrailsai.com
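
The validate-and-retry pattern behind structured output enforcement can be sketched without any library. The code below does not use the Guardrails AI API; `validated_json_call`, `find_problem`, and the fake model are hypothetical, and the schema check covers only required top-level fields and their Python types.

```python
import json
from typing import Callable, Dict, Optional


def find_problem(data: object, required: Dict[str, type]) -> Optional[str]:
    """Return a description of the first schema violation, or None if the data is valid."""
    if not isinstance(data, dict):
        return "top-level value must be an object"
    for key, expected_type in required.items():
        if key not in data:
            return f"missing field '{key}'"
        if not isinstance(data[key], expected_type):
            return f"field '{key}' must be {expected_type.__name__}"
    return None


def validated_json_call(
    generate: Callable[[str], str],
    prompt: str,
    required: Dict[str, type],
    max_retries: int = 2,
) -> dict:
    """Call the model, validate its JSON output, and re-ask with the error on failure."""
    problem = "no attempts made"
    attempt_prompt = prompt
    for _ in range(max_retries + 1):
        raw = generate(attempt_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problem = f"invalid JSON ({exc.msg})"
        else:
            found = find_problem(data, required)
            if found is None:
                return data
            problem = found
        # Feed the validation error back so the next attempt can correct it.
        attempt_prompt = f"{prompt}\n\nPrevious output was rejected: {problem}. Return only valid JSON."
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {problem}")


# Offline stand-in for a model: two bad replies, then a valid one.
_replies = iter(["not json", '{"name": "Ada"}', '{"name": "Ada", "age": 36}'])


def fake_llm(prompt: str) -> str:
    return next(_replies)


user_record = validated_json_call(fake_llm, "Describe the user as JSON.", {"name": str, "age": int})
```

Capping the number of retries keeps a persistently malformed response from turning into an unbounded loop of paid API calls.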

NeMo Guardrails

Purpose: Conversational AI safety

Key Features:

Programmable guardrails
Topical rails
Dialogue management
NVIDIA-backed

When to Use: Building safe conversational AI

Resources: https://github.com/NVIDIA/NeMo-Guardrails

Embeddings & Retrieval

Sentence Transformers

Purpose: Sentence embeddings library

Key Features:

Pre-trained embedding models
Training utilities
Multiple loss functions
Simple API

When to Use: Text embeddings for search and similarity

Installation:

pip install sentence-transformers

Example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["Hello world", "How are you"])

Resources: https://www.sbert.net

FAISS

Purpose: Vector similarity search

Key Features:

Billion-scale search
GPU support
Multiple index types
Facebook/Meta research

When to Use: Efficient vector search in Python

Installation:

pip install faiss-cpu  # or faiss-gpu

Resources: https://faiss.ai

BM25 (rank_bm25)

Purpose: Sparse retrieval

Key Features:

Classic keyword search
No embeddings needed
Fast and interpretable
Good baseline

When to Use: Hybrid search, keyword matching

Installation:

pip install rank_bm25

Resources: https://github.com/dorianbrown/rank_bm25
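
For readers who want to see what a BM25 scorer computes, here is a minimal pure-Python sketch of the Okapi BM25 formula. It is not the `rank_bm25` implementation: `TinyBM25` is a hypothetical class, the IDF uses the common non-negative variant ln((N - n + 0.5) / (n + 0.5) + 1), and tokenization is a plain lowercase split.

```python
import math
from collections import Counter
from typing import List


class TinyBM25:
    """Okapi BM25 over a pre-tokenized corpus."""

    def __init__(self, corpus: List[List[str]], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.doc_term_counts = [Counter(doc) for doc in corpus]
        self.doc_lengths = [len(doc) for doc in corpus]
        self.avg_length = sum(self.doc_lengths) / len(corpus)
        n_docs = len(corpus)
        doc_freq = Counter(term for doc in corpus for term in set(doc))
        self.idf = {
            term: math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            for term, df in doc_freq.items()
        }

    def scores(self, query: List[str]) -> List[float]:
        """Return one relevance score per document for the tokenized query."""
        results = []
        for counts, length in zip(self.doc_term_counts, self.doc_lengths):
            norm = self.k1 * (1 - self.b + self.b * length / self.avg_length)
            score = 0.0
            for term in query:
                tf = counts.get(term, 0)
                if tf:
                    score += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
            results.append(score)
        return results


documents = [
    "the cat sat on the mat",
    "dogs chase cats",
    "vector databases store embeddings",
]
bm25 = TinyBM25([doc.lower().split() for doc in documents])
doc_scores = bm25.scores("cat mat".split())
best_doc = max(range(len(documents)), key=lambda i: doc_scores[i])
```

The second document scores zero because "cats" and "cat" are different tokens, which shows why keyword retrieval is often paired with stemming or with dense retrieval in a hybrid setup.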

Prompt Engineering

Guidance

Purpose: Structured LLM generation

Key Features:

Constrained generation
Template language
Control flow
Output parsing

When to Use: Complex structured outputs

Installation:

pip install guidance

Resources: https://github.com/guidance-ai/guidance

DSPy

Purpose: Programming with foundation models

Key Features:

Declarative programming
Automatic prompt optimization
Module composition
Stanford research

When to Use: Systematic prompt engineering

Installation:

pip install dspy-ai

Resources: https://dspy-docs.vercel.app

Container & Deployment

Docker

Purpose: Containerization

Key Features:

Reproducible environments
Industry standard
Rich ecosystem
Essential for deployment

When to Use: Packaging and deploying all applications

Resources: https://docs.docker.com

Kubernetes

Purpose: Container orchestration

Key Features:

Automatic scaling
Self-healing
Service discovery
Industry standard for production

When to Use: Production deployment at scale

Resources: https://kubernetes.io

KServe

Purpose: ML model serving on Kubernetes

Key Features:

Serverless inference
Autoscaling
A/B testing
Multiple frameworks

When to Use: Model serving on Kubernetes

Resources: https://kserve.github.io

BentoML

Purpose: ML model serving framework

Key Features:

Easy model packaging
Multiple frameworks
Adaptive batching
Cloud deployment

When to Use: Packaging models for deployment

Installation:

pip install bentoml

Resources: https://docs.bentoml.org

Cloud Platforms

AWS SageMaker

Purpose: AWS ML platform

Key Features:

Managed training and inference
Jupyter notebooks
Model registry
MLOps features

When to Use: ML on AWS

Resources: https://docs.aws.amazon.com/sagemaker

Google Cloud Vertex AI

Purpose: GCP ML platform

Key Features:

Managed ML pipelines
Model serving
AutoML
Gemini integration

When to Use: ML on GCP

Resources: https://cloud.google.com/vertex-ai/docs

Azure Machine Learning

Purpose: Azure ML platform

Key Features:

End-to-end ML lifecycle
MLOps automation
Responsible AI tools
Azure OpenAI integration

When to Use: ML on Azure

Resources: https://docs.microsoft.com/azure/machine-learning

Quick Selection Guide

Use Case	Recommended Tools
LLM Inference (High Volume)	vLLM, TGI, TensorRT-LLM
LLM Inference (Local Dev)	Ollama
RAG Application	LlamaIndex + Pinecone/Qdrant
Complex LLM Workflows	LangChain
Experiment Tracking	MLflow, W&B
Feature Store	Feast (open source), Tecton (enterprise)
Vector Database (Managed)	Pinecone
Vector Database (Self-Hosted)	Milvus, Qdrant
Pipeline Orchestration	Airflow, Prefect
ML on Kubernetes	Kubeflow
LLM Monitoring	Arize AI, LangSmith
API Provider (General)	OpenAI, Anthropic
API Provider (Enterprise/AWS)	Amazon Bedrock
API Provider (Enterprise/GCP)	Vertex AI

Version Compatibility Notes

The AI tool landscape evolves rapidly. When this book was written (early 2026):

PyTorch 2.x (2.12) is standard, with torch.compile providing significant speedups
Transformers 5.x (5.9, a major version) supports most modern models
vLLM and TGI are the dominant open-source inference engines
Vector databases have largely converged on similar feature sets

Always check documentation for current versions and breaking changes.

Additional Tools by Category

Model Optimization

Tool	Purpose	When to Use
ONNX Runtime	Cross-platform inference	Deploying models across different hardware
TensorRT	NVIDIA GPU optimization	Maximum performance on NVIDIA hardware
OpenVINO	Intel optimization	Inference on Intel CPUs/GPUs
CoreML	Apple device deployment	iOS/macOS applications
TFLite	Mobile deployment	Android/embedded devices
llama.cpp	CPU-optimized LLM inference	Running LLMs on CPU
ExLlamaV2	Quantized LLM inference	Memory-efficient GPU inference

Data Processing

Tool	Purpose	When to Use
Polars	Fast DataFrame operations	Large-scale data processing
Dask	Parallel computing	Out-of-core and distributed pandas
DuckDB	Embedded analytics	SQL queries on local data
Delta Lake	ACID data lake storage	Reliable data lake operations
Apache Arrow	Columnar memory format	High-performance data interchange

Annotation & Labeling

Tool	Purpose	When to Use
Label Studio	Open-source labeling	Self-hosted data annotation
Prodigy	Efficient annotation	Active learning-based labeling
Scale AI	Enterprise labeling	Large-scale human labeling
Amazon SageMaker Ground Truth	AWS labeling	Labeling with AWS integration
Snorkel	Programmatic labeling	Weak supervision approaches

Distributed Training

Tool	Purpose	When to Use
Horovod	Distributed training	Multi-GPU/multi-node training
PyTorch Lightning	Training framework	Organized PyTorch training
MosaicML Composer	Efficient training	Optimized training recipes
ColossalAI	Large model training	Training very large models

AutoML & Hyperparameter Tuning

Tool	Purpose	When to Use
Optuna	Hyperparameter optimization	Automatic hyperparameter tuning
Ray Tune	Distributed HPO	Scalable hyperparameter search
Auto-sklearn	AutoML	Automated model selection
H2O AutoML	Enterprise AutoML	Business-focused AutoML

Explainability

Tool	Purpose	When to Use
SHAP	Feature importance	Understanding model decisions
LIME	Local explanations	Explaining individual predictions
Captum	PyTorch interpretability	Deep learning explainability
InterpretML	Unified interpretability	Glass-box and black-box models

Natural Language Processing

Tool	Purpose	When to Use
spaCy	Industrial NLP	Production NLP pipelines
NLTK	Educational NLP	Learning and prototyping
Stanza	Research NLP	Stanford NLP models
Flair	State-of-the-art NLP	Sequence labeling

Computer Vision

Tool	Purpose	When to Use
OpenCV	Computer vision basics	Image processing operations
Ultralytics	Object detection	YOLO models
Detectron2	Object detection	Facebook research models
timm	Vision models	Pre-trained image models

Audio & Speech

Tool	Purpose	When to Use
Whisper	Speech-to-text	Transcription
TTS	Text-to-speech	Voice synthesis
Librosa	Audio processing	Audio feature extraction
pyannote.audio	Speaker diarization	Who spoke when

Tool Selection Framework

When selecting tools for a new project, consider these factors:

1. Scale Requirements

Small Scale (< 1M requests/month):

Ollama for local LLM
Chroma for vectors
SQLite/DuckDB for data
Simple deployment (single server)

Medium Scale (1M-100M requests/month):

vLLM/TGI for inference
Qdrant/Weaviate for vectors
PostgreSQL/Redis for data
Kubernetes deployment

Large Scale (> 100M requests/month):

TensorRT-LLM for inference
Pinecone/Milvus for vectors
Specialized data infrastructure
Multi-region deployment
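
The tiers above can be encoded as a small lookup, which is handy when the guidance needs to live in a planning script or design document template. The thresholds and tool names come straight from the lists above; the function name and dictionary layout are hypothetical, and the result is a starting point rather than a rule.

```python
from typing import Dict

SMALL = {"tier": "small", "inference": "Ollama", "vectors": "Chroma",
         "data": "SQLite/DuckDB", "deployment": "Single server"}
MEDIUM = {"tier": "medium", "inference": "vLLM/TGI", "vectors": "Qdrant/Weaviate",
          "data": "PostgreSQL/Redis", "deployment": "Kubernetes"}
LARGE = {"tier": "large", "inference": "TensorRT-LLM", "vectors": "Pinecone/Milvus",
         "data": "Specialized data infrastructure", "deployment": "Multi-region"}


def recommend_stack(requests_per_month: int) -> Dict[str, str]:
    """Map expected monthly request volume to the matching tier from the lists above."""
    if requests_per_month < 0:
        raise ValueError("requests_per_month must be non-negative")
    if requests_per_month < 1_000_000:
        return SMALL
    if requests_per_month <= 100_000_000:
        return MEDIUM
    return LARGE
```

Request volume is only one input; latency targets, model size, and team expertise (covered in the next factors) can justify moving up or down a tier.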

2. Team Expertise

Data Science Background:

Jupyter notebooks
MLflow for experiments
Scikit-learn patterns
Pandas-centric workflow

Software Engineering Background:

Containerized workflows
API-first design
Proper testing
CI/CD integration

Mixed Teams:

Unified platforms (Databricks, SageMaker)
Clear interfaces between components
Documentation emphasis

3. Cloud Strategy

AWS-Native:

SageMaker for ML
Bedrock for LLMs
Lambda for serverless
S3 for storage

GCP-Native:

Vertex AI for ML
Gemini for LLMs
Cloud Functions
BigQuery for data

Multi-Cloud / Agnostic:

Open-source tools (MLflow, Kubeflow)
Portable formats (ONNX)
Terraform for infrastructure

4. Compliance Requirements

Regulated Industries:

Self-hosted models
Private deployments
Audit logging
Data residency controls

Standard Business:

Managed services acceptable
API providers viable
Standard security practices

Migration Paths

Moving from Prototyping to Production

Prototype	Production Alternative
OpenAI API	vLLM + open model
Chroma	Qdrant, Pinecone
Jupyter notebooks	Python packages
Local files	Feature store
Manual deployment	CI/CD + Kubernetes

Scaling Existing Systems

Current State	Scaled State
Single GPU	Multi-GPU with DeepSpeed
Single node	Kubernetes cluster
Monolithic	Microservices
Synchronous	Async + queues
Single region	Multi-region
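
The move from synchronous handling to "async + queues" in the table above can be sketched with the standard library. In this sketch a bounded `asyncio.Queue` feeds a fixed pool of workers; `fake_inference` is a hypothetical stand-in for a model call, and a production system would typically use an external broker instead of an in-process queue.

```python
import asyncio
from typing import Dict, List


async def fake_inference(prompt: str) -> str:
    """Stand-in for a slow model call (hypothetical)."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def worker(queue: asyncio.Queue, results: Dict[int, str]) -> None:
    """Pull jobs off the queue forever; the caller cancels workers when done."""
    while True:
        job_id, prompt = await queue.get()
        try:
            results[job_id] = await fake_inference(prompt)
        finally:
            queue.task_done()


async def process_all(prompts: List[str], n_workers: int = 3) -> List[str]:
    # A bounded queue applies backpressure: producers wait when it is full.
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    results: Dict[int, str] = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for job_id, prompt in enumerate(prompts):
        await queue.put((job_id, prompt))
    await queue.join()  # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return [results[i] for i in range(len(prompts))]


outputs = asyncio.run(process_all(["hello", "queue", "world"]))
```

Decoupling request intake from processing lets the system absorb bursts and scale the worker pool independently of the front end.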

Cost Considerations

Free/Open Source Options

Inference: vLLM, TGI, Ollama
Vectors: Qdrant, Milvus, Chroma
Experiment Tracking: MLflow
Orchestration: Airflow
Feature Store: Feast

Managed Service Trade-offs

Aspect	Self-Hosted	Managed
Initial cost	Higher (setup)	Lower
Ongoing cost	Lower at scale	Higher at scale
Operational burden	Higher	Lower
Customization	Full control	Limited
Scaling	Manual	Automatic
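
The "lower at scale" versus "higher at scale" trade-off in the table can be made concrete with a break-even calculation. All prices in the usage note below are hypothetical placeholders, not quotes from any provider, and the sketch assumes the self-hosted GPUs have enough capacity for the token volume in question.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Cost of a pay-per-token API at a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def monthly_self_hosted_cost(
    gpu_count: int,
    gpu_hourly_rate: float,
    ops_overhead_per_month: float,
    hours_per_month: float = 730.0,
) -> float:
    """Fixed monthly cost of always-on GPUs plus operational overhead."""
    return gpu_count * gpu_hourly_rate * hours_per_month + ops_overhead_per_month


def break_even_tokens(price_per_million_tokens: float, self_hosted_monthly_cost: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    return self_hosted_monthly_cost / price_per_million_tokens * 1_000_000
```

For example, with an assumed API price of 5.00 per million tokens, two GPUs at 2.50 per hour, and 3,000 per month of operational overhead, `monthly_self_hosted_cost(2, 2.5, 3000.0)` is 6,650 and `break_even_tokens(5.0, 6650.0)` is 1.33 billion tokens per month. Below that volume the managed API is cheaper under these assumptions.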

Getting Started Recommendations

For Individual Developers

If you’re learning or building personal projects:

Start with: Ollama (local LLMs), Chroma (vectors), LangChain (orchestration)
Add as needed: Hugging Face for models, MLflow for experiments
Deploy with: Docker, any cloud provider

For Startups

Building an MVP with limited resources:

Start with: OpenAI/Anthropic APIs, Pinecone (managed vectors), Vercel/Railway (deployment)
Graduate to: vLLM + open models when API costs become significant
Focus on: Product-market fit over infrastructure optimization

For Enterprise Teams

Building production systems at scale:

Infrastructure: Kubernetes, Terraform, GitOps
ML Platform: Databricks/SageMaker/Vertex AI or build with MLflow + Kubeflow
Monitoring: Arize AI or Evidently + Prometheus/Grafana
Security: Air-gapped options, self-hosted models for sensitive data

For Research Teams

Running experiments and publishing papers:

Training: PyTorch + W&B + DeepSpeed
Compute: Cloud spot instances, academic GPU clusters
Reproducibility: DVC, Docker, experiment configs as code

Tool Ecosystem Compatibility

Many tools work better together. Here are common compatible stacks:

The Hugging Face Stack

Transformers (models)
Datasets (data loading)
PEFT (fine-tuning)
TGI (serving)
Hub (model hosting)

The LangChain Stack

LangChain (orchestration)
LangSmith (debugging)
LangServe (deployment)
LangGraph (agents)

The AWS Stack

SageMaker (training/serving)
Bedrock (LLM APIs)
S3 (storage)
Lambda (serverless)
Step Functions (orchestration)

The Open Source Stack

PyTorch (training)
vLLM (serving)
MLflow (experiments)
Feast (features)
Airflow (orchestration)
Prometheus + Grafana (monitoring)

Version Pinning Recommendations

AI tools evolve rapidly. For production systems:

# Example requirements.txt with pinning strategy

# Pin major versions
torch>=2.12,<3.0
transformers>=5.9,<6.0  # transformers v5 is a major version

# Pin exact versions for critical inference
vllm==0.19.0

# Allow patch updates for utilities
langchain>=0.3,<0.4

Best Practices:

Pin exact versions in production, for example with a lock file resolved from ranges like those above
Test upgrades in staging environments
Monitor release notes for breaking changes
Keep a changelog of version updates
Use virtual environments or containers for isolation

Deprecated and Legacy Tools

The following tools were popular but are now less recommended:

Tool	Status	Replacement
TensorFlow 1.x	Deprecated	PyTorch or TF 2.x
Keras 2.x / tf.keras	Superseded	Keras 3 (multi-backend) or PyTorch
GPT-2	Outdated	Modern LLMs
BERT-base (original)	Outdated	Sentence Transformers, E5
Apache Zeppelin	Less active	Jupyter
Dask-ML	Limited development	Ray, Spark
Haystack 1.x	Superseded	Haystack 2.x, LlamaIndex, LangChain

When encountering tutorials using deprecated tools, look for updated alternatives or modern equivalents.

Resources for Staying Current

Official Documentation

Always prefer official documentation over tutorials, as it reflects current versions.

Release Tracking

GitHub release pages for major projects
PyPI for Python package updates
Hugging Face blog for ecosystem news

Community Sources

r/MachineLearning and r/LocalLLaMA for discussions
Discord servers for major projects (LangChain, vLLM, etc.)
Twitter/X ML community for breaking developments

This Book’s Companion

Check the book’s repository for updated tool recommendations and errata: the AI landscape in 2027 will differ from what’s documented here in 2026.

# Appendix B: Tool & Framework Reference {.unnumbered} This appendix provides a reference guide to the major tools and frameworks used in AI engineering as of 2026. The landscape evolves rapidly—use this as a starting point and verify current versions and alternatives. **Last Updated:** May 2026 **Review Frequency:** This appendix should be reviewed quarterly as the AI tooling landscape changes significantly every 3-6 months. --- ## Emerging Tools and Trends (2025-2026) Before diving into established tools, here are the notable emerging developments: ### Agentic Frameworks **MCP (Model Context Protocol)**: Anthropic's protocol for standardized tool integration with LLMs. Growing ecosystem of MCP-compatible tools and servers. ```python # MCP is becoming the standard for tool integration # Example MCP server definition { "name": "filesystem", "version": "1.0.0", "tools": [ { "name": "read_file", "description": "Read contents of a file", "parameters": {"path": {"type": "string"}} } ] } ``` **Claude Code, Cursor, Windsurf**: AI-powered coding assistants that go beyond autocomplete to full agentic coding. These tools can: - Edit multiple files simultaneously - Run terminal commands - Understand full repository context - Execute multi-step coding tasks **CrewAI, AutoGen, LangGraph**: Multi-agent orchestration frameworks for complex workflows. ### Inference Optimization **SGLang**: Alternative to vLLM with focus on structured generation and complex prompting patterns. Worth evaluating for applications with heavy structured output needs. **Medusa, EAGLE**: Speculative decoding implementations achieving 2-3x speedups without quality loss. **FP8 Quantization**: Native support in H100/H200 GPUs enabling faster inference with minimal quality loss compared to FP16. ### Development Tools **LiteLLM**: Unified API across 100+ LLM providers. Increasingly standard for multi-provider applications. 
```python from litellm import completion # Same interface for any provider response = completion( model="gpt-5", # or "claude-opus-4-8", "anthropic/claude-sonnet-4-6", etc. messages=[{"role": "user", "content": "Hello!"}] ) ``` **promptfoo**: Prompt testing and evaluation framework gaining adoption for systematic prompt engineering. **Weights & Biases Prompts**: Prompt versioning and experimentation tracking. ### Evaluation and Safety **HuggingFace Lighteval**: Standardized evaluation harness for LLM benchmarks. **Inspect AI**: Anthropic's framework for AI safety evaluations. **Patronus AI, Galileo, Langfuse**: Commercial LLM observability and evaluation platforms. --- ## LLM Inference Engines ### vLLM **Purpose**: High-throughput LLM inference engine **Key Features**: - PagedAttention for efficient memory management - Continuous batching for high throughput - Support for most popular open-source LLMs - OpenAI-compatible API **When to Use**: Production LLM serving requiring high throughput **Installation**: ```bash pip install vllm ``` **Example**: ```python from vllm import LLM, SamplingParams llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E") outputs = llm.generate(["Hello, how are you?"], SamplingParams(temperature=0.7)) ``` **Resources**: https://docs.vllm.ai --- ### Text Generation Inference (TGI) **Purpose**: Hugging Face's production inference server **Key Features**: - Optimized for Hugging Face models - Flash Attention support - Quantization support (AWQ, GPTQ, bitsandbytes) - Prometheus metrics built-in **When to Use**: Hugging Face model deployment, especially in containers **Docker Launch**: ```bash docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \ --model-id meta-llama/Llama-4-Scout-17B-16E ``` **Resources**: https://huggingface.co/docs/text-generation-inference --- ### Ollama **Purpose**: Local LLM inference for development **Key Features**: - Simple installation and model management - Runs on consumer hardware - Good 
for development and testing
- REST API compatible with OpenAI format

**When to Use**: Local development, prototyping, privacy-sensitive applications

**Installation**:

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama4
```

**Resources**: https://ollama.com

---

### TensorRT-LLM

**Purpose**: NVIDIA's optimized LLM inference

**Key Features**:

- Maximum performance on NVIDIA GPUs
- INT8/FP8 quantization
- Multi-GPU support
- In-flight batching

**When to Use**: Production serving requiring maximum GPU efficiency

**Resources**: https://github.com/NVIDIA/TensorRT-LLM

---

## ML Frameworks

### PyTorch

**Purpose**: Primary deep learning framework

**Key Features**:

- Dynamic computation graphs
- Extensive ecosystem
- Strong GPU support
- Industry standard for research and production

**When to Use**: Almost all deep learning development

**Installation**:

```bash
pip install torch torchvision torchaudio
```

**Resources**: https://pytorch.org

---

### Hugging Face Transformers

**Purpose**: Pre-trained model library and utilities

**Key Features**:

- Thousands of pre-trained models
- Unified API for different architectures
- Training utilities (Trainer, PEFT integration)
- Integration with Hugging Face Hub

**When to Use**: Working with transformer models

**Installation**:

```bash
pip install transformers
```

**Example**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

**Resources**: https://huggingface.co/docs/transformers

---

### LangChain

**Purpose**: Framework for LLM application development

**Key Features**:

- Chain composition for complex workflows
- Built-in prompts and output parsers
- Tool and agent support
- Memory and context management

**When to Use**: Building LLM applications with complex workflows

**Installation**:

```bash
pip install langchain langchain-openai
```

**Example**:

```python
from langchain_openai import ChatOpenAI
from \
langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-5")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm  # pipe operator: the formatted prompt is passed to the model
result = chain.invoke({"text": "Long document..."})
```

**Resources**: https://python.langchain.com

---

### LlamaIndex

**Purpose**: Data framework for LLM applications

**Key Features**:

- Document loading and indexing
- Various retrieval strategies
- Query engines and chat engines
- Strong RAG support

**When to Use**: RAG systems and document-based applications

**Installation**:

```bash
pip install llama-index
```

**Resources**: https://docs.llamaindex.ai

---

## Vector Databases

### Pinecone

**Purpose**: Managed vector database

**Key Features**:

- Fully managed service
- Low-latency queries at scale
- Metadata filtering
- Hybrid search support

**When to Use**: Production vector search without infrastructure management

**Installation**:

```bash
pip install pinecone
```

**Resources**: https://docs.pinecone.io

---

### Weaviate

**Purpose**: Open-source vector database

**Key Features**:

- Self-hosted or cloud
- GraphQL API
- Built-in vectorization modules
- Hybrid search

**When to Use**: Self-hosted vector search with flexibility

**Resources**: https://weaviate.io/developers/weaviate

---

### Milvus

**Purpose**: Open-source vector database for scale

**Key Features**:

- Designed for billion-scale
- Multiple index types
- Kubernetes-native
- GPU acceleration

**When to Use**: Large-scale vector search requiring fine control

**Resources**: https://milvus.io

---

### Qdrant

**Purpose**: High-performance vector database

**Key Features**:

- Rust-based for performance
- Rich filtering
- Quantization support
- Simple deployment

**When to Use**: Performance-critical vector search

**Installation**:

```bash
docker run -p 6333:6333 qdrant/qdrant
```

**Resources**: https://qdrant.tech/documentation

---

### Chroma

**Purpose**: Lightweight embedding database

**Key Features**:

- Simple API
- Embedded or client-server mode
- Good
for prototyping
- Python-native

**When to Use**: Prototyping, small-scale applications

**Installation**:

```bash
pip install chromadb
```

**Resources**: https://docs.trychroma.com

---

## Feature Stores

### Feast

**Purpose**: Open-source feature store

**Key Features**:

- Cloud-agnostic
- Online and offline stores
- Point-in-time joins
- Python SDK

**When to Use**: Feature management with flexibility

**Installation**:

```bash
pip install feast
```

**Resources**: https://feast.dev

---

### Tecton

**Purpose**: Enterprise feature platform

**Key Features**:

- Real-time feature serving
- Streaming feature computation
- Enterprise governance
- Managed service

**When to Use**: Enterprise ML with real-time features

**Resources**: https://www.tecton.ai

---

## Experiment Tracking

### MLflow

**Purpose**: ML lifecycle management

**Key Features**:

- Experiment tracking
- Model registry
- Model serving
- Open source

**When to Use**: End-to-end ML lifecycle management

**Installation**:

```bash
pip install mlflow
```

**Example**:

```python
import mlflow

# The context manager ends the run even if a logging call raises
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
```

**Resources**: https://mlflow.org

---

### Weights & Biases (W&B)

**Purpose**: Experiment tracking and visualization

**Key Features**:

- Interactive dashboards and visualizations
- Team collaboration
- Hyperparameter sweeps
- Model registry

**When to Use**: Team-based ML development

**Installation**:

```bash
pip install wandb
```

**Resources**: https://docs.wandb.ai

---

## Data Processing

### Apache Spark

**Purpose**: Distributed data processing

**Key Features**:

- Large-scale data processing
- SQL, streaming, ML libraries
- Widely deployed
- Databricks integration

**When to Use**: Large-scale batch processing

**Resources**: https://spark.apache.org

---

### Databricks

**Purpose**: Unified analytics platform

**Key Features**:

- Managed Spark
- Unity Catalog
- MLflow integration
- Feature store

**When to Use**: Enterprise data and ML
platform

**Resources**: https://docs.databricks.com

---

### Ray

**Purpose**: Distributed computing framework

**Key Features**:

- Simple distributed Python
- ML-focused libraries (Ray Train, Ray Serve)
- Scales from laptop to cluster
- Good for hyperparameter tuning

**When to Use**: Distributed ML training and serving

**Installation**:

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "ray[default]"
```

**Resources**: https://docs.ray.io

---

## Orchestration

### Airflow

**Purpose**: Workflow orchestration

**Key Features**:

- DAG-based workflows
- Extensive operator library
- Monitoring and alerting
- Industry standard

**When to Use**: Batch ML pipeline orchestration

**Installation**:

```bash
pip install apache-airflow
```

**Resources**: https://airflow.apache.org

---

### Prefect

**Purpose**: Modern workflow orchestration

**Key Features**:

- Python-native
- Less boilerplate than Airflow
- Cloud or self-hosted
- Good debugging

**When to Use**: Python-centric teams wanting simpler orchestration

**Installation**:

```bash
pip install prefect
```

**Resources**: https://docs.prefect.io

---

### Kubeflow

**Purpose**: ML on Kubernetes

**Key Features**:

- Kubernetes-native ML
- Pipelines, notebooks, training
- Model serving (KServe)
- Multi-tenancy

**When to Use**: Kubernetes-based ML platforms

**Resources**: https://www.kubeflow.org

---

## Monitoring & Observability

### Prometheus + Grafana

**Purpose**: Metrics collection and visualization

**Key Features**:

- Industry standard metrics
- Rich visualization
- Alerting
- Large ecosystem

**When to Use**: Infrastructure and application monitoring

**Resources**: https://prometheus.io, https://grafana.com

---

### Datadog

**Purpose**: Cloud monitoring platform

**Key Features**:

- Unified monitoring
- ML monitoring capabilities
- Log management
- APM

**When to Use**: Enterprise observability

**Resources**: https://docs.datadoghq.com

---

### Arize AI

**Purpose**: ML observability platform

**Key Features**:

- Model monitoring
- Drift detection
- Explainability
- Embedding
analysis

**When to Use**: Production ML monitoring

**Resources**: https://docs.arize.com

---

## Evaluation

### RAGAS

**Purpose**: RAG evaluation framework

**Key Features**:

- RAG-specific metrics
- Faithfulness, answer relevancy, context precision and recall
- LLM-based evaluation
- Easy integration

**When to Use**: Evaluating RAG systems

**Installation**:

```bash
pip install ragas
```

**Resources**: https://docs.ragas.io

---

### DeepEval

**Purpose**: LLM evaluation framework

**Key Features**:

- Multiple evaluation metrics
- Test-driven development for LLMs
- CI/CD integration
- Benchmarking

**When to Use**: Systematic LLM evaluation

**Installation**:

```bash
pip install deepeval
```

**Resources**: https://docs.confident-ai.com

---

## API Providers

### OpenAI

**Models**: GPT-5.5, GPT-5.4, DALL-E, Whisper

**Strengths**: Leading capability, extensive ecosystem, reliable API

**Pricing**: Pay-per-token, varies by model

**Resources**: https://platform.openai.com

---

### Anthropic

**Models**: Claude Opus 4.8, Sonnet 4.6, Haiku 4.5

**Strengths**: Strong reasoning, long context, safety focus

**Pricing**: Pay-per-token

**Resources**: https://docs.anthropic.com

---

### Google (Vertex AI)

**Models**: Gemini family, specialized models (Imagen, Veo, embedding models)

**Strengths**: Multimodal, GCP integration, enterprise features

**Pricing**: Pay-per-token or per-character, depending on the model

**Resources**: https://cloud.google.com/vertex-ai

---

### Amazon Bedrock

**Models**: Claude, Llama, Titan, Stable Diffusion

**Strengths**: AWS integration, multiple model providers

**Pricing**: Pay-per-token/image

**Resources**: https://docs.aws.amazon.com/bedrock

---

### Cohere

**Models**: Command, Embed, Rerank

**Strengths**: Enterprise focus, strong embeddings

**Pricing**: Pay-per-token/embedding

**Resources**: https://docs.cohere.com

---

## Development Tools

### LangSmith

**Purpose**: LLM application development platform

**Key Features**:

- Tracing and debugging
- Testing and evaluation
- Monitoring
- Prompt management

**When to Use**:
LangChain application development

**Resources**: https://docs.smith.langchain.com

---

### OpenLLMetry

**Purpose**: Open-source LLM observability

**Key Features**:

- OpenTelemetry-based
- Multiple framework integrations
- Vendor-agnostic
- Trace visualization

**When to Use**: LLM observability with OpenTelemetry

**Installation**:

```bash
pip install traceloop-sdk
```

**Resources**: https://github.com/traceloop/openllmetry

---

## Fine-Tuning & Training

### PEFT (Parameter-Efficient Fine-Tuning)

**Purpose**: Efficient fine-tuning of large models

**Key Features**:

- LoRA, QLoRA, adapters
- Minimal additional parameters
- Works with Transformers
- Memory efficient

**When to Use**: Fine-tuning large models with limited resources

**Installation**:

```bash
pip install peft
```

**Example**:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Any causal LM works here; "gpt2" is only a small stand-in for illustration
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank update matrices
    lora_alpha=32,     # scaling factor applied to the update
    lora_dropout=0.1,
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```

**Resources**: https://huggingface.co/docs/peft

---

### Axolotl

**Purpose**: Streamlined LLM fine-tuning

**Key Features**:

- YAML-based configuration
- Multiple training techniques
- Dataset preprocessing
- Good defaults

**When to Use**: Quick LLM fine-tuning without deep expertise

**Resources**: https://github.com/axolotl-ai-cloud/axolotl

---

### DeepSpeed

**Purpose**: Deep learning optimization library

**Key Features**:

- ZeRO memory optimization
- Mixed precision training
- Model parallelism
- Microsoft-backed

**When to Use**: Training large models across multiple GPUs

**Installation**:

```bash
pip install deepspeed
```

**Resources**: https://www.deepspeed.ai

---

### FSDP (Fully Sharded Data Parallel)

**Purpose**: PyTorch distributed training

**Key Features**:

- Built into PyTorch
- Memory-efficient training
- Simpler than DeepSpeed for some cases
- Native PyTorch integration

**When to Use**: Distributed training with PyTorch

**Resources**: https://pytorch.org/docs/stable/fsdp.html

---
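To make the "minimal additional parameters" point from the PEFT entry in this section concrete, here is a rough back-of-the-envelope sketch. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two small matrices of shapes `d_out x r` and `r x d_in` instead. The 4096 x 4096 projection size and the helper names are assumptions chosen for illustration, and `r=8` matches the `LoraConfig` example in the PEFT entry; real models apply LoRA to several matrices per layer, so treat the ratio as indicative only.

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA trains B (d_out x r) and A (r x d_in), so the total is r * (d_out + d_in).
    """
    return r * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in the frozen base weight matrix."""
    return d_out * d_in


if __name__ == "__main__":
    d = 4096  # hypothetical projection size, not taken from any specific model
    added = lora_param_count(d, d, r=8)
    frozen = full_param_count(d, d)
    # Prints: LoRA adds 65,536 trainable parameters vs 16,777,216 frozen (0.39%)
    print(f"LoRA adds {added:,} trainable parameters vs {frozen:,} frozen ({added / frozen:.2%})")
```

Doubling `r` doubles the trainable parameter count, which is one reason the rank is usually the first knob to adjust when trading adapter capacity against memory.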
## Data Quality & Validation

### Great Expectations

**Purpose**: Data quality validation

**Key Features**:

- Expectation-based testing
- Data documentation
- CI/CD integration
- Rich ecosystem

**When to Use**: Data quality in pipelines

**Installation**:

```bash
pip install great_expectations
```

**Example**:

```python
import great_expectations as gx

context = gx.get_context()
# `batch_request` is assumed to be defined elsewhere and to point at your data;
# the validator API shown here varies across Great Expectations versions.
validator = context.get_validator(batch_request=batch_request)
validator.expect_column_values_to_not_be_null("user_id")
```

**Resources**: https://docs.greatexpectations.io

---

### Pandera

**Purpose**: Statistical data validation

**Key Features**:

- DataFrame validation
- Statistical checks
- Schema inference
- Pandas/Polars support

**When to Use**: DataFrame validation in Python

**Installation**:

```bash
pip install pandera
```

**Resources**: https://pandera.readthedocs.io

---

### DVC (Data Version Control)

**Purpose**: Version control for ML data and models

**Key Features**:

- Git-like data versioning
- Pipeline reproducibility
- Cloud storage backends
- Experiment tracking

**When to Use**: Versioning data and models alongside code

**Installation**:

```bash
pip install dvc
```

**Resources**: https://dvc.org

---

## Testing & Quality

### pytest

**Purpose**: Python testing framework

**Key Features**:

- Simple and powerful
- Fixtures and parameterization
- Plugin ecosystem
- Industry standard

**When to Use**: All Python testing

**Installation**:

```bash
pip install pytest
```

**Resources**: https://docs.pytest.org

---

### hypothesis

**Purpose**: Property-based testing

**Key Features**:

- Automatic test case generation
- Edge case discovery
- Shrinking failing examples
- Useful for data processing

**When to Use**: Testing data processing and transformations

**Installation**:

```bash
pip install hypothesis
```

**Resources**: https://hypothesis.readthedocs.io

---

### Locust

**Purpose**: Load testing

**Key Features**:

- Python-based test scripts
- Distributed testing
- Real-time metrics
- Web UI

**When to Use**: Load testing inference endpoints

**Installation**:

```bash
pip install locust
```

**Resources**: https://locust.io

---

## Security

### Guardrails AI

**Purpose**: LLM output validation

**Key Features**:

- Structured output enforcement
- Content filtering
- Type validation
- Retry mechanisms

**When to Use**: Ensuring LLM outputs meet requirements

**Installation**:

```bash
pip install guardrails-ai
```

**Resources**: https://docs.guardrailsai.com

---

### NeMo Guardrails

**Purpose**: Conversational AI safety

**Key Features**:

- Programmable guardrails
- Topical rails
- Dialogue management
- NVIDIA-backed

**When to Use**: Building safe conversational AI

**Resources**: https://github.com/NVIDIA/NeMo-Guardrails

---

## Embeddings & Retrieval

### Sentence Transformers

**Purpose**: Sentence embeddings library

**Key Features**:

- Pre-trained embedding models
- Training utilities
- Multiple loss functions
- Simple API

**When to Use**: Text embeddings for search and similarity

**Installation**:

```bash
pip install sentence-transformers
```

**Example**:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["Hello world", "How are you"])
```

**Resources**: https://www.sbert.net

---

### FAISS

**Purpose**: Vector similarity search

**Key Features**:

- Billion-scale search
- GPU support
- Multiple index types
- Facebook/Meta research

**When to Use**: Efficient vector search in Python

**Installation**:

```bash
pip install faiss-cpu  # or faiss-gpu
```

**Resources**: https://faiss.ai

---

### BM25 (rank_bm25)

**Purpose**: Sparse retrieval

**Key Features**:

- Classic keyword search
- No embeddings needed
- Fast and interpretable
- Good baseline

**When to Use**: Hybrid search, keyword matching

**Installation**:

```bash
pip install rank_bm25
```

**Resources**: https://github.com/dorianbrown/rank_bm25

---

## Prompt Engineering

### Guidance

**Purpose**: Structured LLM generation

**Key
Features**:

- Constrained generation
- Template language
- Control flow
- Output parsing

**When to Use**: Complex structured outputs

**Installation**:

```bash
pip install guidance
```

**Resources**: https://github.com/guidance-ai/guidance

---

### DSPy

**Purpose**: Programming with foundation models

**Key Features**:

- Declarative programming
- Automatic prompt optimization
- Module composition
- Stanford research

**When to Use**: Systematic prompt engineering

**Installation**:

```bash
pip install dspy
```

**Resources**: https://dspy.ai

---

## Container & Deployment

### Docker

**Purpose**: Containerization

**Key Features**:

- Reproducible environments
- Industry standard
- Rich ecosystem
- Essential for deployment

**When to Use**: Packaging and deploying all applications

**Resources**: https://docs.docker.com

---

### Kubernetes

**Purpose**: Container orchestration

**Key Features**:

- Automatic scaling
- Self-healing
- Service discovery
- Industry standard for production

**When to Use**: Production deployment at scale

**Resources**: https://kubernetes.io

---

### KServe

**Purpose**: ML model serving on Kubernetes

**Key Features**:

- Serverless inference
- Autoscaling
- A/B testing
- Multiple frameworks

**When to Use**: Model serving on Kubernetes

**Resources**: https://kserve.github.io

---

### BentoML

**Purpose**: ML model serving framework

**Key Features**:

- Easy model packaging
- Multiple frameworks
- Adaptive batching
- Cloud deployment

**When to Use**: Packaging models for deployment

**Installation**:

```bash
pip install bentoml
```

**Resources**: https://docs.bentoml.org

---

## Cloud Platforms

### AWS SageMaker

**Purpose**: AWS ML platform

**Key Features**:

- Managed training and inference
- Jupyter notebooks
- Model registry
- MLOps features

**When to Use**: ML on AWS

**Resources**: https://docs.aws.amazon.com/sagemaker

---

### Google Cloud Vertex AI

**Purpose**: GCP ML platform

**Key Features**:

- Managed ML pipelines
- Model serving
- AutoML
-
Gemini integration

**When to Use**: ML on GCP

**Resources**: https://cloud.google.com/vertex-ai/docs

---

### Azure Machine Learning

**Purpose**: Azure ML platform

**Key Features**:

- End-to-end ML lifecycle
- MLOps automation
- Responsible AI tools
- Azure OpenAI integration

**When to Use**: ML on Azure

**Resources**: https://learn.microsoft.com/azure/machine-learning

---

## Quick Selection Guide

| Use Case | Recommended Tools |
|----------|-------------------|
| LLM Inference (High Volume) | vLLM, TGI, TensorRT-LLM |
| LLM Inference (Local Dev) | Ollama |
| RAG Application | LlamaIndex + Pinecone/Qdrant |
| Complex LLM Workflows | LangChain |
| Experiment Tracking | MLflow, W&B |
| Feature Store | Feast (open source), Tecton (enterprise) |
| Vector Database (Managed) | Pinecone |
| Vector Database (Self-Hosted) | Milvus, Qdrant |
| Pipeline Orchestration | Airflow, Prefect |
| ML on Kubernetes | Kubeflow |
| LLM Monitoring | Arize AI, LangSmith |
| API Provider (General) | OpenAI, Anthropic |
| API Provider (Enterprise/AWS) | Amazon Bedrock |
| API Provider (Enterprise/GCP) | Vertex AI |

---

## Version Compatibility Notes

The AI tool landscape evolves rapidly. When this book was written (early 2026):

- **PyTorch 2.x** (2.12) is standard, with torch.compile providing significant speedups
- **Transformers 5.x** (5.9, a major version) supports most modern models
- **vLLM and TGI** are the dominant open-source inference engines
- **Vector databases** have largely converged on similar feature sets

Always check documentation for current versions and breaking changes.

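Before comparing against the notes above, it helps to know what is actually installed. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper names (`installed_versions`, `major_minor`, `check_majors`) and the `EXPECTED_MAJOR` mapping are illustrative assumptions, with the major versions taken from the list above. It is a quick sanity check, not a substitute for reading release notes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Major versions taken from the notes above; adjust for your own stack.
EXPECTED_MAJOR = {"torch": 2, "transformers": 5}


def installed_versions(packages):
    """Map each package name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_minor(version_string):
    """Parse the leading 'major.minor' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def check_majors(found, expected):
    """Return names of installed packages whose major version differs from expected."""
    mismatched = []
    for name, major in expected.items():
        ver = found.get(name)
        if ver is not None and major_minor(ver)[0] != major:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    found = installed_versions(EXPECTED_MAJOR)
    for name, ver in found.items():
        print(f"{name}: {ver or 'not installed'}")
    print("Unexpected major versions:", check_majors(found, EXPECTED_MAJOR))
```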

---

## Additional Tools by Category

### Model Optimization

| Tool | Purpose | When to Use |
|------|---------|-------------|
| ONNX Runtime | Cross-platform inference | Deploying models across different hardware |
| TensorRT | NVIDIA GPU optimization | Maximum performance on NVIDIA hardware |
| OpenVINO | Intel optimization | Inference on Intel CPUs/GPUs |
| CoreML | Apple device deployment | iOS/macOS applications |
| TFLite | Mobile deployment | Android/embedded devices |
| llama.cpp | CPU-optimized LLM inference | Running LLMs on CPU |
| ExLlamaV2 | Quantized LLM inference | Memory-efficient GPU inference |

### Data Processing

| Tool | Purpose | When to Use |
|------|---------|-------------|
| Polars | Fast DataFrame operations | Large-scale data processing |
| Dask | Parallel computing | Out-of-core and distributed pandas |
| DuckDB | Embedded analytics | SQL queries on local data |
| Delta Lake | ACID data lake storage | Reliable data lake operations |
| Apache Arrow | Columnar memory format | High-performance data interchange |

### Annotation & Labeling

| Tool | Purpose | When to Use |
|------|---------|-------------|
| Label Studio | Open-source labeling | Self-hosted data annotation |
| Prodigy | Efficient annotation | Active learning-based labeling |
| Scale AI | Enterprise labeling | Large-scale human labeling |
| Amazon SageMaker Ground Truth | AWS labeling | Labeling with AWS integration |
| Snorkel | Programmatic labeling | Weak supervision approaches |

### Distributed Training

| Tool | Purpose | When to Use |
|------|---------|-------------|
| Horovod | Distributed training | Multi-GPU/multi-node training |
| PyTorch Lightning | Training framework | Organized PyTorch training |
| MosaicML Composer | Efficient training | Optimized training recipes |
| ColossalAI | Large model training | Training very large models |

### AutoML & Hyperparameter Tuning

| Tool | Purpose | When to Use |
|------|---------|-------------|
| Optuna | Hyperparameter
optimization | Automatic hyperparameter tuning |
| Ray Tune | Distributed HPO | Scalable hyperparameter search |
| Auto-sklearn | AutoML | Automated model selection |
| H2O AutoML | Enterprise AutoML | Business-focused AutoML |

### Explainability

| Tool | Purpose | When to Use |
|------|---------|-------------|
| SHAP | Feature importance | Understanding model decisions |
| LIME | Local explanations | Explaining individual predictions |
| Captum | PyTorch interpretability | Deep learning explainability |
| InterpretML | Unified interpretability | Glass-box and black-box models |

### Natural Language Processing

| Tool | Purpose | When to Use |
|------|---------|-------------|
| spaCy | Industrial NLP | Production NLP pipelines |
| NLTK | Educational NLP | Learning and prototyping |
| Stanza | Research NLP | Stanford NLP models |
| Flair | State-of-the-art NLP | Sequence labeling |

### Computer Vision

| Tool | Purpose | When to Use |
|------|---------|-------------|
| OpenCV | Computer vision basics | Image processing operations |
| Ultralytics | Object detection | YOLO models |
| Detectron2 | Object detection | Meta AI research models |
| timm | Vision models | Pre-trained image models |

### Audio & Speech

| Tool | Purpose | When to Use |
|------|---------|-------------|
| Whisper | Speech-to-text | Transcription |
| Coqui TTS | Text-to-speech | Voice synthesis |
| Librosa | Audio processing | Audio feature extraction |
| pyannote.audio | Speaker diarization | Who spoke when |

---

## Tool Selection Framework

When selecting tools for a new project, consider these factors:

### 1.
Scale Requirements

**Small Scale (< 1M requests/month)**:

- Ollama for local LLM
- Chroma for vectors
- SQLite/DuckDB for data
- Simple deployment (single server)

**Medium Scale (1M-100M requests/month)**:

- vLLM/TGI for inference
- Qdrant/Weaviate for vectors
- PostgreSQL/Redis for data
- Kubernetes deployment

**Large Scale (> 100M requests/month)**:

- TensorRT-LLM for inference
- Pinecone/Milvus for vectors
- Specialized data infrastructure
- Multi-region deployment

### 2. Team Expertise

**Data Science Background**:

- Jupyter notebooks
- MLflow for experiments
- Scikit-learn patterns
- Pandas-centric workflow

**Software Engineering Background**:

- Containerized workflows
- API-first design
- Proper testing
- CI/CD integration

**Mixed Teams**:

- Unified platforms (Databricks, SageMaker)
- Clear interfaces between components
- Documentation emphasis

### 3. Cloud Strategy

**AWS-Native**:

- SageMaker for ML
- Bedrock for LLMs
- Lambda for serverless
- S3 for storage

**GCP-Native**:

- Vertex AI for ML
- Gemini for LLMs
- Cloud Functions
- BigQuery for data

**Multi-Cloud / Agnostic**:

- Open-source tools (MLflow, Kubeflow)
- Portable formats (ONNX)
- Terraform for infrastructure

### 4.
Compliance Requirements

**Regulated Industries**:

- Self-hosted models
- Private deployments
- Audit logging
- Data residency controls

**Standard Business**:

- Managed services acceptable
- API providers viable
- Standard security practices

---

## Migration Paths

### Moving from Prototyping to Production

| Prototype | Production Alternative |
|-----------|----------------------|
| OpenAI API | vLLM + open model |
| Chroma | Qdrant, Pinecone |
| Jupyter notebooks | Python packages |
| Local files | Feature store |
| Manual deployment | CI/CD + Kubernetes |

### Scaling Existing Systems

| Current State | Scaled State |
|---------------|--------------|
| Single GPU | Multi-GPU with DeepSpeed |
| Single node | Kubernetes cluster |
| Monolithic | Microservices |
| Synchronous | Async + queues |
| Single region | Multi-region |

---

## Cost Considerations

### Free/Open Source Options

- **Inference**: vLLM, TGI, Ollama
- **Vectors**: Qdrant, Milvus, Chroma
- **Experiment Tracking**: MLflow
- **Orchestration**: Airflow
- **Feature Store**: Feast

### Managed Service Trade-offs

| Aspect | Self-Hosted | Managed |
|--------|-------------|---------|
| Initial cost | Higher (setup) | Lower |
| Ongoing cost | Lower at scale | Higher at scale |
| Operational burden | Higher | Lower |
| Customization | Full control | Limited |
| Scaling | Manual | Automatic |

---

## Getting Started Recommendations

### For Individual Developers

If you're learning or building personal projects:

1. **Start with**: Ollama (local LLMs), Chroma (vectors), LangChain (orchestration)
2. **Add as needed**: Hugging Face for models, MLflow for experiments
3. **Deploy with**: Docker, any cloud provider

### For Startups

Building an MVP with limited resources:

1. **Start with**: OpenAI/Anthropic APIs, Pinecone (managed vectors), Vercel/Railway (deployment)
2. **Graduate to**: vLLM + open models when API costs become significant
3.
**Focus on**: Product-market fit over infrastructure optimization

### For Enterprise Teams

Building production systems at scale:

1. **Infrastructure**: Kubernetes, Terraform, GitOps
2. **ML Platform**: Databricks/SageMaker/Vertex AI or build with MLflow + Kubeflow
3. **Monitoring**: Arize AI or Evidently + Prometheus/Grafana
4. **Security**: Air-gapped options, self-hosted models for sensitive data

### For Research Teams

Running experiments and publishing papers:

1. **Training**: PyTorch + W&B + DeepSpeed
2. **Compute**: Cloud spot instances, academic GPU clusters
3. **Reproducibility**: DVC, Docker, experiment configs as code

---

## Tool Ecosystem Compatibility

Many tools work better together. Here are common compatible stacks:

### The Hugging Face Stack

- Transformers (models)
- Datasets (data loading)
- PEFT (fine-tuning)
- TGI (serving)
- Hub (model hosting)

### The LangChain Stack

- LangChain (orchestration)
- LangSmith (debugging)
- LangServe (deployment)
- LangGraph (agents)

### The AWS Stack

- SageMaker (training/serving)
- Bedrock (LLM APIs)
- S3 (storage)
- Lambda (serverless)
- Step Functions (orchestration)

### The Open Source Stack

- PyTorch (training)
- vLLM (serving)
- MLflow (experiments)
- Feast (features)
- Airflow (orchestration)
- Prometheus + Grafana (monitoring)

---

## Version Pinning Recommendations

AI tools evolve rapidly.
For production systems:

```txt
# Example requirements.txt with pinning strategy

# Pin major versions
torch>=2.12,<3.0
transformers>=5.9,<6.0  # transformers v5 is a major version

# Pin exact versions for critical inference
vllm==0.19.0

# Allow patch updates for utilities
langchain>=0.3,<0.4
```

**Best Practices**:

- Pin exact versions in production
- Test upgrades in staging environments
- Monitor release notes for breaking changes
- Keep a changelog of version updates
- Use virtual environments or containers for isolation

---

## Deprecated and Legacy Tools

The following tools were popular but are now less recommended:

| Tool | Status | Replacement |
|------|--------|-------------|
| TensorFlow 1.x | Deprecated | PyTorch or TF 2.x |
| Keras 2 (standalone) | Superseded | Keras 3 (multi-backend) or PyTorch |
| GPT-2 | Outdated | Modern LLMs |
| BERT-base (original) | Outdated | Sentence Transformers, E5 |
| Apache Zeppelin | Less active | Jupyter |
| Dask-ML | Limited development | Ray, Spark |
| Haystack (older versions) | Major changes | Haystack 2.x, LlamaIndex, LangChain |

When encountering tutorials using deprecated tools, look for updated alternatives or modern equivalents.

---

## Resources for Staying Current

### Official Documentation

Always prefer official documentation over tutorials, as it reflects current versions.

### Release Tracking

- GitHub release pages for major projects
- PyPI for Python package updates
- Hugging Face blog for ecosystem news

### Community Sources

- r/MachineLearning and r/LocalLLaMA for discussions
- Discord servers for major projects (LangChain, vLLM, etc.)
- Twitter/X ML community for breaking developments

### This Book's Companion

Check the book's repository for updated tool recommendations and errata: the AI landscape in 2027 will differ from what's documented here in 2026.