Appendix B: Tool & Framework Reference

This appendix provides a reference guide to the major tools and frameworks used in AI engineering as of 2026. The landscape evolves rapidly—use this as a starting point and verify current versions and alternatives.

Last Updated: May 2026

Review Frequency: This appendix should be reviewed quarterly as the AI tooling landscape changes significantly every 3-6 months.


LLM Inference Engines

vLLM

Purpose: High-throughput LLM inference engine

Key Features:

  • PagedAttention for efficient memory management
  • Continuous batching for high throughput
  • Support for most popular open-source LLMs
  • OpenAI-compatible API

When to Use: Production LLM serving requiring high throughput

Installation:

pip install vllm

Example:

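from vllm import LLM, SamplingParams

# Weights are downloaded from the Hugging Face Hub on first use.
llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E")
outputs = llm.generate(["Hello, how are you?"], SamplingParams(temperature=0.7))
for output in outputs:
    print(output.outputs[0].text)  # first completion for each prompt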

Resources: https://docs.vllm.ai
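
The throughput benefit of continuous batching can be seen in a toy simulation. The sketch below is not vLLM's scheduler: it assumes one decoding step emits one token per active request, ignores prefill and memory limits, and uses made-up request lengths and function names.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each fixed batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """A finished request frees its slot immediately for a waiting one."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each active request
    steps = 0
    while waiting or running:
        # Fill any free slots before the next decoding step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decoding step emits one token per active request.
        running = [left - 1 for left in running if left - 1 > 0]
        steps += 1
    return steps


# Assumed output lengths (in tokens) for six requests.
lengths = [2, 8, 2, 8, 2, 8]
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
print(static_steps, continuous_steps)
```

With these assumed lengths, static batching waits on the longest request in every batch, while continuous batching refills a slot as soon as a short request finishes, so it needs fewer decoding steps overall.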


Text Generation Inference (TGI)

Purpose: Hugging Face’s production inference server

Key Features:

  • Optimized for Hugging Face models
  • Flash Attention support
  • Quantization support (AWQ, GPTQ, bitsandbytes)
  • Prometheus metrics built-in

When to Use: Hugging Face model deployment, especially in containers

Docker Launch:

docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
    --model-id meta-llama/Llama-4-Scout-17B-16E

Resources: https://huggingface.co/docs/text-generation-inference


Ollama

Purpose: Local LLM inference for development

Key Features:

  • Simple installation and model management
  • Runs on consumer hardware
  • Good for development and testing
  • REST API compatible with OpenAI format

When to Use: Local development, prototyping, privacy-sensitive applications

Installation:

curl -fsSL https://ollama.com/install.sh | sh
ollama run llama4

Resources: https://ollama.com


TensorRT-LLM

Purpose: NVIDIA’s optimized LLM inference

Key Features:

  • Maximum performance on NVIDIA GPUs
  • INT8/FP8 quantization
  • Multi-GPU support
  • Inflight batching

When to Use: Production serving requiring maximum GPU efficiency

Resources: https://github.com/NVIDIA/TensorRT-LLM


ML Frameworks

PyTorch

Purpose: Primary deep learning framework

Key Features:

  • Dynamic computation graphs
  • Extensive ecosystem
  • Strong GPU support
  • Industry standard for research and production

When to Use: Almost all deep learning development

Installation:

pip install torch torchvision torchaudio

Resources: https://pytorch.org


Hugging Face Transformers

Purpose: Pre-trained model library and utilities

Key Features:

  • Thousands of pre-trained models
  • Unified API for different architectures
  • Training utilities (Trainer), with PEFT integration
  • Integration with Hugging Face Hub

When to Use: Working with transformer models

Installation:

pip install transformers

Example:

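from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is used only because it is small; substitute any causal LM from the Hub.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))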

Resources: https://huggingface.co/docs/transformers


LangChain

Purpose: Framework for LLM application development

Key Features:

  • Chain composition for complex workflows
  • Built-in prompts and output parsers
  • Tool and agent support
  • Memory and context management

When to Use: Building LLM applications with complex workflows

Installation:

pip install langchain langchain-openai

Example:

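from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Requires the OPENAI_API_KEY environment variable.
llm = ChatOpenAI(model="gpt-5")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm  # pipe the formatted prompt into the model
result = chain.invoke({"text": "Long document..."})
print(result.content)  # reply text of the returned message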

Resources: https://python.langchain.com


LlamaIndex

Purpose: Data framework for LLM applications

Key Features:

  • Document loading and indexing
  • Various retrieval strategies
  • Query engines and chat engines
  • Strong RAG support

When to Use: RAG systems and document-based applications

Installation:

pip install llama-index

Resources: https://docs.llamaindex.ai


Vector Databases

Pinecone

Purpose: Managed vector database

Key Features:

  • Fully managed service
  • Low-latency queries at scale
  • Metadata filtering
  • Hybrid search support

When to Use: Production vector search without infrastructure management

Installation:

pip install pinecone

Resources: https://docs.pinecone.io


Weaviate

Purpose: Open-source vector database

Key Features:

  • Self-hosted or cloud
  • GraphQL API
  • Built-in vectorization modules
  • Hybrid search

When to Use: Self-hosted vector search with flexibility

Resources: https://weaviate.io/developers/weaviate


Milvus

Purpose: Open-source vector database for scale

Key Features:

  • Designed for billion-scale
  • Multiple index types
  • Kubernetes-native
  • GPU acceleration

When to Use: Large-scale vector search requiring fine control

Resources: https://milvus.io


Qdrant

Purpose: High-performance vector database

Key Features:

  • Rust-based for performance
  • Rich filtering
  • Quantization support
  • Simple deployment

When to Use: Performance-critical vector search

Installation:

docker run -p 6333:6333 qdrant/qdrant

Resources: https://qdrant.tech/documentation


Chroma

Purpose: Lightweight embedding database

Key Features:

  • Simple API
  • Embedded or client-server mode
  • Good for prototyping
  • Python-native

When to Use: Prototyping, small-scale applications

Installation:

pip install chromadb

Resources: https://docs.trychroma.com


Feature Stores

Feast

Purpose: Open-source feature store

Key Features:

  • Cloud-agnostic
  • Online and offline stores
  • Point-in-time joins
  • Python SDK

When to Use: Feature management with flexibility

Installation:

pip install feast

Resources: https://feast.dev
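
Point-in-time joins are what keep training data free of future information: each training event is paired with the latest feature value known at or before the event time. The sketch below illustrates the idea in plain Python. It is not Feast's API; the function name, the tuple layouts, and the sample data are assumptions, and it ignores feature expiry (TTL).

```python
from bisect import bisect_right


def point_in_time_join(events, feature_rows):
    """Attach to each (entity, event_ts) the latest feature value with ts <= event_ts."""
    by_entity = {}
    for entity, ts, value in sorted(feature_rows, key=lambda row: (row[0], row[1])):
        timestamps, values = by_entity.setdefault(entity, ([], []))
        timestamps.append(ts)
        values.append(value)

    joined = []
    for entity, event_ts in events:
        timestamps, values = by_entity.get(entity, ([], []))
        idx = bisect_right(timestamps, event_ts)  # count of rows at or before event_ts
        joined.append((entity, event_ts, values[idx - 1] if idx else None))
    return joined


# Hypothetical feature history: (entity_id, timestamp, feature_value).
feature_rows = [("u1", 10, 0.2), ("u1", 20, 0.5), ("u2", 15, 0.9)]
# Hypothetical training events: (entity_id, event_timestamp).
events = [("u1", 12), ("u1", 25), ("u2", 5)]
joined = point_in_time_join(events, feature_rows)
print(joined)
```

The last event has no feature value yet at its timestamp, so the join yields None rather than leaking the later value.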


Tecton

Purpose: Enterprise feature platform

Key Features:

  • Real-time feature serving
  • Streaming feature computation
  • Enterprise governance
  • Managed service

When to Use: Enterprise ML with real-time features

Resources: https://www.tecton.ai


Experiment Tracking

MLflow

Purpose: ML lifecycle management

Key Features:

  • Experiment tracking
  • Model registry
  • Model serving
  • Open source

When to Use: End-to-end ML lifecycle management

Installation:

pip install mlflow

Example:

import mlflow

# The context manager ends the run even if an exception is raised.
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)

Resources: https://mlflow.org


Weights & Biases (W&B)

Purpose: Experiment tracking and visualization

Key Features:

  • Interactive dashboards and charts
  • Team collaboration
  • Hyperparameter sweeps
  • Model registry

When to Use: Team-based ML development

Installation:

pip install wandb

Resources: https://docs.wandb.ai


Data Processing

Apache Spark

Purpose: Distributed data processing

Key Features:

  • Large-scale data processing
  • SQL, streaming, ML libraries
  • Widely deployed
  • Databricks integration

When to Use: Large-scale batch processing

Resources: https://spark.apache.org


Databricks

Purpose: Unified analytics platform

Key Features:

  • Managed Spark
  • Unity Catalog
  • MLflow integration
  • Feature store

When to Use: Enterprise data and ML platform

Resources: https://docs.databricks.com


Ray

Purpose: Distributed computing framework

Key Features:

  • Simple distributed Python
  • ML-focused libraries (Ray Train, Ray Serve)
  • Scales from laptop to cluster
  • Good for hyperparameter tuning

When to Use: Distributed ML training and serving

Installation:

pip install "ray[default]"

Resources: https://docs.ray.io


Orchestration

Airflow

Purpose: Workflow orchestration

Key Features:

  • DAG-based workflows
  • Extensive operator library
  • Monitoring and alerting
  • Industry standard

When to Use: Batch ML pipeline orchestration

Installation:

pip install apache-airflow

Resources: https://airflow.apache.org
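
A DAG-based workflow boils down to running each task only after the tasks it depends on. The sketch below shows that ordering with Python's standard-library graphlib rather than Airflow itself; the pipeline and its task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "train": {"validate"},
    "evaluate": {"train"},
    "report": {"evaluate", "validate"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering. graphlib raises CycleError if the dependencies contain a cycle, which is why such workflows must be acyclic.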


Prefect

Purpose: Modern workflow orchestration

Key Features:

  • Python-native
  • Less setup and boilerplate than Airflow
  • Cloud or self-hosted
  • Good debugging

When to Use: Python-centric teams wanting simpler orchestration

Installation:

pip install prefect

Resources: https://docs.prefect.io


Kubeflow

Purpose: ML on Kubernetes

Key Features:

  • Kubernetes-native ML
  • Pipelines, notebooks, training
  • Model serving (KServe)
  • Multi-tenancy

When to Use: Kubernetes-based ML platforms

Resources: https://www.kubeflow.org


Monitoring & Observability

Prometheus + Grafana

Purpose: Metrics collection and visualization

Key Features:

  • Industry standard metrics
  • Rich visualization
  • Alerting
  • Large ecosystem

When to Use: Infrastructure and application monitoring

Resources: https://prometheus.io, https://grafana.com


Datadog

Purpose: Cloud monitoring platform

Key Features:

  • Unified monitoring
  • ML monitoring capabilities
  • Log management
  • APM

When to Use: Enterprise observability

Resources: https://docs.datadoghq.com


Arize AI

Purpose: ML observability platform

Key Features:

  • Model monitoring
  • Drift detection
  • Explainability
  • Embedding analysis

When to Use: Production ML monitoring

Resources: https://docs.arize.com
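
Drift detection compares the distribution of a feature or score in production against a reference window. One common statistic for this is the Population Stability Index (PSI), sketched below in plain Python. This is an illustration of the general technique, not Arize's implementation; the bin edges, the smoothing constant eps, and the sample values are assumptions.

```python
import math
from bisect import bisect_right


def bin_proportions(values, bin_edges, eps=1e-6):
    """Share of values per bin; out-of-range values are clamped to the edge bins."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        idx = min(max(bisect_right(bin_edges, v) - 1, 0), len(counts) - 1)
        counts[idx] += 1
    # eps avoids log(0) and division by zero for empty bins.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(expected, actual, bin_edges):
    """PSI = sum over bins of (q - p) * ln(q / p), with p the reference share."""
    p = bin_proportions(expected, bin_edges)
    q = bin_proportions(actual, bin_edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical model scores: a reference window and a shifted production window.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
production = [0.8, 0.9, 0.85, 0.95, 0.7, 0.9, 0.8, 0.6, 0.99]

psi_same = population_stability_index(baseline, baseline, edges)
psi_drift = population_stability_index(baseline, production, edges)
print(psi_same, psi_drift)
```

PSI is zero when the two windows have identical bin shares and grows as they diverge; alerting thresholds are a judgment call for each application.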


Evaluation

RAGAS

Purpose: RAG evaluation framework

Key Features:

  • RAG-specific metrics
  • Faithfulness, relevancy, context
  • LLM-based evaluation
  • Easy integration

When to Use: Evaluating RAG systems

Installation:

pip install ragas

Resources: https://docs.ragas.io


DeepEval

Purpose: LLM evaluation framework

Key Features:

  • Multiple evaluation metrics
  • Test-driven development for LLMs
  • CI/CD integration
  • Benchmarking

When to Use: Systematic LLM evaluation

Installation:

pip install deepeval

Resources: https://docs.confident-ai.com


API Providers

OpenAI

Models: GPT-5.5, GPT-5.4, DALL-E, Whisper

Strengths: Leading capability, extensive ecosystem, reliable API

Pricing: Pay-per-token, varies by model

Resources: https://platform.openai.com


Anthropic

Models: Claude Opus 4.8, Sonnet 4.6, Haiku 4.5

Strengths: Strong reasoning, long context, safety focus

Pricing: Pay-per-token

Resources: https://docs.anthropic.com


Google (Vertex AI)

Models: Gemini family, specialized models (Imagen, Veo, embedding models)

Strengths: Multimodal, GCP integration, enterprise features

Pricing: Pay-per-token or per-character, depending on the model

Resources: https://cloud.google.com/vertex-ai


Amazon Bedrock

Models: Claude, Llama, Titan, Stable Diffusion

Strengths: AWS integration, multiple model providers

Pricing: Pay-per-token/image

Resources: https://docs.aws.amazon.com/bedrock


Cohere

Models: Command, Embed, Rerank

Strengths: Enterprise focus, strong embeddings

Pricing: Pay-per-token/embedding

Resources: https://docs.cohere.com


Development Tools

LangSmith

Purpose: LLM application development platform

Key Features:

  • Tracing and debugging
  • Testing and evaluation
  • Monitoring
  • Prompt management

When to Use: LangChain application development

Resources: https://docs.smith.langchain.com


OpenLLMetry

Purpose: Open-source LLM observability

Key Features:

  • OpenTelemetry-based
  • Multiple framework integrations
  • Vendor-agnostic
  • Trace visualization

When to Use: LLM observability with OpenTelemetry

Installation:

pip install traceloop-sdk

Resources: https://github.com/traceloop/openllmetry


Fine-Tuning & Training

PEFT (Parameter-Efficient Fine-Tuning)

Purpose: Efficient fine-tuning of large models

Key Features:

  • LoRA, QLoRA, adapters
  • Minimal additional parameters
  • Works with Transformers
  • Memory efficient

When to Use: Fine-tuning large models with limited resources

Installation:

pip install peft

Example:

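from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Any causal LM works as the base; "gpt2" is a small stand-in.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank update matrices
    lora_alpha=32,     # updates are scaled by lora_alpha / r
    lora_dropout=0.1,
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()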

Resources: https://huggingface.co/docs/peft
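
To see why LoRA adds so few parameters, it helps to do the arithmetic for a single weight matrix. LoRA replaces the update to a d_out x d_in matrix with the product of a d_out x r matrix and an r x d_in matrix. The sketch below uses a hypothetical 4096 x 4096 projection with the r and lora_alpha values from the example above; the helper name is made up for illustration.

```python
def lora_extra_params(d_out, d_in, r):
    """Parameters added by one LoRA adapter: B is d_out x r, A is r x d_in."""
    return r * (d_in + d_out)


# Hypothetical projection size; r and lora_alpha follow the LoraConfig example.
d_out = d_in = 4096
r, lora_alpha = 8, 32

full_params = d_out * d_in
extra_params = lora_extra_params(d_out, d_in, r)
trainable_fraction = extra_params / full_params
scaling = lora_alpha / r  # factor applied to the low-rank update

print(full_params, extra_params, trainable_fraction, scaling)
```

For this one matrix the adapter trains well under 1% as many parameters as full fine-tuning would; the overall fraction for a model depends on which layers receive adapters.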


Axolotl

Purpose: Streamlined LLM fine-tuning

Key Features:

  • YAML-based configuration
  • Multiple training techniques
  • Dataset preprocessing
  • Good defaults

When to Use: Quick LLM fine-tuning without deep expertise

Resources: https://github.com/axolotl-ai-cloud/axolotl


DeepSpeed

Purpose: Deep learning optimization library

Key Features:

  • ZeRO memory optimization
  • Mixed precision training
  • Model parallelism
  • Microsoft-backed

When to Use: Training large models across multiple GPUs

Installation:

pip install deepspeed

Resources: https://www.deepspeed.ai
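
The effect of ZeRO's memory optimization can be estimated with simple arithmetic. The sketch below assumes mixed-precision Adam training with 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for optimizer state, and it ignores activations and buffers. The function name and the 7.5B-parameter, 64-GPU scenario are illustrative assumptions, not a DeepSpeed API.

```python
def zero_memory_gb(num_params, num_gpus, stage, optimizer_bytes=12):
    """Approximate per-GPU model-state memory (GB) for ZeRO stages 0 to 3."""
    weights = 2 * num_params              # fp16 weights
    grads = 2 * num_params                # fp16 gradients
    optimizer = optimizer_bytes * num_params
    if stage >= 1:
        optimizer /= num_gpus             # stage 1 shards optimizer state
    if stage >= 2:
        grads /= num_gpus                 # stage 2 also shards gradients
    if stage >= 3:
        weights /= num_gpus               # stage 3 also shards weights
    return (weights + grads + optimizer) / 1e9


# Assumed scenario: 7.5B parameters trained on 64 GPUs.
per_stage = {stage: zero_memory_gb(7.5e9, 64, stage) for stage in range(4)}
for stage, gigabytes in per_stage.items():
    print(f"stage {stage}: {gigabytes:.1f} GB per GPU")
```

Under these assumptions, most of the saving at stage 1 comes from sharding optimizer state, which is the largest of the three components.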


FSDP (Fully Sharded Data Parallel)

Purpose: PyTorch distributed training

Key Features:

  • Built into PyTorch
  • Memory-efficient training
  • Simpler than DeepSpeed for some cases
  • Native PyTorch integration

When to Use: Distributed training with PyTorch

Resources: https://pytorch.org/docs/stable/fsdp.html


Data Quality & Validation

Great Expectations

Purpose: Data quality validation

Key Features:

  • Expectation-based testing
  • Data documentation
  • CI/CD integration
  • Rich ecosystem

When to Use: Data quality in pipelines

Installation:

pip install great_expectations

Example:

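import great_expectations as gx

context = gx.get_context()
# batch_request must be built from a data source and asset that are already
# configured in the context (not shown here). This Validator-based workflow
# may differ across Great Expectations versions.
validator = context.get_validator(batch_request=batch_request)
validator.expect_column_values_to_not_be_null("user_id")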

Resources: https://docs.greatexpectations.io


Pandera

Purpose: Statistical data validation

Key Features:

  • DataFrame validation
  • Statistical checks
  • Schema inference
  • Pandas/Polars support

When to Use: DataFrame validation in Python

Installation:

pip install pandera

Resources: https://pandera.readthedocs.io


DVC (Data Version Control)

Purpose: Version control for ML data and models

Key Features:

  • Git-like data versioning
  • Pipeline reproducibility
  • Cloud storage backends
  • Experiment tracking

When to Use: Versioning data and models alongside code

Installation:

pip install dvc

Resources: https://dvc.org


Testing & Quality

pytest

Purpose: Python testing framework

Key Features:

  • Simple and powerful
  • Fixtures and parameterization
  • Plugin ecosystem
  • Industry standard

When to Use: All Python testing

Installation:

pip install pytest

Resources: https://docs.pytest.org


hypothesis

Purpose: Property-based testing

Key Features:

  • Automatic test case generation
  • Edge case discovery
  • Shrinking failing examples
  • Useful for data processing

When to Use: Testing data processing and transformations

Installation:

pip install hypothesis

Resources: https://hypothesis.readthedocs.io


Locust

Purpose: Load testing

Key Features:

  • Python-based test scripts
  • Distributed testing
  • Real-time metrics
  • Web UI

When to Use: Load testing inference endpoints

Installation:

pip install locust

Resources: https://locust.io


Security

Guardrails AI

Purpose: LLM output validation

Key Features:

  • Structured output enforcement
  • Content filtering
  • Type validation
  • Retry mechanisms

When to Use: Ensuring LLM outputs meet requirements

Installation:

pip install guardrails-ai

Resources: https://docs.guardrailsai.com
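
The combination of output validation and retry mechanisms follows a simple loop: generate, validate, and on failure feed the error back into the next attempt. The sketch below shows that loop in plain Python with a stand-in for the model call. It does not use the Guardrails AI API; every function name and the sentiment schema are hypothetical.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def validate_sentiment(raw_text):
    """Parse JSON and enforce the expected field; raise ValueError otherwise."""
    data = json.loads(raw_text)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"sentiment must be one of {sorted(ALLOWED_SENTIMENTS)}")
    return data


def generate_validated(generate, validate, max_attempts=3):
    """Call generate until validate accepts the output or attempts run out."""
    feedback = []  # validation errors passed back to the generator
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            feedback.append(str(exc))
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_llm(feedback):
    """Stand-in for a model call: fails once, then returns valid JSON."""
    return '{"sentiment": "positive"}' if feedback else "Sentiment: great!"


result = generate_validated(fake_llm, validate_sentiment)
print(result)
```

In a real application the generator would include the accumulated feedback in the next prompt so the model can correct its output.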


NeMo Guardrails

Purpose: Conversational AI safety

Key Features:

  • Programmable guardrails
  • Topical rails
  • Dialogue management
  • NVIDIA-backed

When to Use: Building safe conversational AI

Resources: https://github.com/NVIDIA/NeMo-Guardrails


Embeddings & Retrieval

Sentence Transformers

Purpose: Sentence embeddings library

Key Features:

  • Pre-trained embedding models
  • Training utilities
  • Multiple loss functions
  • Simple API

When to Use: Text embeddings for search and similarity

Installation:

pip install sentence-transformers

Example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["Hello world", "How are you"])

Resources: https://www.sbert.net


FAISS

Purpose: Vector similarity search

Key Features:

  • Billion-scale search
  • GPU support
  • Multiple index types
  • Facebook/Meta research

When to Use: Efficient vector search in Python

Installation:

pip install faiss-cpu  # or faiss-gpu

Resources: https://faiss.ai
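
As a reference point for what a vector index computes, the sketch below performs exact nearest-neighbor search by cosine similarity in plain Python. It is a brute-force baseline with made-up vectors and function names, not FAISS code; libraries such as FAISS provide index structures that make this kind of search efficient at much larger scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Indices of the k vectors most similar to the query, best first."""
    scored = sorted(
        ((cosine_similarity(query, v), i) for i, v in enumerate(vectors)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


# Hypothetical 2-dimensional embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], vectors, k=2)
print(nearest)
```

Brute force scans every vector for every query, which is fine for small collections and useful for checking the recall of an approximate index.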


BM25 (rank_bm25)

Purpose: Sparse retrieval

Key Features:

  • Classic keyword search
  • No embeddings needed
  • Fast and interpretable
  • Good baseline

When to Use: Hybrid search, keyword matching

Installation:

pip install rank_bm25

Resources: https://github.com/dorianbrown/rank_bm25
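
To show what keyword scoring without embeddings looks like, the sketch below implements a BM25 scoring function in plain Python. It is an illustration, not the rank_bm25 implementation: it uses a non-negative IDF variant, whitespace tokenization, and the assumed parameter values k1=1.5 and b=0.75, so its scores may differ from the library's.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a BM25 variant."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates, and long documents are penalized.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus with naive whitespace tokenization.
corpus = [
    "vector databases store embeddings",
    "keyword search uses term statistics",
    "hybrid search combines keyword and vector search",
]
docs = [text.split() for text in corpus]
scores = bm25_scores("keyword search".split(), docs)
print(scores)
```

Documents that share no terms with the query score zero, which is why BM25 is often paired with embedding search in hybrid retrieval.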


Prompt Engineering

Guidance

Purpose: Structured LLM generation

Key Features:

  • Constrained generation
  • Template language
  • Control flow
  • Output parsing

When to Use: Complex structured outputs

Installation:

pip install guidance

Resources: https://github.com/guidance-ai/guidance


DSPy

Purpose: Programming with foundation models

Key Features:

  • Declarative programming
  • Automatic prompt optimization
  • Module composition
  • Stanford research

When to Use: Systematic prompt engineering

Installation:

pip install dspy

Resources: https://dspy.ai


Container & Deployment

Docker

Purpose: Containerization

Key Features:

  • Reproducible environments
  • Industry standard
  • Rich ecosystem
  • Essential for deployment

When to Use: Packaging and deploying all applications

Resources: https://docs.docker.com


Kubernetes

Purpose: Container orchestration

Key Features:

  • Automatic scaling
  • Self-healing
  • Service discovery
  • Industry standard for production

When to Use: Production deployment at scale

Resources: https://kubernetes.io


KServe

Purpose: ML model serving on Kubernetes

Key Features:

  • Serverless inference
  • Autoscaling
  • A/B testing
  • Multiple frameworks

When to Use: Model serving on Kubernetes

Resources: https://kserve.github.io


BentoML

Purpose: ML model serving framework

Key Features:

  • Easy model packaging
  • Multiple frameworks
  • Adaptive batching
  • Cloud deployment

When to Use: Packaging models for deployment

Installation:

pip install bentoml

Resources: https://docs.bentoml.org


Cloud Platforms

AWS SageMaker

Purpose: AWS ML platform

Key Features:

  • Managed training and inference
  • Jupyter notebooks
  • Model registry
  • MLOps features

When to Use: ML on AWS

Resources: https://docs.aws.amazon.com/sagemaker


Google Cloud Vertex AI

Purpose: GCP ML platform

Key Features:

  • Managed ML pipelines
  • Model serving
  • AutoML
  • Gemini integration

When to Use: ML on GCP

Resources: https://cloud.google.com/vertex-ai/docs


Azure Machine Learning

Purpose: Azure ML platform

Key Features:

  • End-to-end ML lifecycle
  • MLOps automation
  • Responsible AI tools
  • Azure OpenAI integration

When to Use: ML on Azure

Resources: https://learn.microsoft.com/azure/machine-learning


Quick Selection Guide

Use Case                       Recommended Tools
-----------------------------  ----------------------------------------
LLM Inference (High Volume)    vLLM, TGI, TensorRT-LLM
LLM Inference (Local Dev)      Ollama
RAG Application                LlamaIndex + Pinecone/Qdrant
Complex LLM Workflows          LangChain
Experiment Tracking            MLflow, W&B
Feature Store                  Feast (open source), Tecton (enterprise)
Vector Database (Managed)      Pinecone
Vector Database (Self-Hosted)  Milvus, Qdrant
Pipeline Orchestration         Airflow, Prefect
ML on Kubernetes               Kubeflow
LLM Monitoring                 Arize AI, LangSmith
API Provider (General)         OpenAI, Anthropic
API Provider (Enterprise/AWS)  Amazon Bedrock
API Provider (Enterprise/GCP)  Vertex AI

Version Compatibility Notes

The AI tool landscape evolves rapidly. When this book was written (early 2026):

  • PyTorch 2.x (2.12) is standard, with torch.compile providing significant speedups
  • Transformers 5.x (5.9, a major version) supports most modern models
  • vLLM and TGI are the dominant open-source inference engines
  • Vector databases have largely converged on similar feature sets

Always check documentation for current versions and breaking changes.


Additional Tools by Category

Model Optimization

Tool          Purpose                      When to Use
------------  ---------------------------  ------------------------------------------
ONNX Runtime  Cross-platform inference     Deploying models across different hardware
TensorRT      NVIDIA GPU optimization      Maximum performance on NVIDIA hardware
OpenVINO      Intel optimization           Inference on Intel CPUs/GPUs
CoreML        Apple device deployment      iOS/macOS applications
TFLite        Mobile deployment            Android/embedded devices
llama.cpp     CPU-optimized LLM inference  Running LLMs on CPU
ExLlamaV2     Quantized LLM inference      Memory-efficient GPU inference

Data Processing

Tool          Purpose                    When to Use
------------  -------------------------  ----------------------------------
Polars        Fast DataFrame operations  Large-scale data processing
Dask          Parallel computing         Out-of-core and distributed pandas
DuckDB        Embedded analytics         SQL queries on local data
Delta Lake    ACID data lake storage     Reliable data lake operations
Apache Arrow  Columnar memory format     High-performance data interchange

Annotation & Labeling

Tool                           Purpose                When to Use
-----------------------------  ---------------------  ------------------------------
Label Studio                   Open-source labeling   Self-hosted data annotation
Prodigy                        Efficient annotation   Active learning-based labeling
Scale AI                       Enterprise labeling    Large-scale human labeling
Amazon SageMaker Ground Truth  AWS labeling           Labeling with AWS integration
Snorkel                        Programmatic labeling  Weak supervision approaches

Distributed Training

Tool               Purpose               When to Use
-----------------  --------------------  -----------------------------
Horovod            Distributed training  Multi-GPU/multi-node training
PyTorch Lightning  Training framework    Organized PyTorch training
MosaicML Composer  Efficient training    Optimized training recipes
ColossalAI         Large model training  Training very large models

AutoML & Hyperparameter Tuning

Tool          Purpose                      When to Use
------------  ---------------------------  -------------------------------
Optuna        Hyperparameter optimization  Automatic hyperparameter tuning
Ray Tune      Distributed HPO              Scalable hyperparameter search
Auto-sklearn  AutoML                       Automated model selection
H2O AutoML    Enterprise AutoML            Business-focused AutoML

Explainability

Tool         Purpose                   When to Use
-----------  ------------------------  ---------------------------------
SHAP         Feature importance        Understanding model decisions
LIME         Local explanations        Explaining individual predictions
Captum       PyTorch interpretability  Deep learning explainability
InterpretML  Unified interpretability  Glass-box and black-box models

Natural Language Processing

Tool    Purpose               When to Use
------  --------------------  ------------------------
spaCy   Industrial NLP        Production NLP pipelines
NLTK    Educational NLP       Learning and prototyping
Stanza  Research NLP          Stanford NLP models
Flair   State-of-the-art NLP  Sequence labeling

Computer Vision

Tool         Purpose                 When to Use
-----------  ----------------------  ---------------------------
OpenCV       Computer vision basics  Image processing operations
Ultralytics  Object detection        YOLO models
Detectron2   Object detection        Facebook research models
timm         Vision models           Pre-trained image models

Audio & Speech

Tool      Purpose              When to Use
--------  -------------------  ------------------------
Whisper   Speech-to-text       Transcription
TTS       Text-to-speech       Voice synthesis
Librosa   Audio processing     Audio feature extraction
PyAnnote  Speaker diarization  Who spoke when

Tool Selection Framework

When selecting tools for a new project, consider these factors:

1. Scale Requirements

Small Scale (< 1M requests/month):

  • Ollama for local LLM
  • Chroma for vectors
  • SQLite/DuckDB for data
  • Simple deployment (single server)

Medium Scale (1M-100M requests/month):

  • vLLM/TGI for inference
  • Qdrant/Weaviate for vectors
  • PostgreSQL/Redis for data
  • Kubernetes deployment

Large Scale (> 100M requests/month):

  • TensorRT-LLM for inference
  • Pinecone/Milvus for vectors
  • Specialized data infrastructure
  • Multi-region deployment
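
The three scale tiers above can be expressed as a small lookup. The sketch below encodes the request-volume thresholds and a few of the listed tool suggestions; the function name and dictionary keys are made up for illustration, and the thresholds are rough guides rather than hard limits.

```python
def recommend_stack(requests_per_month):
    """Map monthly request volume to the scale tiers listed above."""
    if requests_per_month < 1_000_000:
        return {
            "tier": "small",
            "inference": "Ollama",
            "vectors": "Chroma",
            "deployment": "single server",
        }
    if requests_per_month <= 100_000_000:
        return {
            "tier": "medium",
            "inference": "vLLM/TGI",
            "vectors": "Qdrant/Weaviate",
            "deployment": "Kubernetes",
        }
    return {
        "tier": "large",
        "inference": "TensorRT-LLM",
        "vectors": "Pinecone/Milvus",
        "deployment": "multi-region",
    }


for volume in (50_000, 20_000_000, 500_000_000):
    print(volume, recommend_stack(volume)["tier"])
```

Request volume is only the first of the four factors; team expertise, cloud strategy, and compliance can each override the tier suggested here.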

2. Team Expertise

Data Science Background:

  • Jupyter notebooks
  • MLflow for experiments
  • Scikit-learn patterns
  • Pandas-centric workflow

Software Engineering Background:

  • Containerized workflows
  • API-first design
  • Proper testing
  • CI/CD integration

Mixed Teams:

  • Unified platforms (Databricks, SageMaker)
  • Clear interfaces between components
  • Documentation emphasis

3. Cloud Strategy

AWS-Native:

  • SageMaker for ML
  • Bedrock for LLMs
  • Lambda for serverless
  • S3 for storage

GCP-Native:

  • Vertex AI for ML
  • Gemini for LLMs
  • Cloud Functions
  • BigQuery for data

Multi-Cloud / Agnostic:

  • Open-source tools (MLflow, Kubeflow)
  • Portable formats (ONNX)
  • Terraform for infrastructure

4. Compliance Requirements

Regulated Industries:

  • Self-hosted models
  • Private deployments
  • Audit logging
  • Data residency controls

Standard Business:

  • Managed services acceptable
  • API providers viable
  • Standard security practices

Migration Paths

Moving from Prototyping to Production

Prototype          Production Alternative
-----------------  ----------------------
OpenAI API         vLLM + open model
Chroma             Qdrant, Pinecone
Jupyter notebooks  Python packages
Local files        Feature store
Manual deployment  CI/CD + Kubernetes

Scaling Existing Systems

Current State  Scaled State
-------------  ------------------------
Single GPU     Multi-GPU with DeepSpeed
Single node    Kubernetes cluster
Monolithic     Microservices
Synchronous    Async + queues
Single region  Multi-region

Cost Considerations

Free/Open Source Options

  • Inference: vLLM, TGI, Ollama
  • Vectors: Qdrant, Milvus, Chroma
  • Experiment Tracking: MLflow
  • Orchestration: Airflow
  • Feature Store: Feast

Managed Service Trade-offs

Aspect              Self-Hosted     Managed
------------------  --------------  ---------------
Initial cost        Higher (setup)  Lower
Ongoing cost        Lower at scale  Higher at scale
Operational burden  Higher          Lower
Customization       Full control    Limited
Scaling             Manual          Automatic
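
The "lower at scale" versus "higher at scale" trade-off implies a break-even volume. The sketch below assumes a deliberately simple cost model: self-hosting has a fixed monthly cost plus a small per-request cost, and the managed service charges only per request. The function name and all dollar figures are hypothetical.

```python
import math


def break_even_requests(fixed_monthly_cost, self_hosted_unit_cost, managed_unit_cost):
    """Monthly request volume at which self-hosted and managed costs are equal."""
    saving_per_request = managed_unit_cost - self_hosted_unit_cost
    if saving_per_request <= 0:
        return math.inf  # self-hosting never catches up per request
    return fixed_monthly_cost / saving_per_request


# Hypothetical numbers, chosen only to illustrate the arithmetic.
volume = break_even_requests(
    fixed_monthly_cost=5000.0,      # servers and operations, per month
    self_hosted_unit_cost=0.0002,   # marginal cost per request when self-hosted
    managed_unit_cost=0.0012,       # price per request on the managed service
)
print(f"break-even at about {volume:,.0f} requests per month")
```

Below the break-even volume the managed service is cheaper under this model; above it, self-hosting is. Real comparisons should also account for engineering time and utilization.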

Getting Started Recommendations

For Individual Developers

If you’re learning or building personal projects:

  1. Start with: Ollama (local LLMs), Chroma (vectors), LangChain (orchestration)
  2. Add as needed: Hugging Face for models, MLflow for experiments
  3. Deploy with: Docker, any cloud provider

For Startups

Building an MVP with limited resources:

  1. Start with: OpenAI/Anthropic APIs, Pinecone (managed vectors), Vercel/Railway (deployment)
  2. Graduate to: vLLM + open models when API costs become significant
  3. Focus on: Product-market fit over infrastructure optimization

For Enterprise Teams

Building production systems at scale:

  1. Infrastructure: Kubernetes, Terraform, GitOps
  2. ML Platform: Databricks/SageMaker/Vertex AI or build with MLflow + Kubeflow
  3. Monitoring: Arize AI or Evidently + Prometheus/Grafana
  4. Security: Air-gapped options, self-hosted models for sensitive data

For Research Teams

Running experiments and publishing papers:

  1. Training: PyTorch + W&B + DeepSpeed
  2. Compute: Cloud spot instances, academic GPU clusters
  3. Reproducibility: DVC, Docker, experiment configs as code

Tool Ecosystem Compatibility

Many tools work better together. Here are common compatible stacks:

The Hugging Face Stack

  • Transformers (models)
  • Datasets (data loading)
  • PEFT (fine-tuning)
  • TGI (serving)
  • Hub (model hosting)

The LangChain Stack

  • LangChain (orchestration)
  • LangSmith (debugging)
  • LangServe (deployment)
  • LangGraph (agents)

The AWS Stack

  • SageMaker (training/serving)
  • Bedrock (LLM APIs)
  • S3 (storage)
  • Lambda (serverless)
  • Step Functions (orchestration)

The Open Source Stack

  • PyTorch (training)
  • vLLM (serving)
  • MLflow (experiments)
  • Feast (features)
  • Airflow (orchestration)
  • Prometheus + Grafana (monitoring)

Version Pinning Recommendations

AI tools evolve rapidly. For production systems:

# Example requirements.txt with pinning strategy

# Pin major versions
torch>=2.12,<3.0
transformers>=5.9,<6.0  # transformers v5 is a major version

# Pin exact versions for critical inference
vllm==0.19.0

# Allow patch updates for utilities
langchain>=0.3,<0.4

Best Practices:

  • Pin exact versions in production (for example, in a lock file), and keep ranges for development only
  • Test upgrades in staging environments
  • Monitor release notes for breaking changes
  • Keep a changelog of version updates
  • Use virtual environments or containers for isolation
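
To make the range specifiers in the requirements example concrete, the sketch below checks whether a version satisfies a constraint such as >=2.12,<3.0. It is a simplified illustration that handles only plain numeric versions with up to three components; the helper names are made up, and real projects should rely on pip or the packaging library rather than this code.

```python
import operator

OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '2.12' into (2, 12, 0); only plain numeric versions are supported."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version, spec):
    """Check a version against comma-separated clauses such as '>=2.12,<3.0'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not OPERATORS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("2.12.1", ">=2.12,<3.0"))  # patch release inside the range
print(satisfies("3.0.0", ">=2.12,<3.0"))   # next major version is excluded
```

A range admits any release between its bounds, so two installs on different days can resolve to different versions; an exact pin removes that variability.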

Deprecated and Legacy Tools

The following tools were popular but are now less recommended:

Tool                       Status               Replacement
-------------------------  -------------------  -------------------------
TensorFlow 1.x             Deprecated           PyTorch or TF 2.x
Keras 2 (standalone)       Superseded           Keras 3 or PyTorch
GPT-2                      Outdated             Modern LLMs
BERT-base (original)       Outdated             Sentence Transformers, E5
Apache Zeppelin            Less active          Jupyter
Dask-ML                    Limited development  Ray, Spark
Haystack (older versions)  Major changes        LlamaIndex, LangChain

When encountering tutorials using deprecated tools, look for updated alternatives or modern equivalents.


Resources for Staying Current

Official Documentation

Always prefer official documentation over tutorials, as it reflects current versions.

Release Tracking

  • GitHub release pages for major projects
  • PyPI for Python package updates
  • Hugging Face blog for ecosystem news

Community Sources

  • r/MachineLearning and r/LocalLLaMA for discussions
  • Discord servers for major projects (LangChain, vLLM, etc.)
  • Twitter/X ML community for breaking developments

This Book’s Companion

Check the book’s repository for updated tool recommendations and errata. The AI landscape in 2027 will differ from what is documented here in 2026.