The Pragmatic Programmer for Machine Learning

Engineering Analytics and Data Science Solutions

M. Scutari and M. Malvestio (2023).
Texts in Machine Learning & Pattern Recognition, Chapman & Hall/CRC.
ISBN-10: 0367263508
ISBN-13: 978-0367263508
CRC Website
Amazon Website

Online Materials

The code for the case study in Chapter 12, “Recommending Recommendations”, is available on GitHub at https://github.com/pragprogml/ppml-rr.

The book itself is available at ppml.dev.

Some material from Chapters 5, 6 and 7 has been included in Wiley StatsRef: Statistics Reference Online in the articles:

Machine Learning Software and Pipelines.
Developing and Running Machine Learning Software: Machine Learning Operations (MLOps).

Errata Corrige

None yet.

Table of Contents

What is This Book About?
1. Machine Learning
2. Data Science
3. Software Engineering
4. How Do They Go Together?
Hardware Architectures
1. Types of Hardware
  1. Compute
  2. Memory
  3. Connections
2. Making Hardware Live Up to Expectations
3. Local and Remote Hardware
4. Choosing the Right Hardware for the Job
Variable Types and Data Structures
1. Variable Types
  1. Integers
  2. Floating Point
  3. Strings
2. Data Structures
  1. Vectors and Lists
  2. Representing Data with Data Frames
  3. Dense and Sparse Matrices
3. Choosing the Right Variable Type for the Job
4. Choosing the Right Data Structures for the Job
Analysis of Algorithms
1. Writing Pseudocode
2. Computational Complexity and Big-O Notation
3. Big-O Notation and Benchmarking
4. Some Examples of Algorithm Analysis
  1. Estimating Linear Regression Models
  2. Sparse Matrices Representation
  3. Uniform Simulations of Directed Acyclic Graphs
5. Big-O Notation and Real-World Performance
Designing and Structuring Pipelines
1. Data as Code
2. Technical Debt
  1. At the Data Level
  2. At the Model Level
  3. At the Architecture (Design) Level
  4. At the Code Level
3. Machine Learning Pipeline
  1. Project Scoping
  2. Producing a Baseline Implementation
  3. Data Ingestion and Preparation
  4. Model Training, Evaluation and Validation
  5. Deployment, Serving and Inference
  6. Monitoring, Logging and Reporting
Writing Machine Learning Code
1. Choosing Languages and Libraries
2. Naming Things
3. Coding Styles and Coding Standards
4. Filesystem Structure
5. Effective Versioning
6. Code Review
7. Refactoring
8. Reworking Academic Code: An Example
Packaging and Deploying Pipelines
1. Model Packaging
  1. Standalone Packaging
  2. Programming Languages Package Managers
  3. Virtual Machines
  4. Containers
2. Model Deployment: Strategies
3. Model Deployment: Infrastructure
4. Model Deployment: Monitoring and Logging
5. What Could Possibly Go Wrong?
6. Rolling Back
Documenting Pipelines
1. Comments
2. Documenting Public Interfaces
3. Documenting Architecture and Design
4. Documenting Algorithms and Business Cases
5. Illustrating Practical Use Cases
Troubleshooting and Testing Pipelines
1. Data are the Problem
  1. Large Data
  2. Heterogeneous Data
  3. Dynamic Data
2. Models are the Problem
  1. Large Models
  2. Black-Box Models
  3. Costly Models
  4. Many Models
3. Common Signs That Something is Up
4. Tests are the Solution
  1. What Do We Want to Achieve?
  2. What Should We Test?
  3. Online and Offline Data
  4. Testing Local and Testing Global
  5. Conceptual and Implementation Errors
  6. Code Coverage and Test Prioritisation
Tools for Developing Pipelines
1. Data Exploration and Experiment Tracking
2. Code Development
  1. Code Editors and IDEs
  2. Notebooks
  3. Accessing Data and Documentation
3. Build, Test and Documentation Tools
Tools to Manage Pipelines in Production
1. Infrastructure Management
2. Machine Learning Software Management
3. Dashboards, Visualisation and Reporting
Recommending Recommendations: A Recommender System Using Natural Language Understanding
1. The Domain Problem
2. The Machine Learning Model
3. The Infrastructure
4. The Architecture of the Pipeline
  1. Data Ingestion and Data Preparation
  2. Data Tracking and Versioning
  3. Training and Experiment Tracking
  4. Model Packaging
  5. Deployment and Inference
