The Pragmatic Programmer for Machine Learning

Engineering Analytics and Data Science Solutions


M. Scutari and M. Malvestio (2023).
Texts in Machine Learning & Pattern Recognition, Chapman & Hall/CRC.
ISBN-10: 0367263508
ISBN-13: 978-0367263508
CRC Website
Amazon Website

Online Materials

The code for the case study in Chapter 12, “Recommending Recommendations”, is available on GitHub at https://github.com/pragprogml/ppml-rr.

The book itself is available at ppml.dev.

Some material from Chapters 5, 6 and 7 has been included in Wiley StatsRef: Statistics Reference Online in the articles:

  • Machine Learning Software and Pipelines.
  • Developing and Running Machine Learning Software: Machine Learning Operations (MLOps).

Errata Corrige

None yet.

Table of Contents

  1. What is This Book About?

    1. Machine Learning
    2. Data Science
    3. Software Engineering
    4. How Do They Go Together?
  2. Hardware Architectures


    1. Types of Hardware
      1. Compute
      2. Memory
      3. Connections
    2. Making Hardware Live Up to Expectations
    3. Local and Remote Hardware
    4. Choosing the Right Hardware for the Job
  3. Variable Types and Data Structures

    1. Variable Types
      1. Integers
      2. Floating Point
      3. Strings
    2. Data Structures
      1. Vectors and Lists
      2. Representing Data with Data Frames
      3. Dense and Sparse Matrices
    3. Choosing the Right Variable Type for the Job
    4. Choosing the Right Data Structures for the Job
  4. Analysis of Algorithms

    1. Writing Pseudocode
    2. Computational Complexity and Big-O Notation
    3. Big-O Notation and Benchmarking
    4. Some Examples of Algorithm Analysis
      1. Estimating Linear Regression Models
      2. Sparse Matrices Representation
      3. Uniform Simulations of Directed Acyclic Graphs
    5. Big-O Notation and Real-World Performance
  5. Designing and Structuring Pipelines

    1. Data as Code
    2. Technical Debt
      1. At the Data Level
      2. At the Model Level
      3. At the Architecture (Design) Level
      4. At the Code Level
    3. Machine Learning Pipeline
      1. Project Scoping
      2. Producing a Baseline Implementation
      3. Data Ingestion and Preparation
      4. Model Training, Evaluation and Validation
      5. Deployment, Serving and Inference
      6. Monitoring, Logging and Reporting
  6. Writing Machine Learning Code

    1. Choosing Languages and Libraries
    2. Naming Things
    3. Coding Styles and Coding Standards
    4. Filesystem Structure
    5. Effective Versioning
    6. Code Review
    7. Refactoring
    8. Reworking Academic Code: An Example
  7. Packaging and Deploying Pipelines

    1. Model Packaging
      1. Standalone Packaging
      2. Programming Languages Package Managers
      3. Virtual Machines
      4. Containers
    2. Model Deployment: Strategies
    3. Model Deployment: Infrastructure
    4. Model Deployment: Monitoring and Logging
    5. What Could Possibly Go Wrong?
    6. Rolling Back
  8. Documenting Pipelines

    1. Comments
    2. Documenting Public Interfaces
    3. Documenting Architecture and Design
    4. Documenting Algorithms and Business Cases
    5. Illustrating Practical Use Cases
  9. Troubleshooting and Testing Pipelines

    1. Data are the Problem
      1. Large Data
      2. Heterogeneous Data
      3. Dynamic Data
    2. Models are the Problem
      1. Large Models
      2. Black-Box Models
      3. Costly Models
      4. Many Models
    3. Common Signs That Something is Up
    4. Tests are the Solution
      1. What Do We Want to Achieve?
      2. What Should We Test?
      3. Online and Offline Data
      4. Testing Local and Testing Global
      5. Conceptual and Implementation Errors
      6. Code Coverage and Test Prioritisation
  10. Tools for Developing Pipelines

    1. Data Exploration and Experiment Tracking
    2. Code Development
      1. Code Editors and IDEs
      2. Notebooks
      3. Accessing Data and Documentation
    3. Build, Test and Documentation Tools
  11. Tools to Manage Pipelines in Production

    1. Infrastructure Management
    2. Machine Learning Software Management
    3. Dashboards, Visualisation and Reporting
  12. Recommending Recommendations: A Recommender System Using Natural Language Understanding

    1. The Domain Problem
    2. The Machine Learning Model
    3. The Infrastructure
    4. The Architecture of the Pipeline
      1. Data Ingestion and Data Preparation
      2. Data Tracking and Versioning
      3. Training and Experiment Tracking
      4. Model Packaging
      5. Deployment and Inference