The Pragmatic Programmer for Machine Learning
Engineering Analytics and Data Science Solutions
M. Scutari and M. Malvestio (2023).
Texts in Machine Learning & Pattern Recognition, Chapman & Hall/CRC.
ISBN-10: 0367263508
ISBN-13: 978-0367263508
CRC Website
Amazon Website
Online Materials
The code for the case study in Chapter 12, “Recommending Recommendations”, is available on GitHub at https://github.com/pragprogml/ppml-rr.
The book itself is available at ppml.dev.
Errata
None yet.
Table of Contents
What is This Book About?
- Machine Learning
- Data Science
- Software Engineering
- How Do They Go Together?
Hardware Architectures
- Types of Hardware
- Compute
- Memory
- Connections
- Making Hardware Live Up to Expectations
- Local and Remote Hardware
- Choosing the Right Hardware for the Job
Variable Types and Data Structures
- Variable Types
- Integers
- Floating Point
- Strings
- Data Structures
- Vectors and Lists
- Representing Data with Data Frames
- Dense and Sparse Matrices
- Choosing the Right Variable Type for the Job
- Choosing the Right Data Structures for the Job
Analysis of Algorithms
- Writing Pseudocode
- Computational Complexity and Big-O Notation
- Big-O Notation and Benchmarking
- Some Examples of Algorithm Analysis
- Estimating Linear Regression Models
- Sparse Matrices Representation
- Uniform Simulations of Directed Acyclic Graphs
- Big-O Notation and Real-World Performance
Designing and Structuring Pipelines
- Data as Code
- Technical Debt
- At the Data Level
- At the Model Level
- At the Architecture (Design) Level
- At the Code Level
- Machine Learning Pipeline
- Project Scoping
- Producing a Baseline Implementation
- Data Ingestion and Preparation
- Model Training, Evaluation and Validation
- Deployment, Serving and Inference
- Monitoring, Logging and Reporting
Writing Machine Learning Code
- Choosing Languages and Libraries
- Naming Things
- Coding Styles and Coding Standards
- Filesystem Structure
- Effective Versioning
- Code Review
- Refactoring
- Reworking Academic Code: An Example
Packaging and Deploying Pipelines
- Model Packaging
- Standalone Packaging
- Programming Languages Package Managers
- Virtual Machines
- Containers
- Model Deployment: Strategies
- Model Deployment: Infrastructure
- Model Deployment: Monitoring and Logging
- What Could Possibly Go Wrong?
- Rolling Back
Documenting Pipelines
- Comments
- Documenting Public Interfaces
- Documenting Architecture and Design
- Documenting Algorithms and Business Cases
- Illustrating Practical Use Cases
Troubleshooting and Testing Pipelines
- Data are the Problem
- Large Data
- Heterogeneous Data
- Dynamic Data
- Models are the Problem
- Large Models
- Black-Box Models
- Costly Models
- Many Models
- Common Signs That Something is Up
- Tests are the Solution
- What Do We Want to Achieve?
- What Should We Test?
- Online and Offline Data
- Testing Local and Testing Global
- Conceptual and Implementation Errors
- Code Coverage and Test Prioritisation
Tools for Developing Pipelines
- Data Exploration and Experiment Tracking
- Code Development
- Code Editors and IDEs
- Notebooks
- Accessing Data and Documentation
- Build, Test and Documentation Tools
Tools to Manage Pipelines in Production
- Infrastructure Management
- Machine Learning Software Management
- Dashboards, Visualisation and Reporting
Recommending Recommendations: A Recommender System Using Natural Language Understanding
- The Domain Problem
- The Machine Learning Model
- The Infrastructure
- The Architecture of the Pipeline
- Data Ingestion and Data Preparation
- Data Tracking and Versioning
- Training and Experiment Tracking
- Model Packaging
- Deployment and Inference