In the previous lesson, we covered the product design process for our ML application. In this lesson, we'll cover the systems design process where we'll learn how to design the ML system that will address our product objectives.
The template below is designed to guide machine learning product development. It involves both the product and systems design aspects of our application:
Product design (What & Why) → Systems design (How)
👉 Download a PDF of the ML canvas to use for your own products → ml-canvas.pdf (right click the link and hit "Save Link As...")
How can we engineer our approach for building the product? We need to account for everything from data ingestion to model serving.
Describe the training and production (batches/streams) sources of data.
Our task
| Assumption | Reality | Reason |
| --- | --- | --- |
| All of our incoming data is only machine learning related (no spam). | We would need a filter to remove spam content that's not ML related. | To simplify our ML task, we will assume all the data is ML content. |
Describe the labeling process (ingestions, QA, etc.) and how we decided on the features and labels.
Our task
Labels: categories of machine learning (for simplification, we've restricted the label space to the following tags: natural-language-processing, computer-vision, mlops and other).
Features: text features (title and description) that describe the content.
| Assumption | Reality | Reason |
| --- | --- | --- |
| Content can only belong to one category (multiclass). | Content can belong to more than one category (multilabel). | For simplicity, and because many libraries either don't support multilabel scenarios or make them complicated. |
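The feature and label choices above can be sketched in a few lines. This is only an illustration: the raw item fields (`title`, `description`, `tag`), the helper names and the sample items are assumptions made for the example, not the course's actual data pipeline.

```python
# Restricted label space from the lesson; everything else collapses to "other".
ALLOWED_TAGS = {"natural-language-processing", "computer-vision", "mlops"}


def to_label(tag: str) -> str:
    """Map any tag outside the restricted label space to "other"."""
    return tag if tag in ALLOWED_TAGS else "other"


def to_text_feature(title: str, description: str) -> str:
    """Join the title and description into a single text input."""
    return f"{title.strip()} {description.strip()}".strip()


# Hypothetical raw items, invented for this example.
raw_items = [
    {
        "title": "Intro to transformers",
        "description": "Attention for text.",
        "tag": "natural-language-processing",
    },
    {
        "title": "Graph learning basics",
        "description": "Message passing on graphs.",
        "tag": "graph-learning",
    },
]

# Each example has exactly one label (multiclass, per the assumption above).
examples = [
    {
        "text": to_text_feature(item["title"], item["description"]),
        "label": to_label(item["tag"]),
    }
    for item in raw_items
]
```

Here the second item's tag falls outside the restricted label space, so it is assigned to `other`.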
One of the hardest challenges with ML systems is tying our core objectives, many of which may be qualitative, to quantitative metrics that our model can optimize.
Our task
For our task, we want to have both high precision and recall, so we'll optimize for the f1 score (the harmonic mean of precision and recall). We'll determine these metrics for the overall dataset, as well as for specific classes or slices of data.
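As a rough sketch of how these metrics could be computed, the snippet below derives per-class precision, recall and F1 (the harmonic mean of the two) and then averages F1 across classes weighted by class support. The support-weighted average is one common choice and an assumption here; the function names and sample labels are invented for the example.

```python
def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and support for every class."""
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        metrics[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return metrics


def weighted_f1(metrics):
    """Average per-class F1, weighting each class by its support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m["f1"] * m["support"] for m in metrics.values()) / total


# Hypothetical labels and predictions.
y_true = ["natural-language-processing", "natural-language-processing",
          "computer-vision", "mlops"]
y_pred = ["natural-language-processing", "computer-vision",
          "computer-vision", "mlops"]

class_metrics = per_class_metrics(y_true, y_pred)
overall_f1 = weighted_f1(class_metrics)
```

Keeping the per-class dictionary around makes it easy to see which categories pull the overall score down.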
What are our priorities
How do we decide which metrics to prioritize?
It entirely depends on the specific task. For example, in an email spam detector, precision is very important because it's better to let some spam through than to completely miss an important email. Over time, we need to iterate on our solution so that all evaluation metrics improve, but it's important to know from the get-go which ones we can't compromise on.
Once we have our metrics defined, we need to think about when and how we'll evaluate our model.
Offline evaluation requires a gold standard holdout dataset that we can use to benchmark all of our models.
Our task
We'll be using this holdout dataset for offline evaluation. We'll also be creating slices of data that we want to evaluate in isolation.
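Evaluating slices in isolation might look like the sketch below. The slice definitions (`short_text`, `mentions_cv`), the accuracy metric and the sample data are all hypothetical and chosen only to keep the example short; any metric function with the same signature could be passed in instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical slices: each maps a name to a membership test on an example.
SLICES = {
    "short_text": lambda ex: len(ex["text"].split()) < 5,
    "mentions_cv": lambda ex: "vision" in ex["text"].lower(),
}


def evaluate_slices(examples, y_pred, slices=SLICES, metric=accuracy):
    """Apply the metric to each slice of the holdout set separately."""
    results = {}
    for name, in_slice in slices.items():
        idx = [i for i, ex in enumerate(examples) if in_slice(ex)]
        if not idx:
            results[name] = None  # no holdout examples fall in this slice
            continue
        results[name] = metric(
            [examples[i]["label"] for i in idx],
            [y_pred[i] for i in idx],
        )
    return results


# Hypothetical holdout examples and model predictions.
holdout = [
    {"text": "Vision transformers explained", "label": "computer-vision"},
    {"text": "Deploying models with CI/CD pipelines and monitoring", "label": "mlops"},
    {"text": "Tokenizers 101", "label": "natural-language-processing"},
]
predictions = ["computer-vision", "mlops", "other"]

slice_results = evaluate_slices(holdout, predictions)
```

A slice can score well below the overall number even when the aggregate metric looks healthy, which is the reason for reporting slices separately.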
Online evaluation ensures that our model continues to perform well in production and can be performed using labels or, in the event we don't readily have labels, proxy signals.
Our task
It's important that we measure real-time performance before committing to replace our existing version of the system.
Not all releases have to be high stakes and external facing. We can always include internal releases, gather feedback and iterate until we’re ready to increase the scope.
While the specific methodology we employ can differ based on the problem, there are core principles we always want to follow:
Our task
| Assumption | Reality | Reason |
| --- | --- | --- |
| Solution needs to involve ML due to unstructured data and ineffectiveness of rule-based systems for this task. | An iterative approach where we start with simple rule-based solutions and slowly add complexity. | This course is about responsibly delivering value with ML, so we'll jump to it right away. |
Utility in starting simple
Some of the earlier, simpler, approaches may not deliver on a certain performance objective. What are some advantages of still starting simple?
Once we have a model we're satisfied with, we need to think about whether we want to perform batch (offline) or real-time (online) inference.
We can use our models to make batch predictions on a finite set of inputs which are then written to a database for low latency inference. When a user or downstream service makes an inference request, cached results from the database are returned. In this scenario, our trained model can directly be loaded and used for inference in the code. It doesn't have to be served as a separate service.
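A minimal sketch of this batch pattern, assuming an in-memory SQLite table as the prediction store and a keyword rule as a stand-in for the trained model (both are placeholders for illustration, not the course's implementation):

```python
import sqlite3


def predict(text: str) -> str:
    """Hypothetical stand-in for a trained model loaded directly in code."""
    text = text.lower()
    if "image" in text:
        return "computer-vision"
    if "text" in text:
        return "natural-language-processing"
    return "other"


def run_batch(conn, items):
    """Predict on a finite set of inputs and write the results to the store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions "
        "(item_id TEXT PRIMARY KEY, label TEXT)"
    )
    rows = [(item_id, predict(text)) for item_id, text in items]
    conn.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?)", rows)
    conn.commit()


def lookup(conn, item_id):
    """Serve a request from cached results; no model call happens here."""
    row = conn.execute(
        "SELECT label FROM predictions WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
run_batch(conn, [
    ("a1", "Image segmentation tutorial"),
    ("a2", "Text classification with BERT"),
])
```

Note that `lookup` returns `None` for any item that was not part of the last batch, which is the main limitation of this approach: unseen inputs have no prediction until the next batch run.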
Batch serving tasks
What are some tasks where batch serving is ideal?
Recommend content that existing users will like based on their viewing history. However, new users may just receive some generic recommendations based on their explicit interests until we process their history the next day. And even if we're not doing batch serving, it might still be useful to cache very popular sets of input features (ex. combination of explicit interests leads to certain recommended content) so that we can serve those predictions faster.
We can also serve real-time predictions where input features are fed to the model to retrieve predictions. In this scenario, our model will need to be served as a separate service (ex. an API endpoint) that can handle incoming requests.
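One way to picture such a service is the sketch below, built only on Python's standard library `http.server`. The request shape (JSON with `title` and optional `description`), the `predict` stand-in and the `serve` helper are assumptions for illustration; a production service would typically use a dedicated web framework and a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> str:
    """Hypothetical stand-in for a trained model."""
    text = text.lower()
    if "image" in text:
        return "computer-vision"
    if "text" in text:
        return "natural-language-processing"
    return "other"


def handle_request(body: bytes):
    """Turn a raw JSON request body into a (status code, response) pair."""
    try:
        payload = json.loads(body)
        text = payload["title"] + " " + payload.get("description", "")
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400, {"error": "expected JSON with a 'title' field"}
    return 200, {"label": predict(text)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body and run the model on the incoming features.
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_request(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 8000):
    """Start the service; this call blocks until the process is interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping `handle_request` separate from the HTTP handler means the prediction logic can be exercised without opening a socket.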
Online inference tasks
In our example task for batch inference above, how can online inference significantly improve content recommendations?
With batch processing, we generate content recommendations for users offline using their history. These recommendations won't change until we process the batch the next day using the updated user features. But what if the user's taste changes significantly during the day (ex. the user is searching for horror movies to watch)? With real-time serving, we can use these recent features to recommend highly relevant content based on the immediate searches.
Our task
For our task, we'll be serving our model as a separate service to handle real-time requests. We want to be able to perform online inference so that we can quickly categorize ML content as it becomes available. However, we will also demonstrate how to do batch inference for the sake of completeness.
How do we receive feedback on our system and incorporate it into the next iteration? This can involve both human-in-the-loop feedback as well as automatic feedback via monitoring, etc.
Our task
Always return to the value proposition
While it's important to iterate and optimize on our models, it's even more important to ensure that our ML systems are actually making an impact. We need to constantly engage with our users to iterate on why our ML system exists and how it can be made better.
To cite this content, please use:
@article{madewithml,
    author       = {Goku Mohandas},
    title        = {Systems - Made With ML},
    howpublished = {\url{https://madewithml.com/}},
    year         = {2023}
}