The marketing pages of the three hyperscalers, Amazon, Azure and Google, promise a lot. What is really behind them?
Author: Andri Lareida
Together with our customer, we are developing a data pipeline on Amazon Web Services (AWS) that classifies e-mails. To process the data, we rely on serverless Lambda functions, which we decouple using Kinesis Data Streams (Figure 1). The machine learning (ML) component of our pipeline is trained and served in SageMaker.
Our customer receives thousands of e-mails from its customers every day, containing feedback on the service or requests for quotations. This presents our partner with two fundamental problems:
Today, incoming e-mails are read and categorized by employees, which can take up to three days. To make the categories available for analysis more quickly, we are developing an ML algorithm that categorizes incoming e-mails automatically. In a future step, employees will also receive suggestions for the response.
Our ML model goes through its life cycle (Figure 2) again and again:
This cycle is repeated to adapt the model to changes in its environment. If we observe a deterioration in performance, the cycle is restarted and the model is retrained with new data or even developed further.
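How such a deterioration might be detected can be illustrated with a small sketch. It assumes that a sample of predictions is later checked against the category an employee assigned. The class name `PerformanceMonitor`, the window size and the accuracy threshold are hypothetical and not taken from our project.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks accuracy over the most recent manually verified predictions."""

    def __init__(self, window_size: int, min_accuracy: float) -> None:
        self.min_accuracy = min_accuracy
        # A deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window_size)

    def record(self, predicted: str, actual: str) -> None:
        """Store whether the predicted category matched the verified one."""
        self.results.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.results:
            return 1.0  # no evidence of a problem yet
        return sum(self.results) / len(self.results)

    def needs_retraining(self) -> bool:
        # Only judge once the window is full, to avoid reacting to noise.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.min_accuracy
```

When `needs_retraining()` returns `True`, the life cycle would be restarted with fresh data.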
Amazon SageMaker supports us in training and deploying our models. As a fully managed cloud service, SageMaker offers the typical benefits of the cloud. In our case:
No fixed costs
Training our current model (BERT) requires powerful GPUs. Thanks to the flexibility of cloud resources, there is no need to purchase expensive hardware, which in our case would only be needed for training.
Our workloads can easily be distributed across multiple GPUs or even nodes. We can accelerate training by renting more cloud resources for a shorter period of time. In other words, twice the resources cost twice as much per hour, but we only need them for half the time, so the cost of training remains almost constant.
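The arithmetic behind this can be written down as a short sketch. The hourly price and the scaling efficiency are purely illustrative assumptions, not real AWS prices or measurements from our project. With perfect scaling the cost stays exactly constant. With some communication overhead between instances it rises slightly, which is why we say "almost".

```python
def training_cost(instances: int, base_hours: float, price_per_hour: float,
                  scaling_efficiency: float = 1.0) -> float:
    """Cost of a training job that takes base_hours on a single instance.

    scaling_efficiency is the fraction of the ideal speed-up that each
    additional instance actually contributes (1.0 means perfect scaling).
    """
    # Ideal speed-up equals the number of instances; real clusters lose
    # part of it to communication overhead.
    speedup = 1 + (instances - 1) * scaling_efficiency
    hours = base_hours / speedup
    return instances * hours * price_per_hour


# Hypothetical numbers: 10 hours on one instance at 4.00 per hour.
single = training_cost(1, 10, 4.0)        # 1 instance for 10 hours
doubled = training_cost(2, 10, 4.0)       # 2 instances for 5 hours each
realistic = training_cost(2, 10, 4.0, scaling_efficiency=0.9)
print(single, doubled, round(realistic, 2))
```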
SageMaker provides the entire platform, so there is no need to manage hardware and operating systems. Our team can concentrate fully on development, as there are almost no maintenance tasks to be performed.
SageMaker was developed to make machine learning in the cloud easier. It relieves data scientists and engineers of repetitive work so that they can concentrate on the essentials.
In our specific project, we use SageMaker for training our algorithm and serving the fully trained model. This frees up time that our data scientists and engineers can invest in improving the algorithm and the data pipeline.
For development, SageMaker provides us with Docker images that already contain the necessary drivers and ML libraries. The developers only have to take care of what matters most: their own model and loading the data.
Once a development step is completed, training is started by a script. For each training run, we have to define the data set to be used, the limits of the hyperparameter optimization and the storage location for the resulting artifacts. SageMaker then runs the training several times within these limits and keeps the best parameter combination it finds. The trained model and the evaluation of the test results are stored in an S3 bucket.
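What such a training script has to define can be sketched as the request for a SageMaker hyperparameter tuning job. This is a minimal sketch, not our actual script: the function name `build_tuning_request`, the metric name and regex, the parameter ranges, the instance type and all example values are assumptions chosen for illustration. The field names follow the request structure of the boto3 call `create_hyper_parameter_tuning_job`.

```python
def build_tuning_request(job_name: str, image_uri: str, role_arn: str,
                         train_s3_uri: str, output_s3_uri: str) -> dict:
    """Assemble the request for a hyperparameter tuning job (illustrative values)."""
    return {
        "HyperParameterTuningJobName": job_name,
        "HyperParameterTuningJobConfig": {
            "Strategy": "Bayesian",
            "HyperParameterTuningJobObjective": {
                "Type": "Maximize",
                "MetricName": "validation:accuracy",  # assumed metric name
            },
            # Limits of the optimization: how many runs in total and in parallel.
            "ResourceLimits": {
                "MaxNumberOfTrainingJobs": 12,
                "MaxParallelTrainingJobs": 3,
            },
            # Search space; the API expects the bounds as strings.
            "ParameterRanges": {
                "ContinuousParameterRanges": [
                    {"Name": "learning_rate", "MinValue": "1e-5", "MaxValue": "1e-4"},
                ],
                "IntegerParameterRanges": [
                    {"Name": "epochs", "MinValue": "2", "MaxValue": "4"},
                ],
            },
        },
        "TrainingJobDefinition": {
            "AlgorithmSpecification": {
                "TrainingImage": image_uri,
                "TrainingInputMode": "File",
                # Assumes the training code logs a line like "validation_accuracy=0.93".
                "MetricDefinitions": [
                    {"Name": "validation:accuracy",
                     "Regex": "validation_accuracy=([0-9\\.]+)"},
                ],
            },
            "RoleArn": role_arn,
            # The data set to be used.
            "InputDataConfig": [{
                "ChannelName": "train",
                "DataSource": {"S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                }},
            }],
            # Where the resulting artifacts are stored.
            "OutputDataConfig": {"S3OutputPath": output_s3_uri},
            "ResourceConfig": {
                "InstanceType": "ml.p3.2xlarge",  # assumed GPU instance type
                "InstanceCount": 1,
                "VolumeSizeInGB": 50,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
    }
```

With valid AWS credentials, the request could then be submitted with `boto3.client("sagemaker").create_hyper_parameter_tuning_job(**request)`.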
If we decide to move a new model into production, the trained model is loaded from S3 into a special serving image and handed over to SageMaker. SageMaker provisions an EC2 instance, deploys the image and configures a REST endpoint through which the model can be accessed. If desired, the model can be scaled out to multiple EC2 instances at the push of a button. The endpoint does not change and remains available.
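From the point of view of the pipeline, calling such an endpoint could look roughly like the following sketch. It uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. The function name `classify_email`, the endpoint name and the JSON payload format are assumptions, since the actual format depends on the serving image.

```python
import json


def classify_email(runtime_client, endpoint_name: str, text: str) -> dict:
    """Send one e-mail text to a SageMaker endpoint and return the parsed answer.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumed payload format: a JSON object with a single "text" field.
        Body=json.dumps({"text": text}).encode("utf-8"),
    )
    # The response body is a stream that has to be read and decoded.
    return json.loads(response["Body"].read())
```

A call might then look like `classify_email(boto3.client("sagemaker-runtime"), "email-classifier", mail_text)`, where `"email-classifier"` is a hypothetical endpoint name. Because the endpoint name stays the same when the model is scaled or replaced, this calling code does not have to change.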
Essentially we see two disadvantages in the project:
SageMaker is the AWS framework for machine learning and provides a platform and tools to support the entire life cycle of an ML project. In our e-mail classification project, we rely on SageMaker to train and serve models without upfront investment and without integration effort. This allows our development team to concentrate fully on the value-adding tasks. We therefore believe that the advantages of the cloud and SageMaker clearly outweigh the disadvantages.