Why Machine Learning in the Cloud - the example of Amazon SageMaker

The marketing pages of the three hyperscalers AWS, Azure and Google Cloud promise a lot. What is really behind them?

Author: Andri Lareida

Together with our customer, we are developing a data pipeline for classifying e-mails on Amazon Web Services (AWS). For data processing we rely on serverless Lambda functions, which we decouple using Kinesis Data Streams (Figure 1). The Machine Learning (ML) component of our pipeline is trained and deployed in SageMaker.

Figure 1: Data pipeline
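
To illustrate how the components in Figure 1 interact, the following sketch shows a Lambda function that consumes e-mail records from a Kinesis Data Stream and sends the text to a SageMaker endpoint for classification. The endpoint name, record layout and payload format are assumptions made for illustration, not our customer's actual configuration.

    import base64
    import json
    import os

    import boto3

    # Assumed endpoint name; in practice it would be injected via the Lambda configuration.
    ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT", "email-classifier")

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        """Consume e-mail records from Kinesis and classify them via SageMaker."""
        classified = 0
        for record in event["Records"]:
            # Kinesis delivers the record payload base64-encoded.
            mail = json.loads(base64.b64decode(record["kinesis"]["data"]))

            response = runtime.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType="application/json",
                Body=json.dumps({"text": mail["body"]}),
            )
            category = json.loads(response["Body"].read())
            # Downstream, the category would be forwarded to the next stream or a data store.
            classified += 1
        return {"classified": classified}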

The Use Case

Every day, our customer receives thousands of e-mails from its end customers with feedback on the service or requests for quotations. This confronts our customer with two fundamental problems:

  1. The volume of e-mails is constantly increasing
  2. The end customer expects a quick response

Today, incoming e-mails are read by employees and categorized manually, which can take up to three days. To assign the categories more quickly, we are developing an ML algorithm that categorizes incoming e-mails automatically. In a future step, employees will also receive suggestions for the response.

Machine Learning in the Cloud

Our ML model runs through its life cycle (Figure 2) again and again:

  • Gather data
  • Develop model
  • Train model
  • Evaluate model
  • Provide model
  • Offer model as a service
  • Collect feedback
  • Monitor performance

This cycle is repeated to adapt the model to changes in its environment: if we observe a deterioration in performance, the cycle is restarted and the model is retrained with new data or developed further.

Figure 2: Machine Learning Life Cycle

Amazon SageMaker supports us in training and deploying our models. As a fully managed cloud service, SageMaker offers the benefits of the cloud. In our case:

  • No fixed costs

Training our current model (BERT) requires powerful GPUs. Thanks to the flexibility of cloud resources, there is no need to purchase expensive hardware, which in our case would only be needed during training.

  • Scaling

Our workloads can easily be distributed across multiple GPUs or even across nodes. We can accelerate training by renting more cloud resources for a shorter period of time: twice the resources cost twice as much per hour, but we only need them for half the time, so the total cost of a training run remains almost constant.

  • No maintenance

SageMaker provides the entire platform, so there is no need to manage hardware or operating systems. Our team can concentrate fully on development, as there are almost no maintenance tasks to perform.

SageMaker was developed to make machine learning in the cloud easier. SageMaker relieves data scientists and engineers of repetitive work so that they can concentrate on the essentials.

«SageMaker offers features that go beyond the general benefits of the cloud.»
Andri Lareida, Senior Consultant, ipt
  • Rapid Development
    SageMaker provides ready-made models that only need to be trained for a specific task, so proofs of concept can be implemented in a very short time. For custom models, SageMaker provides containers that come with frequently required libraries and can be extended flexibly.
  • Training
    SageMaker supports the management of training data. Training can be started with a click; all you need to do is specify where the data is stored. SageMaker also offers hyperparameter optimization, using Bayesian optimization to find the best of thousands of hyperparameter combinations (see the sketch after this list).
  • Deployment
    Once a model has been trained, it can be deployed with a click. SageMaker installs the model on an EC2 instance and provides an HTTP REST endpoint through which the model can be accessed.
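
To make these points concrete, here is a minimal sketch of what training and deployment look like with the SageMaker Python SDK. The script name, IAM role, S3 paths, instance types and hyperparameters are illustrative assumptions, not our actual project configuration.

    from sagemaker.tensorflow import TensorFlow

    # All values below are placeholders; role, paths and instance types depend on the account.
    estimator = TensorFlow(
        entry_point="train.py",             # assumed training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_count=1,
        instance_type="ml.p3.2xlarge",      # GPU instance, e.g. for BERT fine-tuning
        framework_version="2.1",
        py_version="py3",
        hyperparameters={"epochs": 3, "batch_size": 16},
        output_path="s3://my-bucket/models/",
    )

    # Training: only the location of the data has to be specified.
    estimator.fit({"training": "s3://my-bucket/data/train/"})

    # Deployment: SageMaker provisions an instance and exposes an HTTPS endpoint.
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    print(predictor.predict({"text": "Please send me a quote for ..."}))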

How SageMaker helps in a project

In our specific project, we use SageMaker to train our algorithm and to serve the fully trained model. This takes work off our data scientists and engineers, time they can put back into improving the algorithm and the data pipeline.

For development, SageMaker provides us with Docker images that already come with the necessary drivers and ML libraries. The developers only have to take care of the most important thing: their own model and loading the data.
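
Inside the training container, SageMaker announces the data and output locations through environment variables, so the training script only has to implement the model and the data loading. The following skeleton illustrates that contract; it is a generic placeholder, not our actual BERT training script.

    # train.py -- minimal skeleton of a SageMaker training script.
    import argparse
    import os

    def main():
        parser = argparse.ArgumentParser()
        # SageMaker exposes data and output locations via environment variables
        # and passes hyperparameters as command-line arguments.
        parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING"))
        parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
        parser.add_argument("--epochs", type=int, default=3)
        args, _ = parser.parse_known_args()

        # 1. Load the training data from args.data_dir (already downloaded by SageMaker).
        # 2. Build and fit the model (in our project: a BERT-based classifier).
        # 3. Save the trained model to args.model_dir; SageMaker uploads it to S3.

    if __name__ == "__main__":
        main()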

Once a development step is completed, training is started via a script. For the training, we define the data set to be used, the limits of the hyperparameter optimization, and where the resulting artifacts are stored. SageMaker runs the training several times until the best parameter combination is found. The trained model and the evaluation of the test results are stored in an S3 bucket.
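
Expressed in code, such a tuning run looks roughly like the sketch below. The estimator matches the earlier sketch, and the metric name, search ranges and job counts are made-up examples rather than our real settings.

    from sagemaker.tensorflow import TensorFlow
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    # Estimator configured as in the earlier sketch (values are placeholders).
    estimator = TensorFlow(
        entry_point="train.py",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="2.1",
        py_version="py3",
        output_path="s3://my-bucket/models/",
    )

    tuner = HyperparameterTuner(
        estimator,
        objective_metric_name="validation:accuracy",
        objective_type="Maximize",
        hyperparameter_ranges={
            "learning_rate": ContinuousParameter(1e-5, 1e-3),
            "batch_size": IntegerParameter(8, 32),
        },
        metric_definitions=[
            {"Name": "validation:accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"}
        ],
        max_jobs=20,           # total training runs SageMaker may launch
        max_parallel_jobs=2,   # runs executed at the same time
    )

    # SageMaker repeats training with different parameters; artifacts end up in S3.
    tuner.fit({"training": "s3://my-bucket/data/train/"})
    print(tuner.best_training_job())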

If we decide to move a new model into production, the trained model is loaded from S3 into a dedicated serving image and handed over to SageMaker. SageMaker provisions an EC2 instance, installs the image and configures a REST endpoint through which the model can be accessed. If desired, the model can be scaled to multiple EC2 instances at the push of a button; the endpoint does not change and remains available.
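
Deploying a previously trained model that already sits in S3 can be sketched as follows; the artifact path, endpoint name and instance type are placeholders. Scaling out only changes the number of instances behind the endpoint, not its address.

    from sagemaker.tensorflow import TensorFlowModel

    # Placeholder artifact path; in practice this is the output of the training or tuning job.
    model = TensorFlowModel(
        model_data="s3://my-bucket/models/best-model/model.tar.gz",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        framework_version="2.1",
    )

    # SageMaker provisions the instance(s), installs the serving image and exposes a REST endpoint.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name="email-classifier",   # the endpoint name stays stable when scaling
    )

    print(predictor.predict({"text": "Is there a problem with my last order?"}))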

Disadvantages of SageMaker

Essentially we see two disadvantages in the project:

  1. Variable costs
    EC2 instances with GPUs cost many times more than comparable instances without a GPU. In our case, this drives up costs considerably at times: if many experiments are run in a single month, the SageMaker costs quickly exceed the costs for the rest of the data pipeline. Of course, every project has to weigh whether buying and operating the hardware in-house would pay off instead.
  2. Technical constraints
    With our BERT model and SageMaker we had difficulties in the beginning, because no image with the required TensorFlow 2 library was available. We had to build our own, based on the existing images, in order to run our model on SageMaker. Currently there are also no images that include Python 3.7, which caused us some extra work. We therefore recommend checking compatibility and technical specifications at the start of a project.

Conclusion

SageMaker is the AWS framework for machine learning and provides a platform and tools that support the entire life cycle of an ML project. In our e-mail classification project, we rely on SageMaker to train and serve models without upfront investment and without integration effort. This allows our development team to concentrate fully on value-adding tasks. We therefore believe that the advantages of the cloud and of SageMaker clearly outweigh the disadvantages.

Partner

AWS Consulting Partner