Successful machine learning (ML) requires an understanding of the life cycle of ML applications. That is why all major hyperscalers model this life cycle in their platforms.
Author: Andreas Schneider
In this article, we present the ML life cycle of Microsoft Azure ML.
Azure provides a platform to develop, train, validate and deploy machine learning models. Azure ML supports versioning of models and data and records metrics and results. As a result, the creation of each ML model can be traced and the model itself can be reproduced. We can also use Azure ML to assemble ML pipelines that speed up and simplify our work while following best practices.
For ML specifically, the main strength of the cloud is that compute resources can be scaled as needed when training models. This accelerates product development and shortens the time to market.
The typical ML life cycle ranges from the creation of models to their operation (Figure 1). It includes data collection, training, evaluation, deployment and monitoring during the operation of models. Azure ML covers all of these areas.
We start by reading in the data and exploring it: we look at the data more closely and test the first models.
Azure ML solves the problem of how to efficiently get the data from the storage location to the computing instances in the cloud. As developers, all we have to do is feed the data into Azure ML. Within Azure ML, there are two concepts for data access:
○ Tabular: For data in tables.
○ Files: For less structured data. Access works much like an ordinary file system.
A dataset can be read in on computing instances, for example within an experiment. If several processing steps are needed, it is a good idea to use pipelines. Since pipelines remember the results of the intermediate steps, not all steps have to be executed again if only one part has been changed.
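The idea of reusing intermediate results can be illustrated with a minimal sketch. It is not the Azure ML pipeline API; the class `CachedPipeline`, the step functions and the version tags are hypothetical names chosen for this example, and the sketch assumes that the data passed between steps is JSON-serializable.

```python
import hashlib
import json


class CachedPipeline:
    """Runs steps in order and reuses a cached output when neither the
    step (name + version tag) nor its input data has changed."""

    def __init__(self):
        self._cache = {}     # cache key -> step output
        self.executed = []   # names of the steps that really ran last time

    def run(self, data, steps):
        """steps: list of (name, version, function) tuples."""
        self.executed = []
        for name, version, func in steps:
            # The key depends on the step identity and on its input.
            payload = json.dumps([name, version, data], sort_keys=True)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if key not in self._cache:
                self._cache[key] = func(data)
                self.executed.append(name)
            data = self._cache[key]
        return data


def clean(rows):
    # Drop missing values.
    return [r for r in rows if r is not None]


def scale(rows):
    return [r / 10 for r in rows]


def scale_v2(rows):
    return [r / 100 for r in rows]


pipeline = CachedPipeline()
raw = [10, None, 20, 30]

first = pipeline.run(raw, [("clean", 1, clean), ("scale", 1, scale)])
first_executed = list(pipeline.executed)

# Only the second step changed, so "clean" is served from the cache.
second = pipeline.run(raw, [("clean", 1, clean), ("scale", 2, scale_v2)])
second_executed = list(pipeline.executed)

print(first_executed, second_executed)
```

In the second call only the changed step runs again, which is the behaviour described above for pipelines.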
In Azure ML, models are created within experiments. Models and their corresponding quality metrics are tracked by Azure ML for each experiment run. The results can be viewed in the web interface or accessed via the API (Figure 2). We can review different runs and then select the models and parameters that give the best results. So we alternate between training and evaluating models until we are satisfied. The training takes place on dedicated Azure training clusters, which we can resize as needed.
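What tracking runs and picking the best one amounts to can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the Azure ML tracking API: `ExperimentLog`, `RunRecord` and the experiment name are hypothetical, and the accuracy values are made-up placeholders rather than measured results.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    run_id: int
    params: dict
    metrics: dict = field(default_factory=dict)


class ExperimentLog:
    """Keeps the parameters and metrics of every run of one experiment."""

    def __init__(self, name):
        self.name = name
        self.runs = []

    def start_run(self, **params):
        run = RunRecord(run_id=len(self.runs) + 1, params=params)
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run has logged the metric {metric!r}")
        choose = max if higher_is_better else min
        return choose(candidates, key=lambda r: r.metrics[metric])


experiment = ExperimentLog("demo-experiment")  # hypothetical name

# Placeholder numbers for illustration; a real run would compute them.
for depth, accuracy in [(2, 0.81), (4, 0.88), (8, 0.85)]:
    run = experiment.start_run(max_depth=depth)
    run.metrics["accuracy"] = accuracy

best = experiment.best_run("accuracy")
print(best.run_id, best.params)
```

Keeping parameters and metrics together per run is what makes it possible to compare runs later and to reproduce the chosen model.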
For training, we only use the training data. Before deployment, we test the performance of the selected model on a test dataset that was not used during training. This dataset should be as close as possible to the actual production data. If the steps from the data to the model are implemented as a pipeline, we can easily reproduce the models or train them with new data.
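The separation of training and test data can be sketched as follows. The function name `train_test_split` and its parameters are assumptions made for this example (the name mirrors a common convention, but this is a plain standard-library version), and the rows are synthetic.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold back a fraction of rows for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                    # leave the input untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed -> same split again
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))                      # synthetic stand-in for real data
train_rows, test_rows = train_test_split(rows)

print(len(train_rows), len(test_rows))
```

The fixed seed keeps the split reproducible, so a model trained later on the same data is evaluated against the same held-out rows.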
Once we have selected a model, we can make it available as a web service relatively easily using Azure ML. Other applications can then use the service as a secure REST endpoint. For this purpose, there are special inference clusters on Azure whose capacity adapts to the demand. We just need to create an inference script that takes the input from the web service and the stored model and returns a model prediction.
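A minimal sketch of such an inference script is shown below. It follows the `init()`/`run()` shape commonly used for Azure ML entry scripts, but everything else is an assumption made for illustration: `ThresholdModel` is a hypothetical stand-in for a trained model, and the `"data"` and `"predictions"` keys are an example payload format, not a fixed contract.

```python
import json

model = None  # loaded once when the service starts, reused per request


class ThresholdModel:
    """Hypothetical stand-in for a real trained model."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [1 if sum(row) > self.threshold else 0 for row in rows]


def init():
    """Called once at start-up. A real script would load the stored
    model file here instead of constructing a stand-in."""
    global model
    model = ThresholdModel(threshold=1.0)


def run(raw_data):
    """Called per request with the request body as a JSON string."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"predictions": model.predict(rows)}
    except (KeyError, TypeError, ValueError) as error:
        # Report malformed input instead of crashing the service.
        return {"error": str(error)}


init()
response = run(json.dumps({"data": [[0.2, 0.3], [0.9, 0.8]]}))
print(response)
```

Loading the model in `init()` rather than in `run()` avoids repeating the expensive load for every request.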
After deployment, the task is to ensure the quality of the results that a model delivers. It is quite possible that the input data will gradually diverge from the data the model was trained on. Therefore, we store the input data and the results of the web service so that we can compare them with the training set. A discrepancy between the production data and the training data is called "data drift". Azure ML automatically monitors this drift and reports it so we can intervene if necessary.
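One simple way to compare production data with the training set is sketched below. It is not the method Azure ML uses internally; it is a deliberately basic heuristic that measures how far the mean of each feature has shifted, in units of the training standard deviation. The function names, the threshold of 0.5 and the data are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def drift_score(train_values, prod_values):
    """Absolute shift of the mean, in training standard deviations."""
    spread = stdev(train_values)
    shift = abs(mean(prod_values) - mean(train_values))
    if spread == 0:
        # Constant training feature: any shift at all counts as drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def detect_drift(train, prod, threshold=0.5):
    """train/prod map feature names to lists of numbers.
    Returns the sorted names of features whose score exceeds the threshold."""
    return sorted(
        name for name in train
        if drift_score(train[name], prod[name]) > threshold
    )


# Synthetic example data, not real measurements.
training = {"age": [30, 35, 40, 45, 50], "income": [40, 50, 60, 70, 80]}
production = {"age": [31, 36, 39, 44, 51], "income": [70, 80, 90, 100, 110]}

drifted = detect_drift(training, production)
print(drifted)
```

A mean shift misses changes in the shape of a distribution, so a production setup would typically use more robust statistical tests; the sketch only shows the principle of comparing stored inputs against the training data.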
We can use the collected production data for future model training to continuously improve the models. Since we have implemented all the processing steps from the data to the model as a pipeline, we can automatically train a new model on new data and deploy it again as a web service.
Azure ML is ready for enterprise use. Governance issues such as access permissions to data and models and traceability are addressed. There are concepts for responsible ML that address issues such as bias, fairness and explainability. To take stock of ML potential across the company, there is a tool for cataloguing data sources: a data catalogue. Developers can continue to use their existing ML skills in the cloud, as common ML frameworks and libraries are supported.
With Azure ML, Microsoft offers a sophisticated system for scalable machine learning in the cloud that covers the entire ML life cycle. This makes it possible to develop reproducible models quickly and traceably, and to deliver high-quality results with them.