Continuous Integration for ML Projects

Over the last year we have deployed quite a few services which contain Machine Learning components. This post shares what we learnt in that process and what helped us to minimise risks and time to production.

What does our usual development cycle look like?

We use many different programming languages at Onfido, but the most common ones are Ruby, Python, Elixir and JavaScript. The development process of each service/application is quite unified:

  • Use some version of Gitflow to manage branches
  • Containerize application using Docker
  • Deploy application to Kubernetes cluster

It’s important to mention that Docker is not only our choice of packaging format but also our development environment. This helps us minimise the risk of “works on my machine” situations and simplifies dependency version management (especially useful for science libraries in Python).

Jenkins is our continuous integration system of choice and we use it for building and deploying applications. All of our repos contain a Jenkinsfile which defines the pipeline that Jenkins will execute. The most common steps of the pipeline are:

  • Build a Docker image
  • Run our unit/integration tests (within an instance of that Docker image)
  • Run acceptance tests (end-to-end, may require some orchestration)
  • Deploy to staging and production

Every time a developer pushes their code to the remote Git server, Jenkins reads the pipeline file and follows the appropriate steps. Having this pipeline defined for each repository has proven to give us a lot of flexibility in tailoring the steps to each service.

Here is a visualization of these processes:

[Figure: Typical release cycle]

Enter Machine Learning in the service

Adding machine learning components to new or existing services means that we now need to resolve a few things:

  • How do we associate the code with the (usually) large files needed for the models?
  • How can we increase confidence in changes to models or inference-related code?
  • Where does training fit into our development lifecycle?

To answer the first question, one of our engineers has already written about the approach we took. To summarise: our models live in S3 and are linked to the code using a dependency file, which is easily staged in the Git repo.
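
As a rough sketch of how such a dependency file can be resolved (the model-dependencies.yml name, its layout and the helper below are assumptions for illustration, not our actual format), the linked models can be pulled from S3 with boto3 before the image is built or tested:

    # resolve_models.py - hypothetical sketch of a model dependency resolver.
    # Assumes a YAML file staged in the repo of the form:
    #
    # models:
    #   - bucket: example-models-bucket
    #     key: classifier/v3/weights.h5
    #     local_path: models/classifier/weights.h5
    import pathlib

    import boto3
    import yaml


    def resolve_models(dependency_file="model-dependencies.yml"):
        """Download every model listed in the dependency file from S3."""
        spec = yaml.safe_load(pathlib.Path(dependency_file).read_text())
        s3 = boto3.client("s3")
        for entry in spec["models"]:
            target = pathlib.Path(entry["local_path"])
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():  # skip models already baked into the image
                s3.download_file(entry["bucket"], entry["key"], str(target))


    if __name__ == "__main__":
        resolve_models()

Running something like this during the Docker image build (or as an entrypoint step) keeps the large model artefacts out of Git while keeping them pinned to the code that uses them.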

Testing models before deploying

As we do with all other software, before we release changes to ML models we want a certain degree of confidence that our changes have not negatively impacted how our system behaves (at least not in unexpected ways). But how can we be confident that our models perform as we expect them to?

The solution we found is to introduce a new type of test in our test suite: accuracy tests.

An accuracy test exercises a model’s inference code against a sample test dataset and verifies that the measured performance is above an expected threshold.
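
As an illustration (not our exact suite), an accuracy test written with pytest might look like the sketch below; load_test_dataset, load_model and the 0.92 threshold are hypothetical stand-ins for whatever the service really uses:

    # test_accuracy.py - hypothetical sketch of an accuracy test with pytest.
    import pytest

    from my_service.data import load_test_dataset  # hypothetical helper
    from my_service.inference import load_model    # hypothetical helper

    ACCURACY_THRESHOLD = 0.92  # agreed minimum for the current model version


    @pytest.mark.accuracy  # custom marker so these tests can be selected or skipped
    def test_model_accuracy_above_threshold():
        samples, labels = load_test_dataset("data/accuracy-test-set")
        model = load_model()
        predictions = [model.predict(sample) for sample in samples]
        accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        assert accuracy >= ACCURACY_THRESHOLD

Registering the marker (e.g. in pytest.ini) lets the pipeline run only the fast unit tests on every commit and opt into the accuracy suite as a separate stage.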

If you remember our CI lifecycle above, we made the following changes:

  • Allow the Docker image build step to resolve the model dependencies
  • Run unit/integration tests (fast to fail)
  • Run acceptance tests (usually slower than the previous set)
  • Download the test dataset (currently stored in S3)
  • Trigger the accuracy tests (speed can vary greatly depending on hardware, sample size, etc.)

With this setup, we have a high degree of confidence when making changes to our services that the performance of our models has been unaffected — enabling us to move a lot faster.

As this runs inside a container, we can also run these tests locally.

In some scenarios, we have had to make different compromises:

  • For services using small and fast inference models, we can use small (but still statistically significant) test datasets which — with some parallelising — can run on every build and finish in seconds/minutes
  • For other services using models with higher resource needs/slower inference time, we chose to run them on a scheduled basis on integration branches. This balances time for feedback with certainty that we’ll still catch regressions before they reach production

At this point we have a system that allows us to add new models, make changes with confidence and deploy to production in a streamlined way. The pipeline looks similar to before:

[Figure: Pipeline after the introduction of ML models]

Fitting training into this flow

From the perspective of the CI pipeline, adding new models simply means modifying the code and the dependency files we mentioned, and we are good to go.

One thing we are interested in is systematically training new models, lowering the knowledge barrier to do so and ultimately making this process automated. So far, what we have found helps create such a process is:

  • Move all code required for training into the same git repository
  • Use a dedicated Docker image for training
  • Structure your training steps with libraries like Luigi or Airflow; this makes it a lot simpler to refactor later on, apart from other goodies like the ability to resume from a failed step (see the sketch after this list)
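
As a rough illustration of that last point (the task names, targets and placeholder bodies are hypothetical, not our real training code), a Luigi pipeline breaks training into steps that each declare an output, so a failed run can be resumed from the last completed step:

    # train_pipeline.py - hypothetical sketch of a Luigi training pipeline.
    import luigi


    class PrepareDataset(luigi.Task):
        def output(self):
            return luigi.LocalTarget("artifacts/dataset.csv")

        def run(self):
            with self.output().open("w") as f:
                f.write("...preprocessed training data...")  # placeholder


    class TrainModel(luigi.Task):
        def requires(self):
            return PrepareDataset()

        def output(self):
            return luigi.LocalTarget("artifacts/model.bin")

        def run(self):
            with self.input().open("r") as data, self.output().open("w") as model:
                # placeholder: real code would fit the model on the prepared data
                model.write("weights trained on %d rows" % len(data.readlines()))


    if __name__ == "__main__":
        # If PrepareDataset already completed in a previous (failed) run,
        # Luigi skips it and resumes at TrainModel.
        luigi.build([TrainModel()], local_scheduler=True)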

Moving the code into the same repo means that we can use the accuracy tests as one of the steps in the pipeline, and manage code sharing (at least to a certain degree).

Docker was a natural decision based on our existing workflows, and provided us with the flexibility to:

  • Run training locally, especially during the early stages of the project.
  • Ensure the right dependencies are installed (images will vary between inference and training if one needs GPUs and the other CPUs).
  • Take advantage of services like AWS Batch, which can handle all the infrastructure management (including GPU nodes) with little to no effort (see the sketch after this list).
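
For that last point, submitting a containerised training run to AWS Batch from Python is a single boto3 call; the job name, queue and job definition below are assumptions and would have to exist in your account already:

    # submit_training.py - hypothetical sketch of submitting a training job to AWS Batch.
    import boto3

    batch = boto3.client("batch")

    response = batch.submit_job(
        jobName="train-document-classifier",   # hypothetical job name
        jobQueue="ml-training-gpu-queue",      # hypothetical queue backed by GPU instances
        jobDefinition="model-training:3",      # hypothetical definition pointing at the training image
        containerOverrides={
            "command": ["python", "train_pipeline.py"],
            "environment": [{"name": "DATASET_VERSION", "value": "latest"}],
        },
    )
    print("Submitted AWS Batch job:", response["jobId"])

Batch then provisions the compute environment, runs the training container and tears the instances down once the job finishes.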

We will soon be writing more on this specific topic, since there is a lot more to learn from continuously training models with little to no human intervention.

Summary

Handling ML is not too different from other components in your systems, but it requires solving some particular problems to get going.

I believe that it’s extremely important to streamline this process right from the start, and setting up CI that combines all these components will make it a lot easier to keep growing your solutions and improving them in production.



Originally published at https://medium.com/onfido-tech/continuous-integration-for-ml-projects-e11bc1a4d34f
