As part of our ‘Ask The Expert’ blog series, we had an in-depth chat with Sam Jenkins, MLOps Lead at Ultraleap.
In this article, we explore Sam’s journey from chemical engineering to MLOps, discussing effective strategies for scaling machine learning models, monitoring model performance, and integrating ethical considerations.
Hi there! I’m Sam Jenkins and I’m the MLOps Lead at Ultraleap, a user interface company that uses computer vision machine learning models to create hand tracking and haptics technologies. Here, I’m responsible for the design, implementation and maintenance of the MLOps system and underlying infrastructure.
Prior to this, I worked as a data scientist at Malvern Panalytical, a scientific instrumentation company looking to implement machine learning solutions on their IoT data. At that time, several companies were pushing for AI solutions on their existing products but hadn’t quite considered the full extent of the infrastructure requirements, so, given my background, I took on the engineering of the data platform that supported the modelling.
My background originally was in chemical / process engineering (now that I think about it, you could consider MLOps to be the “process engineering” of the AI industry…). My experience in that field, as well as a large amount of systems engineering and modelling in MATLAB, definitely helped me along the way!
On scaling machine learning models, I would say that you can’t patch over bad system design. Having an understanding of the fundamentals of distributed systems is key to making appropriate technology decisions for your specific use case, which will allow you to scale more easily.
At Ultraleap we have to maintain the training and deployment / packaging of over 80 models, which makes for some very complex pipelines! Maintaining the lineage from data through to model over many experiments, across a large team, is important, so we make sure to track experiments properly, preserve metadata across pipelines for historical purposes, and keep good API interfaces with our software teams to package models downstream.
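To make the lineage idea concrete, here is a minimal sketch of a per-run lineage record in Python. The `LineageRecord` structure, its field names and the example values are illustrative assumptions, not Ultraleap’s actual schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class LineageRecord:
    """Links one trained model back to the data and code that produced it."""
    model_name: str
    model_version: str
    dataset_id: str      # e.g. a content hash or dataset revision tag
    git_commit: str      # revision of the training code
    hyperparameters: dict
    metrics: dict = field(default_factory=dict)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def record_id(self) -> str:
        # Deterministic ID over the inputs (the timestamp is excluded), so the
        # same model/dataset/code combination always hashes to the same entry.
        core = {k: v for k, v in asdict(self).items() if k != "created_at"}
        payload = json.dumps(core, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Usage: write one record per training run, alongside the packaged model.
record = LineageRecord(
    model_name="hand_pose",          # hypothetical model
    model_version="2.4.1",
    dataset_id="ds-8f3a2c",
    git_commit="a1b2c3d",
    hyperparameters={"lr": 3e-4, "epochs": 40},
    metrics={"val_error": 0.018},
)
print(record.record_id())
```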
More broadly, I think a good grounding in DevOps practices is very important. Infrastructure as code, solid CI / CD pipelines, good environment segregation and containerisation will all make the infrastructure supporting model training and deployment easier to maintain.
If we’re considering specific tools, Kubernetes obviously works great for controlling scale (more so for deployment than training). If you want good experiment tracking, metadata storage and model registries, look towards ClearML or MLflow, or AzureML / SageMaker if you want to stay consistent with a cloud provider. Picking a stack appropriate for your use case is particularly important, so make sure you consider your business’s unique challenges and edge cases!
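As a small illustration of what experiment tracking looks like with one of those tools, here is a hedged sketch using MLflow’s Python API; the tracking URI, experiment name, parameters and metric values are all placeholders:

```python
import mlflow

# Point the client at a tracking server (omit to log locally to ./mlruns).
mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical URL
mlflow.set_experiment("hand-pose-training")             # hypothetical name

with mlflow.start_run(run_name="baseline"):
    # Log the knobs that define the run...
    mlflow.log_param("learning_rate", 3e-4)
    mlflow.log_param("batch_size", 64)

    # ...train your model here...

    # ...and the results, so runs are comparable in the UI.
    mlflow.log_metric("val_loss", 0.042)

    # Artifacts (weights, configs) are versioned alongside the run.
    # mlflow.log_artifact("model.onnx")  # placeholder path to packaged weights
```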
Drift can be a complex problem. Is the model drifting because the production data falls outside its training data distribution? Or have the external parameters of what is considered a quality prediction changed, i.e. sentiment / concept drift? I think having a very good understanding of the domain you’re working in is essential here, which is where talented data scientists are worth their weight in gold! That’s the starting point; from there you can set up monitoring and alerting for model drift based on your KPIs.
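As a rough sketch of the first kind of drift (production data moving away from the training distribution), a two-sample Kolmogorov–Smirnov test per feature is one common starting point. The synthetic data and alert threshold below are illustrative only:

```python
# Minimal sketch of distribution-drift detection on a single feature,
# using SciPy's two-sample Kolmogorov-Smirnov test. Real setups test many
# features and tie alerts to the KPIs that matter for the product.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)    # reference window
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # live window (shifted)

statistic, p_value = ks_2samp(training_feature, production_feature)

ALERT_P_VALUE = 0.01  # illustrative threshold
if p_value < ALERT_P_VALUE:
    print(f"Possible data drift: KS={statistic:.3f}, p={p_value:.2e}")
```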
Again, we come back to having solid infrastructure, and part of this is having really good monitoring and observability of your system. If you have a distributed MLOps system running on Kubernetes, for example, utilising Prometheus and Grafana as well as a good centralised logging service (e.g. CloudWatch if you’re using EKS), you’re going in the right direction.
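For a flavour of what that looks like at the application level, here is a minimal sketch of a Python service exposing custom metrics for Prometheus to scrape, using the `prometheus_client` library; the metric names, port and stand-in inference function are illustrative:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("predictions_total", "Total predictions served")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency")

def predict(features):
    # Stand-in for real model inference.
    time.sleep(random.uniform(0.005, 0.02))
    return 0

if __name__ == "__main__":
    start_http_server(8000)     # metrics exposed at :8000/metrics
    while True:
        with LATENCY.time():    # records how long the block takes
            predict(None)
        PREDICTIONS.inc()
```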
The LLM boom means we’re seeing drastically increasing computational requirements, whether for training foundation models or fine-tuning existing ones. LLMs also bring requirements specific to the field, e.g. transfer learning, RLHF and vector databases, so much so that the industry seems to be coining the term LLMOps to refer to the extra bits on top of MLOps needed to support LLMs in production. Everyone loves a buzzword, and LLMOps seems to be the new one.
Tooling surrounding data management in ML has become more mature; maintaining and updating versions of datasets can be very important in CV use cases. Another interesting aspect of CV is image annotation, and one clear development I have noticed in MLOps tooling is the growing number of data annotation tools.
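As a toy illustration of dataset versioning, here is a sketch of a content-addressed version ID computed by hashing a dataset directory. Purpose-built tools such as DVC handle this (plus remotes, caching and pipelines) in practice; the function name and path below are hypothetical:

```python
# Hash every file in a dataset directory into a single ID, so a model can
# record exactly which snapshot of the data it was trained on.
import hashlib
from pathlib import Path

def dataset_version(root: str) -> str:
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):  # stable ordering matters
        if path.is_file():
            digest.update(str(path.relative_to(root)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()[:12]

# e.g. stamp the version into training metadata:
# print(dataset_version("data/hand_images"))
```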
It’s hard to put a finger on the direct impact of MLOps on industry and societal change, but companies that are building out their AI solutions are now taking the underlying infrastructure and system design more seriously. If you have great MLOps, you have a much more efficient process for delivering your product. Going back to my process engineering comparison, you can liken it to an increasingly efficient car factory. As humans we have vast experience in improving manufacturing efficiency for objects in the physical world, and MLOps applies those principles to the digital world. I think we’ll see rapid advancements in certain domains: as the tooling gets better and better, teams will be able to train and deploy models much more effectively, reducing the time from research to product.