How we achieved ML model reproducibility (2023)

When working on real-world ML projects, you come face to face with a series of obstacles. The ML model reproducibility problem is one of them.

This article walks you through an experience-based, step-by-step approach to solving the ML reproducibility challenge, as adopted by my ML team while working on a fraud detection system for the insurance domain.

You will learn:

  • 1. Why is reproducibility in machine learning important?
  • 2. What were the challenges the team faced?
  • 3. What was the solution? (Tool stack and a checklist)

Let's start from the beginning!

Why is reproducibility in machine learning important?

To help you understand this concept better, I will share the journey my team and I went through.

Project Background

Before discussing the details, let me tell you a little about the project. This ML-based project was a fraud detection system for the insurance domain, in which a classification model predicted whether a person was likely to commit fraud, given the necessary details as input.

Initially, when we start working on any project, we do not think about model deployment, reproducibility, and so on. We tend to spend most of our time on data exploration, preprocessing, and modeling. A survey conducted by Nature in 2016 shows why this is a problem.

According to this survey, 1,500 scientists were asked about reproducibility: 70% of them had failed to reproduce other scientists' experiments, and more than 50% had failed to reproduce their own. With this and a few other details in mind, we built a reproducible project and successfully deployed it to production.


While working on this classification project, we realized that reproducibility is essential not only for consistent results but also for the following reasons:

  • Stable ML results and practices: To ensure that customers could trust the results of the fraud detection model, we had to make those results stable. Reproducibility is the key factor in stabilizing the results of any ML pipeline. For reproducibility, we used an identical dataset and pipeline so that anyone on our team could run the model and get the same results. To ensure that our data and pipeline components stayed the same across runs, we had to track them using different MLOps tools.

For example, we used code versioning, model versioning, and data versioning tools that helped us keep track of everything in the ML pipeline. They also ensured that best practices were followed during development.

  • Promotes accuracy and efficiency: What we emphasized most was that our model should generate the same results repeatedly, regardless of when we ran it. Since a reproducible model gives the same results on every run, we only had to make sure that the model configuration and hyperparameters did not change between runs. This helped us identify the best model among everything we tried.
  • Avoids duplication of effort: An important challenge while developing this classification project was making sure that team members did not have to set everything up from scratch each time they ran the project in order to get the same results. Likewise, any new developer who joined the project had to be able to understand the pipeline easily and generate the same model. This is where version control tools and documentation helped us, as team members and new joiners had access to specific versions of the code, data, and ML models.
  • Enables the development of bug-free ML pipelines: There were times when the same classification model did not produce the same results on different runs, which helped us find errors and bugs in our pipeline easily. Once they were identified, we were able to fix them quickly and make our pipeline stable.
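The idea of keeping the configuration and randomness fixed between runs can be illustrated with a minimal sketch. It is not the team's actual pipeline: `run_experiment` is a hypothetical toy stand-in for training, and the configuration values are invented for illustration. It only uses the Python standard library.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a stable hash of the configuration so runs can be compared."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_experiment(config: dict) -> list:
    """Toy stand-in for a training run: every random draw is driven by the seed."""
    rng = random.Random(config["seed"])  # local generator, no hidden global state
    # Pretend these are validation scores from a model trained with this config.
    return [round(rng.uniform(0.0, 1.0), 6) for _ in range(config["n_folds"])]


# Hypothetical configuration values, chosen only for the example.
config = {"seed": 42, "n_folds": 5, "learning_rate": 0.05}

first = run_experiment(config)
second = run_experiment(config)
print(first == second)                  # same config and seed give the same scores
print(config_fingerprint(config)[:12])  # short id to store next to the results
```

Storing the fingerprint next to each result makes it easy to notice when two runs that were supposed to be identical actually used different settings.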

The ML reproducibility challenges we faced

Now that you know about reproducibility and its benefits, it is time to discuss the main reproducibility problems that my team and I faced during the development of this ML project. Importantly, all of these challenges are very common for any type of ML or DL use case.

1. Lack of clear documentation

An important piece we were missing at the beginning was documentation. Without it, the performance of our team members suffered, as they took more time than expected to understand the requirements and implement new features. It was also very difficult for new developers on our team to understand the entire project.

2. Different computing environments

Different developers on a team often have different environments: operating systems (OS), language versions, library versions, and so on. We were in the same situation while working on the project. This affected our reproducibility, as each environment differed significantly from the others in terms of library versions, the way packages were installed, and so on.

It is common practice to share code and artifacts among team members in any ML project. A slight change in the computing environment can therefore create problems when running the existing project, and developers end up spending unnecessary time debugging the same code over and over.
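One way to make such differences visible is to record a snapshot of each environment and compare them. The sketch below is an illustration under assumptions, not the tooling the team used: the helper names and the teammate's snapshot are hypothetical, while `platform` and `importlib.metadata` are part of the Python standard library.

```python
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Collect OS, Python version, and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed on this machine
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        "packages": versions,
    }


def diff_environments(a, b):
    """Return a list of human-readable differences between two snapshots."""
    differences = []
    for key in ("os", "python"):
        if a[key] != b[key]:
            differences.append(f"{key}: {a[key]} != {b[key]}")
    for name in sorted(set(a["packages"]) | set(b["packages"])):
        left = a["packages"].get(name)
        right = b["packages"].get(name)
        if left != right:
            differences.append(f"{name}: {left} != {right}")
    return differences


mine = snapshot_environment(["numpy", "scikit-learn"])
# Hypothetical snapshot sent by a teammate, invented for the example.
teammate = {
    "os": "Linux",
    "python": "3.9.16",
    "packages": {"numpy": "1.23.5", "scikit-learn": "1.2.0"},
}
for line in diff_environments(mine, teammate):
    print(line)
```

A snapshot like this can be saved with every run, so that a result which cannot be reproduced can at least be traced back to the environment that produced it.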

3. Not tracking data, code, and workflow

ML reproducibility is only possible when the same data, code, and preprocessing steps are used. Not tracking these things can lead to the same model being run with different configurations, which can produce different outputs on each run. At some point in your project, you must store all of this information so that you can retrieve it when needed.

At the start of the classification project, we did not track all the models and their hyperparameters, which turned out to be a barrier to making our project reproducible.


4. Lack of standard evaluation metrics and protocols

Selecting the right evaluation metric is one of the typical challenges when working on any classification use case. Our model could not afford to produce many false negatives, so we tried to improve the overall recall of the system. Not using a standard metric can reduce clarity among team members about the objective and can ultimately affect reproducibility.

Finally, we had to make sure that all members of our team followed the same protocols and coding standards, so that the code was uniform and therefore more readable and understandable.


Machine learning reproducibility checklist: solutions we adopted

As ML engineers, we believe that every problem has one or more possible solutions, and the same holds for ML reproducibility challenges. Although there were many reproducibility challenges in our project, we were able to solve all of them with the right strategy and a sound selection of tools. Now let's look at the machine learning reproducibility checklist we used.

1. Clear documentation of the solution

Our fraud detection project combined multiple individual technical components and the integrations between them. It was very difficult for us to remember when and how each component would be used and by which process. So, for our project, we created a document containing information on every module we worked on, for example data collection, data preprocessing and exploration, modeling, deployment, and monitoring.

Documenting which solution strategies we tried or planned to try, which tools and technologies we would use throughout the project, which implementation decisions were made, and so on helped our ML developers understand the project better. With this documentation, they could follow the recommended standard practices and the step-by-step procedure for running the pipeline, and they knew which error needed which kind of resolution. As a result, our team members reproduced the same results every time they ran the model.

In addition, this improved our team's efficiency, as we did not need to spend time explaining the whole workflow to new joiners and other developers; everything was covered in the document.

2. Using the same computing environments

Developing the classification solution required our ML developers to collaborate and work on different sections of the ML pipeline. Since most of our developers used different computing environments, it was difficult for them to produce the same results because of differences in dependencies. For reproducibility, we therefore had to ensure that every developer was using the same computing environment, library versions, language versions, and so on.


Using a Docker container or creating a shared virtual environment are two of the best ways to work with the same computing environment. On our team, people worked in both Windows and Unix environments, with different language and library versions. Using Docker containers solved this problem and helped us achieve reproducibility.

3. Tracking data, code, and workflow

Data versioning and workflow

As we knew, data was the backbone of our fraud detection use case; even a slight change in the dataset could affect the reproducibility of our model. The data we were using was not in the shape and format required to train the model. We therefore had to apply several data preprocessing steps, such as removing missing (NaN) values, feature generation, feature encoding, and feature scaling, to make the data compatible with the selected model.

For this reason, we had to use data versioning tools such as Pachyderm or DVC, which helped us manage our data systematically. You can see this tutorial for how it is done in Neptune: How to version and compare datasets.

In addition, we did not want to repeat all the data processing steps every time we ran the ML pipeline. Using these data and workflow management tools therefore helped us retrieve any specific version of the preprocessed data for a pipeline run.
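The principle behind data versioning, identifying a dataset by its content and reusing preprocessed output for that exact version, can be sketched with the standard library alone. This is not how Pachyderm, DVC, or Neptune work internally; the file name, the cache layout, and the `preprocess` step are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preprocess(raw_text: str) -> str:
    """Hypothetical preprocessing step: drop rows that contain an empty field."""
    rows = [row for row in raw_text.splitlines() if row and "" not in row.split(",")]
    return "\n".join(rows) + "\n"


def load_preprocessed(raw_path: Path, cache_dir: Path) -> Path:
    """Return the preprocessed file for this exact raw data, reusing the cache."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{file_sha256(raw_path)}.csv"
    if not cached.exists():  # only preprocess a data version we have not seen
        cached.write_text(preprocess(raw_path.read_text()))
    return cached


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "claims.csv"  # hypothetical file name
    raw.write_text("id,amount\n1,100\n2,\n3,250\n")
    first = load_preprocessed(raw, Path(tmp) / "cache")
    second = load_preprocessed(raw, Path(tmp) / "cache")
    print(first == second)  # the second call reuses the cached version
    print(first.read_text())
```

Because the cache key is the hash of the raw file, any change to the data, however small, produces a new preprocessed version instead of silently overwriting the old one.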


Code versioning and management

During development, we had to make many changes to the code for implementing ML modules, adding new features, integration, testing, and so on. For reproducibility, we had to make sure that we used the same version of the code every time we ran a pipeline.

There are several tools for version-controlling an entire codebase; GitHub and Bitbucket are among the most popular. We used GitHub to version-control our whole codebase. It also made team collaboration quite easy, as developers had access to every commit made by other developers, and it made it simple to use the same code every time we ran an ML pipeline.
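To tie a pipeline run to an exact code version, the commit hash can be recorded at the start of the run. The following is a minimal sketch that assumes the `git` command-line tool is installed; the helper names are hypothetical, and `git rev-parse HEAD` and `git status --porcelain` are standard Git commands.

```python
import subprocess


def current_commit(repo_dir: str = "."):
    """Return the full commit hash of HEAD, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is missing, or repo_dir is not inside a repository
    return result.stdout.strip()


def has_uncommitted_changes(repo_dir: str = "."):
    """Return True if the working tree differs from HEAD, None if unknown."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return bool(result.stdout.strip())


print("commit:", current_commit())
print("uncommitted changes:", has_uncommitted_changes())
```

Checking for uncommitted changes matters because a run started from a modified working tree cannot be reproduced from the commit hash alone.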

ML experiment tracking

Finally, the most important part of making our ML pipeline reproducible was tracking all the models and experiments we tried throughout the ML lifecycle. While working on the classification project, we tried different ML models and hyperparameter values, and it was very difficult to keep track of them manually or through documentation. Rather than adopting a different tool for each of these tasks, we chose neptune.ai, which seemed the right solution.


It is a cloud-based platform designed to help data scientists with experiment tracking, data versioning, model versioning, and metadata storage. It provides a centralized place for all these activities, which makes it easier for teams to collaborate on projects and ensures that everyone is working with the most up-to-date information.


Tools such as Comet and MLflow likewise allow developers to access any specific version of a model, so they can decide which algorithm worked best for them and with which hyperparameters. Ultimately, which tool you go with depends on your use case and your team dynamics.
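What an experiment tracker stores can be shown with a small local stand-in. The sketch below deliberately does not use the neptune.ai, Comet, or MLflow APIs; it appends each run to a JSON Lines file using only the standard library. The function names, model names, and metric values are hypothetical.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict) -> str:
    """Append one experiment record to a JSON Lines file and return its run id."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record["run_id"]


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for the given metric."""
    with log_path.open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "runs.jsonl"
    # Invented parameters and scores, for illustration only.
    log_run(log_file, {"model": "logistic_regression", "C": 1.0}, {"recall": 0.71})
    log_run(log_file, {"model": "gradient_boosting", "max_depth": 4}, {"recall": 0.83})
    print(best_run(log_file, "recall")["params"])
```

A hosted tracker adds collaboration, dashboards, and artifact storage on top of this, but the core record is the same: parameters, metrics, and an identifier for every run.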

4. Deciding on standard evaluation metrics and protocols

Since we were working on a classification project with an imbalanced dataset, we had to decide on metrics that would work well for us. Accuracy is not a good measure for an imbalanced dataset, so we could not use it. We had to choose among precision, recall, the AUC-ROC curve, and so on.

In a fraud detection use case, both precision and recall are important. False positives can cause customer inconvenience and annoyance and can potentially damage the reputation of the business. However, false negatives can be much more harmful and result in significant financial losses. We therefore decided to keep recall as our main metric for the use case.

In addition, we decided to follow the PEP 8 standard for coding, as we wanted our code to be uniform across all the components we were developing. Having a single metric to focus on and PEP 8 as our standard coding practice helped us write easily reproducible code.
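A small worked example shows why accuracy misleads on imbalanced data while recall exposes missed fraud. The numbers are invented for illustration: 100 claims, of which 5 are fraudulent (label 1). The metrics are computed by hand from the confusion counts, without any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0  # no actual positives


# Invented data: 5 fraudulent claims followed by 95 legitimate ones.
y_true = [1] * 5 + [0] * 95
always_legit = [0] * 100                        # never flags anything
flagging = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89  # catches 4 frauds, 6 false alarms

for name, y_pred in [("always_legit", always_legit), ("flagging", flagging)]:
    print(name, accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
# always_legit: accuracy 0.95, precision 0.0, recall 0.0
# flagging:     accuracy 0.93, precision 0.4, recall 0.8
```

The model that never flags fraud has the higher accuracy yet catches nothing, which is why recall is the more informative primary metric when false negatives are the costly error.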


After reading this article, you now know that reproducibility is an important factor when working on ML use cases. Without reproducibility, it can be difficult for anyone to trust your findings and results. I have walked you through the importance of reproducibility based on personal experience, and I have shared some of the challenges that my team and I faced along with the solutions we adopted.

If there is one thing to remember from this article, it is to use specialized tools and services to version-control everything you can: data, pipelines, models, and experiments. This lets you pick any specific version and run the entire pipeline to get the same results every time.




Article information

Author: Terence Hammes MD

Last Updated: 03/02/2023
