How I Stopped Worrying and Embraced Docker Microservices

July 15, 2021

Hello, world!

If you are like us here at Prove, then you’re really passionate about programming, programming languages, and their runtimes. You will argue at length about how Erlang has the best distributed systems model (2M TCP connections in one box), how Haskell has the best type system, and how our entire ML backend should be written in Lua (Torch). If you are like me and you start a company with other people, you will argue for hours, and nobody’s feelings are gonna be left intact.

That was the first problem we had in the design phase of our Machine Learning backend. The second problem will become obvious when you get a short introduction to what we do at Prove:

We data-mine a lot of sensors on your phone, do some signal processing and encryption on the phone, then opportunistically send the data from everybody’s phone into our Deep-Learning backend, where the rest of the processing and actual authentication take place.

This way, the processing load is shared between the mobile device and our Deep Learning backend. Multiple GPU machines power our Deep Learning, running our proprietary Machine Learning algorithms across all users’ data.

These are expensive machines, and we’re a startup with finite money, so here’s the second problem: scalability. We don’t want these machines sitting idle when no jobs are scheduled, and we also don’t want them struggling when a traffic spike hits. This is a classic auto-scaling problem.

This post describes how we killed two birds with one stone:

1. Many programming runtimes for DL

2. Many machines

The stone is the sweeping force of Docker microservices! This has been the next big thing in distributed systems for a while, Twitter and Netflix use it heavily, and this talk is a great place to start. Since we verify against a lot of factors, like facial recognition, gait analysis, and keystroke analysis, it made sense to make them modular. We packaged each one in its own container, wrote a small HTTP server that exposes a common REST API, and done!

This API is useful because every machine learning algorithm has pretty much the same interface: training inputs, normal inputs, and outputs. It’s so useful that we decided to open-source our microservice wrapper for Torch v7/Lua and for Python. Hopefully, more people can use it, and we can all start forking and pushing entire machine learning services on Docker Hub.
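
To make the "same interface" idea concrete, here is a minimal sketch of what such a wrapper could look like in Python, using only the standard library. It is not our open-sourced wrapper: the `/train` and `/predict` paths, the `inputs` JSON field, the toy `MeanThresholdModel`, and port 8080 are all assumptions made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class MeanThresholdModel:
    """Toy stand-in for a real ML model (hypothetical, illustration only)."""

    def __init__(self):
        self.threshold = 0.0

    def train(self, inputs):
        # "Training" here just stores the mean of the training inputs.
        self.threshold = sum(inputs) / len(inputs)
        return {"threshold": self.threshold}

    def predict(self, inputs):
        # Output 1 for inputs above the learned threshold, else 0.
        return {"outputs": [int(x > self.threshold) for x in inputs]}


def dispatch(model, path, payload):
    """Map a request path and decoded JSON payload to a (status, body) pair."""
    routes = {"/train": model.train, "/predict": model.predict}
    handler = routes.get(path)
    if handler is None:
        return 404, {"error": "unknown path"}
    inputs = payload.get("inputs") if isinstance(payload, dict) else None
    if not isinstance(inputs, list) or not inputs:
        return 400, {"error": "'inputs' must be a non-empty list"}
    return 200, handler(inputs)


def make_handler(model):
    """Build a request handler class bound to one model instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length) or b"{}")
            except json.JSONDecodeError:
                status, body = 400, {"error": "invalid JSON"}
            else:
                status, body = dispatch(model, self.path, payload)
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(model, port=8080):
    """Run the wrapper until interrupted; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), make_handler(model)).serve_forever()
```

Calling `serve(MeanThresholdModel())` would start the service inside a container. Swapping in a real model only requires an object with `train` and `predict` methods, which is the whole point of keeping the interface this small.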

But wait, there’s more! Now that we have containerized our ML code, the scalability problem has moved from a development problem to an infrastructure problem. To scale each microservice according to its GPU and network usage, we rely on Amazon ECS. We looked into Kubernetes as a way to load-balance containers; however, its support for NVIDIA GPU-based load-balancing is not there yet (there’s an MR, and some people claim they made it work). Mesos was the other alternative, with NVIDIA support, but we just didn’t like all the Java.
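
As a rough illustration of scaling on GPU and network usage, the sketch below applies a generic proportional rule: grow or shrink the fleet so that the busier of the two resources moves toward a target utilization. This is not how ECS is implemented or configured; the function names, the 60% target, and the instance limits are assumptions chosen for the example.

```python
def desired_instances(current, utilization_pct, target_pct=60, min_n=1, max_n=10):
    """Proportional rule: size the fleet so utilization approaches the target.

    All percentages are integers (0-100) so the ceiling division is exact.
    """
    if current < 1 or target_pct < 1:
        raise ValueError("current and target_pct must be positive")
    # Integer ceiling of current * utilization / target.
    raw = -(-current * utilization_pct // target_pct)
    # Clamp to the allowed fleet size.
    return max(min_n, min(max_n, raw))


def fleet_size(current, gpu_pct, net_pct, **limits):
    """Scale on whichever resource (GPU or network) is more constrained."""
    return desired_instances(current, max(gpu_pct, net_pct), **limits)
```

With four instances running at 30% GPU and 90% network utilization, `fleet_size(4, gpu_pct=30, net_pct=90)` asks for six instances, because the network is the bottleneck. When traffic drops to 10% on both, the same rule shrinks the fleet back to the minimum.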

In the end, this is what our ML infrastructure looks like:

Figure: Top-down approach to scalable ML microservices

The EB trapezoids in the diagram represent Amazon EB (Elastic Beanstalk), another Amazon service that can replicate machines (even GPU-heavy machines!) using custom-set rules. The inspiration for load-balancing our GPU cluster with ECS and EB came from this article from Amazon’s Personalization team.

For our database, we use a mix of Amazon S3 and a traditional PostgreSQL database that is linked to each container and used as its local cache. This way, sharing data becomes as easy as sharing S3 paths, while each container can modularly keep its own state in PostgreSQL.
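
One way to picture this split is a read-through cache keyed by the shared path: look in the container's local database first, and go to shared storage only on a miss. The sketch below is an illustration under stated assumptions, not our production code. It uses Python's built-in `sqlite3` as a stand-in for the local PostgreSQL database and an injected `fetch_remote` function as a stand-in for an S3 read; the class name, table layout, and example path are hypothetical.

```python
import sqlite3


class LocalCache:
    """Read-through cache keyed by shared-object path."""

    def __init__(self, fetch_remote):
        # fetch_remote(path) -> bytes; stands in for reading an object from S3.
        self.fetch_remote = fetch_remote
        # In-memory sqlite3 stands in for the container-local PostgreSQL database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE cache (path TEXT PRIMARY KEY, body BLOB)")

    def get(self, path):
        row = self.db.execute(
            "SELECT body FROM cache WHERE path = ?", (path,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no remote call needed
        body = self.fetch_remote(path)  # cache miss: fetch from shared storage
        self.db.execute("INSERT INTO cache (path, body) VALUES (?, ?)", (path, body))
        self.db.commit()
        return body
```

Two containers that receive the same path, say `s3://example-bucket/model.bin`, each fetch the object once and then serve it from their own local store, so the only thing they ever need to exchange is the path.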

So there you have it, both birds killed. Our ML people are happy since they can write in whatever runtime they want, as long as there is an HTTP server library for it. We don’t really worry about scalability, as all our services are small and nicely containerized. We’re ready to scale to as many as 100,000 users, and I doubt our microservices fleet would even flinch. We’ll be presenting our setup at the upcoming DockerCon 2017 (hopefully; we’re waiting for the CFP to open), and we’re looking to hire new ML and full-stack engineers. So come help us bring the vision of passwordless implicit authentication to everyone!
