What a Decade of ML Infrastructure Taught Me About LLMs
What carries over from classical ML infrastructure into LLM systems, where the operational problems get harder, and which new failure surfaces appear in production.
ML Systems Architect
Profile
With a decade of experience across enterprise-scale organizations and startups, I design production AI platforms spanning data pipelines, model training and evaluation, serving infrastructure, observability, and cloud architecture.
This post walks through an architecture developed at HADI Technology for clients including Joinable and others running self-hosted LLM inference in production.
This post distills a reference architecture we built to address that. The context is a pipeline for categorizing and analyzing GitHub pull requests using a PyTorch autoencoder, developed as a reusable blueprint based on what we have seen fail repeatedly in production ML systems.
Think back to the last time you worked on a distributed system. Did you consider using something other than RESTful HTTP calls for communication between its components?
Even with the best solution architects, developers, and financial resources available, an application’s microservices migration journey will be a nightmare if the code is not object-oriented to some degree.
Serverless is the new buzzword in town, selling developers the ability to focus on writing applications instead of managing servers. This is true for the most part, but serverless apps also have a certain property that can make their deployment and maintenance time-consuming. That is, they depend on...
In an effort to better understand code reviewing discussions, we’re going to create an SVM classifier to classify over 30 000 GitHub review comments based on the main topic addressed by each comment.
It’s well known that APIs need developer-friendly docs in order to gain widespread adoption. However, little, if any, improvement in REST API visualization methods has been made over the past few years. Nowadays, most API docs and developer portals adopt a Swagger UI type of format where all the...
The Object-Oriented Programming [OOP] paradigm is often associated with many great programming concepts, including polymorphism, encapsulation, and composition, to name a few. However, even with a little experience in a functional programming language like Haskell, you would quickly realize that these techniques are not exclusive to OOP...
Integrations are what take Slack from a normal online instant messaging and collaboration system to a solution that enables you to centralize all your notifications, from sales to tech support, social media and more, into one searchable place where your team can discuss and take action on each. In this...
Computers have been collecting and storing data in relational/schema systems for many years. However, digital storage growth outpaces that of computing processing power by leaps and bounds. Additionally, the amount of unstructured data that is collected greatly exceeds that of structured data, further limiting the utility of traditional database systems. For...
Clarpse is a multi-language source code analysis tool designed for extracting deep relationships between entities in a codebase through a clean API. Clarpse makes developer tools like code search and static analyzers better. It supports the development of features like jump to definition, find usages, type inference, and documentation generation....
Elasticsearch is an open-source, broadly-distributable, readily-scalable, enterprise-grade search engine. Accessible through an extensive and elaborate API, Elasticsearch can power extremely fast searches that support your data discovery applications. I recently worked on implementing a multi-language search engine using Elasticsearch and found that for certain use cases, rolling Elasticsearch...
The Google Closure Compiler is a tool for making JavaScript download and run faster. Instead of compiling from a source language to machine code, it compiles from JavaScript to better JavaScript. It parses your JavaScript, analyzes it, removes dead code and rewrites and minimizes what’s left. It also checks syntax,...