1. Latest · 19 min read

    What a Decade of ML Infrastructure Taught Me About LLMs

    After close to a decade working on ML infrastructure, including GPU clusters, autoscaling pipelines, and model serving systems, the transition into LLM-based production systems turned out to be less of a clean break than the hype suggests. The problems do not change so much as evolve, and they get harder in specific ways. This post works through the areas where classical ML intuitions transfer directly into LLM operations, where they break down and need updating, and where the failure surfaces are genuinely new. It covers latency, reproducibility, data lineage, cost modeling, observability, and the unique challenges of agent systems, and is written for engineers who have operated traditional ML infrastructure and want an honest map of what carries over.

    Read article ↗
  2. 9 min read

    Self-Hosting LLMs in Production: The vLLM + KubeAI Stack

    Deploying a large language model is not the hard part — deploying one that is safe to operate, cost-effective to scale, and straightforward to reason about under load is where most teams run into trouble. This post walks through an architecture developed at HADI Technology for running self-hosted LLM inference in production, using vLLM as the inference engine and KubeAI for model lifecycle management. Rather than a step-by-step tutorial, it explains the tradeoffs that led to this architecture and where it fits compared to alternatives like managed API endpoints or simpler single-instance deployments. The reference implementation is open-source and available on GitHub.
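
    One detail worth knowing up front: vLLM serves an OpenAI-compatible HTTP API, so application code talks to a self-hosted deployment the same way it would talk to a managed endpoint. A minimal sketch, assuming a hypothetical in-cluster service address and model name (substitute whatever your KubeAI deployment exposes):

    ```python
    # Minimal sketch: querying a self-hosted vLLM server through its
    # OpenAI-compatible chat completions API. The base URL and model
    # name below are placeholder assumptions, not values from the post.
    import json
    import urllib.request

    VLLM_BASE_URL = "http://vllm.internal:8000/v1"  # hypothetical address

    def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
        """Build a chat-completion payload in the OpenAI wire format vLLM accepts."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "temperature": 0.0,  # deterministic output simplifies load testing
        }

    def chat(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
        """Send one prompt to the vLLM server and return the completion text."""
        payload = json.dumps(build_chat_request(model, prompt)).encode()
        req = urllib.request.Request(
            f"{VLLM_BASE_URL}/chat/completions",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]
    ```

    Because the wire format matches managed APIs, switching between a hosted endpoint and this stack is mostly a base-URL change, which keeps the comparison with alternatives honest.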

    Read article ↗
  3. 9 min read

    Designing a Production MLOps Pipeline: The Decisions That Actually Determine Reliability

    Teams that invest heavily in model development often ship the surrounding infrastructure as an afterthought — and pay for it later in operational failures that are difficult to diagnose and expensive to fix. This post distills a reference MLOps architecture built from repeated production engagements, using a GitHub pull request categorization pipeline with a PyTorch autoencoder as the concrete example. It covers the decisions that determine whether a production ML system holds up over time: anomaly detection before classification, experiment tracking, model registry integration, containerized training, and CI/CD for model deployment. The full implementation is available on GitHub as a reusable blueprint.
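
    The "anomaly detection before classification" decision can be sketched in a few lines. This is an illustrative reduction, not the post's actual code: it assumes the common convention of flagging inputs whose autoencoder reconstruction error exceeds a high percentile of the errors seen on training data.

    ```python
    # Illustrative sketch: gate the classifier behind an anomaly check.
    # The percentile-threshold rule is an assumed convention, not taken
    # from the article's implementation.

    def error_threshold(train_errors: list[float], percentile: float = 0.99) -> float:
        """Pick the reconstruction-error cutoff from training-set errors."""
        ranked = sorted(train_errors)
        idx = min(int(percentile * len(ranked)), len(ranked) - 1)
        return ranked[idx]

    def route(sample_error: float, threshold: float) -> str:
        """Send anomalous inputs to human review instead of the classifier."""
        return "needs_review" if sample_error > threshold else "classify"
    ```

    The payoff is operational: out-of-distribution pull requests fail loudly into a review queue instead of silently receiving a low-confidence label.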

    Read article ↗
  4. 1 min read

    Deploy A Production-Ready E-Commerce Solution on AWS with CloudFormation

    Shopify works until it doesn’t — vendor lock-in and transaction fees compound at scale, and the platform gives you limited control over infrastructure when it matters. This post walks through a CloudFormation-based reference architecture for running PrestaShop on AWS with the operational properties of a production system: auto-scaling application tier, managed RDS database, ElastiCache for session and object caching, and a CDN layer for static assets. Every resource is defined as code, so the entire stack can be reproduced across environments with a single command. The architecture was developed following a real client engagement where the cost and control tradeoffs of hosted e-commerce platforms became untenable at scale.

    Read article ↗
  5. 11 min read

    3 Common Misunderstandings of Inter-Service Communication in Microservices

    REST and message queues are the two dominant approaches to inter-service communication in distributed systems, and teams frequently choose between them based on assumptions that do not hold under scrutiny. Synchronous HTTP calls are not always simpler or more reliable than async messaging, and message queues are not always the right choice when decoupling is the goal. This post examines three specific misconceptions that lead teams to make this decision poorly — around coupling, reliability, and operational complexity — and replaces them with a more grounded analysis of what each approach actually trades off. The goal is not to recommend one over the other but to help teams make the call with clear eyes.

    Read article ↗
  6. 8 min read

    Why Object Oriented Code Accelerates Microservices Adoption

    Migrating a monolith to microservices is a difficult undertaking regardless of technical quality, but the difficulty scales dramatically with how coupled and procedural the source code is. When a codebase lacks clear object boundaries, the decomposition process becomes a guessing game about which pieces can be extracted without breaking everything else. This post demonstrates how four core OOP principles — single responsibility, encapsulation, dependency inversion, and composition — directly reduce the mechanical effort of splitting a legacy system into services. The argument is not that OOP is required for microservices, but that investing in it before a migration begins pays back measurably during the decomposition.

    Read article ↗
  7. 10 min read

    4 Elements of A Great Serverless Application Deployment Strategy

    Serverless apps depend on many managed services — storage, caches, load balancers, execution environments — which makes deployment automation non-trivial compared to a single application binary. Without structure, provisioning infrastructure and deploying code across dev, staging, and production environments becomes a manual, error-prone process. This post covers four practices for keeping that process automated and low-risk: separating environments properly, using infrastructure as code to provision resources, packaging application code independently of infrastructure, and automating deployment pipelines end to end. If any part of your release still requires you to look at the cloud console, this post is for you.

    Read article ↗
  8. 16 min read

    Dissecting GitHub Code Reviews: A Text Classification Experiment

    Code review comments on GitHub contain a wealth of signal about what engineers care about — naming, logic, performance, test coverage — but that signal is buried in unstructured free text. This post builds an SVM classifier to categorize over 30,000 GitHub pull request review comments by the main technical topic each addresses. The dataset, feature engineering approach, and model evaluation are walked through in a Jupyter notebook available on GitHub. The results reveal which topics dominate code review discussions and how that distribution shifts across different types of repositories.
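
    At toy scale, the shape of the approach looks like this. A sketch assuming scikit-learn, with made-up example comments and labels; the article's real dataset has 30,000+ comments and a richer feature-engineering step.

    ```python
    # Toy-scale sketch of the described approach: tf-idf features feeding
    # a linear SVM to label review comments by topic. The comments and
    # labels below are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    comments = [
        "please rename this variable to something descriptive",
        "this method name is misleading",
        "add a unit test covering the failure case",
        "tests are missing for the new branch",
        "this loop is quadratic, consider a hash map",
        "allocation inside the hot path will hurt performance",
    ]
    labels = ["naming", "naming", "testing", "testing",
              "performance", "performance"]

    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(comments, labels)

    print(model.predict(["could you add a test for this?"])[0])
    ```

    The notebook linked from the article covers what this sketch omits: cleaning the raw comment text, choosing the label taxonomy, and evaluating the classifier honestly on held-out data.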

    Read article ↗
  9. 4 min read

    API Documentation Using Tables

    Swagger UI communicates endpoint details well but fails at conveying the shape of a complex API at a glance. When an API exposes dozens of resources and hundreds of operations, developers need a high-level map before they can navigate the detail. This post proposes a table-based documentation format that presents resources, operations, and their relationships in a compact, scannable structure. The approach is complementary to existing spec-driven tooling and can be generated directly from an OpenAPI definition — a live demo built against the GitHub API is included.
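
    The core transform is small. A sketch over a minimal, hypothetical OpenAPI fragment, showing only the shape of the flattening step (real specs carry far more detail per operation):

    ```python
    # Sketch: flatten an OpenAPI "paths" object into scannable table rows.
    # The demo spec below is a made-up fragment loosely modeled on the
    # GitHub API, not the article's actual demo.

    def spec_to_rows(spec: dict) -> list[tuple[str, str, str]]:
        """Yield (method, path, summary) rows from an OpenAPI definition."""
        rows = []
        for path, operations in sorted(spec.get("paths", {}).items()):
            for method, op in sorted(operations.items()):
                rows.append((method.upper(), path, op.get("summary", "")))
        return rows

    def render_table(rows: list[tuple[str, str, str]]) -> str:
        """Render rows as a compact markdown table."""
        lines = ["| Method | Path | Summary |", "| --- | --- | --- |"]
        lines += [f"| {m} | {p} | {s} |" for m, p, s in rows]
        return "\n".join(lines)

    demo = {
        "paths": {
            "/repos/{owner}/{repo}": {"get": {"summary": "Get a repository"}},
            "/repos/{owner}/{repo}/issues": {
                "get": {"summary": "List issues"},
                "post": {"summary": "Create an issue"},
            },
        }
    }
    print(render_table(spec_to_rows(demo)))
    ```

    Because the rows come straight from the spec, the table never drifts out of sync with the detailed Swagger UI view it complements.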

    Read article ↗
  10. 7 min read

    The Roots of Object Oriented Programming

    Most OOP languages claim the label but miss what Alan Kay actually meant when he coined the term. Polymorphism, encapsulation, and inheritance are frequently cited as its pillars — but these exist in functional languages too. Kay’s actual vision was rooted in biology: autonomous objects communicating exclusively through message passing, with no direct access to each other’s internal state. This post traces that original conception through Kay’s early work and asks why almost no modern software actually practices it, and what we lose as a result.

    Read article ↗
  11. 3 min read

    Monitor Disk Usage Levels on Slack

    Disk space exhaustion is a quiet failure mode — systems degrade gradually until they stop working entirely, often at the worst possible moment. This post shares a bash script that monitors local disk storage levels and reports them to a Slack channel at a configurable interval, color-coded by usage severity. The script uses standard Unix tooling with no additional dependencies and can be dropped onto any instance in minutes. It is designed to be deployed across multiple machines simultaneously and supports configurable alert thresholds so teams can act before things become critical.
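
    The post's script is plain bash; the same idea sketched in Python, for readers who prefer it. The webhook URL is a placeholder, and the warning/critical thresholds are example values:

    ```python
    # Sketch of the monitor's logic: measure disk usage, map it to a
    # severity color, post to a Slack incoming webhook. The webhook URL
    # below is a placeholder; thresholds are configurable examples.
    import json
    import shutil
    import urllib.request

    SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

    def usage_percent(path: str = "/") -> int:
        """Percentage of the filesystem at `path` that is in use."""
        total, used, _free = shutil.disk_usage(path)
        return round(100 * used / total)

    def severity(pct: int, warn: int = 70, crit: int = 90) -> str:
        """Map usage to a Slack attachment color by configurable thresholds."""
        if pct >= crit:
            return "danger"
        if pct >= warn:
            return "warning"
        return "good"

    def notify(path: str = "/") -> None:
        """Post a color-coded usage report to the Slack channel."""
        pct = usage_percent(path)
        payload = {"attachments": [{"color": severity(pct),
                                    "text": f"Disk usage on {path}: {pct}%"}]}
        req = urllib.request.Request(SLACK_WEBHOOK,
                                     data=json.dumps(payload).encode(),
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
    ```

    Run `notify()` from cron at whatever interval suits the machine; the color coding makes the Slack channel scannable at a glance.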

    Read article ↗
  12. 5 min read

    Easily Migrate Postgres/MySQL Records to InfluxDB

    Relational databases were not designed for time series data — as write volumes grow, table cardinality climbs and query performance degrades in ways that are hard to tune around. Purpose-built time series databases like InfluxDB handle this workload efficiently by design, with compression, downsampling, and retention policies built in from the start. This post explains when that tradeoff is worth making and walks through the practical steps of migrating existing Postgres or MySQL records into InfluxDB using Python. It covers schema mapping, batching strategies, and the key differences in querying that will affect any application sitting on top of the new store.
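
    The heart of the migration is the row-mapping step: serializing each relational row into InfluxDB line protocol before writing it in batches (a few thousand lines per HTTP write keeps things efficient). A sketch with hypothetical measurement, tag, and field names; map them to your own schema:

    ```python
    # Sketch of the mapping step: one relational row becomes one
    # InfluxDB line-protocol entry. Names below are hypothetical.

    def row_to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
        """Serialize one row as an InfluxDB line-protocol entry."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        field_part = ",".join(
            f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
            for k, v in sorted(fields.items())
        )
        return f"{measurement},{tag_part} {field_part} {ts_ns}"

    line = row_to_line(
        "cpu_metrics",
        tags={"host": "web-1", "region": "eu"},
        fields={"usage": 0.57},
        ts_ns=1700000000000000000,
    )
    print(line)
    # cpu_metrics,host=web-1,region=eu usage=0.57 1700000000000000000
    ```

    The tags-versus-fields split matters: tags are indexed and cheap to filter on, fields are not, so choose which relational columns become which before migrating rather than after.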

    Read article ↗
  13. 3 min read

    Clarpse - The Way Source Code Was Meant To Be Analyzed

    Clarpse is a multi-language source code analysis tool designed for extracting deep structural relationships between entities in a codebase — classes, methods, fields, imports, and the connections between them. It exposes these relationships through a clean, language-agnostic API that decouples downstream tooling from any particular compiler or parser. Features like jump-to-definition, find-usages, type inference, and documentation generation can be built on top of Clarpse without re-implementing the underlying language analysis for each supported language. The library currently supports Java and Go, with a design that makes adding additional language backends straightforward.

    Read article ↗
  14. 6 min read

    Developing a Search Engine using Elastic Search

    Building a search engine that works well across Arabic and English content is harder than it looks — tokenization, stemming, and relevance ranking all behave differently across language families. This post documents the architecture of a multi-language search system built with Elasticsearch, including the index configuration, language-specific analyzers, and query strategies that made retrieval accurate across both languages. It covers the practical edge cases that appear when users mix languages within a single query and how those were handled at the application layer. The lessons apply to any multilingual content platform that needs more than naive keyword matching.
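
    One common way to set this up, sketched here as an assumption rather than the article's exact configuration, is a multi-field mapping that analyzes the same text with Elasticsearch's built-in "english" and "arabic" analyzers, so a query can target whichever variant fits. Index and field names are hypothetical:

    ```python
    # Sketch: index the same text under both language analyzers, then
    # query both analyzed variants at once. Names are hypothetical;
    # "english" and "arabic" are built-in Elasticsearch analyzers.

    mapping = {
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    "fields": {
                        "en": {"type": "text", "analyzer": "english"},
                        "ar": {"type": "text", "analyzer": "arabic"},
                    },
                }
            }
        }
    }

    # multi_match lets relevance come from whichever analyzer handled
    # the query text better, which helps with mixed-language queries.
    query = {"query": {"multi_match": {"query": "search terms here",
                                       "fields": ["body.en", "body.ar"]}}}
    ```

    Indexing the text twice costs storage, but it sidesteps per-document language detection at index time, which is exactly where mixed-language content tends to break naive setups.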

    Read article ↗
  15. 3 min read

    Analyzing JavaScript Programmatically In Java Using The Google-Closure Compiler

    The Google Closure Compiler is designed to minify and optimize JavaScript, but its internals include a full AST parser that can be repurposed for programmatic code analysis. This post demonstrates how to use the Closure Compiler as a JavaScript parsing backend from Java, giving you access to a structured representation of any JavaScript source file without writing your own parser. The approach enables use cases like dependency analysis, code quality checks, and automated refactoring tooling. It is particularly useful when you need to reason about JavaScript code programmatically within a JVM-based toolchain.

    Read article ↗

© 2026 Muntazir Fadhel. All rights reserved.