1. 01 Latest 14 min read

    The ML and Infrastructure Architecture Behind striff.io

    A walkthrough of the async Kafka-staged pipeline, Triton-based inference serving, and degradation hierarchy that powers striff.io’s architectural review system. Covers why the pipeline moved from synchronous to event-driven, how three independent Kafka worker tiers decouple graph construction, GNN scoring, and LLM annotation, the distributed systems problems that Triton separation introduces, and the three-tier degradation strategy…

    striff-gnn striff-lib clarpse mlops-blueprint
    Read article ↗
  2. 02 16 min read

    Detecting Architectural Anomalies in Code with Graph Neural Networks

    How striff.io uses a neurosymbolic pipeline with typed dependency graphs, Chidamber-Kemerer features, and a distilled GCN to flag the parts of a pull request that actually carry architectural risk. The post covers the graph construction pipeline, the 404-dimensional feature vector design, why spectral GCN was chosen over attention-based architectures, and how symbolic facts are fused…

    striff-gnn striff-lib clarpse mlops-blueprint
    Read article ↗
  3. 03 8 min read

    Building a Production Neurosymbolic Pipeline for Scientific Discourse Graphs

    Lessons from building Fylo’s ingestion pipeline that turns scientific papers into typed discourse graphs. Covers ShEx schema as executable contract, a phased LLM extraction loop that closes the validator-LLM feedback gap, and a three-tier cross-document merge policy that keeps the graph converging as more papers are ingested.

    Read article ↗
  4. 06 10 min read

    Designing a Production MLOps Pipeline: The Decisions That Actually Determine Reliability

    Teams that invest heavily in model development often ship the surrounding infrastructure as an afterthought — and pay for it later in operational failures that are difficult to diagnose and expensive to fix. This post distills a reference MLOps architecture built from repeated production engagements, using a GitHub pull request categorization pipeline with a PyTorch…

    mlops-blueprint
    Read article ↗
  5. 07 1 min read

    Deploy A Production-Ready E-Commerce Solution on AWS with CloudFormation

    Shopify works until it doesn’t — vendor lock-in and transaction fees compound at scale, and the platform gives you limited control over infrastructure when it matters. This post walks through a CloudFormation-based reference architecture for running PrestaShop on AWS with the operational properties of a production system: auto-scaling application tier, managed RDS database, ElastiCache for…

    Read article ↗
  6. 08 11 min read

    3 Common Misunderstandings of Inter-Service Communication in Microservices

    REST and message queues are the two dominant approaches to inter-service communication in distributed systems, and teams frequently choose between them based on assumptions that do not hold under scrutiny. Synchronous HTTP calls are not always simpler or more reliable than async messaging, and message queues are not always the right choice when decoupling is…

    Read article ↗
  7. 09 8 min read

    Why Object Oriented Code Accelerates Microservices Adoption

    Migrating a monolith to microservices is a difficult undertaking regardless of technical quality, but the difficulty scales dramatically with how coupled and procedural the source code is. When a codebase lacks clear object boundaries, the decomposition process becomes a guessing game about which pieces can be extracted without breaking everything else. This post demonstrates how…

    Read article ↗
  8. 10 10 min read

    4 Elements of A Great Serverless Application Deployment Strategy

    Serverless apps depend on many managed services — storage, caches, load balancers, execution environments — which makes deployment automation non-trivial compared to a single application binary. Without structure, provisioning infrastructure and deploying code across dev, staging, and production environments becomes a manual, error-prone process. This post covers four practices for keeping that process automated and…

    Read article ↗
  9. 11 16 min read

    Dissecting GitHub Code Reviews: A Text Classification Experiment

    Code review comments on GitHub contain a wealth of signal about what engineers care about — naming, logic, performance, test coverage — but that signal is buried in unstructured free text. This post builds an SVM classifier to categorize over 30,000 GitHub pull request review comments by the main technical topic each addresses. The dataset,…

    As part of the code review process on GitHub, developers can leave comments on portions of the unified diff of a GitHub pull request. These comments are extremely valuable in facilitating technical discussion amongst developers, and in allowing developers to get feedback on their code submissions.

    But what do code reviewers usually discuss in these comments?

    In an effort to better understand code review discussions, we’re going to create an SVM classifier to classify over 30,000 GitHub review comments based on the main code-related topic addressed by each comment (e.g. naming, readability, etc.).

    Grab the Jupyter Notebook for this experiment on GitHub.

    [Screenshot: a sample review comment left on a pull request diff]

    Review Comment Classifications

    The list of classifications we’re going to incorporate into our classifier is summarized in the table below. It was developed from a manual survey I performed of approximately 2,000 review comments drawn from randomly selected, highly forked Java repositories on GitHub.

    The selected categories reflect the most frequently occurring topics encountered in the surveyed review comments. The majority of the categories relate to code-level concepts (e.g. variable naming, exception handling); review comments that did not fall naturally into any existing category, or that were unrelated to the overall goal of code reviewing, were placed in the “Other” category.

    In situations where a review comment discussed more than one subject, I gave it a classification according to the topic it spent the most words discussing.

    Category | Label | Further Explanation | Sample Comment
    Readability | 1 | Comments related to readability, style, and general project conventions. | “This code looks very convoluted to me”
    Naming | 2 | — | “I think foo would be a more appropriate name”
    Documentation | 3 | Comments related to licenses, package info, module documentation, commenting. | “Please add a comment here explaining this logic”
    Error/Resource Handling | 4 | Comments related to exception/resource handling, program failure, and termination analysis. | “Forgot to catch a possible exception here”
    Control Structures/Program Flow | 5 | Comments related to usage of loops, if-statements, placement of individual lines of code. | “This if-statement should be moved after the while loop”
    Visibility/Access | 6 | Comments related to access levels for classes, fields, methods, and local variables. | “Make this final”
    Efficiency/Optimization | 7 | — | “Many unnecessary calls to foo() here”
    Code Organization/Refactoring | 8 | Comments related to extracting code from methods and classes, moving large chunks of code around. | “Please extract this logic into a separate method”
    Concurrency | 9 | Comments related to threads, synchronization, parallelism. | “This class does not look thread safe”
    High Level Method Semantics & Design | 10 | Comments relating to method design and semantics. | “This method should return a String”
    High Level Class Semantics & Design | 11 | Comments relating to class design and semantics. | “This should extend Foo”
    Testing | 12 | — | “is there a test for this?”
    Other | 13 | Comments not relating to categories 1–12. | “Looks good”, “done”, “thanks”
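
    For reference when interpreting the classifier’s numeric output later, the taxonomy above can be captured as a simple mapping (this dictionary is my own convenience, not part of the notebook):

```python
# Numeric label -> category name, per the classification table above.
CATEGORY_NAMES = {
    1: "Readability", 2: "Naming", 3: "Documentation",
    4: "Error/Resource Handling", 5: "Control Structures/Program Flow",
    6: "Visibility/Access", 7: "Efficiency/Optimization",
    8: "Code Organization/Refactoring", 9: "Concurrency",
    10: "High Level Method Semantics & Design",
    11: "High Level Class Semantics & Design",
    12: "Testing", 13: "Other",
}

print(CATEGORY_NAMES[12])
```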

    Loading The Data Set

    Now we’ll discuss our SVM text classifier implementation. This experiment represents a typical supervised learning classification exercise.

    We’ll start by loading our training data, which consists of two files representing 2,000 manually labeled comment–classification pairs: the first file contains one review comment per line, and the second contains the corresponding manually determined classification on each line.

    with open('review_comments.txt') as f:
        # strip the trailing newlines that readlines() would otherwise keep
        review_comments = [line.strip() for line in f]

    with open('review_comments_labels.txt') as g:
        classifications = [line.strip() for line in g]
        
    
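    The excerpt stops at data loading, but the classification step itself can be sketched with a TF-IDF vectorizer feeding a linear SVM, the standard baseline for this kind of supervised text classification. This is a minimal, hypothetical sketch rather than the notebook’s exact code; the toy comments and labels below are stand-ins for the 2,000 loaded training pairs:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the loaded training data (the real data has 2,000 pairs).
review_comments = [
    "I think foo would be a more appropriate name",
    "is there a test for this?",
    "Forgot to catch a possible exception here",
    "Please add a comment here explaining this logic",
]
classifications = ["2", "12", "4", "3"]  # numeric labels from the category table

# TF-IDF features over unigrams and bigrams, classified by a linear-kernel SVM.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
clf.fit(review_comments, classifications)

print(clf.predict(["is there a test for this?"])[0])
```

    With real data, the usual next steps would be a train/test split and a per-category look at precision and recall, since the “Other” class tends to dominate.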
    Read article ↗
  10. 12 4 min read

    API Documentation Using Tables

    Swagger UI communicates endpoint details well but fails at conveying the shape of a complex API at a glance. When an API exposes dozens of resources and hundreds of operations, developers need a high-level map before they can navigate the detail. This post proposes a table-based documentation format that presents resources, operations, and their relationships…

    Read article ↗
  11. 13 7 min read

    The Roots of Object Oriented Programming

    Most OOP languages claim the label but miss what Alan Kay actually meant when he coined the term. Polymorphism, encapsulation, and inheritance are frequently cited as its pillars — but these exist in functional languages too. Kay’s actual vision was rooted in biology: autonomous objects communicating exclusively through message passing, with no direct access to…

    Read article ↗
  12. 14 3 min read

    Monitor Disk Usage Levels on Slack

    Disk space exhaustion is a quiet failure mode — systems degrade gradually until they stop working entirely, often at the worst possible moment. This post shares a bash script that monitors local disk storage levels and reports them to a Slack channel at a configurable interval, color-coded by usage severity. The script uses standard Unix…

    Integrations take Slack from a normal online instant messaging and collaboration system to a solution that centralizes all your notifications, from sales to tech support and social media, into one searchable place where your team can discuss them and take action. In this article, I’ll share a simple bash script that reports local disk storage levels to Slack at a regular time interval. It is easily deployable to multiple instances, highly configurable, and can help teams take proactive measures in maintaining the operational well-being of their systems.

    Download the App on GitHub

    [Screenshot: color-coded disk usage report posted to a Slack channel]

    The script is available on GitHub and can be dropped anywhere on the instance you want to monitor. At a specified interval, it posts disk storage information to Slack as illustrated above. The drive information is retrieved using the df -h command on Unix systems, and the listed drives are color coded based on how much storage capacity they have left. Two quick steps are required to get the integration set up and running.
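
    Before the setup steps, here is what the reporting logic boils down to. This is a simplified sketch, not the actual script from GitHub; the webhook URL is a placeholder and the severity thresholds are illustrative:

```shell
#!/usr/bin/env bash
# Simplified sketch of the reporting logic; the real script on GitHub differs.
# WEBHOOK_URL is a placeholder -- substitute your own Incoming WebHook URL.
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"

# Map a usage percentage to a Slack attachment colour (thresholds illustrative).
severity() {
    if   [ "$1" -ge 90 ] 2>/dev/null; then echo "danger"
    elif [ "$1" -ge 70 ] 2>/dev/null; then echo "warning"
    else                                   echo "good"
    fi
}

# Build one report line per mounted filesystem from `df -h`.
report=""
while read -r _fs _size _used avail pcent mount; do
    report+="${mount}: ${pcent} used (${avail} free) [$(severity "${pcent%\%}")]\n"
done < <(df -h | tail -n +2)

# Post the report to the Slack webhook (uncomment to actually send):
# curl -s -X POST -H 'Content-type: application/json' \
#      --data "{\"text\": \"${report}\"}" "$WEBHOOK_URL"
printf '%b' "$report"
```

    The real script adds configuration for the reporting interval and richer Slack attachment formatting, but the df parsing and webhook POST are the core of it.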

    1 - Create a Slack Webhook Notification:

    This will allow the script to post as a bot/integration instead of as yourself (which would require your personal credentials). First, ensure the Incoming WebHooks app is installed in your Slack workspace. Next, click Add Configuration and follow the instructions to configure the integration settings as desired. Copy the value of the Webhook URL field, which will be required in the next step.

    [Screenshot: Incoming WebHooks configuration page in Slack]

    2 - Use a time-based job scheduler to run the script:

    The job scheduler will execute the script at a regular interval based on how often we want to view the reports. On a Linux environment, the crontab command, which schedules commands to be executed periodically, is the perfect tool for the job. To create a new cron job, simply type crontab -e in a terminal. New jobs are installed by adding an entry to the file with the following syntax, where the five leading fields specify the minute, hour, day of month, month, and day of week on which the command runs:

    1 2 3 4 5 /path/to/command arg1 arg2
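
    For example, an entry that runs the monitoring script every 15 minutes (the script path here is a placeholder) would look like:

    */15 * * * * /path/to/disk_monitor.sh >> /var/log/disk_monitor.log 2>&1

    Redirecting output to a log file as shown is optional but makes it easier to debug failed runs.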
    
    Read article ↗
  13. 15 5 min read

    Easily Migrate Postgres/MySQL Records to InfluxDB

    Relational databases were not designed for time series data — as write volumes grow, table cardinality climbs and query performance degrades in ways that are hard to tune around. Purpose-built time series databases like InfluxDB handle this workload efficiently by design, with compression, downsampling, and retention policies built in from the start. This post explains…

    Read article ↗

© 2026 Muntazir Fadhel. All rights reserved.