guides
Making LLMs Fast and Small: A Guide to Inference Optimization Research in 2026
Five approaches to making LLMs faster and cheaper — compression, diffusion decoding, architecture, KV cache, and sparse attention — explained with real numbers.