research
Efficient LLM Reasoning: 7 Papers That Cut Token Costs by Up to 84%
Seven papers fix LLM overthinking: Sketch-of-Thought cuts tokens 84%, shorter chains boost accuracy 34.5%, and budget-aware prompting halves costs.
Google Jules queues coding tasks, runs them in a cloud VM, and opens PRs while you sleep. Free tier gives 15 tasks/day. Here's what worked and what didn't.
Google's Antigravity CLI replaces Gemini CLI on June 18. Compared with Claude Code and Codex CLI on pricing, rate limits, benchmarks, and open-source status.
Build a full backend with REST API, auth, file storage, and AI endpoints by sending one JSON schema to MoonDB — zero deploy, zero config, ready in 5 minutes.
Five approaches to making LLMs faster and cheaper — compression, diffusion decoding, architecture, KV cache, and sparse attention — explained with real numbers.
I'm Maksim. By day I lead an engineering team at inDrive. After hours I ship side projects (PageBloom, NotesPilot, MyDevKit, startgaze) and write things up here when I learn something worth keeping.
The blog itself runs on an agentic publishing pipeline I built and rebuilt — a slow-moving experiment in how much of a writer's workflow can be automated without losing the voice. It writes, fact-checks, and refreshes; I edit, decide, and publish.