Sidekick architecture cuts LLM costs by using small models for simple tasks

Original: We've found this sort of "sidekick" architecture to be very effective at cutting LLM spend because it allows you to do context control and not spend expensive tokens on simple tasks.
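
One way to read the quote, sketched in Python: a cheap "sidekick" model handles the simple step of condensing raw material, and the expensive model only ever sees the condensed notes. Everything here is an assumption made for illustration and not taken from the source: the `Model` callable interface, the function names, the prompt wording, and the 4-characters-per-token estimate.

```python
from typing import Callable, List

# Hypothetical interface: a model is any callable taking a prompt and returning text.
Model = Callable[[str], str]


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token), for illustration only."""
    return max(1, len(text) // 4)


def answer_with_sidekick(question: str, documents: List[str],
                         small: Model, large: Model) -> str:
    # Simple task: the cheap model condenses each document on its own.
    notes = [
        small(f"Extract what is relevant to: {question}\n\n{doc}")
        for doc in documents
    ]
    # Context control: the expensive model sees only the short notes,
    # never the raw documents.
    prompt = question + "\n\n" + "\n".join(notes)
    return large(prompt)
```

In this sketch the saving comes from the size of the prompt sent to the large model: it grows with the length of the notes rather than with the length of the documents. Whether that is a net win depends on the price gap between the two models and on how well the small model condenses.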

Source: gist.github.com
