🦔 LLM+TAG

Tool-Augmented Generation

Client-Side Processing · Token Optimization · Local-First

📋 Overview

LLM+TAG (Tool-Augmented Generation) is a method for processing information specific to a user's prompt using local tools before the prompt is sent to a Large Language Model (LLM). This process is executed entirely on the client side, avoiding LLM-associated token usage for common, predictable requests.

The goal of TAG is to handle "low-hanging fruit" tasks by executing a tool and replacing the original user query with the tool's result.

The inspiration for TAG came from thinking about how text-adventure games in the 1980s could handle fuzzy commands. If a Commodore 64 could interpret "Stab the stupid dragon now", then surely a modern computer can map natural language sentences to local function calls.

💡 Why LLM+TAG?

Traditional LLM usage, including Retrieval-Augmented Generation (RAG) and Model Context Protocol (MCP) embedding, involves token costs for every step: input, processing, and output. While RAG retrieves specific content to augment the prompt, it can still incur high token usage if the augmented content is large or irrelevant. MCP, on the other hand, increases both IO and token usage as the LLM decides what information it needs and sends requests back to the client.

| Method | Description | Token Cost Impact |
|--------|-------------|-------------------|
| Standard LLM | Processes the prompt using its internal knowledge. | High (for all tasks). |
| LLM+RAG | Retrieves external content and augments the prompt. | Can be reduced, but large augmentation increases cost. |
| LLM+MCP | LLM requests specific information via formal tool calls. | Reduces irrelevant content, but defining functions and back-and-forth adds overhead. |
| LLM+TAG | Local system executes tools based on prompt sentences before sending to LLM. | Directly reduces token usage by preemptively answering predictable queries. |

Note: LLM+TAG is not a replacement for RAG or MCP. A complete, robust application would likely implement all three: LLM+TAG+RAG+MCP.

⚙️ How TAG Works

The TAG process is entirely local and does not engage the LLM:

  1. Split: The user's prompt is split into individual sentences.
  2. Match: Every sentence is compared to a configured list of utterances.
  3. Execute & Replace: If a sentence is a good match for a tool's utterance, the tool is executed and its result replaces that sentence; sentences with no match are left unchanged.
  4. Reassemble & Send: The processed sentences are joined back together to form a new, augmented prompt. This new prompt is then sent to the LLM (or to RAG/MCP processing). A minimal sketch of the whole pipeline follows this list.
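
The TAG description above doesn't prescribe an implementation, so here is a minimal Python sketch of the four steps. The tool objects, their `utterances` list, their `execute()` method, and the similarity threshold are all assumptions made for illustration, not part of a published TAG library:

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed cutoff for a "good match"; tune for your utterances

def split_sentences(prompt: str) -> list[str]:
    # Naive split on sentence-ending punctuation; a real client might use an NLP tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", prompt) if s.strip()]

def best_match(sentence: str, utterances: list[str]) -> float:
    # Fuzzy similarity between the sentence and the closest configured utterance.
    return max(SequenceMatcher(None, sentence.lower(), u.lower()).ratio() for u in utterances)

def tag_process(prompt: str, tools: list) -> tuple[str, bool]:
    """Split, match, execute & replace, reassemble. Returns (augmented_prompt, fully_replaced)."""
    out = []
    replaced = 0
    for sentence in split_sentences(prompt):
        handled = False
        for tool in tools:
            if best_match(sentence, tool.utterances) < SIMILARITY_THRESHOLD:
                continue
            try:
                out.append(tool.execute(sentence))   # replace the sentence with the tool's result
                replaced += 1
                handled = True
            except Exception:
                pass                                  # tool failed: fall back to the original sentence
            break                                     # stop at the first matching tool
        if not handled:
            out.append(sentence)                      # no match (or a failure): keep the sentence as-is
    return " ".join(out), bool(out) and replaced == len(out)
```

If `fully_replaced` comes back true, the augmented prompt can be returned directly to the user (see "Complete Replacement" below); otherwise it is sent on to the LLM, or into RAG/MCP processing first.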

Example
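
To make the flow concrete, here is a made-up exchange (the prompt and the tool output are illustrative only). A user prompt such as "What time is it in Tokyo? Also, summarize the meeting notes below." is split into two sentences. The first is a good match for the get_current_time utterances, so it is replaced with the tool's result; the second has no match and is left alone. The prompt that actually reaches the LLM then reads something like "The current time in Tokyo is 14:05. Also, summarize the meeting notes below."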

📝 Configuration

TAG relies on a configuration file to map user sentences to executable tools. The configuration can be implemented in many different ways; the example below uses the INI format with multiline values.

```ini
# Simple function call with an optional parameter.
[func:get_current_time]
# If any of these utterances is a good match, the get_current_time() function is called.
utterances: what time is it
    whats the current time
    what time is it in [CITY]
    time please

# An API call that gets back three key values, but only responds with one of them.
[api:stock_price_lookup]
url: https://stock.tracker.us/api/rest/lookup.php
key: avg_price, open_price, close_price
response: The average price for [SYMBOL] on [DATE] was {avg_price}.
utterances: get the stock price for [SYMBOL] on [DATE]
    what was the price of [SYMBOL] on [DATE]
    on [DATE] what was the price of [SYMBOL]

# An application execution that includes the entire application output in the response.
[app:datacenter_health.exe]
response: Current datacenter health is: {}.
utterances: hows the datacenter
    what is the datacenter health
    how is the datacenter doing

# An API call without a defined response, so it responds with the value from the key.
[api:hedgehog_facts]
url: http://example.com/hedgehogs/get_a_fact
key: fact
utterances: tell me about hedgehogs
    give me a hedgehog fact
```
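
Python's built-in configparser happens to read this layout directly, since indented continuation lines become part of a multiline value. A rough loader sketch follows; the filename and the dictionary shape are assumptions, not part of the TAG format:

```python
import configparser

config = configparser.ConfigParser()
config.read("tag_tools.ini")                 # assumed filename for the configuration above

tools = []
for section in config.sections():
    kind, _, name = section.partition(":")   # e.g. ("api", ":", "stock_price_lookup")
    tools.append({
        "kind": kind,                         # func, api, or app
        "name": name,
        "url": config.get(section, "url", fallback=None),
        "keys": [k.strip() for k in config.get(section, "key", fallback="").split(",") if k.strip()],
        "response": config.get(section, "response", fallback=None),
        # The multiline value arrives as one string; treat each line as a separate utterance.
        "utterances": [u.strip() for u in config.get(section, "utterances").splitlines() if u.strip()],
    })
```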

🔌 Tool Flexibility

The tool definition is flexible, allowing for various operations, including:

  * Local function calls (func:)
  * API calls (api:)
  * Local application execution (app:)

The configuration format lets you define all necessary parameters (such as URLs or API keys) and capture parameter values directly from the utterances (e.g., [CITY]). The response may also be formatted in the configuration to include the user's parameter values (e.g., [SYMBOL]) and the values returned by the tool (e.g., {avg_price}). To embed the tool's entire output in a formatted response sentence, use {}.
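
One way to implement the bracketed placeholders and the response template, sketched under the assumption that [NAME] tokens map to named regex groups and {key} / {} are filled from the tool's output (the helper names are illustrative):

```python
import re

def utterance_to_pattern(utterance: str) -> re.Pattern:
    # Turn "get the stock price for [SYMBOL] on [DATE]" into a regex with named groups.
    pattern = re.escape(utterance)
    pattern = re.sub(r"\\\[([A-Z_]+)\\\]", r"(?P<\1>.+?)", pattern)
    return re.compile(f"^{pattern}$", re.IGNORECASE)

def format_response(template: str, params: dict, tool_output) -> str:
    # [SYMBOL]-style tokens come from the user's sentence ...
    for name, value in params.items():
        template = template.replace(f"[{name}]", value)
    # ... {avg_price}-style tokens come from the tool's output, and a bare {} takes all of it.
    if isinstance(tool_output, dict):
        for key, value in tool_output.items():
            template = template.replace("{" + key + "}", str(value))
    return template.replace("{}", str(tool_output))

pattern = utterance_to_pattern("get the stock price for [SYMBOL] on [DATE]")
match = pattern.match("get the stock price for ACME on 2024-06-01")
params = match.groupdict()   # {'SYMBOL': 'ACME', 'DATE': '2024-06-01'}
reply = format_response("The average price for [SYMBOL] on [DATE] was {avg_price}.",
                        params, {"avg_price": 12.34})
# -> "The average price for ACME on 2024-06-01 was 12.34."
```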

🔄 The "Failover to LLM" Workflow

There will be times when a tool fails: it may be unavailable, the user's prompt may be missing required parameters, or the values provided may be invalid. When a tool fails, the TAG attempt on that sentence is aborted and the original sentence is kept in the prompt sent to the LLM. Nothing is lost; the work simply moves from local execution to the remote LLM.
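
In code, that failover is little more than a broad error guard around tool execution. A short sketch, reusing the hypothetical tool objects from the pipeline sketch above:

```python
def try_tool(tool, sentence: str):
    """Return the tool's response, or None to tell the caller to keep the original sentence."""
    try:
        result = tool.execute(sentence)     # may raise: tool unavailable, missing/invalid parameters, ...
        return result if result else None   # an empty result is also treated as a failure
    except Exception:
        return None                         # no loss: the untouched sentence simply goes to the LLM
```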

⚡ Crucial Detail: Complete Replacement

It is possible for TAG to replace every sentence in a user's prompt with a tool response. In this scenario, there is no reason to send the request to the LLM, and the tool-augmented prompt becomes the final reply to the user.
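
With the pipeline sketch above, this check is a single flag; send_to_llm here stands in for whatever LLM client the application actually uses:

```python
augmented_prompt, fully_replaced = tag_process(user_prompt, tools)

if fully_replaced:
    reply = augmented_prompt                  # every sentence was answered locally; the LLM is never called
else:
    reply = send_to_llm(augmented_prompt)     # hypothetical call into the application's LLM client
```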

🎯 Key Benefits

  * Reduced token usage: common, predictable requests are answered locally before any tokens are spent.
  * Entirely client-side: tools run on the user's machine, in keeping with a local-first design.
  * Safe failover: if a tool fails, the original sentence still reaches the LLM, so nothing is lost.
  * Complements RAG and MCP rather than replacing them.

📄 License

This project is open source and available under the MIT License.