Tool-Augmented Generation
LLM+TAG (Tool-Augmented Generation) is a method for processing information specific to a user's prompt using local tools before the prompt is sent to a Large Language Model (LLM). This process is executed entirely on the client side, avoiding LLM-associated token usage for common, predictable requests.
The goal of TAG is to handle "low-hanging fruit" tasks by executing a tool and replacing the original user query with the tool's result.
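For instance, a predictable question can be answered locally before the prompt ever leaves the machine (an illustrative exchange, not actual project output):

```
User prompt:  What time is it? Also, draft a short reply to Anna's email.
After TAG:    The current time is 15:42. Also, draft a short reply to Anna's email.
```

Only the remaining, unpredictable work is sent to the LLM.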
The inspiration for TAG came from thinking about how text-adventure games in the 1980s could handle fuzzy commands. If a Commodore 64 could interpret "Stab the stupid dragon now", then surely a modern computer can map natural-language sentences to local function calls.
Traditional LLM usage, including Retrieval-Augmented Generation (RAG) and Model Context Protocol (MCP) integration, incurs token costs at every step: input, processing, and output. While RAG retrieves specific content to augment the prompt, it can still incur high token usage if the augmented content is large or irrelevant. MCP, on the other hand, increases both I/O and token usage, because the LLM decides what information it needs and sends requests back to the client.
| Method | Description | Token Cost Impact |
|---|---|---|
| Standard LLM | Processes the prompt using its internal knowledge. | High (for all tasks). |
| LLM+RAG | Retrieves external content and augments the prompt. | Can be reduced, but large augmentation increases cost. |
| LLM+MCP | LLM requests specific information via formal tool calls. | Reduces irrelevant content, but defining functions and back-and-forth adds overhead. |
| LLM+TAG | Local system executes tools based on prompt sentences before sending to LLM. | Directly reduces token usage by preemptively answering predictable queries. |
The TAG process is entirely local and does not engage the LLM:

1. The user's prompt is split into sentences.
2. Each sentence is compared against the configured utterances.
3. When a sentence matches, the mapped tool is executed; for example, "What time is it?" can be answered by a get_current_time() tool.
4. The sentence is replaced in the prompt with the tool's formatted response.

TAG relies on a configuration file to map user sentences to executable tools. The configuration file may be implemented in many different ways; the examples below use INI file format with multiline support.
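As a sketch of what such a mapping could look like (the section name, keys, and utterances here are illustrative assumptions, not a definitive schema):

```ini
; Hypothetical TAG mapping: a sentence matching an utterance is
; replaced by the formatted result of the mapped tool.
[current_time]
utterances =
    What time is it?
    What is the current time?
tool = func:get_current_time
response = The current time is {}.
```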
The tool definition is flexible, allowing for various operations via the following prefixes (illustrated below):

- `func:`
- `api:`
- `mcp:`
- `exec:`
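Presumably each prefix selects a different execution path: a local function (`func:`), a remote web API (`api:`), an MCP server (`mcp:`), or a local executable (`exec:`). Some hedged, illustrative `tool` lines (the targets shown are placeholders):

```ini
tool = func:get_current_time
tool = api:https://api.example.com/weather?city=[CITY]
tool = mcp:market_data.get_quote
tool = exec:/usr/local/bin/weather --city [CITY]
```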
The configuration format allows you to define all necessary parameters (such as URLs or API keys) and even capture parameter values from within the utterances (e.g., [CITY]).
The response may also be formatted in the configuration to include the user's parameter values (e.g., [SYMBOL]) and fields from the tool's response (e.g., {avg_price}).
To embed the tool's entire response in a formatted response sentence, use {}.
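Putting these together, a hypothetical entry that captures a parameter and extracts a named field from the tool's response might look like this (the URL and field names are illustrative):

```ini
; [SYMBOL] is captured from the user's sentence and substituted into
; the tool call; {avg_price} pulls a field of that name from the
; tool's response, while {} would embed the entire response.
[stock_quote]
utterances =
    What is the average price of [SYMBOL]?
tool = api:https://api.example.com/quote?symbol=[SYMBOL]
response = The average price of [SYMBOL] is {avg_price}.
```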
There will be times when a tool fails: it may be unavailable, the user's prompt may be missing required parameters, or the values provided may be invalid. When a tool fails, the attempt to apply TAG to that sentence is aborted, and the user's original wording is kept and sent to the LLM. Nothing is lost; the work simply moves from local execution to the remote LLM.
It is possible for TAG to replace every sentence in a user's prompt with a tool response. In this scenario, there is no reason to send the request to the LLM, and the tool-augmented prompt becomes the final reply to the user.
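As a minimal sketch of the overall flow, assuming a simplified in-memory configuration (the function names and config shape below are illustrative, not the project's actual API):

```python
import datetime
import re

def split_sentences(prompt):
    # Naive splitter on sentence-ending punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]

def match_utterance(config, sentence):
    # Each entry maps a regex "pattern" (named groups capture parameters)
    # to a callable "tool" and a "response" template.
    for entry in config:
        m = re.fullmatch(entry["pattern"], sentence, re.IGNORECASE)
        if m:
            return entry, m.groupdict()
    return None, None

def tag_process(prompt, config):
    out, all_replaced = [], True
    for sentence in split_sentences(prompt):
        entry, params = match_utterance(config, sentence)
        if entry is None:
            out.append(sentence)              # no mapping: pass through
            all_replaced = False
            continue
        try:
            result = entry["tool"](**params)  # execute the mapped tool locally
            out.append(entry["response"].format(result))
        except Exception:
            out.append(sentence)              # tool failed: keep the original sentence
            all_replaced = False
    return " ".join(out), all_replaced

# One mapped tool, standing in for func:get_current_time.
config = [{
    "pattern": r"what time is it\??",
    "tool": lambda: datetime.datetime.now().strftime("%H:%M"),
    "response": "The current time is {}.",
}]

augmented, all_done = tag_process("What time is it? Summarize this file.", config)
# all_done is False here, so `augmented` is forwarded to the LLM; had every
# sentence matched, `augmented` would be returned to the user directly.
```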
This project is open source and available under the MIT License.