💾 View and Download My Code
I don't like it when code is hidden, so I put my code right at the beginning!
Click here to view my most current implementation of QIL+LLM and download the associated code.
📖 What is QIL?
QIL Isn't LLM. QIL is a lightweight, client-side preprocessing layer that intercepts user prompts and intelligently executes local functions.
Combined with an LLM as QIL+LLM, QIL processes prompts locally, based on intent, before sending anything to an LLM.
QIL can also be used independently of any LLM. By using advanced similarity matching, QIL can route user intent to the appropriate tools without requiring exact command syntax, making it perfect for both prompt augmentation and intent-based command routing.
The Core Idea: Why send a prompt to an expensive LLM when a local function can answer it instantly? QIL identifies which parts of a user's prompt can be handled locally, executes those tools, and either augments the prompt with results or bypasses the LLM entirely.
Notes:
- From LLM+TAG to QIL+LLM: This project was originally titled LLM+TAG (Tool-Augmented Generation). The transition to QIL+LLM (QIL Isn't LLM) serves two purposes:
  - Logical Flow: Placing QIL before LLM reflects its role as a preemptive gatekeeper that intercepts prompts before they reach the model.
  - Identity: The recursive backronym emphasizes that QIL is a deterministic, local-first engine, not a probabilistic cloud model.
- The "Adventure Game" Heritage: The inspiration for this project stems from a realization that routing user intent is a problem already solved decades ago by text-based adventure games. Long before modern AI, these systems used robust string-matching and intent-parsing to turn natural language into game actions. QIL applies that same "zero-latency" philosophy to the modern AI stack, handling the predictable so the LLM can focus on the complex.
🔄 How It Differs from RAG and MCP
QIL+LLM is complementary to both RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol), but serves a different purpose:
| Method | Description | Token Cost Impact |
|--------|-------------|-------------------|
| Standard LLM | Processes the prompt using its internal knowledge. | High (for all tasks). |
| LLM+RAG | Retrieves external content and augments the prompt before LLM processing. | Can be reduced, but large augmentation increases cost. |
| LLM+MCP | LLM requests specific information via formal tool calls during processing. | Reduces irrelevant content, but function definitions and back-and-forth add overhead. |
| QIL+LLM | Local system executes tools based on prompt matching before sending to the LLM. | Directly reduces token usage by preemptively answering predictable queries. |
Note: QIL+LLM is not a replacement for RAG or MCP. A complete, robust application would likely implement all three: QIL+LLM+RAG+MCP.
⚙️ How QIL Works
The QIL process runs entirely on the client side before any LLM interaction:
- Split: The user's prompt is intelligently split into logical segments (sentences or intent phrases).
- Match: Each segment is compared against configured utterances using a custom similarity algorithm based on Levenshtein distance and Monge-Elkan similarity (MEWF).
- Execute & Replace:
  - If a segment achieves a sufficiently high similarity score (default 95%) with an utterance, the associated tool is executed.
  - The tool's response replaces the original segment in the prompt.
  - If no match is found or the tool fails, the original segment remains unchanged.
- Reassemble: Processed segments are joined back together to form an augmented prompt.
- Decision: If all segments were handled by tools, the augmented result can be returned directly to the user. Otherwise, send the augmented prompt to the LLM.
Example: Time Query
- Original Prompt: "What time is it? I want to know if it's OK to go to lunch."
- QIL identifies: "What time is it?" matches the `get_current_time()` function.
- QIL replaces: Function returns "It is currently 10:42 AM."
- Augmented Prompt: "It is currently 10:42 AM. I want to know if it's OK to go to lunch."
- Result: LLM receives context-rich prompt, saving tokens and latency.
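The sketch below walks that same time query through the split, match, execute, and reassemble steps in plain Python. The tool registry, normalization, and similarity scorer here are simplified, hypothetical stand-ins for QIL's real implementation (which uses Levenshtein and Monge-Elkan scoring), not the actual library code:

```python
import re

# Hypothetical tool registry: configured utterance -> local function
TOOLS = {
    "what time is it": lambda: "It is currently 10:42 AM.",
}

def _norm(text: str) -> set:
    """Lowercase, strip punctuation, return the set of words."""
    return set(re.sub(r"[^\w\s]", "", text.lower()).split())

def similarity(a: str, b: str) -> float:
    """Toy word-overlap score (0-100); a stand-in for QIL's real scorer."""
    wa, wb = _norm(a), _norm(b)
    return 100.0 * len(wa & wb) / max(len(wa | wb), 1)

def qil_sketch(prompt: str, threshold: float = 95.0):
    # Split: break the prompt into sentence-level segments
    segments = re.split(r"(?<=[.?!])\s+", prompt.strip())
    out, all_replaced = [], True
    for seg in segments:
        # Match: score the segment against every configured utterance
        scores = {utt: similarity(seg, utt) for utt in TOOLS}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            # Execute & Replace: the tool's output replaces the segment
            out.append(TOOLS[best]())
        else:
            out.append(seg)          # unmatched segments stay unchanged
            all_replaced = False
    # Reassemble: join processed segments into the augmented prompt
    return " ".join(out), all_replaced

print(qil_sketch("What time is it? I want to know if it's OK to go to lunch."))
# -> ("It is currently 10:42 AM. I want to know if it's OK to go to lunch.", False)
```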
🎯 Fuzzy Command Routing
One of QIL's most powerful features is its ability to perform intent-based command routing without requiring exact syntax.
Traditional command systems fail if users don't type commands perfectly. QIL uses similarity matching to understand user intent.
Example: Help Command
Traditional Approach: User must type exactly /help or help
QIL Approach: All of these work:
- "help"
- "help me"
- "give me help"
- "what can I do"
- "what are the commands"
QIL matches all these variations to the same format_help() function without requiring exact string matching or complex regex patterns.
This makes QIL ideal for chatbots, CLI tools, and voice interfaces where users naturally phrase requests differently each time.
You get the flexibility of LLM-style natural language understanding with the speed and zero cost of local function execution.
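For instance, assuming the `format_help` utterances shown in the Configuration Format section below are loaded, a quick check of this routing could look like the following sketch, using the `qil` entry point described under Getting Started (the config path is an assumption):

```python
from qil import qil  # entry point as described in "Getting Started" below

# Phrasings that should all route to format_help() via fuzzy matching
variations = [
    "help",
    "help me",
    "give me help",
    "what can I do",
    "what are the commands",
]

for phrase in variations:
    augmented, handled = qil(phrase, "lib/qil.conf")
    # Each variation is expected to be handled locally, so `handled` is True
    # and `augmented` contains the help text returned by format_help().
    print(f"{phrase!r} handled locally: {handled}")
```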
📝 Configuration Format
QIL uses a simple, human-readable configuration file to map utterances to Python functions.
The format is intentionally minimalist:
```
# Configuration file: qil.conf
# Format: module.function_name(optional_defaults):
#     utterance
#     utterance with [PARAMETER]

# Help command - multiple natural variations
formatting.format_help:
    help
    give me help
    what can I do
    what are the commands
    commands

# Customer info - with optional branch parameter
formatting.format_customers:
    customers
    who are the customers
    who are the customers for [BRANCH]
    customers for [BRANCH]

# Function with default parameters
llm.make_prompt(query="This is an example prompt."):
    prompt
    whats the prompt
    example prompt
    show me the prompt
    ai prompt
```
Configuration Rules
- Functions are defined as `module.function_name:` (ending with a colon)
- Optional default parameters: `module.function(param="value"):`
- Utterances are listed below the function (indentation is optional but improves readability)
- Parameters in utterances use `[PARAMETER]` syntax
- QIL automatically extracts values from user input and passes them to functions
- Lines starting with `#` are comments
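A minimal sketch of loading this format is shown below. The function name `load_config` and its return structure are hypothetical; the real QIL loader may organize the mappings differently:

```python
import re

def load_config(path: str) -> dict:
    """Parse a qil.conf-style file into {(function, defaults): [utterances]}.

    Hypothetical helper illustrating the rules above; not the real QIL loader.
    """
    mappings, current = {}, None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                      # skip blank lines and comments
            if line.endswith(":"):
                # Function header, e.g. llm.make_prompt(query="..."):
                m = re.match(r"([\w.]+)(?:\((.*)\))?$", line[:-1])
                if m:
                    current = (m.group(1), m.group(2) or "")
                    mappings[current] = []
            elif current is not None:
                # Every other line is an utterance for the current function
                mappings[current].append(line)
    return mappings
```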
🧮 Similarity Matching Algorithm
QIL uses a custom similarity algorithm that combines:
- Levenshtein Distance: Measures character-level differences between words
- Monge-Elkan Similarity: Finds optimal word pairings between user input and utterances
- Wildcard Support: `[PARAMETER]` tokens match any user input at zero cost
The algorithm:
- Normalizes both user input and utterances (lowercase, remove punctuation)
- Removes exact matching words
- Separates wildcards from static template words
- Greedily pairs remaining words using minimum Levenshtein distance
- Wildcards consume remaining user tokens at zero cost
- Penalizes unmatched words
- Returns a similarity score from 0-100
Default threshold is 95% similarity. This allows for minor typos and variations while maintaining high precision.
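The following is a hedged sketch of that scoring scheme: a standard Levenshtein edit distance plus a greedy, Monge-Elkan-style word pairing with wildcard handling and a penalty for unmatched words. The helper names are hypothetical and the real MEWF scorer may normalize and weight things differently:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance between two words (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def word_similarity(a: str, b: str) -> float:
    """Normalized 0-1 similarity between two words."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def score(user_words, template_words) -> float:
    """Greedy word pairing returning a 0-100 score (simplified stand-in).

    Wildcard tokens like [BRANCH] absorb leftover user words at zero cost;
    any remaining unmatched user words lower the score.
    """
    user = list(user_words)
    static = [w for w in template_words if not w.startswith("[")]
    wildcards = len(template_words) - len(static)
    total = 0.0
    for t in static:
        if not user:
            break
        best = max(user, key=lambda u: word_similarity(u, t))
        total += word_similarity(best, t)
        user.remove(best)
    leftover = max(len(user) - wildcards, 0)   # words not covered by wildcards
    denom = len(static) + leftover
    return 100.0 * total / denom if denom else 100.0

# Example: "customers for london" vs. template "customers for [BRANCH]" -> 100.0
print(score("customers for london".split(), "customers for [BRANCH]".split()))
```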
🔌 Wildcard Parameters
Utterances can include wildcard parameters using [PARAMETER] syntax. QIL automatically extracts numeric values from the user's input:
```
banking.get_customer_info:
    customer info
    info for customer [CUSTOMERID]
    get details for [CUSTOMERID]
```

User input: "Get details for customer 12345"

QIL extracts: `customerid=12345`

Function call: `banking.get_customer_info(customerid=12345)`
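A small sketch of how that extraction could work, based on the numeric-value rule stated above: words of the user input that are not part of the static utterance text are candidates, and numeric candidates are bound to the `[PARAMETER]` names in order. The helper name is hypothetical and the real extractor may be more sophisticated:

```python
import re

def extract_wildcards(utterance: str, user_input: str) -> dict:
    """Bind numeric leftover words to [PARAMETER] names (illustrative only)."""
    params = re.findall(r"\[(\w+)\]", utterance)
    static = set(re.sub(r"\[\w+\]", "", utterance).lower().split())
    leftovers = [w for w in user_input.lower().split() if w not in static]
    numeric = [w for w in leftovers if w.isdigit()]
    return {name.lower(): value for name, value in zip(params, numeric)}

print(extract_wildcards("get details for [CUSTOMERID]",
                        "Get details for customer 12345"))
# -> {'customerid': '12345'}
```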
🔄 The "Failover to LLM" Workflow
When a tool fails (unavailable, missing parameters, invalid input), QIL gracefully falls back to the original behavior.
The unmodified sentence is kept in the prompt and sent to the LLM. There is no loss—just a shift from local execution to remote LLM processing.
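A minimal sketch of that fallback at the execute step, with hypothetical names (the real error handling may be more granular):

```python
def execute_segment(segment: str, tool, **kwargs) -> str:
    """Run the matched tool; on any failure, keep the original segment.

    The untouched segment stays in the prompt and is handled by the LLM instead.
    """
    try:
        return tool(**kwargs)
    except Exception:
        return segment  # failover to LLM: no loss, just remote processing
```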
⚡ Complete Replacement: Zero LLM Calls
If QIL successfully handles every segment of a user's prompt with local tools, there is no need to call the LLM at all.
The augmented prompt becomes the final response, resulting in:
- Zero API costs
- Sub-second latency
- Complete privacy (no data leaves the client)
🎯 Key Benefits
- Reduced Token Costs: Preemptively answers predictable queries without LLM involvement
- Lower Latency: Local tool execution is orders of magnitude faster than LLM API calls
- Privacy: Sensitive data can be processed locally without cloud transmission
- Fuzzy Matching: Users don't need to memorize exact commands—natural variations work automatically
- Complementary: Works seamlessly alongside RAG and MCP for comprehensive solutions
- Graceful Degradation: Failed tools simply fall back to LLM processing
🛠️ Use Cases
1. Intent-Based Command Routing
Replace rigid if-else command parsing with fuzzy intent matching. Instead of checking for exact strings like "/help",
map all variations to functions: "help", "help me", "what can I do", "I need assistance", etc.
2. Customer Service Chatbots
Handle common queries locally (hours, locations, account balance) while routing complex queries to the LLM.
Reduces API costs for high-traffic endpoints.
3. Data Dashboard Queries
Map natural language questions like "what were sales yesterday" or "show me revenue" to database queries.
Execute locally, return results instantly.
4. Multi-Language Command Interfaces
Define utterances in multiple languages for the same function. QIL handles routing without complex NLP pipelines.
🚀 Getting Started
- Create a `qil.conf` file with your function mappings
- Import the QIL module: `from qil import qil`
- Process user input: `augmented_prompt, all_replaced = qil(user_input)`
- If `all_replaced` is `True`, return `augmented_prompt` directly
- Otherwise, send `augmented_prompt` to your LLM
```python
from qil import qil

user_prompt = "What time is it? Help me with my account."
augmented_prompt, fully_handled = qil(user_prompt, "lib/qil.conf")

if fully_handled:
    # All queries were handled locally
    print(augmented_prompt)
else:
    # Send augmented prompt to LLM
    response = llm.complete(augmented_prompt)
    print(response)
```
📄 License
This project is open source and available under the MIT License.