💾 View and Download My Code
I don't like it when code is hidden, so I put my code right at the beginning!
Click here to view my most current implementation of QIL+LLM and download the associated code.
📖 What is QIL?
QIL Isn't LLM. QIL is a lightweight, client-side preprocessing layer that intercepts user prompts and intelligently executes local functions.
Combined with an LLM as QIL+LLM, QIL processes prompts locally, based on intent, before sending anything to an LLM.
QIL can also be used independently of any LLM. By using advanced similarity matching, QIL can route user intent to the appropriate tools without requiring exact command syntax, making it perfect for both prompt augmentation and intent-based command routing.
The Core Idea: Why send a prompt to an expensive LLM when a local function can answer it instantly? QIL identifies which parts of a user's prompt can be handled locally, executes those tools, and either augments the prompt with results or bypasses the LLM entirely.
Notes:
- From LLM+TAG to QIL+LLM: This project was originally titled LLM+TAG (Tool-Augmented Generation). The transition to QIL+LLM (QIL Isn't LLM) serves two purposes:
  - Logical Flow: Placing QIL before LLM reflects its role as a preemptive gatekeeper that intercepts prompts before they reach the model.
  - Identity: The recursive backronym emphasizes that QIL is a deterministic, local-first engine, not a probabilistic cloud model.
- The "Adventure Game" Heritage: The inspiration for this project stems from a realization that routing user intent is a problem already solved decades ago by text-based adventure games. Long before modern AI, these systems used robust string-matching and intent-parsing to turn natural language into game actions. QIL applies that same "zero-latency" philosophy to the modern AI stack, handling the predictable so the LLM can focus on the complex.
🔄 How It Differs from RAG and MCP
QIL+LLM is complementary to both RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol), but serves a different purpose:
| Method | Description | Token Cost Impact |
|--------|-------------|-------------------|
| Standard LLM | Processes the prompt using its internal knowledge. | High (for all tasks). |
| LLM+RAG | Retrieves external content and augments the prompt before LLM processing. | Can be reduced, but large augmentation increases cost. |
| LLM+MCP | LLM requests specific information via formal tool calls during processing. | Reduces irrelevant content, but function definitions and back-and-forth add overhead. |
| QIL+LLM | Local system executes tools based on prompt matching before sending to the LLM. | Directly reduces token usage by preemptively answering predictable queries. |
Note: QIL+LLM is not a replacement for RAG or MCP. A complete, robust application would likely implement all three: QIL+LLM+RAG+MCP.
⚙️ How QIL Works
The QIL process runs entirely on the client side before any LLM interaction:
- Split: The user's prompt is intelligently split into logical segments (sentences or intent phrases).
- Match: Each segment is compared against configured utterances using a custom similarity algorithm based on Levenshtein distance and Monge-Elkan similarity (MEWF).
- Execute & Replace:
  - If a segment achieves a sufficiently high similarity score (default 95%) with an utterance, the associated tool is executed.
  - The tool's response replaces the original segment in the prompt.
  - If no match is found or the tool fails, the original segment remains unchanged.
- Reassemble: Processed segments are joined back together to form an augmented prompt.
- Decision: If all segments were handled by tools, the augmented result can be returned directly to the user. Otherwise, send the augmented prompt to the LLM.
Example: Time Query
- Original Prompt: "What time is it? I want to know if it's OK to go to lunch."
- QIL identifies: "What time is it?" matches the `get_current_time()` function.
- QIL replaces: Function returns "It is currently 10:42 AM."
- Augmented Prompt: "It is currently 10:42 AM. I want to know if it's OK to go to lunch."
- Result: LLM receives context-rich prompt, saving tokens and latency.
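The sketch below walks that same time query through the split, match, execute, and reassemble steps in plain Python. The tool registry, normalization, and similarity scorer here are simplified, hypothetical stand-ins for QIL's real implementation (which uses Levenshtein and Monge-Elkan scoring), not the actual library code:

```python
import re

# Hypothetical tool registry: configured utterance -> local function
TOOLS = {
    "what time is it": lambda: "It is currently 10:42 AM.",
}

def _norm(text: str) -> set:
    """Lowercase, strip punctuation, return the set of words."""
    return set(re.sub(r"[^\w\s]", "", text.lower()).split())

def similarity(a: str, b: str) -> float:
    """Toy word-overlap score (0-100); a stand-in for QIL's real scorer."""
    wa, wb = _norm(a), _norm(b)
    return 100.0 * len(wa & wb) / max(len(wa | wb), 1)

def qil_sketch(prompt: str, threshold: float = 95.0):
    # Split: break the prompt into sentence-level segments
    segments = re.split(r"(?<=[.?!])\s+", prompt.strip())
    out, all_replaced = [], True
    for seg in segments:
        # Match: score the segment against every configured utterance
        scores = {utt: similarity(seg, utt) for utt in TOOLS}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            # Execute & Replace: the tool's output replaces the segment
            out.append(TOOLS[best]())
        else:
            out.append(seg)          # unmatched segments stay unchanged
            all_replaced = False
    # Reassemble: join processed segments into the augmented prompt
    return " ".join(out), all_replaced

print(qil_sketch("What time is it? I want to know if it's OK to go to lunch."))
# -> ("It is currently 10:42 AM. I want to know if it's OK to go to lunch.", False)
```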
🎯 Fuzzy Command Routing
One of QIL's most powerful features is its ability to perform intent-based command routing without requiring exact syntax.
Traditional command systems fail if users don't type commands perfectly. QIL uses similarity matching to understand user intent.
Example: Help Command
Traditional Approach: User must type exactly /help or help
QIL Approach: All of these work:
- "help"
- "help me"
- "give me help"
- "what can I do"
- "what are the commands"
QIL matches all these variations to the same format_help() function without requiring exact string matching or complex regex patterns.
This makes QIL ideal for chatbots, CLI tools, and voice interfaces where users naturally phrase requests differently each time.
You get the flexibility of LLM-style natural language understanding with the speed and zero cost of local function execution.
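For instance, assuming the `format_help` utterances shown in the Configuration Format section below are loaded, a quick check of this routing could look like the following sketch, using the `qil` entry point described under Getting Started (the config path is an assumption):

```python
from qil import qil  # entry point as described in "Getting Started" below

# Phrasings that should all route to format_help() via fuzzy matching
variations = [
    "help",
    "help me",
    "give me help",
    "what can I do",
    "what are the commands",
]

for phrase in variations:
    augmented, handled = qil(phrase, "lib/qil.conf")
    # Each variation is expected to be handled locally, so `handled` is True
    # and `augmented` contains the help text returned by format_help().
    print(f"{phrase!r} handled locally: {handled}")
```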
📝 Configuration Format
QIL uses a simple, human-readable configuration file to map utterances to Python functions.
The format is intentionally minimalist:
```
# Configuration file: qil.conf
# Format: module.function_name(optional_defaults):
#     utterance
#     utterance with [PARAMETER]

# Help command - multiple natural variations
formatting.format_help:
    help
    give me help
    what can I do
    what are the commands
    commands

# Customer info - with optional branch parameter
formatting.format_customers:
    customers
    who are the customers
    who are the customers for [BRANCH]
    customers for [BRANCH]

# Function with default parameters
llm.make_prompt(query="This is an example prompt."):
    prompt
    whats the prompt
    example prompt
    show me the prompt
    ai prompt
```
Configuration Rules
- Functions are defined as `module.function_name:` (ending with a colon)
- Optional default parameters: `module.function(param="value"):`
- Utterances are listed below the function (indentation is optional but improves readability)
- Parameters in utterances use `[PARAMETER]` syntax
- QIL automatically extracts values from user input and passes them to functions
- Lines starting with `#` are comments
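A minimal sketch of loading this format is shown below. The function name `load_config` and its return structure are hypothetical; the real QIL loader may organize the mappings differently:

```python
import re

def load_config(path: str) -> dict:
    """Parse a qil.conf-style file into {(function, defaults): [utterances]}.

    Hypothetical helper illustrating the rules above; not the real QIL loader.
    """
    mappings, current = {}, None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                      # skip blank lines and comments
            if line.endswith(":"):
                # Function header, e.g. llm.make_prompt(query="..."):
                m = re.match(r"([\w.]+)(?:\((.*)\))?$", line[:-1])
                if m:
                    current = (m.group(1), m.group(2) or "")
                    mappings[current] = []
            elif current is not None:
                # Every other line is an utterance for the current function
                mappings[current].append(line)
    return mappings
```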
🧮 Similarity Matching Algorithm
QIL uses a custom similarity algorithm that combines:
- Levenshtein Distance: Measures character-level differences between words
- Monge-Elkan Similarity: Finds optimal word pairings between user input and utterances
- Wildcard Support: `[PARAMETER]` tokens match any user input at zero cost
The algorithm:
- Normalizes both user input and utterances (lowercase, remove punctuation)
- Removes exact matching words
- Separates wildcards from static template words
- Greedily pairs remaining words using minimum Levenshtein distance
- Wildcards consume remaining user tokens at zero cost
- Penalizes unmatched words
- Returns a similarity score from 0-100
Default threshold is 95% similarity. This allows for minor typos and variations while maintaining high precision.
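The following is a hedged sketch of that scoring scheme: a standard Levenshtein edit distance plus a greedy, Monge-Elkan-style word pairing with wildcard handling and a penalty for unmatched words. The helper names are hypothetical and the real MEWF scorer may normalize and weight things differently:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance between two words (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def word_similarity(a: str, b: str) -> float:
    """Normalized 0-1 similarity between two words."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def score(user_words, template_words) -> float:
    """Greedy word pairing returning a 0-100 score (simplified stand-in).

    Wildcard tokens like [BRANCH] absorb leftover user words at zero cost;
    any remaining unmatched user words lower the score.
    """
    user = list(user_words)
    static = [w for w in template_words if not w.startswith("[")]
    wildcards = len(template_words) - len(static)
    total = 0.0
    for t in static:
        if not user:
            break
        best = max(user, key=lambda u: word_similarity(u, t))
        total += word_similarity(best, t)
        user.remove(best)
    leftover = max(len(user) - wildcards, 0)   # words not covered by wildcards
    denom = len(static) + leftover
    return 100.0 * total / denom if denom else 100.0

# Example: "customers for london" vs. template "customers for [BRANCH]" -> 100.0
print(score("customers for london".split(), "customers for [BRANCH]".split()))
```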
🔌 Wildcard Parameters
Utterances can include wildcard parameters using [PARAMETER] syntax. QIL automatically extracts numeric values from the user's input:
```
banking.get_customer_info:
    customer info
    info for customer [CUSTOMERID]
    get details for [CUSTOMERID]
```

User input: "Get details for customer 12345"

QIL extracts: `customerid=12345`

Function call: `banking.get_customer_info(customerid=12345)`
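A small sketch of how that extraction could work, based on the numeric-value rule stated above: words of the user input that are not part of the static utterance text are candidates, and numeric candidates are bound to the `[PARAMETER]` names in order. The helper name is hypothetical and the real extractor may be more sophisticated:

```python
import re

def extract_wildcards(utterance: str, user_input: str) -> dict:
    """Bind numeric leftover words to [PARAMETER] names (illustrative only)."""
    params = re.findall(r"\[(\w+)\]", utterance)
    static = set(re.sub(r"\[\w+\]", "", utterance).lower().split())
    leftovers = [w for w in user_input.lower().split() if w not in static]
    numeric = [w for w in leftovers if w.isdigit()]
    return {name.lower(): value for name, value in zip(params, numeric)}

print(extract_wildcards("get details for [CUSTOMERID]",
                        "Get details for customer 12345"))
# -> {'customerid': '12345'}
```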
🔄 The "Failover to LLM" Workflow
When a tool fails (unavailable, missing parameters, invalid input), QIL gracefully falls back to the original behavior.
The unmodified sentence is kept in the prompt and sent to the LLM. There is no loss—just a shift from local execution to remote LLM processing.
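A minimal sketch of that fallback at the execute step, with hypothetical names (the real error handling may be more granular):

```python
def execute_segment(segment: str, tool, **kwargs) -> str:
    """Run the matched tool; on any failure, keep the original segment.

    The untouched segment stays in the prompt and is handled by the LLM instead.
    """
    try:
        return tool(**kwargs)
    except Exception:
        return segment  # failover to LLM: no loss, just remote processing
```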
⚡ Complete Replacement: Zero LLM Calls
If QIL successfully handles every segment of a user's prompt with local tools, there is no need to call the LLM at all.
The augmented prompt becomes the final response, resulting in:
- Zero API costs
- Sub-second latency
- Complete privacy (no data leaves the client)
🎯 Key Benefits
- Reduced Token Costs: Preemptively answers predictable queries without LLM involvement
- Lower Latency: Local tool execution is orders of magnitude faster than LLM API calls
- Privacy: Sensitive data can be processed locally without cloud transmission
- Fuzzy Matching: Users don't need to memorize exact commands—natural variations work automatically
- Complementary: Works seamlessly alongside RAG and MCP for comprehensive solutions
- Graceful Degradation: Failed tools simply fall back to LLM processing
🛠️ Use Cases
1. Intent-Based Command Routing
Replace rigid if-else command parsing with fuzzy intent matching. Instead of checking for exact strings like "/help",
map all variations to functions: "help", "help me", "what can I do", "I need assistance", etc.
2. Customer Service Chatbots
Handle common queries locally (hours, locations, account balance) while routing complex queries to the LLM.
Reduces API costs for high-traffic endpoints.
3. Data Dashboard Queries
Map natural language questions like "what were sales yesterday" or "show me revenue" to database queries.
Execute locally, return results instantly.
4. Multi-Language Command Interfaces
Define utterances in multiple languages for the same function. QIL handles routing without complex NLP pipelines.
🚀 Getting Started
- Create a `qil.conf` file with your function mappings
- Import the QIL module: `from qil import qil`
- Process user input: `augmented_prompt, all_replaced = qil(user_input)`
- If `all_replaced` is `True`, return `augmented_prompt` directly
- Otherwise, send `augmented_prompt` to your LLM
```python
from qil import qil

user_prompt = "What time is it? Help me with my account."
augmented_prompt, fully_handled = qil(user_prompt, "lib/qil.conf")

if fully_handled:
    # All queries were handled locally
    print(augmented_prompt)
else:
    # Send augmented prompt to LLM
    response = llm.complete(augmented_prompt)
    print(response)
```
📄 License
This project is open source and available under the MIT License.