🦔 QIL+LLM

Intent handling BEFORE sending the prompt to the LLM

Client-Side Processing · Token Optimization · Fuzzy Matching · Local-First

💾 View and Download My Code

I don't like it when code is hidden, so I put my code right at the beginning! Click here to view my most current implementation of QIL+LLM and download the associated code.

📖 What is QIL?

QIL Isn't LLM. QIL is a lightweight, client-side preprocessing layer that intercepts user prompts and executes local functions based on intent. Combined with an LLM as QIL+LLM, QIL processes prompts locally before anything is sent to the LLM; it can also be used entirely independently of an LLM. By using similarity matching, QIL routes user intent to the appropriate tools without requiring exact command syntax, making it well suited to both prompt augmentation and intent-based command routing.

The Core Idea: Why send a prompt to an expensive LLM when a local function can answer it instantly? QIL identifies which parts of a user's prompt can be handled locally, executes those tools, and either augments the prompt with results or bypasses the LLM entirely.


🔄 How It Differs from RAG and MCP

QIL+LLM is complementary to both RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol), but serves a different purpose:

| Method | Description | Token Cost Impact |
|---|---|---|
| Standard LLM | Processes the prompt using its internal knowledge. | High (for all tasks). |
| LLM+RAG | Retrieves external content and augments the prompt before LLM processing. | Can be reduced, but large augmentation increases cost. |
| LLM+MCP | The LLM requests specific information via formal tool calls during processing. | Reduces irrelevant content, but function definitions and back-and-forth add overhead. |
| QIL+LLM | A local system executes tools based on prompt matching before sending to the LLM. | Directly reduces token usage by preemptively answering predictable queries. |
Note: QIL+LLM is not a replacement for RAG or MCP. A complete, robust application would likely implement all three: QIL+LLM+RAG+MCP.

⚙️ How QIL Works

The QIL process runs entirely on the client side before any LLM interaction:

  1. Split: The user's prompt is intelligently split into logical segments (sentences or intent phrases).
  2. Match: Each segment is compared against configured utterances using a custom similarity algorithm based on Levenshtein distance and Monge-Elkan similarity (MEWF).
  3. Execute & Replace: Each segment that matches a configured utterance above the similarity threshold triggers its mapped local function, and the segment is replaced in the prompt with the function's output.
  4. Reassemble: Processed segments are joined back together to form an augmented prompt.
  5. Decision: If all segments were handled by tools, the augmented result can be returned directly to the user. Otherwise, send the augmented prompt to the LLM.

Example: Time Query
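
A minimal sketch of this flow, assuming a hypothetical datetime_tools.get_time function mapped in qil.conf (the function name, utterances, and output are illustrative, not the project's actual tools):

    # qil.conf (hypothetical entry)
    datetime_tools.get_time:
        what time is it
        whats the time
        current time

    User prompt:  "What time is it? Also, draft a short status update for my team."
    After QIL:    "The current time is 14:32. Also, draft a short status update for my team."

The first segment is answered locally by the mapped function; only the remaining request still needs the LLM.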

🎯 Fuzzy Command Routing

One of QIL's most powerful features is its ability to perform intent-based command routing without requiring exact syntax. Traditional command systems fail if users don't type commands perfectly. QIL uses similarity matching to understand user intent.

Example: Help Command

Traditional Approach: User must type exactly /help or help

QIL Approach: All of these work:

  help
  give me help
  what can I do
  what are the commands
  commands

QIL matches all these variations to the same format_help() function without requiring exact string matching or complex regex patterns.

This makes QIL ideal for chatbots, CLI tools, and voice interfaces where users naturally phrase requests differently each time. You get the flexibility of LLM-style natural language understanding with the speed and zero cost of local function execution.
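
As a rough illustration of this routing, assuming the format_help mapping from the configuration example below and the qil() call signature used in the Getting Started section:

    from qil import qil

    for phrase in ["help", "give me help", "What can I do?", "what are the commands"]:
        augmented, handled = qil(phrase, "lib/qil.conf")
        # Each phrasing scores above the threshold against a format_help utterance,
        # so QIL replaces the prompt with format_help()'s output and handled is True.
        print(handled, augmented)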

📝 Configuration Format

QIL uses a simple, human-readable configuration file to map utterances to Python functions. The format is intentionally minimalist:

# Configuration file: qil.conf
# Format:
#   module.function_name(optional_defaults):
#       utterance
#       utterance with [PARAMETER]

# Help command - multiple natural variations
formatting.format_help:
    help
    give me help
    what can I do
    what are the commands
    commands

# Customer info - with optional branch parameter
formatting.format_customers:
    customers
    who are the customers
    who are the customers for [BRANCH]
    customers for [BRANCH]

# Function with default parameters
llm.make_prompt(query="This is an example prompt."):
    prompt
    whats the prompt
    example prompt
    show me the prompt
    ai prompt

Configuration Rules

  1. Each block starts with an unindented module.function_name line ending in a colon; default parameters may be supplied in parentheses, e.g. llm.make_prompt(query="...").
  2. Every indented line beneath a function is one utterance mapped to that function.
  3. [PARAMETER] marks a wildcard whose value is extracted from the user's input.
  4. Lines beginning with # are comments.
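
A minimal sketch of parsing such a file into an utterance-to-function mapping (the function name and return structure here are assumptions for illustration, not the project's actual loader):

    from collections import OrderedDict

    def parse_conf(path):
        """Return {'module.function(defaults)': [utterance, ...]} from a qil.conf file."""
        mapping = OrderedDict()
        current = None
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.rstrip()
                if not line or line.lstrip().startswith("#"):
                    continue                                  # skip blanks and comments
                if not raw[0].isspace() and line.endswith(":"):
                    current = line[:-1]                       # function header
                    mapping[current] = []
                elif current is not None:
                    mapping[current].append(line.strip())     # indented utterance
        return mapping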

🧮 Similarity Matching Algorithm

QIL uses a custom similarity algorithm that combines:

  Levenshtein distance for character-level edit distance between individual words
  Monge-Elkan word-level similarity (MEWF) for scoring whole phrases

The algorithm:

  1. Normalizes both user input and utterances (lowercase, remove punctuation)
  2. Removes exact matching words
  3. Separates wildcards from static template words
  4. Greedily pairs remaining words using minimum Levenshtein distance
  5. Wildcards consume remaining user tokens at zero cost
  6. Penalizes unmatched words
  7. Returns a similarity score from 0-100

Default threshold is 95% similarity. This allows for minor typos and variations while maintaining high precision.
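
The listing below is a simplified, self-contained sketch of these steps, not the project's exact MEWF implementation; in particular, the final scaling to 0-100 is an assumption:

    import re
    import string

    # Keep brackets so [PARAMETER] wildcards survive normalization
    _PUNCT = string.punctuation.replace("[", "").replace("]", "")

    def levenshtein(a, b):
        """Classic dynamic-programming edit distance between two words."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
            prev = curr
        return prev[-1]

    def normalize(text):
        """Step 1: lowercase, strip punctuation, split into words."""
        return text.lower().translate(str.maketrans("", "", _PUNCT)).split()

    def similarity(user_text, utterance):
        """Return a 0-100 score following the steps above (simplified)."""
        user = normalize(user_text)
        template = normalize(utterance)

        # Step 3: separate [PARAMETER] wildcards from static template words
        wildcards = [t for t in template if re.fullmatch(r"\[\w+\]", t)]
        static = [t for t in template if not re.fullmatch(r"\[\w+\]", t)]

        # Step 2: remove exact word matches first
        leftover_user = list(user)
        unmatched = []
        for word in static:
            if word in leftover_user:
                leftover_user.remove(word)
            else:
                unmatched.append(word)

        # Step 4: greedily pair remaining words by minimum edit distance
        cost = 0
        for word in unmatched:
            if leftover_user:
                best = min(leftover_user, key=lambda u: levenshtein(word, u))
                cost += levenshtein(word, best)
                leftover_user.remove(best)
            else:
                cost += len(word)            # step 6: unmatched template word

        # Step 5: wildcards absorb any remaining user tokens at zero cost
        if wildcards:
            leftover_user = []

        # Step 6: penalize user words that matched nothing
        cost += sum(len(w) for w in leftover_user)

        # Step 7: scale to a 0-100 score
        total = sum(len(w) for w in static) + sum(len(w) for w in user) or 1
        return max(0.0, 100.0 * (1 - cost / total))

    print(round(similarity("What can I do?", "what can I do"), 1))                # 100.0
    print(round(similarity("custmers for London", "customers for [BRANCH]"), 1))  # ~96.6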

🔌 Wildcard Parameters

Utterances can include wildcard parameters using [PARAMETER] syntax. QIL automatically extracts numeric values from the user's input:

banking.get_customer_info:
    customer info
    info for customer [CUSTOMERID]
    get details for [CUSTOMERID]

User input: "Get details for customer 12345"
QIL extracts: customerid=12345
Function call: banking.get_customer_info(customerid=12345)
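
A small sketch of that extraction step, assuming (per the description above) that wildcard values are numeric and are paired with parameters in order of appearance; the helper name is illustrative:

    import re

    def extract_wildcards(user_text, utterance):
        """Pull numeric values for each [PARAMETER] in a matched utterance."""
        params = re.findall(r"\[(\w+)\]", utterance)
        numbers = re.findall(r"\d+", user_text)
        return {name.lower(): value for name, value in zip(params, numbers)}

    print(extract_wildcards("Get details for customer 12345",
                            "get details for [CUSTOMERID]"))
    # {'customerid': '12345'}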

🔄 The "Failover to LLM" Workflow

When a tool fails (unavailable, missing parameters, invalid input), QIL gracefully falls back to the original behavior. The unmodified sentence is kept in the prompt and sent to the LLM. There is no loss—just a shift from local execution to remote LLM processing.
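
A rough sketch of that fallback, assuming a hypothetical match record carrying the resolved function name and parameters (the names and structure are illustrative):

    def process_segment(segment, match, tools):
        """If the matched tool fails for any reason, keep the original segment
        so it is passed through to the LLM unchanged."""
        try:
            func = tools[match["function"]]                  # hypothetical lookup table
            return func(**match.get("params", {})), True     # replaced locally
        except Exception:
            return segment, False                            # keep original text for the LLM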

⚡ Complete Replacement: Zero LLM Calls

If QIL successfully handles every segment of a user's prompt with local tools, there is no need to call the LLM at all. The augmented prompt becomes the final response, resulting in:

  Zero LLM/API tokens consumed for the request
  A near-instant response with no network round trip
  A query that is handled entirely on the client

🎯 Key Benefits

  Reduced token usage and API cost by answering predictable queries locally
  Faster responses, since matched segments are handled without a network call
  Fuzzy, intent-based routing that tolerates typos and varied phrasing
  Graceful failover: anything a tool cannot handle is sent to the LLM unchanged
  Complements RAG and MCP rather than replacing them
  Simple, human-readable configuration

🛠️ Use Cases

1. Intent-Based Command Routing

Replace rigid if-else command parsing with fuzzy intent matching. Instead of checking for exact strings like "/help", map all variations to functions: "help", "help me", "what can I do", "I need assistance", etc.

2. Customer Service Chatbots

Handle common queries locally (hours, locations, account balance) while routing complex queries to the LLM. Reduces API costs for high-traffic endpoints.

3. Data Dashboard Queries

Map natural language questions like "what were sales yesterday" or "show me revenue" to database queries. Execute locally, return results instantly.
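
For example, a mapping like the following (hypothetical module and function names) would route those questions to local reporting functions:

    # qil.conf (hypothetical entries for a reporting dashboard)
    reports.get_sales:
        what were sales yesterday
        show me yesterdays sales

    reports.get_revenue:
        show me revenue
        what is the revenue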

4. Multi-Language Command Interfaces

Define utterances in multiple languages for the same function. QIL handles routing without complex NLP pipelines.
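
Since utterances are just lines in qil.conf, a multilingual mapping can be as simple as the following illustrative entry (English, Spanish, French, German):

    # Same help function, utterances in several languages
    formatting.format_help:
        help
        give me help
        ayuda
        aide
        hilfe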

🚀 Getting Started

  1. Create a qil.conf file with your function mappings
  2. Import the QIL module: from qil import qil
  3. Process user input: augmented_prompt, all_replaced = qil(user_input)
  4. If all_replaced is True, return augmented_prompt directly
  5. Otherwise, send augmented_prompt to your LLM
from qil import qil

user_prompt = "What time is it? Help me with my account."
augmented_prompt, fully_handled = qil(user_prompt, "lib/qil.conf")

if fully_handled:
    # All queries were handled locally
    print(augmented_prompt)
else:
    # Send augmented prompt to LLM
    response = llm.complete(augmented_prompt)
    print(response)

📄 License

This project is open source and available under the MIT License.