AI Studio · Private LLM

Ship an AI agent that actually knows your business

AI Studio is where you compose your model, knowledge, tools and guardrails into a production-grade assistant. Bring your own provider — OpenAI, Anthropic, Google, Mistral, or open-weight Llama on your GPUs — wire in retrieval over your docs, expose your existing APIs as tools, and deploy to WhatsApp, web chat, email or MCP clients in minutes.

  • BYO model: GPT, Claude, Gemini, Llama
  • RAG with vector embeddings out of the box
  • Tool-calling and MCP server included
  • Per-tenant guardrails, PII redaction, audit log

The problem

Generic chatbots, custom-built copilots, and the gap between them

There are two ways to ship AI in production today, and both are broken. Option one is a vendor chatbot that answers FAQs from a scraped sitemap — it cannot place an order, check a balance, or modify a subscription. It is a glorified search box. Option two is a six-month internal build where engineering wraps OpenAI, writes the embedding pipeline, assembles a tool layer, adds rate limiting, stands up an evaluation harness, and ships an MVP just as the model landscape shifts underneath it.

The gap is where most operators live. They need an agent that knows their catalog, their pricing rules, their refund policy, and can actually act on those — issue a refund, escalate a ticket, book a slot — not just chat about them. They need this in weeks, not quarters. They need it auditable, multi-tenant, swappable across providers, and cheap to evaluate.

AI Studio is built for that gap. Pick a model, point at your knowledge sources, declare your tools, set your guardrails, and you have a deployable agent. Swap the model next quarter without rewriting your tools. Switch providers without re-embedding your docs. Ship the same agent to WhatsApp, web chat and your internal Slack with one config.

What it is

AI Studio, in depth.

AI Studio is a composition surface for production AI agents. The unit is an Assistant: a configured combination of model, system prompt, knowledge sources, tools, and guardrails. You can have many assistants per tenant — a sales agent, a support agent, an internal HR bot — each with its own scope and personality. Each assistant is versioned, evaluated, and deployable to any channel through SabFlow nodes or our API.

Retrieval is first-class. Connect a knowledge source (PDF folder, Notion workspace, Google Drive, public website, custom database query) and AI Studio handles chunking, embedding (using your provider or our local model), vector storage, and re-ranking. Every assistant response includes citation links back to the source chunks, and you can see retrieval quality in the eval harness. Updating a doc re-embeds only the changed chunks — incremental, fast, cheap.
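The chunking step above can be sketched in a few lines. This is a simplified illustration, not AI Studio's actual ingestion code: whitespace "tokens" stand in for a real model tokenizer, and the window and overlap sizes are arbitrary.

```python
# Token-aware chunking with overlap, the kind of step that runs during
# ingestion. Each window shares `overlap` tokens with its neighbour so
# retrieval doesn't lose context at chunk boundaries.

def chunk(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    tokens = text.split()          # stand-in for a real tokenizer
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break                  # the end of the doc is already covered
    return chunks

doc = ("word " * 450).strip()
pieces = chunk(doc)
print(len(pieces))  # 3 overlapping windows cover the 450-token doc
```

A production chunker would add semantic boundaries (headings, paragraphs) on top of the token window, which is what "token-aware, semantic" refers to.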

Tool-calling turns the assistant from a talker into a doer. Declare a tool — `get_order_status`, `issue_refund`, `book_slot` — with a JSON schema and a backing handler (HTTP endpoint, SabFlow, or built-in CRM action). The model decides when to call the tool based on the conversation. You see every tool call in the trace, can require human approval for sensitive operations, and rate-limit per assistant or per contact. The MCP server exposes these same tools to external AI clients (Claude Desktop, Cursor, etc.) over the Model Context Protocol.
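A tool declaration pairs a JSON-schema contract with a backing handler. The shape below is illustrative — field names like `requires_approval` and the stubbed handler are assumptions, not AI Studio's exact API:

```python
# Illustrative tool declaration: a JSON-schema contract the model sees,
# plus a backing handler the runtime invokes when the model calls the tool.

issue_refund = {
    "name": "issue_refund",
    "description": "Refund an order, fully or partially.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number", "minimum": 0},
            "reason": {"type": "string"},
        },
        "required": ["order_id", "amount"],
    },
    "requires_approval": True,  # pause for human sign-off before executing
}

def handle_issue_refund(args: dict) -> dict:
    # In production this would hit your payments API; stubbed here.
    return {"status": "queued", "order_id": args["order_id"]}

print(handle_issue_refund({"order_id": "ORD-1042", "amount": 499.0}))
```

The schema is what makes the tool portable: the same declaration drives the model's function-calling, the approval gate, and the MCP exposure described below.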

Safety and observability are non-negotiable. Every assistant ships with configurable guardrails: PII redaction on inputs (Aadhaar, PAN, credit cards, emails), output filters (no medical advice, no financial recommendations), refusal policies, and token budgets per conversation. The audit log captures every prompt, every tool call, every response — exportable for compliance review. For India deployments we honor DPDP requirements; for EU, GDPR; for healthcare, basic HIPAA-aligned redaction.
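Input-side PII redaction of the kind described above can be sketched with pattern matching. These patterns are deliberately simplified — real filters validate Aadhaar checksums and card numbers via Luhn, and these regexes are illustrative, not AI Studio's actual rules:

```python
import re

# Simplified input-side PII redaction: replace matches with a typed
# placeholder before the text ever reaches the model or the logs.
PATTERNS = {
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    "PAN":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "EMAIL":   re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

msg = "My PAN is ABCDE1234F and my email is priya@example.com"
print(redact(msg))
# My PAN is [PAN] and my email is [EMAIL]
```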

Capabilities

Everything you get with AI Studio.

7 capabilities
01

BYO model with provider abstraction

Switch between OpenAI (GPT-4, GPT-4o), Anthropic (Claude Sonnet, Opus, Haiku), Google (Gemini Pro, Flash), Mistral, Cohere, or self-hosted Llama / Qwen / DeepSeek behind a unified interface. Same prompt, same tools, different backend. Swap providers per assistant or A/B test two side by side.

02

Retrieval Augmented Generation

Ingest PDFs, web pages, Notion, Google Drive, S3, GitHub wikis, or a SQL query result. We handle chunking (token-aware, semantic), embedding, vector storage and re-ranking. Every response cites the chunks it drew from. Re-index incrementally as sources change.

03

Tool-calling and function execution

Declare tools with JSON schema. Backing handler can be an HTTP endpoint, a SabFlow, or a built-in CRM action (create_lead, move_stage, add_tag). The model decides when to call. Required-approval mode pauses execution for human sign-off on sensitive tools.

04

MCP server included

Every assistant is automatically exposed as an MCP server endpoint. Connect Claude Desktop, Cursor, Zed or any MCP client and the tools you defined for WhatsApp work in your IDE. One source of truth for AI actions across customer and team-facing surfaces.

05

Evaluation harness

Upload a CSV of (input, expected_output) pairs or curate from real conversations. Run an eval against any model and prompt combination. See win-rate, latency, cost and failure modes side-by-side. Block deploys that regress eval scores below the threshold.

06

Guardrails and PII redaction

Input filters strip Aadhaar, PAN, GST numbers, credit cards, IBAN, US SSN, emails and phone numbers before they reach the model. Output filters block policy violations. Token budgets cap runaway conversations. All configurable per assistant.

07

Audit log and trace replay

Every assistant invocation captures the full trace: input message, retrieved chunks, system prompt, model response, tool calls, final output, latency, cost. Export for compliance, replay for debugging, or pipe into your observability stack via webhook.

Use cases

Built for the way teams actually work.

D2C
Case 01

D2C product assistant on WhatsApp

Assistant indexes the product catalog, ingredient docs and review summaries. Tools include `find_products`, `check_stock`, `add_to_cart`. A customer asks "which face wash for oily skin under ₹500" and the agent searches, filters, and replies with three options and add-to-cart buttons. Conversion lifts 2-3× over static catalog browsing.

SaaS
Case 02

Internal HR bot for mid-market SaaS

Indexed on policy docs, leave calendar API and payroll system. Tools include `request_leave`, `download_payslip`, `check_balance`. Deployed to Slack and the company intranet. Cuts HR ticket volume by 70% and gives the policy team a metric for where docs are unclear.

Financial Services
Case 03

Loan officer copilot for NBFC

Agent ingests product factsheets and regulatory rules. Tools include `pull_credit_report`, `calculate_emi`, `start_application`. PII redaction strips Aadhaar before logging. Required-approval mode forces human sign-off on `start_application`. Trace export feeds RBI audit reports.

Education
Case 04

University admissions counsellor

Assistant indexes course catalogs, fee structures, and admission timelines in five languages. Tools include `book_campus_visit`, `request_brochure`, `connect_counsellor`. Deployed on WhatsApp and the .edu site. Handles 12,000 monthly applicants with two human counsellors on standby.

Logistics
Case 05

Logistics support agent

Knowledge sources include the shipment SOP wiki and the tracking API. Tools include `track_shipment`, `raise_dispute`, `request_redelivery`. Deflects 80% of "where is my package" tickets while preserving the option for hand-off to a human for genuinely stuck shipments.

How it works

From signup to a live assistant in minutes.

AI Studio is included on every SabNode workspace. No separate billing, no extra setup — flip it on from your workspace settings.

  1. Pick a model and configure

    Choose your provider, paste an API key (or use SabNode-managed credits), set temperature, max tokens, and the system prompt. Start with a template — sales, support, internal — and customise.

  2. Ingest knowledge sources

    Connect Drive, Notion, S3, a website, or upload PDFs. AI Studio chunks, embeds and stores vectors. Initial ingestion runs in the background and surfaces progress per source.

  3. Declare tools and guardrails

    Add tools with JSON schema and backing handler. Toggle guardrails: PII redaction, output filters, token budgets, refusal patterns. Mark sensitive tools as approval-required.

  4. Evaluate before shipping

    Run the eval harness against a curated set. See win-rate, cost and latency. Iterate on prompt, model or tools until the assistant clears your threshold.

  5. Deploy to channels

    Drop an AI Generate node in a SabFlow, expose the MCP endpoint, or call the assistant API directly. Same assistant, multiple surfaces — WhatsApp, Web Chat, Slack, IDE.

Plays well with

Works with the tools you already ship on.

OpenAI · Anthropic · Google Gemini · Mistral · Meta Llama · Pinecone · Notion · Google Drive

Frequently asked

Questions about AI Studio.

Can't find what you're looking for? Talk to our team.

How do I choose between GPT, Claude, Gemini and Llama?
The eval harness is the answer. Upload 50-200 representative (input, expected) pairs from your real conversations and run the same prompt against all four models. Compare win-rate, p50 latency and cost per thousand tokens. For Indian-language support, Gemini Flash and GPT-4o-mini tend to lead on price-performance. For complex reasoning, Claude Sonnet often wins. For air-gapped on-prem, Llama 3.3 or Qwen 2.5 on your GPUs.
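A toy version of that comparison, with canned model outputs standing in for real API calls and exact-match scoring standing in for a judge model:

```python
# Minimal eval sketch: score each model's outputs against expected
# answers and report a win-rate. Real harnesses also track latency,
# cost, and failure modes per case.

cases = [
    ("reset my password", "send_reset_link"),
    ("where is order 42", "track_order"),
    ("cancel my plan", "cancel_subscription"),
]

runs = {  # canned outputs, one per case, in case order
    "model_a": ["send_reset_link", "track_order", "offer_discount"],
    "model_b": ["send_reset_link", "track_order", "cancel_subscription"],
}

for model, outputs in runs.items():
    wins = sum(out == expected for (_, expected), out in zip(cases, outputs))
    print(f"{model}: win-rate {wins / len(cases):.0%}")
# model_a: win-rate 67%
# model_b: win-rate 100%
```
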
Does my data train the model?
No. We send requests to providers with their no-training flags enabled (OpenAI zero-retention, Anthropic no-train, Gemini Vertex). For maximum control, deploy an open-weight model on your own infrastructure; AI Studio talks to it over a private endpoint. The audit log gives you provable evidence of every byte sent.
How does MCP integration work?
Every assistant exposes an MCP endpoint at `mcp.sabnode.com/{tenant}/{assistant}`. Add it to Claude Desktop, Cursor, Zed or any MCP-compatible client with one config line and bearer auth. The tools you defined become callable from the IDE — your engineers can query CRM data, trigger flows or check shipment status from inside their editor with the same auth and audit trail.
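In a client like Claude Desktop, wiring that endpoint up might look like the sketch below. The `mcp-remote` bridge, the `--header` flag, and the token placeholder are assumptions about one common setup — check your client's documentation for the exact shape:

```json
{
  "mcpServers": {
    "sabnode-support-agent": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.sabnode.com/{tenant}/{assistant}",
        "--header",
        "Authorization: Bearer <your-token>"
      ]
    }
  }
}
```
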
What happens when a tool call fails?
Three configurable behaviors. (1) Retry with backoff up to N times. (2) Return the error to the model so it can recover gracefully (recommended for non-critical tools). (3) Stop and hand off to a human agent in the inbox with the partial conversation. You set this per tool. Failed calls land in the trace log with the full request and response for debugging.
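Behavior (1) can be sketched as a retry loop with exponential backoff. The handler signature and result shape below are illustrative, not AI Studio's actual runtime:

```python
import time

# Retry a tool handler with exponential backoff; after the final attempt,
# surface the error so the model (or a human) can recover.

def call_tool(handler, args, retries: int = 3, base_delay: float = 0.01):
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": handler(args)}
        except Exception as err:
            if attempt == retries:
                return {"ok": False, "error": str(err)}
            time.sleep(base_delay * 2 ** attempt)  # 10ms, 20ms, 40ms...

attempts = []
def flaky(args):
    attempts.append(1)
    if len(attempts) < 3:           # fail twice, then succeed
        raise TimeoutError("upstream timeout")
    return {"shipment": args["id"], "status": "in_transit"}

print(call_tool(flaky, {"id": "SHP-9"}))
# {'ok': True, 'result': {'shipment': 'SHP-9', 'status': 'in_transit'}}
```
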
Can I prevent the agent from giving medical or financial advice?
Yes, through three layers. (1) System prompt instructions — "you are not a doctor, refer to a professional". (2) Output filter rules that match phrases and refuse or redact. (3) An optional moderation step where the response runs through a second model that classifies and approves. For regulated industries we recommend all three plus required-approval on any tool that triggers a transaction.
How are RAG sources kept fresh?
Each source has an ingestion schedule — manual, hourly, daily, or webhook-triggered. Incremental ingestion detects changed docs by hash and only re-embeds delta chunks, so a 10,000-page knowledge base updates in minutes when one page changes. Failed ingestions raise alerts. For real-time data (live inventory, shipment status), use a tool call instead of RAG.
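The hash-based delta detection works roughly like this sketch — the store shapes and chunk keys are illustrative assumptions:

```python
import hashlib

# Incremental re-ingestion: re-embed only the chunks whose content hash
# changed since the last run; unchanged chunks keep their stored vectors.

def digest(chunk: str) -> str:
    return hashlib.sha256(chunk.encode()).hexdigest()

previous = {  # hashes recorded at the last ingestion
    "pricing.md#0": digest("Pro plan: $49/mo"),
    "pricing.md#1": digest("Enterprise: contact sales"),
}

current_chunks = {
    "pricing.md#0": "Pro plan: $59/mo",           # edited since last run
    "pricing.md#1": "Enterprise: contact sales",  # unchanged
}

stale = [key for key, text in current_chunks.items()
         if previous.get(key) != digest(text)]
print(stale)  # only the edited chunk goes back to the embedder
# ['pricing.md#0']
```
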
What does it cost to run an AI Studio agent at scale?
Three components: model inference (passed through at provider rates plus a small uplift), embeddings (one-time per chunk, then on updates), and vector storage (included up to 10M vectors on Pro plans). For a typical D2C tenant doing 30,000 AI conversations per month on Gemini Flash with RAG, costs land between ₹8,000 and ₹15,000. The eval harness lets you forecast before scaling.
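A back-of-envelope model of the inference component, with every rate a deliberately illustrative placeholder (a blended ₹/1K-token figure including uplift, not SabNode or Google pricing) chosen to land inside the quoted band:

```python
# Rough monthly inference cost for the D2C scenario above.
# All token counts and rates are illustrative assumptions.

conversations = 30_000
turns_per_conv = 4
tokens_in, tokens_out = 1_200, 300   # per turn, incl. retrieved RAG context
rate_in, rate_out = 0.05, 0.08       # assumed blended ₹ per 1K tokens

turns = conversations * turns_per_conv
inference = (turns * tokens_in / 1000) * rate_in \
          + (turns * tokens_out / 1000) * rate_out

print(f"₹{inference:,.0f}/month inference")
# ₹10,080/month inference
```

Embeddings and vector storage add on top, which is how a tenant ends up in the ₹8,000-15,000 range overall.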
AI Studio · Private LLM

Ship AI Studio into production this week.

No credit card. No sales call required. Spin up a workspace, plug in a number, and your team is live in under an hour.