AI Agents Platform¶
Overview¶
Our platform allows the creation and management of AI Personas: customizable virtual assistants that can interact with users through natural voice conversations. Each persona can be specialized for a specific use case (e.g., weather assistant, receptionist, sales support) and configured with flexible speech, language, and tool integrations.
Persona Profile¶
The Profile section defines the identity and role of the persona:
- Name: A unique identifier for the persona. Example: Carla.
- Job: The role of the assistant. Example: Weather Assistant.
- Description: A short explanation of the assistant’s purpose.
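As an illustration only, the three profile fields could be modeled as a small record. The class name `PersonaProfile`, the description text, and the empty-name check are assumptions made for this sketch, not part of the platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaProfile:
    """Hypothetical container for the three Profile fields."""

    name: str         # unique identifier, e.g. "Carla"
    job: str          # role of the assistant, e.g. "Weather Assistant"
    description: str  # short explanation of the assistant's purpose

    def __post_init__(self) -> None:
        # Assumption: a blank name cannot serve as a unique identifier.
        if not self.name.strip():
            raise ValueError("name must not be empty")


carla = PersonaProfile(
    name="Carla",
    job="Weather Assistant",
    description="Answers callers' questions about the current weather.",
)
```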
Speech Engine¶
The Speech Engine defines how the assistant processes audio interactions. There are two main modes available:
1. Single Real-Time Model¶
In this mode, a single provider/model handles the entire conversational flow (speech recognition, reasoning, and voice synthesis) in real time.
- Advantages:
- Lower latency
- Simpler setup
- Useful for assistants that need fast, natural dialogue
- Configuration:
- Provider: Choose from supported vendors (e.g., Google)
- Model: Select a real-time multimodal model (e.g., Gemini 2.5 Flash Preview Native Audio Dialog)
2. Modular Pipeline¶
This mode splits the speech process into separate components, giving maximum flexibility and control.
- Components:
- STT (Speech-to-Text): Converts spoken input into text (e.g., Deepgram Nova-3)
- LLM (Language Model): Processes the text and generates responses (e.g., OpenAI GPT-4o Mini)
- TTS (Text-to-Speech): Converts text back into audio for the user (e.g., OpenAI GPT-4o Mini TTS)
- Advantages:
- Freedom to select the best provider for each stage
- Fine-grained customization (accuracy, cost, and performance tuning)
- Easier integration with external tools
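The data flow of the modular pipeline can be sketched as three interchangeable stages chained together. This is a minimal sketch, not the platform's implementation: `run_turn` and the `fake_*` stages are hypothetical stand-ins that only encode and decode text, where a real deployment would call the chosen STT, LLM, and TTS providers.

```python
from typing import Callable

# Each stage is a plain callable, so any provider can be swapped in per stage.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_turn(
    audio_in: bytes,
    stt: SpeechToText,
    llm: LanguageModel,
    tts: TextToSpeech,
) -> bytes:
    """Process one conversational turn: caller audio in, assistant audio out."""
    transcript = stt(audio_in)    # STT: spoken input -> text
    reply_text = llm(transcript)  # LLM: text -> response text
    return tts(reply_text)        # TTS: response text -> audio


# Hypothetical stand-in stages that treat the audio as UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

Because the stages only share simple input and output types, replacing one provider does not affect the other two.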
Voice¶
The voice identity defines how the persona sounds in conversations.
- Provider: Currently only available with OpenAI. Demo: https://www.openai.fm/
- Voice Selection: Choose from OpenAI’s voice models.
- Vibe: Adjusts the personality style of the assistant (e.g., casual, formal, empathetic)
Prompt and LLM Tools¶
Definitions¶
What is a Prompt?
A prompt is the set of instructions that tells a Large Language Model (LLM) how it should behave. It defines the assistant’s role, tone, and rules. For example:
- “You are a phone assistant that helps callers with office availability and weather information.”
- “Always greet the caller politely.”
What is an LLM Tool?
An LLM tool (sometimes called a function or API call) is an external capability that the model can use to get real data or perform an action.
Tools are like actions the assistant can take when the prompt tells it to:
- The prompt explains when to use the tool.
- The tool provides the real answer or action.
- The assistant then translates that into a natural response for the user.
Example to join the dots¶
Prompt:
You are a phone assistant that helps callers with office availability and weather information.
- Always greet the caller politely.
- If the caller asks whether the office is open, call is_office_open.
- If the office is open, say: "The office is currently open."
- If the office is closed, say: "The office is currently closed."
- If the caller asks about the weather in a specific city, call get_weather with the city as the parameter, then tell the caller the result in simple, friendly words.
- If the caller needs to be transferred to another number, call transfer_call with the provided destination number.
- Never invent information; always use the available tools.
- Always answer in a natural, conversational tone suitable for phone calls.
Flow cases¶
User: "Is your office open today?"
Assistant (execute LLM tool): call is_office_open
Tool returns: { "open": true }
Assistant: "Yes, the office is currently open."

User: "What’s the weather like in New York?"
Assistant (execute LLM tool): call get_weather with "New York"
Tool returns: { "temperature": 72, "condition": "Cloudy" }
Assistant: "In New York it’s 72 degrees and cloudy right now."

User: "Please transfer me to 123456789."
Assistant (execute LLM tool): call transfer_call with "123456789"
Assistant: "Sure, I’ll transfer your call now."
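The flow cases can be approximated with a small dispatch table that maps tool names to functions. This is a sketch under stated assumptions: the tool bodies are stubs returning the sample values from the cases, the return value of `transfer_call` is invented because the example shows none, and `execute_tool` and `phrase_weather` are hypothetical helpers rather than platform functions.

```python
from typing import Any, Callable, Dict


def is_office_open() -> Dict[str, Any]:
    # Stub: a real tool would check a calendar or opening hours.
    return {"open": True}


def get_weather(city: str) -> Dict[str, Any]:
    # Stub: a real tool would query a weather service for `city`.
    return {"temperature": 72, "condition": "Cloudy"}


def transfer_call(destination: str) -> Dict[str, Any]:
    # Assumption: the source shows no return value for this tool.
    return {"transferred": True, "destination": destination}


# Registry of the tools the model is allowed to call, keyed by tool name.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "is_office_open": is_office_open,
    "get_weather": get_weather,
    "transfer_call": transfer_call,
}


def execute_tool(name: str, **arguments: Any) -> Dict[str, Any]:
    """Run the tool the model asked for and return its raw result."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


def phrase_weather(city: str, result: Dict[str, Any]) -> str:
    """Turn a raw weather result into a spoken-style sentence."""
    condition = str(result["condition"]).lower()
    return f"In {city} it's {result['temperature']} degrees and {condition} right now."


weather = execute_tool("get_weather", city="New York")
reply = phrase_weather("New York", weather)
```

In a real conversation the LLM, not hand-written code, decides which tool to call and how to phrase the result; the helper above only mimics that last step.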
Note: it’s good practice to use snake_case (lodash style) when naming LLM tools, as shown in the example.
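One way to enforce the naming convention is a simple pattern check before registering a tool. The helper `is_snake_case` and its exact rule (lowercase letters and digits, separated by single underscores, starting with a letter) are assumptions for this sketch; the platform may accept other names.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if `name` follows the snake_case convention."""
    return SNAKE_CASE.fullmatch(name) is not None


valid = [n for n in ("is_office_open", "getWeather", "transfer_call") if is_snake_case(n)]
```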