
AI Provider

Config

This section defines how to configure the AI Provider connection, including media parameters and WebSocket connection details.

Fields

  • name (string): Display name of the AI Provider.
  • media (object): Specifies the audio encoding and sample rate.
  • websocket (object): Defines the WebSocket endpoint and optional headers used for authentication or custom configuration.

media object

  • audioFormat (string): Audio encoding. Supported formats are:
      • "pcm16": sample rate 24000 Hz
      • "g711_ulaw": sample rate 8000 Hz
      • "opus": sample rate 48000 Hz
  • sampleRate (number): Must match the selected audio format.

websocket object

  • url (string): Secure WebSocket URL used to connect to the AI Provider. Example: wss://cad.fonouc.com:9443/ws/v1/echo
  • headers (object, optional): Key–value pairs of headers to include in the connection request. Commonly used for authorization tokens or custom metadata.
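As a rough illustration of the fields above, the following Python sketch checks a provider config before it is used. It assumes the config is a single object with name, media, and websocket keys as listed; the validate_config helper, the display name, and the Authorization header value are hypothetical and not part of the documented API.

```python
from urllib.parse import urlparse

# Sample rates taken from the media object description above.
SAMPLE_RATES = {"pcm16": 24000, "g711_ulaw": 8000, "opus": 48000}


def validate_config(config: dict) -> list:
    """Return a list of problems found in a provider config (empty if none)."""
    problems = []
    name = config.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")

    media = config.get("media") or {}
    fmt = media.get("audioFormat")
    if fmt not in SAMPLE_RATES:
        problems.append(f"unsupported audioFormat: {fmt!r}")
    elif media.get("sampleRate") != SAMPLE_RATES[fmt]:
        problems.append(f"sampleRate for {fmt} must be {SAMPLE_RATES[fmt]}")

    ws = config.get("websocket") or {}
    # The url field is described as a secure WebSocket URL, so require wss.
    if urlparse(str(ws.get("url") or "")).scheme != "wss":
        problems.append("websocket.url must use the wss:// scheme")
    # headers is optional; when present it must be a key-value object.
    if not isinstance(ws.get("headers", {}), dict):
        problems.append("websocket.headers must be an object")
    return problems


example_config = {
    "name": "Echo Provider",  # hypothetical display name
    "media": {"audioFormat": "g711_ulaw", "sampleRate": 8000},
    "websocket": {
        "url": "wss://cad.fonouc.com:9443/ws/v1/echo",
        "headers": {"Authorization": "Bearer <token>"},  # placeholder value
    },
}
```

Calling validate_config(example_config) returns an empty list; a config that pairs "pcm16" with a sample rate of 8000 would yield one problem entry.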

Events API

This section defines the event types exchanged over the WebSocket connection between the AI Provider and the backend. Each message is a JSON object containing an "event" field that specifies its type.


Event: start

Sent once at the beginning of the WebSocket session to initialize the call context.

Fields

  • event (string): Must be "start".
  • start (object): Contains call initialization data.

start object

  • callId (string): Unique identifier of the call.
  • otherLegCallId (string): Unique identifier of the other leg of the call.
  • callerIdNumber (string): E.164-formatted phone number of the caller.
  • calleeIdNumber (string): E.164-formatted phone number of the callee.
  • accountId (string): Account associated with the call.
  • mediaFormat (object): Defines the audio encoding and sample rate used for the call.

mediaFormat object

  • encoding (string): Audio encoding format. Supported values are:
      • "pcm16": sample rate 24000 Hz
      • "g711_ulaw": sample rate 8000 Hz
      • "opus": sample rate 48000 Hz
  • sampleRate (number): Must match the selected encoding (24000 for pcm16, 8000 for g711_ulaw, 48000 for opus).

Example

{
  "event": "start",
  "start": {
    "callId": "call_12345",
    "otherLegCallId": "call_67890",
    "callerIdNumber": "+1234567890",
    "calleeIdNumber": "+1234567891",
    "accountId": "acc_abc123",
    "mediaFormat": {
      "encoding": "pcm16",
      "sampleRate": 24000
    }
  }
}
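One way an AI Provider might consume this message is sketched below in Python. The CallContext class and the parse_start function are hypothetical names chosen for illustration; the only behavior taken from this document is the field layout and the rule that sampleRate must match the encoding.

```python
import json
from dataclasses import dataclass

# Encoding-to-rate pairs from the mediaFormat object description.
EXPECTED_RATES = {"pcm16": 24000, "g711_ulaw": 8000, "opus": 48000}


@dataclass(frozen=True)
class CallContext:
    call_id: str
    other_leg_call_id: str
    caller: str
    callee: str
    account_id: str
    encoding: str
    sample_rate: int


def parse_start(raw: str) -> CallContext:
    """Parse a start message and check that sampleRate matches the encoding."""
    message = json.loads(raw)
    if message.get("event") != "start":
        raise ValueError("not a start event")
    start = message["start"]
    fmt = start["mediaFormat"]
    encoding, rate = fmt["encoding"], fmt["sampleRate"]
    if EXPECTED_RATES.get(encoding) != rate:
        raise ValueError(f"sampleRate {rate} does not match encoding {encoding!r}")
    return CallContext(
        call_id=start["callId"],
        other_leg_call_id=start["otherLegCallId"],
        caller=start["callerIdNumber"],
        callee=start["calleeIdNumber"],
        account_id=start["accountId"],
        encoding=encoding,
        sample_rate=rate,
    )
```

Rejecting a mismatched sample rate at this point means later audio handling can rely on the negotiated format without re-checking it per frame.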

Event: media

Carries audio data between the backend and the AI Provider. Multiple messages of this type are exchanged during the call.

Fields

  • event (string): Must be "media".
  • media (object): Contains the encoded audio payload.

media object

  • payload (string): Base64-encoded audio data.

Example

{
  "event": "media",
  "media": {
    "payload": "GkXfo59ChoEBQveBAULygQRC84EIQoKEd2VibUKHgQCh..."
  }
}
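The payload round trip can be sketched in Python as follows. The helper names encode_media and decode_media are hypothetical; the sketch only assumes what is stated above, namely that the payload is Base64-encoded audio carried inside a JSON message.

```python
import base64
import json


def encode_media(audio: bytes) -> str:
    """Wrap raw audio bytes in a media event serialized as JSON text."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})


def decode_media(raw: str) -> bytes:
    """Extract the raw audio bytes from a media event."""
    message = json.loads(raw)
    if message.get("event") != "media":
        raise ValueError("not a media event")
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(message["media"]["payload"], validate=True)
```

An echo-style provider could, for instance, pass each incoming message through decode_media and send the result back with encode_media.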

Event: dtmf

Sent when the caller presses a key on their phone keypad during the call. DTMF (Dual-Tone Multi-Frequency) tones are used for interactive voice response (IVR) navigation, entering PIN codes, or other user input scenarios.

Fields

  • event (string): Must be "dtmf".
  • dtmf (object): Contains the DTMF digit information.

dtmf object

  • callId (string): Unique identifier of the call.
  • otherLegCallId (string): Unique identifier of the other leg of the call.
  • digit (string): The DTMF digit pressed. Valid values are:
      • "0" through "9": numeric digits
      • "*": asterisk/star key
      • "#": pound/hash key
      • "A", "B", "C", "D": extended DTMF tones (less common)
  • duration (number, optional): Duration of the key press in milliseconds.

Example

{
  "event": "dtmf",
  "dtmf": {
    "callId": "call_12345",
    "otherLegCallId": "call_67890",
    "digit": "5",
    "duration": 100
  }
}

Use Cases

  • IVR Navigation: The AI Provider can use DTMF events to let the caller select menu options (e.g., "Press 1 for sales, Press 2 for support").
  • PIN Entry: Capture secure numeric input from the caller.
  • Call Transfer: Use specific key combinations to trigger call transfers or other actions.
  • Interruption Handling: Detect when a caller wants to interrupt the AI agent by pressing a key.
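The PIN entry use case could be handled along these lines. This is a minimal Python sketch, not a prescribed implementation: the PinCollector class is hypothetical, and treating "#" as the terminator and "*" as a clear key is an assumed convention that this document does not define.

```python
from typing import List, Optional

# Digits listed as valid in the dtmf object description.
VALID_DIGITS = set("0123456789*#ABCD")


class PinCollector:
    """Accumulates numeric digits until "#" is pressed; "*" clears the buffer."""

    def __init__(self) -> None:
        self._digits: List[str] = []

    def feed(self, message: dict) -> Optional[str]:
        """Consume one message; return the collected PIN once "#" arrives."""
        if message.get("event") != "dtmf":
            return None
        digit = message["dtmf"]["digit"]
        if digit not in VALID_DIGITS:
            raise ValueError(f"invalid DTMF digit: {digit!r}")
        if digit == "#":  # assumed terminator key
            pin, self._digits = "".join(self._digits), []
            return pin
        if digit == "*":  # assumed clear key
            self._digits.clear()
        elif digit.isdigit():
            self._digits.append(digit)
        # Extended tones "A" to "D" are accepted but ignored in this sketch.
        return None
```

Feeding the digits 1, 2, *, 3, 4, # in order returns None for the first five messages and "34" for the last.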

Event: hangup

Sent when the call ends to notify the AI Provider that the session is terminating. This event is sent immediately before the WebSocket connection is closed.

Fields

  • event (string): Must be "hangup".
  • hangup (object): Contains hangup information.

hangup object

  • callId (string): Unique identifier of the call.
  • otherLegCallId (string): Unique identifier of the other leg of the call.
  • reason (string, optional): The reason for the hangup. Common values include:
      • "NORMAL_CLEARING": normal call termination
      • "USER_BUSY": called party is busy
      • "NO_ANSWER": no answer from called party
      • "CALL_REJECTED": call was rejected
      • "ORIGINATOR_CANCEL": caller hung up before answer

Example

{
  "event": "hangup",
  "hangup": {
    "callId": "call_12345",
    "otherLegCallId": "call_67890",
    "reason": "NORMAL_CLEARING"
  }
}

Use Cases

  • Session Cleanup: The AI Provider can use this event to clean up resources, save conversation logs, or trigger post-call processing.
  • Analytics: Track call duration and termination reasons for reporting purposes.
  • State Management: Gracefully terminate any ongoing AI processing or speech synthesis.
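For the cleanup and analytics use cases, a provider might track per-call state roughly as in the Python sketch below. The CallTracker class is hypothetical, durations are measured on the provider side from receipt of the start event, and "UNKNOWN" is an illustrative label used here when the optional reason field is absent.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional


class CallTracker:
    """Records call start times and tallies hangup reasons."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started: Dict[str, float] = {}
        self.reasons: Counter = Counter()

    def on_start(self, message: dict) -> None:
        self._started[message["start"]["callId"]] = self._clock()

    def on_hangup(self, message: dict) -> Optional[float]:
        """Release per-call state and return the call duration in seconds."""
        hangup = message["hangup"]
        # reason is optional, so fall back to an illustrative placeholder.
        self.reasons[hangup.get("reason", "UNKNOWN")] += 1
        started = self._started.pop(hangup["callId"], None)
        return None if started is None else self._clock() - started
```

Injecting the clock keeps the class testable; in production the default monotonic clock avoids jumps caused by wall-clock adjustments.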

Message Flow Overview

  1. Server → AI Provider: Sends a single start event to begin the session.
  2. Server ↔ AI Provider: Exchanges continuous media events containing audio frames.
  3. Server → AI Provider: Sends dtmf events when the caller presses keypad digits.
  4. Server → AI Provider: Sends a hangup event when the call ends.
  5. Connection Close: Server closes the WebSocket connection after sending the hangup event.
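The flow above can be sketched as a small dispatch loop in Python. The run_session function and the handlers mapping are hypothetical; the sketch relies only on two points from this document: each message carries an "event" field, and its body sits under a key with the same name as the event. It operates on an iterable of JSON strings rather than a live WebSocket so that it stays independent of any particular client library.

```python
import json
from typing import Callable, Dict, Iterable, List


def run_session(
    messages: Iterable[str],
    handlers: Dict[str, Callable[[dict], None]],
) -> List[str]:
    """Dispatch messages by their "event" field until a hangup is seen."""
    seen: List[str] = []
    for raw in messages:
        message = json.loads(raw)
        event = message.get("event")
        # Step 1 of the flow: the session must open with a start event.
        if not seen and event != "start":
            raise ValueError("first message must be a start event")
        seen.append(event)
        handler = handlers.get(event)
        if handler is not None:
            # The body of each message is keyed by the event name.
            handler(message[event])
        # Steps 4 and 5: nothing is processed after hangup.
        if event == "hangup":
            break
    return seen
```

Handlers are optional per event type, so a provider that ignores keypad input can simply leave "dtmf" out of the mapping.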