Spitch CodyFi

Websockets Prototype

Last updated: Oct 8, 2018

Introduction

This page provides beta documentation for the Spitch CodyFi websockets prototype implementation. This prototype is still under active development, and the implementation, API specification and documentation are considered works in progress. Access to the prototype is provided under NDA for collaborative development and partner projects. Issues, feature requests and general comments are welcome and should be directed to joe@spitch.ch. The primary purpose of this limited release is to support parallel development while refining the prototype API specification and implementation.

Under Development

The Spitch CodyFi websockets API and implementation are under active development and subject to change. As of Oct 8, 2018, the live demo is still a beta prototype, so the websockets subprotocol may also change. Authenticated connections do, however, use the official Spitch Lingware Portal SaaS infrastructure.

Live Demo Usage

The live demo provides a working web-client prototype suitable for initializing a streaming websocket connection to the Spitch CodyFi backend for a specified client application.

Supported Browsers

The live demo is known to support the following browsers (latest releases as of Oct 8, 2017):

  • OSX 10.12.x:
    • Firefox (>=57.0b6)
    • Chrome (>=61.0.3163.100)
    • Opera (>=48.0.x)
    • Safari (>=11.0.1)
  • Windows 10:
    • Firefox (>=57.0b6)
    • Chrome (>=61.0.3163.100)
  • iOS 12:
    • Safari 12

The client may also function in Linux desktop, Android and iOS variants of the above browsers, but this has not been verified. There are currently no plans to support the Microsoft Edge browser.

Initializing the stream

The streaming demo is initialized by pressing the Record button. The app then attempts to connect to the websocket server and stream the audio signal to the CodyFi backend. Log, update, and response messages are displayed in the log box in real time.
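This page does not specify how the audio payload is framed on the wire. As a minimal sketch, assuming the client sends raw signed 16-bit little-endian mono PCM at the 8000 Hz rate required by the srate parameter, captured samples could be cut into fixed-length binary frames before being sent over the socket. The helper name pcm_frames and the 20 ms frame length are illustrative choices, not part of the CodyFi API.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 8000  # srate is always 8000 for CodyFi-ws
# Assumption: the backend expects signed 16-bit little-endian mono PCM.


def pcm_frames(samples: Sequence[float], frame_ms: int = 20) -> Iterator[bytes]:
    """Yield binary frames of frame_ms milliseconds (hypothetical helper).

    Float samples in [-1.0, 1.0] are clipped and scaled to int16;
    the final frame may be shorter than frame_ms.
    """
    samples_per_frame = SAMPLE_RATE * frame_ms // 1000
    for start in range(0, len(samples), samples_per_frame):
        chunk = samples[start:start + samples_per_frame]
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
        yield struct.pack(f"<{len(ints)}h", *ints)
```

At 8000 Hz, a 20 ms frame holds 160 samples, which is 320 bytes of 16-bit PCM, so one second of audio yields 50 frames.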

Concurrency

The CodyFi-ws prototype implementation is intended only for testing and development. The backend supports just one streaming session at a time; opening additional simultaneous sessions may result in sluggish response times or unexpected connection drops.

API and Connection Protocol

The websocket connection specifies the spitch-websocket-asr subprotocol. The client-side reference implementation provided for the live demo illustrates the entire end-to-end process: capturing the audio stream, opening a (secure) websocket connection to the server, and negotiating the initialization of a speech-to-text session. Custom client-side processing of response objects can be implemented in the this.ws.onmessage event handler, based on the message types described below.

For the time being, refer to the live demo reference implementation for testing, hacking and development: wsmicrophonestreamer.js and wsclient.js.
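To make the subprotocol negotiation concrete, the sketch below builds the RFC 6455 opening handshake that a non-browser client would send and checks the server's reply. Only the subprotocol name comes from this page; the host and path are placeholders, because the CodyFi endpoint URL is not given here, and the function names are hypothetical.

```python
import base64
import hashlib
import os
from typing import Dict, Optional, Tuple

SUBPROTOCOL = "spitch-websocket-asr"
# Fixed GUID defined in RFC 6455 for computing Sec-WebSocket-Accept
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_request(host: str, path: str,
                    key: Optional[str] = None) -> Tuple[str, str]:
    """Build an HTTP/1.1 upgrade request asking for the ASR subprotocol.

    host and path are placeholders; returns (request_text, key).
    """
    if key is None:
        # RFC 6455: a random 16-byte nonce, base64-encoded
        key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Sec-WebSocket-Protocol: {SUBPROTOCOL}\r\n"
        "\r\n"
    )
    return request, key


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value the server must return for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_ok(key: str, response_headers: Dict[str, str]) -> bool:
    """True if the accept token matches and the server chose our subprotocol."""
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    return (headers.get("sec-websocket-accept") == expected_accept(key)
            and headers.get("sec-websocket-protocol") == SUBPROTOCOL)
```

In a browser, the same negotiation happens automatically when the subprotocol name is passed as the second argument to the WebSocket constructor.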

CodyFi-ws session initialization object

The initial handshake requires the client to send the server a JSON message that configures the CodyFi session. This object identifies the application and the authentication token, and may carry other optional configuration information.

{
  "app":"appid.LANG",
  "srate": 8000,
  "message": "this is an arbitrary message",
  "token": "security-token",
  "riff": {}
}
CodyFi-ws session initialization parameters

  • app (string, required): The CodyFi backend app identifier, composed of the app name and the ISO language and locale code, separated by a '.'.
  • srate (int, required): The sample rate. Always 8000.
  • message (string, optional): Arbitrary text message.†
  • stream-updates (boolean, required): Whether to stream intermediate updates during the session: true|false.
  • token (string, required): The security token used to open a session.
  • continuous (boolean, required): Perform continuous verification: true|false [for future release].
  • riff (object, required): A riff header object.†

†Not currently implemented on the server.
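A minimal sketch of assembling the initialization object from the parameters above. The helper build_init_message is hypothetical. It sends stream-updates and continuous as JSON booleans, following the "stream-updates": true form used in the update object section, and it only checks that the app identifier is a non-empty name and locale joined by '.', since the exact locale format the server accepts is an assumption.

```python
import json
from typing import Any, Dict, Optional


def build_init_message(app: str, token: str, stream_updates: bool = False,
                       continuous: bool = False,
                       message: Optional[str] = None) -> str:
    """Serialize a CodyFi-ws session initialization object (hypothetical helper)."""
    name, sep, locale = app.partition(".")
    if not (name and sep and locale):
        raise ValueError(f"app must look like '<name>.<locale>', got {app!r}")
    if not token:
        raise ValueError("a security token is required")
    init: Dict[str, Any] = {
        "app": app,
        "srate": 8000,                    # the only supported sample rate
        "stream-updates": stream_updates,
        "token": token,
        "continuous": continuous,         # reserved for a future release
        "riff": {},                       # not yet interpreted by the server
    }
    if message is not None:
        init["message"] = message         # optional; not yet used by the server
    return json.dumps(init)
```

The resulting string would be sent as the first text message once the websocket is open.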

CodyFi-ws accepted object

The first message object returned upon successful session creation is the accepted object. It simply notifies the client that the websocket server has completed the setup handshake with the browser:

{
  "state": "accepted",
  "message": "Connection accepted"
}
CodyFi-ws asrconnected object

Once the session initialization handshake is complete, the server attempts to set up for voice data. When this setup is complete, it returns an asrconnected response object indicating that the CodyFi backend is ready to process audio data.

{
  "state": "asrconnected",
  "message": "ready to receive data"
}
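Since asrconnected signals that the backend is ready for audio, a client can hold back audio until it has seen accepted followed by asrconnected. The illustrative class below tracks that ordering on the client side; the class name and the choice to raise on an out-of-order asrconnected message are assumptions, not documented server behavior.

```python
import json


class SessionState:
    """Track the CodyFi-ws setup handshake on the client side (illustrative)."""

    def __init__(self) -> None:
        self.accepted = False
        self.asr_ready = False

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        state = msg.get("state")
        if state == "accepted":
            self.accepted = True
        elif state == "asrconnected" and not self.asr_ready:
            # Later "asrconnected" messages (e.g. updates) are ignored here.
            if not self.accepted:
                raise RuntimeError("asrconnected received before accepted")
            self.asr_ready = True

    @property
    def ready_for_audio(self) -> bool:
        return self.accepted and self.asr_ready
```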
CodyFi-ws update object

If the initialization object sets "stream-updates": true, the server returns intermediate update objects describing the current state of the stream. These responses take the following form:

{
  "state": "asrconnected",
  "message": {
    // Partial ASR results
  }
}
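In the examples on this page, update objects reuse the "asrconnected" state and differ from the asrconnected object only in that message is a JSON object rather than a string. A dispatcher, analogous to a this.ws.onmessage handler, can use that difference to tell them apart. This is a sketch inferred from the examples; the classify function and its "update"/"unknown" labels are hypothetical.

```python
import json
from typing import Any, Tuple


def classify(raw: str) -> Tuple[str, Any]:
    """Return (kind, message) for a CodyFi-ws server message.

    kind is "accepted", "asrconnected", "update", "finished" or "unknown".
    """
    msg = json.loads(raw)
    state = msg.get("state")
    body = msg.get("message")
    # Updates share the "asrconnected" state but carry an object payload.
    if state == "asrconnected" and isinstance(body, dict):
        return "update", body
    if state in ("accepted", "asrconnected", "finished"):
        return state, body
    return "unknown", body
```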
CodyFi-ws results object

The results object is returned when a session terminates. In continuous verification mode, it may also be returned repeatedly mid-session. It describes the recognition outcome: the rtf value (real-time factor) and a result list whose interpretation entries carry the recognized text (input.literal) and a confidence score.

{
  "state": "finished",
  "message": {
    "rtf": 0.923683,
    "result": [
      {
        "interpretation": {
          "input": {
            "literal": "John Doe",
            "mode": "speech"
          },
          "confidence": 1,
          "grammar": "session:request1@form-level.store",
          "instance": {
            "SWI_literal": "John Doe",
            "confidence": 1,
            "SWI_wordTimings": {
              "alignment": {
                "num_segments": 0,
                "unit_msecs": 1,
                "version": "1.0.0",
                "type": "word",
                "segments": []
              }
            },
            "SWI_ssmConfidences": [],
            "SWI_spoken": "John Doe",
            "SWI_meaning": null,
            "SWI_ssmMeanings": []
          }
        }
      }
    ]
  }
}
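A sketch of pulling the recognized text out of a results object, following the field paths in the example above (message.result[].interpretation.input.literal and interpretation.confidence). The example contains a single entry; treating any further entries as alternative hypotheses and sorting them by confidence is an assumption, and the function name is hypothetical.

```python
import json
from typing import List, Tuple


def hypotheses(raw: str) -> List[Tuple[str, float]]:
    """Return (literal, confidence) pairs, highest confidence first."""
    msg = json.loads(raw)
    if msg.get("state") != "finished":
        raise ValueError("not a results object")
    pairs = []
    for entry in msg["message"].get("result", []):
        interp = entry.get("interpretation", {})
        literal = interp.get("input", {}).get("literal", "")
        pairs.append((literal, float(interp.get("confidence", 0.0))))
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```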

Disclaimer

The CodyFi-ws prototype API is provided without warranty and without guarantee that it is fit for any particular purpose. It is intended as a reference prototype for collaborative development. It is currently under active development and is not a finished product or service.

Spitch AG - Prototype Websockets API for Speech-to-Text conversion
