After a successful installation, three new services will be available to run: dynlog, phoneglue, and voiceglue.
They cannot be started successfully, however, until Asterisk and Voiceglue are properly configured.
The phoneglue service needs to log in to the Asterisk manager with username “phoneglue” and password “phoneglue” (configurable with command-line arguments), so configure Asterisk's manager.conf with an entry like this for Asterisk 1.4:
[phoneglue]
secret=phoneglue
read = system,call,log,verbose,command,agent,user
write = system,call,log,verbose,command,agent,user
and like this for Asterisk 1.6:
[phoneglue]
secret=phoneglue
read = system,call,log,verbose,command,agent,user,originate
write = system,call,log,verbose,command,agent,user,originate
Also make sure the [general] section of the same file contains:
enabled=yes
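To confirm that the manager entry works before starting phoneglue, you can log in to the Asterisk Manager Interface (AMI) yourself. The sketch below is a minimal Python check and is not part of voiceglue. It assumes AMI is listening on its default TCP port 5038 on the local host and that the phoneglue/phoneglue credentials above are in place; the function names are hypothetical.

```python
import socket


def build_login_action(username: str, secret: str) -> bytes:
    """AMI actions are CRLF-terminated 'Key: Value' lines ending in a blank line."""
    lines = ["Action: Login", f"Username: {username}", f"Secret: {secret}"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def check_manager_login(host: str = "127.0.0.1", port: int = 5038,
                        username: str = "phoneglue", secret: str = "phoneglue",
                        timeout: float = 5.0) -> bool:
    """Return True if Asterisk answers the Login action with 'Response: Success'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            reader.readline()  # greeting banner, e.g. "Asterisk Call Manager/1.1"
            sock.sendall(build_login_action(username, secret))
            while True:
                line = reader.readline().strip()
                if not line:
                    return False  # end of the response block, or connection closed
                if line.lower() == b"response: success":
                    return True
```

An authentication failure makes check_manager_login() return False; a refused connection (an exception) usually means the manager interface itself is not enabled or not listening on that port.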
The Asterisk dialplan must be used to route calls to voiceglue. The simplest way to do this is a dialplan context like:
[phoneglue]
exten => 1,1,Answer
exten => 1,2,Agi(agi://localhost)
exten => 1,3,Hangup
Here, whenever a call is routed to extension 1 in context phoneglue,
it is first answered, then handed to voiceglue (the Agi command),
and finally hung up.
Parameters can be passed to the VXML script like this:
exten => 1,1,Answer
exten => 1,2,Set(vxmlarg=bar)
exten => 1,3,Agi(agi://localhost/foo=${vxmlarg})
exten => 1,4,Hangup
Here, the value bar will be available in the VXML script as
the value of session-scoped variable session.connection.initargs.foo
and foo=bar will be passed as a URL-encoded argument to the
initial VXML script fetch.
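The simple case above can be illustrated in Python: the text after agi://host/ is read like a URL query string of name=value pairs separated by &, and each pair becomes a session.connection.initargs entry. This is only a sketch of the idea; voiceglue's exact decoding rules are not described here, agi_initargs is a hypothetical name, and urllib.parse.parse_qsl also turns + into a space, which voiceglue may not do.

```python
from urllib.parse import parse_qsl, urlsplit


def agi_initargs(agi_url: str) -> dict:
    """Treat the path after agi://host/ as '&'-separated name=value pairs,
    percent-decoding each value once (illustration only)."""
    path = urlsplit(agi_url).path.lstrip("/")
    return dict(parse_qsl(path, keep_blank_values=True))


print(agi_initargs("agi://localhost/foo=bar"))  # {'foo': 'bar'}
```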
As a larger example:
exten => 1,1,Answer
exten => 1,2,Set(vxmlurl=http%3A%2F%2Fvgweb-laptop%2Fvxml%2Furlin.vxml)
exten => 1,3,Set(vxmlarg=foo)
exten => 1,4,Set(virthost=vgweb-laptop)
exten => 1,5,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
exten => 1,6,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
exten => 1,7,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
exten => 1,8,Hangup
Here, 3 parameters are passed to the VXML script, url, arg,
and virthost, and these are available in the script as
session.connection.initargs.url, session.connection.initargs.arg,
and session.connection.initargs.virthost.
The url parameter is special, as it is used as the URL of
the initial VXML page to fetch and run for the call.
If this is not specified here, voiceglue uses the definitions
in the /etc/voiceglue.conf file to determine the initial script.
Notice the percent-encoding of the url argument so that it
doesn't confuse the parameter parsing. The URL
represented in the example is actually http://vgweb-laptop/vxml/urlin.vxml.
For details on percent-encoding, see a reference such as
http://en.wikipedia.org/wiki/Percent_encoding
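If you generate dialplan entries from a script, Python's urllib.parse.quote can produce the encoded form; passing safe="" makes it encode / as well as :. A minimal sketch (encode_vxml_url is a hypothetical helper name):

```python
from urllib.parse import quote, unquote


def encode_vxml_url(url: str) -> str:
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(url, safe="")


encoded = encode_vxml_url("http://vgweb-laptop/vxml/urlin.vxml")
print(encoded)           # http%3A%2F%2Fvgweb-laptop%2Fvxml%2Furlin.vxml
print(unquote(encoded))  # http://vgweb-laptop/vxml/urlin.vxml
```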
Also notice that if you pass another URL as a parameter, you must also percent-encode the percent sign itself (as %25). For example:
Set(vxmlurl=http%3A%2F%2Falt.com%2Fvxml%2Fdoit.vxml)
Set(vxmlarg=http%253A%252F%252Fanother.site%252Fdot%252Fcom.vxml)
Agi(agi://localhost/url=${vxmlurl}%26arg=${vxmlarg})
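Encoding twice is what turns each % into %25. The Python sketch below reproduces the vxmlarg value above and checks that two rounds of decoding recover the original URL (encode_nested_url is a hypothetical name):

```python
from urllib.parse import quote, unquote


def encode_nested_url(url: str) -> str:
    """Percent-encode a URL twice, so that each '%' from the first pass becomes '%25'."""
    return quote(quote(url, safe=""), safe="")


nested = encode_nested_url("http://another.site/dot/com.vxml")
print(nested)                    # http%253A%252F%252Fanother.site%252Fdot%252Fcom.vxml
print(unquote(unquote(nested)))  # http://another.site/dot/com.vxml
```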
If you are getting the url from an external source (SIP for instance), then you can encode it like this:
Set(uri_encoded=${URIENCODE(${BASE64_ENCODE(${external_uri})})})
Agi(agi://localhost/url=${vxmlurl}%26uri=${uri_encoded})
Remember to base64-decode it in your script!
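If the value ends up in a web application (for example the one serving your VXML), the decoding can look like the Python sketch below. It assumes that Python's standard base64 alphabet and percent-encoding match what Asterisk's BASE64_ENCODE and URIENCODE produce closely enough; the helper names are hypothetical. Because the base64 alphabet contains no %, calling unquote first is harmless even if the value has already been percent-decoded once on its way to your code.

```python
import base64
from urllib.parse import quote, unquote


def encode_external_uri(uri: str) -> str:
    """Roughly what ${URIENCODE(${BASE64_ENCODE(uri)})} produces."""
    b64 = base64.b64encode(uri.encode("utf-8")).decode("ascii")
    return quote(b64, safe="")  # '+', '/' and '=' become %2B, %2F and %3D


def decode_external_uri(value: str) -> str:
    """Undo the encoding; unquote() leaves plain base64 text unchanged."""
    return base64.b64decode(unquote(value)).decode("utf-8")
```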
Notice this portion of the larger example dialplan above:
exten => 1,5,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
exten => 1,6,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
exten => 1,7,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost})
This shows three consecutive calls to voiceglue. If the VXML script does not hang up the call with a <disconnect> tag, the dialplan continues with the next priority when the script terminates. Additionally, if a namelist is passed to the <exit> tag in the VXML script, those variable names and values are set as channel variables in the Asterisk dialplan. Because of Asterisk limitations on AGI syntax, these values should be scalars with no special characters.
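What counts as a "special character" is not spelled out. If the values your VXML script returns through <exit namelist> originate in your web application, a conservative server-side check such as this Python sketch can catch problem values early. The allowed character set (letters, digits, underscore, dot, hyphen) is an assumption, not a documented voiceglue rule, and unsafe_exit_values is a hypothetical name.

```python
import re

# Assumption: "no special characters" means only letters, digits, '_', '.', '-'.
SAFE_SCALAR = re.compile(r"[A-Za-z0-9_.\-]*")


def unsafe_exit_values(values: dict) -> list:
    """Return the names whose values are not simple scalars that are safe
    to hand back to the dialplan through <exit namelist="...">."""
    bad = []
    for name, value in values.items():
        if isinstance(value, (list, tuple, dict)) or not SAFE_SCALAR.fullmatch(str(value)):
            bad.append(name)
    return bad
```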
Even if you use the url parameter in the AGI command in your Asterisk
dialplan, the /etc/voiceglue.conf file must be present and valid. The
file /etc/voiceglue.conf contains the all-important definition of
ast_sound_dir (don't remove this!) and additional lines that
each contain a whitespace-separated DNIS
(incoming number) and url pair. Such a pair maps the
incoming phone number to the url to load. As mentioned above, when you pass
the url parameter to the AGI command from Asterisk, this mapping is ignored.
The /etc/voiceglue.conf file can be changed dynamically, and
voiceglue will notice any changes immediately. The wildcard DNIS *
matches any incoming number that isn't matched by another line.
So, an example /etc/voiceglue.conf could contain:
* http://localhost/vxml/welcome-audiofile.vxml
This would result in all incoming calls being handled by the
welcome-audiofile.vxml script found at http://localhost/vxml/.
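The lookup rules described above can be sketched in Python. This is an illustration, not voiceglue's implementation: it assumes that two-field lines are DNIS/url pairs, skips three-field name = value parameter lines, and lets a url argument from the dialplan override the mapping. The helper names are hypothetical.

```python
from typing import Dict, Optional


def load_dnis_map(text: str) -> Dict[str, str]:
    """Collect 'DNIS url' lines; 'name = value' parameter lines have three
    fields and are skipped, as are blank lines."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            dnis, url = fields
            mapping[dnis] = url
    return mapping


def initial_url(dnis: str, mapping: Dict[str, str],
                initargs: Dict[str, str]) -> Optional[str]:
    """A url argument from the dialplan wins; otherwise an exact DNIS match;
    otherwise the '*' wildcard; otherwise None."""
    if "url" in initargs:
        return initargs["url"]
    return mapping.get(dnis, mapping.get("*"))
```

For example, with a line mapping a hypothetical DNIS 5551000 to menu-input.vxml plus the * line above, a call to 5551000 gets the menu and every other number gets welcome-audiofile.vxml.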
Additional parameters that affect the operation of voiceglue may also be placed in the /etc/voiceglue.conf file. The format of every parameter is:
parameter = value
The whitespace on either side of the “=” is required.
All of the parameters below are optional – the default value is used if it is not specified in /etc/voiceglue.conf:
| Parameter | Default | Meaning |
|---|---|---|
| blind_xfer_method | transfer | The Asterisk method used to implement the VXML transfer tag, choices are “transfer” or “dial” |
| audio_fetch_retry | 60 | The number of seconds after an audio fetch fails to wait before retrying |
| audio_fetch_timeout | 7 | The default number of seconds to wait for audio to be retrieved from a source before timing out |
| audio_maxage | 300 | The default number of seconds that an audio cache entry remains valid |
| cache_purge_interval | 420 | The number of seconds between audio cache purges |
| cache_lastused_purge | 240 | The number of seconds of non-use that will cause an audio cache item to be purged |
| ssml_passthrough | 0 | If set to 1, all SSML markup is passed to the TTS generator |
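A Python sketch of how the defaults in this table combine with overrides from /etc/voiceglue.conf. load_parameters is a hypothetical name, and voiceglue's own parser may differ, for example in how it reports a line that lacks the required spaces; here such a line is simply ignored.

```python
from typing import Dict

# Defaults copied from the table above (kept as strings, as read from the file).
DEFAULTS = {
    "blind_xfer_method": "transfer",
    "audio_fetch_retry": "60",
    "audio_fetch_timeout": "7",
    "audio_maxage": "300",
    "cache_purge_interval": "420",
    "cache_lastused_purge": "240",
    "ssml_passthrough": "0",
}


def load_parameters(text: str) -> Dict[str, str]:
    """Start from the defaults and apply every 'name = value' line.
    'name=value' without spaces is a single field and is ignored;
    two-field DNIS/url lines are ignored as well."""
    params = dict(DEFAULTS)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "=":
            params[fields[0]] = fields[2]
    return params
```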
By default, the dynlog program collects all logs from the phoneglue and voiceglue processes. It is not strictly required, but without it you will be scouring multiple log files to find out what's happening. The logs are written to /var/log/dynlog/dynlog. Dynlog can change the log level dynamically: running “dynlog_level 7” gives you the full output from all voiceglue components, and running “dynlog_level 4” returns you to a saner level. These levels are the same as those used by syslog. I recommend setting the level to 7 (the highest) when you are trying to debug a problem with voiceglue. You don't have to stop or restart anything; dynlog and its clients coordinate dynamically to achieve the requested level of logging. Note, however, that raising the level on a heavily loaded system can significantly degrade performance.
The VXML <log> markups appear in dynlog as well, and are assigned log level 5.
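The syslog levels run from 0 (emerg) to 7 (debug). Assuming dynlog filters the way syslog does, showing a message only when its level is at or below the current threshold, the Python sketch below suggests that <log> output (level 5) is hidden at dynlog_level 4 and visible at 5 or above. is_shown is a hypothetical name.

```python
# syslog severity levels: lower numbers are more severe.
SYSLOG_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}

VXML_LOG_LEVEL = 5  # level assigned to VXML <log> output


def is_shown(message_level: int, dynlog_level: int) -> bool:
    """Assumption: a message appears when its level is at or below the
    threshold set with dynlog_level, as in syslog filtering."""
    return message_level <= dynlog_level


for threshold in (4, 5, 7):
    print(threshold, SYSLOG_LEVELS[threshold], is_shown(VXML_LOG_LEVEL, threshold))
```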
After performing the above configuration steps and making sure Asterisk is running, start the voiceglue services by rebooting or by running, as root:
/etc/init.d/dynlog start
/etc/init.d/phoneglue start
/etc/init.d/voiceglue start
These services must always be brought up in this order (and after Asterisk is running), and be brought down in the reverse order.
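A small Python sketch of the ordering rule. It assumes the init scripts also accept the conventional stop argument (only start is shown above), must be run as root, and uses hypothetical function names.

```python
import subprocess

# Start order from the text above; shutdown uses the reverse order.
SERVICES = ["dynlog", "phoneglue", "voiceglue"]


def service_commands(action: str) -> list:
    """Return the init-script invocations for 'start' or 'stop'."""
    if action == "start":
        order = SERVICES
    elif action == "stop":
        order = list(reversed(SERVICES))
    else:
        raise ValueError(f"unsupported action: {action}")
    return [[f"/etc/init.d/{name}", action] for name in order]


def run(action: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in service_commands(action):
        subprocess.run(cmd, check=True)
```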
Once everything is up and stays up, you should be able to call
in and have the VXML file(s) specified in /etc/voiceglue.conf
or in the AGI arguments interpreted.
Voiceglue supports the following audio file formats: ulaw, alaw, slin, gsm, and mp3.
Each of these is supported only insofar as the installed Asterisk supports it.
WARNING: Some versions of Asterisk have a bug in their implementation of mp3 support for the STREAM FILE command. Until this bug is fixed, voiceglue cannot play mp3s.
Voiceglue must be able to determine the audio format of a file before it can use the file. It first checks the Content-Type returned by the HTTP server that supplied the audio data. The supported Content-Type values are:
| Content-Type | Audio Format |
|---|---|
| audio/basic | ulaw |
| audio/x-alaw-basic | alaw |
| audio/x-wav | slin |
| audio/x-gsm | gsm |
| audio/mpeg | mp3 |
If the Content-Type field is not defined, or is returned as text/plain (which is common if your web server is not configured with the proper content type mapping for the audio file extensions), then voiceglue attempts to determine the audio file type by the filename extension. The supported extensions are:
| Extension | Audio Format |
|---|---|
| .ulaw | ulaw |
| .au | ulaw |
| .pcm | ulaw |
| .ul | ulaw |
| .mu | ulaw |
| .alaw | alaw |
| .al | alaw |
| .wav | slin |
| .gsm | gsm |
| .mp3 | mp3 |
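The two-step detection can be sketched in Python. The tables are copied from above; the function name is hypothetical, and some details are assumptions about voiceglue's behavior: Content-Type parameters (anything after ;) are ignored, an unrecognized Content-Type other than text/plain yields None rather than falling back to the extension, and extensions are matched exactly as listed.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

CONTENT_TYPES = {
    "audio/basic": "ulaw",
    "audio/x-alaw-basic": "alaw",
    "audio/x-wav": "slin",
    "audio/x-gsm": "gsm",
    "audio/mpeg": "mp3",
}

EXTENSIONS = {
    ".ulaw": "ulaw", ".au": "ulaw", ".pcm": "ulaw", ".ul": "ulaw", ".mu": "ulaw",
    ".alaw": "alaw", ".al": "alaw",
    ".wav": "slin", ".gsm": "gsm", ".mp3": "mp3",
}


def audio_format(content_type: Optional[str], url: str) -> Optional[str]:
    """Return ulaw/alaw/slin/gsm/mp3, or None if the format cannot be determined."""
    if content_type:
        # MIME types are case-insensitive; drop parameters such as ";rate=8000"
        mime = content_type.split(";")[0].strip().lower()
        if mime != "text/plain":
            return CONTENT_TYPES.get(mime)
    # No usable Content-Type: fall back to the extension of the URL path
    ext = posixpath.splitext(urlsplit(url).path)[1]
    return EXTENSIONS.get(ext)
```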
The VXML specification does not require audio streaming. It implies that audio fetches are finite, and requires only that an implementation start playing audio after the resource has been completely fetched. The specification does permit an “optimization” whereby an implementation starts playing audio before it has been completely fetched, but voiceglue does not perform this optimization.
Voiceglue employs a shared audio caching mechanism that provides significant performance gains when multiple calls use the same audio data.
All audio data, whether downloaded from an HTTP server or generated by TTS, is cached by default in the filesystem that is shared between voiceglue and asterisk. This cached audio data is used for all calls until it expires based on the HTTP headers returned from the web server or lack of use by the application.
Voiceglue never uses stale audio data. Thus, the VXML maxstale attribute and audiomaxstale property have no effect.
If the script author desires to prevent caching, the VXML maxage attribute or the audiomaxage property can be set to 0. This will force a non-shared and non-reusable audio fetch for that instance.
It is not possible to prevent caching by returning an HTTP header value that disables caching, such as Expires, Cache-Control:no-cache, or Cache-Control:private. While these values prevent further use of the returned audio data, they do not prevent that result from being shared with audio requests from other calls that were issued before it was retrieved. For this reason, the VXML maxage attribute and the audiomaxage property are the only reliable means of fully disabling caching.
Two audio data requests are considered sharable from the same cache entry if they reference the same URL (including all parameters and cookies), or if they specify the same TTS request.
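To see why cookies matter for sharing, here is a Python sketch of a cache key built from those inputs. It is not voiceglue's actual key: it assumes that the URL string (query included) together with the full set of cookies identifies an audio fetch, and that the text plus the xml:lang value identify a TTS request. The function names are hypothetical.

```python
import hashlib
import json
from typing import Mapping, Optional


def url_cache_key(url: str, cookies: Mapping[str, str]) -> str:
    """Same URL (query string included) and same cookies give the same key."""
    payload = json.dumps({"url": url, "cookies": sorted(cookies.items())})
    return "url:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def tts_cache_key(text: str, lang: Optional[str]) -> str:
    """Identical TTS requests (text and language) give the same key."""
    payload = json.dumps({"text": text, "lang": lang})
    return "tts:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two calls fetching the same prompt URL but holding different session cookies get different keys, so each downloads and caches its own copy.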
Cached audio data in the shared directory will get removed when it has not been used by any application for some period of time. Currently voiceglue checks for expiry of cache items every 7 minutes (configurable by the cache_purge_interval parameter in /etc/voiceglue.conf), and removes those that have gone unused for the last 4 minutes (configurable by the cache_lastused_purge parameter in /etc/voiceglue.conf). There is no way currently to set a maximum cache size for voiceglue; this may be implemented in the future.
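The default timing can be modeled in Python. This is a simplified model, not voiceglue's code: it assumes an entry is removed by the first purge pass after it has been idle for more than cache_lastused_purge seconds. Under that model an unused entry can survive up to 240 + 420 = 660 seconds (11 minutes) after its last use. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

CACHE_PURGE_INTERVAL = 420  # seconds between purge passes (default)
CACHE_LASTUSED_PURGE = 240  # seconds of non-use before removal (default)

# Worst case: a pass runs just before an entry reaches 240 s of idleness,
# so it lingers until the next pass, up to 240 + 420 s after its last use.
MAX_IDLE_LIFETIME = CACHE_LASTUSED_PURGE + CACHE_PURGE_INTERVAL


@dataclass
class CacheEntry:
    path: str
    last_used: float  # time of the most recent use, in seconds


def purge_pass(entries: List[CacheEntry], now: float,
               lastused_purge: float = CACHE_LASTUSED_PURGE
               ) -> Tuple[List[CacheEntry], List[CacheEntry]]:
    """One purge pass: return (kept, removed)."""
    kept, removed = [], []
    for entry in entries:
        if now - entry.last_used > lastused_purge:
            removed.append(entry)
        else:
            kept.append(entry)
    return kept, removed
```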
As mentioned above, cookies are implicit parameters to audio fetch requests. Thus, if one call's document fetches set a cookie to one value and another call's document fetches set that cookie to a different value (or create a different set of cookies), the two calls will not share audio even when they request it from the same URL. Because of this, be aware of the negative effects cookies can have on caching, and don't use cookies haphazardly.
Although cookies can be set by document, script, or grammar fetches, they cannot be set by audio fetches. They are, however, always provided on all fetches, including audio fetches. The shared caching of audio fetches makes the setting of cookies from audio fetches often counterintuitive. It could be argued that there are cases where it still should be allowed, for example when shared caching is explicitly prohibited with maxage=0, but this is not currently implemented.
Using an alternate TTS implementation should be fairly
straightforward. Every time TTS is required, voiceglue
runs the /usr/bin/voiceglue_tts_gen program with four
arguments. The first is -t (and can be ignored),
the second is the text to synthesize, the third
is the file in which to place the audio,
and the fourth is the language as specified
by the VXML's xml:lang setting.
The generated audio file must be in 16-bit 8 kHz PCM wav format
with a RIFF header.
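Before wiring in a new TTS engine, you can run your voiceglue_tts_gen by hand (for example voiceglue_tts_gen -t "Hello" /tmp/hello.wav en-US) and check the result. A minimal Python check using the standard wave module, which only opens uncompressed PCM RIFF files; tts_output_problem is a hypothetical name.

```python
import wave
from typing import Optional


def tts_output_problem(path: str) -> Optional[str]:
    """Return None if `path` is a RIFF/WAV file with 16-bit, 8 kHz PCM
    samples, otherwise a short description of what is wrong."""
    try:
        with wave.open(path, "rb") as w:
            width, rate = w.getsampwidth(), w.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave rejects files without a RIFF header and non-PCM encodings
        return f"not a PCM RIFF/WAV file: {exc}"
    if width != 2 or rate != 8000:
        return f"{8 * width}-bit at {rate} Hz, expected 16-bit at 8000 Hz"
    return None
```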
The default implementation of voiceglue_tts_gen for flite is:
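#!/usr/bin/perl -- -*-CPerl-*-
# Called as: voiceglue_tts_gen -t <text> <output file> <language>
$file = $::ARGV[2];
# Synthesize with flite; the language argument ($ARGV[3]) is ignored
system ("flite", @::ARGV[0..2]);
# Rename flite's 16 kHz output, then resample it to 8 kHz
system ("mv", $file, $file . ".16khz.wav");
system ("sox", $file . ".16khz.wav", "-r", "8000", $file);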
The last two lines convert flite's default output on Ubuntu (16 kHz wav) to 8 kHz.
If you want to use Cepstral, this voiceglue_tts_gen file has worked:
#!/usr/bin/perl -- -*-CPerl-*-
# Cepstral interface
$file = $::ARGV[2];
system ("/usr/local/bin/swift", "-m", "text", "-o" , $file , $ARGV[1]);
Voiceglue has rudimentary support for the <transfer> tag in VXML. It only supports blind transfers. There are two different Asterisk mechanisms that may be used to implement the transfer, the “transfer” command and the “dial” command. The “transfer” command is the default and correctly returns control to the VXML script immediately, but is a less reliable Asterisk command. The “dial” command is more reliable in Asterisk, but does not return control to the VXML script until the transferred call disconnects or fails.
These choices are controlled by the blind_xfer_method parameter in the /etc/voiceglue.conf file.
The “examples” directory contains some example VXML files that work with voiceglue. Keep in mind that the VXML specification leaves much (some say too much) latitude as to what an implementation supports, so not all VXML files will run without modification. Specifically, voiceglue supports only simple SRGS XML DTMF grammars and no speech input (speech input is being worked on).
Here is what the example files do:
welcome-tts.vxml – Speaks “Welcome” in TTS
welcome-audiofile.vxml – Recorded audio of Allison saying “Welcome”
single-digit-input.vxml – Repeatedly gets and speaks a single digit
menu-input.vxml – Repeatedly gets a menu input
multi-digit-input.vxml – Repeatedly gets and speaks multiple digits
record-audio.vxml – Repeatedly records audio from the caller