| — | voiceglue_0.12_user_guide 2010/05/14 05:35 current | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Verifying Voiceglue Installation ====== | ||
| + | |||
| + | After a successful installation, three new services will | ||
| + | be available to run: | ||
| + | |||
| + | * dynlog | ||
| + | * phoneglue | ||
| + | * voiceglue | ||
| + | |||
| + | They cannot be started successfully, however, until Asterisk and | ||
| + | Voiceglue are properly configured. | ||
| + | |||
| + | ====== Configuring Asterisk ====== | ||
| + | |||
| + | The phoneglue service needs to log in to the Asterisk manager with | ||
| + | username "phoneglue" and password "phoneglue" (configurable with | ||
| + | command-line arguments), so configure Asterisk's manager.conf with an | ||
| + | entry like this for Asterisk 1.4: | ||
| + | |||
| + | [phoneglue] | ||
| + | secret=phoneglue | ||
| + | read = system,call,log,verbose,command,agent,user | ||
| + | write = system,call,log,verbose,command,agent,user | ||
| + | |||
| + | and like this for Asterisk 1.6: | ||
| + | |||
| + | [phoneglue] | ||
| + | secret=phoneglue | ||
| + | read = system,call,log,verbose,command,agent,user,originate | ||
| + | write = system,call,log,verbose,command,agent,user,originate | ||
| + | |||
| + | Also make sure you have: | ||
| + | |||
| + | enabled=yes | ||
| + | |||
| + | in the ''[general]'' section of the same file. | ||
| + | |||
| + | ===== Passing Calls and Values to Voiceglue ===== | ||
| + | |||
| + | The asterisk dialplan must be used to route calls to voiceglue. | ||
| + | The simplest way to do this would be a dialplan looking something like: | ||
| + | |||
| + | [phoneglue] | ||
| + | exten => 1,1,Answer | ||
| + | exten => 1,2,Agi(agi://localhost) | ||
| + | exten => 1,3,Hangup | ||
| + | |||
| + | Here, whenever a call is routed to context ''phoneglue'' extension ''1'' | ||
| + | it is first answered, then routed to voiceglue (the Agi command), | ||
| + | and then hung up. | ||
| + | |||
| + | Parameters can be passed to the VXML script like this: | ||
| + | |||
| + | exten => 1,1,Answer | ||
| + | exten => 1,2,Set(vxmlarg=bar) | ||
| + | exten => 1,3,Agi(agi://localhost/foo=${vxmlarg}) | ||
| + | exten => 1,4,Hangup | ||
| + | |||
| + | Here, the value ''bar'' will be available in the VXML script as | ||
| + | the value of session-scoped variable session.connection.initargs.foo | ||
| + | and ''foo=bar'' will be passed as a URL-encoded argument to the | ||
| + | initial VXML script fetch. | ||
| + | As a larger example: | ||
| + | |||
| + | exten => 1,1,Answer | ||
| + | exten => 1,2,Set(vxmlurl=http%3A%2F%2Fvgweb-laptop%2Fvxml%2Furlin.vxml) | ||
| + | exten => 1,3,Set(vxmlarg=foo) | ||
| + | exten => 1,4,Set(virthost=vgweb-laptop) | ||
| + | exten => 1,5,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | exten => 1,6,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | exten => 1,7,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | exten => 1,8,Hangup | ||
| + | |||
| + | Here, 3 parameters are passed to the VXML script, ''url'', ''arg'', | ||
| + | and ''virthost'', and these are available in the script as | ||
| + | session.connection.initargs.url, session.connection.initargs.arg, | ||
| + | and session.connection.initargs.virthost. | ||
| + | The ''url'' parameter is special, as it is used as the URL of | ||
| + | the initial VXML page to fetch and run for the call. | ||
| + | If this is not specified here, voiceglue uses the definitions | ||
| + | in the /etc/voiceglue.conf file to determine the initial script. | ||
| + | Notice the percent-encoding of the url argument so that it | ||
| + | doesn't confuse the parameter parsing. The URL | ||
| + | represented in the example is actually ''http://vgweb-laptop/vxml/urlin.vxml''. | ||
| + | For details on percent-encoding, see a reference such as | ||
| + | http://en.wikipedia.org/wiki/Percent_encoding | ||
| + | |||
| + | Also notice that if you pass another URL as the value of a parameter, that URL must be percent-encoded twice, so that each percent sign from the first encoding becomes ''%25''. | ||
| + | E.g.: | ||
| + | |||
| + | Set(vxmlurl=http%3A%2F%2Falt.com%2Fvxml%2Fdoit.vxml) | ||
| + | Set(vxmlarg=http%253A%252F%252Fanother.site%252Fdot%252Fcom.vxml) | ||
| + | Agi(agi://localhost/url=${vxmlurl}%26arg=${vxmlarg}) | ||
| + | |||
| + | If you are getting the url from an external source (SIP for instance), then you can encode it like this: | ||
| + | |||
| + | Set(uri_encoded=${URIENCODE(${BASE64_ENCODE(${external_uri})})}) | ||
| + | Agi(agi://localhost/url=${vxmlurl}%26uri=${uri_encoded}) | ||
| + | |||
| + | Remember to (base64) decode it in your script! | ||
| + | |||
| + | ===== Retrieving Values From Voiceglue ===== | ||
| + | |||
| + | Notice this portion of the last example dialplan above: | ||
| + | |||
| + | exten => 1,5,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | exten => 1,6,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | exten => 1,7,Agi(agi://vgweb-laptop/url=${vxmlurl}&arg=${vxmlarg}&virthost=${virthost}) | ||
| + | |||
| + | This shows three consecutive calls to voiceglue. | ||
| + | If the VXML script does not hang up the call with a <disconnect> tag, | ||
| + | the dialplan continues with the next priority when the script terminates. | ||
| + | Additionally, if a namelist is passed to the <exit> tag in | ||
| + | the VXML script, then | ||
| + | those variable names and values will be set as channel variables | ||
| + | in the asterisk dialplan. | ||
| + | Because of asterisk limitations on AGI syntax, these values should | ||
| + | be scalars with no special characters. | ||
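| + | |||
| + | As a minimal sketch of this mechanism, a VXML script could end with an <exit> tag that names one variable. The variable name ''result'' and its value are hypothetical, chosen only for illustration: | ||
| + | |||
| + | <code xml> | ||
| + | <?xml version="1.0"?> | ||
| + | <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"> | ||
| + |   <!-- hypothetical scalar value to hand back to the dialplan --> | ||
| + |   <var name="result" expr="'ok'"/> | ||
| + |   <form> | ||
| + |     <block> | ||
| + |       <!-- exit without <disconnect>, so the dialplan keeps running --> | ||
| + |       <exit namelist="result"/> | ||
| + |     </block> | ||
| + |   </form> | ||
| + | </vxml> | ||
| + | </code> | ||
| + | |||
| + | Assuming the behavior described above, the dialplan could then read the channel variable once the Agi step returns: | ||
| + | |||
| + | exten => 1,1,Answer | ||
| + | exten => 1,2,Agi(agi://localhost) | ||
| + | exten => 1,3,NoOp(VXML returned ${result}) | ||
| + | exten => 1,4,Hangup | ||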
| + | |||
| + | ====== Configuring Voiceglue ====== | ||
| + | |||
| + | Even if you use the ''url'' parameter in the AGI command in your asterisk | ||
| + | dialplan, the ''/etc/voiceglue.conf'' file must be present and valid. The | ||
| + | file ''/etc/voiceglue.conf'' contains the all-important definition of | ||
| + | ast_sound_dir (don't remove this!) and additional lines that | ||
| + | each contain a whitespace-separated DNIS | ||
| + | (incoming number) and URL pair. Such a pair maps the | ||
| + | incoming phone number to the URL to load. As mentioned above, when the | ||
| + | ''url'' parameter is passed to the AGI command from asterisk, this mapping is ignored. | ||
| + | |||
| + | The ''/etc/voiceglue.conf'' file can be changed dynamically and | ||
| + | voiceglue will immediately notice | ||
| + | any changes. The wildcard DNIS ''*'' can be used to match anything that | ||
| + | isn't matched otherwise. | ||
| + | |||
| + | So, an example /etc/voiceglue.conf could contain: | ||
| + | |||
| + | <code bash> | ||
| + | * http://localhost/vxml/welcome-audiofile.vxml | ||
| + | </code> | ||
| + | |||
| + | This would result in all incoming calls being handled by the | ||
| + | welcome-audiofile.vxml script found at ''http://localhost/vxml/''. | ||
| + | |||
| + | Additional parameters that affect the operation of voiceglue may | ||
| + | also be placed in the /etc/voiceglue.conf file. | ||
| + | The format of every parameter is: | ||
| + | |||
| + | <code bash> | ||
| + | parameter = value | ||
| + | </code> | ||
| + | |||
| + | The whitespace on either side of the "=" is required. | ||
| + | |||
| + | All of the parameters below are optional -- the default value is | ||
| + | used if it is not specified in /etc/voiceglue.conf: | ||
| + | |||
| + | ^ Parameter ^ Default ^ Meaning ^ | ||
| + | | blind_xfer_method | transfer | The Asterisk method used to implement the VXML transfer tag, choices are "transfer" or "dial" | | ||
| + | | audio_fetch_retry | 60 | The number of seconds after an audio fetch fails to wait before retrying | | ||
| + | | audio_fetch_timeout | 7 | The default number of seconds to wait for audio to be retrieved from a source before timing out | | ||
| + | | audio_maxage | 300 | The default number of seconds that an audio cache entry remains valid | | ||
| + | | cache_purge_interval | 420 | The number of seconds between audio cache purges | | ||
| + | | cache_lastused_purge | 240 | The number of seconds of non-use that will cause an audio cache item to be purged | | ||
| + | | ssml_passthrough | 0 | If set to 1, all SSML markup is passed to the TTS generator | | ||
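| + | |||
| + | As an illustrative sketch, DNIS mappings and parameters could be combined in one /etc/voiceglue.conf as follows. The number ''5551234'' and the chosen parameter values are hypothetical, and the existing ast_sound_dir definition must stay in the file (it is not shown here): | ||
| + | |||
| + | <code bash> | ||
| + | 5551234 http://localhost/vxml/menu-input.vxml | ||
| + | * http://localhost/vxml/welcome-audiofile.vxml | ||
| + | blind_xfer_method = dial | ||
| + | audio_fetch_timeout = 10 | ||
| + | ssml_passthrough = 1 | ||
| + | </code> | ||
| + | |||
| + | With these lines, calls to ''5551234'' would load menu-input.vxml, all other calls would load welcome-audiofile.vxml, and the three listed parameters would override their defaults. | ||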
| + | |||
| + | ====== Extra Logging ====== | ||
| + | |||
| + | By default, the dynlog program collects all logs from the phoneglue | ||
| + | and voiceglue processes. It is not strictly required, but without it | ||
| + | you will be scouring multiple log files to find out what's happening. | ||
| + | The logs are written to /var/log/dynlog/dynlog. Dynlog has a dynamic | ||
| + | log-level changing capability, so by running "dynlog_level 7" you will | ||
| + | get the full output from all voiceglue components. Running | ||
| + | "dynlog_level 4" will get you back to a more sane level. These levels | ||
| + | are identical to those used by syslog. I recommend setting the level | ||
| + | to 7 (the highest) when you are trying to debug a problem with | ||
| + | voiceglue. Note that you don't have to stop or re-start anything; | ||
| + | dynlog and its clients coordinate dynamically to achieve the | ||
| + | appropriate level of logging. Note, also, that raising the log level | ||
| + | can significantly degrade performance on a loaded system. | ||
| + | |||
| + | The VXML <log> markups appear in dynlog as well, and are assigned | ||
| + | log level 5. | ||
| + | |||
| + | ====== Starting Voiceglue ====== | ||
| + | |||
| + | After performing the above configuration steps and making sure | ||
| + | asterisk is running, start the voiceglue services by rebooting or | ||
| + | running as root: | ||
| + | |||
| + | /etc/init.d/dynlog start | ||
| + | /etc/init.d/phoneglue start | ||
| + | /etc/init.d/voiceglue start | ||
| + | |||
| + | These services must always be brought up in this order (and | ||
| + | after Asterisk is running), and be brought down in the reverse | ||
| + | order. | ||
| + | |||
| + | Once everything is up and stays up, you should be able to call | ||
| + | in and have voiceglue interpret the VXML file(s) specified in ''/etc/voiceglue.conf'' | ||
| + | or in the AGI arguments. | ||
| + | |||
| + | |||
| + | ====== Audio file formats ====== | ||
| + | |||
| + | Voiceglue supports the following audio file formats: | ||
| + | |||
| + | * ulaw | ||
| + | * alaw | ||
| + | * slin (16-bit 8 kHz signed linear) | ||
| + | * gsm | ||
| + | * mp3 | ||
| + | |||
| + | Each of these is supported only insofar as the installed | ||
| + | Asterisk supports them. | ||
| + | |||
| + | **WARNING**: Some versions of Asterisk have a bug | ||
| + | in their implementation of mp3 support for the STREAM FILE | ||
| + | command. | ||
| + | Until this bug is fixed, voiceglue cannot play mp3s. | ||
| + | |||
| + | Voiceglue must be able to determine the audio format of a file | ||
| + | before it can use the file. | ||
| + | Voiceglue first checks the Content-Type returned | ||
| + | by the HTTP server that supplied the audio data. | ||
| + | The supported Content-Type fields are: | ||
| + | |||
| + | ^ Content-Type ^ Audio Format ^ | ||
| + | | audio/basic | ulaw | | ||
| + | | audio/x-alaw-basic | alaw | | ||
| + | | audio/x-wav | slin | | ||
| + | | audio/x-gsm | gsm | | ||
| + | | audio/mpeg | mp3 | | ||
| + | |||
| + | If the Content-Type field is not defined, or is returned | ||
| + | as text/plain (which is common if your web server is not | ||
| + | configured with the proper content type mapping for the | ||
| + | audio file extensions), then voiceglue attempts to determine | ||
| + | the audio file type by the filename extension. | ||
| + | The supported extensions are: | ||
| + | |||
| + | ^ Extension ^ Audio Format ^ | ||
| + | | .ulaw | ulaw | | ||
| + | | .au | ulaw | | ||
| + | | .pcm | ulaw | | ||
| + | | .ul | ulaw | | ||
| + | | .mu | ulaw | | ||
| + | | .alaw | alaw | | ||
| + | | .al | alaw | | ||
| + | | .wav | slin | | ||
| + | | .gsm | gsm | | ||
| + | | .mp3 | mp3 | | ||
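| + | |||
| + | If the web server is Apache (an assumption; other servers have equivalent settings), the content types from the first table could be mapped to the extensions above with ''AddType'' directives, roughly like this: | ||
| + | |||
| + | <code> | ||
| + | AddType audio/basic .ulaw .au .pcm .ul .mu | ||
| + | AddType audio/x-alaw-basic .alaw .al | ||
| + | AddType audio/x-wav .wav | ||
| + | AddType audio/x-gsm .gsm | ||
| + | AddType audio/mpeg .mp3 | ||
| + | </code> | ||
| + | |||
| + | With such a mapping in place, voiceglue would not need to fall back on the filename extension. | ||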
| + | |||
| + | ===== Audio Streaming ===== | ||
| + | |||
| + | The VXML specification does not require audio streaming. | ||
| + | It implies that audio fetches are finite, and requires | ||
| + | only that an implementation start playing audio after the | ||
| + | resource has been completely fetched. | ||
| + | The specification does permit an "optimization" whereby an implementation | ||
| + | can start playing audio before it has been completely fetched, | ||
| + | but voiceglue does not perform this optimization. | ||
| + | |||
| + | ====== Audio Caching ====== | ||
| + | |||
| + | Voiceglue employs a shared audio caching mechanism | ||
| + | that provides significant performance gains when | ||
| + | multiple calls use the same audio data. | ||
| + | |||
| + | All audio data, whether downloaded from an HTTP | ||
| + | server or generated by TTS, is cached by default | ||
| + | in the filesystem that is shared between voiceglue | ||
| + | and asterisk. | ||
| + | This cached audio data is used for all calls until | ||
| + | it expires based on the HTTP headers returned from | ||
| + | the web server or lack of use by the application. | ||
| + | |||
| + | Voiceglue never uses stale audio data. | ||
| + | Thus, the VXML maxstale attribute and audiomaxstale | ||
| + | property have no effect. | ||
| + | |||
| + | If the script author desires to prevent caching, | ||
| + | the VXML maxage attribute or | ||
| + | the audiomaxage property can be set to 0. | ||
| + | This will force a non-shared and non-reusable | ||
| + | audio fetch for that instance. | ||
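| + | |||
| + | A sketch of both forms follows, using the standard VXML attribute and property named above. The audio URL is hypothetical: | ||
| + | |||
| + | <code xml> | ||
| + | <!-- per-fetch: only this <audio> element bypasses the shared cache --> | ||
| + | <audio src="http://localhost/audio/balance.wav" maxage="0"/> | ||
| + | |||
| + | <!-- property: applies to audio fetches within the enclosing element --> | ||
| + | <property name="audiomaxage" value="0"/> | ||
| + | </code> | ||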
| + | |||
| + | It is not possible to prevent caching by returning | ||
| + | an HTTP header value that disables caching, such | ||
| + | as Expires, Cache-Control:no-cache, or | ||
| + | Cache-Control:private. | ||
| + | While these values will prevent further use of the | ||
| + | audio data returned, they will not prevent the result from being shared | ||
| + | with other calls whose requests for the same audio | ||
| + | were issued before | ||
| + | this result was retrieved. | ||
| + | For this reason, the VXML maxage attribute or | ||
| + | the audiomaxage property are the only reliable | ||
| + | means of fully disabling caching. | ||
| + | |||
| + | Two audio data requests are considered sharable | ||
| + | from the same cache entry if they reference the | ||
| + | same URL (including all parameters **and cookies**), | ||
| + | or if they specify the same TTS request. | ||
| + | |||
| + | ===== Cache Expiry ===== | ||
| + | |||
| + | Cached audio data in the shared directory will get | ||
| + | removed when it has not been used by any application | ||
| + | for some period of time. | ||
| + | Currently voiceglue checks for expiry of cache items | ||
| + | every 7 minutes (configurable by the cache_purge_interval | ||
| + | parameter in /etc/voiceglue.conf), and removes those that have gone | ||
| + | unused for the last 4 minutes (configurable by the | ||
| + | cache_lastused_purge parameter in /etc/voiceglue.conf). | ||
| + | There is no way currently to set a maximum cache | ||
| + | size for voiceglue; this may be implemented in | ||
| + | the future. | ||
| + | |||
| + | ====== Audio fetch cookie handling ====== | ||
| + | |||
| + | As mentioned above, cookies are implicit parameters to | ||
| + | audio fetch requests. | ||
| + | Thus, if one call's document fetches set a cookie to | ||
| + | a value, and another call's document fetches set that | ||
| + | cookie to a different value (or create a different set of cookies), | ||
| + | then even though they request audio from the same URL | ||
| + | they will not share the audio. | ||
| + | Because of this, it is important to realize the negative | ||
| + | effects on caching that cookies can have, and to | ||
| + | not use cookies haphazardly. | ||
| + | |||
| + | Although cookies can be set by document, script, or | ||
| + | grammar fetches, they cannot be set by audio fetches. | ||
| + | They are, however, always provided on all fetches, including | ||
| + | audio fetches. | ||
| + | The shared caching of audio fetches makes the setting | ||
| + | of cookies from audio fetches often counterintuitive. | ||
| + | It could be argued that there are cases where it still | ||
| + | should be allowed, for example when shared caching is | ||
| + | explicitly prohibited with maxage=0, but this is not | ||
| + | currently implemented. | ||
| + | |||
| + | ====== Alternate TTS ====== | ||
| + | |||
| + | Using an alternate TTS implementation should be fairly | ||
| + | straightforward. Every time TTS audio is required, voiceglue | ||
| + | runs the ''/usr/bin/voiceglue_tts_gen'' program with four | ||
| + | arguments. The first is -t (and can be ignored), | ||
| + | the second is the text to create, the third | ||
| + | is the file in which to place the audio, | ||
| + | and the fourth is the language as specified | ||
| + | by the VXML's ''xml:lang'' setting. | ||
| + | This generated audio file must be in 16-bit 8 kHz PCM WAV format | ||
| + | with a RIFF header. | ||
| + | |||
| + | The default implementation of voiceglue_tts_gen for flite | ||
| + | is: | ||
| + | |||
| + | #!/usr/bin/perl -- -*-CPerl-*- | ||
| + | $file = $::ARGV[2]; | ||
| + | system ("flite", @::ARGV[0..2]); | ||
| + | system ("mv", $file, $file . ".16khz.wav"); | ||
| + | system ("sox", $file . ".16khz.wav", "-r", "8000", $file); | ||
| + | |||
| + | The last two lines convert flite's default | ||
| + | (on Ubuntu) output from 16 kHz WAV to 8 kHz. | ||
| + | |||
| + | If you want to use Cepstral, this voiceglue_tts_gen file has | ||
| + | worked: | ||
| + | |||
| + | #!/usr/bin/perl -- -*-CPerl-*- | ||
| + | # Cepstral interface | ||
| + | $file = $::ARGV[2]; | ||
| + | system ("/usr/local/bin/swift", "-m", "text", "-o" , $file , $ARGV[1]); | ||
| + | |||
| + | ====== Transfer ====== | ||
| + | |||
| + | Voiceglue has rudimentary support for the <transfer> tag in VXML. | ||
| + | It only supports blind transfers. | ||
| + | There are two different Asterisk mechanisms that may be used | ||
| + | to implement the transfer, the "transfer" command and the "dial" | ||
| + | command. | ||
| + | The "transfer" command is the default and correctly returns control | ||
| + | to the VXML script immediately, but is a less reliable Asterisk command. | ||
| + | The "dial" command is more reliable in Asterisk, but does not return | ||
| + | control to the VXML script until the transferred call disconnects or fails. | ||
| + | |||
| + | These choices are controlled by the blind_xfer_method parameter | ||
| + | in the /etc/voiceglue.conf file. | ||
| + | |||
| + | ====== Example VXML Files ====== | ||
| + | |||
| + | The directory "examples" here contains some example VXML files that | ||
| + | work with voiceglue. Keep in mind that there is much (some say too | ||
| + | much) latitude in the VXML specification as to what could be | ||
| + | supported, so not all VXML files will run without modification. | ||
| + | Specifically, voiceglue only supports simple SRGS XML DTMF grammars, | ||
| + | and does not yet support speech input (this is being worked on). | ||
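| + | |||
| + | For orientation, a simple inline SRGS XML DTMF grammar in a VXML field might look like the sketch below. The field name, rule name, and prompt text are hypothetical, and whether voiceglue accepts this exact form is an assumption that should be checked against the example files: | ||
| + | |||
| + | <code xml> | ||
| + | <field name="choice"> | ||
| + |   <!-- accept exactly one DTMF key: 1, 2, or 3 --> | ||
| + |   <grammar type="application/srgs+xml" mode="dtmf" version="1.0" root="choice"> | ||
| + |     <rule id="choice"> | ||
| + |       <one-of> | ||
| + |         <item>1</item> | ||
| + |         <item>2</item> | ||
| + |         <item>3</item> | ||
| + |       </one-of> | ||
| + |     </rule> | ||
| + |   </grammar> | ||
| + |   <prompt>Press one, two, or three.</prompt> | ||
| + | </field> | ||
| + | </code> | ||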
| + | |||
| + | Here is what the example files do: | ||
| + | |||
| + | welcome-tts.vxml -- Speaks "Welcome" in TTS | ||
| + | |||
| + | welcome-audiofile.vxml -- Recorded audio of Allison saying "Welcome" | ||
| + | |||
| + | single-digit-input.vxml -- Repeatedly gets and speaks a single digit | ||
| + | |||
| + | menu-input.vxml -- Repeatedly gets a menu input | ||
| + | |||
| + | multi-digit-input.vxml -- Repeatedly gets and speaks multiple digits | ||
| + | |||
| + | record-audio.vxml -- Repeatedly records audio from the caller | ||
| + | |||