
Using Location-Aware Multimodal Speech Interfaces to Survive a Zombie Apocalypse


Christopher Contolini
christopher.contolini@uta.fi

1. Introduction
In the event of a large-scale apocalyptic assault on human civilization by recently deceased, reanimated corpses, or zombies, a preexisting infrastructure allowing the real-time monitoring of assailants and guidance of armed personnel will greatly improve the likelihood of marginalized humans surviving. I propose a system that augments the conventional EU 112 emergency telephone number service with a multimodal interactive voice response (IVR) system capable of being deployed at the onset of a zombie apocalypse. The system's input and output modalities include speech recognition, speech synthesis, large visual displays, and global positioning information transmitted via common cellular telephones.

Current emergency telephone services connect callers with response centers staffed with live operators who receive input, assess situations, and recommend actions based on their training. The result is often the dispatching of emergency personnel to the caller's location or the provision of instructions for the caller to follow. Integrated GPS functionality in modern cellular phones allows the operator to determine the caller's location with reasonable accuracy. Phones without GPS can be located with a lesser degree of accuracy using antenna tower triangulation.

Should the undead rise from their graves and actively seek consumption of human flesh (and/or brains), traditional emergency call centers will be unable to handle the sudden influx of incoming requests for assistance. Increasing the number of staff to accommodate the escalated demand will prove difficult for two reasons: personnel will become less reliable as a result of death or exodus, and large numbers of humans in a fixed environment are dangerous because zombies are attracted to both noise and concentrations of human flesh. A zombie outbreak within the vicinity of a call center would force evacuation, causing service outages. A decentralized and redundant IP telephony network would allow favorable uptime in situations where it would be dangerous for humans to remain stationary for extended periods of time.

The locations of safe houses and gun caches can be provided to callers. Information received from a caller about the quantity of zombies in his or her presence is added to a database alongside GPS coordinates taken from the user's device. Large public displays strategically located in safe houses and central squares show maps with the locations of recently spotted zombies to help affected citizens avoid dangerous areas and track the movement of zombie hordes.

2. Speech Recognition Technology Requirements


Speech recognition is an imperfect process, and it is important to address specific properties of the system's recognizer before design and development processes begin.

| Attribute | Required value | Rationale |
|---|---|---|
| Vocabulary and language | | |
| Vocabulary size | Small | A small dictionary of important action words is required by the speech recognizer. System recognizes predefined keywords. |
| Grammar | Phrase-based | Grammar and system interaction model considers possible situations and user base. |
| Extensibility | Changeable | Safety of citizens requires the system to support some degree of extensibility. Additional vocabulary can be added should new features be desired. |
| Communication style | | |
| Speaker | Independent | This is a public application, so speaker- and gender-independent models must be applied. |
| Speaking style | Continuous | Word-spotting is emphasized. |
| Overlap | Barge-in | The ability to interrupt system outputs and offer new commands is important in life-threatening situations. This will also allow longer and more informative outputs to be offered. |
| Usage conditions | | |
| Environment | Hostile | With the majority of phone calls taking place in hectic situations, the system must be prepared to handle hostile environments. |
| Channel quality | Low-quality | System operates over cellular phone networks and a particular user's reception may be poor. |

2. Draft of the System's Speech Interface


When a zombie or zombie horde is spotted, citizens dial a three-digit emergency number on their mobile phones and use voice input to inform a virtual agent of the situation. Speech input is recognized and speech output is provided in the form of directions to safe houses or gun caches. Although the application supports traditional touchpad input alongside speech input, touchpad input is impractical because callers will often be fleeing on foot, unable to devote visual attention to a handheld device. While modern smartphones are capable of providing new interaction experiences, including tactile and haptic interfaces, the safety-critical nature of this application necessitates broad compatibility across older generations of mobile phones, together with a word error rate (WER) of less than 5%.
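As a rough illustration of how the "nearest safe house or gun cache" lookup could work, the sketch below picks the closest place of a requested kind using the haversine great-circle distance. The `Place` type, the `nearest` helper, and the kind labels are hypothetical names introduced for this example, and the coordinates are placeholders rather than surveyed positions of the addresses used in the dialogues.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Place:
    kind: str      # hypothetical labels: "safe_house" or "gun_cache"
    address: str
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(places, kind, lat, lon):
    """Return the closest place of the given kind, or None if there is none."""
    candidates = [p for p in places if p.kind == kind]
    if not candidates:
        return None
    return min(candidates, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))


# Placeholder coordinates (assumed for illustration only).
PLACES = [
    Place("safe_house", "Koskikatu 9", 61.4980, 23.7700),
    Place("safe_house", "Satamakatu 17", 61.4990, 23.7500),
    Place("gun_cache", "Hämeenpuisto 28", 61.4970, 23.7480),
]

caller_lat, caller_lon = 61.4985, 23.7690  # assumed caller position
print(nearest(PLACES, "safe_house", caller_lat, caller_lon).address)
```

A deployed system would more plausibly rank candidates by walking distance along streets, since turn-by-turn directions are offered afterwards; straight-line distance is only the simplest stand-in.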

The system supports both system-initiative and mixed-initiative dialogue strategies. Due to the safety-critical domain of the application, most users will be trained to use the system before calling for the first time. Expert users will prefer the mixed-initiative approach, inputting known commands without waiting for guidance, while novice and regular users will require the system to take the initiative and guide the conversation.

Basic commands include SAFE HOUSE, when the location of the nearest safe house is desired; GUN, when the location of the nearest gun cache is sought; and HELP, when immediate support is requested in the form of medical or defensive assistance. Since all paramedics and safety personnel will be heavily armed, there is no need to distinguish between requests for medical and firepower support.

Word-spotting is an important trait of the speech recognizer, but supplementary speech inputs, namely screams of various volumes, lengths, and tones, should be spotted alongside standard words. Callers will likely be communicating over low-quality channels in hostile environments. Contributing to the less-than-ideal situation of the call is the likelihood of a user being assaulted by a zombie during the dialogue. The system should be able to recognize when a caller is screaming in pain or fear and immediately dispatch assistance.

If the same emergency number service is adapted to support additional situations, such as traditional fire, police, and ambulance scenarios, it is important that the system confirms the motivation of the call. If a caller engages the mixed-initiative approach without using any of the above keywords, the keyword ZOMBIE can be spoken at the beginning of the call to indicate the context of the situation. This application description assumes a dedicated zombie-only hotline.

At the end of each transaction, callers who are not in time-sensitive situations are asked for specific details of their situation, including the number of zombies present at their location. This information is combined with the user's location coordinates to update maps on large public displays in central areas of town.
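The command handling described above can be sketched as keyword spotting over the recognizer's text hypothesis. This is a minimal sketch under stated assumptions: the action names, the `spot_action` function, and the priority order (HELP first) are hypothetical, and scream detection is assumed to arrive as a flag from a separate acoustic detector that is not implemented here.

```python
import re

# Keyword patterns in assumed priority order; action names are hypothetical.
PATTERNS = [
    # No word boundaries, so run-together shouts like "HELPHELPHELP" still match.
    (re.compile(r"help"), "DISPATCH_ASSISTANCE"),
    (re.compile(r"\bsafe[\s-]*house"), "GIVE_SAFE_HOUSE"),
    (re.compile(r"\bguns?\b"), "GIVE_GUN_CACHE"),
]


def spot_action(transcript, scream_detected=False):
    """Map a recognized utterance to an action.

    scream_detected is assumed to come from an upstream acoustic detector.
    """
    if scream_detected:
        return "DISPATCH_ASSISTANCE"
    text = transcript.lower()
    for pattern, action in PATTERNS:
        if pattern.search(text):
            return action
    # No keyword spotted: fall back to system-initiative questions.
    return "SYSTEM_INITIATIVE"


for utterance in ["I need a safe house!", "Guns!", "HELPHELPHELPHELP!",
                  "THERE ARE ZOMBIES EVERYWHERE!"]:
    print(utterance, "->", spot_action(utterance))
```

Matching on substrings of the transcript is what lets the caller embed a keyword in a longer sentence, which is the practical meaning of word-spotting in this design.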

Below are some example dialogues of the system in action.

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | THERE ARE ZOMBIES EVERYWHERE! | Panicking novice user engages the system's initiative and answers the question. |
| System: | Are you able to safely flee the situation? | System prepares to suggest assistance by clarifying the situation. |
| Caller: | NO. | Caller responds to question. |
| System: | I am dispatching assistance to your location. | System acknowledges the caller's situation and forwards his or her location coordinates and a recording of the conversation to applicable personnel. |

Dialogue Example 1

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | I need a safe house! | Caller ignores the system's initiative and states a known keyword. |
| System: | Nearest safe house is at Koskikatu 9. Would you like directions? | System recognizes safe house keyword and provides nearest location. Takes initiative by offering directions. |
| Caller: | Yes. | Caller responds to question. |
| System: | Turn right at Rongankatu. Say "next" when you are ready for the next direction. | System uses the phone's GPS functionality to locate the caller. First direction is provided with a prompt for the next. |
| Caller: | I am there. Next. | Caller follows direction and uses provided keyword to indicate he or she is ready for the next response. |

Dialogue Example 2

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | AAAAAAAAAAAHHHHHH! | Caller ignores the system's initiative and screams as a zombie bites into his or her arm. |
| System: | I am dispatching assistance to your location. What is your emergency? | System acknowledges the caller's situation and forwards his or her location coordinates and a recording of the conversation to applicable personnel. Question is repeated in case additional details can be provided. |
| Caller: | [Sounds of screams and zombie grunts in the background.] | Caller is being mauled by zombies. |
| System: | What is your emergency? | System repeats the question every five seconds while continuing to record the conversation for later manual analysis. |

Dialogue Example 3

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | I NEED HELP! | Caller responds to the application's inquiry. |
| System: | I am dispatching assistance to your location. How many zombies are present? | System recognizes the keyword "help", acknowledges the caller's inexperience and forwards his or her location coordinates and a recording of the conversation to applicable personnel. A question is asked for further detail. |
| Caller: | A LOT! | Numbers along with general quantitative phrases such as "a lot" must be recognized by the system for later use. |

Dialogue Example 4

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | Guns! | Caller ignores the system's initiative and states a known keyword. |
| System: | Nearest gun cache is at Hämeenpuisto 28. Would you like directions? | System recognizes gun keyword and provides nearest location. Takes initiative by offering directions. |
| Caller: | Yes. | Caller accepts offer for directions. |
| System: | [Directions dialogue initiated] | |

Dialogue Example 5

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | Two zombies. | Caller ignores the system's initiative and states a known keyword. A number is provided to aid analysis and update public maps. |
| System: | Two zombies noted. Do you require assistance? | System confirms the input and initiates further dialogue. |
| Caller: | No. [Caller hangs up] | Caller was not in danger and simply wanted to report a zombie sighting. |

Dialogue Example 6

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | Where is the nearest safe house? | Caller ignores the system's initiative and states a known keyword. |
| System: | Nearest safe house is at-- | System queries database and begins providing information but is interrupted by caller's scream. |
| Caller: | AAaGaURAAAaaHHHHH | Caller barges in when a zombie jumps him or her from behind. |
| System: | I am dispatching assistance to your location. What is your emergency? | System recognizes scream and stops giving directions. Personnel are notified and initial question is repeated every five seconds while continuing to record the conversation for later manual analysis. |

Dialogue Example 7

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | I am surrounded by zombies! | Panicking novice user engages the system's initiative and answers the question. |
| System: | Are you able to safely flee the situation? | System prepares to suggest assistance by clarifying the situation. |
| Caller: | Yes! | Caller responds to question. |
| System: | Nearest safe house is at Satamakatu 17. Would you like directions? | System acknowledges the caller's ability to run and automatically recommends the location of the nearest safe house. Direction dialogue ensues. |

Dialogue Example 8

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | I want a gun so I can kill some zombies. | Caller doesn't have an emergency, but knows he can call to receive the location of gun caches. |
| System: | Nearest gun cache is at Hämeenpuisto 28. Would you like directions? | System recognizes gun keyword and provides nearest location. Takes initiative by offering directions. |
| Caller: | Nope. I know where that is. | Caller denies offer for directions and hangs up. |

Dialogue Example 9

| Party | Speech in/out | Explanation |
|---|---|---|
| System: | What is your emergency? | Call is connected and dialogue begins. |
| Caller: | HELPHELPHELPHELP! | Caller frantically yells for help. |
| System: | I am dispatching assistance to your location. How many zombies are present? | System recognizes the keyword "help", acknowledges the caller's inexperience and forwards his or her location coordinates and a recording of the conversation to applicable personnel. A question is asked for further detail. |
| Caller: | FIVE OR SIX. | The system chooses the last number stated to record in the database. This allows callers to correct themselves if need be. |

Dialogue Example 10
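The "last number stated wins" rule from the dialogue examples, together with the handling of qualitative answers such as "a lot", might be implemented along the following lines. This is only a sketch: `parse_zombie_count`, the number-word table, and the extra qualitative phrases beyond "a lot" are assumptions made for the example.

```python
import re

# Spoken number words the sketch understands (an assumed, deliberately small set).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# Only "a lot" comes from the text; "lots" and "many" are assumed additions.
QUALITATIVE_PHRASES = ("a lot", "lots", "many")


def parse_zombie_count(transcript):
    """Return (count, qualitative).

    count is the LAST number stated, or None if no number was heard.
    qualitative is True when a phrase such as "a lot" was heard.
    """
    text = transcript.lower()
    count = None
    for token in re.findall(r"\d+|[a-z]+", text):
        if token.isdigit():
            count = int(token)
        elif token in NUMBER_WORDS:
            count = NUMBER_WORDS[token]
    qualitative = any(phrase in text for phrase in QUALITATIVE_PHRASES)
    return count, qualitative


print(parse_zombie_count("FIVE OR SIX."))   # later number overrides the earlier one
print(parse_zombie_count("A LOT!"))
```

Keeping the qualitative flag separate from the numeric count leaves it to the map layer to decide how such a report is displayed.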

3. Multimodal Interaction Techniques


Along with the aforementioned speech input/output interface, location coordinates are taken from callers' mobile phones using GPS technology (or cell tower triangulation when GPS data is unavailable). The speech interface operates at the same time as, but independently of, the location management module, making the modalities concurrent. The caller's coordinates are plotted onto a map with a picture of a zombie indicating that there was a zombie sighting at that location. The time at which the caller contacted the emergency system is placed alongside the icon. If the caller provided the number of zombies at his or her position at the time of the call, this number is also placed next to the location icon. Sightings of ten or more zombies in one place constitute a horde, and an icon of multiple zombies alongside each other is used in place of the standard zombie icon. If a qualitative figure was given, such as "a lot", the sighting is considered a horde. These zombie-sighting maps are shown on large LCD displays in safe houses and busy public areas. Because zombies have poor vision, the silent displays should attract little of their interest, which makes visual output preferable in situations where noise would disturb and attract hordes.
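The plotting rules above (time label, optional count, and the horde icon for ten or more zombies or a qualitative report) can be sketched as follows. The `Sighting` record, the `marker_for` function, the icon names, and the label format are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

HORDE_THRESHOLD = 10  # "ten or more zombies in one place constitute a horde"


@dataclass(frozen=True)
class Sighting:
    lat: float
    lon: float
    reported_at: datetime
    count: Optional[int] = None   # number stated by the caller, if any
    qualitative: bool = False     # True for answers such as "a lot"


def marker_for(sighting):
    """Build the map marker (icon name, position, label) for one sighting."""
    is_horde = sighting.qualitative or (
        sighting.count is not None and sighting.count >= HORDE_THRESHOLD
    )
    label = sighting.reported_at.strftime("%H:%M")
    if sighting.count is not None:
        label += f" x{sighting.count}"
    return {
        "icon": "horde" if is_horde else "zombie",  # assumed icon names
        "lat": sighting.lat,
        "lon": sighting.lon,
        "label": label,
    }


when = datetime(2010, 1, 1, 14, 5, tzinfo=timezone.utc)  # arbitrary example time
print(marker_for(Sighting(61.4980, 23.7700, when, count=2)))
print(marker_for(Sighting(61.4980, 23.7700, when, qualitative=True)))
```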

4. Implementation Plan
A frame-based dialogue system designed using the VoiceXML markup language will manage dialogue flow. The open-endedness afforded by frame-based systems, as opposed to the rigid structure of finite state machines, will allow missing pieces of information to be provided to the application at the will of the caller. The system can be developed and deployed in a comprehensive voice environment such as Nuance. VoiceXML's support for rapidly prototyping, testing, deploying, and iterating applications is an important feature in constantly changing safety and rescue environments. The system will be installed and configured on several servers in geographically separate locations to decrease the likelihood of simultaneous server outages. Following the setup of traditional IVR systems, the architecture should involve speech recognition and synthesis modules along with dialogue, presentation, database, and location management modules. The diagram below demonstrates the components of the server system architecture and their relationship to one another.

Figure 1: Server system architecture
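To illustrate what distinguishes a frame-based dialogue from a finite state machine, the sketch below models a frame as a set of slots that the caller may fill in any order, with the system prompting only for whatever is still missing. It is written in Python purely for illustration and is not VoiceXML; the `Frame` class and the slot names are hypothetical, while the prompts are taken from the dialogue examples.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A dialogue frame: named slots, each with the prompt used to request it."""
    prompts: dict                      # slot name -> prompt, in asking order
    values: dict = field(default_factory=dict)

    def fill(self, **slots):
        """Record any slots the caller supplied, in whatever order they came."""
        for name, value in slots.items():
            if name not in self.prompts:
                raise KeyError(f"unknown slot: {name}")
            if value is not None:
                self.values[name] = value

    def next_prompt(self):
        """Return the prompt for the first unfilled slot, or None when complete."""
        for name, prompt in self.prompts.items():
            if name not in self.values:
                return prompt
        return None


frame = Frame({
    "request": "What is your emergency?",
    "wants_directions": "Would you like directions?",
    "zombie_count": "How many zombies are present?",
})

# The caller volunteers a count and a request up front, so only one question remains.
frame.fill(zombie_count=2, request="safe_house")
print(frame.next_prompt())
```

A finite state machine would have to ask each question in a fixed order; here the order of the prompts only matters for slots the caller has not already supplied.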

5. Evaluation Plan
The ultimate evaluation will be witnessing whether the system is capable of preventing or postponing the demise of human civilization during a zombie apocalypse. Until the day comes when the undead rise from their graves, traditional speech interface evaluation methodologies can be used to quantify the quality of the system. Word error rate (WER) is an important metric to control in this safety-critical application, especially considering the word-spotting techniques mentioned in previous sections. A WER as low as possible, and no higher than 5%, will ensure callers are able to receive the attention they need. Concept error rate should also be kept as low as possible because incorrectly recognized concepts will result in the system providing irrelevant information. The perplexity of the recognizer's language model should be monitored as an indication of how well longer strings of words can be identified. Task completion rate and task completion time are critical application-level metrics: completion rate should be as high as possible and completion time kept to a minimum because, as with word and concept error rates, every failed or slow transaction makes it more likely that lives will be lost.

While money may not be of concern during a zombie apocalypse, significant cost savings will be introduced by the implementation of this system through the reduction of live operator expenses.

Extensive human-computer interaction studies should be performed before the system is released. Test subjects can be asked to call the number and perform a variety of simulated tasks. Speech recognition can be compared against other input modes, such as touchtone input or SMS messaging. Test participants' ability to quickly obtain information on fictitious safe houses and gun caches will emulate the real-world experiences that citizens will someday have in the not-so-distant future.
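For reference, word error rate is conventionally computed as WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words in the recognizer's hypothesis relative to a reference transcript of N words. The sketch below computes it with a standard word-level edit distance; the function name and the test utterances are illustrative, and text normalization is limited to lowercasing.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("where is the nearest safe house",
                      "where is the nearest safe horse")
print(f"WER = {wer:.3f}, within 5% target: {wer <= 0.05}")
```

A single misrecognized word in a six-word utterance already yields a WER of about 17%, so the 5% target is only meaningful when measured over a large set of test calls.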
