
Requirements Document: LuteKinduct

Sachi A. Williamson
Bachelor of Science, Computer Science
Laurie Murphy, Faculty Mentor
Pacific Lutheran University
CSCE 499 - Fall 2012



Table of Contents

1 Table of Contents
2 Introduction
3 Project Description
4 Project Objectives
    4.1 Functional Objectives
    4.2 Learning Objectives
5 Development Resources
6 Requirements
    6.1 Performance Requirements
    6.2 Design Constraints
    6.3 User Characteristics
    6.4 Assumptions
    6.5 Security
    6.6 Reliability
    6.7 Portability
    6.8 Maintainability
    6.9 External Interface
    6.10 Use Case Models
        6.10.1 Use Case #1: Load a MIDI file.
        6.10.2 Use Case #2: Begin conducting a musical piece.
        6.10.3 Use Case #3: Change the speed of the music being played.
        6.10.4 Use Case #4: Change the volume of the music being played.
        6.10.5 Use Case #5: Conduct a musical piece.
        6.10.6 Use Case #6: Stop the musical piece at its finish.
7 Development Documentation
8 Task Breakdown
9 Preliminary Timetable
10 Budget
11 Annotated Bibliography
12 Glossary




Introduction

Music and technology have often been considered disparate fields. Despite some technological advances (such as virtual accompaniment and the distribution of digital music online) and growing recognition of its considerable potential, the use of technology in music education has remained rudimentary. This project is designed to enhance one aspect of music education, conducting, by using the Kinect by Microsoft to allow for gesture-based virtual conducting. Any user will be able to import a MIDI file into the application and conduct the musical piece, controlling technical conducting factors such as tempo and volume changes by means of a time-stretching algorithm and compatible C# libraries such as SoundEffect. The concept of virtual conducting is not entirely ground-breaking: the London Symphony Orchestra created a similar application titled Virtual Maestro, and the Haus der Musik in Vienna allows guests to “conduct” the Vienna Philharmonic. At a 2011 programming event at Microsoft headquarters, a team of professionals implemented a project called “Kinductor” that is most similar to this project. The primary difference between those previous projects and the current one is that this capstone is intended to be applied in educational environments, particularly the upper-level conducting courses at Pacific Lutheran University.


Project Description

This project was designed to assist conducting education courses. In a typical introductory conducting course, students must conduct their classmates, which is problematic because of odd instrumentation, small class sizes, and limited feedback during the performance; compared with other fields' use of technology, this is a rather limited method of teaching. A gesture-based application using a camera as a sensor is well suited to assisting traditional conducting courses, allowing the student to experience an immersive, ensemble-like performance and more immediate feedback. The determining factors of the project's success are as follows: the application will allow the user to import a MIDI file through an intuitive interface, then play through the piece while adjusting volume and tempo according to the magnitude and speed of the user's gestures. Ideally, if those goals can be accomplished, the project will be nearly complete enough to be implemented in conducting courses. Longer-term plans (thus only potentially feasible in a semester) are to allow the user to follow a score on the screen, enable the Kinect to video-record the performances, and allow the user to control musical nuances such as fermati.
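As a language-agnostic illustration of the mapping just described, the sketch below converts a measured beat interval into a tempo-scaling factor and a conducting hand's height into a volume level. The names and values here (BASE_TEMPO_BPM, the 0.0-1.0 volume range, all function names) are hypothetical assumptions for illustration, not part of the planned C# implementation.

```python
# Illustrative sketch only: how gesture speed and magnitude might map to
# tempo and volume. All constants and names are assumptions.

BASE_TEMPO_BPM = 120.0  # assumed tempo encoded in the MIDI file


def beat_interval_to_tempo(seconds_per_beat: float) -> float:
    """Faster beat gestures (shorter intervals) yield a higher tempo in BPM."""
    if seconds_per_beat <= 0:
        raise ValueError("beat interval must be positive")
    return 60.0 / seconds_per_beat


def tempo_scale(seconds_per_beat: float) -> float:
    """Scale factor that a time-stretching routine could consume."""
    return beat_interval_to_tempo(seconds_per_beat) / BASE_TEMPO_BPM


def hand_height_to_volume(hand_y: float, low_y: float, high_y: float) -> float:
    """Map the hand's vertical position (assuming high_y > low_y) to 0.0-1.0."""
    span = high_y - low_y
    return max(0.0, min(1.0, (hand_y - low_y) / span))
```

For example, a conductor beating once every half second would yield a tempo of 120 BPM and a scale factor of 1.0, i.e., playback at the piece's written tempo.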



Project Objectives
Functional Objectives

• The creation of a gesture-based application using the Kinect SDK that allows the import of MIDI files and playback of the music with specific controls, such as volume and tempo changes, using time-stretching algorithms and the SoundEffect library in C#.
• An application that utilizes the Kinect SDK and related multimedia libraries while also displaying a user-friendly menu interface.


Learning Objectives

• To gain knowledge of C#, the Kinect SDK, and open-source libraries utilizing various features (e.g., SoundEffect)
• To learn how to research online materials more efficiently and effectively
• To improve problem-solving (debugging) skills and software engineering methods and processes
• To gain more in-depth experience in conducting techniques



Development Resources

Since Microsoft released the Kinect SDK in 2011, it has received steady improvements and enhancements that make it more accessible for open-source projects. As of October 2012, Microsoft has released version 1.6 of the SDK, which offers developer tools, more exposure of the internal workings of the hardware itself, and support for Visual Studio 2012. Correspondingly, more print and online resources related to development with the Kinect SDK have been released. There are, in fact, several books available on the Kinect SDK. Although the official documentation is useful for specific implementation details, other books will assist in the analysis of coordinates (Ashley: 2012, Catuhe: 2012). Meet the Kinect: an Introduction to Programming Natural User Interfaces seems slightly outdated, but other, more recently published resources should compensate (Miles: 2012, Ashley: 2012, Catuhe: 2012). Programming C# is very important for the fundamentals of programming in C#, including function calls, classes, and object-oriented programming. Human Interface Guidelines provides useful suggestions on design considerations for the Kinect sensor. Lastly, there are multiple libraries that support MIDI files in C# and implement time-stretching algorithms, which may be useful later in the semester (Heath: 2012, Parviainen: 2012, Naveh: 2012).



Requirements

Performance Requirements

Due to the continuous improvements being made to the Kinect SDK, it is challenging to predict precisely the runtime capabilities of the program. However, the transitions through musical pieces and the Kinect's recognition of specific gestures should be as reliable and precise as possible. The menu should be simple and complement gesture-based interaction. The process for importing MIDI files should also be efficient and unobtrusive to the overall application. Finally, the time-stretching algorithm should minimize memory consumption.


Design Constraints

This project's success is contingent on the breadth of the Kinect SDK's capabilities and, more importantly, on the hardware's technical capabilities. Memory is much less of an issue, since the SDK is continually improving and is designed to run on a variety of machines, although the application must run efficiently enough not to cause lag on a PC. The project is only available on the Windows operating system because Microsoft does not currently support development with the Kinect SDK in Mac OS or Linux environments. Development time is limited to the academic year, preferably ending by April. The budget for the project is feasible primarily because the Kinect hardware was already purchased for personal use in 2011. The optimal environment for both lighting and visual recognition by the hardware will also need to be taken into consideration, and intuitive yet distinct gestures are needed for reliable gesture recognition. Overall, the design constraints depend mainly on the capabilities of the Kinect SDK's frameworks and of the Kinect hardware itself.


User Characteristics

The target group for this project comprises those with musical backgrounds (especially conducting students) and others who wish to learn more about musical conducting techniques. A rudimentary knowledge of rhythm is preferred, although an additional walkthrough for beginners may be added if time permits. Users will expect a robust, easy-to-navigate user interface that performs efficiently. These expectations make the design of the application more challenging, because simplifying a complex project introduces complications of its own. Although no application can be completely free of glitches, the project will attempt to minimize current and future irregularities of the hardware and software.




Assumptions

It is assumed that the user has access to a Windows computer and a reasonable performance space in which to run the application. It remains to be determined whether Visual Studio must be installed in addition to the Kinect SDK. Other important assumptions that will need to be addressed in the design considerations are as follows: the number of people present in the hardware's line of sight, varying lighting conditions, extra objects or clothing in the hardware's line of sight, the placement of the sensor itself, and a display that is comfortable for the user.



Security

There are no anticipated security concerns for this type of application, and there do not appear to be any reports of exploits in the Kinect SDK.



Reliability

This aspect may be one of the most significant for the project, especially if it is put into practice following its completion. The application must therefore continue to operate normally after software and hardware updates, maintain precise gesture recognition consistently, and generally be as robust as possible.



Portability

Due to the limitations of Microsoft's cross-platform support for the Kinect SDK, the application will only be available on Windows operating systems. Ideally, the application would run on any operating system, but Microsoft has built the SDK and hardware to be unworkable on Mac OS and Linux. It should also be noted that there may be unanticipated compatibility issues with Windows 8, although a software update will likely restore normal operation.



Maintainability

This is one of the most significant, if not the most important, requirements for the project. The application must function efficiently and robustly after the academic year if it is to be used in the classroom. Updates should be minimal and limited to additional features, and the project should continue functioning properly as a stand-alone application if possible. One of the greatest challenges, however, will be how the project copes with software updates or bugs in the Kinect SDK, as well as communication errors with the Kinect hardware itself. These will be addressed by designing the application with rigorous and continual testing.


External Interface

The following images are tentative depictions of how the user interface for the application should look. Ideally, the application would show the reflection of the user in the color view panel in the middle of the display, the skeletal view of the user in the lower corner of the screen, and the time left in the piece on the opposite lower corner. The image below was taken from example code of the Kinect SDK to demonstrate the various viewpoints that the SDK can display to the user:

Figure 1: The Skeletal Viewer, part of the Kinect SDK's example code

The example code would be used to display the skeletal view and the color view to the user in an unobtrusive way; it is also particularly useful for debugging and for determining whether the environment's lighting and other factors are optimal. The main menu will be crafted so that it presents only the necessary options and buttons, minimizing confusion and complication. Figure 2 below is a rudimentary example of the user interface the user may encounter. The progress bar on the left will indicate whether there are any connection errors with the Kinect and will display text showing that the sensor is ready. The top-middle text area will list the MIDI files available in the project's directory, and there will also be a text area in which the user can type the name of a new MIDI file and browse (not shown) through files. Lastly, checkboxes will let the user turn volume-adjustment and tempo-adjustment capabilities (as well as sensitivity) on or off for playback. Once the “Let's Kinduct” button is clicked, the window will close and a viewer similar to Figure 1 will be displayed.


Figure 2: A tentative Graphical User Interface for the project


Use Case Models

Figure 3: The use-case diagram for this project

Actors: User, Kinect Hardware, System and Software

Use Case #1: Load a MIDI file.

The user places the MIDI file into the “MIDIs” folder alongside the project's C# files. The system prompts for the name of the file; the user types the name, and the system finds the file in the specified directory. If a menu layout can be achieved, the user will instead select the file from a list. The system then loads the MIDI file and plays through the musical piece.
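The file lookup described in Use Case #1 could be sketched as follows. This is an illustrative Python sketch only; the folder name comes from the use case, but the function names and the extension-tolerant lookup are assumptions, and the real application would do this in C#.

```python
# Hypothetical sketch of Use Case #1: listing and resolving MIDI files
# from the project's "MIDIs" folder.
from pathlib import Path


def list_midi_files(folder: Path) -> list[str]:
    """Return the MIDI file names available for the menu's file list."""
    return sorted(p.name for p in folder.glob("*.mid"))


def resolve_midi(folder: Path, typed_name: str) -> Path:
    """Find the file the user named, tolerating a missing .mid extension."""
    name = typed_name if typed_name.lower().endswith(".mid") else typed_name + ".mid"
    candidate = folder / name
    if not candidate.is_file():
        raise FileNotFoundError(f"No MIDI file named {name!r} in {folder}")
    return candidate
```

Listing the folder up front also supports the fallback path in the use case, where the user selects from a menu instead of typing a name.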



Use Case #2: Begin conducting a musical piece.

The user wants to start performing a musical piece. The Kinect hardware recognizes that the joints of both arms are in the ready position (out in front of the body at abdomen level) and informs the software. The user then goes through the beat pattern while the software maps the coordinates. The software loads the MIDI file to be played, waits for both arm joints to be raised above abdomen level, and then plays the MIDI file back accordingly.

Use Case #3: Change the speed of the music being played.

The user wants to increase the tempo of the piece by moving the arms in a faster motion. The Kinect hardware recognizes that the joint positions are changing at a quicker rate and informs the software of the change. The software recognizes the faster arrival of joint positions from the gestures and applies a time-stretching algorithm accordingly.

Use Case #4: Change the volume of the music being played.

The user wants to increase the volume of the piece by moving the left arm in a strictly vertical motion (raising one hand up and down). The Kinect hardware recognizes that the joints on that side of the body are moving vertically, relative to their previous coordinates, and informs the software of the change. The software recognizes the vertical movement of the arm and adjusts the volume of the MIDI playback accordingly.

Use Case #5: Conduct a musical piece.

The user would like to conduct a musical piece after loading a MIDI file. The user begins playback by raising both arms above abdomen level, then moving both arms outward in a U-shaped or similar motion. The Kinect hardware recognizes the gestures by mapping the coordinates of the wrist joints, comparing them with previous wrist positions, and sending the coordinate information to the system. The system translates the coordinates and adjusts the volume and tempo accordingly (see Use Cases #3 and #4).
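The ready-position check that gates playback in Use Cases #2 and #5 could be prototyped roughly as below. The Frame structure, its field names, and the comparison against abdomen level are illustrative assumptions; the real application would read joint coordinates from the Kinect SDK's skeletal stream in C#.

```python
# Hypothetical sketch of the per-frame ready-position check that starts playback.
from dataclasses import dataclass


@dataclass
class Frame:
    left_wrist_y: float    # vertical joint coordinates from skeletal tracking
    right_wrist_y: float
    abdomen_y: float


def in_ready_position(frame: Frame) -> bool:
    """Use Case #2: both wrists raised to or above abdomen level."""
    return (frame.left_wrist_y >= frame.abdomen_y
            and frame.right_wrist_y >= frame.abdomen_y)


def playback_should_start(frames: list[Frame]) -> bool:
    """Playback begins once any frame shows the ready position."""
    return any(in_ready_position(f) for f in frames)
```

Once playback starts, the same per-frame coordinates would feed the tempo and volume adjustments of Use Cases #3 and #4.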



Use Case #6: Stop the musical piece at its finish.

The user would like to end the musical piece when the MIDI file finishes playing. The system announces to the user via text that the piece is nearly finished. The user finishes the piece by making a counter-clockwise circular motion and then holding the wrists steady for approximately two to three seconds. The Kinect hardware watches for either the circular motion or steady wrist-joint positions over an elapsed time period and informs the software. The software receives the stopping signal from the hardware and immediately fades the playback out.
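The hold-steady half of this use case could be detected as sketched below: the wrists count as steady when their recent positions stay within a small tolerance for roughly two seconds. The tolerance, window length, and frame rate are illustrative assumptions.

```python
# Hypothetical sketch of Use Case #6's hold-steady detection.

def wrists_held_steady(samples, tolerance=0.02, hold_seconds=2.0, fps=30):
    """samples: recent wrist y-positions, one per frame at `fps` frames/second.
    Returns True when the last `hold_seconds` worth of samples stay within
    `tolerance` of each other, signalling the software to fade playback out."""
    window = int(hold_seconds * fps)
    if len(samples) < window:
        return False  # not enough history to judge steadiness yet
    recent = samples[-window:]
    return max(recent) - min(recent) <= tolerance
```

A sliding-window check like this tolerates sensor jitter, which a strict equality test on joint coordinates would not.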


Development Documentation

The project's blog can be found at the address:
The source code (on GitHub) can be found at the address:
Any further questions or comments can be sent via email to:



Task Breakdown
I. Research
   A. C# Programming Language
      i. C# MIDI Toolkit?
   B. Functionality of the Kinect SDK and Toolkit
      i. Coordinates and positions of joints
      ii. Motions/gestures of joints (skeletal tracking)
      iii. Time-based comparisons or gestures relative to time
      iv. (optional) Video recording and playback
   C. Time-Stretching Algorithms
   D. Functionality with MIDI files
   E. Main Menu (Graphical User Interface)
II. Prototyping
   A. Fundamentals of C#
      i. Declaring objects and relevant information for Kinect programming
      ii. Supported libraries for multimedia, Kinect, etc.
   B. Kinect SDK
      i. Run example code and learn functions for coordinates and joint positions
      ii. Connect MIDI files with the SDK
   C. Functionality of MIDI files
      i. C# MIDI Toolkit or other libraries
      ii. Playback and manipulation

III. Development
   A. Coordinate systems with joint positions (Kinect SDK)
   B. Gestures with joint positions
   C. Functionality of MIDI files
   D. Time-stretching algorithms
   E. Relevant libraries for multimedia and Kinect
   F. Menu interface


   G. Video recording and playback using the Kinect
   H. Exporting the project as a stand-alone application (.exe)
IV. Documents
   A. Project Proposal
   B. Requirements Document
   C. Design Document
   D. Final Report
V. Presentation
   A. Clean-up of code
   B. Rigorous testing
   C. Presentation at Academic Festival



Preliminary Timetable




Budget

The table below lists the tentative budget for the overall project. No further costs are expected for the development of the application.

• Kinect for Windows Sensor (Kinect for Xbox 360 sensor already purchased): $249.99
• AC Adapter for Kinect for Xbox 360 Sensor (already purchased): $10.90
• Microsoft Kinect SDK (open-source): N/A
• Beginning Kinect Programming with the Microsoft Kinect SDK (library): N/A
• Programming with the Kinect for Windows Software Development Kit (library): N/A
• Meet the Kinect: an Introduction to Programming Natural User Interfaces (library): N/A
• Programming C# (library): N/A
• Start Here! Learn the Kinect API (library): N/A
• Human Interface Guidelines (free from Microsoft Developer's Website): N/A



Annotated Bibliography

J. Ashley, J. Webb, Beginning Kinect Programming with the Microsoft Kinect SDK, 1st Edition, New York: Apress, 2012.
   A resource that will help analyze coordinates (Data Retrieval in Chapter 2), as well as image-processing and skeleton-tracking fundamentals for the SDK. Helpful with the theory behind gestures and the mathematics of the Kinect.

D. Catuhe, Programming with the Kinect for Windows Software Development Kit, Redmond: Microsoft Press, 2012.
   A resource that will help with gestures and positions (Chapter III: algorithmic gestures and positions; using gestures and postures in an application).

S. Kean, J. Hall, P. Perry, Meet the Kinect: an Introduction to Programming Natural User Interfaces, New York: Apress, 2011.
   A resource that will help primarily with the theory behind gesture-based interaction with the Kinect (since its information on the SDK is slightly outdated) and with understanding the mathematics of depth processing.

J. Liberty, Programming C#, Sebastopol: O'Reilly and Associates, Inc., 2001.
   A resource for basic knowledge of C#, especially classes and functions, as well as supported libraries. Very important as a foundation for programming with the Kinect SDK.

Mark Heath. (2012, October 12). “NAudio.” [Online]. Available:
   A possible library/resource to help design the time-stretching algorithm for MIDI control.

R. Miles, Start Here! Learn the Kinect API, Sebastopol: Microsoft Press, 2012.
   A resource for beginning implementation with the Kinect SDK, such as image storage in computers, detecting movement, body tracking, and applications of the sensor.

Microsoft Corporation. (2012, October 10). Kinect for Windows SDK Documentation. [Online]. Available:
   The direct documentation for the Kinect SDK, organized by the creators of the SDK and hardware. Contains a vast amount of information on specific topics such as gesture-coordinate tracking and supported libraries.

Microsoft Corporation. (2012, October 12). Human Interface Guidelines. [Online]. Available:
   A resource primarily for application design, with considerations of the human-interaction environment of the Kinect sensor. Provides information on design considerations such as crowds, lighting, skeletal tracking, and the specifications of the hardware itself.


Microsoft Corporation. (2012, October 10). “Adjusting Pitch and Volume.” [Online]. Available:
   Describes how to adjust pitch with the “SoundEffect” library in XNA programming. To be determined whether this resource will be directly applicable to the project.

Olli Parviainen. (2012, October 10). “SoundTouch Audio Processing Library: SoundStretch Audio Processing Utility.” [Online]. Available:
   A resource to assist with C# libraries for MIDI adjustment and time-stretching algorithms. Will be useful as a guide for implementing the algorithms.

Yuval Naveh. (2012, October 11). “PracticeSharp.” [Online]. Available:
   A loose model for design considerations and the implementation of time-stretching algorithms, and a pointer to libraries that may be useful for this project.




Glossary

tempo: the rate of speed of a musical piece or passage indicated by one of a series of directions (as largo, presto, or allegro) and often by an exact metronome marking; rate of motion or activity: pace (Merriam-Webster Dictionary)

fermata: a prolongation, at the discretion of the performer, of a musical note, chord, or rest beyond its given time value (Merriam-Webster Dictionary)

MIDI (musical instrument digital interface): an electronic standard used for the transmission of digitally encoded music (Merriam-Webster Dictionary); easily accessible and feasible to use with C#

time signature: a sign used in music to indicate meter, usually written as a fraction with the bottom number indicating the kind of note used as a unit of time and the top number indicating the number of units in each measure (Merriam-Webster Dictionary)

Figure 4: How not to design the Kinect project