
software review

Power Tools
Code, analyze, and illustrate data with the Provalis Research suite.
By Deborah Bobier

Montreal’s Provalis Research offers a suite of data and text analysis tools for those interested in both quantitative and qualitative data. It is particularly useful for those who wish to “put numbers” to qualitative data to help establish frequencies and relationships. The current offering consists of Simstat, a Microsoft Windows-based, full-purpose statistical package; WordStat, a content analysis and text mining module; and the newer (released in 2004) QDA Miner, a qualitative data analysis package. Note that WordStat is not a stand-alone application; users must access it through Simstat or QDA Miner.

Simstat. This is a statistical analysis package not unlike SPSS for Windows, and those familiar with the latter will find navigation easy and straightforward. It contains a tutorial to help users become familiar with the possibilities of the software (and there are many), and the comprehensive manual provides clear instructions on how to perform actions and analyses, not just a description of what is available. Such instructions are often sorely lacking in software manuals, which forces users to find other sources of information to work with new software, so they are added value here.

Users can enter data directly into Simstat or import them from a variety of sources, including SPSS, Excel, and comma- or tab-delimited ASCII data. If necessary, they can easily merge data files, a handy feature if multiple people are entering data. Simstat provides the full range of statistical offerings, from more basic cross-tabs and descriptive statistics to factor, bootstrap, and time-series analysis. User-friendly descriptions of each test and the available options are presented, but users must bring their own knowledge of interpretation, and the software cannot guard against inappropriate uses. Users can create and edit a full range of charts (e.g., bar, pie, Pareto, histogram, box-and-whisker, scatter) following analysis, and transfer them to the clipboard or export them to other applications for use elsewhere.

The notebook displays the statistical output for all analyses performed during a session. For those who have lost their way searching for a particular data run,

through page after page of logged results in a single file, one of the nicest features of the notebook is that users can add pages to it to organize output and make navigation faster and easier. Labeled tabs on pages make searching even simpler. Users can also add blank pages to the notebook to include the analysis plan, make comments, and outline next steps if desired.

Simstat also enables users to save analysis specifications with a script feature. The script automatically keeps track of what was done in a session, and users can save it for later. This is extremely helpful when someone uses an analysis frequently, or when the analysis is complex and would be time-consuming to recreate with the Windows point-and-click method. Descriptions of the command and syntax conventions, along with example scripts, help less-familiar users.

WordStat. This is an add-on module for studying textual information such as responses to open-ended questions and interviews, articles, speeches, and other communications. Its power is that it allows for automatic categorization of text with a dictionary approach (after some user setup). For those who have ever needed to find themes or relationships in verbatim responses, focus group transcripts, or other text sources, WordStat is very attractive indeed. Although getting started might take a while, both in formatting the text to be analyzed and in setting up the dictionary, the results make it worthwhile.

To get full value from WordStat’s capabilities, users must invest time in customizing and maintaining the appropriate dictionaries, one for categorization and one for exclusion. They must populate the inclusion dictionary with the categories under consideration, all of the words and technical terms to be included in each category, and so on. The advantage is that this is a subject-specific word list, and users can save it, reuse it,

Exhibit 1 Heat plot

marketing research 41

Exhibit 2 Code mark example

and modify it. They can also append words and categories into other dictionaries, which makes establishing subsequent dictionaries more efficient.

WordStat offers tools to help users compile dictionaries. For example, they can look at simple frequencies of all the words contained in the data, which WordStat generates automatically. They can also look at the words not contained in the inclusion dictionary, to make sure important words haven’t been overlooked. The phrase finder looks for idioms, phrases, and expressions throughout the text that otherwise might be missed. WordStat also will suggest synonyms for the existing words in users’ categories, which can enhance the dictionary.

The keyword-in-context page allows users to see, in one table, all occurrences of a word or category in the original text. Users then can sort these instances to look for similarities or differences in word usage, as well as inconsistencies in word meanings. Any discrepancies in usage can then be addressed by refining the dictionary. Users also can create rules to specify the conditions under which a word or category should be coded. If specified properly, these rules can reduce ambiguity when words have multiple meanings, because the different meanings can be clearly defined. An exclusion dictionary specifies which words should not be included in the analysis. It comes already populated, and users can edit it.

Once the dictionaries have been set up, users must prepare their data files and documents for import into the program. (All text must be in raw text, or ASCII, format.) WordStat searches for specific spellings; therefore users should check spelling in the text before beginning analysis, to ensure that words aren’t missed (Provalis supports English, French, Spanish, and several other languages). Other formatting issues will also need to be addressed.
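For readers who have not worked with dictionary-based content analysis, the following Python sketch illustrates the general idea described above: an inclusion dictionary that maps categories to words, an exclusion list, a check for words in neither list, and a keyword-in-context listing. It is not WordStat code and says nothing about how WordStat is implemented; the dictionaries, function names, and sample response are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical inclusion dictionary: category -> words that signal it.
INCLUSION = {
    "price": {"price", "cost", "expensive", "cheap"},
    "service": {"service", "staff", "support", "helpful"},
}
# Hypothetical exclusion list: words left out of the analysis entirely.
EXCLUSION = {"the", "a", "and", "was", "is", "but", "too", "very"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text):
    """Count how many tokens in the text fall into each category."""
    counts = Counter()
    for token in tokenize(text):
        if token in EXCLUSION:
            continue
        for category, words in INCLUSION.items():
            if token in words:
                counts[category] += 1
    return counts


def uncategorized(text):
    """List words found in neither dictionary, as candidates for review."""
    known = set().union(*INCLUSION.values()) | EXCLUSION
    return sorted({t for t in tokenize(text) if t not in known})


def keyword_in_context(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words on either side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


response = "The staff was helpful but the price was too expensive for students"
print(dict(categorize(response)))
print(uncategorized(response))
print(keyword_in_context(response, "price"))
```

In this toy example the words left uncategorized ("for" and "students") are the kind of list a user would scan to decide whether the inclusion dictionary is missing anything important.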
WordStat generally treats hyphenated words as separate words, and uses brackets and braces as specific markers, so users need to remove these symbols or replace them with other symbols. WordStat also applies lemmatization, treating word stems, singular and plural forms, and tense forms as one word. Users can adjust these rules for the dictionary.

Now the fun begins. Users can cross-tab categories and words with other categorical variables of interest to uncover patterns in the data, and it is easy to switch between the cross-tabs and the keyword-in-context page to get additional insight. Anyone interested in some of the more powerful ways to depict relationships between categories and categorical variables will enjoy the heat plots, which show relative correlation between words or categories using a color spectrum. WordStat offers multiple spectra to suit various tastes. Users can easily create dendrograms (tree graphs) to display hierarchical clusterings of categories. Correspondence maps can be used to illustrate the likelihood that items (categories and variables) will appear together. These can be shown as two-dimensional or three-dimensional maps, and give a snapshot of similarities between items. If this isn’t enough for the average user, then multivariate statistics also are available.

QDA Miner. This is designed for use with already coded data, or for coding text data. QDA Miner and WordStat possess many similarities, and the two can work together for enhanced capability. For instance, the word- and phrase-finder functions of WordStat can identify items that might be included in the code book, and QDA Miner also offers heat maps, dendrograms, and correspondence maps. The manual provides easy instructions for running these analyses, and detailed explanations of the output.

QDA Miner presents a straightforward way to manually code text. Users select the text to be coded and click the desired code, and a code mark appears in the right margin. Code marks appear as the “name” of the code and a colored bracket, to show the physical limits of the coded segment. Users can code sections within segments multiple times, with additional code marks appearing in the right margin, and easily add or remove code marks as needed. A nice feature is that they can add comments to coded text by clicking the code mark; these appear as a small yellow square in the middle of the code bracket.
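To make the cross-tabulation of categories against a categorical variable more concrete, here is a minimal Python sketch of the underlying counting. It is an illustration only, not output from or code for WordStat or QDA Miner; the respondent groups, categories, and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical coded cases: (respondent group, categories found in the response).
cases = [
    ("customer", ["price", "service"]),
    ("customer", ["price"]),
    ("prospect", ["service"]),
    ("prospect", ["price", "service"]),
    ("prospect", ["service"]),
]


def crosstab(cases):
    """Count, per category, the number of cases in each group that mention it."""
    table = defaultdict(Counter)
    for group, categories in cases:
        for category in set(categories):  # count each case once per category
            table[category][group] += 1
    return table


def format_table(table, groups):
    """Render the cross-tab as aligned text rows, one row per category."""
    lines = ["category".ljust(10) + "".join(g.rjust(10) for g in groups)]
    for category in sorted(table):
        cells = "".join(str(table[category][g]).rjust(10) for g in groups)
        lines.append(category.ljust(10) + cells)
    return "\n".join(lines)


table = crosstab(cases)
print(format_table(table, ["customer", "prospect"]))
```

Each cell counts cases rather than word occurrences, which is one of several reasonable choices; a heat plot of the kind shown in Exhibit 1 is essentially a color-coded view of a table like this one.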
Some of the package’s other highly useful functions are the multiuser possibilities, and the ability to merge files or projects when coding is performed on different computers, or in different files by different team members. Creating project backups is simple using an archiving procedure. Users should do this regularly, so they can recover lost variables or go back to earlier versions.

Separately or together, Provalis’ statistical and text analysis software packages provide basic and higher-level tools for coding, analyzing, and illustrating data. Users looking for only a basic statistics package might find Simstat somewhat overwhelming; however, it is easy to use, and help is available with the tutorial and user’s manual. WordStat and QDA Miner will allow researchers to fully investigate textual information, and although startup may be time-consuming, the end result is well worth it.

Those interested can download demonstrations from the company’s Web site. They can also order products there. Current prices are $355 for Simstat, $595 for QDA Miner, and $1,150 for WordStat.

Deborah Bobier is an account executive at Millward Brown in Toronto. She may be reached at

Spring 2006