SOFTWARE REVIEW
JACK YURKIEWICZ, Feature Editor, Lubin School of Business, Pace University

SPSS for Windows 95

by Jack Yurkiewicz, Feature Editor

SPSS released the first statistical software specifically for Windows 95 (or for Windows NT version 3.51 or higher). Called SPSS Version 7, the program makes full use of the features Windows 95 offers (long file names, extensive help with right-mouse-button "What's This?" explanations, etc.); the older Version 6.1 is still available for users of Windows 3.1. I have used Version 7 for several months, both in teaching and for research. This review summarizes my impressions of the product.

System Requirements

I ran SPSS on two computers: a 90 MHz Pentium desktop with 16 MB of RAM, and a notebook with a 33 MHz 486 chip and 8 MB of RAM. The latter is the minimum hardware that SPSS recommends, and I found the performance to be acceptable. The only bottleneck was graphics generation, and even there the speed was not objectionable; I never waited more than 15 seconds for the most elaborate plot.

Features

The program comes a la carte. The Base program has the standard file and data management features (data importation, data transformations, etc.), along with less sophisticated statistical procedures. These include descriptive statistics, crosstab tables, multiple linear regression, correlation analysis, and nonparametric analysis. The graphics capabilities in the Base program include bar, line, area, pie, probability, high-low, and error bar graphs, box plots, histograms, time plots, and two- and three-dimensional scatterplots with the ability to identify specific data observations. The Base program also includes quality control charts, such as Pareto, X-bar, range, sigma, individual, moving range, p, np, c, and u charts. A notable feature of all the graphs is that the user has great flexibility in customizing their look.

If you need additional statistical capabilities, SPSS offers separate modules. For example, the Professional Statistics module does cluster, factor, and discriminant analysis, as well as weighted and two-stage least squares. The Advanced Statistics module does general linear modeling, MANOVA, Kaplan-Meier estimation, Cox regression, hierarchical loglinear (hiloglinear) analysis, logistic and nonlinear regression, and probit analysis. The Trends module handles time series and forecasting, including the exponential smoothing methods (Brown, Holt, Winters), Box-Jenkins ARIMA modeling, and X11 decomposition. The program will find the optimal parameters for these models using a grid-search technique. I used all these modules, but spent the most time with Trends. In terms of accuracy, it outperformed every other forecasting program I have used. Although I did not test them, two modules are available for survey researchers: the Categories module does conjoint analysis and correspondence analysis, while the Tables module produces more automated and sophisticated tabular reports than the Base program. The MapInfo module creates thematic maps for data visualization; you can choose the geographic region, from country down to street level, and create your own boundary files.
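The grid-search idea can be illustrated with the simplest member of the smoothing family. The Python sketch below is an illustration under stated assumptions, not SPSS's implementation: it fits simple exponential smoothing (a single smoothing weight, alpha, rather than the Brown, Holt, or Winters variants), initializes the level with the first observation, and tries alpha on a fixed grid, keeping the value with the smallest sum of squared one-step-ahead errors. The function names and the default grid step of 0.1 are hypothetical.

```python
from typing import List, Tuple


def ses_sse(series: List[float], alpha: float) -> float:
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = series[0]  # assumption: start the level at the first observation
    sse = 0.0
    for y in series[1:]:
        error = y - level  # the forecast for this period is the current level
        sse += error * error
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    return sse


def grid_search_alpha(series: List[float], step: float = 0.1) -> Tuple[float, float]:
    """Try alpha = step, 2*step, ..., 1.0 and keep the alpha with the smallest SSE."""
    n_steps = round(1 / step)
    best_alpha, best_sse = step, float("inf")
    for i in range(1, n_steps + 1):
        alpha = i * step
        sse = ses_sse(series, alpha)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha, best_sse


series = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical, steadily rising series
alpha, sse = grid_search_alpha(series)
print(alpha, sse)  # 1.0 4.0
```

For a steadily rising series the search lands on the largest weight, because simple exponential smoothing has no trend term; that is exactly the situation in which a trend model such as Holt's, with its own parameters to search over, would be the better choice.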

Using the Program

Getting data into SPSS is easy. You can enter data in a spreadsheet-like data editor or import data from another format; I tried importing data from various formats and did not encounter any problems. In previous versions of SPSS, depending upon which options you chose, the program gave you page after page of text output. Now the program splits the screen vertically into two adjustable-width columns. The left column, called the Output Navigator, shows a sort of bulleted outline of the results, by heading, and the right column shows the actual output. Clicking a heading in the left column takes you to that part of the output in the right column. As you do more analyses, the output and headings are added to a "master report" in which text and graphs are integrated. I found this feature useful, especially for many "runs" of the data. However, to see a given piece of output in full, you must do considerable left-right scrolling in the right column. That became cumbersome after a while, and I turned the feature off. Figure 1 shows the Output Navigator and a plot for a regression analysis I ran.

A far more significant enhancement is the "look" of the output. Previously, the text output was spartan, as if someone had typed it on an electric typewriter. Now the output is far more elegant and pleasing. Users can make tables look typeset and can further customize their output by modifying fonts, line styles, headings, and color. You can format tables by choosing from a library of presentation-ready formats called TableLooks. You can preview how your table will look as you scan the choices in this library, and the table is automatically reformatted once you make your selection. Figure 2 shows an example of this.

Getting my output into this review was easy. I simply clicked on a table or graph and dragged it directly into my word processor, with its formatting intact. Thanks to SPSS' implementation of OLE 2.0 in-place editing, it is easy to make further formatting changes to the SPSS table in my word processor. When I double-click the table to activate it, the SPSS menus appear and I can edit the table. To get back into SPSS, I just select the Output 1-SPSS tab from the Windows 95 taskbar. The procedure worked very well, and I doubt I will ever again be satisfied with printing output directly from a statistics program.

Perhaps the most significant enhancement to SPSS is a feature called Pivot Tables, which makes it easy to explore crosstab output from different perspectives. For example, I wanted to run a crosstab of the reliability of 1996 cars by the "origin" (American, European, or Japanese) of the make (data from Consumers Union). But I also wanted to factor in the "crash worthiness" of the vehicle: the extent of injuries to the driver in a front-end collision, as tested by the National Highway Traffic Safety Administration.

By choosing the Pivot Table feature, I can easily get a breakdown of these data (via separate crosstab tables) as a function of the extent of the injuries to the driver. Figure 3 shows how the Pivot Table feature works, while Figures 4 and 5 show the corresponding crosstab results for little and moderate injuries to the driver, respectively. It is very easy to get different "views," or layers (as SPSS calls them), of these tables. Clicking an icon brings up a different layer, and you can scroll through the layers one by one, as if you were turning pages in a book. Analyzing such categorical data is thus intuitive.
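Conceptually, a layered crosstab is just one two-way table of counts per value of a third variable. The Python sketch below illustrates that idea with a handful of invented records; they are hypothetical and are not the Consumers Union or NHTSA data, and the function names and category labels are assumptions rather than anything taken from SPSS.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Record = Tuple[str, str, str]  # (row variable, column variable, layer variable)


def layered_crosstab(records: Iterable[Record]) -> Dict[str, Counter]:
    """Build one two-way count table per value of the layer variable."""
    layers: Dict[str, Counter] = defaultdict(Counter)
    for row, col, layer in records:
        layers[layer][(row, col)] += 1
    return layers


def format_layer(table: Counter, rows: Sequence[str], cols: Sequence[str]) -> str:
    """Render one layer as a small tab-separated crosstab (missing cells count as 0)."""
    lines = ["\t" + "\t".join(cols)]
    for r in rows:
        lines.append(r + "\t" + "\t".join(str(table[(r, c)]) for c in cols))
    return "\n".join(lines)


# Hypothetical toy records (origin, reliability, driver injury), invented for illustration.
cars = [
    ("American", "Average", "Little"),
    ("Japanese", "Above", "Little"),
    ("Japanese", "Above", "Moderate"),
    ("European", "Average", "Moderate"),
    ("American", "Below", "Moderate"),
]

tables = layered_crosstab(cars)
origins = ["American", "European", "Japanese"]
ratings = ["Below", "Average", "Above"]
for injury in sorted(tables):  # "turning the pages": one layer at a time
    print(f"Driver injury: {injury}")
    print(format_layer(tables[injury], origins, ratings))
```

Keeping each layer as its own table is what makes paging cheap: switching layers only selects a different precomputed table instead of recounting the data.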

SPSS' help system has been greatly improved in this version, and it is now the best such system I have seen. The standard help topics are available, along with an extensive tutorial, an online glossary, and step-by-step instructions. Windows 95 users know that they can often point to an area on the screen and click the right mouse button to get a "What's This?" explanation. This feature works very well in SPSS. For example, I wanted information about Pearson's chi-square statistic. I right-clicked on the word "Pearson" in the output, and the program gave a definition of this statistic. Sometimes the explanations were a bit too generic. For example, for the standard error of the estimate in regression analysis, the program gave:

A measure of how much the value of a test statistic varies from sample to sample. It is the standard deviation of the sampling distribution for a statistic. For example, the standard error of the mean is the standard deviation of the sample means.
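The quoted definition describes the standard error of a statistic in general, whereas the standard error of the estimate in regression measures how far the observed values scatter around the fitted line. The Python sketch below computes both quantities side by side to show that they are different numbers; it assumes simple regression with one predictor (hence n - 2 degrees of freedom), and the function names and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def std_error_of_estimate(x: Sequence[float], y: Sequence[float]) -> float:
    """Spread of observed y around the fitted line: sqrt(SSE / (n - 2))."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))  # two coefficients were estimated


def std_error_of_mean(values: Sequence[float]) -> float:
    """Sample standard deviation divided by sqrt(n)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return s / math.sqrt(n)


x = [1.0, 2.0, 3.0]  # hypothetical data
y = [1.0, 3.0, 2.0]
print(std_error_of_estimate(x, y))  # sqrt(1.5), about 1.2247
print(std_error_of_mean(y))         # 1/sqrt(3), about 0.5774
```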

At other times the explanations are concise and to the point. If you ask about the Durbin-Watson statistic, SPSS gives:

A test for serially correlated (or autocorrelated) residuals. One of the assumptions of regression analysis is that the residuals for consecutive observations are uncorrelated. If this is true, the expected value of the Durbin-Watson statistic is 2. Values less than 2 indicate positive autocorrelation, a common problem in time-series data. Values greater than 2 indicate negative autocorrelation.
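The statistic in that definition is computed from the regression residuals, taken in time order, as the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It therefore ranges from 0 to 4 and is roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals, which is why values near 2 indicate no serial correlation. A minimal Python sketch follows; the function name and the residual values are hypothetical.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic for regression residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared changes between consecutive residuals.
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Denominator: residual sum of squares.
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0: below 2, positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: above 2, negative autocorrelation
```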

Some things have not changed. The histogram feature will give you a pleasing plot for continuous (noncategorical) data, with many options (number of classes, start, finish, etc.), but if you want a frequency distribution of the data, SPSS gives an unwieldy table that treats each distinct value as a separate class. You must manually recode the data into a new variable before you get the desired frequency distribution with just a few class intervals. The process is tedious for such a simple request. When SPSS was a DOS program, you could do this with commands, and those commands are still the cornerstone of the current version; in the interest of "ease of use," however, they are hidden.
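Recoding a continuous variable into class intervals amounts to assigning each value to an equal-width class and counting. The Python sketch below shows that step using a start, a class width, and a number of classes, the same kind of settings the histogram options expose; it is not SPSS's recode facility, and the function name and sample values are hypothetical. Classes are treated as half-open [lower, upper), except that the last class also includes its upper limit.

```python
from typing import List, Sequence, Tuple


def frequency_distribution(data: Sequence[float], start: float, width: float,
                           n_classes: int) -> List[Tuple[float, float, int]]:
    """Count observations in equal-width classes; return (lower, upper, count) rows."""
    counts = [0] * n_classes
    end = start + n_classes * width
    for v in data:
        if v == end:
            k = n_classes - 1  # close the last class on the right
        else:
            k = int((v - start) // width)
        if 0 <= k < n_classes:  # values outside [start, end] are not counted
            counts[k] += 1
    return [(start + k * width, start + (k + 1) * width, counts[k]) for k in range(n_classes)]


values = [3, 12, 15, 27, 29, 30]  # hypothetical data
for lower, upper, count in frequency_distribution(values, start=0, width=10, n_classes=3):
    print(f"{lower}-{upper}: {count}")
```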

SPSS graphics capabilities are much improved in this version. The old "chart carousel" is gone, since the Output Navigator makes it obsolete. More important, graphics output is integrated with the statistical features. For example, previously you might run a regression analysis, save the residuals in a file, and then go to the graphics menu to get a residual plot. Now that plot is an option in the regression procedure. The graphs can be customized extensively. If you like the look of a particular graph, you can automatically apply its characteristics to the other charts that you generate.

The main addition to the statistical capabilities is a new general linear model (GLM) module. With it, you can perform post-hoc tests that identify significant differences between groups and fit mixed models that allow analysis of both fixed and random effects. The documentation says that users can compute four types of sums of squares that comply with Food and Drug Administration regulations. Other, less prominent, enhancements include more probability plots and the inclusion of post-hoc tests in the ONEWAY ANOVA procedure.

Documentation

SPSS has long been noted for the excellent documentation of its products, and that is still true. The current trend in the software industry is to include better online documentation with the programs and "skimp" on the printed material. SPSS has bucked that trend, and the manuals are as extensive, complete, and well written as ever.

Pricing

The Base Program lists for $695, and each additional module is $395. Thus, the complete program, with all modules, is expensive. Academic discounts are available.

If you have a CD-ROM drive on your computer, I strongly urge you to get the CD version of SPSS. It contains the Base Program and the complete Base System Syntax Reference Guide (in case you want those commands). The CD also has more than 200 MB of data. This includes the complete results from the 1990 census short form, with over 1000 variables at the state, county, city, and census-tract levels for the entire United States. Another data set, the Household Trend and Family Trend, combines data from the 1980 and 1990 censuses. A third data set, the consumer CLOUT database, provides projected retail sales expenditures for a sampling of specific products and retail store types. All the data are already in SPSS format. The CD costs the same as the Base Program alone on floppy disks.

Summary

I found SPSS to be very reliable. It never crashed, and I found no bugs. I also could not find any errors in the statistical output. If you have the resources (computer and money), I recommend it. It is the most comprehensive statistical program I have seen, and it may be the most compelling reason to upgrade to Windows 95 or Windows NT if you have not already done so.
SPSS Inc.
444 North Michigan Avenue
Chicago, IL 60611-3962
312-329-2400
fax: 312-329-3668


For copies of figures mentioned in this article, contact the managing editor at hjacobs@gsu.edu.