SOFTWARE REVIEW
JACK YURKIEWICZ, Feature Editor, Pace University

NCSS Jr. -- Bargain of the Year?
by Jack Yurkiewicz, Lubin School of Business

If you are teaching a course in statistics, you probably know that the current emphasis of such a course is data analysis. "Hand calculations" (meaning solving problems using formulas and a hand-held calculator) are no longer de rigueur. Students are encouraged to find data sets (or instructors give these to them) and then use the computer to get descriptive statistics, make cross-tabulation tables (an excellent way to introduce probability ideas), run multiple regressions, etc. As instructors in such courses, we must walk a tightrope: we make sure that students understand the concepts, recognize when certain procedures are valid or invalid, and verify the assumptions of the model, and yet we deemphasize the details of the underlying formulas. (I know that many of you disagree with this approach -- please send me e-mail about your ideas, objections, etc.)

For example, many instructors believe that students don't need to know the formula that gives the slope and intercept of a simple regression model; computer output readily gives them that. Yet they agree that students should be aware of the notions of least squares, model assumptions, residuals and their analysis, "goodness of fit," etc. For them, a good statistics program is a key requirement for the course. Many instructors use software that comes bundled with various texts, or still use that fine veteran, MYSTAT, or perhaps use a spreadsheet such as Excel, which has a decent set of built-in statistical procedures.

Last February, Jerry Hintze, the author and publisher of Number Cruncher Statistical Systems (NCSS), an excellent "professional" statistical product, released NCSS Jr., a subset of that program. It has most of the capabilities needed to teach a traditional statistics course, and the price is right. It is free -- you can download it from the NCSS home page on the World Wide Web.
This review will focus on that product, and peripherally talk about the commercial version.

Capabilities

NCSS Jr. can handle reasonably large data sets. Generally, you can have as many as 256 variables and up to 16,000 observations, but with a "zipped" database, NCSS can handle up to 5,000 variables and 32,000 observations. It is a Windows product, and works well with Windows 95. I was not able to test it on Windows NT 3.51, but the author says there should be no problems using that platform. All of the documentation is found in the Help section, including a tutorial and examples. Thus, the program needs about 6 MB of hard disk space for storage. I tried running NCSS Jr. on a minimally equipped 33 MHz 486SX computer with 4 MB of RAM. It ran adequately, but if you have more RAM and a faster processor, you will see a dramatic improvement in speed.

The statistical capabilities include descriptive statistics (including frequency tables and cross-tab tables), one- and two-sample t-tests and confidence intervals for the mean, tests for one or two proportions, multiple regression analysis (including prediction intervals for new observations, something that Excel will not give), one-way ANOVA including post-hoc analysis (Bonferroni, Fisher, Scheffe, Tukey-Kramer, and others), and time series models (including Brown's simple exponential smoothing, Holt's two-parameter method for trend, and Winters' procedure for seasonal data). For the three time series techniques, NCSS finds the optimal smoothing constants using a grid-search procedure. Unfortunately, if you need quality control capabilities, such as X-bar and range charts, NCSS Jr. does not do them. These are included in the commercial version (more about the differences later).

The program can do a wide variety of data transformations. It also has a handy "filter" feature which easily permits an analysis of a subset of the data.
For example, suppose one variable in your data set holds the prices of a product made in different countries, and you want to analyze only a subset of the data, such as the prices of the product made in one specific country. Most other programs require that you make a transformation in which you extract those observations and put them in a new column or variable. With NCSS, you invoke the filter, describe it (such as Origin = 1 for, say, American products) and run the usual analyses. Only the observations that satisfy the filter are considered.

The graphics capabilities include box plots, histograms, scatter diagrams, and something new, invented by the program's author, called "violin" plots. What is a violin plot? The help system in NCSS says:

"The Violin Plot is made by combining a form of box plot with two vertical density traces (frequency distributions). One density trace extends to the left while the other extends to the right. We put two density traces on the plot to add symmetry, which makes it easier to compare batches. The violin plot highlights the peaks and valleys of a variable's distribution. We changed the box plot slightly by showing the median as a circle, so that quick comparisons of the medians can be made. If you compare this plot with the box plot and frequency distribution of the same data, you will notice that although the box plot is useful in a lot of situations, it does not represent data that are clustered (multimodal). On the other hand, although the frequency distribution shows the distribution of the data, it is hard to see the mean and spread. The obvious answer to these shortcomings is to combine the two plots. Notice how easily you can compare the medians, the box lengths (the spread), and the distributional patterns in the data."

Figure 1 shows a violin plot of a variable in one of my sample data sets, the prices of 116 automobiles made in 1989.

Data Entry

You can enter the data manually into a spreadsheet. Sadly, the program will not import data.
If you want to do that, you must copy your data to the clipboard and then paste it into the spreadsheet. Because of this, only data sets much smaller than the program's maximum capability can be "imported." The commercial version of NCSS allows you to import data from many formats: Access, ASCII, BMDP, Clipper, dBase, DIF, Excel, Gauss, Lotus, NCSS DOS, Paradox, Quattro, SAS, SPSS, Stata, and Systat, among others.

The spreadsheet found in NCSS is a Microsoft Excel 4.0 compatible spreadsheet. Thus it works like Excel 4.0, using the same interface, including formatting and formulas. If you are familiar with Excel, then you can easily make transformations, cut, copy, paste, insert, find/replace, delete, "undo" an operation, etc. Besides the traditional spreadsheet manipulations, NCSS has a powerful transformation section, which permits logic operators and numeric, date, fill, mathematical, probability, random number, rearrangement (collate, sort, splice, uncollate, unsplice), recode, statistical, and text functions. The list is long and comprehensive, and there is little that you cannot do with your numeric or text data.

Working with the Program

All procedures (e.g., regression analysis, cross-tabs, histograms, etc.) are controlled by what NCSS calls a template. A template is a list of the settings, options, and parameters that control a particular procedure. Each procedure has its own template. Figure 2 shows the template to construct a histogram. The main part of the template has two columns: the option's description on the left and its setting on the right. You control a procedure by changing these settings, and depending on what you write or choose, you will get more or less output. To get to different yet "related" options, you can click on the tabs (or go to the Topic section on the main menu).

This template approach is both the power and bane of NCSS. There are so many options that it is easy to get bewildered or even lost.
For example, on the histogram template there are more than twenty tabs (see Figure 3 for a topic view of them), and each tab can have up to ten choices of settings. While so many options almost guarantee that your histogram will look precisely how you envision it (you control the number of classes, labels, lines, colors, tick marks, etc.), it is tedious at best and confusing at worst to find and enter those settings. For something like a histogram or box plot, a "less is more" approach would definitely be an improvement.

For regression analysis, on the other hand, the long list of options gives comprehensive output, including a large selection of plots. For example, I ran a regression analysis in which I tried to predict the box office return of a movie as a function of how much money was spent on advertising, the budget or cost of the film, and the quality of the movie as perceived by critics. Exhibit 1 shows some of the output from NCSS (quite a few plots and much additional text output, including the residual analysis, are omitted here).

Another notable feature is that NCSS sends all statistics and graphics output to its built-in word processor. There they can be viewed, edited, printed, or saved. NCSS saves reports and graphs in rich text format (RTF), which is a standard Windows document transfer format. Thus, it is easy to import this output directly into your own word processor for further embellishment. The text portions of the output are formatted using tabs, not spaces, so they can be more easily reformatted in your word processor.

Accuracy

I ran many data sets through NCSS, including various "messy" ones. When comparing NCSS with other statistical programs (SPSS, Systat, etc.), I found that NCSS generally gave accurate results, using double precision calculations. The only area in which NCSS gave answers different from other products was in Winters' method for time series analysis.
The program (including the commercial version) gave accurate forecasts, but the seasonal indices that it found were significantly different from those of other programs. I have informed the author about this, and he told me he is looking into the matter.

Differences Between the Commercial and Junior Versions

The commercial product works the same way as the junior version, but offers many more options. The additional capabilities of the commercial version include: general linear models; logistic, robust, stepwise, and all-possible regression; quality control (X-bar, range, c, p, np, u, cusum, EWMA, individuals, moving average control charts, capability analysis, etc.); multivariate analysis (MANOVA, discriminant, cluster, and factor analysis, principal components, eigenvalue analysis, etc.); nonparametric tests; survival analysis; and curve fitting. The graphics capabilities are far more extensive, including 3D plots, error bar plots, surface plots, contour plots, percentile plots, and others.

The documentation in the commercial product comes in three volumes, covering more than 2,200 pages. It is well-written, with many examples and screen shots. There is a fourth book, a 90-page Quick Start and Self Help Guide, that covers the basics of the program and includes a series of excellent tutorials.

The commercial version costs $395 and ranks with the best statistical software available. If you find that the free junior version is inadequate for your use in the classroom, the author also makes the commercial version available for classroom use. Under an extraordinary offer, if you order a minimum of five copies, the price is just $25 per copy. This does not include the three volumes of documentation. Students get just the Quick Start and Self Help book. For further information, contact:
NCSS Statistical Software

Getting the Junior Version

The junior version is available on the Web. Just log on at http://www.ncss.com and follow the directions for downloading the files. You should download three zipped or compressed files into three separate temporary directories on your hard disk. Be sure to download the PKZIP utility to uncompress these files. Once you have done this, "unzip" or uncompress the files in each of these three subdirectories onto three high density floppies. Finally, install the program on your hard disk using the files on these floppies. The compressed files in the temporary directories on your hard disk can then be erased.

Conclusions

NCSS Junior is a remarkable product. It is far superior to any software that comes bundled with a statistics text, and in many respects is better than the "student" statistical software that costs between $20 and $50. Except for the omission of statistical quality control, it will do everything traditionally covered in an introductory statistics course. If its capabilities are insufficient for your needs, then a bulk purchase of the commercial product is highly recommended. That product compares favorably with the best on the market, and its output is better than that of just about any other product. Only the new SPSS 7.0 for Windows 95 compares with it.
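
A note for readers curious about the grid-search procedure mentioned under Capabilities: the idea can be sketched in a few lines of Python for the simplest case, simple exponential smoothing. This is only an illustration of the general technique, not NCSS's actual code. The function names, the grid of 100 candidate values, the sum-of-squared-errors criterion, and the choice to start the smoothed level at the first observation are all assumptions made for this sketch, and the demonstration numbers are made up.

```python
def ses_sse(series, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = series[0]  # assumption: start the smoothed level at the first observation
    sse = 0.0
    for y in series[1:]:
        error = y - level        # the forecast for this period is the previous level
        sse += error * error
        level += alpha * error   # same as alpha * y + (1 - alpha) * level
    return sse


def grid_search_alpha(series, steps=100):
    """Try alpha = 1/steps, 2/steps, ..., 1 and return (best_alpha, best_sse)."""
    candidates = [k / steps for k in range(1, steps + 1)]
    scored = ((alpha, ses_sse(series, alpha)) for alpha in candidates)
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    demo = [12.0, 15.0, 14.0, 16.0, 19.0, 18.0, 21.0, 23.0]  # made-up numbers
    alpha, sse = grid_search_alpha(demo)
    print(f"best alpha = {alpha:.2f}, SSE = {sse:.2f}")
```

Holt's and Winters' methods have two and three smoothing constants respectively, so the same approach would loop over a two- or three-dimensional grid. A finer grid gives a more precise answer at the cost of more computation.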
If you are interested in writing a software review, please contact:

*****************************************************************
For copies of tables or figures mentioned in this article, contact the Managing Editor at hjacobs@gsu.edu.
*****************************************************************

*****************************************************************
DR. JACK YURKIEWICZ is a professor of management science and is the assistant chairperson of the Management Science Department at the Lubin Graduate School of Business at Pace University, New York. He received his Ph.D. in operations research from Yale University. His current interests include computers and software and their use in the classroom, educational software for children, forecasting, and statistical quality control.