Categorized Means with Error Plots - Advanced Tab

Graphical Analytic Techniques

The Advanced tab of the Categorized Means with Error Plots dialog contains options for the Means with Error Plots in addition to those on the Categorized Means with Error Plots - Quick tab. Use the options on this tab to specify the variables and select the type of graph you want to create. Further options control how the graph is computed and how it is displayed. Some of the options on this tab add components to the graph, such as the fit of a predefined function, the outliers and extremes, and certain test statistics.

Graph type. Select the type of Mean with Error Plot to be plotted from the Graph type box. Click the desired plot link below to obtain a brief description of that type of graph.

Whiskers

High-Low Close

Layout. Select the type of layout for the graph(s).

Separate. Select this option button to produce a Separate plot layout (where each subset of cases is displayed in a separate graph) for the categorized plots.

Overlaid. Select this option button to produce an Overlaid plot layout (where all subsets are overlaid in one graph and identified by patterns and colors) for the categorized plots.

Variables. Click the Variables button to display a standard variable selection dialog in which you can select the Dependent variable, the Grouping variable, and the X- and (optional) Y-Category variables for creating the graph. If more than one dependent variable is selected, then a sequence of graphs (one for each dependent variable) will be produced using the same set of grouping variables. The selection that you make will then be displayed in the area of the dialog below the Variables button.

The dependent variable values will be used in calculating the respective statistics that define the components of the graph (e.g., means, medians, standard deviations, etc.), while the grouping variable will be used to categorize the data, using the method of categorization as selected via the options in the Grouping intervals group. Note that the selected grouping variables do not have to be categorical variables (e.g., contain codes); you can use one of the methods of categorization to categorize continuous variables. The selection of grouping variables is not necessary if the categories are defined via the Multiple Subsets method in the X-Categories, Y-Categories, and Intervals group boxes.

X-Categories / Y-Categories. Categorization is used in two classes of graphs in STATISTICA: categorized graphs (e.g., Categorized Scatterplots) and graphs that include grouping or categorized variables (e.g., 2D Histograms, or 2D Box Plots).

Select Integer mode, Unique values, or Categories to specify the method of categorization for each of the variables selected via the Change Variable button, or use the Boundaries, Codes, or Multiple subsets options. For more information about each of these methods, see methods of categorization.

Intervals. Use the options in this group box to choose the method of categorization for the selected Grouping variable. Each of the methods is discussed in methods of categorization.

Graph icon. The graph icon in the lower section, left side of the Advanced tab represents the currently selected Graph type (Whiskers or High-Low Close) and the Middle Point options (see below). It also previews the selected Value (Conf. Interval, Non-outlier range, Min-max, or Constant) that will define the Mean with Error Plot that you are about to create as specified in the Whisker group box.

Middle point. The options in the Middle point group box are used to select the statistic that will be used as middle point in the Means with Error Plots.

Value. From the Value drop-down box, select the statistic (Mean or Median) that will be used to determine the center (middle) points in the plot for each variable and group.

Style. Use the Style drop-down box to specify how the middle point should be represented in the Whiskers or High-Low Close plot. You can choose the selected middle point to appear as a line (select Line) or as a point (select Point).

Pooled variance. The Pooled variance check box is available when you select Mean as the Middle point Value. The setting of this check box determines how the standard deviations and standard errors (for the means) are computed from grouped data. When the Pooled Variance check box is selected, STATISTICA computes the pooled within-group (category) variance for all groups (categories), and uses this value as an estimate of σ (Sigma) in computing the standard errors for the means (see, for example, Milliken and Johnson, 1984). Specifically, STATISTICA computes the pooled within-group (category) variance as:

$$s_{\mathrm{pooled}}^2 = \frac{1}{n-k}\left[s_1^2 (n_1 - 1) + \dots + s_k^2 (n_k - 1)\right]$$

In this equation, $k$ is the number of groups in the plot, $s_i^2$ is the variance in the i'th category or group, $n_i$ is the number of valid observations in the i'th category or group, and $n$ is the overall number of valid observations in the plot.

The standard error of the mean for the i'th group is then computed as:

$$\mathrm{s.e.}(\mathrm{mean}_i) = \frac{s_{\mathrm{pooled}}}{\sqrt{n_i}}$$
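The pooled variance and standard error formulas above can be sketched in a few lines of Python. This is an illustrative sketch only, not STATISTICA's own code; the function name pooled_standard_errors and the sample data are hypothetical, and each group is assumed to contain at least two valid observations.

```python
import math
from statistics import variance


def pooled_standard_errors(groups):
    """Return (pooled variance, standard errors of the group means).

    Hypothetical helper: `groups` is a list of lists of valid
    observations, one inner list per category.
    """
    k = len(groups)                    # number of groups in the plot
    n = sum(len(g) for g in groups)    # overall number of valid observations
    # Sum of s_i^2 * (n_i - 1) over all groups (sample variances).
    weighted = sum(variance(g) * (len(g) - 1) for g in groups)
    pooled_var = weighted / (n - k)
    s_pooled = math.sqrt(pooled_var)
    # s.e.(mean_i) = s_pooled / sqrt(n_i)
    std_errors = [s_pooled / math.sqrt(len(g)) for g in groups]
    return pooled_var, std_errors


# Made-up example data for two groups.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
pooled_var, std_errors = pooled_standard_errors(groups)
```

Note that every group shares the same pooled estimate of σ, so the standard errors differ between groups only through the group sizes.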

Whisker. The options in the Whisker group box are used to select the options for computing the range of Whiskers or High-Low Close, i.e., to define the error ranges.

Value. Use the Value drop-down box to choose how the range of the Whiskers or High-Low Close is computed (Conf. Interval, Non-outlier range, Min-max, or Constant). If you select Conf. Interval, then the range will be displayed as the confidence interval around the mean value. If you select Non-outlier range, then STATISTICA determines which points in the data are outliers (see Outliers and Extremes), and then uses the highest and lowest data points that are closest to the outliers (but are not outliers) to determine the range in the plot. The Min-max option, on the other hand, uses the minimum and maximum values of the data to determine the range, without considering whether or not these values are outliers. If you select Constant, then the specified constant will be added to and subtracted from the chosen center point (mean or median) to define the range around that center point.

Probability/Coefficient. If you set Value (see above) to Conf. Interval, then you also need to specify a value between 0.15 and 0.99 in the Probability edit field. This value will be used to determine the length of the Whiskers or High-Low Close around the Mean value, based on the standard error for the respective means and the standard normal (z) value associated with the chosen probability. If you set Value to Non-outlier range or Min-max (see above), you also need to specify a value in the Coefficient edit field by which the selected Value will be multiplied to determine the range. If you set Value to Constant, the value of the Coefficient itself determines the range (no multiplier is used). By default, the value of the Coefficient is 1.
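As an illustration of how a probability and a standard error translate into an error range, the following Python sketch computes whisker limits for the Conf. Interval and Constant settings. It is a sketch under stated assumptions rather than STATISTICA's implementation: the function names are hypothetical, and the interval is assumed to be two-sided and symmetric around the mean.

```python
from statistics import NormalDist


def confidence_whiskers(mean, std_error, probability=0.95):
    """Whisker limits for a Conf. Interval style range (hypothetical helper).

    Assumption: a two-sided interval, so the z value is taken at the
    (1 + probability) / 2 quantile of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    half_width = z * std_error
    return mean - half_width, mean + half_width


def constant_whiskers(center, coefficient=1.0):
    """Whisker limits for a Constant style range: center -/+ coefficient."""
    return center - coefficient, center + coefficient


# Made-up example values.
low, high = confidence_whiskers(mean=10.0, std_error=2.0, probability=0.95)
c_low, c_high = constant_whiskers(center=10.0, coefficient=1.5)
```

With a probability of 0.95, the z value under this assumption is approximately 1.96, so the whiskers extend about 1.96 standard errors on each side of the mean.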

Connect middle points. Select the Connect middle points check box to connect the selected middle points (Means, Medians, trimmed Means, or trimmed Medians) of the Whiskers or High-Low Close.

Display raw data. Select this check box to display the raw data points.

Jitter. Use the options in this group box to jitter the data points, i.e., to shift them away from their original positions so that overlapping points are easier to identify and brush.

Off. If you select Off, no jitter is applied to the raw data points, outliers, and extremes.

Sequential. If you select Sequential, the jitter is applied sequentially to the raw data points, outliers, and extremes. The jitter is applied such that the first case in the data set is maximally shifted to the left and the last case is shifted maximally to the right.

Random. If you select Random, the data point is randomly shifted within the available range.

Width. With this option, you can specify the maximum jitter width defined as percentage of box width. Possible percentages range from 0 to 250.
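The jitter options described above can be pictured with a small Python sketch that computes a horizontal offset for each point. This is only an illustration: the helper jitter_offsets is hypothetical, the jitter range is assumed to be centered on the category position, and the Sequential mode is assumed to space the cases evenly between the leftmost and rightmost positions.

```python
import random


def jitter_offsets(n_points, box_width, width_percent,
                   mode="sequential", seed=None):
    """Return one horizontal offset per data point (hypothetical helper).

    width_percent is the maximum jitter width as a percentage of the
    box width (0 to 250 in the dialog).
    """
    if mode == "off" or n_points == 0:
        return [0.0] * n_points
    span = box_width * width_percent / 100.0   # total jitter width
    half = span / 2.0
    if mode == "sequential":
        if n_points == 1:
            return [0.0]
        # First case is shifted maximally left, last case maximally
        # right; even spacing in between is an assumption.
        step = span / (n_points - 1)
        return [-half + i * step for i in range(n_points)]
    if mode == "random":
        rng = random.Random(seed)
        # Each point is shifted randomly within the available range.
        return [rng.uniform(-half, half) for _ in range(n_points)]
    raise ValueError("mode must be 'off', 'sequential', or 'random'")


sequential = jitter_offsets(5, box_width=1.0, width_percent=50)
randomized = jitter_offsets(5, box_width=1.0, width_percent=50,
                            mode="random", seed=1)
```

A Width of 0 leaves every point at its original position in either mode, which has the same visual effect as selecting Off.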

Outliers. The Outliers group box is used to control the display of outliers and extremes. Select either Off, Outliers, Extreme, or Outl. & Extremes from the drop-down box. See Outliers and Extremes for additional details on these options.

Coefficient. If you select Outliers, Extreme, or Outl. & Extremes in the Outliers drop-down box, specify a coefficient in the Coefficient edit field to be used to determine the outlier or extreme value range; see Outliers and Extremes for additional details.

Fit. You can fit an equation to the points in the plots by selecting one of the predefined functions in this dialog.

Linear

Polynomial

Logarithmic

Exponential

Distance Weighted Least Squares

Negative Exponential Weighted

Spline

Lowess

Trim distr. extremes. Use the Trim distr. extremes box to specify the percent of cases to be "trimmed" from the extremes (i.e., tails) of the distributions of cases for the selected dependent variables. For example, if you specify 10%, then for a variable with 100 cases, STATISTICA removes the 10 cases with the lowest values and the 10 cases with the highest values for the respective variable from the graph, and uses only the 80 remaining ("middle") cases. If you enter a value for Trim distr. extremes for mean-based Means with Error Plots, then the so-called "trimmed means" will be used in the graph.
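The trimming example above (10% of 100 cases removed from each tail) can be reproduced with a short Python sketch. The helper trim_extremes is hypothetical and not part of STATISTICA; how fractional case counts are handled is an assumption here (they are rounded down).

```python
from statistics import mean


def trim_extremes(values, percent):
    """Drop `percent` of the cases from each tail (hypothetical helper).

    Assumption: a fractional number of cases to trim is rounded down.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * percent / 100.0)  # cases removed per tail
    if 2 * cut >= len(ordered):
        raise ValueError("trimming would remove all cases")
    return ordered[cut:len(ordered) - cut]


# 100 made-up cases with values 1..100; trim 10% from each tail.
values = list(range(1, 101))
kept = trim_extremes(values, 10)
trimmed_mean = mean(kept)
```

Here the 10 lowest and 10 highest values are dropped, and the trimmed mean is computed from the 80 remaining cases.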