HomeStartingEnvironmentDBMSVisualPQLProceduresSQLFormsHost/APIIndex
PQL Procedures homecontents start chapter top of pagebottom of pagenext page index Tabulate

Tabulate

The TABULATE procedure selects and summarises data. It then produces cross-tabulations with summary data, statistics or percentages. See
syntax. A wide variety of cross-tabulations can be produced:

homecontents start chapter top of pagebottom of pagenext page index

Syntax

TABULATE
     FILENAME = filename
     HEADER   = (expression)
    [STUB     = (expression)]
    [WAFER    = (expression)]

Record Filtering

  BOOLEAN     = (boolean expression)
  SAMPLE      = fraction
  WEIGHT      = varname

Cell Statistics

  TOTAL      = varname ['label']...
  COUNT      = varname ['label']...
  CSS        = varname ['label']...
  CV         = varname ['label']...
  CVERR      = varname ['label']...
  MAXIMUM    = varname ['label']...
  MEAN       = varname ['label']...
  MEDIAN     = varname ['label']...
  MINIMUM    = varname ['label']...
  MISSING    = varname ['label']...
  NORMALIZED = varname ['label'](n1)['label1'],(n2)['label2']...
  PERCENT    = varname ['label']...
  QUANTILE   = varname ['label'] n ['label1','label2'...]...
  QUANTILE   = varname ['label'](n1)['label1'],(n2)['label2']...
  RANGE      = varname (lo,hi) ['label']...
  STDERR     = varname ['label']...
  STDEV      = varname ['label']...
  TSTATISTIC = varname ['label']...
  USS        = varname ['label']...
  ISDNUMBER = n

Data Print Formatting

  PRINTFORMATS= varlist (option)...

Page Formatting

  PAGETITLE   = 'string'
  PAGELENGTH  = n
  PAGEWIDTH   = n
  COLLAPSE

Header Formatting

  HEADERWIDTH       = n
  HEADERINDENTATION = n
  HEADERDIVIDER     = 'character'
  NODIVIDERS
  NOHEADERCENTER

Stub Formatting

  STUBTITLE       = option 'string'
  STUBWIDTH       = n
  STUBINDENTATION = n
  STUBCONTINUATION= n
  STUBHEIGHT      = n
  STUBFILLER      = 'character'
  STUBDIVIDER     = 'character'

Wafer Formatting

  WAFERTITLE        = option 'string'
  NOWAFERTITLE
  WAFERCENTER
  WAFERSTART        = n
  WAFERINDENTATION  = n
  WAFERCONTINUATION = n
  WAFERHEIGHT       = n

Other Options

  HTML
  NOBOTTOMBORDER
  NOLEFTBORDER
  NORIGHTBORDER
  NOTES= 'string'
  FOOTNOTES= 'string'
  WAFERNOTES= 'string'
  NOZEROS
  ZEROCHARACTER= 'character'
  NOEMPTYROWS
  DELETEMISSING
  UPPERCASE
  SPANNERS
  DUMMY  = varname 'label' ...
  LANGUAGE = DUTCH | GERMAN
The FILENAME and the header expression are the only required clauses.

FILENAME Specify the filename created by the procedure. If a filename is not specified on a subsequent TABULATE procedure, the output is appended to the last TABULATE output file.

If the filename is CGI then, if the procedure is run when SIR/XS is operating from the common gateway interface, the output is returned to the user's browser. When run locally, a filename of CGI results in the file sircgi.htm. It is normal to specify the HTML keyword when using CGI as the filename.

expressions

The three possible expressions (HEADER, STUB & WAFER) define the data in one of the dimensions of the table. Each expression consists of from one to n variables and/or statistical keywords all enclosed in brackets. If multiple variables are specified on a single expression they are related by the keywords BY or THEN.

Percentages can also be used in expressions.

The type of variable, Variable Modifiers, and statistics all effect the Cell contents.

HEADER = (expression) Defines the columns of the tabulation. This clause is required.

STUB = (expression) Defines the rows of the tabulation.

WAFER = (expression) Defines the sections of the tabulation. Sections are a number of two dimensional sections specified by the stub and header.

BY Nest a variable within another thus producing combinations of the variable values within one dimension of the table. The asterisk (*) symbol may be used instead of the keyword.

THEN Appends a second variable or statistical keyword to the set of variables displayed within one dimension of the table. The plus (+) symbol may be used instead of the keyword.

BY and THEN Can be used in combination to create complex structures within a single dimension or expression.

Errors may occur, not only if the table is misspecified, but also if it cannot fit the output page.

homecontents start chapter top of pagebottom of pagenext page index

Expressions

The TABULATE command has the general form:
TABULATE  HEADER = (expression) [STUB = (expression) [WAFER = (expression)]]
A table consists of up to three components or dimensions: header, stub and wafer. Each dimension is defined by an expression. Each expression is identical in format but applies to a different dimension of the table. These expressions name the
variables to tabulate and their relation to each other in the table. The different keywords (HEADER, STUB & WAFER) define the dimension being specified. The HEADER is required; WAFER can only be specified if a STUB is also specified. The sequence of the three expressions in the command is irrelevant.

The header defines variables whose values correspond to columns; the stub defines variables whose values correspond to rows. The rows and columns define a two-dimensional array called the wafer. The wafer is a higher level categorisation resulting in multiple individual row/column combinations. The table description consists of one to three expressions.

Define an expression with variable names and keywords. At the most basic an expression consists of a single variable name. Multiple variables or statistics may be defined in a single expression separated with the keywords BY and THEN. BY specifies a level of nesting within a single dimension; THEN specifies multiple variables within a single dimension.

The structure of the table is defined by the values of the control variables specified in the various expressions. The values in the variables named in these expressions are used as the labels for the particular dimension being specified. The number of categories of any control variable is taken from the schema definition which contains information on variable ranges, valid values and value labels.

The body of the table consists of Cells. The content of a cell is a summary of data which applies to that particular combination of variables in the expressions. This might be a count of occurrences of data, a sum of values, a percentage or some other statistic.

If the command has a single header expression, a table containing a single value for each column is produced. That is one line of data is produced with one entry per column. For example:

TABULATE  HEADER = (REGION)
TABULATE  HEADER = (AGE BY SEX)
TABULATE  HEADER = (SALARY BY (MIN THEN MEAN THEN MAX))
If the tabulate statement contains a STUB expression, then a two-dimensional table is produced. The stub defines the rows of the table going down the page. The header defines the columns going across the page. For example,

TABULATE HEADER = (RACE)  STUB = (REGION)
TABULATE HEADER = (RACE THEN SEX) STUB = (REGION)
TABULATE HEADER = (SALARY BY (MEAN THEN MEDIAN)) STUB = (REGION BY RACE)
By specifying variables in two expressions, a cross-tabulation is produced with one cell for each value of each variable. For example, if a cross tabulation for two variables is specified, where each variable has two values, four cells are produced. The more values, the more cells.

Nesting variables with the BY operator also produces a cross-tabulation within a single dimension. If the same variables are used as a Stub and header or as a header with a BY clause, the same cells are produced but the table has a different shape.

If the tabulate statement contains a WAFER expression, a set of two-dimensional tables is produced. There must be a stub expression for a WAFER expression to be valid. When more than one wafer is generated a table of contents is also produced. If a particular wafer contains no data, it is not printed and is marked deleted in the table of contents. If only one wafer is produced, it is printed even if it is empty. For example:

TABULATE HEADER = (SEX)           STUB = (AGE)         WAFER = (REGION)
TABULATE HEADER = (SEX)           STUB = (AGE)         WAFER = (REGION BY RACE)
TABULATE HEADER = (AGE BY REGION) STUB = (RACE BY SEX) WAFER = (COUNT THEN PCT)

homecontents start chapter top of pagebottom of pagenext page index

BY

Specifying BY between two variables nests each of the values of the second variable in each of the values of the first variable. If a second BY is specified, a third level of nesting takes place. BY defines a structure within a particular dimension. Each variable specifies a further level of nesting. The number of cells produced is the product of the number of values for each variable. There is no limit to the number of levels allowed within one expression. The asterisk (*) symbol can be used instead of BY.

Example: BY

HEADER = (AGE BY INCOME)
Young Old
Rich Poor Rich Poor
6 4 8 3

HEADER = (SEX BY AGE BY INCOME)

Male Female
Young Old Young Old
Rich Poor Rich Poor Rich Poor Rich Poor
4 1 6 2 2 3 2 1

homecontents start chapter top of pagebottom of pagenext page index

THEN

Specifying THEN between two variables or two clauses in effect concatenates two independent tables with all values of the first table followed by all values of the table. The plus (+) symbol can be used instead of THEN.

Example: THEN

The following header expression results in the independent tabulation of individual occurrences of the two values for each of the two variables:

HEADER = (AGE THEN SEX)
Young Old Male Female
10 11 13 8

When two variables are joined by THEN, each value of each variable defines a cell. The number of cells produced is the sum of the number of values for each variable.

homecontents start chapter top of pagebottom of pagenext page index

Combining BY and THEN

BY takes precedence over THEN when the two operators are used in combination in an expression. Use parentheses to specify a different precedence.

Example: BY and THEN

The following examples show various combinations of the same variables in a header expression. In the first example, since nesting takes precedence, SEX is nested with INCOME, and the result of that nesting operation is concatenated with AGE.

HEADER = (AGE THEN INCOME BY SEX)

Young

Old
Rich Poor
Male Female Male Female
10 11 10 4 3 4

The next example concatenates AGE and INCOME and nests SEX within both AGE and INCOME, using parentheses to cause the concatenation operator to be applied first.

HEADER = ((AGE THEN INCOME) BY SEX)
Young Old Young Old
Male Female Male Female Male Female Male Female
5 5 8 3 10 4 3 4

Other combinations might be:

HEADER = (AGE BY INCOME THEN SEX)
Young Old
Male

Female
Rich Poor Rich Poor
6 4 8 3 13 8

HEADER = (AGE BY (INCOME THEN SEX))
Young Old
Rich Poor Male Female Rich Poor Male Female
6 4 5 5 8 3 8 3

Nesting and concatenation can also be used in the stub and wafer expressions.

homecontents start chapter top of pagebottom of pagenext page index

Variable Types

Tabulate uses numeric variables only. This includes categorical variables, which are numeric codes representing discrete string values. In a VisualPQL program, string variables can be
recoded into numeric categorical values. There are two types of numeric variables; variables with discrete categories, known as CONTROL variables and variables with continuous values, known as OBSERVATION variables.

Tabulate determines the tabulation type of a variable from the information in the variable schema according to the following rules:

Most variables are either control or observation. For example, salary would be an observation variable and department would be a control variable. However, some variables may be reasonably used in either way. A variable containing number of persons per family might be used as a control variable in one table but might be aggregated in another table along with a count of families, in order to get the average persons per family. Such a variable can be defined as observation type in the VisualPQL program or database and then reclassified in the TABULATE statement as a control variable.

If a control variable contains missing values, the record is normally excluded from the body of the tabulation. Categories can be generated to show cells for missing control variables. If an observation variable contains missing values, the record is normally included but the variable is not computed into the cell contents. This can give rise to situations where overall totals do not agree with counts in the body of the table. Specify the DELETEMISSING keyword to exclude records where any cross-tabulated variable has a missing or undefined value.

homecontents start chapter top of pagebottom of pagenext page index

Cell Contents

Different types of variables combine to give different cell contents and a different table.

Control variables provide classifications which determine the structure of the table and the presence of a particular cell. Observation variables give the result that appears in the cell.

When two control variables are cross tabulated or nested, the content of the cell is a count of the number of occurrences of data which match that combination of values.

When an observation variable is cross tabulated or nested with a control variable, the total (sum) for the observation variable is the default. That is, the structure of the table is unaltered by the inclusion of an observation variable but, instead of a count of data appearing in the cell contents, a total of the observation variable is calculated and an appropriate header label printed. Other statistics can be specified for observation variables.

Two observation variables cannot be cross tabulated or nested within each other and therefore observation variables may be specified in one and only one expression (dimension) of the TABULATE statement.

Different types of information are reported in the cell contents for Control and Observation variables. For Control variables, which have discrete values, counts or percentages are reported. For Observation variables, which have continuous values, a variety of cell statistics are reported, the default being a sum of the values. For any cell that cross tabulates a control and an observation variable, the observation variable statistics are reported.

Example: Observation Variables

In the following three examples, the observation variable SALARY is included in the expression. The number in any given cell is the sum of the SALARY values from each record having the combination of values for the control variables for that particular cell. The placement of the observation variable in the header expression determines the placement of the label, but does not alter the content of the table. The observation variable does not alter the number of cells. Observation variables and counts may be specified in a single expression in various ways:

HEADER = (SALARY BY AGE BY SEX) STUB = (REGION)
SALARY
Young Old
Male Female Male Female
North..........
South..........
2800
2200
5650
11700
20100
4950
3200
5000

HEADER = (SEX THEN (SALARY BY SEX) STUB = (AGE)

Male

Female
Current Monthly Salary
Male Female
Young...
Old.....
5
8
5
3
13050
21900
14450
8200

HEADER = (AGE BY (SEX THEN SALARY)) STUB = (REGION)
Young Old

Male

Female
Current Monthly Salary
Male

Female
Current Monthly Salary
North....
South....
3
2
2
3
13600
13900
6
2
1
2
20150
9950

homecontents start chapter top of pagebottom of pagenext page index

Variable Modifiers

Modifiers change the attributes of a variable in an expression. These have the form:

variable[.modifier][.modifier]...
Several modifiers can be appended to a single variable. The modifiers and their abbreviations are:

.CONTROL   (.C)
.OBS       (.O)
.FIRST     (.F)
.TOTAL     (.T)
.MISSING   (.M)
.UNDEFINED (.U)
.SPAN      (.S)
.NOSPAN    (.NOS)
The .CONTROL and .OBS modifiers alter the default tabulation type of a variable (i.e. control or observation). The .CONTROL modifier may only be specified for an observation variable if valid ranges have been specified for it.

The .FIRST modifier is used in conjunction with the percent base marker as described in the section on percentages. It specifies that the first occurrence of the percent base marker is used.

The .TOTAL modifier automatically concatenates a TOTAL control variable as described in the section on TOTAL. The TOTAL is displayed before the variable being modified.

When a control variable has a missing or undefined value, it is normally excluded from the detail cells in the table. Specify the .MISSING or .UNDEFINED modifiers to generate additional cells in the table. The .UNDEFINED modifier groups all missing and undefined values into a single cell. The .MISSING modifier displays undefined and each defined missing value as separate cells. These modifiers override any DELETEMISSING specification for that variable.

Example: Undefined Modifier

HEADER = (AGE.T.U)
TOTAL Young Old AGE=
UNDEFINED
22 10 11 1

By default, the rows and columns of a table are labeled with value labels (if they exist) or with the "variable=value" notation. The .SPAN modifier generates a second level of heading for a variable. This heading is either the variable label (if it exists) or the variable name. This new heading "spans" all the categories of the variable, hence the name .SPAN.

Example: SPAN Modifier

The next two tables illustrate the effect of .SPAN. The only difference between them is the .SPAN modifier in the second table.

HEADER = (AGE BY SEX) STUB = (REGION)
Young Old
Male Female Male Female
North..........
South..........
3
2
2
3
6
2
1
2

HEADER = (AGE.S BY SEX.S) STUB = (REGION.S)
AGE
Young Old
Gender
Male Female Male Female
REGION
North..........
South..........

3
2

2
3

6
2

1
2

The SPANNERS option can be used to turn on spanning labels for all variables. The .NOSPAN modifier turns off spanning labels for a particular variable when the SPANNERS option is in effect.

homecontents start chapter top of pagebottom of pagenext page index

Statistics

Other than
TOTAL, statistics are used with observation variables. Statistics are specified with the BY keyword to associate them to an observation variable.

There are two methods of specifying statistics.

The statistical clauses are:

COUNTA count of records in this category.

CSSCorrected Sum of Squares, where the
cell = sum of squares - ((sum**2)/n).

CV Coefficient of Variance, where the
cell = (standard deviation/mean) * 100.

CVERRCoefficient of Error, where the
cell = (standard error/mean) * 100.

MAXIMUMMaximum value of the variable.

MEANMean value of the variable.

MEDIANMedian value of the variable.

MINIMUMMinimum value of the variable.

MISSINGCount of missing values in the variable.

NORMALIZED*Normalized quantiles (by median value) where cell = (quantile/median) * 100.

PERCENT*Percentage, where cell = (cell/base) * 100.

QUANTILE*Quantile, equal or unequal quantiles can be specified.

RANGE*Range produces the range of the variable expressed as the difference between the lowest and the highest values encountered. The format of the range clause is:
RANGE = variable (lo,hi) ['label']
The specification of the lo and hi values defines the outer limits of any values to be included in the calculation. This can be used, for example, to exclude negative or zero values or very high numbers used as some coding convention.

STDERRStandard Error where cell = standard deviation/sqrt(n).

STDEVStandard Deviation about the mean.

TOTALCount of records. When cross tabulated with an observation variable, gives sum of values of the observation variable.

TSTATISTICT-Statistic where cell = mean/standard error.

USSUncorrected Sum of Squares where cell = sum of squares.
* Cannot be used as keywords directly in an expression.

Example Statistical Clauses

The TOTAL clause specifies that the pseudo-variable N is a total, with the label 'Valid Cases'. The MEAN and STDEV clauses assign labels for these pseudo-variables to 'X Bar' and 'Sigma'.

TABULATE HEADER = (N THEN SALARY BY (MEAN THEN STDEV)) STUB = (SEX)
         TOTAL  = N 'Valid Cases'
         MEAN   = MEAN 'X Bar'
         STDEV  = STDEV 'Sigma'

Valid
Cases
Current monthly
salary
X Bar Sigma
Male...........
Female.........
13
8
2688
2831
486
746

The pseudo-variables created can be used anywhere that is valid. If the pseudo-variable is a TOTAL it is a control variable, otherwise it is an observation variable. Multiple pseudo-variables can be created and assigned to the same keyword. In the next example, both N and SUM are totals, N is the count of records and SUM is the total SALARY.

TABULATE HEADER = (N THEN SALARY BY (SUM THEN MEAN THEN STDEV)) STUB = (SEX)
         TOTAL = N 'Valid Cases'  SUM 'Sum'
         MEAN  = MEAN 'X Bar'
         STDEV = STDEV 'Sigma'

Valid
Cases
Current monthly salary
Sum X Bar Sigma
Male...........
Female.........
13
8
34950
22650
2688
2831
486
746

Example: Statistical keywords

HEADER = (SALARY BY (MIN THEN MAX)) STUB = (SEX)
Current monthly
salary
MINIMUM MAXIMUM
Male...........
Female.........
2000
1650
3600
4000

Keywords must be nested within an observation variable. Keywords may not be nested within each other; they may be concatenated. For example, SALARY BY MEAN BY STDEV is not allowed; SALARY BY (MEAN THEN STDEV) is allowed.

Accuracy and ISDNUMBER

Medians and quantiles are estimates of the true values. The accuracy of the estimates is governed by a value called the interval size designator or ISD number
. This number is initially set to 4 which means that the maximum relative error in a median or quantile estimate should not exceed 12.5%. The value is usually more accurate if the values are distributed smoothly. Set the ISD number explicitly by using the clause:

ISDNUMBER = n /
that sets the ISD number to n. Increasing the ISD number by 1 cuts the maximum relative error in half while doubling the internal storage required to compute the medians or quantiles. The ISDNUMBER clause must precede the clauses it is to affect. For example,

TABULATE HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (TOTAL THEN REGION)
         ISDNUMBER = 5
         QUANTILES = Q 'Quantiles' 4 

homecontents start chapter top of pagebottom of pagenext page index

Total

TOTAL can be specified as an independent control variable which is a count of records. It is specified in exactly the same way as other statistics (using either pseudo-variables or a keyword). However, since it is a control variable rather than an observation variable, it operates in a different manner.

Example: TOTAL

HEADER = (AGE THEN TOTAL)
Young Old TOTAL
10 11 22

The control variable, AGE, with two classifications is concatenated with TOTAL. The total column equals the count of all occurrences at the appropriate level of nesting. (Note that records with undefined values have still been counted.)

HEADER = (SALARY THEN TOTAL)
Current
monthly
salary

TOTAL
57600 22

SALARY is an observation variable and thus the sum is displayed by default. The keyword TOTAL is a control variable and displays a count. Cross tabulating or nesting TOTAL with either control or observation variables is allowed since it is a control variable. As per the standard rules for cell contents, if TOTAL is nested with a control variable, a count is produced; if it is nested with an observation variable, a sum of the observation variable is produced.

Example: TOTAL with BY and THEN

The following header expression uses the observation variable SALARY and the control variables AGE and SEX:

HEADER = (SALARY BY AGE BY (TOTAL THEN SEX))
Current Monthly Salary
Young Old
TOTAL Male Female TOTAL Male Female
27500 13050 14450 30100 21900 8200

Since TOTAL is nested within an observation variable, this column contains the total SALARY for each AGE group.

TOTAL shorthand notation

There is a shorthand notation for TOTAL. Appending a single T to the variable name separated by a period specifies that a TOTAL is produced before the variable. The following two expressions are identical:

TOTAL THEN AGE
AGE.T
A tabulate statement can be shortened as follows:

TABULATE  HEADER = (SALARY BY AGE BY (TOTAL THEN SEX))
TABULATE  HEADER = (SALARY BY AGE BY SEX.T)

homecontents start chapter top of pagebottom of pagenext page index

Normalized and Quantiles

Normalized and Quantile specifications are very similar to other statistics but have to supply some additional information. This can take one of two forms:

In the first form, the quantity "n" specifies the number of equal-sized quantiles to be produced. Of these, n-1 are printed. Thus for n=4, TABULATE prints output for the 25%, 50% and 75% quantiles. In the second form a number of non-equal sized quantiles are produced and each n specifies the quantiles to produce. For example, specifying Ns of 15 and 85 produces the 15th and 85th percentage quantiles. Both forms allow specification of labels. In the first form, when labels are used for individual quantiles, the commas separating 'label1', 'label2', etc. are required.

1)  keyword = variable ['label'] n ['label1', ...]
2)  keyword = variable ['label'] (n1) ['label'] (n2)...

Example: Quantiles

TABULATE HEADER = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
         QUANTILES = Q 'Quantiles' 4
Current monthly salary


MEAN
Quantiles
QUANTILE-
=25
QUANTILE-
=50
QUANTILE-
=75
TOTAL.......
North........
South........
2743
2813
2650
2311
2238
2369
2678
2825
2538
3222
3315
2913

A specification in the first form , with multiple labels:

TABULATE  HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
          QUANTILES = Q 'Quantiles' 4 '25%', 'Median', '75%'
Current monthly salary


MEAN
Quantiles
25% Median 75%
TOTAL........
North........
South........
2743
2813
2650
2311
2238
2369
2678
2825
2538
3222
3315
2913

A specification in the second form:

TABULATE HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
         QUANTILES = D 'Deciles' (10) '10th' (50) '50th' (90) '90th'
Current monthly salary


MEAN
Deciles
10th 50th 90th
TOTAL........
North........
South........
2743
2813
2650
2051
1973
2267
2678
2825
2538
3407
3647
3265

homecontents start chapter top of pagebottom of pagenext page index

Percentages

TABULATE can calculate and display percents. Tables can be specified which display just percentages or which also include the original values.

Tables can be specified which contain row percentages, column percentages or overall percentages. Multiple percentages can be displayed in a single table. A single table can have percentages computed from more than one base value. However a table cannot have any cell which is an intersect of two percentage calculations. This means that all percentage cells must appear in a single expression; there cannot be percentages in both a row and column expression.

To display percentages, specify where the percents appear and the base of each percent calculation. Use the PERCENT option to define a pseudo-variable and then reference this in an expression to display the percent. Specify the base variable to use to calculate the percentage by appending a percentage symbol (%) and the same pseudo-variable name to a base variable. The only valid base variables are TOTAL control variables, typically cross tabulated against the observation variable that is the percentage variable.

Shorthand Notation for Base Marker Specification

If a percent base marker is appended to a variable, it specifies that a TOTAL precedes the variable and that this is the base. For example, the following expressions are equivalent:

TOTAL%X THEN REGION
REGION%X
This shorthand notation can simplify the tabulate expressions. For example, both the following statements are equivalent:

TABULATE HEADER = (TOTAL%X THEN REGION) STUB = (X BY (TOTAL%X THEN AGE))
TABULATE HEADER = (REGION%X)            STUB = (X BY AGE%X)
The table output is the same in both cases. With the shorthand form, the label of the created TOTAL category is always "TOTAL".

A brief review of .T and %X notation: An expression such as AGE.T, produces three rows or columns - TOTAL,Young and Old. An expression such as REGION%X, produces three categories - TOTAL, North and South. It also specifies the TOTAL column as the percent base. The sub-expression, X.T, is the same as TOTAL THEN X and produces two cells for each category.

The basics of producing percentages in tables are covered in simple percentages. You can also get percentages with observation variables and tables with values and percentages. Tables can have percentages for row totals and can be displayed in different wafers. There are numerous examples of different tables using percentages in percentage examples.

homecontents start chapter top of pagebottom of pagenext page index

Percentages with Counts

The simplest percentage tables are produced with control variables and cell contents are therefore counts. Consider the following standard table:

HEADER = (TOTAL THEN REGION) STUB = (TOTAL THEN AGE)
TOTAL North South
TOTAL..........
Young..........
Old............
22
10
11
12
5
7
9
5
4

To add percentages to this table, define a pseudo-variable, say X, and reference this on PERCENT clause, for example:

PERCENT = X 'Percent'
Choose the base row or column for the percentage computations. Base cells are indicated within an expression by appending a percentage symbol (%) and the pseudo-variable to the base variable. For row percentages, there must be a total column for the base or 100% column. In the example, the TOTAL variable in the header expression is the total column. Therefore, append %X to TOTAL to make it the base for percentage calculations.

Decide where to place the percent variable X, ensuring that it is nested correctly. Specifying it in the header expression or in the stub expression only affects the labeling of the table.

Row Percents

TABULATE HEADER  = (X BY (TOTAL%X THEN REGION)) STUB = (TOTAL THEN AGE)
         PERCENT =  X 'Percent'
Percent
TOTAL North South
TOTAL..........
Young..........
Old............
100
100
100
55
50
64
41
50
36

The total (100%) and regional percentages for each age group (row) are displayed in the table. The table contains only percentages because the percent variable, X, is at the highest level of the header expression nesting. The remainder of the header, (TOTAL THEN REGION), is nested within X. The percent marker, %X, specifies the header column TOTAL as the 'set' of base cells for this table's percent calculations. Each cell value for TOTAL is the base for the percents in its row. For example, the calculations for the first row of the table are:

TOTAL  = 2200/2200 * 100 = 100
         North  = 1200/2200 * 100 = 55
         South  =  800/2200 * 100 =  41
Note: The above illustrates the effect of missing values in a control variable. Some records had missing values in REGION and thus the columns do not reflect these records. If required, specify the MISSING modifier on REGION to generate an extra column for those records.

Moving X to the stub expression changes the labeling of the table but not its contents:

TABULATE HEADER = (TOTAL%X THEN REGION) STUB = (X BY (TOTAL THEN AGE))
         PERCENT = X 'Percent'
TOTAL North South
Percent
TOTAL..........
Young..........
Old............

100
100
100

55
50
64

41
50
36

Column Percents

To display column percentages instead of row percentages, use a total row to be the base or 100% row.

In the example, move the base marker, %X, from TOTAL in the header and append it to TOTAL in the stub:

TABULATE HEADER = (X BY (TOTAL THEN REGION)) STUB = (TOTAL%X THEN AGE)
         PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL..........
Young..........
Old............
100
45
50
100
42
58
100
56
44

As in the preceding table, the percent variable, X, is nested with all cells of the table so that all cells contain percents. In this table, however, the percent marker, %X, specifies the TOTAL row in the stub expression as the set of base cells for this table's percent calculations. For example, the calculations for the first column of the table are:

          TOTAL            = 2200/2200 * 100 = 100
          Young            = 600/2200 * 100 =  45
          Old              = 1400/2200 * 100 =  50

Base Markers in Both the Header and the Stub

To get overall or grand total percentages, place the percent base markers in both the stub and the header. By convention, the cell (or cells) at the intersection of the base row and column are considered the base cells for the table.

TABULATE  HEADER = (X BY (TOTAL%X THEN REGION)) STUB = (TOTAL%X THEN AGE)
          PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL..........
Young..........
Old............
100
45
50
55
23
32
41
23
18

In this table, all percentages are measured using the first cell of the table as the base. This cell is at the intersection of the stub TOTAL row and the header TOTAL column.

homecontents start chapter top of pagebottom of pagenext page index

Percentages with Observation Variables

Percentages of observation variables can be produced but at least one control variable must be nested with the percent variable.

Example: Percentages for Observation Variables

The following shows salary by age group:

TABULATE  HEADER = (SALARY BY (TOTAL THEN AGE))
Monthly salary
TOTAL Young Old
57600 27500 30100

The corresponding percentage table is:

TABULATE  HEADER = (X BY SALARY BY (TOTAL%X THEN AGE))
PERCENT = X 'Percent' /
Percent
Monthly salary
TOTAL Young Old
100 48 52

Moving the clause (TOTAL%X THEN AGE) to the front of the expression, results in the same data in the table with different labeling:

TABULATE HEADER = ((TOTAL%X THEN AGE) BY X BY SALARY)
         PERCENT = X 'Percent'
TOTAL Young Old
Percent Percent
Monthly salary Monthly salary Monthly salary
100 48 52

homecontents start chapter top of pagebottom of pagenext page index

Original Values and Percents

Original values can be included as well as the percents. In the examples that follow, all of the manipulations are performed in the header expression. However, they could equally be done in stub or wafer expressions.

The following has alternating columns of original values and percents. Note only the cells nested within the percent variable X contain percents.

TABULATE HEADER  = (REGION%X BY X.T) STUB = (AGE.T)
         PERCENT =  X 'Pct' /
TOTAL North South
TOTAL Pct TOTAL Pct TOTAL Pct
TOTAL......
Young......
Old........
22
10
11
100
100
100
12
5
7
55
50
64
9
5
4
41
50
36

Instead of presenting the data in alternating columns, it could be grouped into separate blocks of frequencies and percents by inverting the order of the header expression.

TABULATE HEADER  = (X.T BY REGION%X) STUB = (AGE.T)
         PERCENT =  X 'Pct' /
TOTAL Pct
TOTAL North South TOTAL North South
TOTAL.....
Young.....
Old.......
22
10
11
12
5
7
9
5
4
100
100
100
55
50
64
41
50
36

homecontents start chapter top of pagebottom of pagenext page index

Percents for Row Totals.

TABULATE HEADER  = (TOTAL THEN (X BY REGION%X)) STUB = (AGE.T)
         PERCENT =  X 'Pct' /
TOTAL Pct
TOTAL North South
TOTAL..........
Young..........
Old............
22
10
11
100
100
100
55
50
64
41
50
36

An analogous set of tables can be designed for percentages and totals of an observation variable. The following table displays alternating columns of total salary and percentage.

TABULATE HEADER  = (SALARY BY SEX%X BY X.T STUB) = (AGE.T)
         PERCENT =  X 'Pct'
Current monthly salary
TOTAL Male Female
TOTAL Pct TOTAL Pct TOTAL Pct
TOTAL.....
Young.....
Old.......
57600
27500
30100
100
100
100
34950
13050
21900
61
47
73
22650
14450
8200
39
53
27

This table groups all salary columns followed by all the percent columns.

TABULATE HEADER  = (SALARY BY X.T BY SEX%X) STUB = (AGE.T)
         PERCENT =  X 'Pct'
Current monthly salary
TOTAL Pct
TOTAL Male Female TOTAL Male Female
TOTAL.....
Young.....
Old.......
57600
27500
30100
34950
13050
21900
22650
14450
8200
100
100
100
61
47
73
39
53
27

The following shows total salary for both sexes and percentage by sex.

TABULATE HEADER  = (SALARY BY (TOTAL THEN (X BY SEX%X))) STUB = (AGE.T)
         PERCENT =  X 'Pct'
Current monthly salary

TOTAL
Pct
TOTAL Male Female
TOTAL.....
Young.....
Old.......
57600
27500
30100
100
100
100
61
47
73
39
53
27

homecontents start chapter top of pagebottom of pagenext page index

Percentages in Wafers

It is possible to display frequencies and percentages in different wafers of the same table. For example:
TABULATE HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
         PERCENT =  X 'Percent'
The wafer expression is X.T which stands for TOTAL THEN X. Thus the table consists of two wafers, the first wafer contains totals and the second contains percentages. Each wafer is a two-dimensional table of AGE (in the stub) by REGION (in the header). Since the base marker, %X, appears in both the stub and the header, the percentage wafer contains overall percentages. The following two wafers are the output of the previous tabulate statement:

HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
Table of Contents:
TOTAL........................................  page 1
Percent......................................  page 2

HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
TOTAL..............

TOTAL
North
South

TOTAL..........
Young..........
Old............

22
10
11

12
5
7

9
5
4

(On next page)

HEADER = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
Percent............

TOTAL North South
TOTAL..........
Young..........
Old............
100
45
50
55
23
32
41
23
18

homecontents start chapter top of pagebottom of pagenext page index

Example percentage tables

The following examples illustrate some of the different ways to display percentages. The examples are all based on the following table of frequency counts.
TABULATE HEADER = (REGION.T) STUB = (AGE.T BY SEX.T)
TOTAL North South
TOTAL
TOTAL...........
Male............
Female..........
Young
TOTAL...........
Male............
Female..........
Old
TOTAL...........
Male............
Female..........

22
13
8

10
5
5

11
8
3

12
9
3

5
3
2

7
6
1

9
4
5

5
2
3

4
2
2

Base marker appended to AGE

This table contains nine base (100%) cells. The TOTAL for AGE is the base cell (AGE%X). Since SEX.T is nested within AGE, there are three base cell rows, and since the header contains three columns (REGION.T) there is a total of nine base cells (three columns times three rows).

TABULATE HEADER = (X BY REGION.T)  STUB = (AGE%X BY SEX.T)  PERCENT = X 'Percent'
Percent
TOTAL North South
TOTAL
TOTAL..........
Male...........
Female.........
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........
100
100
100
45
38
63
50
62
38
100
100
100
42
33
67
58
67
33
100
100
100
56
50
60
44
50
40

Base marker appended to SEX

Note that the nine base cells now occur where TOTAL for SEX appears.

TABULATE HEADER = (X BY REGION.T) STUB = (AGE BY SEX%X) PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........

100
50
50

100
73
27

100
60
40

100
86
14

100
40
60

100
50
50


Use of FIRST

In the previous table, base rows appeared at every occurrence of the TOTAL variable for SEX. The next table uses .FIRST to restrict the base row to the first occurrence of TOTAL for SEX.

TABULATE HEADER  = (X BY REGION.T) STUB = (AGE BY SEX%X.F)
         PERCENT =  X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL....... ...
Male...........
Female.........

100
50
50

110
80
30

100
60
40

140
120
20

100
40
60

80
40
40

Multiple Base Markers

The next three tables are similar to the first three with the addition of a second base marker appended to REGION in the header. The base cells occur at the intersections of the base rows and columns.

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE%X BY SEX.T)
         PERCENT =  X 'Percent'
Percent
TOTAL North South
TOTAL
TOTAL..........
Male...........
Female.........
Young
TOTAL..... .....
Male...........
Female.........
Old
TOTAL..........
Male...........
Female.........

100
100
100

45
38
63

50
62
38

55
69
38

23
23
25

32
46
13

41
31
63

23
15
38

18
15
25

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE BY SEX%X)
         PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL....... ...
Male...........
Female.........

100
50
50

100
73
27

50
30
20

64
55
9

50
20
30

36
18
18

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE BY SEX%X.F)
         PERCENT = X 'Percent'
Percent
TOTAL North South
Young
TOTAL..........
Male...........
Female.........
Old
TOTAL....... ...
Male...........
Female.........

100
50
50

110
80
30

50
30
20

70
60
10

50
20
30

40
20
20

Multiple Percent Variables Within a Table

More than one percent distribution can be displayed in a single table by specifying more than one percent variable. For example, the following two variables could be specified representing row and column percentages:

PERCENT=  PR 'Row Pct' PC 'Col Pct' /
Since PR represents row percentages, the base marker associated with PR must be in the header expression. Similarly, since PC, represents column percentages, the base marker associated with PC must be in the stub expression. In the example below, the results are presented in two tables concatenated together. The first table contains row percentages, the second contains column percentages.

TABULATE HEADER  = ((PR + PC) BY REGION%PR) STUB = (AGE%PC)
         PERCENT = PR 'Row Pct' PC 'Col Pct'
Row Pct Col Pct
TOTAL North South TOTAL North South
TOTAL.....
Young.....
Old.......
100
100
100
55
50
64
41
50
36
100
45
50
100
42
58
100
56
44

Multiple percents in Rows

The next example presents the same data in a more conventional form. Notice (PR + PC) is moved to the stub expression.

TABULATE HEADER  = (REGION%PR) STUB = (AGE%PC BY (PR + PC))
         PERCENT = PR 'Row Pct' /
         PERCENT = PC 'Col Pct' /
TOTAL North South
TOTAL
Row Pct.........
Col Pct.........
Young
Row Pct.........
Col Pct.........
Old
Row Pct.........
Col Pct.........

100
100

100
45

100
50

55
100

50
42

64
58

41
100

50
56

36
44

Example: Overall Percents

Overall percentages can be included in the same table by adding a third percent variable, PT. Since PT is to represent overall percentages, append the PT base marker to both the stub and the header variables.

TABULATE HEADER =  (REGION%PR%PT) STUB = (AGE%PC%PT BY (PR + PC + PT))
         PERCENT = PR 'Row Pct'
         PERCENT = PC 'Col Pct'
         PERCENT = PT 'Tot Pct'
TOTAL North South
TOTAL
Row Pct.........
Col Pct.........
Tot Pct.........
Young
Row Pct.........
Col Pct.........
Tot Pct.........
Old
Row Pct.........
Col Pct.........
Tot Pct.........

100
100
100

100
45
45

100
50
50

55
100
55

50
42
23

64
58
32

41
100
41

50
56
23

36
44
18

Counts and Multiple Percents

Counts can be included by adding the TOTAL keyword:

TABULATE HEADER = (REGION%PR%PT) STUB = (AGE%PC%PT BY (TOTAL + PR + PC + PT))
         TOTAL = TOTAL 'Count'
         PERCENT = PR 'Row Pct'
         PERCENT = PC'Col Pct'
         PERCENT = PT 'Tot Pct'
TOTAL North South
TOTAL
Count...........
Row Pct.........
Col Pct.........
Tot Pct.........
Young
Count...........
Row Pct.........
Col Pct.........
Tot Pct.........
Old
Count...........
Row Pct.........
Col Pct.........
Tot Pct.........

22
100
100
100

10
100
45
45

11
100
50
50

12
55
100
55

5
50
42
23

7
64
58
32

9
41
100
41

5
50
56
23

4
36
44
18

Percentages in both Columns and Rows

It is not possible to produce a table which has a column of percentages on the total and a row of percentages on the same total. The solution to this is to produce a second table on the same file with a single row - the final percentages.

Suppose the table is a breakdown of males and females under and over forty. The basic table (without percentages) is easily produced with a table statement such as:

TABULATE HEADER = (AGEGROUP.T)  STUB = (GENDER.T)
However it is impossible to produce both sets of percentages in the same table. The required output is the following:

Total Young and Old Under 40 40+ % BY AGE
Total Males and Females 20 6 14 100%
Male 12 2 10 60%
Female 8 4 4 40%
% BY GENDER 100% 30% 70%

It is simple to add percentages to either the row totals, giving percentages of males and females or the column totals giving percentages of age groups but not both at once. The solution is to produce two tables in the one run:

TABULATE HEADER =(TOTY THEN AGEGROUP THEN X) STUB = (TOTX%X THEN GENDER)
         FILENAME = A
         TOTAL = TOTX 'Total Males and Females'
         PERCENT = X '% BY AGE'
TABULATE HEADER = (TOTY%Y THEN AGEGROUP) STUB = (Y)
         TOTAL = TOTY 'Total Young and Old'
         PERCENT = Y '% BY GENDER'
Note that the second tabulate does not have a FILE clause and thus writes the output to the same file. This second tabulate produces one row which is the required set of final percentages.

homecontents start chapter top of pagebottom of pagenext page index

Record Filtering

BOOLEAN  = (logical expression)
SAMPLE   = fraction
WEIGHT   = varname
BOOLEAN Specifies which procedure table records are used by the procedure. The procedure table records for which the logical expression is true are used by the procedure. If this option is not specified, all procedure table records are used.

SAMPLE Specifies that a random sample of the procedure table records are used by the procedure.
The fraction specifies the percent of records used and is specified as a positive decimal number less than or equal to 1 (one). .25, for example specifies that a 25% sample be used.

WEIGHT Specifies an integer procedure variable used to weight the frequency counts and aggregations. By default, each record adds a count of 1 to frequency counts. Specifying a weight variable, adds the value of that variable rather than 1. For example, if WEIGHT = FAMSIZE were specified, then, in a table of RACE by REGION, a procedure table record would contribute a count of 5 to the RACE=1, REGION=2 cell of the table if it contained the data:
RACE=1   REGION=2   FAMSIZE =5

homecontents start chapter top of pagebottom of pagenext page index

Print Formatting

PRINTFORMATS = var list (options,...) .../
By default, cell contents are printed as integers. The PRINTFORMATS option is used to alter the defaults. The options and their abbreviations are:

COMMAS (C) Separates groups of 3 digits by commas.

DECIMAL=n (n) Specifies the number of decimal places to be printed. A decimal point is only printed when the number of decimal places is non-zero.

DOLLAR (D) Places a floating dollar sign before the number.

PERCENT (P) Places a trailing percent sign after the number.
By default the DOLLAR, PERCENT and COMMAS options are not in effect and DECIMAL= 0 (i.e. numbers are printed as integers).

Use of the PRINTFORMATS clause.

TABULATE HEADER = (SALARY BY (TOTAL THEN (X BY SEX%X))) STUB = (AGE.T)
         PERCENT = X 'Pct' /
         PRINTFORMATS = SALARY (D,C) X (P,2) /
Current monthly salary
TOTAL Pct
TOTAL Male Female
TOTAL..........
Young..........
Old............
$57,600
$27,500
$30,100
100.00%
100.00%
100.00%
60.68%
47.45%
72.76%
39.32%
52.55%
27.24%

TOTAL under SALARY is printed with dollar signs and commas. However, TOTAL under PCT is printed with a percent sign and two decimal places because of the precedence rules for print formats.

Precedence rules for print formats

For example:

TABULATE HEADER  = (SALARY BY REGION BY (MIN THEN MAX))
         MINIMUM = MIN 'Lowest'
         MAXIMUM = MAX 'Highest'
         PRINTFORMATS = SALARY (D) MIN (2)MAX (2)
The salaries are listed with 2 decimal points and no dollar sign. To print the salaries with dollar signs and no decimal points the expression is:

TABULATE HEADER = (REGION BY (MIN THEN MAX) BY SALARY)

Labels

When control variable values are printed, they are labeled by their value labels. If there are no value labels, the name of the variable and the individual value is displayed.

Observation variables are displayed in a summary cell labeled with the variable label. If there is no variable label, the variable name is used.

When not enough room is provided in a table for label or title information, the label or title is broken into two or more segments so that it can fit in the allotted space. The VALUE LABELS command defines value labels for missing values including BLANK and UNDEFINED.

homecontents start chapter top of pagebottom of pagenext page index

Page Formatting

PAGETITLE = 'string'['string'['string']]
PAGELENGTH = n
PAGEWIDTH = n
COLLAPSE

The following options alter the default formatting of the page.

PAGETITLE Specifies the title printed left justified at the top of the page. By default, the page title is the string "SIR/XS Tabulate Procedure" and the page number is printed at the top right-hand side of the page. The page title may consist of up to 3 lines. The first line contains the page number. The second and third lines are printed if there are wafers and/or if the wafers must be broken into chunks (i.e. the wafer does not fit on a single page).

PAGELENGTH The maximum number of lines that can appear on a page. Specifying PAGELENGTH = NOEJECT turns off paging. The default PAGELENGTH value is the current output file page length. This must include space for:
the page title and page number line(s),
the wafer title, wafer label, and the blank lines between,
the table (wafer) itself and any WAFERNOTES specified.

PAGEWIDTH Controls the width of the printed page. By default, tables are printed centred within PAGEWIDTH print positions. The default value is the current output file page width.

COLLAPSE If a wafer exceeds either PAGELENGTH or PAGEWIDTH, an attempt is made to break the wafer into multiple 'chunks', each printed on a separate page. The chunks appear in the output in order from left to right, then top to bottom. The wafer and stub labels appear as needed. The COLLAPSE option allows chunks of broken up wafers to be printed on the same page. No more than 2 chunks are printed per page. The default is not to collapse chunks into a single page.

homecontents start chapter top of pagebottom of pagenext page index

Header Formatting

HEADERWIDTH = n
HEADERINDENTATION = n
HEADERDIVIDER = 'character'
NODIVIDERS
NOHEADERCENTER
The following options alter the default formatting of the column header.

HEADERWIDTH Specifies the width to be used in printing each column of the table. This width does not include the column divider. The default header width is 10 print positions.
If HEADERWIDTH is set too small, there may be insufficient room to print a number in the table, in which case the field is filled with Xs. Also, if HEADERWIDTH is too small, there may not be enough room for the header labels to be printed and they may be severely segmented. Conversely, if HEADERWIDTH is set too large, the table header may contain too many print positions for one page. When this occurs, the page is broken into chunks. Chunks are printed on successive pages left to right, top to bottom.

HEADERINDENTATION Specifies the number of print lines for each level of the header nesting. The default is 3 and it cannot be set smaller.

HEADERDIVIDER Specifies the character used as the column divider. The default is a vertical bar (|). This character is used for both the label area of the header as well as between the columns of data.

NODIVIDERS Sets the divider character between columns of data to blank. It does not affect the label area of the header.

NOHEADERCENTER Left justifies header labels in a column. By default header labels are centred in the column.

homecontents start chapter top of pagebottom of pagenext page index

Stub Formatting

STUBTITLE = option 'string'
STUBWIDTH = n
STUBINDENTATION = n
STUBCONTINUATION = n
STUBHEIGHT = n
STUBFILLER = 'character'
STUBDIVIDER = 'character'
The following options alter the default formatting of the stub:

STUBTITLE Specifies a title which appears in the boxed in area to the left of the header and directly above the stub labels. The options allowed are CENTER, LEFT and RIGHT and control the justification of the stub label. If the option is omitted, it is assumed to be CENTER. If the stub title is too long, it is automatically segmented over two or more lines. There is no default stub title.

STUBWIDTH Specifies the number of print positions for stub labels. The default is 20 print positions.
If "too much" stub label segmentation occurs, TABULATE aborts with a WAFER/STUB OUTPUT FORMATTING ERROR. The solution is to increase the STUBWIDTH.

STUBINDENTATION Specifies the indentation for each level of nesting. The default is 2 print positions.

STUBCONTINUATION Specifies the indentation for continuation lines. A label which exceeds the stub width is segmented into two or more lines. Continuation lines are indented by the parameter. The default indentation is 3 print positions.

STUBHEIGHT Specifies the number of print lines used for each stub label. The default is 1 line. Label segmentation works independently from the STUBHEIGHT parameter and, therefore, some stub labels may occupy more than STUBHEIGHT lines.

STUBFILLER Stub labels (on lines which contain data) are filled with the character to the full width of the stub. The default stub filler is a period (.).

STUBDIVIDER Specifies the horizontal divider character. The default is the dash (-).

Stub Formatting

TABULATE STUB = (AGE BY SEX) HEADER = (REGION.T)
         STUBTITLE = 'Age by Sex'
         STUBWIDTH = 20
Age by Sex TOTAL North South
Young
Male............
Female..........
Old
Male............
Female..........

5
5

8
3

3
2

6
1

2
3

2
2

TABULATE STUB = (AGE BY SEX) HEADER = (REGION.T
         STUBFILLER = ' '

TOTAL North South
Young
Male
Female
Old
Male
Female

5
5

8
3

3
2

6
1

2
3

2
2

homecontents start chapter top of pagebottom of pagenext page index

Wafer Formatting

WAFERTITLE = option 'string'
NOWAFERTITLE
WAFERCENTER
WAFERSTART = n
WAFERINDENTATION = n
WAFERCONTINUATION = n
WAFERHEIGHT = n
When a tabulate statement contains a wafer expression, each wafer produced contains a wafer label in the upper left corner below the wafer title. If there is no wafer expression in the tabulate statement, a wafer label is not printed.

Wafer labels are formed in the same way that header and stub labels are produced.

WAFERTITLE Specifies a wafer title printed above each wafer. By default, the tabulate statement is printed as the wafer title, centred and one line above the wafer (table). The option may be CENTER, LEFT or RIGHT to position the wafer title over the wafer.

NOWAFERTITLE Suppresses printing of the wafer title.

WAFERCENTER Centres the wafer on the page. By default wafers are left justified on the page.

WAFERSTART Specifies the starting print position for the wafer. Does not apply when WAFERCENTER is specified.

WAFERINDENTATION Specifies the indentation used for each level of nesting in the wafer title. The default is 2 print positions.

WAFERCONTINUATION Specifies the indentation for continuation lines when a wafer label is broken up into multiple lines. The default is 3 print positions.

WAFERHEIGHT Specifies the number of print lines for the wafer label. The default is 1. This height is automatically adjusted if a label does not fit in the width of the wafer title and has to be segmented.
The number of print positions available for printing the wafer label is the same as the width of the stub titles. If the wafer title does not fit, TABULATE reports a WAFER/STUB OUTPUT FORMATTING ERROR. The solution is to increase the size of STUBWIDTH.

Wafers

The following table would consist of four wafers labeled as follows:

TABULATE  WAFER = (AGE BY REGION) STUB = (SEX) HEADER = (RACE)
         Under 21
          North.......

         Under 21
          South.......

         21 and Over
           North.......

         21 and Over
           South.......

homecontents start chapter top of pagebottom of pagenext page index

Other Options

HTML 
NOBOTTOMBORDER 
NOLEFTBORDER 
NORIGHTBORDER 
NOTES     = 'string'
FOOTNOTES = 'string' 
WAFERNOTES = 'string' 
NOZEROS 
ZEROCHARACTER = 'character' 
NOEMPTYROWS 

DELETEMISSING 
UPPERCASE 
SPANNERS 
DUMMY   = variable 'label' ...
LANGUAGE 

HTML

Specifies that the output file contains data in html format for viewing by a browser or other package which expects this format.

When output is in html format, then the print formatting, page formatting and many of the general output control options can be specified but have no effect. There is no control over the detailed appearance of the table. The general shape of the table is dictated by the table expressions and the specifics of the table appearance depend on the software used to view and/or print the resulting html.

Border Generation

By default all borders are generated. The NOBOTTOMBORDER, NOLEFTBORDER and NORIGHTBORDER options alter the production of the left, right and bottom borders of each wafer. They can be used in conjunction with the HEADERDIVIDER and STUBDIVIDER.

Footnotes

There are no default footnotes. NOTES specifies text printed at the end of the Table of Contents which is produced if there are multiple wafers or a NOTES option specified. FOOTNOTES specifies text printed on a new page following all wafers. WAFERNOTES specifies text printed at the end of each wafer or chunk of a wafer.

Zero Printing

By default zeros are printed as a 0 (or as specified in the PRINTFORMAT statement). Specify NOZEROS to print zeros as dashes (-). Specify ZEROCHARACTER to print another character for zero cells.

Suppression of Empty Rows and Wafers

An empty row or wafer is one that contains zero in all cells. By default empty rows are printed. In a table with a single wafer or with no wafer expression at all, the table is printed whether it is empty or not. If multiple wafers are produced, empty wafers are automatically deleted. The deletion is noted in the table of contents. Empty columns cannot be deleted. NOEMPTYROWS suppresses the printing of empty rows.

Exclusion of Records with Missing Values

Specify DELETEMISSING to exclude records where any of the cross tabulation variables have missing or undefined values in a variable.

Converting Text to Uppercase

Specify UPPERCASE to convert text to uppercase. By default, both upper and lower case are used.

Spanner Labels

Specify the SPANNERS option to produce spanner labels for all control variables. Specify spanner labels on a per variable basis with the .SPAN modifier. When the SPANNERS option is in effect, suppress spanner labels for an individual variable with the .NOSPAN modifier.

Dummy Spanner Labels

Specify the DUMMY clause to create label information that is not related to any specific variable. The DUMMY clause creates a dummy variable whose sole purpose is to carry label information into the table. Dummy spanner labels can be used to span concatenated expressions.

Example: Dummy Spanner Labels

TABULATE HEADER = (J BY (SEX THEN REGION))
         DUMMY  = J 'Sex and region Data'
Sex and region Data
Male Female North South
Dummy.......... 13 8 12 9

Language

If a language is specified, the effect is to produce translated versions of the following words or phrases:
SIR/XS Tabulate Procedure, Footnotes, Notes,Page, empty/deleted, Table of Contents, cont'd, of, Wafer, and, Chunk.
Recognised language keywords are HEBREW, DUTCH and GERMAN.

String Specifications

Whenever an option includes the specification of strings or labels, a string can be specified in multiple segments using the following format:

'segment' option 'segment'
Option can be a plus (+), blank or minus(-); the different characters specify where strings can be broken across lines:

Plus (+) simply concatenates two segments; blank forces a line break between the two segments; minus (-) specifies optional segmentation as per the following rules.

Label Segmentation

If a label does not fit, it is broken into segments by splitting the text at an appropriate breakpoint. The breakpoint is chosen by searching the label from right to left looking for one of the following conditions taken in sequence:

If no breakpoint is found, the label is truncated at a point which allows a hyphen to be printed.

After the first line segment has been formed, all the remaining characters of the original label to the right of the break character are considered next. If the remaining segment fits in the allocated space, it is printed, otherwise, the first n characters of this portion of the label (where n is the space available) are also scanned from right to left and segmented as described above. This process continues until the entire label has been printed.

homecontents start chapter top of pagebottom of pagenext page index

Error Processing

There are two stages at which errors in the TABULATE command are detected, during compilation and during execution. Errors detected during compilation cause error messages to be displayed and the VisualPQL program is not executed. Errors detected during execution cause the program to terminate.

If during compilation, a CELL CONTENTS ERROR error occurs, check that:

The table is formatted after the program is run, and some problems are diagnosed at this stage. The error WAFER/STUB OUTPUT FORMATTING ERROR implies that there is a problem inserting the wafer and/or stub labels within their allocated space in each wafer. Usually there are too many levels of nesting, within the wafer or stub, for the width of the STUBWIDTH.

If there is not enough room on a page to print an entire row or column, TABULATE breaks the wafer into + "chunks". If a BY clause has been specified, all of the rows or columns beneath the highest level in a nesting must fit on a single page, otherwise a CHUNK FORMATTING ERROR is reported. This implies that a wafer is too large to print on a single page and that it cannot be broken at any 'clean' place to produce multiple 'chunks' of output. A wafer can be broken into chunks only along the highest variable nested within the stub or header. The solutions are to:

homecontents start chapter top of pagebottom of pagenext page index