TWO-VARIABLE DATA
Data sets are limited to arrays of at most twenty
columns and 300 rows. The Rows and Cols values in the
main menu show how much data is currently on
display. These indices can be reset to define the size
of a new array, or to limit the display to the first two
or three columns of a large array. (Even if only two
columns are on display, the other columns are not
forgotten, however.) Any data set can be Edited; in
particular, a new data set can be created by editing an
array of Zeros. When in editing mode, move the cursor
with the arrow keys. To provide input, just start
typing the numerical data; the program will go into
input mode. Press RETURN after each entry; the cursor
automatically jumps to the next entry. The default mode
is row-by-row entry; pressing Ctrl-I will switch to
column-by-column entry. In addition to data entry, you
can also Swap columns, and Insert and Delete rows when
in editing mode. In the event that data has been input
in transposed form, requesting Transpose will correct it.
Press Escape when the required editing is done; or
press U to Undo all the edits and recover the original
data set. An edited data set is NOT automatically saved
to disk, not even if the source file resides there; you
must use Ctrl-S.
If an array does not fit on the screen, the arrow keys
will bring parts into view; PgUp and PgDn move a screen
at a time. The display always warns of hidden entries.
When a data set is saved, it is given a file name; you
can also Label the columns of each array. Each heading
is limited to at most twelve characters. When the
column index is changed, the current heading is put on
display. The default headings are "Column #". The
headings are stored with the data.
Column indices are not entered through the usual input
box (type a value, then press RETURN). Instead, each
press of C advances the column index (cyclically) until
the desired value is reached.
You can obtain scatter Plots for any data set. Each of
the coordinate axes has a variable assigned to it; these
assignments are designated as Hor and Ver, and can be
changed at any time. Each column has an associated
icon (a box for column 2, a plus sign for column 3, and
so on); these are not adjustable, and they repeat after
column 11. The program uses the icon associated with
the Ver variable to draw the scatter plot. To obtain
simultaneous scatter plots, see the discussion of Ctrl-A
and Ctrl-W below.
Press Data to refresh the numerical display.
There are two other 2-variable menus: Fit Equations and
Transform Data, described next.
The Fit Equations menu is where you try to discover
algebraic relationships between the variables. You can
superimpose the graph of any equation of the form Y=f(X)
on the current scatter plot; request Other and enter the
desired formula for f(X). Most likely you want to fit
a straight line to the data. This can be done in two
automatic ways: the Median-Median line and the
Least-Squares (regression) line. The resulting equation is
put on display at the top of the screen after the line
is plotted. In the least-squares case, the correlation
coefficient is also displayed (along with the slope and
the Y-intercept). In the median-median case, you can
request Summary to see the three summary points and
their coordinates. Besides the linear Type, there are
five other automatic curve-fitting modes: semilog,
log-log, power, quadratic, and exponential.
In the first case, the Y data is transformed (by
applying the logarithm function Ln to it) before the
linear method (median-median or least-squares) is
applied; in the second and third cases, logarithms of
both X and Y are calculated. The power fit is really a
special case of the log-log fit, in which the parameters
of y = k*x^m are displayed directly. The exponential
fit is a redundant version of the semilog fit. The
quadratic least-squares fit finds the quadratic function
that minimizes the sum of the squares of the residuals.
The median-median quadratic fit just passes a quadratic
function through the three summary points.
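The two linear methods, and the log-log idea behind the
power fit, can be sketched in Python as follows. This is
an illustration, not PEANUT's own Pascal code: the
function names are hypothetical, and the median-median
grouping rule (where the extra points go when the number
of points is not a multiple of three) follows a common
textbook convention that may differ from the program's.

```python
import math
from statistics import median


def least_squares(xs, ys):
    """Least-squares line y = slope*x + intercept, plus correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def median_median(xs, ys):
    """Median-median line; returns slope, intercept and the summary points."""
    pts = sorted(zip(xs, ys))          # order the points by X
    n, q = len(pts), len(pts) // 3
    # Assumed textbook grouping: one extra point goes to the middle group,
    # two extra points go one to each outer group.
    sizes = {0: (q, q, q), 1: (q, q + 1, q), 2: (q + 1, q, q + 1)}[n % 3]
    a, b = sizes[0], sizes[0] + sizes[1]
    groups = (pts[:a], pts[a:b], pts[b:])
    summary = [(median(x for x, _ in g), median(y for _, y in g)) for g in groups]
    (x1, y1), (x2, y2), (x3, y3) = summary
    slope = (y3 - y1) / (x3 - x1)      # slope from the two outer summary points
    # Average of the intercepts of lines with this slope through each point.
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return slope, intercept, summary


def power_fit(xs, ys):
    """Fit y = k * x**m via least squares on (ln x, ln y); needs x, y > 0."""
    m, ln_k, _ = least_squares([math.log(x) for x in xs],
                               [math.log(y) for y in ys])
    return math.exp(ln_k), m
```

For example, least_squares([1, 2, 3, 4, 5, 6],
[3, 5, 7, 9, 11, 13]) gives slope 2, intercept 1 and
r = 1, and the median-median line agrees on such exactly
linear data.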
Whatever the fitted equation, you will want to see the
errors of approximation (the differences between the Y
data and the Y fit). These Residuals may be added to
the data set, by assigning them to a particular Column.
When R is pressed, the new data set appears on the
screen; the column heading is automatically set to
"Residuals". (You can always refresh either the Plot or
the Data, by the way, when one or the other disappears
from view.) You can also calculate a predicted Y-value
based on an arbitrary X-value, or vice versa. The
reverse problem is harder, because you are searching
for X-values that might not exist, that might exist only
outside the displayed domain, or that might exist in
profusion. In any event, it may be necessary to Escape
from the search if it does not seem to be getting
anywhere.
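A minimal sketch of the residual calculation, and of one
way to attack the reverse problem: scan an interval for
sign changes of f(X) - Y, then refine each one by
bisection. The scan-and-bisect strategy and the names
used here are assumptions for illustration; PEANUT's
actual search method is not documented here. The sketch
also shows why such a search can fail: a root where the
curve only touches the target value produces no sign
change and is missed.

```python
def residuals(xs, ys, f):
    """Residuals: Y data minus fitted Y, row by row."""
    return [y - f(x) for x, y in zip(xs, ys)]


def solve_for_x(f, target, lo, hi, steps=1000, tol=1e-12):
    """Return the X-values in [lo, hi] where f(X) = target (a sketch).

    Scans for sign changes of f(x) - target, then bisects each bracket.
    Tangent roots, and pairs of roots closer than one scan step, are missed.
    """
    def g(x):
        return f(x) - target

    roots = []
    h = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * h, lo + (i + 1) * h
        ga, gb = g(a), g(b)
        if ga == 0:
            roots.append(a)               # grid point lands exactly on a root
        elif ga * gb < 0:
            while b - a > tol:
                mid = (a + b) / 2
                gm = g(mid)
                if ga * gm <= 0:
                    b = mid               # root lies in [a, mid]
                else:
                    a, ga = mid, gm       # root lies in [mid, b]
            roots.append((a + b) / 2)
    if g(hi) == 0:
        roots.append(hi)
    return roots
```

For Y = X^2 and a target of 4 on [-3, 3], the search
returns two X-values near -2 and 2; for a target of -1 it
returns an empty list.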
Pressing either Plot or Data brings the requested item
into view. (For instance, you might want to erase an
undesirable fitted curve, or you might want to take a
look at the data without returning to the main menu.)
In the former case, it is helpful that the Horizontal
and Vertical variables are visible in the menu, and are
changeable (in case you want to display a different
plot, for example).
The Transform Data menu allows you to apply arbitrary
functions to data columns. Press F to input a function.
The variables X, Y, and Z can be assigned to any of the
columns. For instance, you can find the difference
between columns 2 and 3 by setting F = X-Y and assigning
X and Y to columns 2 and 3. You may also assign an
Offset to each variable, which internally assigns a
vertical shift to the assigned column. For example,
assign both X and Y to column 2, but assign Y an offset
of -1; now the function X-Y tabulates the difference
between each entry of column 2 and the entry just above
it (entries above or below the matrix are assumed to be
zeros). In any case, you must specify which column is
to serve as the Destination. Finally, Go executes the
transformation; the destination column is relabelled
"Transf". An incidental feature is that the variable I
is interpreted as the row number of any entry, so that
you can create an indexed column by applying the
function F = I, for example.
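The offset and zero-padding rules described above can be
sketched like this. The function and parameter names are
hypothetical, and counting rows from 1 for the variable I
is an assumption. All results are computed before any
are written, so the destination may safely be one of the
source columns.

```python
def transform(data, f, assign, offsets, dest):
    """Apply f down the rows of `data` and store the results in column `dest`.

    data    -- list of rows; columns are numbered from 1, as on screen
    f       -- function of keyword arguments X, Y, Z (as assigned) and I
    assign  -- e.g. {"X": 2, "Y": 3}: variable -> column number
    offsets -- e.g. {"Y": -1}: variable -> row shift (missing means 0)
    """
    nrows = len(data)

    def value(var, row):
        r = row + offsets.get(var, 0)
        if 0 <= r < nrows:
            return data[r][assign[var] - 1]
        return 0.0                        # entries outside the array are zero

    results = []
    for row in range(nrows):
        env = {var: value(var, row) for var in assign}
        env["I"] = row + 1                # row number, counting from 1 (assumed)
        results.append(f(**env))
    for row, v in enumerate(results):     # write only after all are computed
        data[row][dest - 1] = v
    return data
```

The difference example from the text becomes
transform(rows, lambda X, Y, **_: X - Y, {"X": 2, "Y": 2},
{"Y": -1}, 1), and an index column is
transform(rows, lambda I, **_: I, {}, {}, 1).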
Before an unsaved data set can be discarded (by leaving
the program or by requesting Old, for example), the
program will ask for permission to discard. A response
of "No" leaves things as they are, and the data can then
be saved.
Each Plot request automatically sizes the window so that
the data points just fit, which makes it difficult to
produce simultaneous scatter plots. Two special keys
are helpful here. When the Ctrl-Accumulator switch is
on, every Plot request can only enlarge the window. In
other words, all data sets plotted since the Ctrl-A
switch was turned on will fit in the most recent window.
Simultaneous scatter plots are still impossible,
however, if the program refreshes the screen for each
plot. Once the window is large enough for all the data,
you can therefore turn off the automatic refresh with
Ctrl-W, and the program will only plot points. The
multiple-scatter-plot procedure: turn Ctrl-A on, request
all the individual scatter plots, turn the refresh off
with Ctrl-W after the last one, then replot the earlier
ones on the fixed axes. This may involve retrieving some
Old files. If different icons are desired for the final
version, plan for this before turning the refresh off.
Note the following:
The screen-refresh switch is active only in the main
2-variable menu, and only when there is a plot on the
screen. When it is off, many of the other menu items
are also disabled (anything that involves putting data
back on the screen: Editing, Transforming data, Help,
etc.).
Special Ctrl-Keys:
Ctrl-Accumulator is a switch. When it is on, each plot
request can only enlarge the previous graphing window.
Ctrl-Dot Density alters the speed of curve-plotting by
determining how many points are calculated. A lower
density speeds things up, of course.
Ctrl-Format is for specifying the appearance of decimal
output. Here you choose the total width of decimal
output and also the number of decimal places when the
decimal point is fixed in position (Alt-F switches
between floating and fixed mode). For data displays,
you can Skip a line between rows.
Ctrl-Overlay allows you to superimpose text on scatter
plots. Use the arrow keys to move the cursor to the
desired location, press T to edit the text, then press W
to center the text at the cursor position. This text is
part of the data set, and is saved with the figure.
When the scatter plot is refreshed, the text can be put
back on the diagram by requesting All in the Overlay
menu. To make the graphics text larger, press Alt-B.
Use Ctrl-Print for hard copy. When an array is on the
screen, the program will print it directly; if a plot is
displayed, however, you must look at the print menu,
just in case the target printer (Device) needs to be
identified. The default is the standard dot matrix, in
which case simply pressing P should produce the desired
result.
Ctrl-Save writes data sets to disk files. In addition
to the two dimensions and the entries themselves, the
program also stores the two format specifications (field
width and decimal places). The column headings are saved
to a file as well.
N.B. The program saves only the data that is on
display; if Rows and Cols have been set so as to
display only PART of a larger data set, the remainder
will probably be lost.
Ctrl-W is active only in the Main 2-Variable Menu, and
only if there is a plot on the screen. It disables
further screen refreshes until Ctrl-W is pressed again.
ONE-VARIABLE DATA
As above, data is arranged in rectangular arrays, but
now the columns are of no significance. (Only a single
column label is ever on display.) A data file can have
at most 6000 = 300*20 items in it.
Given a data set, you usually want to see a Histogram.
First the data must be organized into a specified number
of Groups, each with a specified Width. Setting either
value causes the other to be adjusted. The only values
allowed for Groups are 2..100. You must also specify
the Minimum start value for the lowest group.
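A sketch of how a frequency table can be built from
Minimum, Width and Groups (hypothetical function name).
The half-open groups [Minimum + i*Width,
Minimum + (i+1)*Width) are an assumption; the program's
handling of a value that falls exactly on a boundary is
not documented here.

```python
import math


def frequency_table(data, minimum, width, groups):
    """Count items per group; group i covers [minimum + i*width, minimum + (i+1)*width)."""
    counts = [0] * groups
    outside = 0                           # items below or beyond all groups
    for v in data:
        i = math.floor((v - minimum) / width)
        if 0 <= i < groups:
            counts[i] += 1
        else:
            outside += 1
    return counts, outside
```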
Instead of the graphical Histogram, you may wish to see
the underlying Frequency table.
The data can also be described by Quantiles, whose
number is set by the Groups value. For instance,
requesting Quantiles when Groups=4 produces quartiles,
and Groups=100 produces centiles. The second quartile
and the fiftieth centile are the same as the median, of
course.
The Box and Whisker plot is a graphical method for
seeing the data divided by quartiles.
Requesting Statistics displays a list of standard
summary statistics.
In addition to the number of items, you will see the
quartiles, min and max values, range, midrange, mean,
standard deviation, and mean deviation.
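Most of these statistics are available from Python's
standard statistics module, as in the sketch below
(hypothetical function name). Two caveats: there are
several competing definitions of quantiles
(statistics.quantiles defaults to its "exclusive"
method), and this document does not say whether the
standard deviation divides by n or n-1, so both are
shown. The mean deviation is taken here to be the mean
absolute deviation from the mean.

```python
import statistics


def describe(data, groups=4):
    """Summary statistics similar to the Statistics list (a sketch)."""
    n = len(data)
    mean = statistics.fmean(data)
    lo, hi = min(data), max(data)
    return {
        "n": n,
        "quantiles": statistics.quantiles(data, n=groups),  # 'exclusive' method
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "midrange": (lo + hi) / 2,
        "mean": mean,
        "sd (divide by n)": statistics.pstdev(data),
        "sd (divide by n-1)": statistics.stdev(data),
        "mean deviation": sum(abs(x - mean) for x in data) / n,
    }
```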
For purposes of comparison, you can Overlay a Normal
distribution on the histogram; the overlay has the same
mean, standard deviation, and total weight (area) as
the data.
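A histogram of counts has total area n*Width, so
matching the total weight means scaling the normal
density by that factor. A sketch of the curve height
(hypothetical function name):

```python
import math


def normal_overlay(x, n, width, mean, sd):
    """Height at x of a normal curve whose area equals the histogram's, n * width."""
    z = (x - mean) / sd
    density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return n * width * density
```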
Use Ctrl-P to print the screen graphics. Press P for a
numerical printout of the frequency table and quantiles
for the specified number of groups.
SIMULATION
Rolling Dice: You can change the Number of faces and
the number of dice Rolled with each toss. You can also
select a Statistic to be tallied - sum of dice, low
value, high value, number of different values, and
product of values (if too many dice are rolled,
however, this statistic will produce meaningless data).
As the trials proceed, the value of the
statistic is kept up to date, as well as its range (low
and high) and average. The tabulations are reset to
zero if the dice data is changed (number of faces or
number of dice). You can proceed One trial at a time,
in which case the results are displayed in the right
window, or else you can press M for many trials, in
which case you must stop the trials with Escape. While
the trials are running, pressing S will switch between
an active display and a silent display; the latter runs
more rapidly, of course. You can ask for a specific
number of Trials. The Frequency table displays how many
times each value of the statistic has occurred. If you
press W, you will put the program into one of two
Waiting modes. When One turn is requested now, the
program runs as many trials as it takes for the
statistic (sum, product, etc) to achieve a displayed
objective - either matching a given Value (mode 1) or
attaining all possible values (mode 2). It is possible
to set an improbable (or impossible) goal, in which case
Escape will be needed to halt the search. The variable
that is now displayed (low, average, high) is the
Waiting Time - how many trials are needed to accomplish
the goal. Press W to change modes or to return to the
non-waiting mode.
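The dice trials and the two Waiting modes can be
sketched in Python. The function names are hypothetical,
Python's random module stands in for whatever generator
PEANUT uses, and the max_trials cap plays the role of
pressing Escape. For Waiting mode 2 the set of possible
values must be supplied (for the sum of two six-sided
dice it is set(range(2, 13))). Python integers have
unlimited size, so the product statistic here does not
suffer from the problem mentioned above.

```python
import math
import random
from collections import Counter

STATISTICS = {
    "sum": sum,
    "low": min,
    "high": max,
    "different": lambda r: len(set(r)),
    "product": math.prod,
}


def roll(rng, faces, dice):
    """One toss: `dice` independent values, each uniform on 1..faces."""
    return [rng.randint(1, faces) for _ in range(dice)]


def run_trials(rng, faces, dice, stat, trials):
    """Tally the statistic over many trials: frequencies, low, high, average."""
    freq = Counter(stat(roll(rng, faces, dice)) for _ in range(trials))
    average = sum(v * c for v, c in freq.items()) / trials
    return freq, min(freq), max(freq), average


def wait_for_value(rng, faces, dice, stat, value, max_trials=10**6):
    """Waiting mode 1: trials until the statistic equals `value`."""
    for t in range(1, max_trials + 1):
        if stat(roll(rng, faces, dice)) == value:
            return t
    return None                           # gave up, like pressing Escape


def wait_for_all(rng, faces, dice, stat, possible, max_trials=10**6):
    """Waiting mode 2: trials until every value in `possible` has occurred."""
    seen = set()
    for t in range(1, max_trials + 1):
        seen.add(stat(roll(rng, faces, dice)))
        if possible <= seen:
            return t
    return None
```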
The data produced by simulation can be put on file and
examined using the data analysis subprograms. Press
Alt-V to list the variables you wish to save; each is
specified by a single letter: A for average, L for low,
H for high, R for number of dice per Roll, N or F for
the Number of Faces, T for the number of Trials, etc
(other simulations augment this list). At any time, you
can enter a data point (which is actually an n-tuple of
values) by pressing Alt-New. Begin a new list of data
points by pressing Alt-B. Save the entire list (as a
matrix in which each column represents a variable) by
pressing Alt-S and providing a file name.
You can play with doctored dice: Press Ctrl-F to bring
the Funny dice into play (or to switch them back out).
You will also have to press Ctrl-A to Alter the pips on
the faces; initially, they are standard - one pip on
face number 1, two pips on face number 2, etc.
Dealing Cards: This is essentially the same as the
rolling dice simulation, EXCEPT that the cards are not
replaced in the deck, so they are not independent of
one another the way dice are. In other words, each trial
consists of removing a specified number of cards from
the deck (before the next trial, they are of course put
back). Dealing just 1 card per trial is equivalent to
rolling one die. For data collection, variable D
replaces R.
You can play with doctored cards: Press Ctrl-F to bring
the Funny deck into play (or to switch it back out).
You will also have to press Ctrl-A to Alter the values
on the cards; initially, they are standard - card number
1 is an ace, card number 2 is a deuce, etc.
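In Python terms, dealing without replacement corresponds
to random.sample, while rolling dice corresponds to
independent draws. A sketch with hypothetical names and a
hypothetical 13-card deck; the list holds the card
values, so a Funny deck is just a different list.

```python
import random


def deal(rng, deck, cards):
    """One trial: `cards` distinct cards dealt from `deck` (no replacement)."""
    return rng.sample(deck, cards)


def roll_like_dice(rng, deck, count):
    """For contrast: independent draws, each from the full deck (with replacement)."""
    return [rng.choice(deck) for _ in range(count)]


standard = list(range(1, 14))  # card values 1..13 (ace = 1); edit for a Funny deck
```

Dealing all 13 cards always yields 13 different values,
while 13 independent draws almost always repeat some.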
Throwing Darts: The principal parameters in this menu
are the Radius of the bullseye and the Side length of
the containing square (R and S in the data collection
list). A single trial consists of a random throw at the
square; a success (hit) occurs when the throw lies
inside the circular target; the statistic has two
values, 1 for hit and 0 for miss; the average over many
trials is therefore just the fraction of throws that
are hits. The throws appear as dots on the screen. In
the Waiting mode, you are waiting for a hit.
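A Monte Carlo sketch of the dart game (hypothetical
names), assuming the bullseye is centered in the square
and fits inside it (2R <= S). Under those assumptions the
long-run average approaches the area ratio pi*R^2/S^2.

```python
import math
import random


def throw_darts(rng, radius, side, trials):
    """Fraction of random throws that land within `radius` of the square's center."""
    hits = 0
    for _ in range(trials):
        x = rng.uniform(-side / 2, side / 2)   # uniform point in the square
        y = rng.uniform(-side / 2, side / 2)
        if x * x + y * y <= radius * radius:   # inside the circular target
            hits += 1
    return hits / trials


def exact_hit_rate(radius, side):
    """Area ratio pi*R^2/S^2, valid when the target fits inside the square."""
    return math.pi * radius ** 2 / side ** 2
```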
Pitching Pennies: The principal parameters are the coin
Radius and the Side length of a single square cell (R
and S in the data collection list). A trial is
classified as a hit if the coin lands entirely within a
square, a miss otherwise; the statistic takes on only
the values 0 and 1, as in throwing darts. When the
trials are examined One at a time, they are displayed as
circles on the screen. In the Waiting mode, you are
waiting for a hit.
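A coin lands entirely inside a square exactly when its
center is at least R from every edge of the cell, so for
2R <= S the hit probability is ((S - 2R)/S)^2. A sketch
that checks this by simulation, assuming the coin's
center is uniformly distributed over one cell (function
names hypothetical):

```python
import random


def pitch_pennies(rng, radius, side, trials):
    """Fraction of coins whose center is at least `radius` from every cell edge."""
    hits = 0
    for _ in range(trials):
        x, y = rng.uniform(0, side), rng.uniform(0, side)  # center within one cell
        if radius <= x <= side - radius and radius <= y <= side - radius:
            hits += 1
    return hits / trials


def exact_penny_rate(radius, side):
    """((S - 2R)/S)^2, or 0 when the coin is wider than a cell."""
    return max(0.0, (side - 2 * radius) / side) ** 2
```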
PROBABILITY
When a frequency distribution is plotted, only X-values
for which the probability is large enough to be seen are
actually plotted. This is why the displayed MinX and
MaxX values do not always correspond to the tabled Low
and High values for X. The graphing scale is chosen so
that the modal frequency will fill the available space.
Some of the routines have difficulty when large
parameter values are input. This is usually because the
program is being asked to deal with quantities that are
very large and quantities that are very small in the
same routine. If the total probability for a given
distribution does not add up to 100%, that is a clear
signal that the data is suspect.
Binomial Menu: The parameters are N = number of trials
and p = probability of success on an individual trial.
The variable X is the total number of successes.
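The binomial probabilities are
P(X = x) = C(N, x) p^x (1 - p)^(N - x) for x = 0..N. A
direct sketch (hypothetical function name). For very
large N the binomial coefficients exceed the
floating-point range and Python raises OverflowError,
one concrete form of the large-parameter difficulty
mentioned above.

```python
from math import comb


def binomial_pmf(n, p):
    """List of P(X = x) for x = 0..n."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
```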
Hypergeometric Menu: We are sampling from a two-stratum
population, whose types are called Red and Blue. The
parameters are R = number of reds, B = number of blues,
and S = size of the sample. The variable X is the
number of reds found in the sample.
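Here P(X = x) = C(R, x) C(B, S - x) / C(R + B, S), where
x runs from max(0, S - B) to min(R, S). A sketch using
exact integer binomial coefficients (hypothetical
function name):

```python
from math import comb


def hypergeometric_pmf(reds, blues, sample):
    """Dict of P(X = x), X = number of reds in a sample drawn without replacement."""
    total = comb(reds + blues, sample)
    low, high = max(0, sample - blues), min(reds, sample)
    return {x: comb(reds, x) * comb(blues, sample - x) / total
            for x in range(low, high + 1)}
```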
Dice Sum Menu: We toss N dice, each with faces numbered
1..k (equally likely, of course). X is the sum of the N
values. When N=1, the distribution of X is uniform;
when N=2, it is triangular. As N increases, the
distribution approaches normality.
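The distribution of the sum can be built one die at a
time by convolution, which makes the uniform (N=1) and
triangular (N=2) shapes easy to check. A sketch with a
hypothetical function name:

```python
def dice_sum_pmf(n, k):
    """Dict of P(sum = s) for n fair dice with faces 1..k, built by convolution."""
    dist = {0: 1.0}                       # zero dice: sum is 0 with certainty
    for _ in range(n):
        new = {}
        for s, p in dist.items():
            for face in range(1, k + 1):
                new[s + face] = new.get(s + face, 0.0) + p / k
        dist = new
    return dist
```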
Normal Menu: Given a normal variable X, the probability
that X lies between Lo and Hi is calculated. The Mean
and the Sigma (standard deviation) of the distribution
are adjustable parameters.
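With the standard normal CDF written in terms of the
error function, Phi(z) = (1 + erf(z/sqrt(2)))/2, the
required probability is Phi((Hi - Mean)/Sigma) minus
Phi((Lo - Mean)/Sigma). A sketch (hypothetical function
name):

```python
import math


def normal_between(lo, hi, mean=0.0, sigma=1.0):
    """P(lo < X < hi) for X normal with the given mean and standard deviation."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))
    return cdf(hi) - cdf(lo)
```

For instance, normal_between(90, 110, 100, 10) is about
0.6827, the familiar one-sigma probability.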
Dice Match Menu: We toss N dice, each with faces
numbered 1..k (equally likely). Let X be the number of
different values among the N obtained. The probability
of finding a repetition (that is, of X < N) is displayed
in the menu. This routine runs into accuracy problems
when N and k get large. k=365 corresponds to the
classical birthday problem.
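One way to compute the Dice Match distribution without
large factorials is a recurrence on the number of
different values seen so far: after each roll, the count
stays at j with probability j/k or rises to j+1 with
probability (k - j)/k. A sketch with hypothetical names;
the recurrence is an illustration, not necessarily
PEANUT's method.

```python
def distinct_values_pmf(n, k):
    """List of P(X = j), j = 0..n: X = number of different values in n rolls."""
    dist = [1.0] + [0.0] * n              # before any roll: 0 values seen
    for _ in range(n):
        new = [0.0] * (n + 1)
        for j, p in enumerate(dist):
            if p:
                new[j] += p * j / k                # repeats a value already seen
                if j < n:
                    new[j + 1] += p * (k - j) / k  # shows a new value
        dist = new
    return dist


def repetition_probability(n, k):
    """P(some value occurs more than once) = P(X < n)."""
    return 1 - distinct_values_pmf(n, k)[n]
```

With N = 23 and k = 365 this gives about 0.507, the
classical birthday-problem answer.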
Card Match Menu: A deck of N cards is shuffled. Let X
be the number of cards that are found in their starting
positions. The expected value of X is 1, regardless of
the deck size. The probability that X=0 is virtually
constant (about 1/e, or 0.368) once the deck size is ten
or more.
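The classical derangement formula gives
P(X = j) = (1/j!) * sum over i = 0..N-j of (-1)^i / i!,
which shows why P(X = 0) settles near 1/e so quickly: the
sum is a truncated series for e^(-1). A sketch
(hypothetical function name):

```python
import math


def card_match_pmf(n):
    """List of P(X = j), j = 0..n: X = cards left in place by a shuffle."""
    return [sum((-1) ** i / math.factorial(i) for i in range(n - j + 1))
            / math.factorial(j)
            for j in range(n + 1)]
```

For a four-card deck, P(X = 0) = 9/24 = 0.375.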
First Ace Menu: A deck of N cards that contains A aces
is shuffled and dealt; let X be the position of the
first ace in the deal.
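The first ace is at position x when the first x - 1
cards are not aces, card x is an ace, and the other
A - 1 aces lie among the N - x later cards, so
P(X = x) = C(N - x, A - 1) / C(N, A) for x = 1..N-A+1.
The mean works out to (N + 1)/(A + 1). A sketch
(hypothetical function name):

```python
from math import comb


def first_ace_pmf(n, aces):
    """Dict of P(X = x): X = position of the first ace in a shuffled deck."""
    total = comb(n, aces)
    return {x: comb(n - x, aces - 1) / total for x in range(1, n - aces + 2)}
```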
First Binomial Success Menu: A binomial experiment is
repeated until the first success occurs; let X be the
number of trials necessary. The probability of an
individual success is of course p. The expected value
of X is 1/p.
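This is the geometric distribution,
P(X = x) = (1 - p)^(x - 1) p for x = 1, 2, .... A sketch
(hypothetical function name); a truncated sum confirms
the mean 1/p numerically.

```python
def first_success_pmf(p, x):
    """P(first success on trial x) = (1 - p)**(x - 1) * p, for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p
```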
Complete Set Menu: There are k equally likely different
prizes available, one in each box of cereal. Let X be
the number of boxes that must be bought, in order to
obtain at least one of each prize; the possible values
of X are k, k+1, ... . In other words, we roll a
k-sided die until every face has turned up at least once;
X is the number of rolls required.
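By inclusion-exclusion, the probability that all k
faces appear within x rolls is
P(X <= x) = sum over i = 0..k of (-1)^i C(k, i) (1 - i/k)^x,
and P(X = x) = P(X <= x) - P(X <= x - 1). The expected
value is k(1 + 1/2 + ... + 1/k). The same sum answers the
first Miscellaneous question below. A sketch with
hypothetical names, using exact fractions to avoid
cancellation between large alternating terms:

```python
from fractions import Fraction
from math import comb


def all_faces_cdf(k, x):
    """P(every face of a k-sided die appears in x rolls) = P(X <= x), exactly."""
    onto = sum((-1) ** i * comb(k, i) * (k - i) ** x for i in range(k + 1))
    return Fraction(onto, k ** x)


def complete_set_pmf(k, x):
    """P(X = x): the last missing face first appears on roll x."""
    return all_faces_cdf(k, x) - all_faces_cdf(k, x - 1)
```

For a six-sided die the expected number of rolls is
6(1 + 1/2 + ... + 1/6) = 14.7.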
First Dice Match Menu: A k-sided die is rolled until
some face has occurred for the second time. Let X be
the number of rolls necessary for this to happen; the
possible values of X are 2, 3, ..., k+1. If k=365, we have
another birthday variable.
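Here P(X > x) is the chance that the first x rolls are
all different, the product (k/k)((k-1)/k)...((k-x+1)/k),
and P(X = x) is that chance for x - 1 rolls times
(x - 1)/k, the probability that roll x repeats one of
them. A sketch (hypothetical function name):

```python
def first_match_pmf(k):
    """Dict of P(X = x), x = 2..k+1: X = roll on which a face first repeats."""
    pmf = {}
    no_repeat = 1.0                       # P(first x-1 rolls all different)
    for x in range(2, k + 2):
        no_repeat *= (k - (x - 2)) / k    # roll x-1 showed a new face
        pmf[x] = no_repeat * (x - 1) / k  # roll x repeats one of x-1 faces
    return pmf
```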
There are a couple of Miscellaneous menus, which do not
display probability distributions, but do give answers
to probability questions:
If a k-sided die is rolled X times (k <= X), what is the
probability that every face will have appeared at least
once? This is the cumulative distribution function for
the Complete Set distribution described above.
If a k-sided die is rolled X times, what is the
probability that some face will appear at least M
times? Answers are given for M=2 and M=3. Accuracy
problems occur in extreme examples.
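The second question can be answered exactly by counting
the roll sequences in which every face appears at most
M - 1 times, adding one face at a time: a face used j
times among n rolls can occupy C(n, j) sets of
positions. A sketch with exact integer arithmetic
(hypothetical function name), which sidesteps the
accuracy problems at the cost of speed for very large
inputs:

```python
from fractions import Fraction
from math import comb


def some_face_at_least(k, rolls, m):
    """P(some face of a k-sided die appears at least m times in `rolls` rolls)."""
    # ways[n]: number of ways to fill n labelled roll positions using the
    # faces processed so far, with each face used at most m-1 times.
    ways = [1] + [0] * rolls
    for _ in range(k):
        ways = [sum(ways[n - j] * comb(n, j) for j in range(min(m - 1, n) + 1))
                for n in range(rolls + 1)]
    return 1 - Fraction(ways[rolls], k ** rolls)
```

With k = 365 and M = 2 this reproduces the birthday
value (about 0.507 for 23 rolls); for M = 3 the
probability first exceeds one half at 88 rolls.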
The following special Ctrl-Keys apply to the current
probability distribution; if the example has not yet
been calculated, there is a pause while this is done
first:
Ctrl-D: displays a bar graph on the screen.
Ctrl-T: displays the table of values on the screen.
Ctrl-H: prints the table of values.
GENERAL INFORMATION
PEANUT software should run on all IBM compatibles. It
is only necessary that the appropriate graphics
interface file be present. If these programs are
copied, it is therefore important that the appropriate
file *.BGI be copied, too. It is advisable but not
necessary that the *.BGI files be in the same directory
as the program file *.EXE; if the program cannot find
the desired *.BGI file there, it will search the root
directory and the parent directory before giving up.
The programs automatically try to select the finest
graphics mode; to override the default selection, press
Ctrl-G (this will be necessary with the AT&T 6300, for
example). The programs are compiled with version 7.0 of
Borland Pascal. If the host computer has a numeric
coprocessor (such as an 8087 chip), these
programs will try to take advantage of it. All of the
programs have associated documentation files *.DOC; you
are reading part of one now! These files can be edited
or printed with your word processor.
Interaction with the computer takes two forms: Either
the user is making menu selections or else the user is
providing buffered input (that is, numbers or names).
In the former case, no ENTER is required - touching a
single key (perhaps in combination with the Ctrl key)
does the job. In the latter case, however, the computer
has to be told when the input is complete, and this
requires ENTER as a signal. When the computer is
waiting for this type of input, a box will open up on
the screen, into which the necessary information is to
be typed. One may edit the data in the box, using the
left and right arrow keys to move the cursor. If the
first keypress of an editing session is not an editing
keypress (an arrow, say), the input box is emptied.
There are a few standard two-key combinations. For
example, Ctrl-E erases the graphics window, Ctrl-P is
for printing, Ctrl-W gets the window reset menu, Ctrl-F
gets function library menus, and Ctrl-END ends programs.
Other Ctrl-keys are described below. Alt-C allows the
user to assign new values to the twenty-six variables
A..Z. Pressing the desired letter displays the current
value of that letter, and pressing = activates the input
process. Alt-F toggles between fixed point and floating
point display formats (see below). Alt-M calls up a
list of memory data. In each program, Ctrl-K calls up a
menu of active Ctrl-Key combinations, and Alt-K calls up
a menu of active Alt-Key combinations. These keys are
usually not mentioned elsewhere in the menus.
Whenever the program is in a scrolling mode (the arrow
keypad used to examine a text or a table), one can
request a search by pressing ENTER. The program finds
the first instance of the string you enter, and places
it in the window, usually on the top display line. The
search is not case-sensitive. To scroll through THIS
file, together with a program-specific help file, just
press ?. The necessary *.DOC files must be found in the
current directory.
The function interpreter built into the programs has
been taught to understand most elementary function names
(sin, cos, tan, csc, sec, cot, ln, log, exp, sinh, cosh,
tanh, arcsin, arccos, arctan, int, sqr = square root,
abs, and !) as well as some unconventional ones:
root(n,x) = nth root of x; pow(n,x) = nth power of x;
iter(n,f(x)) = n-fold iteration of f(x); max(a,b,..);
min(a,b,..); sgn(x) = x/abs(x); frac(x) = x-int(x);
binom(n,r) = n!/r!/(n-r)!; join(f|c,g|d,...,h) =
function defined by y=f(x) for x<=c, y=g(x) for c