Release notes for StatCrunch mid-March 2015 update

A major StatCrunch update was made on March 18, 2015. This document describes the content of the update, including major additions to StatCrunch as well as other minor fixes and enhancements.

Major additions:
- Markers and Dividers are new tools for adding information to and extracting information from StatCrunch graphs. See page 2 for details.
- New options for editing graph characteristics have been made available. See page 3 for more information.
- Data sorting and other data manipulation procedures can now produce results in a new browser tab. See page 5 for details.
- StatCrunch now allows you to easily create a new data set from a subset of an existing data set. See page 4 for details.
- Multiple expressions can now be easily computed at one time. See page 6 for more information.
- Data from the popular Google Drive cloud platform can now be loaded directly into StatCrunch. See page 7 for details.

Minor fixes and enhancements:
- For the frequency table procedure, a new cumulative percent of total option has been added to the list of statistics that can be computed.
- The QQ plot dialog now offers the ability to place the normal quantiles on the y axis.
- Polynomial regression now provides an option to graph Residuals vs. X values.
- A Safari issue when doing simple linear regression with transformations has been fixed.
- The Sampling Distributions applet has been corrected to allow for the proper upper limit when using a uniform distribution.
- Three new functions are now available for use with StatCrunch expressions:
    shuffle(x) shuffles a vector of values and returns a new vector.
    sample(x,n,flag) returns a vector of n values sampled from the vector x. Sampling is done with replacement if flag is true.
    elementat(x,n) returns the value in the nth position in a vector x.

Next planned release:
A minor release is anticipated in mid-April 2015. Plans for this release include some small additions to applets and calculators along with color scheme enhancements.

1
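For readers who want to reason about the three new expression functions (shuffle, sample and elementat) outside of StatCrunch, the following Python sketch mimics the behavior described in the list of minor enhancements. It is not StatCrunch code. It assumes that positions in elementat are counted from 1 and that sampling without replacement requires n to be no larger than the length of x; the release notes state neither point.

```python
import random


def shuffle(x):
    """Return a new list with the values of x in random order; x is not modified."""
    return random.sample(x, len(x))


def sample(x, n, flag):
    """Return n values drawn from x, with replacement when flag is true."""
    if flag:
        return random.choices(x, k=n)
    # Without replacement: Python raises ValueError if n > len(x) (assumed behavior).
    return random.sample(x, n)


def elementat(x, n):
    """Return the value in the nth position of x (assumption: positions start at 1)."""
    return x[n - 1]


values = [3, 1, 4, 1, 5]
print(shuffle(values))          # same values, random order
print(sample(values, 8, True))  # 8 draws with replacement
print(elementat(values, 3))     # third value of the list
```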
Markers and Dividers

Markers and dividers are tools that allow users to overlay information on graphs and to extract more information from them.

A marker is represented by a triangle anchor on a graph's axis with a line that extends from it. The location of the anchor can be calculated based on characteristics of the data (e.g. the mean or the median of the data) or it can be set to a specific numeric value. To use data characteristics, an appropriate expression must be entered where x is used to generically represent the data (e.g. mean(x) and/or median(x)). The color of the anchor and its corresponding line can also be set by the user using a basic color chooser. An unlimited number of custom markers can be added to a graph, with the mean and median provided as default options.

Dividers are a tool that allows data values in a graph to be split into user-specified regions with the count/percent of the data in each region displayed. The regions are defined using two square buttons positioned at the top of the graph with lines that extend below them. Each of the two split points can be set explicitly by double clicking on a divider button or interactively by dragging the button to a specific location. The percentage falling in a particular region can also be specified using a double click on the associated label, which will lead to a repositioning of the divider.

Dividers are designed to work in unison with markers. If a divider button is dragged in the vicinity of a marker location, the value of the divider will snap to the value of the marker. These new tools are considered to be most useful in plots that display distributions in some manner, so they have been added as dialog options for histograms, dot plots, boxplots and scatter plots.

The example below shows how the two tools can be used together in the context of a histogram to illustrate the empirical rule associated with normally distributed data. The options for markers and dividers are shown in the histogram dialog. Using the Add Custom button, two custom purple markers have been specified at the mean plus/minus one standard deviation. Dividers showing percentages have also been added to the graph. The snap feature has been used to position the dividers at the custom marker locations. The resulting percentages show that approximately 68% of the data falls within one standard deviation of the mean.

2
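The arithmetic behind the empirical rule example can be sketched in Python. The block places two marker values at the mean plus/minus one standard deviation and then reports the count and percent of the data in the three regions that two dividers snapped to those markers would define. The simulated data, the function name divider_regions and the handling of values that fall exactly on a divider are assumptions made for illustration; they are not taken from StatCrunch.

```python
import random
import statistics


def divider_regions(data, low, high):
    """Return (count, percent) for values below low, from low to high, and above high."""
    n = len(data)
    counts = [
        sum(1 for v in data if v < low),
        sum(1 for v in data if low <= v <= high),  # boundary handling is an assumption
        sum(1 for v in data if v > high),
    ]
    return [(c, 100.0 * c / n) for c in counts]


# Simulated, roughly normal data (illustrative only).
random.seed(0)
data = [random.gauss(50, 10) for _ in range(10_000)]

# Marker locations: mean minus and plus one sample standard deviation.
m = statistics.mean(data)
s = statistics.stdev(data)
lower_marker, upper_marker = m - s, m + s

# Dividers snapped to the two marker values.
regions = divider_regions(data, lower_marker, upper_marker)
for label, (count, pct) in zip(["below", "within", "above"], regions):
    print(f"{label}: {count} values ({pct:.1f}%)")
```

For normally distributed data, the middle region should hold close to 68% of the values, which is the pattern the histogram example illustrates.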
New options for editing graph characteristics

StatCrunch has long offered the ability to change axis properties in graphics by double clicking on the appropriate region. This somewhat hidden capability has been given a great deal more visibility through the addition of a small icon, shown as three parallel lines, in the bottom left corner of StatCrunch graphs. (Note that this icon is only available on non-mobile browsers. On mobile devices, the options discussed below are available when a user taps a graph one time.) Clicking this icon leads to a menu of options for changing properties of the x axis, y axis and title of the graph, as well as options for adding dividers to the graph when appropriate.

While all StatCrunch graph dialogs allow for the specification of axis labels and titles, these labels are applied to all graphs, which may not be ideal when more than one graph is created. The new menu of options allows the user to override these specifications for individual graphs when multiple graphs are created. The same is true of dividers: users can choose to apply dividers to only a single graph when more than one graph is created.

When editing axis properties for a quantitative axis, the user can change the range of the axis by adjusting the minimum and maximum values. These adjustments allow the user to zoom in if a more compact range is specified or zoom out if a wider range is specified. The user may also override the default tick marks by providing a comma-separated list of values. The user may at any time return to the default values for an axis by clicking the Restore defaults above button.

With this release, one may now also add horizontal lines crossing the y axis at specific points or vertical lines crossing the x axis at specific points. In each case, the user simply supplies a comma-separated list of locations along with a size and a color for the lines. Dividers will also snap to the values of lines when the two are applied to the same axis.

The histogram example below shows how to use the new menu of graph options to add horizontal red lines crossing the y axis at the values of 400 and 600.

3
Make a new data set from a subset of an existing one

StatCrunch has traditionally allowed users to filter the data used in particular procedures using a Where expression. Repeatedly applying the same expression to analyze a subset of a larger data set can become somewhat tiresome. The new Data > Arrange > Subset procedure selects a user-specified subset of rows/columns in the data table and stores that data in newly created columns in the existing data table or in a new data table.

In the example below using the Most Populous U.S. Cities data set, a new data set has been created containing all of the original data columns for the four U.S. cities that grew by at least 10% between 2010 and 2013 and had a population of over 200,000 in 2013.

4
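The row selection described above (growth of at least 10% and a 2013 population over 200,000) can be sketched in Python as a filter that copies matching rows into a new table. The rows, column names and the helper functions pct_change and subset are hypothetical placeholders; they are not the contents of the Most Populous U.S. Cities data set and not StatCrunch syntax.

```python
# Hypothetical rows; names and values are placeholders, not the real data set.
cities = [
    {"City": "City A", "Pop2010": 190_000, "Pop2013": 215_000},
    {"City": "City B", "Pop2010": 800_000, "Pop2013": 820_000},
    {"City": "City C", "Pop2010": 150_000, "Pop2013": 170_000},
    {"City": "City D", "Pop2010": 300_000, "Pop2013": 345_000},
]


def pct_change(row):
    """Percent change in population from 2010 to 2013."""
    return 100.0 * (row["Pop2013"] - row["Pop2010"]) / row["Pop2010"]


def subset(rows, condition):
    """Return a new table holding copies of the rows that satisfy condition."""
    return [dict(r) for r in rows if condition(r)]


# Keep every column for cities that grew by at least 10% and exceed 200,000 in 2013.
fast_growing = subset(
    cities,
    lambda r: pct_change(r) >= 10 and r["Pop2013"] > 200_000,
)

for row in fast_growing:
    print(row["City"], f"{pct_change(row):.1f}%")
```

Because the subset is stored as its own table, later analyses can use it directly instead of repeating the same filter condition each time.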
Open sorted data in a new data table

StatCrunch has historically added sorted data columns to the existing data table so that row associations in any existing graphics will be clearly preserved. This has created a less than ideal sorting environment in situations where the user sorts a large number of columns. To improve on this functionality, the Open in a new data table option has been added to the data sorting procedure. This option is available with a number of procedures used to manipulate data, including:
- Data > Sort
- Data > Recode
- Data > Sample
- Data > Simulate
- Data > Arrange > Split
- Data > Arrange > Stack
- Data > Arrange > Slice
- Data > Arrange > Combine
- Data > Arrange > Transpose

In the example below, all columns in the Most Populous U.S. Cities data set have been sorted by the percentage change in each city's population from 2010 to 2013, with the results opened in a new data table. Note that the new data set is not automatically saved to a user's My Data listing. Use the Data > Save menu option to save the sorted data set to this listing.

5
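The idea of sorting into a new data table while leaving the original table untouched can be sketched in Python. The rows, the PctChange column name and the function sort_into_new_table are hypothetical and serve only to illustrate the behavior described above.

```python
# Hypothetical rows; names and values are placeholders.
rows = [
    {"City": "City A", "PctChange": 2.5},
    {"City": "City B", "PctChange": 13.2},
    {"City": "City C", "PctChange": -0.4},
]


def sort_into_new_table(table, column, descending=False):
    """Return a sorted copy of table; the original row order is left untouched."""
    return sorted((dict(r) for r in table), key=lambda r: r[column], reverse=descending)


sorted_rows = sort_into_new_table(rows, "PctChange", descending=True)

print([r["City"] for r in sorted_rows])  # sorted copy
print([r["City"] for r in rows])         # original order preserved
```

Keeping the original table unchanged preserves the row associations that existing graphics rely on, which is the motivation given for the new option.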
Multiple expressions can now be computed at one time

Traditionally, StatCrunch has allowed users to compute expressions in a one-at-a-time fashion using the Data > Compute Expression menu. This menu has now been broken out into a submenu with the original Expression option and a newly created Multiple Expressions option. The new option provides a way for users to generate and save the results of multiple expressions all in one dialog window.

Users may enter a sequence of name/expression pairs, as shown in the dialog below, which creates data from a regression model. When the corresponding Save option is checked, the results of an expression will be saved to the data table in a new column with the corresponding name. Note that in the example below, the results of the error expression are not saved to the data table. Expressions can be inserted or deleted within the listing using the + and x buttons.

Expressions lower in the listing may refer by name to the results of expressions defined higher in the listing. In the example below, the expression for y refers to the values of x and error defined above it. Names are case sensitive and should be enclosed in double quotes if they contain spaces or special characters, or if they are entirely numeric.

6
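The evaluation order described above, where later expressions refer by name to earlier ones and only checked expressions are saved, can be sketched in Python. The function compute_expressions, the three example expressions and the regression coefficients (intercept 2, slope 3) are hypothetical; the release notes do not give the expressions used in the dialog.

```python
import random


def compute_expressions(expressions, n_rows, seed=0):
    """Evaluate (name, function, save) entries from top to bottom.

    Each function receives the results computed so far, so a later expression
    can refer by name to an earlier one. Only entries with save=True are
    returned as new columns.
    """
    rng = random.Random(seed)
    results = {}
    saved = {}
    for name, func, save in expressions:
        results[name] = func(results, rng, n_rows)
        if save:
            saved[name] = results[name]
    return saved


# Hypothetical regression model: y = 2 + 3*x + error.
expressions = [
    ("x", lambda env, rng, n: [rng.uniform(0, 10) for _ in range(n)], True),
    ("error", lambda env, rng, n: [rng.gauss(0, 1) for _ in range(n)], False),
    ("y", lambda env, rng, n: [2 + 3 * xi + ei
                               for xi, ei in zip(env["x"], env["error"])], True),
]

columns = compute_expressions(expressions, n_rows=5)
print(list(columns))  # names of the saved columns
```

The error values are used to build y but are not returned, which mirrors leaving the Save option unchecked for that expression.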
Load data from Google Drive

StatCrunch now allows users to import data files from a Google Drive account. After the Google Drive option is selected from a user's My StatCrunch or My Data listing, the resulting page, shown below, lays out the options for loading data from a Google Drive account.

First, click on the button labeled Select a file from Google Drive. If the user is not signed in to a Google account, they will be prompted to do so in a new window. The user must then authorize StatCrunch to access their Google Drive files. Once the authorization process is completed, a Google file chooser will appear as shown below. The file chooser has two tabs titled Spreadsheets and Google Drive. The Spreadsheets tab provides a listing of Excel files and Google Spreadsheets. The Google Drive tab lists other types of files, including .txt and .csv files.

After selecting a file, the user may specify other inputs such as the delimiter separating data values in the file (required for text files only) and the Use first line as column names option. The selected file can then be loaded into StatCrunch by clicking the Load File button at the bottom of the page.

7
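The effect of the two text-file inputs mentioned above, the delimiter and the Use first line as column names option, can be sketched in Python with the standard csv module. This does not touch Google Drive or StatCrunch. The function load_delimited, the sample text and the generated names var1, var2, ... are assumptions for illustration.

```python
import csv
import io


def load_delimited(text, delimiter=",", first_line_as_names=True):
    """Split delimited text into (column_names, data_rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if first_line_as_names:
        return rows[0], rows[1:]
    # Placeholder names when the first line holds data (naming scheme is an assumption).
    names = [f"var{i + 1}" for i in range(len(rows[0]))]
    return names, rows


# Hypothetical tab-delimited file contents.
text = "City\tPop2013\nCity A\t215000\nCity D\t345000\n"

names, data = load_delimited(text, delimiter="\t")
print(names)
print(data)
```

Choosing the wrong delimiter leaves each line as a single value, which is why the delimiter must be specified for text files.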