Package ScottKnottESD

Similar documents
Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package balance. October 12, 2018

Package kirby21.base

Package strat. November 23, 2016

Package datasets.load

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package rpst. June 6, 2017

Package MOST. November 9, 2017

Package understandbpmn

Package fusionclust. September 19, 2017

Package ClusterBootstrap

Package GFD. January 4, 2018

Package PCADSC. April 19, 2017

Package ECctmc. May 1, 2018

Package TVsMiss. April 5, 2018

Package nonlinearicp

Package ConvergenceClubs

Package vinereg. August 10, 2018

Package TANDEM. R topics documented: June 15, Type Package

Package frequencyconnectedness

Package PedCNV. February 19, 2015

Package quickreg. R topics documented:

Package mdftracks. February 6, 2017

Package weco. May 4, 2018

Package bisect. April 16, 2018

Package rankdist. April 8, 2018

Package nlnet. April 8, 2018

Package svalues. July 15, 2018

Package milr. June 8, 2017

Package TrafficBDE. March 1, 2018

Package spark. July 21, 2017

Package fastdummies. January 8, 2018

Package PDN. November 4, 2017

Package simr. April 30, 2018

Package gtrendsr. October 19, 2017

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package CuCubes. December 9, 2016

Package calpassapi. August 25, 2018

EECS730: Introduction to Bioinformatics

Package PTE. October 10, 2017

Package ArrayBin. February 19, 2015

Package sfc. August 29, 2016

Package IATScore. January 10, 2018

Package SEMrushR. November 3, 2018

Product Catalog. AcaStat. Software

Package robets. March 6, Type Package

Package omu. August 2, 2018

Package ClustGeo. R topics documented: July 14, Type Package

Package intccr. September 12, 2017

Package optimus. March 24, 2017

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package BASiNET. October 2, 2018

Package clusternomics

Package repec. August 31, 2018

Package MonteCarlo. April 23, 2017

Package condir. R topics documented: February 15, 2017

Package EFS. R topics documented:

Package splithalf. March 17, 2018

Package tidyimpute. March 5, 2018

Package SFtools. June 28, 2017

Unsupervised Learning and Clustering

Package dbx. July 5, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package sigqc. June 13, 2018

Package OLScurve. August 29, 2016

Package epitab. July 4, 2018

Package jstree. October 24, 2017

Package preprosim. July 26, 2016

Package reval. May 26, 2015

Package simsurv. May 18, 2018

Package semisup. March 10, Version Title Semi-Supervised Mixture Model

Package pcr. November 20, 2017

Package CvM2SL2Test. February 15, 2013

Package SSLASSO. August 28, 2018

Package textstem. R topics documented:

Package GWRM. R topics documented: July 31, Type Package

Package messaging. May 27, 2018

Package rprojroot. January 3, Title Finding Files in Project Subdirectories Version 1.3-2

Package qualmap. R topics documented: September 12, Type Package

Package blocksdesign

Package enpls. May 14, 2018

Package censusapi. August 19, 2018

Package egonet. February 19, 2015

Package biotic. April 20, 2016

Package casematch. R topics documented: January 7, Type Package

Package ecoseries. R topics documented: September 27, 2017

Package multdyn. R topics documented: August 24, Version 1.6 Date Title Multiregression Dynamic Models

Package gains. September 12, 2017

Package mgc. April 13, 2018

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)

Package modmarg. R topics documented:

Package MTLR. March 9, 2019

Package deductive. June 2, 2017

Package Mondrian. R topics documented: March 4, Type Package

Package clustering.sc.dp

Package pdfsearch. July 10, 2018

Package fbroc. August 29, 2016

Package subtype. February 20, 2015

Package validara. October 19, 2017

Transcription:

Type Package Package ScottKnottESD May 8, 2018 Title The Scott-Knott Effect Size Difference (ESD) Test Version 2.0.3 Date 2018-05-08 Author Chakkrit Tantithamthavorn Maintainer Chakkrit Tantithamthavorn <kla@chakkrit.com> The Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that leverages a hierarchical clustering to partition the set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with nonnegligible difference [Tantithamthavorn et al., (2018) <doi:10.1109/tse.2018.2794977>]. License GPL (>= 2) Depends reshape2, effsize, stats, car Imports forecast LazyData true URL https://github.com/klainfo/scottknottesd BugReports https://github.com/klainfo/scottknottesd/issues RoxygenNote 6.0.1 NeedsCompilation no Repository CRAN Date/Publication 2018-05-08 07:48:19 UTC R topics documented: ScottKnottESD-package.................................. 2 "check.anova.assumptions"............................... 3 "long2wide"......................................... 4 "normalize"......................................... 5 example........................................... 5 maven............................................ 6 print.sk_esd......................................... 7 sk_esd............................................ 8 1

2 ScottKnottESD-package Index 10 ScottKnottESD-package The Scott-Knott Effect Size Difference (ESD) Test Details The Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that leverages a hierarchical clustering to partition the set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with non-negligible difference [Tantithamthavorn et al., (2018) <doi:10.1109/tse.2018.2794977>]. It is an alternative approach of the Scott-Knott test that considers the magnitude of the difference (i.e., effect size) of treatment means with-in a group and between groups. Therefore, the Scott-Knott ESD test (v2.x) produces the ranking of treatment means while ensuring that (1) the magnitude of the difference for all of the treatments in each group is negligible; and (2) the magnitude of the difference of treatments between groups is non-negligible. The mechanism of the Scott-Knott ESD test (v2.x) is made up of 2 steps: (Step 1) Find a partition that maximizes treatment means between groups. We begin by sorting the treatment means. Then, following the original Scott-Knott test, we compute the sum of squares between groups (i.e., a dispersion measure of data points) to identify a partition that maximizes treatment means between groups. (Step 2) Splitting into two groups or merging into one group. Instead of using a likelihood ratio test and a Chi-square distribution as a splitting and merging criterion (i.e., a hypothesis testing of the equality of all treatment means), we analyze the magnitude of the difference for each pair for all of the treatment means of the two groups. If there is any one pair of treatment means of two groups are non-negligible, we split into two groups. Otherwise, we merge into one group. We use the Cohen effect size an effect size estimate based on the difference between the two means divided by the standard deviation of the two treatment means (d = (mean(x_1) - mean(x_2))/s.d.). Unlike the earlier version of the Scott-Knott ESD test (v1.x) that post-processes the groups that are produced by the Scott-Knott test, the Scott-Knott ESD test (v2.x) pre-processes the groups by merging pairs of statistically distinct groups that have a negligible difference. Package: ScottKnottESD Type: Package Version: 2.0.3 Date: 2017-07-03 License: GPL (>= 2) Author(s) Chakkrit (Kla) Tantithamthavorn

"check.anova.assumptions" 3 Maintainer: Chakkrit (Kla) Tantithamthavorn <kla@chakkrit.com> References Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto, An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. IEEE Transactions on Software Engineering. 43(1): 1-18 (2017). <doi:10.1109/tse.2016.2584050> Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto, The Impact of Automated Parameter Optimization for Defect Prediction Models. IEEE Transactions on Software Engineering. Early Access. (2018). <doi:10.1109/tse.2018.2794977> See Also - Examples library(scottknottesd) sk <- sk_esd(example) plot(sk) sk <- sk_esd(maven) plot(sk) "check.anova.assumptions" Check basic ANOVA assumptions Check the normality assumption of the input dataset using the Kolmogorov-Smirnov Test and the homogeneity of variances assumption of the input dataset using the Levene s test. check.anova.assumptions(x, alpha = 0.05,...) Arguments x Value alpha The significance level.... Optional parameters.

4 "long2wide" Author(s) Chakkrit Tantithamthavorn (kla@chakkrit.com) Examples check.anova.assumptions(example) "long2wide" Convert data from long format to wide format Convert data from long format to wide format long2wide(x,...) Arguments x A long-format data frame.... Optional parameters. Value Author(s) Chakkrit Tantithamthavorn (kla@chakkrit.com) Examples long2wide(melt(example, id.vars=0))

"normalize" 5 "normalize" Normalize non-normal distributions using the Box-Cox Power Transformation Normalize non-normal distributions using the Box-Cox Power Transformation normalize(x,...) Arguments x... Optional parameters. Value Author(s) Chakkrit Tantithamthavorn (kla@chakkrit.com) Examples normalized.data <- normalize(example) example An example dataset of Breiman s variable importance scores A dataset containing software metrics of 1,000 calculation of Breiman s variable importance scores example

6 maven Format A data frame with 1,000 rows and 9 variables: LOC lines of code Components the numbers of components Subsystem the numbers of subsystems Files the numbers of files Commit the numbers of commits Churn the numbers of churns Ownership ownership Authorship authorship Experience developer s experience... Source https://github.com/klainfo/scottknottesd/ maven An example dataset of Breiman s variable importance scores A dataset containing software metrics of 1,000 calculation of Breiman s variable importance scores maven Format A data frame with 1,000 rows and 27 variables: Avg_CloneLineCount An average physical lines of clone siblings of a clone. Avg_CountLineComment An average comment lines in the methods that contain clone siblings of a clone. Avg_Cyclomatic McCabe Cyclomatic complexity of the method that contains the clone. Avg_ImproveCommitCount Number of commits that impact the method containing the clone. Avg_LineAdded Number of lines added into the method that contains the clone. Avg_LineCodeCount Number of source code lines in the method that contains the clone. Avg_MaxNesting Maximum nesting level of control constructs in the method that contains the clone. Avg_NewFeatureCommitCount Number of commits that introduce new feature and that impact the method containing the clone.

print.sk_esd 7 Source Avg_RatioCommentToCode Ratio of CommentLineCount to LineCodeCount. Avg_RatioLineCodeCount Ratio of LineCount to CloneLineCount. Avg_TokenCount Number of tokens in the clone. CloneType Type of clone class to which the clone belongs. Diff_CloneLineCount Number of physical lines in the clone. Diff_CountLineComment Number of comment lines in the method that contains the clone. Diff_Cyclomatic McCabe Cyclomatic complexity of the method that contains the clone. Diff_DeveloperCount Number of distinct developers who modified the method that contains the clone. Diff_Essential Numberical measure of structuredness of the method that contains the clone. Diff_FanIn Number of unique methods that call the method containg the clone. Diff_FanOut Number of unique methods that are called by the method containing the clone. Diff_FixCommitCount Number of commits with a description of fixing bugs and that impact the method containing the clone. Diff_LineCodeDeclCount Number of declarative source code lines in the method that contains the clone. Diff_LineCount Number of lines in the method that contains the clone. Diff_LineDeleted Number of lines deleted from the method that contains the clone. Diff_NewFeatureCommitCount Number of commits that introduce new feature and that impact the method containing the clone. Diff_TokenCount Number of tokens in the clone. Max_DirectoryDistance Number of directories that are traversed from the method containing one sibling to the method containing another sibling of the clone. SiblingCount Number of clone siblings in the clone. https://github.com/klainfo/scottknottesd/ print.sk_esd Print sk_esd objects S3 method to print sk_esd objects. ## S3 method for class 'sk_esd' print(x,...)

8 sk_esd Arguments x A sk_esd object... Optional parameters. Value The sk.esd ranks sk_esd A function to check the magnitude of the difference for all pairs of treatments A function to check the magnitude of the difference for all pairs of treatments An enhancement of the Scott-Knott test (which cluster distributions into statistically distinct ranks) that takes effect size into consideration. checkdifference(ranking, data) sk_esd(x, alpha = 0.05,...) Arguments ranking A ranking that is produced by the Scott-Knott ESD test data a data frame of treatment means x alpha The significance level.... Optional parameters. Value A result of the magnitude of the difference for all pairs of treatments. A sk_esd object. Author(s) Chakkrit Tantithamthavorn (kla@chakkrit.com)

sk_esd 9 Examples sk <- sk_esd(example) checkdifference(sk$groups, example) sk <- sk_esd(example) plot(sk) sk <- sk_esd(maven) plot(sk)

Index Topic ScottKnottESD ScottKnottESD-package, 2 Topic datasets example, 5 maven, 6 checkdifference (sk_esd), 8 example, 5 maven, 6 print.sk_esd, 7 ScottKnottESD (ScottKnottESD-package), 2 ScottKnottESD-package, 2 SK.ESD (sk_esd), 8 sk_esd, 8 10