Recap - helloworld - Have all tried?

Similar documents
Biostatistics 615/815 Implementing Fisher s Exact Test

Biostatistics 615/815 - Lecture 3 C++ Basics & Implementing Fisher s Exact Test

Biostatistics 615/815 - Lecture 3 C++ Basics & Implementing Fisher s Exact Test

Biostatistics 615/815 - Lecture 2 Introduction to C++ Programming

Biostatistics 615/815 - Lecture 2 Introduction to C++ Programming

Biostatistics 615/815 Statistical Computing

Biostatistics 615/815 Lecture 13: R packages, and Matrix Library

Biostatistics 615/815 Lecture 4: User-defined Data Types, Standard Template Library, and Divide and Conquer Algorithms

Biostatistics 615/815 - Statistical Computing Lecture 1 : Introduction to Statistical Computing

Biostatistics 615/815 - Statistical Computing Lecture 1 : Introduction to Statistical Computing

Biostatistics 615/815 Lecture 5: More on STLs, and Divide and Conquer Algorithms

Announcements. Biostatistics 615/815 Lecture 5: More on STLs, and Divide and Conquer Algorithms. Recap on STL : vector.

Biostatistics 615/815 Lecture 6: Linear Sorting Algorithms and Elementary Data Structures

Biostatistics 615/815 Lecture 5: Divide and Conquer Algorithms Sorting Algorithms

Announcements. Biostatistics 615/815 Lecture 8: Hash Tables, and Dynamic Programming. Recap: Example of a linked list

Announcements. CSCI 334: Principles of Programming Languages. Lecture 18: C/C++ Announcements. Announcements. Instructor: Dan Barowy

Biostatistics 615/815 Lecture 19: Multidimensional Optimizations

Announcements. Biostatistics 615/815 Lecture 7: Data Structures. Recap : Radix sort. Recap : Elementary data structures. Another good and bad news

Call-by-Type Functions in C++ Command-Line Arguments in C++ CS 16: Solving Problems with Computers I Lecture #5

Biostatistics 615/815 Lecture 9: Dynamic Programming

Biostatistics 615/815 Lecture 13: Programming with Matrix

More Examples Using Functions and Command-Line Arguments in C++ CS 16: Solving Problems with Computers I Lecture #6

Recap: Hash Tables. Recap : Open hash. Recap: Illustration of ChainedHash. Key features. Key components. Probing strategies.

CS 161 Exam II Winter 2018 FORM 1

Exercise 1.1 Hello world

17 February Given an algorithm, compute its running time in terms of O, Ω, and Θ (if any). Usually the big-oh running time is enough.

Arrays and Pointers (part 2) Be extra careful with pointers!

Biostatistics 615/815 Lecture 20: Expectation-Maximization (EM) Algorithm

Introduction to Programming

Biostatistics 615/815 Lecture 13: Numerical Optimization

Overloading Functions & Command Line Use in C++ CS 16: Solving Problems with Computers I Lecture #6

PROGRAMMING IN C++ CVIČENÍ

Welcome to MCS 360. content expectations. using g++ input and output streams the namespace std. Euclid s algorithm the while and do-while statements

Chapter 3 - Functions

ENEE 150: Intermediate Programming Concepts for Engineers Spring 2018 Handout #27. Midterm #2 Review

INTRODUCTION TO C++ FUNCTIONS. Dept. of Electronic Engineering, NCHU. Original slides are from

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.

Project 1. due date Sunday July 8, 2018, 12:00 noon

Biostatistics 615/815 Lecture 16: Importance sampling Single dimensional optimization

FORM 1 (Please put your name and section number (001/10am or 002/2pm) on the scantron!!!!) CS 161 Exam II: True (A)/False(B) (2 pts each):

Physics 234: Computational Physics

CONTENTS: What Is Programming? How a Computer Works Programming Languages Java Basics. COMP-202 Unit 1: Introduction

Introduction to C++ Introduction to C++ 1

Arrays and Pointers (part 2) Be extra careful with pointers!

Lecture 10: Recursion vs Iteration

Introduction to C ++

Computer Science II Lecture 1 Introduction and Background

CS 376b Computer Vision

GE U111 Engineering Problem Solving & Computation Lecture 6 February 2, 2004

CHAPTER 4 FUNCTIONS. Dr. Shady Yehia Elmashad

CSE548, AMS542: Analysis of Algorithms, Fall 2012 Date: October 16. In-Class Midterm. ( 11:35 AM 12:50 PM : 75 Minutes )

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani Pilani Campus Instruction Division. SECOND SEMESTER Course Handout Part II

More on Arrays CS 16: Solving Problems with Computers I Lecture #13

More Flow Control Functions in C++ CS 16: Solving Problems with Computers I Lecture #4

Biostatistics 615/815 Lecture 23: Using C++ code in R

C++ Tutorial AM 225. Dan Fortunato

ENCM 369 Winter 2019 Lab 6 for the Week of February 25

CSCI-1200 Data Structures Spring 2016 Lecture 6 Pointers & Dynamic Memory

CS3 Midterm 2 Summer 2008

CS3157: Advanced Programming. Outline

Lecture 2. COMP1406/1006 (the Java course) Fall M. Jason Hinek Carleton University

QUIZ Lesson 4. Exercise 4: Write an if statement that assigns the value of x to the variable y if x is in between 1 and 20, otherwise y is unchanged.

Functions in C. Lecture Topics. Lecture materials. Homework. Machine problem. Announcements. ECE 190 Lecture 16 March 9, 2011

C++ Programming Lecture 11 Functions Part I

Programming with Arrays Intro to Pointers CS 16: Solving Problems with Computers I Lecture #11

Biostatistics 615/815 Lecture 23: The Baum-Welch Algorithm Advanced Hidden Markov Models

1 #include <iostream > 3 using std::cout; 4 using std::cin; 5 using std::endl; 7 int main(){ 8 int x=21; 9 int y=22; 10 int z=5; 12 cout << (x/y%z+4);

OpenVL User Manual. Sarang Lakare 1. Jan 15, 2003 Revision :

Class Information ANNOUCEMENTS

the gamedesigninitiative at cornell university Lecture 7 C++ Overview

CSCI-1200 Data Structures Spring 2015 Lecture 2 STL Strings & Vectors

Floating-point numbers. Phys 420/580 Lecture 6

Multiple Choice (Questions 1 14) 28 Points Select all correct answers (multiple correct answers are possible)

CS 103 Lab The Files are *In* the Computer

Bits, Bytes, and Integers Part 2

Programming and Data Structure Solved. MCQs- Part 2

CPSC213/2014W1 Midterm EXTRA Practice

Reviewing all Topics this term

Slide Set 8. for ENCM 339 Fall 2017 Section 01. Steve Norman, PhD, PEng

Sample Problems for Quiz # 2

Q1 Q2 Q3 Q4 Q5 Total 1 * 7 1 * 5 20 * * Final marks Marks First Question

CSE373 Fall 2013, Midterm Examination October 18, 2013

CPSC 2380 Data Structures and Algorithms

Class Information ANNOUCEMENTS

CS 107 Lecture 2: Bits and Bytes (continued)

Programming in C Quick Start! Biostatistics 615 Lecture 4

Pre- Defined Func-ons in C++ Review for Midterm #1

CSCI-1200 Data Structures Fall 2017 Lecture 2 STL Strings & Vectors

CS 222: More File I/O, Bits, Masks, Finite State Problems

Announcements. Assignment 2 due next class in CSIL assignment boxes

ENCM 335 Fall 2018 Lab 2 for the Week of September 24

EE355 Lab 5 - The Files Are *In* the Computer

Your first C and C++ programs

This is CS50. Harvard University Fall Quiz 0 Answer Key

by Marina Cholakyan, Hyduke Noshadi, Sepehr Sahba and Young Cha

PHY4321 Summary Notes

CSCI Wednesdays: 1:25-2:15 Keller Thursdays: 4:00-4:50 Akerman 211

G52CPP C++ Programming Lecture 18

Professor Terje Haukaas University of British Columbia, Vancouver C++ Programming

Advanced Flow Control CS 16: Solving Problems with Computers I Lecture #5

Transcription:

- helloworld - Have all tried? Biostatistics 615/815 Implementing Fisher s Exact Test Hyun Min Kang Januray 13th, 2011 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 1 / 22 Writing helloworldcpp // import input/output handling library std::cout << "Hello, World" << std::endl; // program exits normally Compiling helloworldcpp Install Cygwin (Windows), Xcode (MacOS), or nothing (Linux) Running helloworld user@host:~/$ g++ -o helloworld helloworldcpp user@host:~/$ /helloworld Hello, World Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 2 / 22 - precisionexamplecpp precisionexamplecpp float smallfloat = 1e-8; // a small value float largefloat = 1; Running precisionexample // difference in 8 (>72) decimal figures std::cout << "smallfloat = " << smallfloat << std::endl; smallfloat = smallfloat + largefloat; smallfloat = smallfloat - largefloat; std::cout << "smallfloat = " << smallfloat << std::endl; // similar thing happens for doubles (eg 1e-20 vs 1) user@host:~/$ /precisionexample smallfloat = 1e-08 smallfloat = 0 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 3 / 22 - Handling command line arguments echocpp - echoes command line arguments to the standard output for(int i=1; i < argc; ++i) { // i=1 : 2nd argument (skip program name) if ( i > 1 ) // print blank if there is an item already printed std::cout << " "; std::cout << argv[i]; Compiling and running echocpp // print each command line argument std::cout << std::endl; // print end-of-line at the end user@host:~/$ g++ -o echo echocpp user@host:~/$ /echo you need to try this out by yourself! you need to try this out by yourself! Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 4 / 22

Annoucements Let s implement Fisher s exact Test 815 Projects will be annouced in the next lecture Midterm date will be scheduled on March 10 - any objections? Final exame date will be April 21st, 10:30am-12:30pm, just like the official schedule Homework #1 will be annouced today Input - A 2 2 table Placebo Treatment Total Diseased a b a+b Cured c d c+d Total a+c b+d n Desired Program Interface and Results user@host:~/$ /fishersexacttest 1 2 3 0 Two-sided p-value is 04 user@host:~/$ /fishersexacttest 2 7 8 2 Two-sided p-value is 00230141 user@host:~/$ /fishersexacttest 20 70 80 20 Two-sided p-value is 590393e-16 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 5 / 22 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 6 / 22 Simple Math under Fisher s Exact Test Possible 2 2 tables Hypergeometric distribution Placebo Treatment Total Diseased x a+b-x a+b Cured a+c-x d-a+x c+d Total a+c b+d n Given a + b, c + d, a + c, b + d and n = a + b + c + d, Pr(x) = (a + b)!(c + d)!(a + c)!(b + d)! x!(a + b x)!(a + c x)!(d a + x)!n! Fishers s Exact Test (2-sided) p FET (a, b, c, d) = x Pr(x)I[Pr(x) Pr(a)] Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 7 / 22 cpp - main() function double hypergeometricprob(int a, int b, int c, int d); // defined later // read input arguments int a = atoi(argv[1]), b = atoi(argv[2]), c = atoi(argv[3]), d = atoi(argv[4]); int n = a + b + c + d; // find cutoff probability double pcutoff = hypergeometricprob(a,b,c,d); double pvalue = 0; // sum over probability smaller than the cutoff for(int x=0; x <= n; ++x) { // among all possible x if ( a+b-x >= 0 && a+c-x >= 0 && d-a+x >=0 ) { // consider valid x double p = hypergeometricprob(x,a+b-x,a+c-x,d-a+x); if ( p <= pcutoff ) pvalue += p; std::cout << "Two-sided p-value is " << pvalue << std::endl; Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 8 / 22

cpp hypergeometricprob() function int fac(int n) { // calculates factorial int ret; for(ret=1; n > 0; --n) { ret *= n; double hypergeometricprob(int a, int b, int c, int d) { int num = fac(a+b) * fac(c+d) * fac(a+c) * fac(b+d); int den = fac(a) * fac(b) * fac(c) * fac(d) * fac(a+b+c+d); return (double)num/(double)den; user@host:~/$ / 1 2 3 0 Two-sided p-value is 04 // correct user@host:~/$ / 2 7 8 2 Two-sided p-value is 441018 // INCORRECT Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 9 / 22 Considering Precision Carefully factorialcpp int fac(int n) { // calculates factorial int ret; for(ret=1; n > 0; --n) { ret *= n; int n = atoi(argv[1]); std::cout << n << "! = " << fac(n) << std::endl; user@host:~/$ /factorial 10 10! = 362880 // correct user@host:~/$ /factorial 12 12! = 479001600 // correct user@host:~/$ /factorial 13 13! = 1932053504 // INCORRECT : 13! > INT_MAX == 2147483647 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 10 / 22 cpp new hypergeometricprob() function double fac(int n) { // main() function remains the same double ret; // use double instead of int for(ret=1; n > 0; --n) { ret *= n; double hypergeometricprob(int a, int b, int c, int d) { double num = fac(a+b) * fac(c+d) * fac(a+c) * fac(b+d); double den = fac(a) * fac(b) * fac(c) * fac(d) * fac(a+b+c+d); return num/den; // use double to calculate factorials user@host:~/$ / 2 7 8 2 Two-sided p-value is 0023041 user@host:~/$ / 20 70 80 20 Two-sided p-value is 0 (fac(190) > 1e308 - beyond double precision) Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 11 / 22 How to perform Fisher s exact test with large values Problem - Limited Precision int handles only up to fac(12) double handles only up to fac(170) Solution - Calculate in logarithmic scale log Pr(x) = log(a + b)! + log(c + d)! + log(a + c)! + log(b + d)! log x! log(a + b x)! log(a + c x)! log(d a + x)! log n! [ ] log(p FET ) = log Pr(x)I(Pr(x) Pr(a)) x [ ] = log Pr(a) + log exp(log Pr(x) log Pr(a))I(log Pr(x) log Pr(a)) x Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 12 / 22

cpp - main() function #include <cmath> // for calculating log() and exp() double loghypergeometricprob(int a, int b, int c, int d); // defined later int a = atoi(argv[1]), b = atoi(argv[2]), c = atoi(argv[3]), d = atoi(argv[4]); int n = a + b + c + d; double logpcutoff = loghypergeometricprob(a,b,c,d); double pfraction = 0; for(int x=0; x <= n; ++x) { // among all possible x if ( a+b-x >= 0 && a+c-x >= 0 && d-a+x >=0 ) { // consider valid x double l = loghypergeometricprob(x,a+b-x,a+c-x,d-a+x); if ( l <= logpcutoff ) pfraction += exp(l - logpcutoff); double logpvalue = logpcutoff + log(pfraction); std::cout << "Two-sided log10-p-value is " << logpvalue/log(10) << std::endl; std::cout << "Two-sided p-value is " << exp(logpvalue) << std::endl; Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 13 / 22 Filling the rest loghypergeometricprob() double logfac(int n) { double ret; for(ret=0; n > 0; --n) { ret += log((double)n); double loghypergeometricprob(int a, int b, int c, int d) { return logfac(a+b) + logfac(c+d) + logfac(a+c) + logfac(b+d) - logfac(a) - logfac(b) - logfac(c) - logfac(d) - logfac(a+b+c+d); user@host:~/$ / 2 7 8 2 Two-sided log10-p-value is -163801, p-value is 00230141 user@host:~/$ / 20 70 80 20 Two-sided log10-p-value is -152289, p-value is 590393e-16 user@host:~/$ / 200 700 800 200 Two-sided log10-p-value is -147563, p-value is 273559e-148 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 14 / 22 Even faster Computational speed for large dataset time / 2000 7000 8000 2000 Two-sided log10-p-value is -146613, p-value is 0 real 0m42614s time / 2000 7000 8000 2000 Two-sided log10-p-value is -146613, p-value is 0 real 0m0007s How to make it faster? Most time consuming part is the repetitive computation of factorial # of loghypergeometricprobs calls is a + b + c + d = n # of logfac call 9n # of log calls 9n 2 - could be billions in the example above Key Idea is to store logfac values to avoid repetitive computation Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 15 / 22 cpp - main() function // everything remains the same except for lines marked with *** #include <cmath> double loghypergeometricprob(double* logfacs, int a, int b, int c, int d); // *** void initlogfacs(double* logfacs, int n); // *** New function *** int a = atoi(argv[1]), b = atoi(argv[2]), c = atoi(argv[3]), d = atoi(argv[4]); int n = a + b + c + d; double* logfacs = new double[n+1]; // *** dynamically allocate memory logfacs[0n] *** initlogfacs(logfacs, n); // *** initialize logfacs array *** double logpcutoff = loghypergeometricprob(logfacs,a,b,c,d); // *** logfacs added double pfraction = 0; for(int x=0; x <= n; ++x) { if ( a+b-x >= 0 && a+c-x >= 0 && d-a+x >=0 ) { double l = loghypergeometricprob(x,a+b-x,a+c-x,d-a+x); if ( l <= logpcutoff ) pfraction += exp(l - logpcutoff); double logpvalue = logpcutoff + log(pfraction); std::cout << "Two-sided log10-p-value is " << logpvalue/log(10) << std::endl; std::cout << "Two-sided p-value is " << exp(logpvalue) << std::endl; Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 16 / 22

cpp - other functions so far function initlogfacs() void initlogfacs(double* logfacs, int n) { logfacs[0] = 0; for(int i=1; i < n+1; ++i) { logfacs[i] = logfacs[i-1] + log((double)i); // only n times of log() calls function loghypergeometricprob() double loghypergeometricprob(double* logfacs, int a, int b, int c, int d) { return logfacs[a+b] + logfacs[c+d] + logfacs[a+c] + logfacs[b+d] - logfacs[a] - logfacs[b] - logfacs[c] - logfacs[d] - logfacs[a+b+c+d]; Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 17 / 22 Algorithms are computational steps towerofhanoi utilizing recursions Data types and floating-point precisions Operators, if, for, and while statements Arrays and strings Pointers and References Functions Fisher s Exact Test - works only tiny datasets - handles small datasets - handles hundreds of observations - equivalent to logfisherexacttest but faster At Home : Reading material for novice C++ users : http://wwwcpluspluscom/doc/tutorial/ - Until Control Structures Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 18 / 22 Homework #1 Homework #1 Problem #1 Implement a program fullfastfishersexacttest which Prints out an error message and return -1 when number of input arguments are not 4 (excluding the program name itself) and outputs the two-sided p-value, and one-sided p-values p 2sided (a, b, c, d) = x Pr(x)I[Pr(x) Pr(a)] p greater (a, b, c, d) = x a Pr(x) p less (a, b, c, d) = x a Pr(x) based on cpp Problem #1 - Example Program Interface user@host:~/$ /fullfastfishersexacttest 2 7 8 2 Two-sided log10(p) = -163801, p-value = 00230141 One-sided (less) log10(p) = -173232, p-value = 00185217 One-sided (greater) log10(p) = -0000428027, p-value = 0999015 user@host:~/$ /fullfastfishersexacttest 20 70 80 20 Two-sided log10(p) = -152289, p-value = 590393e-16 One-sided (less) log10(p) = -153764, p-value = 420368e-16 One-sided (greater) log10(p) = 80232e-14, p-value = 1 user@host:~/$ /fullfastfishersexacttest Usage: fullfastfishersexacttest [#row1col1] [#row1col2] [#row2col1] [#row2col2] Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 19 / 22 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 20 / 22

More Homework Problems Next Lecture Several example codes asking for the outputs (with explanations) Absolutely no discussion with your classmates But you can consult to your computer (to compiler, not google!) If you are certain what the answer is, you can just write your answer If you are not certain, you can write the code, and see what happens Turn in your hard copy of your answers before Thursday (19th) lecture And email the source code (cpp only) for Homework #1 to hmkang@umichedu with title BIOSTAT 615/815 Homework #1 - [your name] More on C++ Programming Standard Template Library User-defined data types Divide and Conquer Algorithms Binary Search Insertion Sort (skipped in lecture 1) Merge Sort Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 21 / 22 Hyun Min Kang Biostatistics 615/815 - Lecture 3 Januray 13th, 2011 22 / 22