Parallel Processing Neural Networks on SIMD/GPU Architectures by Derek Kern CSC7551, December 8th, 2011
Project Description

Neural networks can often have hundreds, if not thousands, of neurons when used to solve a pattern-matching task. Specifically, when responding to an input, a backpropagation neural network must 'ripple' the effect of the input across each and every layer before producing an output. Furthermore, when training, this 'rippling' must go from the input to the output and then back from the output into the hidden layers. Depending upon the size of the network, these tasks can be computationally daunting.

In this project, a backpropagation neural network will be modelled and computed on a GPU vector processor such that each neuron occupies one or more individual PEs. This is thought to be an interesting parallel computation task for a number of reasons: (1) since some layers of neurons (PEs) must fire while others remain idle, significant effort will be required to coordinate PE behavior; (2) since each neuron (PE) in a layer must be able to read the outputs of many or all of the neurons in the previous layer, there is a significant risk of memory access collisions; and (3) given the number of computations needed to determine the output of a neuron, multilevel parallelism may be possible, i.e. for each neuron being handled in parallel, multiple PEs may be used to compute its weighted sum.

Analysis and Results

Broad Results

The overall goals of the project were: (1) to achieve a basic vectorization of a backpropagation neural network; (2) to explore the coordination and other issues that arise when running the neural network on a GPU; and (3) to achieve an extreme vectorization of a backpropagation neural network. All three of these goals were met. On top of this, both the basic and extreme vector versions of the neural network vastly outperformed the sequential version.
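The layer-by-layer 'ripple' from the project description can be made concrete with a small sequential sketch. It assumes a flat weight array in which each neuron owns (previous layer size + 1) consecutive slots: its input weights followed by one bias weight. That is the per-layer count summed by total_neuron_weights_needed_for_network() in the listing. The helper names feedforward_sketch and weights_before_layer are hypothetical and are not the project's functions. The sketch shows where the parallelism lives. Layers must be processed in order, but every neuron in a layer reads only the previous layer, so visiting a layer's neurons in any order gives the same result.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, the same activation as calculate_sigmoid() in the listing.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Weight slots preceding layer `layer` when every neuron owns
// (previous layer size + 1) consecutive slots: input weights, then a bias.
std::size_t weights_before_layer(const std::vector<std::size_t>& sizes,
                                 std::size_t layer) {
    std::size_t total = 0;
    for (std::size_t i = 1; i < layer; ++i)
        total += (sizes[i - 1] + 1) * sizes[i];
    return total;
}

// Ripple `input` through every layer. Layers run strictly in order, but a
// neuron only reads the previous layer's outputs, so visiting the neurons
// of one layer forward or reversed cannot change the result. On a GPU,
// each of those neurons could be handled by its own thread.
std::vector<double> feedforward_sketch(const std::vector<std::size_t>& sizes,
                                       const std::vector<double>& weights,
                                       const std::vector<double>& input,
                                       bool reverse_within_layer) {
    std::vector<double> prev = input;
    for (std::size_t i = 1; i < sizes.size(); ++i) {
        const std::size_t n_prev = sizes[i - 1];
        const std::size_t base = weights_before_layer(sizes, i);
        const std::size_t stride = n_prev + 1;
        std::vector<double> cur(sizes[i]);
        for (std::size_t n = 0; n < sizes[i]; ++n) {
            const std::size_t j = reverse_within_layer ? sizes[i] - 1 - n : n;
            const double* w = &weights[base + j * stride];
            double sum = 0.0;
            for (std::size_t k = 0; k < n_prev; ++k) sum += prev[k] * w[k];
            cur[j] = sigmoid(sum + w[n_prev]);  // w[n_prev] is the bias weight
        }
        prev = cur;  // this layer's outputs feed the next layer
    }
    return prev;
}
```

For a 2-3-1 network this layout needs (2 + 1) * 3 + (3 + 1) * 1 = 13 weight slots, and the output layer's weights start at slot 9.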
Furthermore, a thorough understanding of the GPU hardware was gained. The GPU threading model is not well covered in most texts. Through this project, an understanding was gained of how to fully exploit the GPU threading model: blocks and threads need to be specified so that the streaming multiprocessors, and the cores within them, are used as efficiently as possible. The details of kernel thread synchronization were also learned. The only way to synchronize across blocks is via separate kernel calls, because __syncthreads() only synchronizes the threads within a single block.

Detailed Results

The simplest and most straightforward vector version is called vectorized simple. Below is the runtime comparison of it versus the sequential version.
From the chart above, it is clear that this simple vectorization outperforms the sequential version across all of the test networks.

The next two vector versions, vectorized warp bad and vectorized warp good, are meant to display the effects of allocating blocks and threads within the GPU, and how these settings can affect the utilization of the GPU's streaming multiprocessors (SMs). For the record, vectorized simple does a poor job of allocating blocks and threads: its block/thread configuration results in each thread residing within its own warp.

Vectorized warp bad allocates 50 threads per block. This means that each SM doing processing will end up with two warps to manage (one of 32 threads and one of 18 threads). The SM can only run one warp at a time, so the other warp will remain idle. However, this is still better than vectorized simple.

Vectorized warp good allocates 32 threads per block. This means that each SM doing processing will end up with one warp to manage, unless more than 448 threads are needed (which is the case for test networks Net 4, Net 6, and Net 8). However, even if, say, 500 threads are needed, most SMs will still have only one warp to manage. At 32 threads per block, 500 threads require 16 blocks, two more than the 14 blocks that 448 threads fill, so only two SMs will be saddled with an extra warp. This means that most warps can be fully processed without waiting on other warps to finish. Below is the runtime comparison of vectorized simple, vectorized warp bad, and vectorized warp good.
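The block and warp arithmetic in the preceding paragraphs can be checked with a short sketch. It makes three assumptions. The warp size is 32. There are 14 SMs, which is what the 448-thread figure implies at one 32-thread block per SM. Blocks are dealt to SMs round-robin, one at a time; the real hardware block scheduler is not documented at this level of detail. It also assumes that vectorized simple launches one thread per block, which is one way to end up with a warp per thread. LaunchShape, shape_for, and blocks_per_sm are hypothetical names.

```cpp
#include <vector>

// NVIDIA GPUs schedule threads in warps of 32.
constexpr int kWarpSize = 32;

// Hypothetical summary of how a launch is carved into blocks and warps.
struct LaunchShape {
    int blocks;            // blocks needed to cover every thread
    int warps_per_block;   // warps each block occupies
    int last_warp_active;  // threads doing work in each block's final warp
};

LaunchShape shape_for(int threads_needed, int threads_per_block) {
    LaunchShape s;
    s.blocks = (threads_needed + threads_per_block - 1) / threads_per_block;
    s.warps_per_block = (threads_per_block + kWarpSize - 1) / kWarpSize;
    const int rem = threads_per_block % kWarpSize;
    s.last_warp_active = (rem == 0) ? kWarpSize : rem;
    return s;
}

// Assumption: blocks are dealt to SMs round-robin, one block at a time.
std::vector<int> blocks_per_sm(int blocks, int sm_count) {
    std::vector<int> counts(sm_count, 0);
    for (int b = 0; b < blocks; ++b) counts[b % sm_count] += 1;
    return counts;
}
```

With 500 threads, shape_for(500, 50) yields 10 blocks, each holding a full warp and an 18-thread warp. shape_for(500, 32) yields 16 single-warp blocks, and spreading those over 14 SMs leaves exactly two SMs holding a second block.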
As the chart shows, the results aren't as stark as one might imagine. However, it is clear that as the need for parallelism increases (as in the wide test networks Net 4, Net 6, and Net 8), the vectorized warp good version does outperform the other versions. It isn't yet clear why it doesn't perform as well for the networks that require less parallelism. The working theory is that the warp versions have a higher density (the ratio of active threads to the size of the memory being accessed), and so experience a slowdown due to memory bank collisions. This is especially the case for the test networks that have 200 or fewer neurons per layer (Nets 1, 2, 3, 5, and 7). As the number of neurons per layer increases (say to 500, as in Nets 4, 6, and 8), the warp versions are able to spread their memory accesses over a larger span of memory, which results in fewer collisions and better runtimes.

This is a significant result. In essence, it means that even though there isn't significant documentation on the exact layout of global memory on the GPU, faster access can still be achieved, in certain circumstances, by deliberately choosing a sparse data structure. Certainly, if the neural network software were to be redesigned today, this is something that would drive the design of the neural network data structure.

The next vector version, vectorized kcm, is meant to display the overhead of making repeated kernel calls. Vectorized simple was written so that the weight adjustment step is done with two loops over all of the layers in the network, and each iteration invokes another kernel call. The vectorized kcm version combines these loops and the kernel calls within them. Below is the runtime comparison of vectorized simple and vectorized kcm.
The vectorized kcm version does indeed yield modest gains, but they are not as stark as hoped. The vectorized kcm version led to the creation of a version that was initially called vectorized full-kcm. However, this version was ultimately deemed unworkable, since it required block-level synchronization, which is not possible on NVIDIA GPUs without separate kernel calls. This version was eventually renamed vectorized kcm failed. Just to see whether it could be made to work at all, it was run within a single block. Below is the runtime comparison of the vectorized simple, vectorized kcm, and vectorized kcm failed versions.

From the chart, it is easy to see that vectorized kcm failed was a total failure. Running it within a single block doomed it to very modest parallelism (although it still outperforms the sequential version).

The next vector version, vectorized mass, is meant to be a more fully parallelized version of vectorized simple. While vectorized simple parallelizes the neurons only, vectorized mass parallelizes the processing of the weights as well.
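The vectorized mass kernels are not listed in this part of the report. The sketch below is therefore only one plausible way to parallelize "the processing of the weights as well," written as a sequential C++ simulation. Phase 1 gives every (neuron, input) product its own notional thread. Phase 2 sums each neuron's products with a pairwise tree reduction, where each step's additions are independent and would be separated by __syncthreads() barriers inside a block. Bias weights are omitted, and the weight layout (n weights per neuron, no bias slot) and the names all_products and tree_sums are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Phase 1: one independent multiply per (neuron j, input k) pair. On a GPU,
// `idx` would be a global thread index; here a loop stands in for threads.
std::vector<double> all_products(const std::vector<double>& prev,
                                 const std::vector<double>& weights,
                                 std::size_t layer_size) {
    const std::size_t n = prev.size();
    std::vector<double> products(layer_size * n);
    for (std::size_t idx = 0; idx < products.size(); ++idx) {
        const std::size_t j = idx / n;  // neuron in the current layer
        const std::size_t k = idx % n;  // input from the previous layer
        products[idx] = prev[k] * weights[j * n + k];
    }
    return products;
}

// Phase 2: pairwise tree reduction of each neuron's row of products. Every
// `t` in the innermost loop is independent, so one step could run across a
// block of threads, with a __syncthreads() barrier before the next step.
std::vector<double> tree_sums(std::vector<double> products,
                              std::size_t layer_size, std::size_t n) {
    std::vector<double> sums(layer_size, 0.0);
    if (n == 0) return sums;
    for (std::size_t j = 0; j < layer_size; ++j) {
        double* row = products.data() + j * n;
        for (std::size_t width = n; width > 1;) {
            const std::size_t half = (width + 1) / 2;
            for (std::size_t t = 0; t + half < width; ++t) row[t] += row[t + half];
            width = half;  // the live part of the row halves each step
        }
        sums[j] = row[0];
    }
    return sums;
}
```

The reduction needs about log2(n) dependent steps per neuron instead of n sequential additions. That is the extra level of parallelism vectorized mass exploits on top of one-thread-per-neuron. The reduction adds terms in a different order than a sequential loop, so results can differ in the last few bits.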
Below is the runtime comparison of the vectorized simple and vectorized mass versions. Clearly, from the chart, vectorized mass was a complete success. Its margin over vectorized simple grows with the size of the neural network.

The next and final vector version, vectorized kcm mass, is meant to combine what was learned from vectorized mass with what was learned from vectorized kcm. Essentially, it is the vectorized mass version with the weight adjustment steps combined. This version was only a modest improvement upon vectorized mass, but it was the version that ultimately performed the best. Below is the runtime comparison of the vectorized mass and vectorized kcm mass versions.
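The weight adjustment that vectorized kcm and vectorized kcm mass combine can be sketched from the sequential listing. kernel_backpropogation_apply_momentum adds momentum * cached_weights to each weight. kernel_backpropogation_apply_rate then stores learning_rate * error * previous output in cached_weights and adds that to the weight. Each weight's steps read and write only that weight's own slots, so the two passes can be fused into one loop (one kernel launch) without changing the result. The error backfeed still has to finish before the update starts. The names WeightState, update_two_pass, and update_fused are hypothetical, and grad[i] stands in for the error times previous-layer output product of weight i.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the weights and the previous update.
struct WeightState {
    std::vector<double> w;       // current weights
    std::vector<double> cached;  // last update, reused as the momentum term
};

// Two passes in the order the sequential code uses: momentum, then rate.
// Each loop corresponds to a separate kernel launch in a GPU version.
void update_two_pass(WeightState& s, const std::vector<double>& grad,
                     double rate, double momentum) {
    for (std::size_t i = 0; i < s.w.size(); ++i)  // launch 1: momentum
        s.w[i] += momentum * s.cached[i];
    for (std::size_t i = 0; i < s.w.size(); ++i) {  // launch 2: learning rate
        s.cached[i] = rate * grad[i];
        s.w[i] += s.cached[i];
    }
}

// Fused pass: every weight's three steps touch only its own slots, so no
// synchronization between weights is needed and one launch replaces two.
void update_fused(WeightState& s, const std::vector<double>& grad,
                  double rate, double momentum) {
    for (std::size_t i = 0; i < s.w.size(); ++i) {
        s.w[i] += momentum * s.cached[i];
        s.cached[i] = rate * grad[i];
        s.w[i] += s.cached[i];
    }
}
```

The saving is the launch overhead and the second sweep over memory, not any arithmetic. That is consistent with the modest improvements reported for both kcm versions.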
Now that all of the versions have been compared locally, below is a global comparison of all versions. Again, all of the vector versions outperform the sequential version. The versions that employ massive parallelism outperform all comers.

Below is a chart that compares the speedups offered by the various vector versions. As expected, the chart shows that the versions employing massive parallelism enjoy the largest speedups over the sequential version. In fact, on Net 8, vectorized mass and vectorized kcm mass offer more than a 20 times speedup.

Finally, now that the runtimes and speedups of the vector versions are known, it is worth noting how efficiently each uses the parallel resources of the GPU. Below is a chart that compares the efficiencies of the various versions. As the chart shows, the vectorized kcm and vectorized simple versions are the most efficient, but, of course, this comes with a smaller speedup. Vectorized mass and vectorized kcm mass are the least efficient but offer the most significant speedups. As is typical in parallel processing, committing more resources buys more speed at the cost of efficiency.

Overall, the project was a success. Neural networks can be effectively processed on GPUs. Furthermore, not only can they be processed on GPUs, it appears to be desirable to do so: GPUs offer very significant speedups over sequential processing. Down the road, one can imagine, for very large networks, using MPI to distribute portions of the network to various nodes. However, instead of simply passing the network portions off to the cores on each node, perhaps it would be more desirable to pass them off to the various GPUs on each node.

Compiling and Running Instructions

Compiling

To compile the sequential version, execute the following:

g++ RunNNetwork.cpp NNetworkUtils.cpp NNetwork.cpp -o RunNNetwork

To compile any of the vector versions, execute the following:

nvcc -arch sm_20 RunNNetwork.cu NNetworkCuda.cu NNetworkUtils.cpp NNetwork.cu -o RunNNetwork

Note that the architecture switch is specified for two reasons. Doubles are used, which requires compute capability 1.3 or higher. Device-side printf, which makes placing printf statements in kernel code possible, requires compute capability 2.0.
Running

Whether running the sequential version or one of the vector versions, two arguments are required: a configuration file and a test file. The configuration file contains the information necessary for building and training a neural network. The test file contains the information necessary for testing the neural network.

To run the sequential version, execute the following:

bpsh <node> <path to>/runnnetwork <path to>/network_config.cfg <path to>/network_test.tst

Below is a good example:

bpsh 6 /home/derek.kern/csc7551/project/sequential/runnnetwork /home/derek.kern/csc7551/project/nnetwork1.cfg /home/derek.kern/csc7551/project/nnetwork1.tst

Running the vector versions requires a node with a GPU. Also, all of the vector versions take a final optional argument: the GPU number. This allows the parallel code to be run on either GPU #0 or GPU #1 on the respective node. To run a vector version, execute the following:

bpsh <node> <path to>/runnnetwork <path to>/network_config.cfg <path to>/network_test.tst <gpu #>

Below is a good example:

bpsh 14 /home/derek.kern/csc7551/project/vectorized_simple/runnnetwork /home/derek.kern/csc7551/project/nnetwork1.cfg /home/derek.kern/csc7551/project/nnetwork1.tst 1

Code

Sequential Version

RunNNetwork.cpp

#include "NNetwork.h"
#include "NNetworkUtils.h"

bool check_command_line( int argc, char* argv[] ) {
    // Make sure that the correct arguments were passed.
    FILE *fp = NULL;
    bool ok = true;

    if( argc < 3 ) {
        cout << "Format: RunNNetwork <network configuration file> <network test file>" << endl;
        cout << "Arguments:" << endl;
        cout << " network configuration file - This file should contain parameters for" << endl;
        cout << " network size, training rate, etc as " << endl;
        cout << " a set of data to train the network" << endl;
        cout << " network test file - This file should contain data for testing the " << endl;
        cout << " network after it has been trained" << endl;
        ok = false;
    }
    else {
        // Make sure that the configuration file exists.
        if( ( fp = fopen( argv[1], "r" ) ) != NULL ) {
            fclose( fp );
        }
        else {
            cout << "Specified network configuration file [" << argv[1] << "] doesn't exist or cannot be opened" << endl;
            ok = false;
        }

        // Make sure that the test file exists.
        if( ( fp = fopen( argv[2], "r" ) ) != NULL ) {
            fclose( fp );
        }
        else {
            cout << "Specified network test file [" << argv[2] << "] doesn't exist or cannot be opened" << endl;
            ok = false;
        }
    }

    return ok;
}

int main( int argc, char* argv[] ) {
    // Main function for running the network.

    // First make sure that the user has provided the necessary input.
    if( !check_command_line( argc, argv ) ) {
        return 1;
    }

    // Read in the network configuration.
    NNetworkConfig nnc = read_network_configuration( argv[1] );

    // Read in the network tests.
    TestInputs tests = read_network_tests( argv[2], nnc->layer_config->input_layer_size(), nnc->layer_config->output_layer_size() );

    // Build the neural network.
    NeuralNetwork net = build_neural_network( nnc->layer_config );

    // Initialize the network to begin with.
    initialize_neural_network( net );

    // Train the network.
    do_network_training( net, nnc->tests, nnc->params );

    // Test the network and report on results.
    cout << "Applying test data to network:" << endl;
    apply_network_tests( net, tests );

    // Free up the memory associated with the neural network.
    destroy_neural_network( net );
    free( net );

    return 0;
}

NNetworkUtils.h

#ifndef nnetworkutils_h
#define nnetworkutils_h

#include <stdlib.h>
#include <string.h>

#define LINE_SIZE 1024

NNetworkConfig read_network_configuration( char *config_filename );
TestInputs read_network_tests( char *test_filename, int input_layer_size, int output_layer_size );
TestInputs _read_network_tests( FILE *test_file, int input_layer_size, int output_layer_size );

#endif

NNetworkUtils.cpp

#include "NNetwork.h"

NeuralNetwork build_neural_network( NetworkLayerConfig layer_config ) {
    // Build the neural network that corresponds to the layer configuration.
    NeuralNetwork net = (NeuralNetwork) malloc( sizeof( struct NeuralNetwork ) );
    int total_neurons_needed = layer_config->total_neurons_needed_for_network();
    int total_weights_needed = layer_config->total_neuron_weights_needed_for_network();

    // Setup the basic layer layout.
    net->layer_count = layer_config->layer_count;

    // Copy the sizes of the layers.
    net->layer_sizes = (int*) malloc( sizeof( int ) * net->layer_count );
    for( int i = 0; i < net->layer_count; i++ ) {
        net->layer_sizes[i] = layer_config->layer_sizes[i];
    }

    // Setup the memory for the neuronal weights.
    // Total weight slots needed is given by the following:
    //   Sum for i = 1 .. layer_count - 1 of ( layer_sizes[i - 1] + 1 ) * layer_sizes[i]
    // See total_neuron_weights_needed_for_network() for details.
    net->weights = (double*) malloc( sizeof( double ) * total_weights_needed );

    // Setup the memory for caching of the neuronal weights.
    // Total weight slots needed is given by the following:
    //   Sum for i = 1 .. layer_count - 1 of ( layer_sizes[i - 1] + 1 ) * layer_sizes[i]
    // See total_neuron_weights_needed_for_network() for details.
    net->cached_weights = (double*) malloc( sizeof( double ) * total_weights_needed );

    // Setup the memory of the outputs of the neurons.
    // Total output slots needed is given by the following:
    //   Sum for i = 0 .. layer_count - 1 of layer_sizes[i]
    // See total_neurons_needed_for_network() for details.
    net->outputs = (double*) malloc( sizeof( double ) * total_neurons_needed );

    // Setup the memory of the errors of the neurons.
    // Total error slots needed is given by the following:
    //   Sum for i = 0 .. layer_count - 1 of layer_sizes[i]
    // See total_neurons_needed_for_network() for details.
    net->errors = (double*) malloc( sizeof( double ) * total_neurons_needed );

    return net;
}

void destroy_neural_network( NeuralNetwork net ) {
    // Free the memory held by the network.
    // Delete the memory for the neuron weights.
    free( net->weights );

    // Delete the memory for the weight caching.
    free( net->cached_weights );

    // Delete the memory for the neuron outputs.
    free( net->outputs );

    // Delete the memory for the errors (differences).
    free( net->errors );

    // Finally, clear out the layer sizes.
    free( net->layer_sizes );
}

void initialize_neural_network( NeuralNetwork net ) {
    // Initialize the weights of the network with random values and zero out the cache.
    int i_offset, j_offset;

    // Seed the random number generator.
    srand( (unsigned) time( NULL ) );

    // Set the neuronal weights to random values between -1 and 1.
    for( int i = 1; i < net->layer_count; i++ ) {
        i_offset = net->total_neuron_weights_before_layer( i );
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // This is the total number of weights in this layer prior to this neuron.
            j_offset = j * net->layer_sizes[i - 1];
            for( int k = 0; k < net->layer_sizes[i - 1] + 1; k++ ) {
                net->weights[i_offset + j_offset + k] = (double) ( rand() ) / ( RAND_MAX / 2 ) - 1;
            }
        }
    }

    // Zero out the weight cache.
    for( int i = 1; i < net->layer_count; i++ ) {
        i_offset = net->total_neuron_weights_before_layer( i );
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // This is the total number of weights in this layer prior to this neuron.
            j_offset = j * net->layer_sizes[i - 1];
            for( int k = 0; k < net->layer_sizes[i - 1] + 1; k++ ) {
                net->cached_weights[i_offset + j_offset + k] = 0.0;
            }
        }
    }
}
void feedforward( NeuralNetwork net, double *inputs ) {
    // Feed the inputs forward through the neural network until the outputs are determined.

    // Start by putting the inputs onto the input layer.
    for( int j = 0; j < net->layer_sizes[0]; j++ ) {
        net->outputs[0 + j] = inputs[j];
    }

    // Now ripple the effect of the input across the layers.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight and output offsets.
        int iw_offset = net->total_neuron_weights_before_layer( i );
        int io_offset = net->total_neurons_before_layer( i );
        int io_prev_offset = net->total_neurons_before_layer( i - 1 );

        // Apply the result to each neuron in the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_feedforward( i, net->outputs, net->weights, iw_offset, io_offset, io_prev_offset, net->layer_sizes[i - 1], j );
        }
    }
}

void kernel_feedforward( int layer_number, double *outputs, double *weights, int iw_offset, int io_offset, int io_prev_offset, int prev_layer_size, int j ) {
    // Do the feedforward, but model it for kernel computation.
    double weighted_sum;

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;

    // Reset the sum.
    weighted_sum = 0.0;

    // Sum the outputs from the previous layer, adjusted by the connection weights.
    for( int k = 0; k < prev_layer_size; k++ ) {
        weighted_sum += outputs[io_prev_offset + k] * weights[iw_offset + jw_offset + k];
    }

    // Now, for this neuron, set the output.
    outputs[io_offset + j] = calculate_sigmoid( weighted_sum + weights[iw_offset + jw_offset + prev_layer_size] );
}

void backpropogate( NeuralNetwork net, double *inputs, double *desired_outputs, TrainingParameters params ) {
    // Feed the inputs forward through the neural network until the outputs are determined.
    // Afterwards, turn around and adjust the neuron connection weights so that they more
    // reliably produce the desired output.

    // Start by feeding forward the input values. This will put values onto the output nodes.
    // We can then compare these to the desired values and backpropagate the changes.
    feedforward( net, inputs );

    // Calculate the error values for the output layer.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    for( int j = 0; j < net->layer_sizes[net->layer_count - 1]; j++ ) {
        net->errors[i_offset + j] = net->outputs[i_offset + j] * ( 1 - net->outputs[i_offset + j] ) * ( desired_outputs[j] - net->outputs[i_offset + j] );
    }

    // Calculate the error values for the hidden layers.
    for( int i = net->layer_count - 2; i > 0; i-- ) {
        // Figure out the layer-based weight and output/error offsets.
        int iw_next_offset = net->total_neuron_weights_before_layer( i + 1 );
        int io_offset = net->total_neurons_before_layer( i );
        int io_next_offset = net->total_neurons_before_layer( i + 1 );

        // Calculate the error for each neuron in the layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_backfeed_errors( i, net->outputs, net->weights, net->errors, iw_next_offset, io_offset, io_next_offset, net->layer_sizes[i], net->layer_sizes[i + 1], j );
        }
    }

    // Adjust the weights according to the learning momentum.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight offset.
        int iw_offset = net->total_neuron_weights_before_layer( i );

        // Adjust the weight for each neuron within the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_apply_momentum( i, net->weights, net->cached_weights, params->learning_momentum, iw_offset, net->layer_sizes[i - 1], j );
        }
    }

    // Adjust the weights according to the learning rate. Also, cache the weight changes.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight and output/error offsets.
        int iw_offset = net->total_neuron_weights_before_layer( i );
        int io_offset = net->total_neurons_before_layer( i );
        int io_prev_offset = net->total_neurons_before_layer( i - 1 );

        // Adjust the weight for each neuron within the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_apply_rate( i, net->weights, net->cached_weights, net->outputs, net->errors, params->learning_rate, io_offset, io_prev_offset, iw_offset, net->layer_sizes[i - 1], j );
        }
    }
}

void kernel_backpropogation_backfeed_errors( int layer_number, double *outputs, double *weights, double *errors, int iw_next_offset, int io_offset, int io_next_offset, int current_layer_size, int next_layer_size, int j ) {
    // Do the backfeed of errors, but model it for kernel computation.
    double weighted_sum = 0.0;

    // Sum the weighted errors from the layer after the current one.
    for( int k = 0; k < next_layer_size; k++ ) {
        // Figure out the neuron-based weight offset.
        int kw_offset = k * current_layer_size;
        weighted_sum += errors[io_next_offset + k] * weights[iw_next_offset + j + kw_offset];
    }

    // Set the error.
    errors[io_offset + j] = outputs[io_offset + j] * ( 1 - outputs[io_offset + j] ) * weighted_sum;
}

void kernel_backpropogation_apply_momentum( int layer_number, double *weights, double *cached_weights, double learning_momentum, int iw_offset, int prev_layer_size, int j ) {
    // Apply the momentum to the weights, but model it for kernel computation.

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;

    for( int k = 0; k < prev_layer_size; k++ ) {
        weights[iw_offset + jw_offset + k] += learning_momentum * cached_weights[iw_offset + jw_offset + k];
    }
}

void kernel_backpropogation_apply_rate( int layer_number, double *weights, double *cached_weights, double *outputs, double *errors, double learning_rate, int io_offset, int io_prev_offset, int iw_offset, int prev_layer_size, int j ) {
    // Apply the learning rate to the weights, but model it for kernel computation.

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;

    for( int k = 0; k < prev_layer_size; k++ ) {
        cached_weights[iw_offset + jw_offset + k] = learning_rate * errors[io_offset + j] * outputs[io_prev_offset + k];
        weights[iw_offset + jw_offset + k] += cached_weights[iw_offset + jw_offset + k];
    }
}

double calculate_sigmoid( double value ) {
    // Calculate the sigmoid function for the value.
    return (double) ( 1 / ( 1 + exp( -value ) ) );
}

double get_mean_square_error( NeuralNetwork net, double *desired_outputs ) {
    // Get the error of the network (half the sum of the squared differences)
    // based upon the desired outputs.
    double error = 0;

    // Sum the error up from the output layer.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    for( int j = 0; j < net->layer_sizes[net->layer_count - 1]; j++ ) {
        error += ( desired_outputs[j] - net->outputs[i_offset + j] ) * ( desired_outputs[j] - net->outputs[i_offset + j] );
    }

    return error / 2;
}

double get_output_value( NeuralNetwork net, int index ) {
    // Return the specified output value from the network.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    return net->outputs[i_offset + index];
}

int get_rounded_output_value( NeuralNetwork net, int index ) {
    // Return the specified output value from the network, rounded to the nearest integer.
    // (floor() alone would truncate every sigmoid output in (0, 1) down to 0.)
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    return (int) floor( net->outputs[i_offset + index] + 0.5 );
}

double do_network_training( NeuralNetwork net, TestInputs tests, TrainingParameters params ) {
    // Iteratively train the neural network and report on the progress.
    double error = 0.0;
    long iteration = 0, total_iterations = 0;
    float backprop_runtime, total_backprop_runtime = 0, runtime, total_runtime = 0;

    cout << endl << "Training the network:" << endl;
    for( iteration = 0; iteration < params->training_max_iterations; iteration++ ) {
        runtime = ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) );
        // Setup to record the time.
        total_iterations += 1;

        // Train through backpropagation.
        backprop_runtime = ( clock() / (float) ( CLOCKS_PER_SEC / 1000 ) );
        backpropogate( net, tests->input_values[iteration % tests->test_count], tests->desired_output_values[iteration % tests->test_count], params );
        total_backprop_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - backprop_runtime );

        // How bad is the error?
        error = get_mean_square_error( net, tests->desired_output_values[iteration % tests->test_count] );
        if( error < params->training_threshold ) {
            cout << "Network has been trained. It took " << total_iterations << " iterations." << endl;
            cout << "Final error is " << error << endl << endl;
            break;
        }

        // Report on the training process (guarding against a modulus of zero
        // when fewer than 10 iterations are allowed).
        if( params->training_max_iterations >= 10 && iteration % ( params->training_max_iterations / 10 ) == 0 ) {
            cout << "Current error is " << error << ". Continuing with training..." << endl;
        }

        // Add to the total runtime.
        total_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - runtime );
    }

    if( iteration == params->training_max_iterations ) {
        error = get_mean_square_error( net, tests->desired_output_values[( iteration - 1 ) % tests->test_count] );
        cout << "Maximum of " << iteration << " iterations completed with error of " << error << endl;
    }

    // Write out the time for backpropagation.
    cout << endl << "Total time in backpropagation: " << setiosflags( ios::fixed ) << setprecision( 5 ) << ( total_backprop_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per backpropagation: " << setiosflags( ios::fixed ) << setprecision( 7 ) << ( total_backprop_runtime / total_iterations ) << " milliseconds" << endl << endl;
    cout << "Total time iterating: " << setiosflags( ios::fixed ) << setprecision( 5 ) << ( total_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per iteration: " << setiosflags( ios::fixed ) << setprecision( 7 ) << ( total_runtime / total_iterations ) << " milliseconds" << endl << endl;

    // Return the final training error (the function is declared to return a double).
    return error;
}

void apply_network_tests( NeuralNetwork net, TestInputs tests ) {
    // Apply the tests to the neural network. Report on the success or failure.
    int total_iterations = 0;
    float feedforward_runtime, total_feedforward_runtime = 0, runtime, total_runtime = 0;

    for( int test_index = 0; test_index < tests->test_count; test_index++ ) {
        runtime = ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) );

        // Setup to record the time.
        total_iterations += 1;

        // Start by feeding forward the provided test inputs.
        feedforward_runtime = ( clock() / (float) ( CLOCKS_PER_SEC / 1000 ) );
        feedforward( net, tests->input_values[test_index] );
        total_feedforward_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - feedforward_runtime );

        // Now, report what the expected output is.
        cout << "For test input " << ( test_index + 1 ) << endl;
        cout << " Expected = ";
        for( int i = 0; i < tests->output_value_size; i++ ) {
            cout << (int) tests->desired_output_values[test_index][i];
        }
        cout << endl;

        // Finally, report what the actual output was.
        cout << " Received = ";
        for( int i = 0; i < tests->output_value_size; i++ ) {
            cout << get_rounded_output_value( net, i );
        }
        cout << endl << endl;

        total_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - runtime );
    }

    // Write out the time for feedforward.
    cout << endl << "Total time in feedforward: " << setiosflags( ios::fixed ) << setprecision( 7 ) << ( total_feedforward_runtime / 1000 ) << " seconds " << endl;
    cout << "Average time per feedforward: " << setiosflags( ios::fixed ) << setprecision( 9 ) << ( total_feedforward_runtime / total_iterations ) << " milliseconds " << endl << endl;
    cout << "Total time iterating: " << setiosflags( ios::fixed ) << setprecision( 5 ) << ( total_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per iteration: " << setiosflags( ios::fixed ) << setprecision( 7 ) << ( total_runtime / total_iterations ) << " milliseconds" << endl << endl;
}

NNetwork.h

#ifndef nnetwork_h
#define nnetwork_h

#include <assert.h>
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, rand, srand
#include <math.h>
#include <time.h>

using namespace std;

typedef struct NeuralNetwork {
    // These variables will hold information about the layers.
    int layer_count;
    int *layer_sizes;

    // This will hold the weights of the neurons. Used to be a double***.
    double *weights;

    // This will preserve weights for later use. Used to be a double***.
    double *cached_weights;

    // This will hold the output for the neurons. Used to be a double**.
    double *outputs;

    // This will hold the difference between the target training values and the current outputs. Used to be a double**.
    double *errors;

    int input_layer_size() { return layer_sizes[0]; }
    int output_layer_size() { return layer_sizes[layer_count - 1]; }

    int total_neurons_in_network() {
        int total = 0;
        for( int i = 0; i < layer_count; i++ ) total += layer_sizes[i];
        return total;
    }

    int total_neurons_before_layer( int layer_number ) {
        int total = 0;
        for( int i = 0; i < layer_number; i++ ) total += layer_sizes[i];
        return total;
    }

    int total_neuron_weights_in_network() {
        int total = 0;
        for( int i = 1; i < layer_count; i++ ) {
            total += ( layer_sizes[i - 1] * layer_sizes[i] );
        }
        return total;
    }

    int total_neuron_weights_before_layer( int layer_number ) {
        int total = 0;
        for( int i = 1; i < layer_number; i++ ) {
            total += ( layer_sizes[i - 1] * layer_sizes[i] );
        }
        return total;
    }
} *NeuralNetwork;

typedef struct TrainingParameters {
    // This setting determines how quickly the network will learn.
    double learning_rate;

    // This setting determines the momentum of learning.
    double learning_momentum;

    // This setting determines the point where the network is finished learning.
    double training_threshold;

    // This setting determines the maximum number of iterations to train.
    long training_max_iterations;
} *TrainingParameters;

typedef struct TestInput {
    // This will hold input values for this training input.
    double **input_values;

    // This will hold desired output values for this training input.
    double **desired_output_values;

    // This will hold the number of tests stored.
    int test_count;

    // This will hold the number of values stored in each of the input and output value vectors.
    int input_value_size;
    int output_value_size;
} *TestInputs;

typedef struct NetworkLayerConfig {
    // This will hold details about the network config.
    int layer_sizes[100];
    int layer_count;

    int input_layer_size() { return layer_sizes[0]; }
    int output_layer_size() { return layer_sizes[layer_count - 1]; }

    int total_neurons_needed_for_network() {
        int total = 0;
        for( int i = 0; i < layer_count; i++ ) total += layer_sizes[i];
        return total;
    }

    int total_neuron_weights_needed_for_network() {
        int total = 0;
        for( int i = 1; i < layer_count; i++ ) {
            total += ( ( layer_sizes[i - 1] + 1 ) * layer_sizes[i] );
        }
        return total;
    }
} *NetworkLayerConfig;

typedef struct NNetworkConfig {
    // This will hold onto the layer configuration.
    NetworkLayerConfig layer_config;

    // This will hold onto training parameters.
    TrainingParameters params;

    // This will hold onto training inputs.
    TestInputs tests;
} *NNetworkConfig;

// Function prototypes
NeuralNetwork build_neural_network( NetworkLayerConfig layer_config );
void initialize_neural_network( NeuralNetwork net );
void destroy_neural_network( NeuralNetwork net );
void feedforward( NeuralNetwork net, double *inputs );
void backpropogate( NeuralNetwork net, double *inputs, double *desired_outputs, TrainingParameters params );
double calculate_sigmoid( double value );
double get_mean_square_error( NeuralNetwork net, double *desired_outputs );
double get_output_value( NeuralNetwork net, int index );
int get_rounded_output_value( NeuralNetwork net, int index );
double do_network_training( NeuralNetwork net, TestInputs tests, TrainingParameters params );
void apply_network_tests( NeuralNetwork net, TestInputs tests );
void kernel_feedforward( int layer_number, double *outputs, double *weights, int iw_offset, int io_offset, int io_prev_offset, int prev_layer_size, int j );
void kernel_backpropogation_backfeed_errors( int layer_number, double *outputs, double *weights, double *errors, int iw_next_offset, int io_offset, int io_next_offset, int current_layer_size, int next_layer_size, int j );
void kernel_backpropogation_apply_momentum( int layer_number, double *weights, double *cached_weights, double learning_momentum, int iw_offset, int prev_layer_size, int j );
void kernel_backpropogation_apply_rate( int layer_number, double *weights, double *cached_weights, double *outputs, double *errors, double learning_rate, int io_offset, int io_prev_offset, int iw_offset, int prev_layer_size, int j );

#endif

NNetwork.cpp

#include "NNetwork.h"

NeuralNetwork build_neural_network( NetworkLayerConfig layer_config ) {
    // Build the neural network that corresponds to the layer configuration.
    NeuralNetwork net = (NeuralNetwork) malloc( sizeof( struct NeuralNetwork ) );
    int total_neurons_needed = layer_config->total_neurons_needed_for_network();
    int total_weights_needed = layer_config->total_neuron_weights_needed_for_network();

    // Setup the basic layer layout.
    net->layer_count = layer_config->layer_count;

    // Copy the sizes of the layers.
    net->layer_sizes = (int*) malloc( sizeof( int ) * net->layer_count );
    for( int i = 0; i < net->layer_count; i++ ) {
        net->layer_sizes[i] = layer_config->layer_sizes[i];
    }

    // Set up the memory for the neuronal weights.
    // Total weight slots needed is given by the following:
    //   Sum from i = 1 to layer_count - 1 of ( layer_sizes[i - 1] + 1 ) * layer_sizes[i]
    // See total_neuron_weights_needed_for_network() for details.
    net->weights = (double*) malloc( sizeof( double ) * total_weights_needed );

    // Set up the memory for caching the neuronal weights.
    // Total weight slots needed is given by the following:
    //   Sum from i = 1 to layer_count - 1 of ( layer_sizes[i - 1] + 1 ) * layer_sizes[i]
    // See total_neuron_weights_needed_for_network() for details.
    net->cached_weights = (double*) malloc( sizeof( double ) * total_weights_needed );

    // Set up the memory for the outputs of the neurons.
    // Total output slots needed is given by the following:
    //   Sum from i = 0 to layer_count - 1 of layer_sizes[i]
    // See total_neurons_needed_for_network() for details.
    net->outputs = (double*) malloc( sizeof( double ) * total_neurons_needed );

    // Set up the memory for the errors of the neurons.
    // Total error slots needed is given by the following:
    //   Sum from i = 0 to layer_count - 1 of layer_sizes[i]
    // See total_neurons_needed_for_network() for details.
    net->errors = (double*) malloc( sizeof( double ) * total_neurons_needed );

    return net;
}

void destroy_neural_network( NeuralNetwork net ) {
    // Free the memory held by the network. The caller still frees the struct itself.

    // Delete the memory for the neuron weights.
    free( net->weights );
    // Delete the memory for the weight caching.
    free( net->cached_weights );
    // Delete the memory for the neuron outputs.
    free( net->outputs );
    // Delete the memory for the errors (differences).
    free( net->errors );
    // Finally, clear out the layer sizes.
    free( net->layer_sizes );
}

void initialize_neural_network( NeuralNetwork net ) {
    // Initialize the weights of the network with random values and zero out the cache.
    int i_offset, j_offset;

    // Seed the random number generator.
    srand( (unsigned) time( NULL ) );

    // Set the neuronal weights to random values.
    for( int i = 1; i < net->layer_count; i++ ) {
        i_offset = net->total_neuron_weights_before_layer( i );
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // This is the total number of weights in this layer prior to this neuron.
            j_offset = j * net->layer_sizes[i - 1];
            for( int k = 0; k < net->layer_sizes[i - 1] + 1; k++ ) {
                net->weights[i_offset + j_offset + k] = (double) ( rand() ) / ( RAND_MAX / 2 ) - 1;
            }
        }
    }

    // Zero out the weight cache.
    for( int i = 1; i < net->layer_count; i++ ) {
        i_offset = net->total_neuron_weights_before_layer( i );
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // This is the total number of weights in this layer prior to this neuron.
            j_offset = j * net->layer_sizes[i - 1];
            for( int k = 0; k < net->layer_sizes[i - 1] + 1; k++ ) {
                net->cached_weights[i_offset + j_offset + k] = 0.0;
            }
        }
    }
}

void feedforward( NeuralNetwork net, double *inputs ) {
    // Feed the inputs forward through the neural network until the outputs are determined.

    // Start by putting the inputs onto the input layer.
    for( int j = 0; j < net->layer_sizes[0]; j++ ) {
        net->outputs[0 + j] = inputs[j];
    }

    // Now ripple the effect of the input across the layers.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight and output offsets.
        int iw_offset = net->total_neuron_weights_before_layer( i );
        int io_offset = net->total_neurons_before_layer( i );
        int io_prev_offset = net->total_neurons_before_layer( i - 1 );

        // Apply the result to each neuron in the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_feedforward( i, net->outputs, net->weights, iw_offset, io_offset, io_prev_offset,
                                net->layer_sizes[i - 1], j );
        }
    }
}

void kernel_feedforward( int layer_number, double *outputs, double *weights, int iw_offset,
                         int io_offset, int io_prev_offset, int prev_layer_size, int j ) {
    // Do the feedforward, but model it for kernel computation.
    double weighted_sum;

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;
    // Reset the sum.
    weighted_sum = 0.0;

    // Sum the outputs from the previous layer, adjusted by the connection weights.
    for( int k = 0; k < prev_layer_size; k++ ) {
        weighted_sum += outputs[io_prev_offset + k] * weights[iw_offset + jw_offset + k];
    }

    // Now, for this neuron, set the output. The weight after the last input weight acts as the bias.
    outputs[io_offset + j] = calculate_sigmoid( weighted_sum + weights[iw_offset + jw_offset + prev_layer_size] );
}

void backpropogate( NeuralNetwork net, double *inputs, double *desired_outputs, TrainingParameters params ) {
    // Feed the inputs forward through the neural network until the outputs are determined.
    // Afterwards, turn around and adjust the neuron-connection weights so that they
    // more reliably produce the desired output.

    // Start by feeding forward the input values. This will put values onto the
    // output nodes. We can then compare these to the desired values and
    // backpropagate the changes.
    feedforward( net, inputs );

    // Calculate the error values for the output layer: o * ( 1 - o ) * ( target - o ).
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    for( int j = 0; j < net->layer_sizes[net->layer_count - 1]; j++ ) {
        net->errors[i_offset + j] = ( net->outputs[i_offset + j] * ( 1 - net->outputs[i_offset + j] ) *
                                      ( desired_outputs[j] - net->outputs[i_offset + j] ) );
    }

    // Calculate the error values for the hidden layers, working backward.
    for( int i = net->layer_count - 2; i > 0; i-- ) {
        // Figure out the layer-based weight and output/error offsets.
        int iw_next_offset = net->total_neuron_weights_before_layer( i + 1 );
        int io_offset = net->total_neurons_before_layer( i );
        int io_next_offset = net->total_neurons_before_layer( i + 1 );

        // Calculate the error for each neuron in the layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_backfeed_errors( i, net->outputs, net->weights, net->errors,
                                                    iw_next_offset, io_offset, io_next_offset,
                                                    net->layer_sizes[i], net->layer_sizes[i + 1], j );
        }
    }
    // Adjust the weights according to the learning momentum.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight offset.
        int iw_offset = net->total_neuron_weights_before_layer( i );

        // Adjust the weights for each neuron within the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_apply_momentum( i, net->weights, net->cached_weights,
                                                   params->learning_momentum, iw_offset,
                                                   net->layer_sizes[i - 1], j );
        }
    }

    // Adjust the weights according to the learning rate. Also, cache the weight
    // changes so that the next call can apply momentum to them.
    for( int i = 1; i < net->layer_count; i++ ) {
        // Figure out the layer-based weight and output/error offsets.
        int iw_offset = net->total_neuron_weights_before_layer( i );
        int io_offset = net->total_neurons_before_layer( i );
        int io_prev_offset = net->total_neurons_before_layer( i - 1 );

        // Adjust the weights for each neuron within the current layer.
        for( int j = 0; j < net->layer_sizes[i]; j++ ) {
            // Mock up the kernel computation.
            kernel_backpropogation_apply_rate( i, net->weights, net->cached_weights, net->outputs,
                                               net->errors, params->learning_rate, io_offset,
                                               io_prev_offset, iw_offset, net->layer_sizes[i - 1], j );
        }
    }
}

void kernel_backpropogation_backfeed_errors( int layer_number, double *outputs, double *weights,
                                             double *errors, int iw_next_offset, int io_offset,
                                             int io_next_offset, int current_layer_size,
                                             int next_layer_size, int j ) {
    // Do the backfeed of errors, but model it for kernel computation.
    double weighted_sum = 0.0;

    // Sum the weighted errors from the layer after the current one.
    for( int k = 0; k < next_layer_size; k++ ) {
        // Figure out the neuron-based weight offset.
        int kw_offset = k * current_layer_size;
        weighted_sum += errors[io_next_offset + k] * weights[iw_next_offset + j + kw_offset];
    }

    // Set the error.
    errors[io_offset + j] = outputs[io_offset + j] * ( 1 - outputs[io_offset + j] ) * weighted_sum;
}

void kernel_backpropogation_apply_momentum( int layer_number, double *weights, double *cached_weights,
                                            double learning_momentum, int iw_offset,
                                            int prev_layer_size, int j ) {
    // Apply the momentum to the weights, but model it for kernel computation.

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;

    for( int k = 0; k < prev_layer_size; k++ ) {
        weights[iw_offset + jw_offset + k] += ( learning_momentum * cached_weights[iw_offset + jw_offset + k] );
    }
}

void kernel_backpropogation_apply_rate( int layer_number, double *weights, double *cached_weights,
                                        double *outputs, double *errors, double learning_rate,
                                        int io_offset, int io_prev_offset, int iw_offset,
                                        int prev_layer_size, int j ) {
    // Apply the learning rate to the weights, but model it for kernel computation.

    // Figure out the neuron-based weight offset.
    int jw_offset = j * prev_layer_size;

    for( int k = 0; k < prev_layer_size; k++ ) {
        // Cache this step so that the next iteration can apply momentum to it.
        cached_weights[iw_offset + jw_offset + k] = ( learning_rate * errors[io_offset + j] * outputs[io_prev_offset + k] );
        weights[iw_offset + jw_offset + k] += cached_weights[iw_offset + jw_offset + k];
    }
}

double calculate_sigmoid( double value ) {
    // Calculate the logistic sigmoid function for the value.
    return 1.0 / ( 1.0 + exp( -value ) );
}

double get_mean_square_error( NeuralNetwork net, double *desired_outputs ) {
    // Get the error of the network based upon the desired outputs:
    // half the sum of the squared differences over the output layer.
    double error = 0;

    // Sum the error up from the output layer.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    for( int j = 0; j < net->layer_sizes[net->layer_count - 1]; j++ ) {
        error += ( ( desired_outputs[j] - net->outputs[i_offset + j] ) *
                   ( desired_outputs[j] - net->outputs[i_offset + j] ) );
    }

    return error / 2;
}

double get_output_value( NeuralNetwork net, int index ) {
    // Return the specified output value from the network.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    return net->outputs[i_offset + index];
}
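The update rules that the functions above spread across flat arrays and offsets can be restated per neuron and per weight. The following standalone C++ sketch is illustrative only. The names `sigmoid`, `output_delta`, `hidden_delta` and `update_weight` are hypothetical and are not part of the project's code. It follows the order used in `backpropogate()`: first the cached previous step is re-applied, scaled by the momentum, and then a new learning-rate step is taken and cached.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, the same function calculate_sigmoid() computes.
double sigmoid( double x ) {
    return 1.0 / ( 1.0 + std::exp( -x ) );
}

// Error term for an output neuron: o * ( 1 - o ) * ( target - o ).
double output_delta( double output, double target ) {
    return output * ( 1.0 - output ) * ( target - output );
}

// Error term for a hidden neuron: o * ( 1 - o ) * sum_k( delta_k * w_k ), where
// w_k is the weight on the connection from this neuron into neuron k of the
// next layer and delta_k is that neuron's error term.
double hidden_delta( double output, const std::vector<double> &next_deltas,
                     const std::vector<double> &outgoing_weights ) {
    assert( next_deltas.size() == outgoing_weights.size() );
    double weighted_sum = 0.0;
    for( std::size_t k = 0; k < next_deltas.size(); k++ )
        weighted_sum += next_deltas[k] * outgoing_weights[k];
    return output * ( 1.0 - output ) * weighted_sum;
}

// One weight update: re-apply the cached previous step scaled by the momentum,
// then take a new step of rate * delta * input. The new step is returned so
// that the caller can cache it for the next update.
double update_weight( double &weight, double cached_step, double learning_rate,
                      double learning_momentum, double delta, double input ) {
    weight += learning_momentum * cached_step;
    double step = learning_rate * delta * input;
    weight += step;
    return step;
}
```

For example, an output neuron at 0.5 with a target of 1.0 has an error term of 0.5 * 0.5 * 0.5 = 0.125. Because each weight's update depends only on its own cached step, the per-weight form gives the same result as the listing's two separate passes, one for momentum and one for the learning rate.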
int get_rounded_output_value( NeuralNetwork net, int index ) {
    // Return the specified output value from the network, rounded to the nearest integer.
    int i_offset = net->total_neurons_before_layer( net->layer_count - 1 );
    // Adding 0.5 before floor() rounds to nearest; floor() alone would map
    // every sigmoid output below 1.0 to 0.
    return (int) floor( net->outputs[i_offset + index] + 0.5 );
}

double do_network_training( NeuralNetwork net, TestInputs tests, TrainingParameters params ) {
    // Iteratively train the neural network and report on the progress.
    double error = 0.0;
    long iteration = 0, total_iterations = 0;
    float backprop_runtime, total_backprop_runtime = 0, runtime, total_runtime = 0;

    cout << endl << "Training the network:" << endl;

    for ( iteration = 0; iteration < params->training_max_iterations; iteration++ ) {
        // Set up to record the time (clock() ticks converted to milliseconds).
        runtime = ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) );
        total_iterations += 1;

        // Train through backpropagation, cycling through the training inputs.
        backprop_runtime = ( clock() / (float) ( CLOCKS_PER_SEC / 1000 ) );
        backpropogate( net, tests->input_values[iteration % tests->test_count],
                       tests->desired_output_values[iteration % tests->test_count], params );
        total_backprop_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - backprop_runtime );

        // How bad is the error?
        error = get_mean_square_error( net, tests->desired_output_values[iteration % tests->test_count] );
        if( error < params->training_threshold ) {
            cout << "Network has been trained. It took " << iteration << " iterations." << endl;
            cout << "Final error is " << error << endl << endl;
            break;
        }

        // Report on the training process.
        if ( iteration % ( params->training_max_iterations / 10 ) == 0 ) {
            cout << "Current error is " << error << ". Continuing with training..."
                 << endl;
        }

        // Add to the total runtime.
        total_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - runtime );
    }

    if ( iteration == params->training_max_iterations ) {
        error = get_mean_square_error( net, tests->desired_output_values[(iteration - 1) % tests->test_count] );
        cout << "Maximum of " << iteration << " iterations completed with error of " << error << endl;
    }

    // Write out the time for backpropagation.
    cout << endl << "Total time in backpropagation: " << setiosflags( ios::fixed ) << setprecision( 5 )
         << ( total_backprop_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per backpropagation: " << setiosflags( ios::fixed ) << setprecision( 7 )
         << ( total_backprop_runtime / total_iterations ) << " milliseconds" << endl << endl;
    cout << "Total time iterating: " << setiosflags( ios::fixed ) << setprecision( 5 )
         << ( total_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per iteration: " << setiosflags( ios::fixed ) << setprecision( 7 )
         << ( total_runtime / total_iterations ) << " milliseconds" << endl << endl;

    // Return the final training error.
    return error;
}

void apply_network_tests( NeuralNetwork net, TestInputs tests ) {
    // Apply the tests to the neural network. Report on the success or failure.
    int total_iterations = 0;
    float feedforward_runtime, total_feedforward_runtime = 0, runtime, total_runtime = 0;

    for ( int test_index = 0; test_index < tests->test_count; test_index++ ) {
        // Set up to record the time.
        runtime = ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) );
        total_iterations += 1;

        // Start by feeding forward the provided test inputs.
        feedforward_runtime = ( clock() / (float) ( CLOCKS_PER_SEC / 1000 ) );
        feedforward( net, tests->input_values[test_index] );
        total_feedforward_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - feedforward_runtime );

        // Now, report what the expected output is.
        cout << "For test input " << ( test_index + 1 ) << endl;
        cout << " Expected = ";
        for( int i = 0; i < tests->output_value_size; i++ ) {
            cout << (int) tests->desired_output_values[test_index][i];
        }
        cout << endl;

        // Finally, report what the actual output was.
        cout << " Received = ";
        for( int i = 0; i < tests->output_value_size; i++ ) {
            cout << get_rounded_output_value( net, i );
        }
        cout << endl << endl;

        total_runtime += ( ( clock() / (double) ( CLOCKS_PER_SEC / 1000 ) ) - runtime );
    }

    // Write out the time for feedforward.
    cout << endl << "Total time in feedforward: " << setiosflags( ios::fixed ) << setprecision( 7 )
         << ( total_feedforward_runtime / 1000 ) << " seconds " << endl;
    cout << "Average time per feedforward: " << setiosflags( ios::fixed ) << setprecision( 9 )
         << ( total_feedforward_runtime / total_iterations ) << " milliseconds " << endl << endl;
    cout << "Total time iterating: " << setiosflags( ios::fixed ) << setprecision( 5 )
         << ( total_runtime / 1000 ) << " seconds" << endl;
    cout << "Average time per iteration: " << setiosflags( ios::fixed ) << setprecision( 7 )
         << ( total_runtime / total_iterations ) << " milliseconds" << endl << endl;
}

Common To All Vector Versions

RunNNetwork.cu

#include "NNetwork.h"
#include "NNetworkUtils.h"
#include "NNetworkCuda.h"

bool check_command_line( int argc, char* argv[] ) {
    // Make sure that the correct arguments were passed.
    FILE *fp = NULL;
    bool ok = true;

    if( argc < 3 ) {
        cout << "Format: RunNNetwork <network configuration file> <network test file>" << endl;
        cout << "Arguments:" << endl;
        cout << " network configuration file - This file should contain parameters for" << endl;
        cout << " network size, training rate, etc as " << endl;
        cout << " a set of data to train the network" << endl;
        cout << " network test file - This file should contain data for testing the " << endl;
        cout << " network after it has been trained" << endl;
        ok = false;
    }
    else {
        // Make sure that the configuration file exists.
        if( ( fp = fopen( argv[1], "r" ) ) != NULL ) {
            fclose( fp );
        }
        else {
            cout << "Specified network configuration file [" << argv[1] << "] doesn't exist or cannot be opened" << endl;
            ok = false;
        }

        // Make sure that the test file exists.
        if( ( fp = fopen( argv[2], "r" ) ) != NULL ) {
            fclose( fp );
        }
        else {
            cout << "Specified network test file [" << argv[2] << "] doesn't exist or cannot be opened" << endl;
            ok = false;
        }
    }

    return ok;
}

int main( int argc, char* argv[] ) {
    // Main function for running the network.

    // First make sure that the user has provided the necessary input.
    if ( !check_command_line( argc, argv ) ) {
        return 1;
    }

    // If we received a 3rd argument, then it must be the GPU number.
    if ( argc == 4 ) {
        // Select the GPU that was called for.
        selectgpubynumber( argv[3] );
    }

    // Make sure that CUDA resources get cleaned up on exit.
    atexit( cleanupcuda );

    // Read in the network configuration.
    NNetworkConfig nnc = read_network_configuration( argv[1] );

    // Read in the network tests.
    TestInputs tests = read_network_tests( argv[2], nnc->layer_config->input_layer_size(),
                                           nnc->layer_config->output_layer_size() );

    // Build the neural network.
    NeuralNetwork net = build_neural_network( nnc->layer_config );

    // Initialize the network to begin with.
    initialize_neural_network( net );

    // Train the network.
    do_network_training( net, nnc->tests, nnc->params );

    // Test the network and report on results.
    cout << "Applying test data to network:" << endl;
    apply_network_tests( net, tests );

    // Free up the memory associated with the neural network.
    destroy_neural_network( net );
    free( net );

    return 0;
}

NNetworkCuda.h

#ifndef nnetworkcuda_h
#define nnetworkcuda_h

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>

#define HANDLE_ERROR( err ) ( HandleError( err, __FILE__, __LINE__ ) )

// Prototypes
void HandleError( cudaError_t err, const char *file, int line );
void checkcudaerror( const char *msg, bool exitonerror );
void selectgpubynumber( char *device_number );
void cleanupcuda( void );

#endif

NNetworkCuda.cu

#include "NNetworkCuda.h"

void HandleError( cudaError_t err, const char *file, int line ) {
    // Handle and report on CUDA errors.
    if ( err != cudaSuccess ) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ), file, line );
        exit( EXIT_FAILURE );
    }
}

void checkcudaerror( const char *msg, bool exitonerror ) {
    // Check for a CUDA error and print the result if appropriate.
    cudaError_t err = cudaGetLastError();
    if( cudaSuccess != err ) {
        fprintf( stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err ) );
        if ( exitonerror ) {
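The listing locates each weight by adding a per-neuron offset to `total_neuron_weights_before_layer( i )`. Below is a minimal sketch of one flat, layer-major layout. It assumes each neuron's row is `layer_sizes[i - 1] + 1` entries wide: one weight per input plus a trailing bias. That width matches the bias-inclusive count in `total_neuron_weights_needed_for_network()`. The helper names `weights_before_layer` and `weight_index` are hypothetical and do not come from the project.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat layout: layer i (i >= 1) stores one row per neuron, and each
// row holds layer_sizes[i - 1] input weights followed by a single bias weight.

// Weight slots used by all layers before `layer`. Passing
// layer == layer_sizes.size() yields the total for the whole network.
int weights_before_layer( const std::vector<int> &layer_sizes, int layer ) {
    assert( layer >= 1 && layer <= (int) layer_sizes.size() );
    int total = 0;
    for( int i = 1; i < layer; i++ )
        total += ( layer_sizes[i - 1] + 1 ) * layer_sizes[i];
    return total;
}

// Flat index of weight k of neuron j in `layer`; k == layer_sizes[layer - 1]
// selects that neuron's bias weight.
int weight_index( const std::vector<int> &layer_sizes, int layer, int j, int k ) {
    assert( layer >= 1 && layer < (int) layer_sizes.size() );
    int prev = layer_sizes[layer - 1];
    assert( j >= 0 && j < layer_sizes[layer] );
    assert( k >= 0 && k <= prev );
    // Rows are prev + 1 wide, so consecutive neurons never share a slot.
    return weights_before_layer( layer_sizes, layer ) + j * ( prev + 1 ) + k;
}
```

For a 2-3-1 network, the hidden layer takes ( 2 + 1 ) * 3 = 9 slots and the output layer ( 3 + 1 ) * 1 = 4 slots, for 13 in total. The output neuron's bias lands in the last slot, index 12. With rows `prev + 1` wide, the per-neuron stride is `prev + 1` rather than `prev`.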
More informationESC101N: Fundamentals of Computing End-sem st semester
ESC101N: Fundamentals of Computing End-sem 2010-11 1st semester Instructor: Arnab Bhattacharya 8:00-11:00am, 15th November, 2010 Instructions 1. Please write your name, roll number and section below. 2.
More informationBuilding on the foundation. Now that we know a little about cout cin math operators boolean operators making decisions using if statements
Chapter 5 Looping Building on the foundation Now that we know a little about cout cin math operators boolean operators making decisions using if statements Advantages of Computers Computers are really
More informationIntroduction to GPU Computing. Design and Analysis of Parallel Algorithms
Introduction to GPU Computing Design and Analysis of Parallel Algorithms Sources CUDA Programming Guide (3.2) CUDA Best Practices Guide (3.2) CUDA Toolkit Reference Manual (3.2) CUDA SDK Examples Part
More informationChapter 3 - Functions
Chapter 3 - Functions 1 Outline 3.1 Introduction 3.2 Program Components in C++ 3.3 Math Library Functions 3.4 Functions 3.5 Function Definitions 3.6 Function Prototypes 3.7 Header Files 3.8 Random Number
More informationCPSC 427: Object-Oriented Programming
CPSC 427: Object-Oriented Programming Michael J. Fischer Lecture 10 October 1, 2018 CPSC 427, Lecture 10, October 1, 2018 1/20 Brackets Example (continued from lecture 8) Stack class Brackets class Main
More informationTutorial 13 Salary Survey Application: Introducing One- Dimensional Arrays
Tutorial 13 Salary Survey Application: Introducing One- Dimensional Arrays Outline 13.1 Test-Driving the Salary Survey Application 13.2 Introducing Arrays 13.3 Declaring and Initializing Arrays 13.4 Constructing
More informationLab 6. Review of Variables, Formatting & Loops By: Dr. John Abraham, Professor, UTPA
Variables: Lab 6 Review of Variables, Formatting & Loops By: Dr. John Abraham, Professor, UTPA We learned that a variable is a name assigned to the first byte of the necessary memory to store a value.
More informationMy malloc: mylloc and mhysa. Johan Montelius HT2016
1 Introduction My malloc: mylloc and mhysa Johan Montelius HT2016 So this is an experiment where we will implement our own malloc. We will not implement the world s fastest allocator, but it will work
More informationThe American University in Cairo Department of Computer Science & Engineering CSCI &09 Dr. KHALIL Exam-I Fall 2011
The American University in Cairo Department of Computer Science & Engineering CSCI 106-07&09 Dr. KHALIL Exam-I Fall 2011 Last Name :... ID:... First Name:... Form I Section No.: EXAMINATION INSTRUCTIONS
More informationInteger Data Types. Data Type. Data Types. int, short int, long int
Data Types Variables are classified according to their data type. The data type determines the kind of information that may be stored in the variable. A data type is a set of values. Generally two main
More informationThe following program computes a Calculus value, the "trapezoidal approximation of
Multicore machines and shared memory Multicore CPUs have more than one core processor that can execute instructions at the same time. The cores share main memory. In the next few activities, we will learn
More informationCSCI-243 Exam 2 Review February 22, 2015 Presented by the RIT Computer Science Community
CSCI-43 Exam Review February, 01 Presented by the RIT Computer Science Community http://csc.cs.rit.edu C Preprocessor 1. Consider the following program: 1 # include 3 # ifdef WINDOWS 4 # include
More informationCA341 - Comparative Programming Languages
CA341 - Comparative Programming Languages David Sinclair Dynamic Data Structures Generally we do not know how much data a program will have to process. There are 2 ways to handle this: Create a fixed data
More informationCS 376b Computer Vision
CS 376b Computer Vision 09 / 25 / 2014 Instructor: Michael Eckmann Today s Topics Questions? / Comments? Enhancing images / masks Cross correlation Convolution C++ Cross-correlation Cross-correlation involves
More informationC Syntax Arrays and Loops Math Strings Structures Pointers File I/O. Final Review CS Prof. Jonathan Ventura. Prof. Jonathan Ventura Final Review
CS 2060 Variables Variables are statically typed. Variables must be defined before they are used. You only specify the type name when you define the variable. int a, b, c; float d, e, f; char letter; //
More information1- Write a single C++ statement that: A. Calculates the sum of the two integrates 11 and 12 and outputs the sum to the consol.
1- Write a single C++ statement that: A. Calculates the sum of the two integrates 11 and 12 and outputs the sum to the consol. B. Outputs to the console a floating point number f1 in scientific format
More informationReview Topics. Final Exam Review Slides
Review Topics Final Exam Review Slides!! Transistors and Gates! Combinational Logic! LC-3 Programming!! Original slides from Gregory Byrd, North Carolina State University Modified slides by Chris Wilcox,
More informationChapter Four: Loops II
Chapter Four: Loops II Slides by Evan Gallagher & Nikolay Kirov Chapter Goals To understand nested loops To implement programs that read and process data sets To use a computer for simulations Processing
More informationECE264 Fall 2013 Exam 3, November 20, 2013
ECE264 Fall 2013 Exam 3, November 20, 2013 In signing this statement, I hereby certify that the work on this exam is my own and that I have not copied the work of any other student while completing it.
More informationReference operator (&)
Pointers Each cell can be easily located in the memory because it has a unique address and all the memory cells follow a successive pattern. For example, if we are looking for cell 1776 we know that it
More informationGPU Programming. Rupesh Nasre.
GPU Programming Rupesh Nasre. http://www.cse.iitm.ac.in/~rupesh IIT Madras July 2017 Debugging Debugging parallel programs is difficult. Non-determinism due to thread-scheduling Output can be different
More informationTwo s Complement Review. Two s Complement Review. Agenda. Agenda 6/21/2011
Two s Complement Review CS 61C: Great Ideas in Computer Architecture (Machine Structures) Introduction to C (Part I) Instructor: Michael Greenbaum http://inst.eecs.berkeley.edu/~cs61c/su11 Suppose we had
More informationWhen you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to.
Refresher When you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to. i.e. char *ptr1 = malloc(1); ptr1 + 1; // adds 1 to pointer
More informationECE264 Fall 2013 Exam 2, October 24, 2013
ECE Fall 0 Exam, October, 0 If this is an on-line exam, you have 0 minutes to finish the exam. When the time limit is reached, the system will automatically close. If this is a paper exam, you have 0 minutes.
More informationMultiple Choice (Questions 1 13) 26 Points Select all correct answers (multiple correct answers are possible)
Name Closed notes, book and neighbor. If you have any questions ask them. Notes: Segment of code necessary C++ statements to perform the action described not a complete program Program a complete C++ program
More informationCS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS
CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in
More informationVariables. Data Types.
Variables. Data Types. The usefulness of the "Hello World" programs shown in the previous section is quite questionable. We had to write several lines of code, compile them, and then execute the resulting
More informationProgramming in C. Pointers and Arrays
Programming in C Pointers and Arrays NEXT SET OF SLIDES FROM DENNIS FREY S FALL 2011 CMSC313 http://www.csee.umbc.edu/courses/undergraduate/313/fall11/" Pointers and Arrays In C, there is a strong relationship
More informationCUDA Programming Model
CUDA Xing Zeng, Dongyue Mou Introduction Example Pro & Contra Trend Introduction Example Pro & Contra Trend Introduction What is CUDA? - Compute Unified Device Architecture. - A powerful parallel programming
More informationFor personnal use only
Inverting Large Images Using CUDA Finnbarr P. Murphy (fpm@fpmurphy.com) This is a simple example of how to invert a very large image, stored as a vector using nvidia s CUDA programming environment and
More informationOptimizing CUDA for GPU Architecture. CSInParallel Project
Optimizing CUDA for GPU Architecture CSInParallel Project August 13, 2014 CONTENTS 1 CUDA Architecture 2 1.1 Physical Architecture........................................... 2 1.2 Virtual Architecture...........................................
More informationA Crash Course in C. Steven Reeves
A Crash Course in C Steven Reeves This class will rely heavily on C and C++. As a result this section will help students who are not familiar with C or who need a refresher. By the end of this section
More informationCopyright 2013 Thomas W. Doeppner. IX 1
Copyright 2013 Thomas W. Doeppner. IX 1 If we have only one thread, then, no matter how many processors we have, we can do only one thing at a time. Thus multiple threads allow us to multiplex the handling
More informationBIL 104E Introduction to Scientific and Engineering Computing. Lecture 4
BIL 104E Introduction to Scientific and Engineering Computing Lecture 4 Introduction Divide and Conquer Construct a program from smaller pieces or components These smaller pieces are called modules Functions
More informationNon-numeric types, boolean types, arithmetic. operators. Comp Sci 1570 Introduction to C++ Non-numeric types. const. Reserved words.
, ean, arithmetic s s on acters Comp Sci 1570 Introduction to C++ Outline s s on acters 1 2 3 4 s s on acters Outline s s on acters 1 2 3 4 s s on acters ASCII s s on acters ASCII s s on acters Type: acter
More informationGPU 1. CSCI 4850/5850 High-Performance Computing Spring 2018
GPU 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationCOMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE
COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE VANYA YANEVA Ajitha Rajan, Christophe Dubach ISSTA 2017 10 July 2017 Santa Barbara, CA EMBEDDED SOFTWARE IS EVERYWHERE ITS SAFETY AND CORRECTNESS
More informationC: How to Program. Week /Apr/23
C: How to Program Week 9 2007/Apr/23 1 Review of Chapters 1~5 Chapter 1: Basic Concepts on Computer and Programming Chapter 2: printf and scanf (Relational Operators) keywords Chapter 3: if (if else )
More informationComputing and Statistical Data Analysis Lecture 3
Computing and Statistical Data Analysis Lecture 3 Type casting: static_cast, etc. Basic mathematical functions More i/o: formatting tricks Scope, namspaces Functions 1 Type casting Often we need to interpret
More informationtoday cs3157-fall2002-sklar-lect05 1
today homework #1 due on monday sep 23, 6am some miscellaneous topics: logical operators random numbers character handling functions FILE I/O strings arrays pointers cs3157-fall2002-sklar-lect05 1 logical
More informationProject 1: Convex hulls and line segment intersection
MCS 481 / David Dumas / Spring 2014 Project 1: Convex hulls and line segment intersection Due at 10am on Monday, February 10 0. Prerequisites For this project it is expected that you already have CGAL
More informationCSE 333 Midterm Exam Sample Solution 7/28/14
Question 1. (20 points) C programming. For this question implement a C function contains that returns 1 (true) if a given C string appears as a substring of another C string starting at a given position.
More information