BLOCK-BASED NEURAL NETWORK MAPPING ON GRAPHICS PROCESSOR UNIT ONG CHIN TONG UNIVERSITI TEKNOLOGI MALAYSIA

Similar documents
HARDWARE/SOFTWARE SYSTEM-ON-CHIP CO-VERIFICATION PLATFORM BASED ON LOGIC-BASED ENVIRONMENT FOR APPLICATION PROGRAMMING INTERFACING TEO HONG YAP

ISOGEOMETRIC ANALYSIS OF PLANE STRESS STRUCTURE CHUM ZHI XIAN

A LEVY FLIGHT PARTICLE SWARM OPTIMIZER FOR MACHINING PERFORMANCES OPTIMIZATION ANIS FARHAN BINTI KAMARUZAMAN UNIVERSITI TEKNOLOGI MALAYSIA

HARDWARE AND SOFTWARE CO-SIMULATION PLATFORM FOR CONVOLUTION OR CORRELATION BASED IMAGE PROCESSING ALGORITHMS SAYED OMID AYAT

SUPERVISED MACHINE LEARNING APPROACH FOR DETECTION OF MALICIOUS EXECUTABLES YAHYE ABUKAR AHMED

AUTOMATIC APPLICATION PROGRAMMING INTERFACE FOR MULTI HOP WIRELESS FIDELITY WIRELESS SENSOR NETWORK

THE COMPARISON OF IMAGE MANIFOLD METHOD AND VOLUME ESTIMATION METHOD IN CONSTRUCTING 3D BRAIN TUMOR IMAGE

PRIVACY FRIENDLY DETECTION TECHNIQUE OF SYBIL ATTACK IN VEHICULAR AD HOC NETWORK (VANET) SEYED MOHAMMAD CHERAGHI

DYNAMIC TIMESLOT ALLOCATION TECHNIQUE FOR WIRELESS SENSOR NETWORK OON ERIXNO

HARDWARE-ACCELERATED LOCALIZATION FOR AUTOMATED LICENSE PLATE RECOGNITION SYSTEM CHIN TECK LOONG UNIVERSITI TEKNOLOGI MALAYSIA

AN IMPROVED PACKET FORWARDING APPROACH FOR SOURCE LOCATION PRIVACY IN WIRELESS SENSORS NETWORK MOHAMMAD ALI NASSIRI ABRISHAMCHI

ENHANCING SRAM PERFORMANCE OF COMMON GATE FINFET BY USING CONTROLLABLE INDEPENDENT DOUBLE GATES CHONG CHUNG KEONG UNIVERSITI TEKNOLOGI MALAYSIA

IMPLEMENTATION OF UNMANNED AERIAL VEHICLE MOVING OBJECT DETECTION ALGORITHM ON INTEL ATOM EMBEDDED SYSTEM

IMPROVED IMAGE COMPRESSION SCHEME USING HYBRID OF DISCRETE FOURIER, WAVELETS AND COSINE TRANSFORMATION MOH DALI MOUSTAFA ALSAYYH

OPTIMIZE PERCEPTUALITY OF DIGITAL IMAGE FROM ENCRYPTION BASED ON QUADTREE HUSSEIN A. HUSSEIN

SLANTING EDGE METHOD FOR MODULATION TRANSFER FUNCTION COMPUTATION OF X-RAY SYSTEM FARHANK SABER BRAIM UNIVERSITI TEKNOLOGI MALAYSIA

RECOGNITION OF PARTIALLY OCCLUDED OBJECTS IN 2D IMAGES ALMUASHI MOHAMMED ALI UNIVERSITI TEKNOLOGI MALAYSIA

LOGICAL OPERATORS AND ITS APPLICATION IN DETERMINING VULNERABLE WEBSITES CAUSED BY SQL INJECTION AMONG UTM FACULTY WEBSITES NURUL FARIHA BINTI MOKHTER

ENHANCING TIME-STAMPING TECHNIQUE BY IMPLEMENTING MEDIA ACCESS CONTROL ADDRESS PACU PUTRA SUARLI

COLOUR IMAGE WATERMARKING USING DISCRETE COSINE TRANSFORM AND TWO-LEVEL SINGULAR VALUE DECOMPOSITION BOKAN OMAR ALI

SECURE-SPIN WITH HASHING TO SUPPORT MOBILITY AND SECURITY IN WIRELESS SENSOR NETWORK MOHAMMAD HOSSEIN AMRI UNIVERSITI TEKNOLOGI MALAYSIA

DETECTION OF WORMHOLE ATTACK IN MOBILE AD-HOC NETWORKS MOJTABA GHANAATPISHEH SANAEI

ADAPTIVE ONLINE FAULT DETECTION ON NETWORK-ON-CHIP BASED ON PACKET LOGGING MECHANISM LOO LING KIM UNIVERSITI TEKNOLOGI MALAYSIA

SEMANTICS ORIENTED APPROACH FOR IMAGE RETRIEVAL IN LOW COMPLEX SCENES WANG HUI HUI

INTEGRATION OF CUBIC MOTION AND VEHICLE DYNAMIC FOR YAW TRAJECTORY MOHD FIRDAUS BIN MAT GHANI

ADAPTIVE LOOK-AHEAD ROUTING FOR LOW LATENCY NETWORK ON-CHIP NADERA NAJIB QAID AL AREQI UNIVERSITI TEKNOLOGI MALAYSIA

DYNAMIC MOBILE SERVER FOR LIVE CASTING APPLICATIONS MUHAMMAD SAZALI BIN HISHAM UNIVERSITI TEKNOLOGI MALAYSIA

PERFOMANCE ANALYSIS OF SEAMLESS VERTICAL HANDOVER IN 4G NETWOKS MOHAMED ABDINUR SAHAL

OPTIMIZED BURST ASSEMBLY ALGORITHM FOR MULTI-RANKED TRAFFIC OVER OPTICAL BURST SWITCHING NETWORK OLA MAALI MOUSTAFA AHMED SAIFELDEEN

AMBA AXI BUS TO NETWORK-ON-CHIP BRIDGE NG KENG YOKE UNIVERSITI TEKNOLOGI MALAYSIA

STUDY OF FLOATING BODIES IN WAVE BY USING SMOOTHED PARTICLE HYDRODYNAMICS (SPH) HA CHEUN YUEN UNIVERSITI TEKNOLOGI MALAYSIA

LINK QUALITY AWARE ROUTING ALGORITHM IN MOBILE WIRELESS SENSOR NETWORKS RIBWAR BAKHTYAR IBRAHIM UNIVERSITI TEKNOLOGI MALAYSIA

MODELLING AND REASONING OF LARGE SCALE FUZZY PETRI NET USING INFERENCE PATH AND BIDIRECTIONAL METHODS ZHOU KAIQING

ENHANCING WEB SERVICE SELECTION USING ENHANCED FILTERING MODEL AJAO, TAJUDEEN ADEYEMI

SOLUTION AND INTERPOLATION OF ONE-DIMENSIONAL HEAT EQUATION BY USING CRANK-NICOLSON, CUBIC SPLINE AND CUBIC B-SPLINE WAN KHADIJAH BINTI WAN SULAIMAN

HERMAN. A thesis submitted in fulfilment of the requirements for the award of the degree of Doctor of Philosophy (Computer Science)

IMPLEMENTATION AND PERFORMANCE ANALYSIS OF IDENTITY- BASED AUTHENTICATION IN WIRELESS SENSOR NETWORKS MIR ALI REZAZADEH BAEE

MULTICHANNEL ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING -ROF FOR WIRELESS ACCESS NETWORK MOHD JIMMY BIN ISMAIL

This item is protected by original copyright

FINITE ELEMENT INVESTIGATION ON THE STRENGTH OF SEMI-RIGID EXTENDED END PLATE STEEL CONNECTION USING LUSAS SOFTWARE MOHD MAIZIZ BIN FISHOL HAMDI

UNSTEADY AERODYNAMIC WAKE OF HELICOPTER MAIN-ROTOR-HUB ASSEMBLY ISKANDAR SHAH BIN ISHAK UNIVERSITI TEKNOLOGI MALAYSIA

A NEW STEGANOGRAPHY TECHNIQUE USING MAGIC SQUARE MATRIX AND AFFINE CIPHER WALEED S. HASAN AL-HASAN UNIVERSITI TEKNOLOGI MALAYSIA


CAMERA CALIBRATION FOR UNMANNED AERIAL VEHICLE MAPPING AHMAD RAZALI BIN YUSOFF

SMART AQUARJUM (A UTOMATIC FEEDING MACHINE) SY AFINAZ ZURJATI BINTI BAHARUDDIN


Signature :.~... Name of supervisor :.. ~NA.lf... l.?.~mk.. :... 4./qD F. Universiti Teknikal Malaysia Melaka

A TRUST MODEL FOR BUSINESS TO CUSTOMER CLOUD E-COMMERCE HOSSEIN POURTAHERI

MICRO-SEQUENCER BASED CONTROL UNIT DESIGN FOR A CENTRAL PROCESSING UNIT TAN CHANG HAI

HIGH SPEED SIX OPERANDS 16-BITS CARRY SAVE ADDER AWATIF BINTI HASHIM

DEVELOPMENT OF SPAKE S MAINTENANCE MODULE FOR MINISTRY OF DEFENCE MALAYSIA SYED ARDI BIN SYED YAHYA KAMAL UNIVERSITI TEKNOLOGI MALAYSIA

SYSTEMATIC SECURE DESIGN GUIDELINE TO IMPROVE INTEGRITY AND AVAILABILITY OF SYSTEM SECURITY ASHVINI DEVI A/P KRISHNAN

CLOUD COMPUTING ADOPTION IN BANKING SYSTEM (UTM) IN TERMS OF CUSTOMERS PERSPECTIVES SHAHLA ASADI

DATASET GENERATION AND NETWORK INTRUSION DETECTION BASED ON FLOW-LEVEL INFORMATION AHMED ABDALLA MOHAMEDALI ABDALLA

ONTOLOGY-BASED SEMANTIC HETEROGENEOUS DATA INTEGRATION FRAMEWORK FOR LEARNING ENVIRONMENT

LOCALIZING NON-IDEAL IRISES VIA CHAN-VESE MODEL AND VARIATIONAL LEVEL SET OF ACTIVE CONTOURS WITHTOUT RE- INITIALIZATION QADIR KAMAL MOHAMMED ALI

PROBLEMS ASSOCIATED WITH EVALUATION OF EXTENSION OF TIME (EOT) CLAIM IN GOVERNMENT PROJECTS

ANOMALY DETECTION IN WIRELESS SENSOR NETWORK (WSN) LAU WAI FAN

FUZZY NEURAL NETWORKS WITH GENETIC ALGORITHM-BASED LEARNING METHOD M. REZA MASHINCHI UNIVERSITI TEKNOLOGI MALAYSIA

IMPROVED IMPLEMENTATION OF DIGITAL WATERMARKING TECHNIQUES AHMED SABEEH YOUSIF UNIVERSITI TEKNOLOGI MALAYSIA

RGB COLOR IMAGE WATERMARKING USING DISCRETE WAVELET TRANSFORM DWT TECHNIQUE AND 4-BITS PLAN BY HISTOGRAM STRETCHING KARRAR ABDUL AMEER KADHIM

A SEED GENERATION TECHNIQUE BASED ON ELLIPTIC CURVE FOR PROVIDING SYNCHRONIZATION IN SECUERED IMMERSIVE TELECONFERENCING VAHIDREZA KHOUBIARI

ENHANCEMENT OF UML-BASED WEB ENGINEERING FOR METAMODELS: HOMEPAGE DEVELOPMENT CASESTUDY KARZAN WAKIL SAID

VIRTUAL PRIVATE NETWORK: ARCHITECTURE AND IMPLEMENTATIONS

RESOURCE ALLOCATION SCHEME FOR FUTURE USER-CENTRIC WIRELESS NETWORK WAHEEDA JABBAR UNIVERSITI TEKNOLOGI MALAYSIA

MAGNETIC FLUX LEAKAGE SYSTEM FOR WIRE ROPE INSPECTION USING BLUETOOTH COMMUNICATION MUHAMMAD MAHFUZ BIN SALEHHON UNIVERSITI TEKNOLOGI MALAYSIA

ARM PROCESSOR EMULATOR MOHAMAD HASRUZAIRIN B MOHD HASHIM

AN ENHANCED SIMULATED ANNEALING APPROACH FOR CYLINDRICAL, RECTANGULAR MESH, AND SEMI-DIAGONAL TORUS NETWORK TOPOLOGIES NORAZIAH BINTI ADZHAR

PANDUAN PENGGUNA (SUPPLIER) e-purchase ORDER FOR SERVICES

MINIATURIZED PLANAR CAPACITANCE TOMOGRAPHY SYSTEM USING FAN BEAM PROJECTION TECHNIQUE FOR STAGNANT SAMPLES VISUALIZATION

BORANG PENGESAHAN STATUS TESIS

DESIGN AND IMPLEMENTATION OF A MUSIC BOX USING FPGA TAN KIAN YIAK

ENERGY-EFFICIENT DUAL-SINK ALGORITHMS FOR SINK MOBILITY IN EVENT-DRIVEN WIRELESS SENSOR NETWORKS

FAKULTI TEKNOLOGI & SAINS MAKLUMAT. PROGRAM KELAYAKAN MASUK SENARAI KURSUS Sarjana Sistem Maklumat

HARDWARE/SOFTWARE SYSTEM-ON-CHIP CO-VERIFICATION PLATFORM BASED ON LOGIC-BASED ENVIRONMENT FOR APPLICATION PROGRAMMING INTERFACING TEO HONG YAP

NETWORK-ON-CHIP FLOORPLANNING AND APPLICATION MAPPING USING CROSS-ENTROPY METHOD TAN CHEE WEI UNIVERSITI TEKNOLOGI MALAYSIA

VIDEO DISTORTION MEASUREMENT USING PSNR IN WAVELET DOMAIN MOK YUNG LENG

HYBRID MEDIUM ACCESS CONTROL USING TOKEN APPROACH IN WIRELESS SENSOR NETWORK FOR HIGH TRAFFIC APPLICATIONS NOR SYAHIDATUL NADIAH BINTI ISMAIL

AN INTEGRATED SERVICE ARCHITECTURE FRAMEWORK FOR INFORMATION TECHNOLOGY SERVICE MANAGEMENT AND ENTERPRISE ARCHITECTURE

FAKULTI TEKNOLOGI & SAINS MAKLUMAT

UNIVERSITI PUTRA MALAYSIA MULTI-LEVEL MOBILE CACHE CONSISTENCY SCHEMES BASED ON APPLICATION REQUIREMENTS DOHA ELSHARIEF MAHMOUD YAGOUB

AN ENHANCED CONNECTIVITY AWARE ROUTING PROTOCOL FOR VEHICULAR AD HOC NETWORKS AHMADU MAIDORAWA

Pengguna akan diberikan Username dan Password oleh Administrator untuk login sebagai admin/conference Manager bagi conference yang akan diadakan.

ssk 2023 asas komunikasi dan rangkaian TOPIK 4.0 PENGALAMATAN RANGKAIAN Minggu 11

UNIVERSITI PUTRA MALAYSIA PERFORMANCE ENHANCEMENT OF AIMD ALGORITHM FOR CONGESTION AVOIDANCE AND CONTROL

UNIVERSITI PUTRA MALAYSIA EFFECTS OF DATA TRANSFORMATION AND CLASSIFIER SELECTIONS ON URBAN FEATURE DISCRIMINATION USING HYPERSPECTRAL IMAGERY

THREE DIMENSIONAL INTEGRATED SOFTWARE DEVELOPMENT FOR AIR-PARTICLE FLOW SIMULATION THROUGH IMAGE-BASED UPPER HUMAN AIRWAYS

DETERMINING THE MULTI-CURRENT SOURCES OF MAGNETOENCEPHALOGRAPHY BY USING FUZZY TOPOGRAPHIC TOPOLOGICAL MAPPING

INSTRUCTION: This section consists of FOUR (4) structured questions. Answer ALL questions.

UNIVERSITI SAINS MALAYSIA. CMT322/CMM323 Web Engineering & Technologies [Kejuruteraan & Teknologi Web]

BORANG PENGESAHAN STATUS TESIS

UNIVERSITI SAINS MALAYSIA. CST232 Operating Systems [Sistem Pengendalian]

ssk 2023 asas komunikasi dan rangkaian TOPIK 4.0 PENGALAMATAN RANGKAIAN

A RULE MODELING ENGINE FOR COMPLEX EVENT PROCESSING (A CASE STUDY ON PASSIVE RFID READERS FOR A VIRTUAL SHOPPING MALL)

UNIVERSITI PUTRA MALAYSIA RELIABILITY PERFORMANCE EVALUATION AND INTEGRATION OF ROUTING ALGORITHM IN SHUFFLE EXCHANGE WITH MINUS ONE STAGE

DEVELOPMENT OF VENDING MACHINE WITH PREPAID PAYMENT METHOD AMAR SAFUAN BIN ALYUSI

UNIVERSITI SAINS MALAYSIA. CST333 Distributed & Grid Computing [Perkomputeran Teragih & Grid]

PERFORMANCE EVALUATION OF LEACH PROTOCOL FOR WIRELESS SENSOR NETWORKS USING NS2 MUHAMAD FAIZ BIN RAMDZAN

COMPARISON STUDY OF NEXT GENERATION FTTH PON ARCHITECTURES MOHAMED ELMAGZOUB ABDALLA ZEINELABDIN

Panduan Guru Maker UNO/ Arduino

Transcription:

BLOCK-BASED NEURAL NETWORK MAPPING ON GRAPHICS PROCESSOR UNIT ONG CHIN TONG UNIVERSITI TEKNOLOGI MALAYSIA

BLOCK-BASED NEURAL NETWORK MAPPING ON GRAPHICS PROCESSOR UNIT ONG CHIN TONG A project report submitted in partial fulfilment of the requirements for the award of the degree of Master of Engineering (Electrical-Computer and Microelectronic System) Faculty of Electrical Engineering Universiti Teknologi Malaysia JUNE 2015

iii Dedicated, in thankful appreciation for support, encouragement and understanding to my beloved mother, father, brother and supervisor...

iv ACKNOWLEDGEMENT First and foremost, thank God for giving me the strength to complete this project. I wish to express my greatest appreciation to my supervisor, Assoc Prof Dr. Muhammad Nadzir bin Marsono for his generous guidance, advice and motivation throughout this study. His critical suggestion and comments help me to develop the study project into understandable and workable effort. Besides that, I would also like to thank Dr Vishnu P. Nambiar from Intel for providing me the suggestion of this topic. Thank you for willing to allocate his time to guide me throughout my works. Finally, acknowledgements are also for my family, especially my parents. Thank you for their support, inspiration, and encouragement during the study and completion of this thesis. Ong Chin Tong

v ABSTRACT Block-based neural network (BbNN) was introduced to improve the training speed of artificial neural network. Various works had been carried out by previous researchers to improve training speed of BbNN system. Multithread BbNN training on field-programmable gate array (FPGA) limits training speed due to low performance of Nios II software used for communication between central processing unit (CPU) and FPGA. This project aims to improve training speed of multithread BbNN block by mapping BbNN model into Compute Unified Device Architecture (CUDA) core. In this project, each BbNN block is mapped into a CUDA core with each core running on a single thread. The functional verification of BbNN core is carried out based on the BbNN output accuracy value. Near 100 percent accuracy value obtained is used to verify the CUDA mapped BbNN. The performance trade-off analysis had been carried out by comparing the accuracy value obtained from BbNN evolution on GPU versus CPU implementations. From the results obtained, it is found out that the performance of CUDA-mapped BbNN can only be as fast as CPU-mapped implementation. Although CUDA-mapped BbNN implementation run multiple BbNN blocks training in parallel, large data transfer between CPU and GPU dominates the performance gain in training multiple BbNN blocks in parallel. Besides that, a significant gain in training speed can only be seen if the order of complexity for GPU execution is at a higher order compared to the order of CPU-GPU data transfer. The result obtained in this project provides recommendation for future research works on how to further improve the training speed of CUDA-base BbNN implementation.

vi ABSTRAK Block rangkaian neural (BbNN) telah diperkenalkan untuk menyingkatkan masa pemprosesan rangkaian neural. Pelbagai kerja telah dijalankan oleh penyelidik sebelum ini untuk menyingkatkan masa pemprosesan BbNN. Prestasi multithread BbNN menggunakan field-programmable gate array (FPGA) akan dihadkan oleh prestasi perlahan daripada perisian Nios II yang digunakan untuk berkomunikasi antara central processing unit (CPU) dan FPGA. Projek ini bertujuan untuk menerokai kaedah bagi menyingkatkan masa pemprosesan dengan memetakan BbNN mengunakan teras Compute Unified Device Architecture (CUDA). Dalam projek ini, setiap blok BbNN dipetakan ke dalam teras CUDA dengan setiap teras berjalan dengan satu thread. Dengan mendapat ketepatan yang hampir kepada 100 peratus, BbNN yang dipetakan ke dalam CUDA telah disahkan betul. Perbandingan antara prestasi GPU dan CPU kemudian dijalankan dengan mendapatkan perbezaan ketepatan dan masa pemprosesan BbNN. Daripada keputusan projek ini, didapati kelajuan pemprosesan BbNN yang dipetakan ke dalam teras CUDA hanya seiras dengan masa pemprosesan BbNN CPU. Walaupun BbNN yang dipetakan ke dalam teras CUDA diprocess secara selari, prestasi masa pemprosesan CUDA telah didominasi oleh jumlah besar data yang perlu disampaikan antara CPU dan GPU. Di samping itu, peningkatan prestasi pemprosesan CUDA hanya dapat diperlihat sekiranya kerumitan pembilangan berada dalam order yang lebih tinggi daripada kerumitan data yang perlu disampaikan. Keputusan yang diperolehi daripada projek ini dapat menyediakan cadangan untuk kajian masa hadapan mengenai cara untuk meningkatkan lagi prestasi BbNN yang dipetakan ke dalam teras CUDA.