Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Similar documents
Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Cluster Analysis of Electrical Behavior

The Codesign Challenge

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

A New Approach For the Ranking of Fuzzy Sets With Different Heights

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Assembler. Building a Modern Computer From First Principles.

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Parallel matrix-vector multiplication

Tuning of Fuzzy Inference Systems Through Unconstrained Optimization Techniques

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Classifying Acoustic Transient Signals Using Artificial Intelligence

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Shadowed Type-2 Fuzzy Logic Systems

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Centroid Density of Interval Type-2 Fuzzy Sets: Comparing Stochastic and Deterministic Defuzzification

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Lecture 5: Multilayer Perceptrons

(1) The control processes are too complex to analyze by conventional quantitative techniques.

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

A Binarization Algorithm specialized on Document Images and Photos

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Wishing you all a Total Quality New Year!

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

AC : TEACHING SPREADSHEET-BASED NUMERICAL ANAL- YSIS WITH VISUAL BASIC FOR APPLICATIONS AND VIRTUAL IN- STRUMENTS

Chinese Word Segmentation based on the Improved Particle Swarm Optimization Neural Networks

Meta-heuristics for Multidimensional Knapsack Problems

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Research Article A Proposal to Speed up the Computation of the Centroid of an Interval Type-2 Fuzzy Set

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Performance Evaluation of an ANFIS Based Power System Stabilizer Applied in Multi-Machine Power Systems

FUZZY LOGIC FUNDAMENTALS

Support Vector Machines

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm

Optimal Design of Nonlinear Fuzzy Model by Means of Independent Fuzzy Scatter Partition

A fast algorithm for color image segmentation

Heterogeneous Parallel Computing: from Clusters of Workstations to Hierarchical Hybrid Platforms

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Area Efficient Self Timed Adders For Low Power Applications in VLSI

Research Article GPU Acceleration of Melody Accurate Matching in Query-by-Humming

FPGA-based implementation of circular interpolation

THE low-density parity-check (LDPC) code is getting

Enhanced AMBTC for Image Compression using Block Classification and Interpolation

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Novel Fuzzy logic Based Edge Detection Technique

Related-Mode Attacks on CTR Encryption Mode

Study on Fuzzy Models of Wind Turbine Power Curve

A Parallelization Design of JavaScript Execution Engine

Efficient Distributed File System (EDFS)

Concurrent Apriori Data Mining Algorithms

Maintaining temporal validity of real-time data on non-continuously executing resources

Solving two-person zero-sum game by Matlab

Smoothing Spline ANOVA for variable screening

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

S1 Note. Basis functions.

Virtual Machine Migration based on Trust Measurement of Computer Node

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering

Fuzzy Model Identification Using Support Vector Clustering Method

THE FUZZY GROUP METHOD OF DATA HANDLING WITH FUZZY INPUTS. Yuriy Zaychenko

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Local Quaternary Patterns and Feature Local Quaternary Patterns

Hybrid Non-Blind Color Image Watermarking

CHAPTER 4 PARALLEL PREFIX ADDER

ELEC 377 Operating Systems. Week 6 Class 3

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

Wavefront Reconstructor

X- Chart Using ANOM Approach

MARS: Still an Alien Planet in Soft Computing?

FUZZY NEURAL NETWORK MODEL FOR MARK-UP ESTIMATION. Chao Li-Chung and Liu Min. School of Building and Real Estate, National University of Singapore,

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

The Research of Support Vector Machine in Agricultural Data Classification

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Module Management Tool in Software Development Organizations

The Shortest Path of Touring Lines given in the Plane

Modeling, Manipulating, and Visualizing Continuous Volumetric Data: A Novel Spline-based Approach

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

An Application of Fuzzy c-means Clustering to FLC Design for Electric Ceramics Kiln

Hierarchical clustering for gene expression data analysis

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

Transcription:

Speedup of Type-1 Fuzzy Logc Systems on Graphcs Processng Unts Usng CUDA Durlabh Chauhan 1, Satvr Sngh 2, Sarabjeet Sngh 3 and Vjay Kumar Banga 4 1,2 Department of Electroncs & Communcaton Engneerng, SBS State Techncal Campus, Ferozepur 152004, (Punjab) Inda 3 Department of Computer Scence & Engneerng, SBS State Techncal Campus, Ferozepur 152004, (Punjab) Inda 4 Department of Electroncs & Communcaton Engneerng, Amrtsar College of Engg. & Tech., Amrtsar 143001, (Punjab) Inda E-mal: 1 er.durlabh@gmal.com, 2 drsatvr.n@gmal.com, 3 sarabjeet_sngh13@yahoo.co, 4 v_banga@redffmal.com Abstract Parallelcomputng s one of sgnfcant components of the Hgh Performance Computng (HPC) and s beng used to solve problems, whch are large and complex n nature. Fuzzy Logc System (FLS) s a problem that becomes computatonally ntensve wth ncrease n number of nputs and/or fuzzy rules. Runnng an FLS s hghly parallel n nature, therefore, can be mplemented n parallel on GPU usng CUDA. In ths paper, varous fuzzy computatons vz. rule frng, mplcaton, aggregaton and defuzzfcaton are performed n parallel. Multple threads are run at a tme to fre multple fuzzy rules, smultaneously, that reduces the overall FLS executon tme. It s observed from smulaton results that GPU works faster as compared to CPU when ether number of nputs s ncreased or number of fuzzy rules. Keywords: Hgh Performance Computng, Fuzzy Logc Systems, General Purpose Computng on Graphcs Processng Unt, Compute Unfed Desgn Archtecture I. INTRODUCTION Artfcal Intellgence (AI) undoubtedly s the backbone of the emergng technologcal advancements, however, mplementaton nvolves ntensve computaton owng to ncreased data szes. Methods to mprove the runtme performance construe the areas of research to reduce mathematcal computatons and parallelzaton of algorthm n hardware. Graphcs Processor Unts (GPU), plays a major role n gamng and graphcs applcatons and allows for general purpose programmng on remarkably fast parallel hardware usng a Sngle Instructon Multple Data (SIMD) programmng archtecture. Fuzzy Logc, ntroduced by Zadeh [1], [2], s one of the exgent part of AI and possess nherent parallel nature. General Purpose computng on GPU (GPGPU) s targeted by many researchers to speed up complcated algorthms especally, AI algorthms and ther applcatons [3]. Ander-son et al. presented a GPU soluton for the fuzzy C-means clusterng algorthms [4]. Earler ths soluton used OpenGL and Cg (graphcs lbrares) to acheve approxmately two folds of computatonal speedup for some clusterng profles usng NVIDIA 8800 GPU. They later generalzed the system for the use of non- Eucldean metrcs [5]. Further, Sejun Km descrbes the method used to adapt a multlayer tree structure composed of fuzzy adaptve unts nto CUDA (Compute Unfed Devce Archtecture) platforms [6]. In [7], Chosa and Kolb present a framework for mesh clusterng solely mplemented on the GPU wth a new generc multlevel clusterng technque. Cha et al., have proposed the mplementaton of a zero-order TSK-Fuzzy Neural Network (FNN) on GPUs to reduce tranng tme n [8]. Harvey et al., have presented a GPU soluton for fuzzy nference system n [9]. Anderson et al., present a parallel mplementaton of fuzzy nference on GPU usng CUDA n[10]. Two folds of speed mprovement of ths naturally parallel algorthm have been acheved under typcal nference profles. One problem wth ths system and the mplementaton on GPU s that they both rely upon OpenGL and Cg lbrares, whch makes system generalzaton dffcult for new comers to GPGPU. Further, Ngo et al., report an mplementaton of Interval Type-2 FLS on GPU usng CUDA wth a tremendous speedup of 30 folds n [11]. In ths paper, parallel functonalty wth CUDA programmng model, for parallel mplementaton of a Type-1 Sugeno FLS on GPU, s nvestgated wth ncreased number of nputs and fuzzy rules. After ths bref hstorcal background rest of ths paper s outlned as follows: Secton II targets CUDA programmng model for GPGPU. Secton III ntroduces Type 1 FLS along wth the scope of parallelsm wherever possble. Secton IV presents smulaton results and a speedup comparson of seral and parallel computng usng CPU and GPU, respectvely. Fnally, secton V concludes the research workand presents future off-shoots. II. CUDA PROGRAMMING MODEL CUDA s a parallel computng platform and programmng model ntroduced by NVIDIA that ncreases computng performance substantally by harnessng the power of parallelsm of the GPU. CUDA gves program developers the drect access to the vrtual nstructon set and memory of the parallel computatonal elements n CUDA enabled GPUs. The CUDA platform s accessble to software developers through CUDA accelerated lbrares, compler drectves (such as Open ACC), and extensons to ndustry-standard programmng languages, ncludng C, C++ and FORTRAN. C/C++ programmers use

Internatonal Conference on Communcaton, Computng & Systems (ICCCS 2014) CUDAC/C++, compled wth nvcc whch s NVIDIA S LLVM-based C/C++ compler. A C/C++ program usng CUDA can nterface wth one GPU or multple GPUs and can be dentfed and utlzed n parallel, allowng for unprecedented processng power on desktop computers. CUDA allows multple kernels to be run smultaneously on GPU cores. CUDA refers to each kernel as a grd. A grd s a collecton of blocks. Each block runs the same kernel, however, s ndependent of each other (ths has sgnfcance n terms of access to memory types). A block contans threads, whch are the smallest dvsble unt on a GPU. A thread block s a number of SIMD threads that work on core at a gven tme. Threads can exchange nformaton through the shared memory and can be synchronzed. The operatons are systematzed as a grd of thread blocks. For parallel operaton the programmng model allows a developer to partton a program nto several subprograms, each of whch s executed ndependently on a block. Each subprogram can be further dvded nto fner peces that perform the same functon but execute on dfferent threads ndependently wthn the same block. For data set parallelsm, data sets can be dvded n to smaller chunks that are stored n the shared memory, and each chunk s vsble to all threads of the same block. Ths local data arrangement approach reduces the need to access off-chp global memory, whch reduces data access tme. memory reads and wrtes. Ths s typcally acheved by havng each thread read ts data from global memory and store ts content nto shared memory. The basc structure of a CUDA code comprses of allocaton of memory space (usng cuda Malloc functon) on devce (GPU) and (usng regular malloc functon) on host (CPU). Data whch s coped from the host to the devce for the call of kernel routne to be executed on the GPU (usng functon cuda Memcpy) also defnes the number of threads and ther physcal structure. Kernel s prefxed wth the global keyword. Results are transferred from GPU to CPU n the same fashon as data s coped from host to devce. III. SCOPE OF PARALLELISM IN TYPE-1 FLS Theory of FLS gven by Zadeh a fuzzy set s defned for a partcular doman, and t s characterzed by a membershp functon that maps elements from the doman to a real valued numbers [1], [2]. Mendel and many researchers gave numerous methods to desgn and mplement FLS for varous applcatons [12] [15]. Theory of FLS gven by Zadeh a fuzzy set s defned for a partcular doman, and t s characterzed by a membershp functon that maps elements from the doman to a real valued numbers. Fg. 1 CUDA Archtecture The next crtcal component of a CUDA applcaton s the memory model. There are multple types of memory and each has dfferent access tmes. The GPU s broken up nto read-wrte per thread regsters, read-wrte per thread local memory, read-wrte per-block shared memory, read-wrte per-grd global memory, read-only per-grd constant memory, and read-only per-grd texture memory. Texture and constant memory have relatvely small access latency tmes, whle global memory has the largest access latency tme. Applcatons should mnmze the number of global Fg. 2 CUDA Process Flow In ths paper every nput has been fuzzfed usng four arbtrary located Gaussan membershp functons that are characterzed by two parameters,.e., mean (m) and standard devaton (). Gaussan membershp functon s symmetrcal about ts mean and s expressed mathematcally as (1) ( x) ( x m) exp 2 2 2 (1) In ths paper, an FLS wth 4 nputs and 1 output s subjectedto parallel computaton for forecastng of Mackey-Glass tmeseres [14]. For mplcaton and aggregaton, mn and maxoperator are nvestgated. The defuzzfcaton s performedwth the heght defuzzfcaton method as symmetrcal shaped Gaussan fuzzy sets have been used to fuzzfy consequent.output of the heght defuzzfer s gven by (2) 268

Speedup of Type-1 Fuzzy Logc Systems on Graphcs Processng Unts Usng CUDA y M 1 M C ( x ) 1 ( x ) Here C represents the locaton of sngleton consequents fuzzy sets and (x ) represent clppng level for each rule after mplcaton and M represents total number of fuzzy rules. However, for mplementaton of the FLS defned above n CUDA, the foremost and the most crucal step s to allocate the memory space for the data sets to be used n the system,as the choce and format of data affects performance of thealgorthm. Scope of possble parallel computatonal processng s dscussed as follows: Harvey et al., [9] and Anderson et al., [10] n ther respectve work have gven a vast scope of parallelsm for Type-1 FLS. In the same fashon Ngo et al., [11] presented a novel scope of parallelsm for Interval Type-2 FLS. However, n all these mplementatons the emphass was lad to parallelze the number of fuzzy rules and dscrete levels for a typcal FLS wth two nputs and sngle output. So, a sngle FLS was computed wth parallel rule nference on GPU usng CUDA. However, a typcal FLS can be processed multple tmes wth fxed fuzzy rules and dscrete levels but varyng nputs. Here les our scope of parallelsm, we construe our code to compute multple FLS n parallel on GPU as a FLS serally on CPU wll consume more tme. IV. PARALLEL I MPLEMENTATION Two M N dmensonal Mean and Sgma matrces are used n ths mplementaton where M denotes number of fuzzy rules and N s number of antecedents. These matrces hold mean and sgma values of Gaussan membershp functon n accordance wth fuzzy rules used n the FLS. An M 1 dmensonal Consequent matrx contans only mean values of the consequent fuzzy sets as systems uses heght defuzzfcatonmethod gven by equaton (2). Multple nputs are provded to the systems collectvely n the form of an L N dmensonal nput matrx, X, where L s the number of nputs.the CPU executes rule frng sequentally wth a sngle nput at a tme and causes more computatonal tme. Whereas, multple threads are run at the same tme to fre multple rules smultaneously usng system matrces,.e., Mean, Sgma and Consequent matrces,whch reduce the executon tme for the FLS. A kernel functon s ntalzed from CPU to pass nputs n parallel to varous GPU cores. Copyng system matrces everytme along wth nput vectors and ncreases the GPU processng tme. Therefore, system matrces are coped only once and subsequently requre only nput vectors those are passed to GPU n parallel to enhance the GPU performance. (2) V. RESULTS DISCUSSION The speed up performance wth GPU mplementaton of a Type-1 Sugeno FLS was compared wth that of CPU mplementaton. Intel Core 2 Duo system under expermentaton has 2GB of system RAM, and Wndows 7 platform. The GPU used here works on nvidiageforce GTX 650 wth 1024 MB of texture memory, 192 stream processors, and PCI Express X16. The number of antecedents s fxed to 4 and consequentto 1, the number of rules s vared between 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 and nputs were vared as 128, 256, 512, and 1024. A set of nput data s obtaned from Mackey-Glass tme seres for expermentaton. Rato of CPU to GPU runtmes wth respect to number of fuzzy rules has been tabulated n Table I and presented graphcallyn Fg. 3. TABLE I FUZZY RULES VS. CPU TO GPU SPEEDUP RATIO CPU to GPU Runtme Speedup Rato Number of Fuzzy 128 256 512 1024 Rules Inputs Inputs Inputs Inputs 10 1.937 4.875 5.034 5.488 20 2.437 3.032 4.548 6.132 30 1.340 2.319 4.319 6.544 40 2.000 2.476 3.968 7.063 50 1.516 3.532 4.410 7.063 60 1.730 3.000 4.365 7.185 70 1.602 3.205 4.509 7.134 80 1.500 2.742 4.500 7.832 90 1.709 2.577 4.446 5.488 100 1.624 2.504 4.652 6.132 Here, t can be observed clearly that advantage of GPU computng has ncreased wth ncreased nput data vector szes. In another smulaton results, t s observed that CPU to GPU speedup tme mproves wth ncrease n fuzzy rules. Computatonal tme on CPU vares sgnfcantly due to already always runnng applcatons at the back end.therefore, to present far comparson seral and parallel tmng analyss experments wth same FLS have been repeated 30 tmes. Average of CPU to GPU speedups for 30 monte-carlosmulatons durng seral and parallel computatons of rule frng, mplcaton, aggregaton and defuzzfcaton have beenpresented n Fg. 4 7. Fg. 3 CPU to GPU Speedup Rato vs Inputs Data Length 269

Internatonal Conference on Communcaton, Computng & Systems (ICCCS 2014) The overall performance of CPU to GPU speedup tmngswth respect to collectve number of nputs, presented to the FLS, s shown n Fg. 8 that depcts that larger the number offuzzy rules better s the speedup performance. Fg. 6 CPU and GPU Runtme Comparson for 512 Inputs Fg. 4 CPU and GPU Runtme Comparson for 128 Inputs Fg. 7 CPU and GPU Runtme Comparson for 1024 Inputs Fg. 5 CPU and GPU Runtme Comparson for 256 Inputs VI. CONCLUSION AND FUTURE WORK Ths paper has demonstrated the mplementaton and comparatve runtme performances of a typcal Sugeno Type-1 FLSon a GPU and CPU wthout the use of a graphcs API whch s flexble, scalable, and can be used by any researcher wth knowledge of C. It has been demonstrated that the CPU works equally fast as GPU when the system s small. As the number of rules or the number of nputs ncrease the GPU outperforms the CPU runtme. Here, n the work nearly 7.83 tmes speedup could be acheved as 1024 nputs suppled n parallel to the FLS ported on GPU. The GPU has an ntal setup overhead of kernel loadng and memory transfer, however, subsequent parallel computatons leads to a small ncrease n processng tme despte a substantal ncrease n computatonal load. On the other hand, CPU has no ntal cost, but computaton tme grows lnearly wth computatonal load much beyond GPGPU runtme. Fg. 8 CPU to GPU Speedup Rato for Varous Input Data Sets The parallelzaton of more FLS applcatons s next on ouragenda. That wll follow mplementaton of Interval Type-2 FLSs and Generalzed Type-2 FLSs for varous applcatons that requre much more computatonal tme otherwse.gpgpu s also possbly nvestgated to on Evolutonary Algorthms those are computatonal ntensve and parallel nnature. REFERENCES [1] L.A. Zadeh, Fuzzy Sets, Informaton and Control, Vol. 8, No. 3, pp. 338 353, 1965. [2] L.A. Zadeh, Fuzzy Logc and Approxmate Reasonng, Synthese, Vol. 30, pp. 407 428, 1975. [3] S. Sngh, S. Sngh, V.K. Banga, and D. Chauhan, CUDA for GPGPU Applcatons-A Survey, n Natonal Conference on Contem-poraryTechnques & Technologes n Electroncs Engneerng, Murthal,Sonepat, Inda, March 2013, p. Accepted. 270

Speedup of Type-1 Fuzzy Logc Systems on Graphcs Processng Unts Usng CUDA [4] D.T. Anderson, R.H. Luke, and J.M. Keller, Speedup of FuzzyClusterng through Stream Processng on Graphcs Processng Unts, IEEE Transactons on Fuzzy Systems, Vol. 16, No. 4, pp. 1101 1106, 2008. [5] D. Anderson, R.H. Luke, and J.M. Keller, Incorporaton of non-eucldeandstance Metrcs nto Fuzzy Clusterng on Graphcs Processng Unts, n Analyss and Desgn of Intellgent Systems usng Soft Computng Technques. Sprnger, pp. 128 139, 2007. [6] S. Km and D. Wunsch, A GPU based Parallel Herarchcal Fuzzy ART Clusterng, n The 2011 Internatonal Jont Conference on Neural Networks (IJCNN), pp. 2778 2782, 2011. [7] I. Chosa and A. Kolb, GPU-based Multlevel Clusterng, IEEE Transactons on Vsualzaton and Computer Graphcs, Vol. 17, No. 2, pp. 132 145, 2011. [8] C.F. Juang, T.C. Chen, and W.Y. Cheng, Speedup of Implementng Fuzzy Neural Networks wth Hgh-dmensonal Inputs through Parallel Processng on Graphc Processng Unts, IEEE Trans-actons on Fuzzy Systems, Vol. 19, No. 4, pp. 717 728, 2011. [9] N. Harvey, R. Luke, J.M. Keller, and D. Anderson, Speedup of Fuzzy Logc through Stream Processng on Graphcs Processng Unts, n 2008. CEC IEEE World Congress on Computatonal Intellgence and Congress on Evolutonary Computaton, pp.3809 3815, 2008. [10] D. Anderson and S. Coupland, Parallelsaton of Fuzzy Inference on a Graphcs Processor Unt usng the Compute Unfed Desgn Archtecture, n Proceedngs of the UK Workshop on Computatonal Intellgence (UKCI 08), pp. 1 6, 2008. [11] L.T. Ngo, D.D. Nguyen, C.M. Luong et al.,., Speedup of Interval Type 2 Fuzzy Logc Systems based on GPU for Robot Navgaton, Advances n Fuzzy Systems, Vol. 2012, pp. 4, 2012. [12] J.M. Mendel, Fuzzy Logc Systems for Engneerng: A Tutoral, Proceedngs of the IEEE FUZZ, Vol. 83, No. 3, pp. 345 377, 1995. [13] G.C. Mouzours and J.M. Mendel, Non-sngleton Fuzzy Logc Systems: Theory and Applcaton, IEEE Transactons on Fuzzy Systems, Vol. 5, No. 1, pp. 56 71, 1997. [14] N.N. Karnk and J.M. Mendel, Applcatons of Type-2 Fuzzy Logc Systems to Forecastng of Tme-seres, Informaton Scences, Vol. 120, No. 1, pp. 89 111, 1999. [15] J.M. Mendel, Uncertanty, Fuzzy Logc, and Sgnal Processng, Sgnal Processng, Vol. 80, No. 6, pp. 913 933, 2000. 271