Computer and Machine Vision

1 Computer and Machine Vision Lecture Week 3 Part-2 January 27, 2014 Sam Siewert

2 Resource Scaling
  Processing
    Co-Processors
      GPU: CUDA, OpenCL
      Many-Core: e.g. Intel Xeon Phi MICA
      FPGA: e.g. Altera Stratix
        Ideally Camera Interface
  I/O
    High Rate Transport: HD-SDI, Camera Link, GigE/10GE
  Memory
    SSD, PCIe NAND, NVM: FusionIO, Micron, Intel
    Memristor (Future)

3 SIMD Vector Instructions
  Intel MMX, SSE 1, 2, 3, 4.x
  Code Generation Using SIMD Extensions to Accelerate Algorithms
  Example: Edge Enhancement with a PSF (Point Spread Function)
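The edge-enhancement PSF mentioned above can be sketched as a 3x3 convolution. This is a plain scalar reference, the kind of inner loop that SIMD code generation targets; the function name psf_sharpen, the gain parameter k, and the kernel weights (center k+1, eight neighbours -k/8, so the weights sum to 1) are assumptions chosen for illustration, not values taken from the slides.

```c
#include <stdint.h>

/* Sharpen an 8-bit grayscale image with a 3x3 PSF.
 * Assumed kernel: center weight k+1, each of the 8 neighbours -k/8.
 * Border pixels are copied unchanged because they lack a full neighbourhood. */
void psf_sharpen(const uint8_t *in, uint8_t *out, int w, int h, double k)
{
    const double center = k + 1.0;
    const double edge = -k / 8.0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[y * w + x] = in[y * w + x];
                continue;
            }
            double acc = 0.0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    acc += ((dx == 0 && dy == 0) ? center : edge)
                           * in[(y + dy) * w + (x + dx)];
            /* Clamp to the 8-bit range, then round to nearest. */
            if (acc < 0.0) acc = 0.0;
            if (acc > 255.0) acc = 255.0;
            out[y * w + x] = (uint8_t)(acc + 0.5);
        }
    }
}
```

Because the weights sum to 1, a flat region passes through unchanged, while a pixel brighter than its neighbours is pushed further from them.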

4 Offload, Co-Proc, Vector Proc
  1. GPU (Graphics Processing Units)
     Evolved for Consumer CGI and Games
       Physics Engines
       3D Rendering + Texture (4D Vector Operations)
       Game Engines and Simulation
     HD Output: HDMI, HD-SDI, Headless GP-GPU
     Higher End Used for Digital Cinema / Post Production, Broadcast
       PNY Quadro FX
       NVIDIA CUDA for Post
     GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc.
  Built-In SIMD Instruction Set Extensions
     Intel SSE

5 GP-GPU, What Is It?
  Ideal for Large Bitwise, Integer, and Floating Point Vector Math
  Flynn's Taxonomy: SIMD Architecture often leverages GP-GPU Co-Processors, or Cell for MPMD

  Flynn's Taxonomy (instruction streams vs. data streams)
    Single Instruction/Prog, Single Data:   SISD (Traditional Uniprocessor)
    Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU
    Multiple Instruction, Single Data:      MISD (Voting schemes and active-active controllers)
    Multiple Instruction, Multiple Data:    MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

6 SSE: Streaming SIMD Extensions
  128-bit registers known as XMM0 through XMM7
  Large Operands and Operators (Multi-Word), e.g. 128-bit XOR of Two Operands
  Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
    E.g. 4 Component Vector addition
    4 Single Precision Pixel Multiply and Accumulate in Single Instruction

  Scalar C: 16 operations to load 2 operands, add, store
    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

  SSE: 3 operations to load, add, store
    movaps xmm0, address-of-v1         ;xmm0=v1.w v1.z v1.y v1.x
    addps  xmm0, address-of-v2         ;xmm0=v1.w+v2.w v1.z+v2.z v1.y+v2.y v1.x+v2.x
    movaps address-of-vec_res, xmm0
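The same 4-component addition can be written from C with the SSE intrinsics in xmmintrin.h instead of hand-written assembly. The sketch below is illustrative only: the function name vec4_add is hypothetical, it uses the unaligned load/store intrinsics (movaps requires 16-byte-aligned operands, which plain float arrays do not guarantee), and it falls back to the scalar form when the compiler does not define __SSE__.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* res = v1 + v2 for 4-component single-precision vectors (x, y, z, w). */
void vec4_add(float res[4], const float v1[4], const float v2[4])
{
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps(v1);              /* a = v1.w v1.z v1.y v1.x */
    __m128 b = _mm_loadu_ps(v2);              /* b = v2.w v2.z v2.y v2.x */
    _mm_storeu_ps(res, _mm_add_ps(a, b));     /* one addps adds all four lanes */
#else
    /* Scalar fallback: four separate additions. */
    for (int i = 0; i < 4; i++)
        res[i] = v1[i] + v2[i];
#endif
}
```

If the operands are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps correspond more closely to the movaps instructions shown on the slide.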

7 Scheduling Parallel/Cluster HW
  MIMD
    OS SMP threading provides load balancing, affinity operations, routable interrupts (e.g. MSI-X); e.g. NPTL
    RTOS AMP is most often used in Embedded Systems
  MPMD
    OpenCL, CUDA, DirectCompute (DirectX extension)
    Intel OpenMP, Linux Cluster, MPI

8 How Does NPTL Work?
  No Thread Manager or M-on-N Mapping
    Previous POSIX Threading Model: Manager Becomes Bottleneck
    Two-Level Scheduling Not Deterministic
    Many Pthreads (M) to N Kernel Threads Still an Issue
    O(n) Scheduling for each Manager
  Direct Mapping of User to Kernel Thread, or 1-to-1
    User Space Pthread Maps Directly onto Kernel Thread (Real-Time Policies Require Root privilege)
    Deterministic (Remaining Non-Determinism is due to Kernel Preemptability Issues)
    O(1) Scheduling
  Scheduling Policies Selectable, Similar to RTOS Tasking

9 Linux NPTL Scheduling Policies
  Fixed Priority Preemptive
    SCHED_FIFO: This is Priority Preemptive
    SCHED_RR: This is Fair, but at Kernel Level
    SCHED_OTHER: This is OS default and should not be used for real-time threads
  POSIX Threads have
    Policy (FIFO, RR, OTHER)
    Priority (RT min to RT max)
    Creation (Fork)
    Join (Wait for thread completion at rendezvous)
  Synchronization Methods
    Semaphores
    Message Queues
  Asynchronous Communication Methods
    Signals
    Queued Signals
  POSIX RT Extensions Include
    Virtual Timer Services
    Signals Tied to Timer Services
    Priority Inversion Protection (Availability on Linux TBD)
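The "RT min to RT max" priority range for each policy can be queried at run time without any special privilege. A minimal sketch follows; the helper names print_priority_range and print_all_policy_ranges are hypothetical, and the actual ranges printed depend on the system.

```c
#include <sched.h>
#include <stdio.h>

/* Print the priority range reported for one scheduling policy.
 * Returns 0 on success, -1 if either query fails. */
int print_priority_range(const char *name, int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    printf("%-12s priority %d..%d\n", name, lo, hi);
    return 0;
}

/* Query the three policies named on the slide. */
int print_all_policy_ranges(void)
{
    if (print_priority_range("SCHED_FIFO", SCHED_FIFO) != 0)
        return -1;
    if (print_priority_range("SCHED_RR", SCHED_RR) != 0)
        return -1;
    return print_priority_range("SCHED_OTHER", SCHED_OTHER);
}
```

Calling print_all_policy_ranges() from main prints one line per policy; querying the range rather than hard-coding it keeps a priority such as "max minus one" valid across systems.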

10 NPTL Coding Code Walk-through July 7, 2004 Sam Siewert

11 Thread Scheduling Policy

    /* Thread attributes: use the policy set here, not the creator's. */
    pthread_attr_init(&rt_sched_attr);
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

    rt_max_prio = sched_get_priority_max(SCHED_FIFO);
    rt_min_prio = sched_get_priority_min(SCHED_FIFO);

    /* Put the calling process itself under SCHED_FIFO, one below the maximum. */
    rt_param.sched_priority = rt_max_prio-1;
    rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);

    /* Report the contention scope of the attribute object. */
    pthread_attr_getscope(&rt_sched_attr, &scope);

    if(scope == PTHREAD_SCOPE_SYSTEM)
        printf("pthread SCOPE SYSTEM\n");
    else if (scope == PTHREAD_SCOPE_PROCESS)
        printf("pthread SCOPE PROCESS\n");
    else
        printf("pthread SCOPE UNKNOWN\n");

12 Thread Creation and Join

    rc = pthread_create(&main_thread, &main_sched_attr, testthread, (void *)0);
    if (rc)
    {
        printf("error; pthread_create() rc is %d\n", rc);
        perror(NULL);
        exit(-1);
    }

    /* Wait for the thread to finish before releasing the attributes. */
    pthread_join(main_thread, NULL);

    if(pthread_attr_destroy(&rt_sched_attr) != 0)
        perror("attr destroy");
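The policy and creation fragments can be combined into one self-contained routine. The sketch below is an assumption-laden illustration rather than the walk-through code itself: the names test_thread and run_fifo_thread are hypothetical, and because creating a SCHED_FIFO thread normally needs root (or an equivalent real-time privilege), it falls back to default attributes when the first pthread_create is refused. Note that pthread_create returns the error number directly rather than setting errno, so strerror(rc) is used to report it. Build with -pthread.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body: records that it ran. */
static void *test_thread(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Create and join one thread, asking for SCHED_FIFO first.
 * *used_fifo is set to 1 if the real-time request succeeded, else 0.
 * Returns 0 if a thread ran to completion, otherwise non-zero. */
int run_fifo_thread(int *used_fifo)
{
    pthread_attr_t attr;
    struct sched_param param;
    pthread_t tid;
    int ran = 0;
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
    pthread_attr_setschedparam(&attr, &param);

    *used_fifo = 1;
    rc = pthread_create(&tid, &attr, test_thread, &ran);
    if (rc != 0) {
        /* Typically EPERM when not privileged: retry with defaults. */
        fprintf(stderr, "SCHED_FIFO create failed (%s), using defaults\n",
                strerror(rc));
        *used_fifo = 0;
        rc = pthread_create(&tid, NULL, test_thread, &ran);
    }
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return ran ? 0 : -1;
}
```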

13 Issues Beyond Policy and Feasibility
  Throughput
  Latency
  How do they Differ?
    E.g. Frame Rate vs. Time to First Frame
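One way to see the difference between frame rate and time to first frame is a simple pipeline model. The sketch below assumes a fully overlapped pipeline in which each stage works on a different frame at the same time and there is no buffering overhead; the function names and the stage times in the usage note are made up for illustration.

```c
#include <stddef.h>

/* Latency (time to first frame): one frame must pass through
 * every stage in turn, so the stage times add up. */
double pipeline_latency_ms(const double *stage_ms, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += stage_ms[i];
    return total;
}

/* Throughput (steady-state frame rate): with overlapped stages,
 * a new frame completes every time the slowest stage finishes. */
double pipeline_throughput_fps(const double *stage_ms, size_t n)
{
    double slowest = 0.0;
    for (size_t i = 0; i < n; i++)
        if (stage_ms[i] > slowest)
            slowest = stage_ms[i];
    return slowest > 0.0 ? 1000.0 / slowest : 0.0;
}
```

With hypothetical capture, decode, and display stages of 10 ms, 20 ms, and 10 ms, the first frame appears after 40 ms, while the steady-state rate is 50 frames per second. Speeding up a stage other than the slowest one improves latency but not throughput.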

14 Digital Video (Quick Reminders)

15 Simple Encode/Decode is Processing Intensive
  GPU Co-Processors Can Offload CPU
  Example on Linux with MPlayer and VDPAU (Video Decode and Presentation API for Unix)
  Core Loading with MPlayer
    VDPAU MPEG Decode (Load balancing and offload)
    Dual-Core SW Decode (Load balancing)

16 Discussion: What Does Eye See?
  Ewald Hering (1872), Opponent Colors (R/G, Y/B)
    Red and Green Opponent Colors: Can't See Both Simultaneously
    Yellow and Blue Opponent Colors
  Color Models
    RGB Cube
    HSV (Hue/Saturation/Value), HSV Cylinder
      Hue: Similarity to R, G, Y, B
      Saturation: Color vs. Brightness
      Value: Low=Black, High=Color
  Luminance (Candela/Square-Meter)
    Light Passing Through Area Forming a Solid Angle in A Direction
    Candela (Photonic Power) = Lumens/Steradian, i.e. Watts/Steradian weighted by the eye's sensitivity
    More Precise than Brightness
  Chrominance (CrCb or UV in YCrCb or YUV)
    U = Blue - Luminance (Y)
    V = Red - Luminance (Y)
  Wavelength Spectrum: ROYGBIV
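The luminance and colour-difference relations on this slide (U derived from blue minus luminance, V from red minus luminance) can be written out numerically. The sketch below uses the ITU-R BT.601 luma weights and the classic analog YUV scale factors (0.492 and 0.877); the slide does not say which standard it has in mind, so those constants, the yuv_pixel type, and the function name rgb_to_yuv are assumptions for illustration.

```c
/* Hypothetical container for one converted pixel. */
typedef struct {
    double y;  /* luminance */
    double u;  /* scaled (blue - luminance) */
    double v;  /* scaled (red - luminance) */
} yuv_pixel;

/* Convert one RGB sample (any common range, e.g. 0..255) to YUV.
 * Assumed constants: BT.601 luma weights, analog YUV scaling. */
yuv_pixel rgb_to_yuv(double r, double g, double b)
{
    yuv_pixel p;
    p.y = 0.299 * r + 0.587 * g + 0.114 * b;
    p.u = 0.492 * (b - p.y);
    p.v = 0.877 * (r - p.y);
    return p;
}
```

A neutral gray input (R = G = B) gives U and V of approximately zero, which matches the idea that chrominance carries only the departure from gray.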

17 Summary of Capture
  Bayer Pixel RGB Sampled in 4:3 or 16:9 or other Aspect Ratio (AR) Frame Array
      G R
      B G
  Graymap is Green, or Y alone in YCrCb
  Eye Integrates the Color-bands to Perceive Color (any band alone appears gray)
  Frames are Processed with CV/MV
    CV/MV Processing of Frames Over Time Distinguishes from Image Processing
    Interactive or Real-Time
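Two points from this slide lend themselves to a short sketch: which colour each Bayer site samples, and a graymap taken from the green band alone. The code assumes the 2x2 tile "G R / B G" repeats across the frame with G at row 0, column 0, and that the RGB input is packed as R, G, B bytes per pixel (as in a binary PPM); both function names are hypothetical.

```c
#include <stddef.h>

/* Colour sampled at a sensor site, assuming the tile
 *     G R
 *     B G
 * repeats with G at (row 0, col 0). Returns 'R', 'G', or 'B'. */
char bayer_color_at(int row, int col)
{
    if ((row & 1) == 0)
        return (col & 1) == 0 ? 'G' : 'R';
    return (col & 1) == 0 ? 'B' : 'G';
}

/* Graymap from the green band alone: copy the G byte of each
 * packed R,G,B triple into a one-byte-per-pixel buffer. */
void green_graymap(const unsigned char *rgb, unsigned char *gray,
                   size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        gray[i] = rgb[3 * i + 1];
}
```

Half of the sites in this tile are green, which is one reason the green band alone makes a reasonable stand-in for luminance.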

18 Frame Analysis and Image Processing
  Resources for Raw Frame Data
    GIMP (GNU Image Manipulation Program): Single Frame Analysis and Transforms
    Octave: Similar to MATLAB
    IrfanView: Simple Viewer, includes PPM
    OpenCV (C/C++ and Python API)
  Single Frame Viewing and Analysis
  Image Processing Libraries

19 Practice with Linux
  GIMP: PPM and JPEG Frame Analysis
  FFMPEG: MPEG-4 DV to Frames
  Sobel Image Transformation
  Real-Time Sobel or Canny Image Transformation
  Batch Mode FFMPEG Re-encoding
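For the Sobel transformation practice item, a minimal single-frame sketch in C is given below. It operates on an 8-bit grayscale buffer (for example the Y or green band of an extracted frame), approximates the gradient magnitude as |gx| + |gy| clamped to 255, and writes zero on the one-pixel border; those choices and the function name sobel_gray are assumptions for illustration, not the course's reference implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge magnitude for a w x h 8-bit grayscale image.
 * Output uses the |gx| + |gy| approximation, clamped to 0..255.
 * Border pixels have no full 3x3 neighbourhood and are set to 0. */
void sobel_gray(const uint8_t *in, uint8_t *out, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[y * w + x] = 0;
                continue;
            }
            const uint8_t *p = in + y * w + x;
            /* Horizontal gradient: right column minus left column. */
            int gx = -p[-w - 1] + p[-w + 1]
                     - 2 * p[-1] + 2 * p[1]
                     - p[w - 1] + p[w + 1];
            /* Vertical gradient: bottom row minus top row. */
            int gy = -p[-w - 1] - 2 * p[-w] - p[-w + 1]
                     + p[w - 1] + 2 * p[w] + p[w + 1];
            int mag = abs(gx) + abs(gy);
            out[y * w + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

Applied to each frame extracted from a video, the same routine gives the batch-mode transformation; a real-time version would run it per captured frame within the frame period.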
