
H.264 Streaming Framework for Virtual Colonoscopy
CSE 523/524: Advanced Project in Computer Science
Project Report

Apurva Kumar
apkumar@cs.stonybrook.edu
August 17, 2016

CONTENTS

Acknowledgements
Overview
Virtual Colonoscopy
Streaming
    JPEG Streaming
    H.264 Streaming
Adding Additional Information
    Supplemental Enhancement Information
        Format and Specification
        Limitations
    Multiple Streams
Omegalib
FFENC
    Adding SEI
    Code
Conclusion & Future Work
References

ACKNOWLEDGEMENTS

I would specifically like to thank Sayedkoosha Mirhosseini for mentoring me through the project and guiding me whenever I hit dead ends. A special thanks to Ping Hu, Alessandro Febretti, and the Stack Overflow community for helping out at certain points along the project. I would also like to thank my advisor, Prof. Arie E. Kaufman, for giving me the opportunity to pursue my interests and for supporting my work on this project.

OVERVIEW

The goal of this project was to implement an architecture for virtual reality (VR) streaming to a mobile device such as an iPad; more specifically, a framework to enable Virtual Colonoscopy (VC) on a tablet. The current setup runs on a desktop system. To make it more accessible and easier to use for doctors at hospitals, the goal is to make it run on simple-to-use handheld devices. However, due to limited memory and processing power, the application cannot reside entirely on the mobile device. Hence, we set up a server-client architecture: the processing is done on the server and the final view is sent to the client for display. Previously, the codebase only supported JPEG streaming. Through this project we implement faster H.264 streaming, and we design the framework with future enhancements in mind, such as low-latency streaming and handling real-time interactions on the client.

VIRTUAL COLONOSCOPY

In recent years, virtual colonoscopy, an alternative to traditional colonoscopy, has emerged as an option for most patients. Virtual colonoscopy is a safe, highly accurate, minimally invasive CT imaging examination of the entire colon and rectum. It is a well-tolerated exam that takes about 10 minutes to complete. Its goal is the same as that of traditional colonoscopy: to identify polyps and cancers in the colon. Polyps have been shown to be the precursor of most colon cancers, and the goal of virtual colonoscopy is to find these potentially dangerous polyps before they become actual cancers.

At the Visualization Lab at Stony Brook University, we employ advanced visualization techniques to achieve virtual imaging and exploration of the human colon. A helical CT scanner is used to obtain a sequence of 2D slices of the human abdomen. These CT slices are then reconstructed into a 3D volume and, subsequently, the human colon is visualized with various visualization techniques implemented in VolVis, a comprehensive volume visualization system intended for scientists and engineers as well as visualization developers. This noninvasive procedure is employed as an alternative to existing procedures for imaging the mucosal surface of the colon. Our current implementation allows the user to perform both planned and guided navigation inside the colon. This is a joint project between the Departments of Computer Science and Radiology: the research activities have been carried out in the Visualization Lab of the Computer Science Department and the Lab for Imaging Research and Informatics (IRIS) of the Radiology Department.

STREAMING

Streaming media is video or audio content sent in compressed form over the Internet and played immediately, rather than being saved to disk first. With streaming media, a user does not have to wait for a file to download before playing it; because the media is sent in a continuous stream of data, it can play as it arrives. Users can pause, rewind, or fast-forward, just as they could with a downloaded file, unless the content is being streamed live. There are two main protocols used for carrying video and audio data over IP networks: HTTP and RTSP. Using these protocols, it is possible to transmit video and audio in various compression formats (JPEG, MPEG-4, H.264, AAC, etc.).

JPEG Streaming

HTTP has long been established as a method of transmitting JPEG video streams.

Motion JPEG (M-JPEG) is a video compression format in which each video frame or interlaced field of a digital video sequence is compressed separately as a JPEG image and then streamed. It has its own set of advantages and disadvantages. It is simple to implement because it uses a mature compression standard (JPEG) with well-developed libraries, and it is an intra-frame method of compression; it enjoys broad client support, and minimal hardware is required because it is not computationally intensive. Its disadvantages include the lack of support for sound, and the lack of inter-frame prediction limits its compression efficiency to about 1:20 or lower, causing it to consume far more bandwidth and storage.

H.264 Streaming

H.264 is a block-oriented, motion-compensation-based video compression standard that is currently one of the most commonly used formats for the recording, compression, and distribution of video content. Unlike M-JPEG, H.264 compresses across frames: only some frames are compressed by themselves, while most frames record only the changes from the previous frame. This is far more efficient and can save a significant amount of bandwidth. As a video codec, H.264 can be incorporated into multiple container formats. It is frequently produced in the MPEG-4 container format, which uses the .mp4 extension, as well as QuickTime (.MOV), Flash (.F4V), 3GP for mobile phones (.3GP), and the MPEG transport stream (.ts). Most of the time, though not always, H.264 video is encoded together with audio compressed with the AAC (Advanced Audio Coding) codec, an ISO/IEC standard (MPEG-4 Part 3).

ADDING ADDITIONAL INFORMATION

There is a lot of useful information that can be streamed to the client along with the frames to be rendered. This could be metadata about the scene, or depth buffers for predicting frames and faster local rendering on the client. In our framework, we use H.264 encoding of the frames streamed to the client. This gives us two options for packaging additional information along with the stream: adding supplemental enhancement information, or adding an extra stream.

Supplemental Enhancement Information

The H.264 compression format supports the addition of user-specified metadata, called supplemental enhancement information (SEI), along with every frame that is encoded. We make use of this provision to pack useful metadata into the stream sent to the client.

Format and Specification

The Network Abstraction Layer (NAL) and the Video Coding Layer (VCL) are the two main concepts in H.264. An H.264 bitstream consists of a number of NAL units (NALUs), and each NALU can be classified as VCL or non-VCL. Video data is processed by the codec and packed into NAL units. In the byte-stream format, a three-byte or four-byte start code, 0x000001 or 0x00000001, is added at the beginning of each NAL unit; these start codes help the decoder find the NALU boundaries easily. Within a NALU, the first byte is a header indicating the type of data it contains, along with other information; the remaining bytes are the payload. A NALU of type 6 indicates that the following bytes represent an SEI payload. The next byte indicates the type of SEI payload: an SEI of type 5 represents unregistered user data, which is what we use. The next byte indicates the size of the SEI data, followed by the data itself. The payload is then terminated with the trailing-bits code 0x80.

The following is an example of a NALU carrying a user SEI message:

\x00\x00\x01\x06\x05\x05\x68\x65\x6c\x6c\x6f\x80

It can be broken down as:

\x00\x00\x01 - NAL unit start code
\x06 - NAL unit of type 6, i.e. SEI
\x05 - SEI of type 5, i.e. user data unregistered
\x05 - size of the SEI payload (5 bytes)
\x68\x65\x6c\x6c\x6f - SEI data: "hello" (in hex)
\x80 - end of the SEI NAL unit

Limitations

Because the SEI payload size is encoded here in a single byte, the amount of information that can be packaged per message is limited to 255 bytes. If only a small amount of data is to be supplied, this is the best way to package it, with minimal overhead for encoding and streaming.
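To make the byte layout concrete, the following C sketch assembles exactly the example NALU above. It mirrors the simplified layout used in this report: a single-byte size field, no emulation-prevention escaping, and no 16-byte UUID prefix (which the standard prescribes for user-data-unregistered SEI). The function name build_user_sei is illustrative and not part of FFENC.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Build a byte-stream SEI NAL unit carrying a "user data
       unregistered" payload, following the simplified layout
       described above. The caller frees the returned buffer. */
    static uint8_t *build_user_sei(const uint8_t *data, uint8_t size,
                                   size_t *out_len)
    {
        /* start code + NAL header + SEI type + size + data + trailing bits */
        size_t len = 3 + 1 + 1 + 1 + (size_t)size + 1;
        uint8_t *nalu = malloc(len);
        if (!nalu) return NULL;

        uint8_t *p = nalu;
        *p++ = 0x00; *p++ = 0x00; *p++ = 0x01; /* NAL unit start code      */
        *p++ = 0x06;                           /* NAL unit type 6: SEI     */
        *p++ = 0x05;                           /* SEI type 5: user data    */
        *p++ = size;                           /* payload size in bytes    */
        memcpy(p, data, size);                 /* the payload itself       */
        p += size;
        *p = 0x80;                             /* rbsp trailing bits       */

        *out_len = len;
        return nalu;
    }

    int main(void)
    {
        size_t n;
        uint8_t *sei = build_user_sei((const uint8_t *)"hello", 5, &n);
        if (!sei) return 1;
        for (size_t i = 0; i < n; i++)
            printf("%02x ", sei[i]);
        putchar('\n');
        free(sei);
        return 0;
    }

Running it prints 00 00 01 06 05 05 68 65 6c 6c 6f 80, byte for byte the NALU broken down above.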

Multiple Streams

Another way to package additional data is to mux multiple streams together into a container (such as MP4) and then stream it to the client. If we want to package more than 255 bytes of data, adding an additional stream containing the desired information on a frame-by-frame basis solves the problem. However, increasing the number of streams increases the required bandwidth. There is also an overhead in encoding every extra stream of data and then muxing all the streams into a container, and an additional overhead of demuxing the streams on the client before each stream can be decoded.

OMEGALIB

Today's visualization, visual analytics, and virtual-reality technologies can significantly facilitate and enhance human insight and knowledge. Multidisciplinary teams rely on a variety of domain-specific and/or special-purpose software libraries that do not interoperate, do not take advantage of the changing landscape of computing platforms, and do not take advantage of new consumer-priced and advanced 2D and 3D display systems. Omegalib is an integrated hybrid framework for scientific visualization that addresses these challenges. Omegalib lets researchers tightly couple multiple libraries to create combined or linked visualizations; utilize a variety of display devices, from smartphones and 3D head-mounted displays to conference-room monitors and room-sized immersive environments; and use cloud computing to render complex graphics and stream them to personal devices through a web browser. Omegalib tightly couples 2D/3D visualizations and virtual environments with computing and display platforms to create an ecosystem that allows scientists to focus more of their time on analysis and discovery. Omegalib is a joint venture of computer science developers at the University of Illinois at Chicago and Stony Brook University, partnering with domain scientists in astrophysics, engineering, geoscience, and molecular modeling to expand Omegalib's features.

Our VC system uses Omegalib as a core component to visualize and render the colon. In this project, we use the Porthole module of Omegalib to stream the visualized data to the browser. Omegalib currently supports only JPEG streaming, and the main goal of this project is to develop a module (FFENC) that lies between the Omegalib core and the Porthole module and is responsible for H.264 encoding of frames on the fly, with support for further future enhancements.

FFENC

The major work of this project focuses on developing an H.264 encoding module for Omegalib with support for adding SEI data and multiple streams. It is built on FFmpeg, and hence the module has been named FFENC (FFmpeg ENCoder). FFmpeg is a free software project that produces libraries and programs for handling multimedia data. FFmpeg includes libavcodec, an audio/video codec library used by several other projects; libavformat, an audio/video container muxing and demuxing library; and the ffmpeg command-line program for transcoding multimedia files.

FFENC is a standalone module in itself, but in our project it is closely coupled with the Porthole module in Omegalib. Porthole is a framework that helps developers of virtual environment applications generate decoupled HTML5 interfaces. The rendered volume is passed to Porthole on a per-frame basis; Porthole uses FFENC to encode every frame as desired and then streams it to the browser.

The workflow of FFENC is as follows: first, the FFmpeg H.264 encoder is initialized with fine-tuned parameters. FFENC then exposes an interface for encoding each frame passed to it. Frames are stored in a specific pixel format within Omegalib; they are first converted into an FFmpeg-accessible RGB24 format and then passed to the encoder. If specified by the application, there is also a provision to add an SEI message, or to add another stream for metadata, prior to encoding. (A sketch of this workflow appears at the end of this section.)

Adding SEI

Even though the H.264 standard defines the SEI NAL syntax structure, there is currently no API in the libavcodec/libavformat libraries for assigning this data. However, a provision for adding SEI data per frame is an integral aspect of the framework. After countless hours spent analyzing the H.264 standard and its implementation in libavcodec/libavformat, I devised a small hack to pack in the SEI data. I was able to pinpoint the location in memory where the SEI NALU is supposed to reside; I explicitly create an SEI NAL unit and copy it to the desired offset at that memory location prior to encoding. Once encoded, by analyzing every frame of the video, I was able to recover the transmitted SEI message and thus verify the correctness of the hack. The provision for adding SEI metadata is available only for the FFmpeg CPU-based H.264 encoder, which uses the x264 encoding library; the hardware-accelerated FFmpeg H.264 encoders currently have no support for encoding SEI data.

Code

The FFENC module is free and open source; its source code is currently available on my GitHub: https://github.com/cruxeon/ffenc
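To make the workflow above concrete, here is a minimal C sketch of the encoder setup and the per-frame path using the libavcodec/libswscale APIs. It is illustrative rather than the actual FFENC code: the function names open_h264_encoder and encode_rgb24_frame are hypothetical, the parameters shown (ultrafast preset, zerolatency tune, no B-frames) are plausible low-latency choices rather than FFENC's exact settings, and error handling is abbreviated.

    #include <libavcodec/avcodec.h>
    #include <libswscale/swscale.h>
    #include <libavutil/opt.h>

    /* Open the libx264 software encoder with low-latency settings. */
    static AVCodecContext *open_h264_encoder(int w, int h, int fps)
    {
        const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        ctx->width = w;
        ctx->height = h;
        ctx->time_base = (AVRational){1, fps};
        ctx->pix_fmt = AV_PIX_FMT_YUV420P;   /* the format x264 consumes  */
        ctx->gop_size = 30;                  /* one I-frame per 30 frames */
        ctx->max_b_frames = 0;               /* no B-frames: less latency */
        av_opt_set(ctx->priv_data, "preset", "ultrafast", 0);
        av_opt_set(ctx->priv_data, "tune", "zerolatency", 0);
        return avcodec_open2(ctx, codec, NULL) < 0 ? NULL : ctx;
    }

    /* Convert one RGB24 frame (as handed over per frame by Porthole)
       to YUV420P and encode it. Returns the encoded packet, or NULL
       if the encoder buffered the frame. Create the scaler once with:
       sws_getContext(w, h, AV_PIX_FMT_RGB24, w, h, AV_PIX_FMT_YUV420P,
                      SWS_BILINEAR, NULL, NULL, NULL)                    */
    static AVPacket *encode_rgb24_frame(AVCodecContext *ctx,
                                        struct SwsContext *sws,
                                        const uint8_t *rgb, int64_t pts)
    {
        AVFrame *frame = av_frame_alloc();
        frame->format = ctx->pix_fmt;
        frame->width  = ctx->width;
        frame->height = ctx->height;
        av_frame_get_buffer(frame, 0);

        const uint8_t *src[1] = { rgb };
        int src_stride[1] = { 3 * ctx->width };  /* RGB24: 3 bytes/pixel */
        sws_scale(sws, src, src_stride, 0, ctx->height,
                  frame->data, frame->linesize); /* RGB24 -> YUV420P     */
        frame->pts = pts;

        AVPacket *pkt = av_packet_alloc();
        avcodec_send_frame(ctx, frame);
        int ret = avcodec_receive_packet(ctx, pkt);
        av_frame_free(&frame);
        if (ret < 0) { av_packet_free(&pkt); return NULL; }
        return pkt;
    }

Since libavcodec exposes no API for user SEI, a portable approximation of the memory-offset hack described above would be to prepend the SEI NAL unit from the earlier sketch to pkt->data before streaming; in the byte-stream format, an SEI NALU placed ahead of the coded slices of the access unit it describes should be accepted by conforming decoders.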

CONCLUSION & FUTURE WORK

With the FFENC module integrated into Omegalib, we now have a framework for efficiently streaming VR content using Omegalib; more specifically, we can now use this module to stream the VC application to mobile devices. The next step involves creating a mobile VC web app for the tablet as an interface for the streamed data. We can devise and implement an intuitive interaction scheme for working with a volume (the colon) on the client. Some inputs, such as measurements, comments, and bookmarking, can be handled directly on the client; navigational interactions will be sent to the server, which processes them and sends back the subsequent frames.

Another possible future extension is to enable low-latency streaming. Our framework already allows packaging of extra information in the form of SEI messages and additional streams. Extra information, such as depth buffers, can be sent to the client for fast local rendering of predicted future frames. This would effectively hide the input latency on the client side, with on-the-fly scene correction for the predicted frames.

REFERENCES

1. https://www.stonybrookmedicine.edu/patientcare/virtual-colonoscopy
2. https://labs.cs.sunysb.edu/labs/vislab/3d-virtual-colonoscopy-home/
3. http://bensoftware.com/blog/comparison-of-streaming-formats/
4. https://blog.angelcam.com/what-is-the-difference-between-mjpeg-and-h-264/
5. https://en.wikipedia.org/wiki/Motion_JPEG
6. https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
7. https://www.itu.int/rec/T-REC-H.264-201602-I/en
8. http://uic-evl.github.io/omegalib/
9. https://github.com/uic-evl/omegalib
10. https://github.com/omega-hub
11. http://ip.hhi.de/imagecom_G1/assets/pdfs/h264_ISO-IEC_14496-10.pdf
12. http://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/
13. https://ffmpeg.org/
14. https://en.wikipedia.org/wiki/FFmpeg