Syntax-directed content analysis of videotext: application to a map detection and recognition system


Hrishikesh Aradhye*, James Herson, and Gregory Myers
SRI International, 333 Ravenswood Avenue, Menlo Park, CA

ABSTRACT

Video is an increasingly important and ever-growing source of information to the intelligence and homeland defense analyst. A capability to automatically identify the contents of video imagery would enable the analyst to index relevant foreign and domestic news videos in a convenient and meaningful way. To this end, the proposed system aims to help determine the geographic focus of a news story directly from video imagery by detecting and geographically localizing political maps from news broadcasts, using the results of videotext recognition in lieu of a computationally expensive, scale-independent shape recognizer. Our novel method for the geographic localization of a map is based on the premise that the relative placement of text superimposed on a map roughly corresponds to the geographic coordinates of the locations the text represents. Our scheme extracts and recognizes videotext, and iteratively identifies the geographic area, while allowing for OCR errors and artistic freedom. The fast and reliable recognition of such maps by our system may provide valuable context and supporting evidence for other sources, such as speech recognition transcripts. The concepts of syntax-directed content analysis of videotext presented here can be extended to other content analysis systems.

Keywords: Video OCR, video content analysis, map detection and recognition, syntax-directed recognition and retrieval

1. INTRODUCTION

The volume of collected multimedia data of potential interest to the intelligence and homeland defense analyst is expanding at a tremendous rate. A capability to automatically identify the contents of video imagery would enable videos to be indexed in a convenient and meaningful way for later reference, and would enable actions such as automatic notification and dissemination to be triggered in real time by the contents of streaming video. Besides speech, closed captioning, and visual content, videotext (text superimposed on images and video frames) is an important source of semantic information in video streams of news broadcasts. The recognition of text superimposed on video frames yields useful information such as the identity of a speaker, his or her location, the topic under discussion, sports scores, product names, and associated shopping data, allowing for automated content description, search, event monitoring, and video program categorization. For instance, the proposed system can detect and geographically localize near-full-screen political maps from news broadcasts, such as the maps shown in Fig. 1, using image analyses based on videotext recognition in lieu of a rigorous, scale-independent generic shape recognizer. The recognition of such maps can help determine the geographic focus of a news story directly from video imagery, and may provide valuable context and supporting evidence for the speech recognition transcripts generated from the audio track of the story. The recognition of text is easier and faster than the recognition of objects in an arbitrarily complex scene, because text has been designed to be readable and has regular forms that humans can easily interpret. For these reasons, recent work has focused on the use of videotext extraction and recognition, instead of rigorous object recognition, for content analysis.
For instance, a recent video content retrieval system [1] learns to associate faces extracted from video frames with the recognized textual content of the superimposed caption, which presumably includes the names of the persons shown in the video. The system then uses videotext recognition results, along with a lexicon of names, to recognize occurrences of those persons' faces in news broadcasts. Analogously, most published videotext recognition work focuses on the textual content of videotext objects. Our work, however, is instead based on our contention that the syntactic aspects of videotext objects, such as their placement relative to the frame as well as to each other, in addition to their size, typeface, and color, may present the intelligence analyst with additional, as-yet-unexplored information. The current application is focused on the detection and recognition of political maps from news broadcasts.

* Hrishikesh.aradhye@sri.com

Document Recognition and Retrieval X, Tapas Kanungo, Elisa H. Barney Smith, Jianying Hu, Paul B. Kantor, Editors, Proceedings of SPIE-IS&T Electronic Imaging, SPIE Vol. 5010 (2003). © 2003 SPIE-IS&T.

However, similar principles can be extended to other video-based detection and recognition systems, such as those that recognize scores and statistics in sports broadcasts.

As in other work in video content analysis, such as the detection and recognition of human faces in video, we first use the unique image-domain features of political maps to detect the presence of a map in a given frame. Next, we attempt to pinpoint the geographical area covered by the map, such as Eastern Europe, South-East Asia, or the Middle East. To this end, it suffices to estimate the geographical coordinates of the center of the map and its magnification. Explicit shape recognition of the territorial lines on the map would be a difficult and computationally expensive task. Our method is instead based on the premise that the relative placement of text superimposed on a map roughly corresponds to the geographic locations the text represents. For instance, the map of North and South America in Fig. 1 displays the geo-text UNITED STATES above and to the left, and BRAZIL below and to the right, of the geo-text MEXICO. Since this relative placement is roughly consistent with the known geographic fact that the U.S. and Brazil lie to the north and southeast of Mexico, respectively, one may conclude from the coordinates of the geo-texts relative to the frame that the map in question is indeed of the Americas. Our key assumption, then, is that the names of geographical locations such as UNITED STATES, MEXICO, and BRAZIL would not appear on a video frame with a nearly bimodal color scheme (1) as isolated, unjustified words and (2) at geographically consistent distances and directions unless the frame were a map of the Americas. This approach is preferable to map shape recognition because of its simplicity, generality, and scale-invariant nature. The map localization is of course approximate, since the graphic artist has some freedom to arrange the text in a geographically consistent yet readable and uncluttered manner. However, we contend that an approximate geographic localization may be sufficient for the purposes of video content analysis. The following sections describe our approach to the detection and recognition of near-full-screen political maps in greater detail.

2. VIDEOTEXT EXTRACTION AND RECOGNITION

The earliest efforts in videotext extraction and recognition were applied to text captions in commercially produced video [2,3,4]. Several constraints make videotext extraction and recognition challenging: the low resolution of videotext, unconstrained font styles and sizes, poor separation of characters (often a result of compression and decoding), and complex, colorful, moving backgrounds. Methods of detecting and locating text try to take advantage of distinguishing characteristics of text such as consistency in alignment, orientation, stroke thickness, character height, spacing, and intensity or color. All approaches have two main steps: (1) apply filters or other processing to produce a high response in text areas, and (2) coalesce the high-response pixels into regions or individual text lines. One class of methods uses color clustering techniques [5] or binarization [6] to identify pixels belonging to text, but these methods are not sufficient by themselves to robustly distinguish a wide variety of text amid complex backgrounds.
Another class of approaches measures spatial frequency or texture with spatial variance [7], Gabor filtering [8], Gaussian filtering [9], or wavelets [4] to locate candidate regions of text. A third class of approaches [9,10,11], including our work, detects vertically oriented strokes or character edges and then links or clusters them by a set of rules that depend on the characteristics of the individual elements. Compared with texture-based filtering, coalescing in this approach can be more finely controlled so as to avoid including non-text image items close to or touching the text.

Our videotext recognition process, shown in Fig. 2, operates on individual frames extracted from a video sequence. It produces OCR results that are time tagged: each word in the OCR results corresponds to a single instance of text with a starting frame time (when it first appears) and an ending frame time (when it disappears from the image). The processing steps are arranged in a pipelined architecture that has a latency of up to several seconds. All of the processing is implemented in software in C++ and runs under Windows NT. In each processing cycle the following steps are executed:

1. Individual lines of text are located in the gray-scale image.
2. Each text line is binarized.
3. The OCR engine is applied to each of the binarized text lines.

We now describe each of these steps in detail.
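As a minimal illustration of the time-tagged output described above, the sketch below groups per-frame OCR words into text instances with start and end frame times. The `TimedWord` structure and the matching rule (an identical word observed in consecutive frames belongs to one instance) are our own simplifications, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class TimedWord:
    text: str          # recognized word
    start_time: float  # frame time when the word first appears (s)
    end_time: float    # frame time when the word disappears (s)

def aggregate(frames):
    """frames: list of (frame_time, set_of_words) in temporal order.
    Returns one TimedWord per contiguous run of an identical word."""
    active, done = {}, []
    for t, words in frames:
        for w in words:
            active.setdefault(w, TimedWord(w, t, t)).end_time = t
        for w in list(active):
            if w not in words:          # word disappeared: close its instance
                done.append(active.pop(w))
    done.extend(active.values())        # close instances still open at the end
    return done
```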

2.1 Text localization

Our approach to text location assumes that the text is roughly horizontal and that the characters have a minimum level of contrast with the image background. The text may be of either polarity (light text on a dark background, or dark text on a light background). The process first detects vertically oriented edge transitions in the gray-scale image, using a Sobel operator. The output of the operator is thresholded to form two binary images, one for dark-to-light transitions (B1) and the other for light-to-dark transitions (B2). Fig. 3a shows a sample gray-scale image that contains both light and dark text, and Fig. 3b shows the corresponding light-to-dark edge transition image B2 overlaid on the gray-scale image. A connected components algorithm is run on each binary image. Connected components that are determined (by examining their height and area) not to be due to text are eliminated; Fig. 3c shows the eliminated connected components in red. The remaining connected components are linked to form lines of text by searching the areas to the left and right of each connected component for additional connected components that are compatible in size and relative position. Finally, a rectangle is fitted to each line of detected text. Fig. 3d shows the results of the text location process. This approach is quite fast and can accommodate the entire range of font sizes in a single processing pass through the image data. Text regions are eliminated if their height is less than 6 pixels or their height-to-width ratio is greater than 0.5. The parameters for text location were tuned to minimize the possibility of missing any text.

2.2 Binarization

Binarization is performed on each text line independently. We assume that the text pixels are relatively homogeneous and that the intensity of the background pixels may be highly variable. For each text line, the polarity of the text is determined, and then a fixed threshold is chosen for the binarization of the text line. To determine the polarity, three histograms are computed. The gray-scale image is smoothed with a Gaussian kernel in preparation for computing histograms H1 and H2. Histogram H1 is composed of gray-scale pixels in the smoothed image on the right side of the connected components in the dark-to-light edge transition image B1 and on the left side of those in the light-to-dark edge transition image B2; if light text is present in this text region, these are the pixels most likely to belong to light text or to lie near the edge of light text. Similarly, histogram H2 is composed of gray-scale pixels in the smoothed image on the right side of the connected components in image B2 and on the left side of those in image B1. H3 is the histogram of a line of pixels immediately above the text line in the original gray-scale image. The gray-scale value Gi at the peak of each histogram Hi is found, and the polarity of the text is determined as follows: if |G1 - G3| - |G2 - G3| > GMinDiff, the text is light; else if |G2 - G3| - |G1 - G3| > GMinDiff, the text is dark; otherwise, the text could be either light or dark. The threshold for the text line is then set to the gray value at the 80th percentile of histogram H1 or H2, depending on the polarity chosen. If the text could be either light or dark, binarizations of both polarities are sent to the OCR processing step.

2.3 OCR

The binarized text lines are deskewed and packed into a single buffer for processing by the OCR engine.
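The polarity test and percentile threshold can be sketched as follows. The histograms are assumed to be 256-bin gray-level counts gathered as described above; the symmetric form of the comparison is our reading of the (partially garbled) rule, and the default GMinDiff of 20 is a placeholder, not a value from the paper.

```python
import numpy as np

def polarity_and_threshold(h1, h2, h3, g_min_diff=20):
    """h1, h2, h3: 256-bin gray-level histograms for the light-text
    candidates, dark-text candidates, and the background line above
    the text, respectively. Returns (polarity, binarization threshold)."""
    g1, g2, g3 = (int(np.argmax(h)) for h in (h1, h2, h3))
    d1, d2 = abs(g1 - g3), abs(g2 - g3)   # distance of each peak from background
    if d1 - d2 > g_min_diff:
        polarity = "light"                # light-text peak stands out from background
    elif d2 - d1 > g_min_diff:
        polarity = "dark"                 # dark-text peak stands out from background
    else:
        return "either", None             # binarize both polarities downstream
    hist = np.asarray(h1 if polarity == "light" else h2, dtype=float)
    cdf = np.cumsum(hist) / max(hist.sum(), 1.0)
    threshold = int(np.searchsorted(cdf, 0.80))   # 80th-percentile gray value
    return polarity, threshold
```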
We used the Caere Corporation DevKit2000 commercial OCR package. The output of the OCR process is a series of hierarchical structures (one for each processed frame) of text lines, words, and characters, with multiple candidate identities for each recognized text character in the image, rank-ordered according to likelihood. A confidence value is associated with the top-ranked character. These structures were stored in Document Attribute Format Specification (DAFS) format [12], a standard for representing OCR and document image decomposition data.
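The line/word/character hierarchy with ranked candidates can be pictured with the small sketch below; the class names and fields are ours and only mirror the shape of the structures, not the actual DAFS schema or the DevKit2000 API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Char:
    candidates: List[Tuple[str, float]]   # identities ranked by likelihood
    @property
    def top(self):                        # top-ranked identity and its confidence
        return self.candidates[0]

@dataclass
class Word:
    chars: List[Char] = field(default_factory=list)
    @property
    def text(self):                       # read the word from top-ranked characters
        return "".join(c.top[0] for c in self.chars)

@dataclass
class TextLine:
    words: List[Word] = field(default_factory=list)
    bbox: Tuple[int, int, int, int] = (0, 0, 0, 0)   # x0, y0, x1, y1 in pixels
```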

3. MAP DETECTION

The first phase of our system identifies video frames containing full-screen maps, based on the text localization and recognition results obtained as described above and a rough image-level feature analysis. Each frame is processed independently.

3.1 Feature extraction

The following features were designed to characterize and distinguish maps from other on-screen objects.

Color homogeneity features
Most full-screen political maps in news broadcasts can be characterized by a small number of primary colors, typically corresponding to land, water, and nation boundaries. Shades of blue usually represent bodies of water; shades of gray, brown, or green usually represent land. The land and water segments usually constitute most of the map area and are largely spatially contiguous and homogeneous in color. Political boundaries between nations are often displayed in darker shades of the color chosen for the land sections of the map. The following features attempt to quantify these characteristics:

1. Segmentation error: the mean squared error between the original image and the segmented image.
2. Color reduction factor: the reduction in the number of image colors due to segmentation, computed as the ratio of the number of colors in the segmented image to the number of colors in the original image.

Contour features
Compared with most non-map content in news broadcasts, maps have sparser contour lines, usually resulting from the boundaries of nations or states. We compute contour sparseness as the fraction of the total number of non-text pixels that belong to contour lines.

Content-independent text features
Videotext is usually displayed in contrasting colors for better readability. Most text on full-screen political maps corresponds to names of geographic locations such as nations or cities. Caption text, such as the title of the story, may sometimes be present, typically in the bottom third of the frame. Text designating geographical locations tends to appear as single, isolated words spread over the image. Most nation names are composed of one or two words; two-word nation names are typically displayed as a single line of text or as two left-justified lines of one word each. We use the following two text features:

1. Average contrast for text: the average ratio of gray values for the foreground and background pixels in videotext.
2. Text distribution index: the fraction of the total number of videotext objects that are isolated, contain a maximum of two words, and satisfy the justification constraints described above.

Lexicon-based features
Most isolated words on a political map are names of geographical locations such as nations and cities. We define a lexicon-match index as the number of isolated one- or two-word videotext objects that match an entry in the specified lexicon of geographical locations, with the degree of match exceeding a given minimum acceptable level.

3.2 Feature matching

To detect the presence or absence of a full-screen map, the above-defined features are used in a manually configured decision tree. The decision rules attempt to encode the typical characteristics of full-screen maps described above.
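As an illustration of the color homogeneity features fed to the decision tree, the sketch below computes the segmentation error and color reduction factor for an image. The uniform color quantization used as a stand-in segmenter is our assumption, since the paper does not specify the segmentation algorithm.

```python
import numpy as np

def color_homogeneity_features(image, levels=4):
    """image: H x W x 3 uint8 array. Quantizes colors as a stand-in for
    the (unspecified) segmentation, then returns
    (segmentation_error, color_reduction_factor)."""
    img = image.astype(float)
    step = 256.0 / levels
    seg = (np.floor(img / step) + 0.5) * step       # snap each channel to a bin center
    seg_error = float(np.mean((img - seg) ** 2))    # mean squared error vs. original

    def n_colors(a):
        # count distinct RGB triples
        return len(np.unique(a.reshape(-1, a.shape[-1]).astype(np.uint8), axis=0))

    reduction = n_colors(seg) / n_colors(image)     # segmented colors / original colors
    return seg_error, reduction
```

A low segmentation error together with a small color reduction factor indicates a frame dominated by a few homogeneous color regions, as expected of a political map.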

Our ongoing work has focused on automating the generation of the decision tree via machine learning methods such as C4.5.

4. MAP RECOGNITION

As stated earlier, the method of map recognition presented in this paper is based on the premise that the relative placement of text superimposed on a map roughly corresponds to the geographic locations the text represents. Such placement of text makes a map more readable and understandable by the viewer, which is critical in light of the brief time the map is displayed. This approach can be expected to be computationally less expensive than a generic scale-independent shape recognizer for matching nation-boundary contours. Such an analysis, however, must address three coupled uncertainties:

1. Uncertainty in placement of text: Within the loose limits set by the geographical boundaries of the nation in question, the graphic artist may have significant freedom to place the text for better appearance and/or readability. This is especially true for large countries.
2. Uncertainty in perceived content of text: The extraction and recognition mechanism for videotext is not perfect: the text may not be detected at all, only part of the text may be detected, word segmentation may be inaccurate, and the character recognition results may contain errors.
3. Uncertainty of scale: The scale of the map, in terms of latitudes or longitudes per unit distance in the video frame, is not known a priori. In addition, the curvature of the earth's surface may cause the scale to differ in different parts of the same map.

4.1 Iterative optimization procedure

When multiple geo-text objects occur in the same frame, the resulting redundancies may mitigate the above uncertainties. In other words, one may be able to iteratively optimize the global consistency of the frame with respect to the placement and content of the text and the scale of the map. Given the results of text extraction and recognition for isolated videotext objects, our map recognition algorithm consists of the following steps:

1. Initialize the set of potential geo-texts to include all isolated text objects containing a maximum of two words.
2. Using an initially low lexicon-match threshold, reduce the current set of potential geo-texts to those that result in an acceptable match with one of the lexicon entries. If this set contains fewer than two objects, stop: the frame does not contain enough information to recognize a map.
3. Set the actual, or pixel-domain, X and Y coordinates of each geo-text object to those of the centroid of its bounding box.
4. For each geo-text object that matches the name of a city, set its expected, or geographical, X and Y coordinates to that city's longitude and latitude, respectively. For each geo-text object that matches the name of a nation, set its expected X and Y coordinates to the longitude and latitude of one of its major cities, and iterate step 5 over all major cities for all geo-texts that match nation names. In other words, use the major cities of a nation as sample points where the geo-text could be placed, under the assumption that the major cities adequately cover the geographical area of the entire nation.
5. Compute the acceptability of the current set of hypotheses by calculating the least-squares linear fit between the sets of actual and expected coordinates of all geo-texts.
6. With ideal recognition, placement, and a linear geographical scale, the actual and expected coordinates would be perfectly linearly related.
The slope of this line corresponds to the scale of the map, and the intercepts correspond to its offset. Choose the best set of sample cities for the nation geo-texts by maximizing the acceptability of the results from step 5.
7. If the maximum acceptability of the results from step 5 is less than a given threshold, increase the match threshold from step 2 sufficiently and iterate steps 1 through 6. This reduces the possibility that an incorrect lexicon match might prevent convergence.
8. The parameters of the linear fit provide the geographical scale and offset of the map in the image. Calculate the latitudes and longitudes of the four corners of the full-screen map. Map recognition is complete.

Note that step 2 requires a minimum of two geo-text objects in a given frame to match the lexicon entries, since it takes at least two data points to compute a linear fit.
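A minimal sketch of the per-axis least-squares fit behind steps 5 through 8 follows, assuming the geo-text pixel centroids and expected longitude/latitude pairs have already been gathered. The acceptability score (negative RMS residual) and the function names are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def fit_map_geometry(pixel_xy, geo_lonlat):
    """Per-axis least-squares fit between pixel coordinates of geo-texts and
    their expected longitude/latitude (step 5). Returns ((scale, offset) per
    axis, acceptability), or None when fewer than two geo-texts remain."""
    px = np.asarray(pixel_xy, dtype=float)     # (n, 2): x, y centroids in pixels
    geo = np.asarray(geo_lonlat, dtype=float)  # (n, 2): lon, lat in degrees
    if len(px) < 2:
        return None                            # step 2: a line needs two points
    params, sq_err = [], 0.0
    for axis in range(2):                      # fit x against lon, y against lat
        scale, offset = np.polyfit(geo[:, axis], px[:, axis], deg=1)
        sq_err += float(np.mean((scale * geo[:, axis] + offset - px[:, axis]) ** 2))
        params.append((scale, offset))
    return params, -np.sqrt(sq_err)            # higher acceptability = better fit

def frame_corners_lonlat(params, width, height):
    """Step 8: invert the fit to obtain longitude/latitude of the frame corners.
    (Image y grows downward while latitude grows northward, so the fitted
    y-scale is normally negative.)"""
    (sx, ox), (sy, oy) = params
    return [((x - ox) / sx, (y - oy) / sy)
            for x in (0, width) for y in (0, height)]
```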

4.2 Temporal agglomeration

For enhanced readability and emphasis, maps and the accompanying videotext are typically displayed over multiple frames at a time. Our experiments indicated that a full-screen map is typically displayed for a minimum of 5 s. Unlike still images, this temporal contiguity of video content presents a redundancy that can be exploited to improve the accuracy of map detection and localization. Our approach to temporal agglomeration in the context of full-screen map recognition consists of area-based histogramming over the duration of the recognized map. The thresholded plateau of this histogram represents a consensus area over many temporally recognized frames and mitigates the effect of the recognition errors and placement uncertainties discussed above.

5. RESULTS

We applied the above algorithms for map detection, localization, and temporal agglomeration to three illustrative MPEG-1 video clips from the CNN Interactive newsroom. Figs. 4 and 5 show sample map localization results. Fig. 4a shows an example frame with a map of the Korean peninsula. As can be seen in Fig. 4b, our analysis successfully localizes the geographical region depicted in this video frame, based solely on the relative positions of the videotext in the frame. Fig. 5 presents the intermediate steps of our analysis in more detail. The input image in Fig. 5a was subjected to videotext extraction and recognition, the raw results of which are presented in Fig. 5b. The text strings that resulted in an acceptable degree of match with a lexicon of country names are shown in Fig. 5c; note that the word CHINA is detected twice. The iterative optimization procedure from Section 4 is then applied to jointly improve the degree of lexicon matches and the overall consistency with expected geographic locations. Figs. 5d and 5e show the results of this procedure; the region depicted in the video frame was successfully localized.

Our test videos contained over 45,000 frames, of which roughly 2000 contained full-screen political maps. The video frames with maps were manually ground-truthed to record the geographical area covered in each map. We define the following area-based precision and recall metrics to quantify the map localization performance of the agglomerated result for each temporally contiguous map sequence. The area-based precision is the ratio of the area of overlap between the ground-truthed and predicted map localizations to the area of the predicted map localization; analogously, the area-based recall is the ratio of the area of overlap to the area of the ground-truthed map localization. For each agglomerated map sequence, we considered area-based precision and recall ratios greater than 75% to constitute an acceptable overlap for the purposes of content analysis. Given this threshold, our method successfully localized 75% of the political map sequences in the CNN video clips.
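The area-based metrics can be made concrete with the short sketch below, assuming both localizations are axis-aligned boxes given as (left, top, right, bottom); the box representation is our simplification of the map regions.

```python
def area_precision_recall(truth, pred):
    """truth, pred: axis-aligned boxes (left, top, right, bottom), in either
    pixels or degrees. Returns (precision, recall) based on area of overlap."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    inter = area((max(truth[0], pred[0]), max(truth[1], pred[1]),
                  min(truth[2], pred[2]), min(truth[3], pred[3])))
    precision = inter / area(pred) if area(pred) else 0.0
    recall = inter / area(truth) if area(truth) else 0.0
    return precision, recall

# A map sequence counts as successfully localized when both ratios exceed 0.75.
```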
Lack of success in some cases was due to frames whose maps contained only a single geo-text object, which prevented the estimation of map scale and offset via a linear fit. There were no false alarms.

6. CONCLUSION

This work has demonstrated a novel approach to video content analysis that does not require rigorous object recognition. Superimposed videotext is often used by broadcasters as a precise mechanism to convey specific information to the viewer. Our work makes use of this fact by analyzing not only the textual content of the recognized videotext, but also its syntactic attributes, such as its relative location, color, and size. This concept is illustrated here by a system that detects and recognizes near-full-screen political maps in news video, and it has been demonstrated to work well without computationally expensive, rigorous object recognition.

REFERENCES

1. R. Houghton, "Named Faces: Putting Names to Faces," Intelligent Systems, 14:5.
2. R. Lienhart, "Indexing and Retrieval of Digital Video Sequences based on Automatic Text Recognition," in Proc. Fourth ACM International Multimedia Conf.
3. T. Sato, K. Takeo, E. Hughes, and M. Smith, "Video OCR for Digital News Archive," in Proc. Intl. Workshop on Content-Based Access of Image and Video Databases (CAIVD '98).
4. H. Li and D. Doermann, "Automatic Identification of Text in Digital Video Key Frames," in Proc. Intl. Conf. on Pattern Recognition.
5. A. Jain and B. Yu, "Automatic Text Location in Images and Video Frames," in Proc. Intl. Conf. on Pattern Recognition.
6. J. Ohya, A. Shio, and S. Akamatsu, "Recognizing Characters in Scene Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:2.
7. Y. Zhong, K. Karu, and A. Jain, "Locating Text in Complex Color Images," in Proc. Third International Conference on Document Analysis and Recognition.
8. A. Jain and S. Bhattacharjee, "Text Segmentation Using Gabor Filters for Automatic Document Processing," Machine Vision and Applications, 5.
9. V. Wu, R. Manmatha, and E. Riseman, "Automatic Text Detection and Recognition," in Proc. Image Understanding Workshop.
10. G. Myers, J. Herson, J. DeCurtins, R. Bolles, and A. Stolcke, "Multimodal Fusion for Autonomous TV Monitoring (AVTM): Phase 3 Final Report," ITAD-1681-FR, SRI International, Menlo Park, California.
11. M.A. Smith and T. Kanade, "Video Skimming for Quick Browsing Based on Audio and Image Characterization," Technical Report CMU-CS, Carnegie Mellon University.
12. DAFS.ORG: Supporting the Document Attribute Format Specification (DAFS) standard.

Figure 1: Near-full-screen political maps in news videos.

Figure 2: Videotext recognition process (gray-scale image, text location, text line coordinates, binarization, binarized text lines, OCR, OCR results).

Figure 3: Results of text location. (a) Input image; (b) transition image overlay; (c) candidate connected components; (d) located videotext.

Figure 4: Map recognition results. (a) Input image; (b) geographical localization.

Figure 5: Map recognition results. (a) Input image; (b) raw recognized videotext; (c) place name lexicon matches; (d) geographically consistent matches; (e) localized map area.
