A Fano-Huffman Based Statistical Coding Method

Journal of Modern Applied Statistical Methods
Volume 6, Issue 1, Article 25
5-1-2007

A Fano-Huffman Based Statistical Coding Method
Aladdin Shamilov, Anadolu University, Turkey
Senay Asma, Anadolu University, Turkey

Recommended Citation
Shamilov, Aladdin and Asma, Senay (2007) "A Fano-Huffman Based Statistical Coding Method," Journal of Modern Applied Statistical Methods: Vol. 6: Iss. 1, Article 25.
Available at: http://digitalcommons.wayne.edu/jmasm/vol6/iss1/25

Journal of Modern Applied Statistical Methods, May 2007, Vol. 6, No. 1, 265-278
Copyright © 2007 JMASM, Inc. 1538-9472/07/$95.00

Statistical coding techniques have been used for lossless statistical data compression, applying methods such as the Ordinary, Shannon, Fano, Enhanced Fano, Huffman and Shannon-Fano-Elias coding methods. A new and improved coding method is presented, the Fano-Huffman Based Statistical Coding Method. It holds the advantages of both the Fano and the Huffman coding methods: it is more easily applicable than the Huffman coding method and it is closer to optimal than the Fano coding method. The optimality with respect to the other methods is demonstrated on the basis of English, German, Turkish, French, Russian and Spanish.

Key words: Fano-Huffman based statistical coding method, probability distribution of language, entropy, information, optimal code.

Introduction

Problem Statement

Huffman's algorithm is a well-known encoding method that generates an optimal prefix encoding scheme, in the sense that the average code word length is minimum. As opposed to this, Fano's method has not been used so much because it generates prefix encoding schemes that can be sub-optimal (Rueda & Oommen, 2004). In this article, an improved coding method, named the Fano-Huffman Based Statistical Coding Method, is presented together with applications of this method. This method holds the advantages of both the Fano and the Huffman coding methods. So, it is more easily applicable than the Huffman coding method and is closer to optimal than the Fano coding method. The optimality of the mentioned coding method with respect to the other coding methods is demonstrated on the basis of English, German, Turkish, French, Russian and Spanish.

Aladdin Shamilov is a Professor in the Department of Statistics. Email: asamilov@anadolu.edu.tr. Senay Asma is a Research Assistant in the Department of Statistics. Email: senayyolacan@anadolu.edu.tr.

The classical coding methods and the concept of optimality are described in the section titled Classical Coding Methods and Optimality. The improved coding method, the Fano-Huffman Based Coding Method, by which encoding schemes arbitrarily close to the optimum can be easily constructed, is introduced in the section called Fano-Huffman Based Statistical Coding Method.
In the following section, the tables of constructed binary codes are given and the considered methods are compared in the sense of optimality. In the conclusion, the optimality of these results is interpreted with respect to the classical coding methods and suggestions are given.

Overview

Assume that a source alphabet $S = \{s_1, s_2, \ldots, s_n\}$, whose probabilities of occurrence are $P = \{p_1, p_2, \ldots, p_n\}$, and a code alphabet $A = \{a_1, a_2, \ldots, a_r\}$ are given. The purpose of this study is the generation of an encoding scheme $\{s_i \to w_i\}$ in such a way that $\bar{l} = \sum_{i=1}^{n} p_i l_i$ is minimized, where $l_i$ is the length of $w_i$.

Information theory has important applications in probability theory, statistics and communication systems. Lossless encoding methods used to solve this problem include Huffman's algorithm (Huffman, 1952), Shannon's method (Shannon & Weaver, 1949), arithmetic coding (Sayood, 2000), Fano's method (Hankerson, Harris, & Johnson, 1998), the enhanced Fano-based coding algorithm (Rueda & Oommen, 2004), etc. Adaptive versions of these methods have been proposed, and can be found in (Faller, 1973; Gallager, 1978; Hankerson et al., 1998; Knuth, 1985; Rueda, 2002; Sayood, 2000). The survey is necessarily brief as this is a well-reputed field.

Also, assume that the source is memoryless or zeroth-order, which means that the occurrence of the next symbol is independent of any other symbol that has occurred previously. Higher-order models include Markov models (Hankerson et al., 1998), dictionary techniques (Ziv & Lempel, 1977; Ziv & Lempel, 1978), prediction with partial matching (Witten, Moffat, & Bell, 1999), grammar based compression (Kieffer & Yang, 2000), etc., and the techniques introduced here are also readily applicable to such structured models.

Classical Coding Methods and Optimality

In this section, the fundamental steps of the classical coding methods are described and the concept of optimality of codes is expounded.

Classical coding methods

Suppose that a source alphabet (alphabet of a language) $S = \{s_1, s_2, \ldots, s_n\}$ and its probability distribution $P = \{p_1, p_2, \ldots, p_n\}$ are given.

Ordinary Coding Method

This method requires the following steps:
(a) Determine the number $N$ satisfying the inequality $\log_2 n \le N$, where $N$ is the length of a codeword and $n$ is the number of symbols in the source alphabet;
(b) Enumerate the letters, ignoring the frequency;
(c) Convert the numbers determined in (b) from base 10 to base 2 such that $N$ is the length of the converted number (Roman, 1997).

Shannon Coding Method

The construction of Shannon codes is provided by the steps:
(a) Put $\{p_1, p_2, \ldots, p_n\}$ in non-increasing order, $p_1 \ge p_2 \ge \cdots \ge p_n$;
(b) Calculate $l_i = \lceil -\log_2 p_i \rceil$, the length of the codeword, $i = 1, 2, \ldots, n$;
(c) Define the dyadic fractions as $F_1 = 0$ and $F_k = \sum_{i=1}^{k-1} p_i$, $k \ge 2$. Then calculate $F_i$, $i = 1, 2, \ldots, n$;
(d) Convert the dyadic fraction $F_i$ to binary form by using Koblitz's trick, then select the first $l_i$ bits as the code corresponding to $s_i$ (Hankerson et al., 2003).
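The Shannon construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `binary_expansion` and `shannon_code` are hypothetical, and the repeated-doubling expansion is assumed to play the role of the conversion step that the text attributes to Koblitz's trick.

```python
from math import ceil, log2


def binary_expansion(x, nbits):
    """Return the first nbits bits of the binary expansion of x, 0 <= x < 1."""
    bits = []
    for _ in range(nbits):
        x *= 2
        bit = int(x)          # the integer part is the next binary digit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def shannon_code(probs):
    """probs: dict symbol -> probability (> 0). Returns dict symbol -> codeword."""
    # (a) sort the symbols by non-increasing probability
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    code = {}
    cumulative = 0.0          # (c) F_1 = 0, F_k = p_1 + ... + p_{k-1}
    for symbol, p in items:
        length = ceil(-log2(p))                               # (b) l_i
        code[symbol] = binary_expansion(cumulative, length)   # (d) first l_i bits
        cumulative += p
    return code


if __name__ == "__main__":
    print(shannon_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

For this dyadic toy distribution every codeword length equals $-\log_2 p_i$ exactly, so the average codeword length coincides with the entropy; for general distributions the Shannon lengths are only within one bit of that value.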
Fano Coding Method

This method involves the steps:
(a) Arrange the probabilities of the symbols of the source alphabet in non-increasing order, $p_1 \ge p_2 \ge \cdots \ge p_n$;
(b) Divide the set of symbols into two subsets such that the sums of the probabilities of occurrence of the symbols in each subset are equal or almost equal. Then, assign a 0 to the first subset and a 1 to the second;
(c) Repeat step (b) on each subset until all subsets have a single element (Венцель, 1969).
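The recursive division used by the Fano method can be sketched as follows. The function name `fano_code` and the tie-breaking rule (the first split point with the smallest difference is taken) are assumptions made for illustration; the text does not fix them.

```python
def fano_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # (a) sort the symbols by non-increasing probability
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    code = {symbol: "" for symbol, _ in items}

    def split(group):
        if len(group) < 2:
            return                       # a single symbol needs no further bit
        total = sum(p for _, p in group)
        running, best_k, best_diff = 0.0, 1, float("inf")
        # (b) find the split point that makes the two sums as equal as possible
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_k, best_diff = k, diff
        first, second = group[:best_k], group[best_k:]
        for symbol, _ in first:
            code[symbol] += "0"          # 0 for the first subset
        for symbol, _ in second:
            code[symbol] += "1"          # 1 for the second subset
        split(first)                     # (c) repeat on each subset
        split(second)

    split(items)
    return code


if __name__ == "__main__":
    print(fano_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Because each division only needs partial sums of an already sorted list, the procedure is top-down and involves no tree bookkeeping, which is the property the proposed hybrid method aims to retain.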

Enhanced Fano Coding Method

This method consists of the following steps:
(a) Consider the source alphabet $S = \{s_1, s_2, \ldots, s_n\}$ whose probability distribution of occurrences is $P = \{p_1, p_2, \ldots, p_n\}$, where $p_1 \ge p_2 \ge \cdots \ge p_n$;
(b) Obtain the encoding scheme $\varphi: s_1 \to w_1, \ldots, s_n \to w_n$ by Fano's method;
(c) Rearrange $w_1, w_2, \ldots, w_n$ into $w'_1, w'_2, \ldots, w'_n$ such that $l'_i \le l'_j$ for all $i < j$, and simultaneously maintain $s_1, s_2, \ldots, s_n$ in the same order, to yield the encoding scheme $\varphi': s_1 \to w'_1, \ldots, s_n \to w'_n$ (Rueda & Oommen, 2004).

Huffman Coding Method

This method is bottom-up while the others are top-down. It can be explained more clearly as follows:
(a) Sort the symbols of the source alphabet in decreasing order of their probabilities;
(b) Merge the two least-probable letters into a single output whose probability is the sum of the corresponding probabilities;
(c) Go to step (a) if the number of remaining outputs is more than 2;
(d) Assign a 0 and a 1 arbitrarily as code words for the two remaining outputs;
(e) Append a 0 and a 1 to the current codeword to obtain the codewords of the preceding outputs, and repeat step (e) if an output is the result of the merger of two outputs in a preceding step. Stop if no output is preceded by another output in a preceding step (Aazhang, 2004).

Shannon-Fano-Elias Coding Method

This method can be explained by the steps:
(a) Consider the source alphabet $S = \{s_1, s_2, \ldots, s_n\}$ whose probability distribution of occurrences is $P = \{p_1, p_2, \ldots, p_n\}$; the order of the probabilities is not important;
(b) Obtain the cumulative distribution function $F(s) = \sum_{a \le s} p(a)$;
(c) Consider the modified cumulative distribution function $\bar{F}(s) = \sum_{a < s} p(a) + \frac{1}{2} p(s)$, where $\bar{F}(s)$ denotes the sum of the probabilities of all symbols less than $s$ plus half the probability of the symbol $s$;
(d) Obtain the length of the codeword by the formula $l(s) = \lceil \log_2 \frac{1}{p(s)} \rceil + 1$, where $\lceil \cdot \rceil$ denotes rounding up;
(e) Convert the dyadic fraction $\bar{F}(s)$ to binary form by using Koblitz's trick such that the codeword has $l(s)$ bits (Cover & Thomas, 1991).

The concept of optimality of codes

There exists a uniquely decodable code whose codeword lengths are given by the sequence $\{l_i\}_{i=1}^{n}$ if the Kraft inequality $\sum_{i=1}^{n} D^{-l_i} \le 1$ holds, where $D$ is the size of the code alphabet. Due to the Kraft inequality (Cover, 1991), the conditions for optimal codes are as follows:
(a) The average codeword length $\bar{l} = \sum_{i=1}^{n} p_i l_i$ of an optimal code for a source $S$ is greater than or equal to its entropy $H(S) = -\sum_{i=1}^{n} p_i \log p_i$;

(b) The average codeword length of an optimal code for a source $S$ is strictly less than $H(S) + 1$.

For the source alphabet $S = \{s_1, s_2, \ldots, s_n\}$ whose probability distribution of occurrences is $P = \{p_1, p_2, \ldots, p_n\}$, the average codeword length is given by $\bar{l}$, and the entropy of the source alphabet is given by $H(S)$. Under these conditions, it is required to transmit as much information as possible by using codes consisting of fewer bits. So, this problem can be considered as an optimization problem which consists of minimizing $\bar{l} = \sum_{i=1}^{n} p_i l_i$ subject to the constraint $\sum_{i=1}^{n} D^{-l_i} \le 1$, where $D$ is the dimension of the codebook, i.e. if the codebook is $\{0, 1\}$ then $D = 2$, etc. This problem is solved by using Lagrange multipliers, and the following result is obtained:

$l_i^{*} = -\log_D p_i$;   (2.1)

$\bar{l}^{*} = \sum_{i=1}^{n} p_i l_i^{*} = -\sum_{i=1}^{n} p_i \log_D p_i$;   (2.2)

$\bar{l}^{*} = H(S)$.   (2.3)

In general, however, it is not possible to find an integer codeword length that satisfies (2.1). For this reason, it is necessary to use the entropy lower bound (Cover & Thomas, 1991; Roman, 1997), which satisfies the following inequality:

$\bar{l} = \sum_{i=1}^{n} p_i l_i \ge H(S)$.   (2.4)

Moreover, if $X$ is a stationary stochastic process,

$\bar{l} \to H(\mathcal{X})$,   (2.5)

where $H(\mathcal{X})$ is the entropy rate of the process. Given the above, the information per symbol (letter) is given by $\mathrm{Inf/letter} = H(S) / \bar{l}$, and the optimality criterion for codes is considered to be $\mathrm{Inf/letter} \to 1$ (Венцель, 1969). Moreover, optimality means that if a text is coded by an optimal coding method, the number of 1s and the number of 0s are nearly equal, in the sense of maximum entropy. Hence, optimal codes transmit nearly maximum information, since 1s and 0s are not always equally probable.

Fano-Huffman Based Statistical Coding Method

In this section, a new and improved coding method is proposed, which can be considered as a hybrid method that holds the advantages of both the Fano and the Huffman coding methods.

It is well known that the Fano coding method is a suboptimal procedure for constructing a source code (Rueda & Oommen, 2004). In this method, the source symbols and their probabilities are sorted in non-increasing order of the probabilities, and then the set of symbols is divided into two subsets such that the sums of the probabilities of occurrence of the symbols in each subset are equal or almost equal. The main advantage of this method is the division of the set of symbols, because it requires only pure computations. Hence, the first goal of the improved coding method is to hold this advantage.

The Huffman coding method is an optimal procedure (Cover & Thomas, 1991). In this method, the source symbols and their probabilities are also sorted in decreasing order, and then the two least-probable symbols are merged into a single output whose probability is the sum of the corresponding probabilities. Thus, by this recursive procedure, the optimal Huffman codes are constructed. The advantage of this coding method is that the procedure runs from bottom to top. In this way, short code words are assigned to the symbols that occur frequently and long code words are assigned to the symbols that occur rarely. This advantage of the Huffman coding method constitutes the second goal of the improved coding method.

Considering the advantages of these two coding procedures, a hybrid coding method is presented. So, the coding method is more easily applicable than the Huffman coding method and is closer to optimal than the Fano coding method. The codes produced by this coding method are prefix codes and satisfy the sibling property. The Fano-Huffman based statistical coding method is now proposed in the following form:
(a) Arrange the probabilities of the symbols of the source alphabet in non-increasing order, $p_1 \ge p_2 \ge \cdots \ge p_n$;
(b) Choose $k$ such that $\left| \sum_{i=1}^{k} p_i - \sum_{i=k+1}^{n} p_i \right|$ is minimized. This number $k$ divides the source symbols into two sets of almost equal probability;
(c) Merge the two least-probable letters in each set into a single output whose probability is the sum of the corresponding probabilities;
(d) Go to step (c) if the number of remaining outputs is more than 2;
(e) Assign a 0 and a 1 arbitrarily as codewords for the two remaining outputs;
(f) Append a 0 and a 1 to the current codeword to obtain the codewords of the preceding outputs and repeat step (f); if no output is preceded by another output in a preceding step, merge the two least-probable subsets into a single output whose probability is the sum of the corresponding probabilities;
(g) Stop if no output is preceded by another output in a preceding step.

Note that, according to step (b) and depending on the size of the source alphabet, the set of symbols can be divided into more subsets ($2^i$, $i = 1, 2, \ldots$) of equal or almost equal probabilities. The advantages of the proposed method emerge from the comparison of this method with the other aforesaid coding methods. The applications of this method and the comparisons are given in the following section.

Tables, Computational Details and Comparisons

In this section, in order to indicate the advantages of the proposed method, the Fano-Huffman based statistical coding method, we compare it with the traditional coding methods. Various binary codes for English, German, Turkish, French, Russian and Spanish symbols are constructed in the sense of optimality.
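One possible reading of the hybrid procedure from the Fano-Huffman Based Statistical Coding Method section can be sketched in Python as follows. The sketch assumes a single Fano-style division into two subsets (step (b)), Huffman merging inside each subset (steps (c) and (d)), and a final join of the two subset trees under a common root. The function names, the tuple-based tree representation and this interpretation of steps (e) to (g) are assumptions, not the authors' implementation. A plain Huffman coder is included for comparison.

```python
import heapq
from itertools import count


def huffman_tree(items, tiebreak):
    """Merge (symbol, prob) pairs bottom-up; leaves are (symbol,), nodes are (left, right)."""
    heap = [(p, next(tiebreak), (symbol,)) for symbol, p in items]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # the two least-probable outputs
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    return heap[0][2]


def assign(tree, prefix, code):
    """Walk the tree, appending 0 for the first branch and 1 for the second."""
    if len(tree) == 1:                    # leaf: (symbol,)
        code[tree[0]] = prefix or "0"
    else:
        assign(tree[0], prefix + "0", code)
        assign(tree[1], prefix + "1", code)


def huffman_code(probs):
    code = {}
    assign(huffman_tree(list(probs.items()), count()), "", code)
    return code


def fano_huffman_code(probs):
    """Assumed hybrid: one Fano split, then Huffman merging within each subset."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if len(items) < 2:
        return huffman_code(probs)
    total = sum(p for _, p in items)
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(items)):        # step (b): choose the split index k
        running += items[k - 1][1]
        diff = abs(2 * running - total)   # |sum of first k - sum of the rest|
        if diff < best_diff:
            best_k, best_diff = k, diff
    tiebreak = count()
    first = huffman_tree(items[:best_k], tiebreak)    # steps (c)-(d) per subset
    second = huffman_tree(items[best_k:], tiebreak)
    code = {}
    assign((first, second), "", code)     # join the two subsets under one root
    return code


if __name__ == "__main__":
    probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    print(huffman_code(probs))
    print(fano_huffman_code(probs))
```

On this small example both coders produce codeword lengths 1, 2, 3 and 3. Under this reading, each Huffman merge runs on a subset of roughly half the alphabet, which is one way to interpret the claim that the hybrid is easier to apply than Huffman coding on the full alphabet.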
French, German, Spanish and English symbols (letters) are the Latin characters consisting of 26 letters, which are given in Table 1a. The probabilities of French, German and Spanish symbols (letters) were established in 1939 by Fletcher Pratt (Stephens, 200?; Pratt, 1939); the probabilities of English symbols (letters) were established by [.]am [.]hamdo[.] and they are given in Table 1b.

Table 1a. French, German, Spanish and English Symbols
a b c d e f g h i j k l m n o p q r s t u v w x y z

[Table 1b. Probabilities of French, German, Spanish and English Symbols. Columns: Symbols, English, French, German, Spanish. The numeric entries are not recoverable from the extracted text.]

The Turkish source consists of 29 symbols (letters). The capital and small letters of Turkish are given in Table 2a. The probabilities of occurrence of Turkish symbols (letters) are given in Table 2b (Shamilov & Yolacan, 2005; Dalkilic & Dalkilic, 2002). The considered probabilities were obtained from a corpus consisting of words from a wide variety of fields, i.e. scientific articles, newspapers, poetry etc., […].5 million characters in total.

Russian uses the Cyrillic alphabet consisting of 32 symbols (letters), which are given in Table 3a. The probabilities of Russian symbols, including the space symbol, are given in Table 3b (Венцель, 1969; Yaglom & Yaglom, 1966).

Table 2a. Turkish Source
A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

[Table 2b. Probabilities of Turkish Symbols. Columns: Letter, Frequency (repeated three times across the page). The numeric entries are not recoverable from the extracted text.]

Table 3a. Russian Symbols (Cyrillic alphabet)
А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ(Ь) Ы Э Ю Я
а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ(ь) ы э ю я

[Table 3b. Probabilities of Russian Symbols. Columns: Symbols, Probabilities (repeated twice across the page; the last entry is the space symbol). The numeric entries are not recoverable from the extracted text.]

In order to construct binary codes for English, German, Turkish, French, Russian and Spanish, the classical coding methods are applied to the considered source alphabets. The constructed binary codes are given in Tables 4-9, respectively. Moreover, the Fano-Huffman based statistical coding method is also applied to the considered languages. The binary codes constructed by Fano-Huffman based statistical coding are given in Table 10.

[Table 4. Binary Codes for the Probability Distribution of English Symbols. Columns: English symbol, No., Ordinary, Ordered Shannon, Fano, Enhanced Fano, Huffman. The code words are not recoverable from the extracted text.]

[Table 5. Binary Codes for the Probability Distribution of German Symbols. Columns: German symbol, No., Ordinary, Ordered Shannon, Fano, Enhanced Fano, Huffman. The code words are not recoverable from the extracted text.]

[Table 6. Binary Codes for the Probability Distribution of Turkish Symbols. Same columns as Table 5.]

[Table 7. Binary Codes for the Probability Distribution of French Symbols. Same columns as Table 5.]

[Table 8. Binary Codes for the Probability Distribution of Russian Symbols. Same columns as Table 5.]

[Table 9. Binary Codes for the Probability Distribution of Spanish Symbols. Same columns as Table 5.]

[Table 10. Binary Codes Constructed by the Fano-Huffman Based Statistical Coding Method. Columns: Turkish symbols and their Fano-Huffman based codes; Russian symbols and their Fano-Huffman based codes; English, French, German and Spanish symbols with the Fano-Huffman based codes for each of the four languages. The code words are not recoverable from the extracted text.]

In order to determine the information per letter for the considered alphabets under the mentioned coding methods, the following stages are carried out:
1) The entropy $H(S)$ of each of the mentioned languages is calculated.
2) The codeword length of each code shown in Tables 4-10 is obtained by counting the bits of the code words, and thus the average codeword length $\bar{l}$ is computed for each coding method.
3) The information per letter $\mathrm{Inf/letter} = H(S) / \bar{l}$ is obtained for the interpretation of the optimality of the codes.

The results of these stages are given in Table 11. As previously presented, the optimality criterion for codes is $\mathrm{Inf/letter} \to 1$. It is seen from Table 11 that the binary codes constructed for the symbols of each alphabet by the Fano-Huffman based statistical coding method are closer to optimal than those of the Fano coding method and as optimal as those constructed by the Huffman coding method, while the method is more easily applicable than the Huffman coding method. Also, the improved coding method is closer to optimal than the others. Moreover, if a file is coded by Fano-Huffman based codes, then the size of the file will be smaller than that of files coded by the other considered coding methods. Hence, this means faster communication.

Table 11. Information per letter sent by constructed binary codes (bits)
[Values as they survive in the extracted text; some digits are missing.]

Source   Ordinary  Shannon  Fano    Improved Fano  Huffman  Shannon-Fano-Elias  Fano-Huffman based
English  0.845     0.880    0.9834  0.9839         0.9905   .079                0.9888
Turkish  0.873     0.9075   0.9937  0.9937         0.9939   .0955               0.9939
French   0.797     0.8885   0.9854  0.9854         0.9899   .09                 0.9899
German   0.890     0.900    0.990   0.990          0.995    .083                0.990
Spanish  0.803     0.950    0.9909  0.9909         0.994    .6                  0.996
Russian  0.8839    0.9085   0.995   0.9936         0.9936   .4                  0.9936
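The three stages above (entropy, average codeword length, information per letter) can be sketched as follows. The function names and the toy four-symbol distribution are hypothetical and chosen only so that the arithmetic is exact; they are not the language data of the study. A Kraft-sum check is included because it is the constraint used in the optimality discussion.

```python
from math import log2


def entropy(probs):
    """H(S) = -sum p_i log2 p_i, in bits per symbol."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)


def average_length(probs, code):
    """Average codeword length: sum of p_i * l_i, with l_i counted in bits."""
    return sum(p * len(code[symbol]) for symbol, p in probs.items())


def kraft_sum(code, D=2):
    """Left-hand side of the Kraft inequality: sum of D^(-l_i)."""
    return sum(D ** -len(word) for word in code.values())


def information_per_letter(probs, code):
    """Inf/letter = H(S) / average codeword length."""
    return entropy(probs) / average_length(probs, code)


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    ordinary = {"a": "00", "b": "01", "c": "10", "d": "11"}     # fixed length
    prefix = {"a": "0", "b": "10", "c": "110", "d": "111"}      # variable length
    print(entropy(probs))                           # 1.75
    print(information_per_letter(probs, ordinary))  # 0.875
    print(information_per_letter(probs, prefix))    # 1.0
    print(kraft_sum(prefix))                        # 1.0
```

In this toy case the fixed-length (ordinary) code carries 0.875 bits of information per transmitted bit, while the variable-length prefix code reaches the optimality criterion of 1.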

Conclusion

It is seen that binary codes constructed by the Fano-Huffman based statistical coding method carry as much information per letter as codes constructed by the Huffman coding method. However, with this coding method, the fewer subsets the alphabet is divided into, the closer to optimal the obtained codes are. Thus, this result makes the Fano-Huffman based statistical coding method as preferable as the Huffman coding method for each of the considered languages. The Fano-Huffman based statistical coding method takes less time than the Huffman coding method to construct binary codes. It requires more pure computation than the Huffman coding method, by means of dividing the source alphabet into subsets, and this means faster coding.

As is commonly known, the operating systems of computers are based on the American Standard Code for Information Interchange (ASCII), which consists of ordinary binary codes. Therefore, another main result of this study is the advantage of Fano-Huffman based codes over ASCII. It can be concluded from this study that ordinary codes are not optimal because they have the highest average codeword length and the least information per letter. Hence, since ASCII codes are ordinary codes, a text coded by them will be larger in size than one coded by Fano-Huffman based codes. So, ASCII codes are not preferred codes. Consequently, Fano-Huffman based codes can be used in computer systems for data compression, rather than ASCII, for faster communication: if a file is coded by Fano-Huffman based codes, then the size of the file will be smaller than that of the file coded by ASCII, but it will transmit the same information by using codes consisting of fewer bits.

References

Aazhang, B. (2004). http://cx.rce.edu/cotet/m076/latest/, Creative Commons.

Cover, T. M. & Thomas, J. A. (1991). Elements of information theory. NY: John Wiley & Sons, Inc.

Faller, N. (1973). An adaptive system for data compression. 7th Asilomar conference on circuits, systems, and computers, 593-597.

Gallager, R. (1978). Variations on a theme by Huffman. IEEE Transactions on Information Theory, 24(6), 668-674.

Hankerson, D., Harris, G. A. & Johnson, P. D. (2003). Introduction to information theory and data compression (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC Press.

Hankerson, D., Harris, G. & Johnson, P. (1998). Introduction to information theory and data compression. CRC Press.

Huffman, D. (1952). A method for the construction of minimum redundancy codes. Proceedings of IRE, 40(9), 1098-1101.

Kieffer, J. C., & Yang, E. (2000). Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3), 737-754.

Knuth, D. (1985). Dynamic Huffman coding. Journal of Algorithms, 6, 163-180.

Roman, S. (1997). Introduction to coding and information theory. New York: Springer-Verlag.

Rueda, L. (2002). Advances in data compression and pattern recognition. PhD thesis, School of Computer Science, Carleton University, Ottawa, Canada.

Rueda, L. G. & Oommen, B. J. (2004). A nearly-optimal Fano-based coding algorithm. Information Processing and Management, 40, 257-268.

Sayood, K. (2000). Introduction to data compression (2nd ed.). Morgan Kaufmann.

Венцель, Е. С. [Ventsel, E. S.] (1969). Теория вероятностей [Probability theory]. Moscow.

Pratt, F. (1939). Secret and urgent: The story of codes and ciphers. Blue Ribbon Books.

[.]hamdo[.], [.] ([…]). http://dwww.epfl.ch/matra/ours /eb/compresso/eglsh.html, State University of New York.

Stephens (200?). http://www.santacruzpl.org/readyref/files/g-l/ltfrqsp.shtml, Santa Cruz Public Libraries, California.

Shamilov, A. and Yolacan, S. (2005). Various binary codes for probability distribution of Turkish letters. International Conference on Ordered Statistical Data: Approximations, Bounds and Characterizations, pp.70, Izmir, Turkey.

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communications. University of Illinois Press.

Witten, I., Moffat, A. & Bell, T. (1999). Managing gigabytes: Compressing and indexing documents and images (2nd ed.). Morgan Kaufmann.

Yolacan, S. (2005). Statistical properties of different languages based on entropy and information theory. Anadolu University Graduate School of Sciences, Master of Science Thesis (in Turkish), Eskisehir.

Ziv, J. & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3), 337-343.

Ziv, J. & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5), 530-536.