Word counts and match values

integrated translation environment
Word counts and match values
2004-2014 Kilgray Translation Technologies. All rights reserved.
2004-2014 ZAAC. All rights reserved.

Contents
1 General
1.1 Statistics in translation tools
1.2 What is a segment
1.3 What is a word
2 Influences on statistics
3 Match ranges
3.1 Meanings of match values
3.2 Influences on match rates
3.2.1 TMX exchange
4 New functionality in memoq 2014: weighted word counts and automatic deadline calculation
5 Summary

This guide covers memoq 2013 R2 and higher. It contains text items from the English user interface of the program. These items are under constant verification and are subject to change without prior notification.

memoq integrated translation environment Page 2 of 15

1 General

This guide explains what a translation tool counts, what a word is, and what match values (such as a 101% match), tags and tokens mean.

1.1 Statistics in translation tools

Each translation environment tool counts words differently. By default, each translation tool counts:
- the number of segments,
- the number of words in segments,
- the number of characters in words (no spaces),
- the number of tags/tokens (placeables),
- the number of repeated segments (repetitions),
- the number of pre-translated segments.

Match values express the similarity (in percent) between segments in the document and segments stored in the translation memory (TM). The percentages shown in the statistics are calculated on the basis of the word count. For example, "25% in 100% matches" means that 25% of the words appear in segments that have a 100% match.

The table below shows the word counts of memoq 2013 R2 and SDL Trados Studio 2014:

[Screenshot: word counts in memoq 2013 R2]

[Screenshot: word counts in SDL Trados Studio 2014]

1.2 What is a segment

A segment (also called a row in memoq) is a translatable unit that ends with a character defined in the segmentation rules. Segmentation rules define where a segment ends and a new one starts. In memoq, you can configure segmentation rules in the Resource Console > Segmentation rules or, in a memoq project, under Settings > Segmentation rules.

By default, a segment ends with:
- punctuation marks such as a dot, question mark or colon,
- line breaks such as hard returns,
- structure tags (which you can find in tagged file formats like XML and HTML).

Each tool interprets tab characters, soft returns and semicolons differently; in some cases you can change the behavior through a file-format-specific setting.

A segmentation rule can also have exceptions. For example, a dot followed by a space and a capital letter marks the end of a segment; but if the dot is preceded by a number, the dot does not end the segment. In memoq, you can also configure custom lists: add an abbreviation like "Prof." to such a list, and memoq will no longer segment after "Prof.".

1.3 What is a word

Words are elements that are delimited by spaces or punctuation marks. A word is anything that contains at least one letter. Numbers are also counted as words. Numbers can occur inside a segment or stand alone as their own segment (number-only segments). Tools differ in how they treat hyphens and slashes, for example in "XYZ-Product" or "In/Out".

The table below compares word counts:

Segment                                 | MS Word | memoq   | Trados Studio
Sentence with a reference (see page 1). | 7 words | 7 words | 6 words (number is a tag)
Sentence with a link.                   | 4 words | 4 words | 5 words (the link content appears as a separate segment/word)
It's a new way of working.              | 6 words | 6 words | 7 words (It's = It is)
Press the On/Off button.                | 4 words | 4 words | 5 words (/ counts as delimiter)
Press the On-Off button.                | 4 words | 4 words | 5 words (- counts as delimiter)

Note: The number 1 in the first example is a reference to a page in the original Word file. That is why Trados interprets it as a tag. Otherwise Trados would count it as a word as well.
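
The segmentation exceptions from section 1.2 and the delimiter differences from section 1.3 can be illustrated with a minimal Python sketch. This is not how memoq or SDL Trados Studio are implemented: the function names, the regular expression and the sample sentence are assumptions made for illustration, and real tools apply many more rules.

```python
import re

def split_segments(text, abbreviations=frozenset()):
    """Split after . ? ! : when followed by a space and a capital letter.

    Hypothetical helper: it only models the two exceptions described in
    section 1.2 (dot after a number, abbreviation on a custom list).
    """
    segments, start = [], 0
    for match in re.finditer(r"[.?!:](?= [A-Z])", text):
        end = match.end()
        last_token = text[start:end].split()[-1]
        # Exception 1: a dot preceded by a number does not end the segment.
        if match.group() == "." and end >= 2 and text[end - 2].isdigit():
            continue
        # Exception 2: tokens on the custom abbreviation list do not end it.
        if last_token in abbreviations:
            continue
        segments.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

def count_words(segment, extra_delimiters=""):
    """Count tokens that contain at least one letter or digit.

    Whitespace always delimits; extra_delimiters (for example "/-")
    models a tool that also splits on slashes and hyphens.
    """
    pattern = r"[\s" + re.escape(extra_delimiters) + r"]+"
    tokens = re.split(pattern, segment)
    return sum(1 for token in tokens if any(ch.isalnum() for ch in token))

sample = "Ask Prof. Smith. He arrives on the 3. May is busy."
print(split_segments(sample, abbreviations={"Prof."}))
print(count_words("Press the On/Off button."))
print(count_words("Press the On/Off button.", extra_delimiters="/-"))
```

With whitespace as the only delimiter, the On/Off example counts 4 words; adding "/" and "-" as delimiters gives 5, which mirrors the difference shown in the table. Without "Prof." on the abbreviation list, the sample text would be split after "Prof." as well.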

2 Influences on statistics

If the tool does not know that something is an abbreviation, like "U.S." in the following example, it segments the sentence incorrectly, and the resulting partial segments get no match or only a lower match from the TM.

[Screenshot: segmentation example with the abbreviation U.S.]

Besides segmentation rules, the settings in the Statistics dialog also influence the statistics:

[Screenshot: Statistics dialog]

If your documents contain tags, you can define that a tag is counted as a part of a word or as X characters, which adds to the word count.

You may also lock rows after pre-translation. Such rows (segments) can be number-only segments or unambiguous 101% pre-translated matches. These segments remain visible to the translator and provide the full context, but the translator cannot edit them. You can include or exclude locked rows in your statistics.

Another option in memoq is to include spaces in character counts. This is useful if you have to provide a line count instead of a character count, because you can then divide the number of characters by the standard number of characters per line.

Including or excluding these options influences your statistics and gives you a different word count.

Example of a different tag weight and its influence on the counts: the word count shows 7907 words in no matches and 164 tags within the no-match segments.

[Screenshot: statistics with 7907 no-match words and 164 tags]

Now set the tag weight to 0.5, which means that a tag equals half a word. memoq adds 82 words (164 tags multiplied by 0.5) to the total word count of the no matches.

SDL Trados Studio distinguishes between tags and tokens (placeables) and lists them in separate columns in the statistics. A token is an element that SDL Trados Studio recognizes and marks in the translation editor.

Further special cases for statistics are the word count of the target-language text (after the translation) and the post-translation analysis in memoq. In the post-translation analysis, memoq counts the actual match values that were inserted during the translation. These can differ from the initial analysis, especially if several translators are working with the same TM.

LiveDocs corpora are another special case. memoq can use not only TMs but also corpora in your statistics. In the Statistics dialog, check the Project TMs and corpora check box:

[Screenshot: Project TMs and corpora check box in the Statistics dialog]

When you have assigned a corpus to your project, you can also get matches from this corpus.
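
The tag weight arithmetic from the example above can be checked with a few lines of Python. The function name is hypothetical; only the figures (7907 words, 164 tags, weight 0.5) come from the guide.

```python
def count_with_tag_weight(words, tags, tag_weight):
    """Return the word count after adding each tag at the given weight."""
    return words + tags * tag_weight

# Figures from the example: 164 tags at a weight of 0.5 add 82 words.
added_words = 164 * 0.5
total = count_with_tag_weight(7907, 164, 0.5)
print(added_words, total)
```

Under these figures the no-match total rises from 7907 to 7989 words.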

3 Match ranges

When you run statistics, you get matches for different match ranges. Matching is the comparison of the source segment in a document with all source segments in the TM.

3.1 Meanings of match values

This section explains the meaning of the match ranges:

No match: A segment in the document that does not exist in the TM or has a match lower than the currently set minimum match value.

Fuzzy matches: Segments in the document that are similar to segments in the TM but differ in text, punctuation, numbers, spaces, etc.

100% match: The exact same segment in the document appears in the TM. This only means that the segment has been translated before; it does not mean that the translation is correct, fits the context or has the appropriate quality or register.

Repetitions: Segments that appear again and again within one document or across documents in a project. These segments do NOT have a 100% match from the TM (otherwise, they would be counted as 100% matches). They might be new segments or might have a fuzzy match from the TM. The first time a segment occurs, it is counted as a no match or fuzzy match; each further occurrence is counted as the 1st, 2nd, etc. repetition.

X-translate: The transfer of text blocks from a bilingual file into an updated version of the source-language document. The context is taken into account. For example, you translate version 1.0 of a manual in memoq; meanwhile, your customer sends you an updated document. You can then import the updated document, and memoq fills it with the translations from the version 1.0 document.

Context match (101% match): The source segment in the TM was saved with its preceding and following segment, and the same sequence of segments appears in the document.

Double context match: In addition to the preceding and following segment of a 101% match, memoq also stores the ID number (if there was one in the previous translation).
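
The way repetitions are separated from 100% matches in section 3.1 can be sketched as follows. This is a simplified illustration, not memoq's analysis: it counts segments rather than words, treats the TM as a plain set of exact source segments, and lumps new and fuzzy segments together. All names and sample segments are assumptions.

```python
from collections import Counter

def classify_segments(segments, tm_exact_sources):
    """Count 100% matches, first occurrences and repetitions."""
    seen = Counter()
    stats = Counter()
    for segment in segments:
        if segment in tm_exact_sources:
            # Segments with a 100% TM match are never counted as repetitions.
            stats["100% match"] += 1
        elif seen[segment]:
            # Second and later occurrences of a non-100% segment.
            stats["repetition"] += 1
        else:
            # First occurrence: a no match or fuzzy match in a real analysis.
            stats["new or fuzzy"] += 1
        seen[segment] += 1
    return stats

document = [
    "Save the file.",
    "Open the file.",
    "Save the file.",
    "Save the file.",
    "Close the window.",
]
print(classify_segments(document, tm_exact_sources={"Close the window."}))
```

In this sample, "Save the file." is counted once as new and twice as a repetition, while "Close the window." goes to the 100% range because the TM already contains it.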

[Screenshot: match ranges in memoq 2013 R2]

[Screenshot: match ranges in SDL Trados Studio 2014]

Note: A perfect match in SDL Trados Studio is equivalent to X-translated in memoq. A context match in SDL Trados Studio is equivalent to the 101% match range in memoq.

Each translation tool calculates similarity differently, using its own algorithm. Here is an example where the first segment is matched against segments 2 and 3.

[Screenshot: match values in memoq 2013 R2]

[Screenshot: match values in SDL Trados Studio 2014]

Specific match values might be reserved for specific changes. memoq shows 95%-99% matches for segments where the text is identical but other elements such as numbers, formatting, tags or punctuation differ. The following example shows the match values from the first segment to the others.

[Screenshot: match values for segments that differ in numbers and tags]

If a number differs, you get a 99% match. If tags differ (no matter how many), you get a 99% match. If numbers AND tags differ, you get a 98% match, because one point is subtracted for each of the two categories of changes.

memoq also lists the specific differences below the translation results pane:

[Screenshot: differences shown below the translation results pane]
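
The three cases just described (99% for numbers, 99% for tags, 98% for both) are consistent with subtracting one point per differing category. The sketch below encodes only that reading; extending it to other categories such as formatting or punctuation is an assumption, and the function name is hypothetical.

```python
def identical_text_match_value(differing_categories):
    """Match value for segments whose text is identical.

    One point is subtracted per differing category, no matter how many
    individual numbers or tags differ within that category.
    """
    return 100 - len(set(differing_categories))

print(identical_text_match_value(["numbers"]))
print(identical_text_match_value(["tags", "tags", "tags"]))
print(identical_text_match_value(["numbers", "tags"]))
```

Passing "tags" several times still yields 99, which reflects the statement that the number of differing tags does not matter.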

3.2 Influences on match rates

Match rates can be influenced by the settings you make in the Statistics dialog: internal fuzzy matches/homogeneity. You can also assign a penalty to a TM, a user or metadata. In your memoq project, you find the TM settings in Settings > TM Settings. The same applies to LiveDocs corpora: you can set a user penalty, a corpus penalty and a minimum match threshold. Configure the LiveDocs corpus settings in your memoq project in Settings > LiveDocs Settings.

Match rates can also be affected when you import a TM from another translation tool using TMX. TMX (Translation Memory eXchange) is the format for exchanging translation memories between different translation tools. Match rates can also differ when segments were initially translated in a different file format (for example, Adobe FrameMaker -> XML).

When you run statistics with the Homogeneity check box enabled, memoq not only compares the segments in the document to the segments in the TM, but also compares these segments with all other segments in the document. If a similar segment appears again in the document, it is counted as a match as well.

3.2.1 TMX exchange

Match values can differ depending on whether the TM was created in the translation tool itself or the translation memory data was imported from another translation tool using TMX. A possible scenario:

1. Translate a document in tool A.
2. Export the translation memory data to TMX.
3. Import this TMX file into tool B.
4. Translate the same file and check the match rates you get.

Match comparisons:

[Screenshot: match rate comparison after TMX exchange]

The differences are caused by different segmentation rules and by the different interpretation of tag sequences (as one tag or as several consecutive tags) and of placeholders.
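
To make the exchange format more concrete, the following Python sketch reads a minimal TMX document with the standard library. The element names (tmx, header, body, tu, tuv, seg) follow the TMX 1.4 format; the header attribute values, the sample segments and the function name are made up for illustration, and real TMX exports usually carry inline tags and metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical minimal TMX content, as tool A might export it.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="tool-a" creationtoolversion="1.0"
          segtype="sentence" o-tmf="unknown" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the On/Off button.</seg></tuv>
      <tuv xml:lang="fr"><seg>Appuyez sur le bouton On/Off.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() flattens any inline elements into plain text.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units

print(read_translation_units(SAMPLE_TMX))
```

Flattening inline elements into plain text, as done here, is exactly the kind of simplification that makes tag sequences and placeholders look different after an exchange between tools.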

4 New functionality in memoq 2014: weighted word counts and automatic deadline calculation

When you assign users in your project, memoq can automatically calculate the deadlines by which the translators and reviewers should deliver the documents:

1. Create an online project.
2. Assign users to the project. Go to the People pane. Click the Add user button. Select the users or a group, then click OK. The users are now listed in the People pane on the Project users tab.
3. Assign multiple users to a document. Go to the Translations pane, select a document, then right-click the document and select Assign. The Assign selected documents to users dialog appears.
4. Choose Assign to users of this organization.
5. Check the Calculated deadlines check box, and then click the Calculate method... link. Depending on the hours you choose from the Translator, Reviewer 1 and Reviewer 2 drop-down lists, memoq automatically calculates the deadlines. memoq can also use a weighted word count for this calculation.

Options:
- This number of working hours from now radio button: Select this option to count the work hours from now.
- Start of business in this number of weekdays radio button: Select this option to start counting from a specific weekday onwards.
- Monday start of business in this number of weeks radio button: Select this option to start counting on a Monday in a specific week that you indicate in the Translator, Reviewer 1 and Reviewer 2 drop-down lists.

Note: Hours are only counted within the workday.
Note: The week of the project launch does not count when the deadline is calculated by weeks.
Note: When you set a number of days, only weekdays (Monday to Friday) are counted; the day of the project launch does not count.

If you choose the Calculate hours from daily capacity (words per day) radio button, you need to enter the number of words the translator is able to translate per day, for example 2500 words (this depends on the subject).

Check the Use weighted word counts, if available check box to use a weighted word count. Weighted word counts are available only when you have generated an analysis report for the document. If a report was created, the most recently generated one is used to calculate the weighted word count. If no report is found, memoq uses the raw word count, even when a weighted option is chosen.

Important: Weighted word count is not available for slices of a document.

Note: If you choose Use raw word counts, memoq counts words and characters without taking existing resources into account.

The weighted word count works like this:
o A match of 100% and above counts as 0.3
o A 95-99% match counts as 0.5
o A 75-94% match counts as 0.8
o A no match (below 75%) counts as 1.0
o Locked segments are not counted at all.

6. Click OK to apply the automatic word count calculation.

The start time for the counting is:
- for the translator (first actor): the file assignment, i.e. when you click OK in the auto-assign dialog or when the automated task runs,
- for the following actors (reviewers): the deadline of the previous role (all these deadlines are filled in at the moment of the file assignment).

Important: When you calculate deadlines, the local time zone is used for your local projects, and the stored time zone for automated actions on the memoq server.
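
The weekday rule from the notes in this section (only Monday to Friday count, the day of the project launch does not count) can be sketched in Python. This is a simplified illustration, not memoq's calculation: it works with whole days instead of working hours, ignores public holidays and time zones, and both function names are assumptions.

```python
import math
from datetime import date, timedelta

def add_weekdays(launch_day, weekdays):
    """Return the date that lies the given number of weekdays after launch.

    The launch day itself is not counted; Saturdays and Sundays are skipped.
    """
    current = launch_day
    remaining = weekdays
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current

def days_needed(word_count, words_per_day):
    """Whole working days needed at a given daily capacity."""
    return math.ceil(word_count / words_per_day)

launch = date(2014, 6, 6)          # a Friday
needed = days_needed(6000, 2500)   # 6000 words at 2500 words per day
print(needed, add_weekdays(launch, needed))
```

A project launched on Friday, 6 June 2014 with three working days of work lands on Wednesday, 11 June 2014 under these assumptions, because the launch day and the weekend are skipped.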

Example of calculating the weighted words:
- No match (no match found in the TM) = 100% paid
- Lower fuzzy ranges = between 50 and 70%
- Higher fuzzy ranges = 40%
- Exact match (a 100% match from the TM) = 30%
- Repetition (the same segment occurs several times in the document) = 20%
- Context match (a 101% match from the TM, where the previous and next segment are the same) = 10%
- X-translated = 10%

These values give you the total number of words and the weighted total. Weighted means that, for example, you have:
- 1000 new words (no match) = 1000 words
- 1000 words calculated at 70% = 700 words
- 1000 words calculated at 30% = 300 words
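
The weighted word count described in section 4 can be reproduced with a short Python sketch. It uses the weights listed there (0.3, 0.5, 0.8, 1.0, locked segments not counted), stored as whole percentages to avoid rounding noise; the category labels, the function name and the sample counts are assumptions for illustration.

```python
# Weights from section 4, expressed in percent.
WEIGHTS_PERCENT = {
    "100% and above": 30,
    "95-99%": 50,
    "75-94%": 80,
    "no match": 100,
}

def weighted_words(word_counts):
    """Return the weighted total for a {category: word count} mapping.

    Words under the hypothetical "locked" key are ignored, because locked
    segments do not count.
    """
    total = 0
    for category, words in word_counts.items():
        if category == "locked":
            continue
        total += words * WEIGHTS_PERCENT[category]
    return total / 100

analysis = {
    "no match": 1000,
    "75-94%": 1000,
    "95-99%": 1000,
    "100% and above": 1000,
    "locked": 500,
}
print(weighted_words(analysis))
```

For this sample analysis, 4500 raw words (including 500 locked words) reduce to a weighted total of 2600 words. Other weighting schemes, such as the payment percentages in the example above, can be modeled by swapping the values in the weights table.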

5 Summary

Statistics are tool-specific, as each tool has its own way of defining what a word is, where a segment ends and how match values are calculated. This means that statistics from different tools are not directly comparable. Many settings, such as segmentation rules and penalties, can influence the outcome of an analysis.