How To enrich transcribed documents with mark-up
Version v1.4.0 (22_02_2018_15:07)
Last update 30.09.2018

This guide will show you how to add mark-up to documents which are already transcribed in Transkribus. This gives you the opportunity to define persons, places and abbreviations. You can add customized tagging categories and search for individual tags in your documents. Additionally, the tags can be exported in different formats. More information about the export of tags can be found in the How to Export Documents from Transkribus guide.

Download the Transkribus Expert Client, or make sure you are using the latest version:
- https://transkribus.eu/
Consult the Transkribus Wiki for further information and other How to Guides:
- https://transkribus.eu/wiki/
Transkribus and the technology behind it are made available via the following projects and sites:
- https://read.transkribus.eu/
- https://transcriptorium.eu/
- https://github.com/transkribus/
Contact:
- The Transkribus Team: email@transkribus.eu

Contents
- Introduction
- Tagging interface
- Create your own tags
- Adding tags
- Historical letters and abbreviation signs
- Illegible text
- Deletions
- Black out text
- Searching for tags
- Metadata
- Editorial Declaration
- Credits

The READ project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943.

Introduction
The tagging interface in Transkribus enables you to:
- Assign tags to important words or phrases in your document.
- Search for individual tags or tag categories.
- Export the tags you added in different file formats so that you can go on working with them outside of Transkribus.

Tagging interface
- The tagging interface can be found by clicking the "Metadata" tab, and then the "Textual" tab.
Figure 1 The Textual tab
- If you click the "Show all" button at the bottom of the "Textual" tab, all the predefined tags will be shown. You can start working with these right away.
Figure 2 Show all predefined tags
Figure 3 Predefined tags in Transkribus

Create your own tags
- To create your own tag categories, click the "Customize" button in the "Tags" tab. The "Tag configuration" window will open up.
Figure 4 Create your own tags
- With the "Create new tag" button you can add your own tags.
- Once you have created a new tag, it will appear when you click the "Show all" button.
- In the "Tag configuration" window predefined tags are shown in italics, customized ones are shown without italicisation.

Adding tags
- If you want to tag a word or phrase there are (at least) three ways to do it:
  - Highlight the text in the Text Editor field and afterwards click on the green "+" button of the tag you want to apply.
Figure 5 Highlight the word to be tagged
Figure 6 Choosing the right tag
  - Alternatively, you can highlight the word or phrase and then right-click with your mouse. Under "All tags" the suitable one can then be chosen.
Figure 7 Tag a word or phrase with right mouse click
  - Finally, if there are tag categories you use frequently, you can create a shortcut for them in order to speed up your work. To do so, within the "Textual" tab, click the "Customize" button in the "Tags" tab. In the "Tag Specifications" section, you can now add your preferred shortcut in the "Shortcut" column.
Figure 8 Add shortcuts for frequently used tags
- You can also add a shortcut relating to the properties of your tags, e.g. for expanding abbreviations or adding a standardised country name to a place tag.
  - Click the "Customize" button in the "Tags" tab.
  - In the "Tag configuration" window click the desired tag. The details relating to that tag will appear in the "Properties" section.
  - Click "Add property" to add the property you would like.
  - Then click "Add tag specification".
  - Now your tag and its property (e.g. an expansion for an abbreviation) will appear in the "Tag Specification" section of the window.
  - Add the shortcut you would like to use.
  - Now you can add the tag and its property by simply highlighting the word or phrase in the Text Editor field and then pressing the shortcut.
Figure 9 How to add a fixed abbreviation
- If you tagged something by mistake you can undo it by highlighting the word or phrase again, right-clicking with your mouse and then pressing the "Delete" button. The program will give you two options:
  - Delete only the highlighted tag
  - Delete all the tags for the current collection
- Note: Tags can be applied to text on region, line, word, or even character level. To apply tags to a segmentation element, click on a text or line region in the Canvas image viewer and follow the above instructions.
- Users can apply as many tags as necessary to the text.
- In the "Textual" tab Transkribus will give you an overview of the tags you have put in your document.
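
To make the idea of a tag with properties more concrete, the following minimal Python sketch models a tag as a character range on a line of text plus optional properties, and counts tags per category in the spirit of the overview list. The class name `Tag`, its fields and the helper `tag_overview` are hypothetical and chosen for illustration only; they are not the internal data model or API of Transkribus, and the sample line is invented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical model of one tag applied to a character range of a line."""
    name: str      # tag category, e.g. "person", "place", "abbrev"
    offset: int    # index of the first tagged character in the line
    length: int    # number of tagged characters
    properties: dict = field(default_factory=dict)  # e.g. {"expansion": "Doctor"}

    def text(self, line: str) -> str:
        """Return the tagged substring of the given line."""
        return line[self.offset:self.offset + self.length]


def tag_overview(tags):
    """Count how often each tag category occurs."""
    return Counter(tag.name for tag in tags)


# Invented sample data: one line of transcription with three tags.
line = "Dr. Smith travelled to Vienna"
tags = [
    Tag("abbrev", 0, 3, {"expansion": "Doctor"}),
    Tag("person", 4, 5),
    Tag("place", 23, 6, {"country": "Austria"}),
]

for tag in tags:
    print(tag.name, repr(tag.text(line)), tag.properties)
print(dict(tag_overview(tags)))
```

Storing the range as offset and length is one simple way to allow tags down to character level, as described in the note above.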
Figure 10 Overview of tags

Historical letters and abbreviation signs
- In modern documents the handling of abbreviations is less important, but in historical documents it is a complex and challenging task.
- In earlier time periods words were often heavily abbreviated, in the hope of writing faster or saving paper. In some documents more than 20 or 30% of all words are abbreviated as shown in the figure below:
Figure 11 Examples of typical abbreviations in Latin text of the Middle Ages (cf. Wikipedia: https://en.wikipedia.org/wiki/scribal_abbreviation)
- Again there are several options to transcribe abbreviated text:
  Option 1: Expand abbreviations in the usual way. Neural networks are often able to learn to recognise and reproduce expansions. E.g. Latin prefixes and suffixes such as
  cum, con or us and rum are learned easily by the machine. This means that you just need to provide an expanded version of the text in your transcription.
  Option 2: Keep to the rule mentioned above: as long as you can recognize the base character, transcribe the base character. This rule is especially suited to historians and people interested in the content of a document and those who want to provide training data for the HTR engine.
  Note: When it comes to HTR training, tags are not relevant yet. Developments in Named Entity Recognition technology should make the automated recognition of tags possible in the future.
  Therefore the correct transcription for the examples above would be simple: pdr qq cus qr
  Note: In the future HTR engines may also learn to automatically expand these abbreviations (or to supply the correct abbreviation for an expansion) so that computer assisted transcription may be supported.
  Option 3: If you are also interested in using Unicode characters which are near to the special graphemes of the original document, then you can transcribe the text by utilizing the full power of Unicode. In this case the transcription of the above could look like the following:
    pˀ: LATIN SMALL LETTER P COMBINING OGONEK ABOVE
    ᵭ: LATIN SMALL LETTER D WITH MIDDLE TILDE
    o: LATIN SMALL LETTER O
    ꝝ: LATIN SMALL LETTER RUM ROTUNDA. Also LATIN SMALL LETTER R ROTUNDA may be used to represent this letter.
  Note: In real-world cases it is often hard to decide which diacritic, modifier letter or Unicode character may be the right one. You may consult the MUFI website to get more information on this issue (cf. section "References"): http://folk.uib.no/hnooh/mufi/
  Unicode and other special characters can be found via the "Virtual keyboards" button in the Text Editor menu.
Figure 12 Virtual keyboards button
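
When deciding which Unicode character to use, it can help to look up its official name and code point. The short Python sketch below does this with the standard library module `unicodedata` and prints each character in the same name-plus-code-point form this guide uses for LATIN SMALL LETTER LONG S (U+017F). The function name `describe` and the choice of sample characters are assumptions made for illustration; this is not a feature of Transkribus.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the Unicode name and code point of a single character."""
    name = unicodedata.name(char, "<unnamed>")
    return f"{name} (U+{ord(char):04X})"


# Sample characters: long s, d with middle tilde, r rotunda, rum rotunda.
for char in ["\u017f", "\u1d6d", "\ua75b", "\ua75d"]:
    print(char, describe(char))
```

The output lines can be pasted into your own notes on transcription rules, so that other users can see exactly which code point was meant.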
Figure 13 Virtual keyboards window
- Of course mixed models will often be useful. E.g. frequently occurring historical characters may be transcribed with their correct Unicode letter, whereas characters which were used just by a specific writer may be transcribed with their base character. You should note such editorial decisions in the "Editorial Declaration" in the "Document" tab, within the "Metadata" tab, so that your transcription rules are transparent to other users.
  Example: LATIN SMALL LETTER RUM ROTUNDA is regularly used in medieval and early modern texts. Therefore it might be useful to introduce this letter to an HTR model which deals exclusively with medieval documents and is dedicated to processing large amounts of such documents.

Illegible text
- Text which cannot be transcribed since it is illegible can be marked with the tags "unclear" or "gap".
- If the text is unclear, highlight it in the Text Editor field and tag it as "unclear".
- If text is impossible to read, click your cursor where the text appears in the Text Editor field and add the "gap" tag.
- You may also add alternatives or suggestions for the illegible word in the "Properties" section of the tag.

Deletions
- If you discover deleted text you have several options:
  Option 1: The text which is deleted is still readable, or at least large portions are readable. In this case transcribe the text as well as possible and mark it as strike through. You can find the strike through button in the Text Editor menu.
Figure 14 Strike through button
  Note: HTR engines are able to decipher strike through text and the more examples they have, the better.
  Option 2: The text which is deleted is illegible, or only small parts can be read. In this case use the "gap" tag to indicate that there is some text which is illegible.

Black out text
- The "blackening" tag can be used to redact sensitive information in the export formats. Typically this is used to hide personal data in a document which is made publicly available.
- The "blackening" tag is used in conjunction with the blackening region which must be added with the segmentation tools.
- To blacken part of your text:
  - Use the drop-down menu on the "+" segmentation element button on the Canvas menu and select "Blackening".
  - Use the Blackening region to mark the word or section that you want to hide.
    Note: Click the "Item visibility" button on the Main menu and select "Render blackenings" to display the blackened sections on a page.
  - Highlight the corresponding word in the Text Editor field and select the "Blackening" tag.
  - In the export of the document the text will be replaced by: [ ].
  - When you export your document, make sure that "Do blackening" is selected.
    Note: In METS and TEI files the word or phrase is blacked out but the information behind the blackened section is kept. In other file formats, the text behind the blacked out section is completely obscured.
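
The effect of blackening on exported text can be pictured with a small Python sketch: every tagged character range is replaced by a placeholder. This is only an illustration under stated assumptions, not the export code of Transkribus. The function `redact`, the `(offset, length)` span format and the sample line are hypothetical, and the default placeholder `[ ]` is simply copied from the text above.

```python
def redact(text, spans, placeholder="[ ]"):
    """Replace each (offset, length) span in text with the placeholder.

    Spans are assumed not to overlap. They are processed from right to
    left so that earlier offsets stay valid while the text changes length.
    """
    for offset, length in sorted(spans, reverse=True):
        text = text[:offset] + placeholder + text[offset + length:]
    return text


# Invented sample line; the span covers the name "Anna Huber".
line = "Letter from Anna Huber, born 1901"
blackened = [(12, 10)]
print(redact(line, blackened))
```

Working from the last span to the first avoids having to recompute offsets after each replacement.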
Figure 15 Select "Do blackening" to hide image regions and text in exported files

Searching for tags
- If you need to search for distinct tags click the binoculars button in the "Textual" tab.
Figure 16 Binoculars button for tag search
- In the window which will open up you can define your search:
  - Choose where you would like to search (current collection, current page)
  - Line or word level
  - In the "Name" field put the name of the tag
  - In the "Text" field put the written text
  - Press the "Search!" button
  - The search results will appear at the bottom of the window.
Figure 17 "Search for.." window for tag search
- To quickly add an expansion or another property to a word which appears several times in the text:
  - Sort the search results by "Value". This is done by simply clicking on "Value".
  - Mark the similar words by clicking them while holding the Control key on your keyboard.
  - Then click the "Assign tag values" button and type in the property that should be added.
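
The steps above amount to "find all hits with the same tag and the same text, then give them the same property". A minimal Python sketch of that idea follows. The dictionary keys `name`, `value` and `properties`, the function `assign_property` and the sample hits are all hypothetical; they only mimic the columns of the search result list and do not reflect how Transkribus stores search results.

```python
def assign_property(hits, name, text, key, value):
    """Add key=value to every hit whose tag name and tagged text match.

    Each hit is a dict with the hypothetical keys "name" (tag category),
    "value" (tagged text) and "properties" (dict of tag properties).
    Returns the number of hits that were updated.
    """
    matches = [h for h in hits if h["name"] == name and h["value"] == text]
    for hit in matches:
        hit["properties"][key] = value
    return len(matches)


# Invented search results.
hits = [
    {"name": "abbrev", "value": "Dr.", "properties": {}},
    {"name": "person", "value": "Smith", "properties": {}},
    {"name": "abbrev", "value": "Dr.", "properties": {}},
    {"name": "abbrev", "value": "St.", "properties": {}},
]

# Sorting by value groups identical words together, as in the result list.
hits.sort(key=lambda h: h["value"])
updated = assign_property(hits, "abbrev", "Dr.", "expansion", "Doctor")
print(updated, hits)
```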
Figure 18 Speeding up your work by adding properties to more words or phrases at the same time

Metadata
- We currently support only a very simple description of documents, since we assume that in a Digital Edition most of the metadata would reside on an external server and be linked to the document. Every document has its unique ID and can also be accessed via the REST services provided by the Transkribus platform (https://transkribus.eu/wiki/).
- The following fields are currently available in the "Document" tab, within the "Metadata" tab:
  - Title
  - Author
  - Uploaded
  - Genre
  - Writer
  - Language
  - Script type
  - Date of writing
  - Description

Editorial Declaration
- Since there are always several ways to produce a correct transcript of a text, it is important to be transparent about the way in which the transcription was undertaken.
- For this purpose we have included a special feature in Transkribus, called "Editorial Declaration". This is found in the "Document" tab, within the "Metadata" tab.
- As with the tagging system, the Editorial Declaration offers a set of predefined features and options. Moreover, you are able to create your own descriptions and to store them together with your document.
- It is especially important to list special characters and their use in the Editorial Declaration using the form:
  Character Set Extension: LATIN SMALL LETTER LONG S (U+017F)
Figure 19 Create your Editorial Declaration button

Credits
We would like to thank the many users who have contributed their feedback to help improve the Transkribus software. Transkribus is made available to the public as part of H2020 e-infrastructure Project READ (Recognition and Enrichment of Archival Documents) which received funding from the European Commission under grant agreement No 674943.