Tina John, M.A., University of Munich: Workshop on standards for phonological corpora



2 Emu Speech Database System
Database system for:
- audio data
- parametric data
- annotation

3 Emu Speech Database System provides:

4 Platforms
The following setups are available:
- Linux
- Mac OS X > 10.3 (universal build)
- Windows (not tested much on Windows 7)
Upcoming builds:
- Mac OS X < 10.3
- Solaris (this summer)
Alternatively, build it for your platform using the open source code from CVS.

5 Database organisation: Emu Speech Database

6 Database organisation: Emu Speech Database
A graphical user interface is available for creating the template content.

7 Emu Speech Database System
Database storage:
- local or on a server, separate from the Emu system
- storage of the data and access to the data are organised by the local or server system
Database structure, defined by a limited database template:
- location of signals and annotations
- structure of the annotation
- display options
Database organisation:
- via access to the database template and the template contents

8 Exchange of databases
- zip your database in a given format
- create a database information file
- put both on a server
- send the URL of the database information file to the recipient
...

9 Audio data
Processed for display and playback by the Snack Sound Toolkit:
- supported audio formats: WAV, MP3, AU, SND, AIFF, SD, SMP, CSL
Processed for data analysis by tkassp, a GUI for the assp tools:
- supported audio formats: WAV, AU, SND, AIFF, CSL
Audio formats supported by all parts of Emu: WAV, AU, SND, AIFF, CSL

10 Parametric data
- derived from the audio data by tkassp
- stored in the Simple Signal File Format (SSFF), a simple version of the ESPS data file format
Header (ASCII):
- magic: SSFF (c) SHLRC
- time information
- information about the position of the included data
- name, type and number of columns
- a line of 17 hyphens marks the end of the header
Data:
- stored as binary
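The header layout listed above can be illustrated with a short reader. This is a minimal Python sketch, not part of Emu or tkassp: the function name `read_ssff_header`, the field names in the sample (`Record_Freq`, `Start_Time`) and the `Column <name> <type> <count>` line layout are assumptions made for illustration. Only the ASCII header, the SSFF magic and the 17-hyphen end marker are taken from the slide.

```python
import io

END_OF_HEADER = "-" * 17  # per the slide: 17 hyphens mark the end of the header


def read_ssff_header(stream):
    """Read the ASCII header of an SSFF-like file from a binary stream.

    Returns (fields, columns) and leaves the stream at the first data byte.
    """
    magic = stream.readline().decode("ascii").strip()
    if not magic.startswith("SSFF"):
        raise ValueError("not an SSFF file: %r" % magic)
    fields, columns = {}, []
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("end-of-header marker not found")
        line = raw.decode("ascii").strip()
        if line == END_OF_HEADER:
            break
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Column":
            # Assumed layout: Column <name> <type> <number of columns>
            columns.append((parts[1], parts[2], int(parts[3])))
        else:
            fields[parts[0]] = " ".join(parts[1:])
    return fields, columns


# Hypothetical header followed by four bytes standing in for binary data.
sample = (
    b"SSFF -- (c) SHLRC\n"
    b"Record_Freq 200.0\n"
    b"Start_Time 0.0025\n"
    b"Column fm SHORT 4\n"
    + b"-" * 17 + b"\n"
    + b"\x01\x02\x03\x04"
)
stream = io.BytesIO(sample)
fields, columns = read_ssff_header(stream)
remaining = stream.read()  # the binary payload starts right after the marker
print(fields, columns, len(remaining))
```

Reading the header line by line and stopping at the marker keeps the binary part untouched, so it can be decoded afterwards according to the column types.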

11 Annotation
Different types of annotation:
- timeless
  - hierarchical: times of segments are derived from associated segments
  - without any time at all
- time-bound
  - segments / intervals
  - events / points in time
Stored in different label files, which are not independent of each other but can be used separately.
Unicode (UTF-8 / UTF-16) is not supported yet.
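The idea that a timeless, hierarchical label takes its times from the segments it is associated with can be sketched as follows. This is a minimal Python illustration, not Emu code: the `Segment` class, the `derived_times` function and the example times are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    label: str
    start: Optional[float] = None   # None for timeless labels
    end: Optional[float] = None
    children: List["Segment"] = field(default_factory=list)  # associated segments


def derived_times(seg: Segment) -> Optional[Tuple[float, float]]:
    """Return (start, end) for a segment, deriving it from associated segments.

    Time-bound segments return their own times. Timeless segments span from
    the earliest start to the latest end of their associated segments.
    Labels without any time-bound association return None.
    """
    if seg.start is not None and seg.end is not None:
        return (seg.start, seg.end)
    spans = [t for t in (derived_times(c) for c in seg.children) if t is not None]
    if not spans:
        return None
    return (min(s for s, _ in spans), max(e for _, e in spans))


# A timeless word label linked to three time-bound phonetic segments.
word = Segment("fan", children=[
    Segment("f", 0.10, 0.22),
    Segment("a", 0.22, 0.40),
    Segment("n", 0.40, 0.51),
])
isolated = Segment("comment")  # a label without any time at all

print(derived_times(word))
print(derived_times(isolated))
```

The recursion means the same function works for deeper hierarchies, for example a phrase linked to words that are themselves linked to phonetic segments.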

12 Time-bound annotation
Stored in Emu label files, with an extension defined by the user in the template file, in a slightly modified ESPS xlabel file format (ASCII).
Header (optional):
- signal: utterance name
- nfields 1 (current)
- # marks the end of the header
Data:
- time, in the time units defined in the template
  - segments: end time; the onset time of the first segment is marked by H#
  - events: time mark
- number: a colour code
- the label
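Because only end times are stored, segment boundaries have to be reconstructed when the file is read. The following is a minimal Python sketch under stated assumptions: the sample file content, the millisecond time unit, the colour code 125 and the function names are hypothetical, and the parser assumes whitespace-separated data lines of the form time, colour code, label.

```python
def parse_label_file(text):
    """Return a list of (time, colour, label) tuples from xlabel-style text."""
    lines = text.splitlines()
    # The header is optional; if a '#' line is present, data starts after it.
    for i, line in enumerate(lines):
        if line.strip() == "#":
            lines = lines[i + 1:]
            break
    entries = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        entries.append((float(parts[0]), int(parts[1]), parts[2].strip()))
    return entries


def to_segments(entries):
    """Turn end-time entries into (start, end, label) segments.

    Each entry stores the end time of its segment; the start is the previous
    entry's time. An 'H#' entry only supplies the onset of the next segment.
    """
    segments = []
    previous = None
    for time, _colour, label in entries:
        if label != "H#":
            segments.append((previous, time, label))
        previous = time
    return segments


sample = """signal utt001
nfields 1
#
  100.0 125 H#
  220.0 125 f
  400.0 125 a
  510.0 125 n
"""
entries = parse_label_file(sample)
segments = to_segments(entries)
print(segments)
```

If a file has no H# entry, the first segment in this sketch gets `None` as its start time rather than a guessed value.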

13 Timeless and hierarchical annotation
Stored in Emu hlb files (extension hlb): hierarchical label files (ASCII)
- magic: ** EMU hierarchical labels**
- number of labels
- information per level
- association of segments, encoded by their segment numbers, per segment

14 Hierarchical label files
Information per level:
- Level Labeltype1 Labeltype2 ...
- information per segment: segment number (# in order of creation), label of labeltype1, label of labeltype2, ...
Association of segments, per segment:
- segment number, followed by the ordered segment numbers of the associated labels

15 Hierarchical label files
- time-bound labels
- labels that are not associated, in chronological order
- completely timeless labels

16 Hierarchical label files
The .hlb file includes:
- all tiers
- all labels at all tiers, timeless and time-bound
- all associations between labels
It does not contain any time information apart from the order of the labels on time-bound levels.
It is usable without external label files, but it must be consistent with the label files if they are used together.

17 Interface to other speech processing tools
Direct access:
- Praat
- WaveSurfer
- ESPS
- read only: ACCOR, SPEECHSTN, TIMIT, KIEL (simple old format)
Indirect access:
- by conversion: Praat
- by one-way conversion: Kiel Corpus, Articulate Assistant

18 Interface to Praat (direct access)
- on-the-fly conversion of time-bound tiers to a Praat TextGrid
- TextGrid and Sound are opened via Sendpraat
- hierarchical information is kept: segment numbers are placed in front of the label
- changes are saved via a Praat script
- a Tcl script derives the changes from the given segment numbers
- video
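To make the conversion concrete, here is a minimal Python sketch that writes one time-bound tier as a Praat TextGrid (long text format) and puts the segment number in front of each label. It is an illustration only, not the Emu implementation: the function names, the single-space separator between number and label, and the example segments are assumptions, and overlapping segments are not handled.

```python
def build_intervals(segments, xmin, xmax):
    """Fill gaps so the intervals cover xmin..xmax without holes.

    segments: list of (segment_number, start, end, label).
    """
    intervals = []
    cursor = xmin
    for number, start, end, label in sorted(segments, key=lambda s: s[1]):
        if start > cursor:
            intervals.append((cursor, start, ""))  # unlabelled stretch
        # Assumed convention: "<segment number> <label>"
        intervals.append((start, end, f"{number} {label}"))
        cursor = end
    if cursor < xmax:
        intervals.append((cursor, xmax, ""))
    return intervals


def to_textgrid(tier_name, segments, xmin, xmax):
    """Return the text of a TextGrid with a single interval tier."""
    intervals = build_intervals(segments, xmin, xmax)
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, 1):
        escaped = text.replace('"', '""')  # quotes are doubled inside strings
        out += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(out) + "\n"


segments = [(1, 0.10, 0.22, "f"), (2, 0.22, 0.40, "a")]
textgrid = to_textgrid("Phonetic", segments, 0.0, 0.5)
print(textgrid)
```

Keeping the segment number inside the label text is what allows a later step to match edited intervals back to the original segments.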

19 Interface to Praat (direct access)

20 Interface Emu-Praat-Emu (indirect access)
- conversion of all tiers to a Praat TextGrid
- times of timeless tiers are derived from associated segments
- absolutely timeless labels are ignored
- the metric of the data is lost

21 Interface Emu-Praat-Emu (indirect access)
- conversion of all Praat tiers to time-bound Emu levels
- one label file per tier
- creates the corresponding database template file
- additional information needs to be added to the template

22 Interface Emu-Praat-Emu (indirect access)

23 Interface Emu-Praat-Emu: associate segments from time

24 Interface to the Kiel Corpus (one-way conversion)
Kiel Corpus labelling includes:
- orthographic
- canonical
- differences between canonical and phonetic realisation
- phonetic
- autosegmental
- prosodic: stress, intonation
- paralinguistic
Annotation file: the linear labelling is rearranged into a readable hierarchy.

25 A Kiel Corpus annotation file

26 Annotation file

27 Different hierarchies

28 Query Language
Use the Emu Query Language to query your annotation structure for everything you annotated.
Example: four-syllable words with word-initial /f/ in an L phrase that occur after a function word.
Query string:
[ Word!= g4d6j7 & Func = I -> [ [ Word!= g4d6j7 & Num ( Word,Syllable ) = 4 ^ Phrase = L ] ^ Kanonic = f & Start ( Word,Kanonic ) = 1 ] ]
Graphical solution to get the query string:


30 Interface to R
An example explains more than words.

31 Vowel space analysis

32 Vowel space analysis in R
- use R to query the vowels in the database for the male and the female speaker
- query the formant values at the midpoint of the segments
- plot the vowel spaces
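The slides carry out this analysis in R with the Emu functions. The step "formant values at the midpoint" can be illustrated independently of R with a small Python sketch: it assumes a formant track sampled at a fixed frame rate, and the function name `midpoint_frame`, the track values and the times are all hypothetical.

```python
def midpoint_frame(start, end, track_start, frame_rate, n_frames):
    """Return the index of the track frame closest to the segment midpoint.

    start, end: segment times in seconds
    track_start: time of the first frame in seconds
    frame_rate: frames per second
    """
    mid = (start + end) / 2.0
    index = round((mid - track_start) * frame_rate)
    return max(0, min(n_frames - 1, index))  # clamp to the available frames


# Hypothetical track: 50 frames of (F1, F2) in Hz at 100 frames per second.
track = [(500 + i, 1500 + 2 * i) for i in range(50)]

# A vowel segment from 0.20 s to 0.40 s has its midpoint at 0.30 s.
index = midpoint_frame(0.20, 0.40, track_start=0.0, frame_rate=100.0,
                       n_frames=len(track))
f1, f2 = track[index]
print(index, f1, f2)
```

Collecting one (F1, F2) pair per vowel token in this way gives the points that are then plotted as the vowel space for each speaker.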

33 Create a dictionary of the database words
- use R to query the database for words and the linked phonetic segments
- order the results
- write the dictionary to an Excel-readable CSV file
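The slides do this in R. The same three steps (collect words with their linked phonetic segments, order them, write a CSV file) can be sketched in Python using only the standard library. The query results below are invented for illustration, and the function names and column headings are assumptions.

```python
import csv
import io


def build_dictionary(pairs):
    """Return sorted, de-duplicated (word, pronunciation) rows.

    pairs: iterable of (word, list of linked phonetic segment labels)
    """
    entries = {}
    for word, segment_labels in pairs:
        entries.setdefault(word, set()).add(" ".join(segment_labels))
    rows = []
    for word in sorted(entries):
        for pronunciation in sorted(entries[word]):
            rows.append((word, pronunciation))
    return rows


def write_dictionary(rows, stream):
    """Write the rows as comma-separated values with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["word", "pronunciation"])
    writer.writerows(rows)


# Hypothetical query results: each word with its linked phonetic segments.
query_results = [
    ("the", ["D", "@"]),
    ("fan", ["f", "a", "n"]),
    ("the", ["D", "i:"]),
    ("fan", ["f", "a", "n"]),
]
rows = build_dictionary(query_results)
buffer = io.StringIO()
write_dictionary(rows, buffer)
print(buffer.getvalue())
```

Storing the pronunciations in a set per word keeps pronunciation variants while dropping repeated tokens of the same form.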

34 Requests from the 1st workshop day and what Emu can solve
- availability of tools on different platforms: Win, Mac, Linux
- less scripting, more graphical user interfaces
- database/corpus exchangeability
- information about updates of tools: so far via emu.sf.net only
- alignment of audio and text
- running on old machines
- turn taking / conversation overlapping: make use of a suitable annotation structure (link segments to speaker)
- query/search function for metadata and segmental data: add and link metadata to the annotation
- quantitative and qualitative analyses: use the emu/R functions
- visualisation of the vowel space: use the emu/R functions
- make a dictionary from a corpus: use the emu/R functions
- interlinear morpheme translation: make use of a suitable annotation structure (timeless linked annotations), and possibly use the automatic hierarchy build functions for automatic annotation

35 Emu on the WWW
- setup
- patches
- documentation
- publications
- FAQ
- mailing list
- forum
- feature requests
- bug and feature request tracking system
Introduction to the Emu system: Harrington, J. (in press). Phonetic Analysis of speech corpora. Blackwell.
Thank you

WaveSurfer at a glance

WaveSurfer at a glance WaveSurfer at a glance WaveSurfer has a simple but powerful interface. The basic document you work with is a sound. When WaveSurfer is first started, it contains an empty sound. You can load a sound file

More information

ELAN Linguistic Annotator

ELAN Linguistic Annotator ELAN Linguistic Annotator Introduction to Speech and Video Annotation with ELAN http://www.lat-mpi.eu/tools/elan Han Sloetjes, Max-Planck-Institute for Psycholinguistics November 2011 Introduction to ELAN

More information

PresenterPro: A tool for recording, indexing and processing prompted speech with Praat

PresenterPro: A tool for recording, indexing and processing prompted speech with Praat Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2016 PresenterPro: A tool for recording, indexing and processing prompted speech

More information

ELAR: instructions for depositors

ELAR: instructions for depositors ELAR: instructions for depositors As a requirement of your ELDP grant, you must deposit your data with the Endangered Languages Archive (ELAR) at SOAS on an annual basis at the same time when you hand

More information

The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools

The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Kazuaki Maeda, Steven Bird, Xiaoyi Ma and Haejoong Lee Linguistic Data Consortium, University of Pennsylvania 3615 Market

More information

Spock - a Spoken Corpus Client

Spock - a Spoken Corpus Client Spock - a Spoken Corpus Client Maarten Janssen, Tiago Freitas IULA/ILTEC, ILTEC Plaça de la Mercé 10-12 Barcelona, Rua Conde de Redondo 74-5 Lisboa maarten@iltec.pt, taf@iltec.pt Abstract Spock is an open

More information

AlignTool Documentation (Version 0c2855a, August 2017)

AlignTool Documentation (Version 0c2855a, August 2017) AlignTool Documentation (Version 0c2855a, August 2017) Eva Belke, Verena Keite and Lars Schillingmann Content 1. What is AlignTool?... 2 2. Getting started... 3 2.1. Requirements... 3 2.2. AlignTool folder...

More information

Installing Audacity Sound Editing Software Open up a web browser such as Internet Explorer and type the following Internet address in to the address field: http://audacity.sourceforge.net/download/ The

More information

Praat Scripting Workshop

Praat Scripting Workshop Praat Scripting Workshop North Carolina State University Friday, September 18, 2015 Why Scripting? What Even is Scripting? Objectives Why Scripting? Making measurements by hand is really tedious and repetitive.

More information

User Guide for ELAN Linguistic Annotator

User Guide for ELAN Linguistic Annotator User Guide for ELAN Linguistic Annotator version 5.0.0 This user guide was last updated on 2017-05-02 The latest version can be downloaded from: http://tla.mpi.nl/tools/tla-tools/elan/ Author: Maddalena

More information

ELAN teaching set. Introduction. Step 1: Adapting the basic template

ELAN teaching set. Introduction. Step 1: Adapting the basic template Working with ELAN and FLEx together: an ELAN-FLEx- ELAN teaching set Tim Gaved (tim_gaved@soas.ac.uk) and Sophie Salffner (ss123@soas.ac.uk) January 2014 Introduction This document describes a possible

More information

TUTORIAL ADoReVA. Download ADoReVA

TUTORIAL ADoReVA. Download ADoReVA TUTORIAL ADoReVA Download ADoReVA The clustering algorithm that we developed is named ADoReVA (and stands for «Automatic Detection of Register Variations Algorithm»). It was specifically conceived to be

More information

MUSE: AN OPEN SOURCE SPEECH TECHNOLOGY RESEARCH PLATFORM. Peter Cahill and Julie Carson-Berndsen

MUSE: AN OPEN SOURCE SPEECH TECHNOLOGY RESEARCH PLATFORM. Peter Cahill and Julie Carson-Berndsen MUSE: AN OPEN SOURCE SPEECH TECHNOLOGY RESEARCH PLATFORM Peter Cahill and Julie Carson-Berndsen CNGL, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland. {peter.cahill

More information

DATABASES IN LANGUAGE DOCUMENTATION NICK THIEBERGER & TOSHIHIDE NAKAYAMA SESSION 2

DATABASES IN LANGUAGE DOCUMENTATION NICK THIEBERGER & TOSHIHIDE NAKAYAMA SESSION 2 DATABASES IN LANGUAGE DOCUMENTATION NICK THIEBERGER & TOSHIHIDE NAKAYAMA SESSION 2 IN CLASS PRESENTATIONS Prepare a short (5 min) presentation on a particular database or spreadsheet you have built or

More information

If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC

If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC If you re using a Mac, follow these commands to prepare your computer to run these demos (and any other analysis you conduct with the Audio BNC sample). All examples use your Workshop directory (e.g. /Users/peggy/workshop)

More information

Best practices in the design, creation and dissemination of speech corpora at The Language Archive

Best practices in the design, creation and dissemination of speech corpora at The Language Archive LREC Workshop 18 2012-05-21 Istanbul Best practices in the design, creation and dissemination of speech corpora at The Language Archive Sebastian Drude, Daan Broeder, Peter Wittenburg, Han Sloetjes The

More information

Phonological CorpusTools Workshop. Kathleen Currie Hall & Scott Mackie Annual Meeting on Phonology, Vancouver, BC 9 October 2015

Phonological CorpusTools Workshop. Kathleen Currie Hall & Scott Mackie Annual Meeting on Phonology, Vancouver, BC 9 October 2015 Phonological CorpusTools Workshop Kathleen Currie Hall & Scott Mackie kathleen.hall@ubc.ca Annual Meeting on Phonology, Vancouver, BC 9 October 2015 I. Introduction A. What is PCT? i. a free, downloadable

More information

Time Group Analyzer (TGA)

Time Group Analyzer (TGA) Time Group Analyzer (TGA) TGA: An Online Tool for Time Group Analysis Dafydd Gibbon http://wwwhomes.uni-bielefeld.de/gibbon/tga Interspeech Methodology Tutorial, Dresden 2015 Time Group Analyzer: Summary

More information

UiT Open Research Dataset Guidelines

UiT Open Research Dataset Guidelines UiT Open Research Dataset Guidelines Contents: I. Summary... 2 II. File naming... 3 III. Persistent file formats... 3 IV. Saving or converting your data into a consistent format... 5 A. Audio... 5 1. Recording...

More information

Story Workbench Quickstart Guide Version 1.2.0

Story Workbench Quickstart Guide Version 1.2.0 1 Basic Concepts Story Workbench Quickstart Guide Version 1.2.0 Mark A. Finlayson (markaf@mit.edu) Annotation An indivisible piece of data attached to a text is called an annotation. Annotations, also

More information

Topics in Linguistic Theory: Laboratory Phonology Spring 2007

Topics in Linguistic Theory: Laboratory Phonology Spring 2007 MIT OpenCourseWare http://ocw.mit.edu 24.910 Topics in Linguistic Theory: Laboratory Phonology Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Kibana, Grafana and Zeppelin on Monitoring data

Kibana, Grafana and Zeppelin on Monitoring data Kibana, Grafana and Zeppelin on Monitoring data Internal group presentaion Ildar Nurgaliev OpenLab Summer student Presentation structure About IT-CM-MM Section and myself Visualisation with Kibana 4 and

More information

Power BI 1 - Create a dashboard on powerbi.com... 1 Power BI 2 - Model Data with the Power BI Desktop... 1

Power BI 1 - Create a dashboard on powerbi.com... 1 Power BI 2 - Model Data with the Power BI Desktop... 1 Our course outlines are 1 and 2 hour sessions (all courses 1 hour unless stated) that are designed to be delivered presentation style with an instructor guiding attendees through scenario based examples

More information

Ball Aerospace s Open Source Command and Control System. Ryan Melton Ball Aerospace & Technologies Corp. Boulder, CO

Ball Aerospace s Open Source Command and Control System. Ryan Melton Ball Aerospace & Technologies Corp. Boulder, CO Ball Aerospace s Open Source Command and Control System Ryan Melton Ball Aerospace & Technologies Corp. Boulder, CO 8/5/2016 VISION To enable small satellite developers to easily and cost effectively command

More information

ELAN. Multimedia Annotation Tool. Max-Planck-Institute for Psycholinguistics Han Sloetjes

ELAN. Multimedia Annotation Tool. Max-Planck-Institute for Psycholinguistics   Han Sloetjes ELAN Multimedia Annotation Tool Max-Planck-Institute for Psycholinguistics http://www.lat-mpi.eu/tools/elan Han Sloetjes (han.sloetjes@mpi.nl) Augsburg, 30 July 2009 ELAN written in Java programming language

More information

CORLI. a linguistic consortium for corpus, language and interaction

CORLI. a linguistic consortium for corpus, language and interaction CORLI a linguistic consortium for corpus, language and interaction CORLI and HUMA-NUM CORLI = Corpus, Languages, and Interaction a French consortium of Huma-Num involved in linguistic research and teaching

More information

IDMT Transcription API Documentation

IDMT Transcription API Documentation IDMT Transcription API Documentation 06.01.2016 Fraunhofer IDMT Hanna Lukashevich, lkh@idmt.fraunhofer.de Sascha Grollmisch, goh@idmt.fraunhofer.de Jakob Abeßer, abr@idmt.fraunhofer.de 1 Contents 1 Introduction

More information

Documentation and analysis of an. endangered language: aspects of. the grammar of Griko

Documentation and analysis of an. endangered language: aspects of. the grammar of Griko Documentation and analysis of an endangered language: aspects of the grammar of Griko Database and Website manual Antonis Anastasopoulos Marika Lekakou NTUA UOI December 12, 2013 Contents Introduction...............................

More information

Annotation by category - ELAN and ISO DCR

Annotation by category - ELAN and ISO DCR Annotation by category - ELAN and ISO DCR Han Sloetjes, Peter Wittenburg Max Planck Institute for Psycholinguistics P.O. Box 310, 6500 AH Nijmegen, The Netherlands E-mail: Han.Sloetjes@mpi.nl, Peter.Wittenburg@mpi.nl

More information

Towards Corpus Annotation Standards The MATE Workbench 1

Towards Corpus Annotation Standards The MATE Workbench 1 Towards Corpus Annotation Standards The MATE Workbench 1 Laila Dybkjær, Niels Ole Bernsen Natural Interactive Systems Laboratory Science Park 10, 5230 Odense M, Denmark E-post: laila@nis.sdu.dk, nob@nis.sdu.dk

More information

Preservation. Session 4: Techniques & Audio. Arienne M. Dwyer University of Kansas. Yoshi Ono University of Alberta

Preservation. Session 4: Techniques & Audio. Arienne M. Dwyer University of Kansas. Yoshi Ono University of Alberta Session 4: Techniques & Audio University of California at Santa Barbara, June 24-27, Arienne M. Dwyer University of Kansas Yoshi Ono University of Alberta 1 Session 4 s focus I. Homework review II. Transcriber

More information

arxiv: v6 [cs.cl] 10 Jan 2018

arxiv: v6 [cs.cl] 10 Jan 2018 CoPaSul Manual Contour-based, parametric, and superpositional intonation stylization Uwe D. Reichel Research Institute for Linguistics Hungarian Academy of Sciences uwe.reichel@nytud.mta.hu Version 0.7.x,

More information

Scripting with Praat. Day 3: Make Praat make decisions for you! 1

Scripting with Praat. Day 3: Make Praat make decisions for you! 1 Scripting with Praat Day 3: Make Praat make decisions for you! 1 Housekeeping Please sign your name on this sheet and indicate Audit, P/F or letter grade Office Hours: moved to Library basement still Tuesday

More information

Splicing Instructions

Splicing Instructions Splicing Instructions When we create our experiments, we need to be able to play individual words as experiment stimuli. In order to get these individual words, we have a native speaker of whatever language

More information

LORD PCAA LIONS Mat.Hr.Sec School, Reserve Line, Sivakasi

LORD PCAA LIONS Mat.Hr.Sec School, Reserve Line, Sivakasi Virudhunagar District schools Common First Mid Term Test, July 2018 Standard 12 - computer science Part - I I. Choose the correct answer for the following : 10 X 1 = 10 1. Shift + Tab key is used to move

More information

ANALOR Manual.

ANALOR Manual. ANALOR Manual http://www.lattice.cnrs.fr/analor ABOUT ANALOR is software provides an automatic segmentation of recordings. ANALOR first developed for the prosodic analysis of French. The method of analysis

More information

Growing interests in. Urgent needs of. Develop a fieldworkers toolkit (fwtk) for the research of endangered languages

Growing interests in. Urgent needs of. Develop a fieldworkers toolkit (fwtk) for the research of endangered languages ELPR IV International Conference 2002 Topics Reitaku University College of Foreign Languages Developing Tools for Creating-Maintaining-Analyzing Field Shoju CHIBA Reitaku University, Japan schiba@reitaku-u.ac.jp

More information

TGP USER MANUAL Automatic Praat's TextGrid File Parser

TGP USER MANUAL Automatic Praat's TextGrid File Parser TGP USER MANUAL Automatic Praat's TextGrid File Parser release 1.0 Pere Milà 20 August, 2008 IRCS, University of Pennsylvania pere@peremila.com tgp.peremila.com CONTENTS 1 Introduction...3 2 Installing

More information

Release notes for version 3.1

Release notes for version 3.1 Release notes for version 3.1 - Now includes support for script lines and character names. o When creating an Excel file project, it is possible to specify columns used for script lines and for character

More information

XO Hosted PBX Recording Custom Greetings LAST UPDATED: 21 Mar 2013

XO Hosted PBX Recording Custom Greetings LAST UPDATED: 21 Mar 2013 About This Document This document explains how to create and use your own audio files with the Auto Attendant feature of XO Hosted PBX. Recording a Custom Auto Attendant Greeting Your company can create

More information

TrainingCentre Getting Started with the Universal

TrainingCentre Getting Started with the Universal TrainingCentre Getting Started with the Universal Communications Format Toolkit InterCall, a subsidiary of West Corporation, in partnership with WebEx Communications, Inc provides TrainingCentre web conferencing

More information

webqda v3 - Distinguishing features

webqda v3 - Distinguishing features webqda v3 - Distinguishing features This document is intended to be read in conjunction with the Choosing a CAQDAS Package Working Paper which provides a more general commentary of common CAQDAS functionality.

More information

Archi - ArchiMate Modelling. What s New in Archi 4.x

Archi - ArchiMate Modelling. What s New in Archi 4.x Archi - ArchiMate Modelling What s New in Archi 4.x Important Notice It's always a good idea to make backup copies of your data before installing and using a new version of Archi. Whilst we make every

More information

How to import text transcription

How to import text transcription How to import text transcription This document explains how to import transcriptions of spoken language created with a text editor or a word processor into the Partitur-Editor using the Simple EXMARaLDA

More information

ITTC Science of Communication Networks The University of Kansas EECS 784 Identifiers, Names, and Addressing

ITTC Science of Communication Networks The University of Kansas EECS 784 Identifiers, Names, and Addressing Science of Communication Networks The University of Kansas EECS 784 Identifiers, Names, and Addressing James P.G. Sterbenz Department of Electrical Engineering & Computer Science Information Technology

More information

Erhard Hinrichs, Thomas Zastrow University of Tübingen

Erhard Hinrichs, Thomas Zastrow University of Tübingen WebLicht A Service Oriented Architecture for Language Resources and Tools Erhard Hinrichs, Thomas Zastrow University of Tübingen Current Situation Many linguistic resources (corpora, dictionaries, ) and

More information

Cumulus Release Notes

Cumulus Release Notes The Cumulus version 11.0.2 is a maintenance release comprising bug fixes, performance enhancements, and feature anhancements. This release improves all Cumulus products, including Workgroup, Enterprise,

More information

An Open Source Speech Synthesis Frontend for HTS

An Open Source Speech Synthesis Frontend for HTS An Open Source Speech Synthesis Frontend for HTS Markus Toman and Michael Pucher FTW Telecommunications Research Center Vienna Donau-City-Straße 1, A-1220 Vienna, Austria http://www.ftw.at {toman,pucher}@ftw.at

More information

ATLAS.ti Windows 7.5 ATLAS.ti Windows v.8.0

ATLAS.ti Windows 7.5 ATLAS.ti Windows v.8.0 ATLAS.ti Windows 7.5 ATLAS.ti Windows v.8.0 Document last updated: December 6, 2016 Supported Data Types Text (txt, rtf, doc, docx) Open Office documents (odt) PDF (text and image) Images Audio Video Geo

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/40896

More information

Soar and Related Projects Refresh on SourceForge

Soar and Related Projects Refresh on SourceForge Soar and Related Projects Refresh on SourceForge Standardization and Updates for Increased Usability June 26, 2003 Soar Workshop: Bob Marinier 1 Overview The way Soar and its projects were organized Changes

More information

PhonBank" Behind the Scenes. Carla Peddle"

PhonBank Behind the Scenes. Carla Peddle PhonBank" Behind the Scenes Carla Peddle" PhonBank: Behind the Scenes" Outline" Sneak peak into what goes on behind the scenes of PhonBank" Accomplishments we have made" Challenges we face; and" Improvements

More information

XII International PhD Workshop OWD 2010, October Efficient Diphone Database Creation for MBROLA, a Multilingual Speech Synthesiser

XII International PhD Workshop OWD 2010, October Efficient Diphone Database Creation for MBROLA, a Multilingual Speech Synthesiser XII International PhD Workshop OWD 2010, 23 26 October 2010 Efficient Diphone Database Creation for MBROLA, a Multilingual Speech Synthesiser Jolanta Bachan, Institute of Linguistics, Adam Mickiewicz University

More information

BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES

BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES O.O. Iakushkin a, G.A. Fedoseev, A.S. Shaleva, O.S. Sedova Saint Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg,

More information

Building a Unit Selection Synthesis Voice

Building a Unit Selection Synthesis Voice Building a Unit Selection Synthesis Voice Version 04/03/18, 02:36:09 PM Steps Prepare the database get speech data (and the accompanying text) annotate database using the Aligner convert to Festival format

More information

LING203: Corpus. March 9, 2009

LING203: Corpus. March 9, 2009 LING203: Corpus March 9, 2009 Corpus A collection of machine readable texts SJSU LLD have many corpora http://linguistics.sjsu.edu/bin/view/public/chltcorpora Each corpus has a link to a description page

More information

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

Digital Humanities. Tutorial Regular Expressions. March 10, 2014 Digital Humanities Tutorial Regular Expressions March 10, 2014 1 Introduction In this tutorial we will look at a powerful technique, called regular expressions, to search for specific patterns in corpora.

More information

EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet

EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet Hennie Brugman, Albert Russel, Daan Broeder, Peter Wittenburg Max Planck Institute for Psycholinguistics P.O. Box 310, 6500

More information

AudioGate version Release Information (Windows)

AudioGate version Release Information (Windows) AudioGate version 1.5.0 Release Information (Windows) Release Notes Changes and revisions in v1.5.0 from v1.0.1 - Added support for MR project files. AudioGate can now read MR project files directly by

More information

Speech Recognition. Project: Phone Recognition using Sphinx. Chia-Ho Ling. Sunya Santananchai. Professor: Dr. Kepuska

Speech Recognition. Project: Phone Recognition using Sphinx. Chia-Ho Ling. Sunya Santananchai. Professor: Dr. Kepuska Speech Recognition Project: Phone Recognition using Sphinx Chia-Ho Ling Sunya Santananchai Professor: Dr. Kepuska Objective Use speech data corpora to build a model using CMU Sphinx.Apply a built model

More information

Qtractor. An Audio/MIDI multi-track sequencer. Rui Nuno Capela rncbc.org.

Qtractor. An Audio/MIDI multi-track sequencer. Rui Nuno Capela rncbc.org. ENOS08@isep.ipp.pt Qtractor An Audio/MIDI multi-track sequencer Rui Nuno Capela rncbc.org http://qtractor.sourceforge.net September 2008 What is Qtractor? (1) Yet another Audio / MIDI sequencer? Multi-track

More information

Revision Control II. - svn

Revision Control II. - svn Revision Control II. - svn Tomáš Kalibera, Peter Libič Department of Distributed and Dependable Systems http://d3s.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Subversion Whole

More information

Evaluation Board Quick Start

Evaluation Board Quick Start Publication: QS/PE0601-7262/1 CML Microcircuits COMMUNICATION SEMICONDUCTORS Evaluation Board Quick Start PE0601-7262 1 Introduction Thank you for your interest in the PE0601-7262 Evaluation Board. This

More information

Audio issues in MIR evaluation

Audio issues in MIR evaluation Audio issues in MIR evaluation Overview of audio formats Preferred presentation of audio files in an MIR testbed A set of simple recommendations Audio Formats I 1. Apple AIFF (Audio Interchange File Format)

More information

Modeling Coarticulation in Continuous Speech

Modeling Coarticulation in Continuous Speech ing in Oregon Health & Science University Center for Spoken Language Understanding December 16, 2013 Outline in 1 2 3 4 5 2 / 40 in is the influence of one phoneme on another Figure: of coarticulation

More information

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant LECTURE 3 Sound / Audio CS 5513 Multimedia Systems Spring 2009 Imran Ihsan Principal Design Consultant OPUSVII www.opuseven.com Faculty of Engineering & Applied Sciences 1. The Nature of Sound Sound is

More information



