arxiv: v1 [cs.dl] 25 Apr 2014

LaTeXML 2012 - A Year of LaTeXML

Deyan Ginev¹ and Bruce R. Miller²

¹ Computer Science, Jacobs University Bremen, Germany
² National Institute of Standards and Technology, Gaithersburg, MD, USA

arXiv:1404.6549v1 [cs.DL] 25 Apr 2014

Abstract. LaTeXML, a TeX to XML converter, is being used in a wide range of MKM applications. In this paper, we present a progress report for the 2012 calendar year. Noteworthy enhancements include: increased coverage such as Wikipedia syntax; enhanced capabilities such as embeddable JavaScript and CSS resources and RDFa support; a web service for remote processing via web-sockets; along with general accuracy and reliability improvements. The outlook for a 0.8.0 release in mid-2013 is also discussed.

1 Introduction

LaTeXML [Mil] is a TeX to XML converter, bringing the well-known authoring syntax of TeX and LaTeX to the world of XML. Not a new face in the MKM crowd, LaTeXML has been adopted in a wide range of MKM applications. Originally designed to support the development of NIST's Digital Library of Mathematical Functions (DLMF), it is now employed in publishing frameworks, authoring suites and for the preparation of a number of large-scale TeX corpora.

In this paper, we present a progress report for the 2012 calendar year of LaTeXML's master and development branches. In 2012, the LaTeXML Subversion repository saw 30% of the total project commits since 2006. Currently, the two authors maintain a developer and master branch of LaTeXML, respectively. The main branch contains all mature features of LaTeXML.

2 Main Development Trunk

LaTeXML's processing model can be broken down into two phases: the basic conversion transforms the TeX/LaTeX markup into a LaTeX-like XML schema; a post-processing phase converts that XML into the target format, usually some format in the HTML family. The following sections highlight the progress made in support for these areas.
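The two-phase model can be illustrated with a minimal Python sketch that assembles one command per phase. It assumes the `latexml` and `latexmlpost` command-line tools of the LaTeXML distribution and their `--destination` option; the helper names (`conversion_command`, `postprocessing_command`, `convert`) and the file names are hypothetical, and the exact options should be checked against the installed version.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(tex_file: str, xml_file: str) -> list[str]:
    """Phase 1: TeX/LaTeX source -> LaTeXML's LaTeX-like XML."""
    return ["latexml", f"--destination={xml_file}", tex_file]


def postprocessing_command(xml_file: str, html_file: str) -> list[str]:
    """Phase 2: intermediate XML -> a format in the HTML family."""
    return ["latexmlpost", f"--destination={html_file}", xml_file]


def convert(tex_file: str) -> list[list[str]]:
    """Build both commands; run them only if the tools and the source exist."""
    source = Path(tex_file)
    xml_file = str(source.with_suffix(".xml"))
    html_file = str(source.with_suffix(".html"))
    commands = [
        conversion_command(tex_file, xml_file),
        postprocessing_command(xml_file, html_file),
    ]
    tools_present = shutil.which("latexml") and shutil.which("latexmlpost")
    if tools_present and source.exists():
        for command in commands:
            # check=True stops the pipeline if the first phase fails
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in convert("paper.tex"):
        print(" ".join(command))
```

Keeping the intermediate XML as an explicit file mirrors the separation of phases: other processing can be applied to it before the second command runs.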
2.1 Document Conversion

There has been a great deal of general progress in LaTeXML's processing: the fidelity of TeX and LaTeX simulation is much improved, and the set of control sequences covered is more complete. The I/O code has been reorganized to track TeX's behavior more closely and to use a more consistent path searching logic. It also provides opportunities for more security hardening, while allowing the flexibility regarding data sources needed by the planned web services. Together these changes allow many more raw style files to be processed directly from the TeX installation (i.e., without requiring a specific LaTeXML binding). This mechanism is, in fact, now used for loading input encoding definitions and multi-language support (babel). Additionally, it provides a better infrastructure for sTeX.

The support for colors and graphics has been enhanced, with a more complete color model that captures the capabilities of the xcolor package and a move towards generation of native SVG [FFJ03]. A summer student, Silviu Oprea, now at Oxford, developed a remarkable draft implementation supporting the conversion of pgf and tikz graphics markup into SVG; this code will be integrated into the 0.8 release.

Native support for RDFa has been added to the schema, along with an optional package, lxrdfa, allowing the embedding of semantic annotations within the TeX document. Various other LaTeX packages have also been implemented: cancel, epigraph. Additionally, the texvc package emulates the texvc program used by Wikipedia for processing math markup; this allows LaTeXML to be used to generate MathML from the existing wiki markup.

2.2 Document Post-Processing

The conversion of the internal math representation to common external formats such as MathML and OpenMath has been improved. In particular, the framework fully supports parallel math markup with cross-referencing between the alternative formats. Thus presentation and content MathML can be enclosed within an m:semantics element, with the corresponding m:mi and m:ci tokens connected to each other via id and xref attributes. The evolution of MathML version 3 has also been tracked, as well as the current trends in implementations.
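The parallel markup described above can be sketched with Python's standard library. This is an illustration under stated assumptions, not LaTeXML's actual output: the id naming scheme (`.pmml` and `.cmml` suffixes) and the helper names `m` and `parallel_identifier` are hypothetical. Only the general shape comes from the text and from MathML itself: a presentation token and a content token inside one m:semantics element, linked through id and xref.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with the "m:" prefix used in the text
ET.register_namespace("m", MATHML_NS)


def m(tag: str) -> str:
    """Return the namespace-qualified name of a MathML element."""
    return f"{{{MATHML_NS}}}{tag}"


def parallel_identifier(name: str, token_id: str) -> ET.Element:
    """Build parallel markup for a single identifier such as x."""
    math = ET.Element(m("math"))
    semantics = ET.SubElement(math, m("semantics"))

    # Presentation branch: the first child of m:semantics
    mi = ET.SubElement(
        semantics,
        m("mi"),
        {"id": f"{token_id}.pmml", "xref": f"{token_id}.cmml"},
    )
    mi.text = name

    # Content branch: wrapped in an annotation for content MathML
    annotation = ET.SubElement(
        semantics, m("annotation-xml"), {"encoding": "MathML-Content"}
    )
    ci = ET.SubElement(
        annotation,
        m("ci"),
        {"id": f"{token_id}.cmml", "xref": f"{token_id}.pmml"},
    )
    ci.text = name
    return math


if __name__ == "__main__":
    print(ET.tostring(parallel_identifier("x", "m1.1"), encoding="unicode"))
```

With both directions linked, a consumer can start from a rendered symbol and find its content counterpart, or the reverse.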
In line with these trends, we have shifted towards generating SMP (Supplemental Multilingual Plane, or Plane 1) Unicode and avoiding the m:mfenced element. Content MathML generation has been improved, particularly to cover the common (with LaTeXML) situation where the true semantics are imperfectly recognized.

Finally, a comprehensive overhaul of the XSLT processing was carried out, which avoids divergence in the generation of the various HTML-family markup formats. The stylesheets are highly parameterized so that they are both more general and yet allow generation of HTML5-specific markup; they should allow extension to further HTML-like applications like EPUB. Command-line options make these parameters available to the user.

While the stylesheets are much more consistent and modular, allowing easy extension and customization, other changes lessen the need to customize. The set of CSS class names has been made much more consistent and predictable, if somewhat verbose, so that it should be easier for users to style the generated HTML as they wish. Additionally, a resource element has been defined which allows binding developers to request certain CSS or JavaScript files or fragments to be associated with the document. A converted AMS article now finally looks (somewhat) like an AMS article!

2.3 Unification

Although the separation of the conversion and post-processing phases is a natural one from the developer's document processing point of view, it is sometimes artificial to users. Moreover, keeping the phases too far separated inhibits interesting applications, such as those envisioned by the Daemon (see section 3) and automated document processing systems such as the one used for arXMLiv. Thus, we have undertaken to bring all processing back under a single, consistent umbrella, whether running in command-line mode or in client/server mode. The goal is to simplify the common use case of converting a single document to HTML, while still enabling the injection of intermediate processing.

Some steps in that direction include more consistent error reporting at all phases of processing, with embedded locator information so that the origin of an error can (usually) be located in the source. Additionally, logs include the current SVN revision number to better enable tracking and fixing bugs.

3 Daemon Experimental Branch

The Daemon branch [Gina] hosts experimental developments, primarily the development of client/server modules that support web services, optimize processing and improve the integration with external applications. Since the last report in CICM's S&P track [GSK11], the focus has fallen on increasing usability, security and robustness. The daemonized processing matured into a pair of robust HTTP servers, one optimized for local batch conversion jobs, the other for a real-time web service, and a turnkey client executable that incorporates all shapes and sizes of LaTeXML processing.
Showing a commitment to maintaining prominent conversion scenarios, shorthand user-defined profiles were introduced in order to simplify complex LaTeXML configurations, e.g. those of sTeX and PlanetMath [Pla]. An internal redesign of the configuration setup and option handling of LaTeXML facilitated these changes and promises a consistent internal API for supporting both the core and post-processing conversion phases.

The RESTful [Fie00] web service offered via the Mojolicious [Rie] web framework now also supports multi-file LaTeX manuscripts via a ZIP archive workflow, also facilitated by an upload interface. Furthermore, the built-in web editor and showcase [Ginb] is available through a websocket route and enjoys an expanded list of examples, such as a LaTeX Turing machine and a PSTricks graphic.

A significant new experimental feature is the addition of an ambiguous grammar for mathematical formulas. Based on Marpa [Keg], an efficient Earley-style parser, the grammar embraces the common cases of ambiguity in mathematical expressions, e.g. that induced by invisible operators and overloaded operator symbols, in an attempt to set the stage for disambiguation to a correct operator tree. The current grammar in the main development trunk is heuristically geared to unambiguously recognize the mathematical formulas commonly used in DLMF and parts of arXiv. The long-term goal is for the ambiguous grammar to reach parity in coverage and to implement advanced semantic techniques in order to establish the correct operator trees in a large variety of scientific domains.

It is anticipated that the bulk of these developments will be merged back into the main trunk for the 0.8 release. The new ambiguous grammar and the Mojolicious web service are two notable exceptions, which will not make master prior to the 0.9 release.

4 Outlook

Although development never stagnated, an official release is long overdue; a LaTeXML 0.8 release is planned for mid-2013. It will incorporate the enhancements presented here: support for several LaTeX graphics packages, such as TikZ and Xypic; an overhauled XSLT and CSS styling framework; and a merge of daemonized processing to the master branch.

References

[FFJ03] Jon Ferraiolo, Jun Fujisawa, and Dean Jackson. Scalable Vector Graphics (SVG) 1.1 Specification. W3C Recommendation. World Wide Web Consortium (W3C), Jan. 14, 2003. url: http://www.w3.org/tr/2008/REC-SVG11-20030114/.
[Fie00] Roy T. Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis. University of California, Irvine, 2000. url: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm.
[Gina] Deyan Ginev. LaTeXML: A LaTeX to XML Converter, arXMLiv branch. url: https://svn.mathweb.org/repos/latexml/branches/arxmliv (visited on Mar. 12, 2013).
[Ginb] Deyan Ginev. The LaTeXML Web Showcase. url: http://latexml.mathweb.org/editor (visited on Mar. 12, 2013).
[GSK11] Deyan Ginev, Heinrich Stamerjohanns, and Michael Kohlhase. The LaTeXML Daemon: Editable Math on the Collaborative Web. In: Intelligent Computer Mathematics. Ed. by James Davenport et al. LNAI 6824. Springer Verlag, 2011, pp. 292-294. isbn: 978-3-642-22672-4. url: https://svn.kwarc.info/repos/arxmliv/doc/cicmsystems11/paper.pdf.
[Keg] Jeffrey Kegler. Marpa, A Practical General Parser. System homepage at http://jeffreykegler.github.com/marpa-web-site/.
[Mil] Bruce Miller. LaTeXML: A LaTeX to XML Converter. url: http://dlmf.nist.gov/latexml/ (visited on Mar. 12, 2013).
[Pla] PlanetMath.org: Math for the people, by the people. url: http://planetmath.org (seen March 2013).
[Rie] Sebastian Riedel. Mojolicious - Perl real-time web framework. System homepage at http://mojolicio.us/.