polyglot Documentation Release Dave Young

Similar documents
The RestructuredText Book Documentation

Kindle Formatting Guide

BanzaiDB Documentation

Django-CSP Documentation

Moving ROOT Documentation from Docbook to Markdown

Software Development I

Archan. Release 2.0.1

Flask-Misaka Documentation

Unlearning the Internet: Submitting Your Log

Pulp Python Support Documentation

Installation and Introduction to Jupyter & RStudio

Sphinx prototype Documentation

More about HTML. Digging in a little deeper

IQReport Documentation

Reproducible Research with R and RStudio

odpdown - markdown to slides

DOWNLOAD OR READ : TOOLS FOR TEXT AND IMAGE ANALYSIS AN INTRODUCTION TO APPLIED SEMIOTICS PDF EBOOK EPUB MOBI

Beginners Guide to Snippet Master PRO

USER GUIDE MADCAP DOC-TO-HELP 5. Getting Started

CircuitPython with Jupyter Notebooks

MadCap Software. What's New Guide. Lingo 10

DOWNLOAD OR READ : QUICKSTEPS PDF EBOOK EPUB MOBI

Joomla Website User Guide

Inception Cloud User s Guide

e24paymentpipe Documentation

Installation Guide CSA Group Reader App for Windows Desktop

Application documentation Documentation

Beginning Web Site Design

docs-python2readthedocs Documentation

micawber Documentation

HTML TIPS FOR DESIGNING.

To create, upload, share, or view shared files through Google Apps, go to Documents in the black bar above.

catsup Documentation Release whtsky

How To Guide. Hannon Hill Corporation for Rowan University

Adobe RoboHelp (2019 release)

goose3 Documentation Release maintainers

ABBYY FineReader 14. User s Guide ABBYY Production LLC. All rights reserved.

RC Justified Gallery User guide for version 3.2.X. Last modified: 06/09/2016

Microsoft Word Tutorial

MadCap Flare Training

Tutorial. Activities. Code o o. Editor: Notepad Focus : Text manipulation & webpage skeleton. Open Notepad

LIFX NodeServer Documentation

powershell for beginners start from absolute zero and learn to use the windows powershell as it was meant to be used

Version control. what is version control? setting up Git simple command-line usage Git basics

Mantis STIX Importer Documentation

Django with Python Course Catalog

DOWNLOAD OR READ : WORD 6 FOR WINDOWS ESSENTIALS PDF EBOOK EPUB MOBI

Overview 14 Table Definitions and Style Definitions 16 Output Objects and Output Destinations 18 ODS References and Resources 20

User Experience Modern, intuitive interface (OS X) No No No Yes Yes Yes. Modern, intuitive interface (Windows) No No No No No Yes

Creating and Building Websites

Figure 1 Properties panel, HTML mode

DOWNLOAD OR READ : WIKIPEDIA EDITING GUIDE PDF EBOOK EPUB MOBI

INTRODUCTION TO HTML5! HTML5 Page Structure!

The Printer Out plugin PRINTED MANUAL

Workshare Compare Server 9.5.2

Dolphin Dynamics. Enhancing Your Customer Facing Documents

Code::Blocks Student Manual

Created by: Nicolas Melillo 4/2/2017 Elastic Beanstalk Free Tier Deployment Instructions 2017

TangeloHub Documentation

GRAPHIC WEB DESIGNER PROGRAM

Enhancing Your Customer Facing Documents (updated 10/03/2017)

ebook Production Jumpstart

ME30_Lab1_18JUL18. August 29, ME 30 Lab 1 - Introduction to Anaconda, JupyterLab, and Python

Quark XML Author June 2017 Update for Platform with DITA

DOWNLOAD OR READ : WINDOWS 8 AND OFFICE 2013 FOR DUMMIES BOOK 2 DVD BUNDLE PDF EBOOK EPUB MOBI

Make sure to mark it private. Make sure to make it a Mercurial one and not a Git one.

Nbconvert Refactor Final 1.0

Python wrapper for Viscosity.app Documentation

polib Documentation Release David Jean Louis

xmljson Documentation

django-idioticon Documentation

LECTURE 26. Documenting Python

Emptying the trash...18 Webmail Settings...19 Displayed Name...19 Sort by...19 Default font style...19 Service language...

Python Project Example Documentation

Hg Mercurial Cheat Sheet

Microsoft Word 2007 on Windows

Airoscript-ng Documentation

Recommonmark Documentation

Signals Documentation

How to Export Your Book as epub and Mobi file formats with Microsoft Word and Calibre

Guidelines for doing the short exercises

FIT 100 LAB Activity 3: Constructing HTML Documents

Getting started with New Proquest RefWorks

Report Commander 2 User Guide

django-dynamic-db-router Documentation

Python simple arp table reader Documentation

Drupal Cloud Getting Started Guide Creating a Lab site with the MIT DLC Theme

FROM 4D WRITE TO 4D WRITE PRO INTRODUCTION. Presented by: Achim W. Peschke

GUIDE TO USING OU CAMPUS REGIONS. Office of Public Relations and Marketing

How to add content to your course

Introduction to the MODx Manager

DOWNLOAD PDF WHAT IS OPEN EBOOK

Academic Affairs Information System URIBO-NET User s Manual

Job Aid. Remote Access BAIRS Printing and Saving a Report. Table of Contents

Brandon Rhodes Python Atlanta, July 2009

Programmazione Web a.a. 2017/2018 HTML5

CREATING ANNOUNCEMENTS. A guide to submitting announcements in the UAFS Content Management System

End-User Reference Guide Troy University OU Campus Version 10

Human-Computer Interaction Design

ViXeN Documentation. Release 1.0rc3.dev0. Prabhu Ramachandran and Kadambari Devarajan

Transcription:

polyglot Documentation Release 1.0.0 Dave Young 2017

Getting Started 1 Installation 3 1.1 Development............................................... 3 1.1.1 Sublime Snippets........................................ 3 1.2 Issues................................................... 4 2 Command-Line Usage 5 3 Documentation 7 4 Command-Line Tutorial 9 4.1 Webpage Article to HTML document.................................. 9 4.2 Webpage Article to PDF......................................... 10 4.3 Webpage Article to ebook........................................ 10 4.4 Send a Webpage Article Straight to Your Kindle............................ 10 4.5 Converting Kindle Notebook HTML Exports to Markdown...................... 10 5 Installation 13 5.1 Development............................................... 13 5.1.1 Sublime Snippets........................................ 13 5.2 Issues................................................... 14 6 Command-Line Usage 15 7 Documentation 17 8 Command-Line Tutorial 19 8.1 Webpage Article to HTML document.................................. 19 8.2 Webpage Article to PDF......................................... 20 8.3 Webpage Article to ebook........................................ 20 8.4 Send a Webpage Article Straight to Your Kindle............................ 20 8.5 Converting Kindle Notebook HTML Exports to Markdown...................... 20 8.5.1 Subpackages.......................................... 21 8.5.1.1 polyglot (subpackage)................................. 21 8.5.1.2 polyglot.commonutils (subpackage)......................... 21 8.5.1.3 polyglot.markdown (subpackage)........................... 21 8.5.2 Modules............................................. 21 8.5.2.1 polyglot.cl_utils (module)............................... 21 i

8.5.2.2 polyglot.utkit (module)................................ 22 8.5.3 Classes............................................. 22 8.5.3.1 polyglot.ebook (class)................................. 22 8.5.3.1.1 Methods................................... 24 8.5.3.2 polyglot.htmlcleaner (class)............................. 24 8.5.3.2.1 Methods................................... 25 8.5.3.3 polyglot.kindle (class)................................. 25 8.5.3.3.1 Methods................................... 26 8.5.3.4 polyglot.printpdf (class)................................ 26 8.5.3.4.1 Methods................................... 27 8.5.3.5 polyglot.webarchive (class).............................. 27 8.5.3.5.1 Methods................................... 28 8.5.3.6 polyglot.markdown.kindle_notebook (class)..................... 28 8.5.3.6.1 Methods................................... 28 8.5.3.7 polyglot.markdown.translate (class).......................... 29 8.5.3.7.1 Methods................................... 29 8.5.3.8 polyglot.utkit.utkit (class).............................. 30 8.5.3.8.1 Methods................................... 30 8.5.4 Functions............................................ 30 8.6 Indexes.................................................. 30 8.7 Todo................................................... 30 Python Module Index 31 ii

A python package and command-line tools for translating documents and webpages to various markup languages and document formats (html, epub, mobi..). Here s a summary of what s included in the python package: Classes polyglot.ebook polyglot.htmlcleaner polyglot.kindle polyglot.printpdf polyglot.webarchive polyglot.markdown.kindle_notebook polyglot.markdown.translate The worker class for the ebook module A parser/cleaner to strip a webpage article of all cruft and neatly present it with some nice css Send documents or webpage articles to a kindle device or app PDF printer *Generate a macos webarchive given the URL to a webpage * convert the HTML export of kindle notebooks (from kindle apps) to markdown The Multimarkdown translator object Functions Getting Started 1

2 Getting Started

CHAPTER 1 Installation The easiest way to install polyglot is to use pip: pip install polyglot Or you can clone the github repo and install from a local version of the code: git clone git@github.com:thespacedoctor/polyglot.git cd polyglot python setup.py install To upgrade to the latest version of polyglot use the command: pip install polyglot --upgrade Development If you want to tinker with the code, then install in development mode. This means you can modify the code from your cloned repo: git clone git@github.com:thespacedoctor/polyglot.git cd polyglot python setup.py develop Pull requests are welcomed! Sublime Snippets If you use Sublime Text as your code editor, and you re planning to develop your own python code with polyglot, you might find my Sublime Snippets useful. 3

Issues Please report any issues here. 4 Chapter 1. Installation

CHAPTER 2 Command-Line Usage Documentation for polyglot can be found here: http://pypolyglot.readthedocs.org/en/ stable Translate documents and webpages to various markup languages and document formats (html, epub, mobi..) Usage: polyglot init polyglot [-oc] (pdf html epub mobi) <url> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <url> [-f <filename> -s <pathtosettingsfile>] polyglot [-o] (epub mobi) <docx> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <docx> [-f <filename> -s <pathtosettingsfile>] polyglot [-o] kindlenb2md <notebook> [<destinationfolder> -s <pathtosettingsfile>] Options: init polyglot settings file for the first time pdf pdf html download webpage to a local HTML document epub format book from a webpage URL kindle article straight to kindle -h, --help message -v, --version -o, --open after creation -c, --clean clean styling to the output document setup the print webpage to parse and generate an epub send webpage show this help show version open the document add polyglot's 5

<url> the url of the article's webpage <docx> path to a DOCX file -s <pathtosettingsfile>, --settings <pathtosettingsfile> path to alternative settings file (optional) <destinationfolder> the folder to save the parsed PDF or HTML document to (optional) -f <filename>, --filename <filename> the name of the file to save, otherwise use webpage title as filename (optional) 6 Chapter 2. Command-Line Usage

CHAPTER 3 Documentation Documentation for polyglot is hosted by Read the Docs (last stable version and latest version). 7

8 Chapter 3. Documentation

CHAPTER 4 Command-Line Tutorial Before you begin using polyglot you will need to populate some custom settings within the polyglot settings file. To setup the default settings file at ~/.config/polyglot/polyglot.yaml run the command: polyglot init This should create and open the settings file; follow the instructions in the file to populate the missing settings values (usually given an XXX placeholder). polyglot often relies on a bunch on other excellent tools to get it s results like electron-pdf, pandoc and kidlegen. Depending on how you use polyglot, these tools may need to be install on your system. To read the basic usage intructions just run polyglot -h. Webpage Article to HTML document To generate a parsed, cleaned local HTML document from a webpage at a given URL use polyglot s html command: polyglot html https://en.wikipedia.org/wiki/volkswagen The filename for the output file is take from the webpage s title and output in the current directory. Here s the result. To instead give both a destination output and a specified filename for the resulting document: polyglot html https://en.wikipedia.org/wiki/volkswagen ~/Desktop -f cars_and_stuff To style the result with polyglots simple styling and easy to read fonts, use the -c flag: polyglot -c html https://en.wikipedia.org/wiki/volkswagen -f Volkswagen_Styled See the result here. 9

Webpage Article to PDF To instead print the webpage to PDF, you can either just print the original webpage: polyglot pdf https://en.wikipedia.org/wiki/volkswagen with this result, or you can choose again to use polyglot s styling: polyglot -c pdf https://en.wikipedia.org/wiki/volkswagen -f Volkswagen_Styled resulting in this PDF. Note if you are going to be running polyglot in a windowless environment, to generate the PDFs with electron-pdf <`https://github.com/fraserxu/electron-pdf> _ you will need to install xvfb. To install and setup do something similar to the following (depending on your flavour of OS): sudo apt-get install xvfb then in whatever bash scripts you write add this before any polyglot commands: export DISPLAY=':99.0' Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 & Webpage Article to ebook To generate an epub book from a webpage article run the command: polyglot epub http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html Here is the output of this command. If you prefer a mobi output, use the command: polyglot mobi http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html To get this mobi book. Send a Webpage Article Straight to Your Kindle Polyglot can go even further than creating a mobi ebook from the web-article; it can also send the ebook straight to your kindle device or smart phone app (or both at the same time) as long as you have the email settings populated in the polyglot settings file. polyglot kindle http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html And here s the book appearing on a smart phone kindle app: Converting Kindle Notebook HTML Exports to Markdown On the Kindle app for ios you can export an HTML document of your notes and annotations via email. 10 Chapter 4. Command-Line Tutorial

The colors of the annotation convert to markdown with the following color-key: blue : code, yellow : text, orange : quote, pink : header Todo add a tutorial to convert kindle annotations 4.5. Converting Kindle Notebook HTML Exports to Markdown 11

12 Chapter 4. Command-Line Tutorial

CHAPTER 5 Installation The easiest way to install polyglot is to use pip: pip install polyglot Or you can clone the github repo and install from a local version of the code: git clone git@github.com:thespacedoctor/polyglot.git cd polyglot python setup.py install To upgrade to the latest version of polyglot use the command: pip install polyglot --upgrade Development If you want to tinker with the code, then install in development mode. This means you can modify the code from your cloned repo: git clone git@github.com:thespacedoctor/polyglot.git cd polyglot python setup.py develop Pull requests are welcomed! Sublime Snippets If you use Sublime Text as your code editor, and you re planning to develop your own python code with polyglot, you might find my Sublime Snippets useful. 13

Issues Please report any issues here. 14 Chapter 5. Installation

CHAPTER 6 Command-Line Usage Documentation for polyglot can be found here: http://pypolyglot.readthedocs.org/en/ stable Translate documents and webpages to various markup languages and document formats (html, epub, mobi..) Usage: polyglot init polyglot [-oc] (pdf html epub mobi) <url> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <url> [-f <filename> -s <pathtosettingsfile>] polyglot [-o] (epub mobi) <docx> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <docx> [-f <filename> -s <pathtosettingsfile>] polyglot [-o] kindlenb2md <notebook> [<destinationfolder> -s <pathtosettingsfile>] Options: init polyglot settings file for the first time pdf pdf html download webpage to a local HTML document epub format book from a webpage URL kindle article straight to kindle -h, --help message -v, --version -o, --open after creation -c, --clean clean styling to the output document setup the print webpage to parse and generate an epub send webpage show this help show version open the document add polyglot's 15

<url> the url of the article's webpage <docx> path to a DOCX file -s <pathtosettingsfile>, --settings <pathtosettingsfile> path to alternative settings file (optional) <destinationfolder> the folder to save the parsed PDF or HTML document to (optional) -f <filename>, --filename <filename> the name of the file to save, otherwise use webpage title as filename (optional) 16 Chapter 6. Command-Line Usage

CHAPTER 7 Documentation Documentation for polyglot is hosted by Read the Docs (last stable version and latest version). 17

18 Chapter 7. Documentation

CHAPTER 8 Command-Line Tutorial Before you begin using polyglot you will need to populate some custom settings within the polyglot settings file. To setup the default settings file at ~/.config/polyglot/polyglot.yaml run the command: polyglot init This should create and open the settings file; follow the instructions in the file to populate the missing settings values (usually given an XXX placeholder). polyglot often relies on a bunch on other excellent tools to get it s results like electron-pdf, pandoc and kidlegen. Depending on how you use polyglot, these tools may need to be install on your system. To read the basic usage intructions just run polyglot -h. Webpage Article to HTML document To generate a parsed, cleaned local HTML document from a webpage at a given URL use polyglot s html command: polyglot html https://en.wikipedia.org/wiki/volkswagen The filename for the output file is take from the webpage s title and output in the current directory. Here s the result. To instead give both a destination output and a specified filename for the resulting document: polyglot html https://en.wikipedia.org/wiki/volkswagen ~/Desktop -f cars_and_stuff To style the result with polyglots simple styling and easy to read fonts, use the -c flag: polyglot -c html https://en.wikipedia.org/wiki/volkswagen -f Volkswagen_Styled See the result here. 19

Webpage Article to PDF To instead print the webpage to PDF, you can either just print the original webpage: polyglot pdf https://en.wikipedia.org/wiki/volkswagen with this result, or you can choose again to use polyglot s styling: polyglot -c pdf https://en.wikipedia.org/wiki/volkswagen -f Volkswagen_Styled resulting in this PDF. Note if you are going to be running polyglot in a windowless environment, to generate the PDFs with electron-pdf <`https://github.com/fraserxu/electron-pdf> _ you will need to install xvfb. To install and setup do something similar to the following (depending on your flavour of OS): sudo apt-get install xvfb then in whatever bash scripts you write add this before any polyglot commands: export DISPLAY=':99.0' Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 & Webpage Article to ebook To generate an epub book from a webpage article run the command: polyglot epub http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html Here is the output of this command. If you prefer a mobi output, use the command: polyglot mobi http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html To get this mobi book. Send a Webpage Article Straight to Your Kindle Polyglot can go even further than creating a mobi ebook from the web-article; it can also send the ebook straight to your kindle device or smart phone app (or both at the same time) as long as you have the email settings populated in the polyglot settings file. polyglot kindle http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker-index.html And here s the book appearing on a smart phone kindle app: Converting Kindle Notebook HTML Exports to Markdown On the Kindle app for ios you can export an HTML document of your notes and annotations via email. 20 Chapter 8. Command-Line Tutorial

The colors of the annotation convert to markdown with the following color-key: blue : code, yellow : text, orange : quote, pink : header Todo add a tutorial to convert kindle annotations Subpackages polyglot polyglot.commonutils polyglot.markdown common tools used throughout package Generate a markdown rendering from various sources and formats polyglot (subpackage) polyglot.commonutils (subpackage) common tools used throughout package polyglot.markdown (subpackage) Generate a markdown rendering from various sources and formats Modules polyglot.cl_utils Documentation for polyglot can be found here: http:// pypolyglot.readthedocs.org/en/stable polyglot.utkit Unit testing tools polyglot.cl_utils (module) Documentation for polyglot can be found here: http://pypolyglot.readthedocs.org/en/stable Translate documents and webpages to various markup languages and document formats (html, epub, mobi..) Usage: polyglot init polyglot [-oc] (pdf html epub mobi) <url> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <url> [-f <filename> -s <pathtosettingsfile>] polyglot [-o] (epub mobi) <docx> [<destinationfolder> -f <filename> -s <pathtosettingsfile>] polyglot kindle <docx> [-f <filename> -s <path- ToSettingsFile>] polyglot [-o] kindlenb2md <notebook> [<destinationfolder> -s <pathtosettingsfile>] Options: init setup the polyglot settings file for the first time pdf print webpage to pdf html parse and download webpage to a local HTML document epub generate an epub format book from a webpage URL kindle send webpage article straight to kindle -h, --help -v, --version -o, --open show this help message show version open the document after creation 8.5. Converting Kindle Notebook HTML Exports to Markdown 21

-c, --clean add polyglot s clean styling to the output document <url> the url of the article s webpage <docx> path to a DOCX file -s <pathtosettingsfile>, settings <path- ToSettingsFile> path to alternative settings file (optional) <destinationfolder> the folder to save the parsed PDF or HTML document to (optional) -f <filename>, filename <filename> the name of the file to save, otherwise use webpage title as filename (optional) polyglot.cl_utils.main(arguments=none) The main function used when cl_utils.py is run as a single script from the cl, or when installed as a cl command polyglot.utkit (module) Unit testing tools class polyglot.utkit.utkit(moduledirectory) Override dryx utkit Classes polyglot.ebook polyglot.htmlcleaner polyglot.kindle polyglot.printpdf polyglot.webarchive polyglot.markdown.kindle_notebook polyglot.markdown.translate polyglot.utkit.utkit The worker class for the ebook module A parser/cleaner to strip a webpage article of all cruft and neatly present it with some nice css Send documents or webpage articles to a kindle device or app PDF printer *Generate a macos webarchive given the URL to a webpage * convert the HTML export of kindle notebooks (from kindle apps) to markdown The Multimarkdown translator object Override dryx utkit polyglot.ebook (class) class polyglot.ebook(log, settings, urlorpath, outputdirectory, bookformat, title=false, header=false, footer=false) The worker class for the ebook module Key Arguments: log logger settings the settings dictionary urlorpath the url or path to the content source bookformat the output format (epub, mobi) outputdirectory path to the directory to save the output html file to. title the title of the output document. I. False then use the title of the original source. Default False header content to add before the article/book content in the resulting ebook. Default False footer content to add at the end of the article/book content in the resulting ebook. Default False 22 Chapter 8. Command-Line Tutorial

Usage: WebToEpub To generate an ebook from an article found on the web, using the webpages s title as the filename for the book: from polyglot import ebook epub = ebook( log=log, settings=settings, urlorpath="http://www.thespacedoctor.co.uk/blog/2016/09/26/ mysqlsucker-index.html", title=false, bookformat="epub", outputdirectory="/path/to/output/folder" ) pathtoepub = epub.get() To add a header and footer to the epub book, and specify the title/filename for the book: from polyglot import ebook epub = ebook( log=log, settings=settings, urlorpath="http://www.thespacedoctor.co.uk/blog/2016/09/26/ mysqlsucker-index.html", title="mysql Sucker", bookformat="epub", outputdirectory="/path/to/output/folder", header='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>', footer='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>' ) pathtoepub = epub.get() WebToMobi To generate a mobi version of the webarticle, just switch epub for mobi: from polyglot import ebook mobi = ebook( log=log, settings=settings, urlorpath="http://www.thespacedoctor.co.uk/blog/2016/09/26/ mysqlsucker-index.html", title="mysql Sucker", bookformat="mobi", outputdirectory="/path/to/output/folder", header='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>', footer='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>' ) pathtomobi = mobi.get() DocxToEpub To instead convert a DOCX document to epub, simply switch out the URL for the path to the DOCX file, like so: from polyglot import ebook epub = ebook( 8.5. Converting Kindle Notebook HTML Exports to Markdown 23

log=log, settings=settings, urlorpath="/path/to/volkswagen.docx", title="a book about a car", bookformat="epub", outputdirectory="/path/to/output/folder", header='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>', footer='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>' ) pathtoepub = epub.get() DocxToMobi You can work it out yourself by now! init (log, settings, urlorpath, outputdirectory, bookformat, title=false, header=false, footer=false) Methods init (log, settings, urlorpath,...[,...]) get() get the ebook object polyglot.htmlcleaner (class) class polyglot.htmlcleaner(log, settings, url, outputdirectory=false, title=false, style=true, metadata=true, h1=true) A parser/cleaner to strip a webpage article of all cruft and neatly present it with some nice css Key Arguments: Usage: log logger settings the settings dictionary url the URL to the HTML page to parse and clean outputdirectory path to the directory to save the output html file to title title of the document to save. If False will take the title of the HTML page as the filename. Default False. style add polyglot s styling to the HTML document. Default True metadata include metadata in generated HTML. Default True h1 include title as H1 at the top of the doc. Default True To generate the HTML page, using the title of the webpage as the filename: from polyglot import htmlcleaner cleaner = htmlcleaner( log=log, settings=settings, url="http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker- index.html", outputdirectory="/tmp" 24 Chapter 8. Command-Line Tutorial

) cleaner.clean() Or specify the title of the document and remove styling, metadata and title: from polyglot import htmlcleaner cleaner = htmlcleaner( log=log, settings=settings, url="http://www.thespacedoctor.co.uk/blog/2016/09/26/mysqlsucker- index.html", outputdirectory="/tmp", title="my_clean_doc", style=false, metadata=false, h1=false ) cleaner.clean() init (log, settings, url, outputdirectory=false, title=false, style=true, metadata=true, h1=true) Methods init (log, settings, url[,...]) clean() parse and clean the html document with Mercury Parser polyglot.kindle (class) class polyglot.kindle(log, settings, urlorpath, title=false, header=false, footer=false) Send documents or webpage articles to a kindle device or app Key Arguments: Usage: log logger settings the settings dictionary urlorpath the url or path to the content source title the title of the output document. I. False then use the title of the original source. Default False header content to add before the article/book content in the resulting ebook. Default False footer content to add at the end of the article/book content in the resulting ebook. Default False To send content from a webpage article straight to your kindle device or smart phone app, you will first need to populate the email settings with polyglot s settings file at ~.config/polyglot/ polyglot.yaml, then use the following code: from polyglot import kindle sender = kindle( log=log, settings=settings, 8.5. Converting Kindle Notebook HTML Exports to Markdown 25

urlorpath="http://www.thespacedoctor.co.uk/blog/2016/09/26/ mysqlsucker-index.html", header='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>', footer='<a href="http://www.thespacedoctor.co.uk">thespacedoctor</a>' ) success = sender.send() Success is True or False depending on the success/failure of sending the email to the kindle email address(es). init (log, settings, urlorpath, title=false, header=false, footer=false) Methods init (log, settings, urlorpath[, title,...]) get() get_attachment(file_path) send() get the ebook object Get file as MIMEBase message send the mobi book generated to kindle email address(es) polyglot.printpdf (class) class polyglot.printpdf(log, settings=false, url=false, title=false, folderpath=false, append=false, readability=true) PDF printer Key Arguments: Usage: log logger settings the settings dictionary url the webpage url title title of pdf folderpath path at which to save pdf append append this at the end of the file name (not title) readability clean text with Mercury Parser To print a webpage to PDF without any cleaning of the content using the title of the webpage as filename: from polyglot import printpdf pdf = printpdf( log=log, settings=settings, url="https://en.wikipedia.org/wiki/volkswagen", folderpath="/path/to/output", readability=false ).get() To give the PDF an alternative title use: 26 Chapter 8. Command-Line Tutorial

from polyglot import printpdf pdf = printpdf( log=log, settings=settings, url="https://en.wikipedia.org/wiki/volkswagen", folderpath="/path/to/output", title="cars", readability=false ).get() Or to append a string to the end of the filename before.pdf extension (useful for indexing or adding date created etc): from datetime import datetime, date, time now = datetime.now() now = now.strftime("%y%m%dt%h%m%s") from polyglot import printpdf pdf = printpdf( log=log, settings=settings, url="https://en.wikipedia.org/wiki/volkswagen", folderpath="/path/to/output", append="_"+now, readability=false ).get() To clean the content using the Mercury Parser and apply some simple styling and pretty fonts: from polyglot import printpdf pdf = printpdf( log=log, settings=settings, url="https://en.wikipedia.org/wiki/volkswagen", folderpath=pathtooutputdir, readability=true ).get() init (log, settings=false, url=false, title=false, folderpath=false, append=false, readability=true) Methods init (log[, settings, url, title,...]) get() get the PDF polyglot.webarchive (class) class polyglot.webarchive(log, settings=false) *Generate a macos webarchive given the URL to a webpage * Key Arguments: log logger 8.5. Converting Kindle Notebook HTML Exports to Markdown 27

Usage: settings the settings dictionary To setup your logger, settings and database connections, please use the fundamentals package (see tutorial here). Generate a webarchive docuemnt from a URL, use the following: from polyglot import webarchive wa = webarchive( log=log, settings=settings ) wa.create(url="https://en.wikipedia.org/wiki/volkswagen", pathtowebarchive=pathtooutputdir + "Volkswagen.webarchive") init (log, settings=false) Methods init (log[, settings]) create(url, pathtowebarchive) create the webarchive object polyglot.markdown.kindle_notebook (class) class polyglot.markdown.kindle_notebook(log, kindleexportpath, outputpath) convert the HTML export of kindle notebooks (from kindle apps) to markdown Key Arguments: Usage: log logger kindleexportpath path to the exported kindle HTML file outputpath the output path to the md file. To convert the exported HTML file of annotation and notes from a kindle book or document to markdown, run the code: from polyglot.markdown import kindle_notebook nb = kindle_notebook( log=log, kindleexportpath="/path/to/kindle_export.html", outputpath="/path/to/coverted_annotations.md" ) nb.convert() The colours of the annotations convert to markdown attributes via the following key: init (log, kindleexportpath, outputpath) Methods 28 Chapter 8. Command-Line Tutorial

init (log, kindleexportpath, outputpath) clean(text) convert() converttomd(kindlenote) find_component(divtag, annotations) convert the kindle_notebook object polyglot.markdown.translate (class) class polyglot.markdown.translate(log, settings=false) The Multimarkdown translator object Key Arguments: Usage: log logger settings the settings dictionary To setup your logger, settings and database connections, please use the fundamentals package (see tutorial here). To initiate a translate object, use the following: from polyglot.markdown import translate md = translate( log=log, settings=settings ) init (log, settings=false) Methods init (log[, settings]) blockquote(text) bold(text) cite(title[, author, year, url, publisher,...]) code(text) codeblock(text[, lang]) comment(text) definition(text, definition) em(text) footnote(text) glossary(term, definition) header(text, level) headerlink(headertext[, text]) hl(text) image(url[, title, width]) inline_link(text, url) math_block(text) math_inline(text) convert plain-text to MMD blockquote convert plain-text to MMD bolded text generate a MMD citation convert plain-text to MMD inline-code text convert plain-text to MMD fenced codeblock convert plain-text to MMD comment genarate a MMD definition convert plain-text to MMD italicised text convert plain-text to MMD footnote genarate a MMD glossary convert plain-text to MMD header generate a link to a MMD header convert plain-text to MMD critical markup highlighted text create MMD image link generate a MMD sytle link convert plain-text to MMD math block convert plain-text to MMD inline math Continued on next page 8.5. Converting Kindle Notebook HTML Exports to Markdown 29

ol(text) strike(text) ul(text) underline(text) url(text) Table 8.10 continued from previous page convert plain-text to MMD ordered list convert plain-text to HTML strike-through text convert plain-text to MMD unordered list convert plain-text to HTML underline text convert plain-text to MMD clickable URL polyglot.utkit.utkit (class) class polyglot.utkit.utkit(moduledirectory) Override dryx utkit init (moduledirectory) Methods init (moduledirectory) setupmodule() teardownmodule() The setupmodule method The teardownmodule method Functions Indexes Module Index Full Index Todo Todolist 30 Chapter 8. Command-Line Tutorial

Python Module Index p polyglot, 21 polyglot.cl_utils, 21 polyglot.commonutils, 21 polyglot.markdown, 21 polyglot.utkit, 22 31

32 Python Module Index

Index Symbols init () (polyglot.ebook method), 24 init () (polyglot.htmlcleaner method), 25 init () (polyglot.kindle method), 26 init () (polyglot.markdown.kindle_notebook method), 28 init () (polyglot.markdown.translate method), 29 init () (polyglot.printpdf method), 27 init () (polyglot.utkit.utkit method), 30 init () (polyglot.webarchive method), 28 E ebook (class in polyglot), 22 H htmlcleaner (class in polyglot), 24 K kindle (class in polyglot), 25 kindle_notebook (class in polyglot.markdown), 28 M main() (in module polyglot.cl_utils), 22 P polyglot (module), 21 polyglot.cl_utils (module), 21 polyglot.commonutils (module), 21 polyglot.markdown (module), 21 polyglot.utkit (module), 22 printpdf (class in polyglot), 26 T translate (class in polyglot.markdown), 29 U utkit (class in polyglot.utkit), 22, 30 W webarchive (class in polyglot), 27 33