django-data-migration Documentation

Similar documents
LECTURE 21. Database Interfaces

django-cron Documentation

django-embed-video Documentation

phoenixdb Release Nov 14, 2017

Easy-select2 Documentation

django-embed-video Documentation

LABORATORY OF DATA SCIENCE. Data Access: Relational Data Bases. Data Science and Business Informatics Degree

postgresql postgresql in Flask in 15 or so slides 37 slides

django-image-cropping Documentation

Runtime Dynamic Models Documentation Release 1.0

django-embed-video Documentation

Aldryn Installer Documentation

Archan. Release 2.0.1

open-helpdesk Documentation

django-auditlog Documentation

South Documentation. Release 1.0. Andrew Godwin

Transaction Isolation Level in ODI

Automatic MySQL Schema Management with Skeema. Evan Elias Percona Live, April 2017

django-allauth-2fa Documentation

agate-sql Documentation

django-composite-foreignkey Documentation

Graphene Documentation

Spade Documentation. Release 0.1. Sam Liu

Cookiecutter Django CMS Documentation

Traffic violations revisited

Connexion Sqlalchemy Utils Documentation

Django Wordpress API Documentation

django-cas Documentation

django-composite-foreignkey Documentation

Garment Documentation

CID Documentation. Release Francis Reyes

Confire Documentation

CE419 Web Programming. Session 15: Django Web Framework

django-intercom Documentation

coopy Documentation Release 0.4b Felipe Cruz

django-responsive2 Documentation

web-transmute Documentation

LABORATORY OF DATA SCIENCE. Data Access: Relational Data Bases. Data Science and Business Informatics Degree

Django-frontend-notification Documentation

Django Groups Manager Documentation

Transactions for web developers

requests-cache Documentation

Django Groups Manager Documentation

Signals Documentation

newauth Documentation

Oracle Sql Describe Schemas Query To Find Database

How to upgrade an Access application to use a SQL Server backend

Django-CSP Documentation

django-app-metrics Documentation

silk Documentation Release 0.3 Michael Ford

django-crucrudile Documentation

Context-Oriented Programming with Python

django-private-chat Documentation

Tracking Reproducible Revisions

BanzaiDB Documentation

pysqlw Documentation Release plausibility

djangotribune Documentation

Django IPRestrict Documentation

Manual Trigger Sql Server 2008 Insert Multiple Rows At Once

Gunnery Documentation

MapEntity Documentation

Python wrapper for Viscosity.app Documentation

django-geoip Documentation

Flask-Migrate Documentation. Miguel Grinberg

Torndb Release 0.3 Aug 30, 2017

Frontier Documentation

django-push Documentation

django-celery Documentation

Mysql Workbench Import Sql No Database. Selected >>>CLICK HERE<<<

Sql 2008 Copy Table Structure And Database To

django-mama-cas Documentation

django-users2 Documentation

Databases. Course October 23, 2018 Carsten Witt

Django MFA Documentation

django-ratelimit-backend Documentation

django-reinhardt Documentation

Splunk & Git. The joys and pitfalls of managing your Splunk deployment with Git. Copyright 2018

django simple pagination Documentation

POSTGRESQL - PYTHON INTERFACE

Django Localized Recurrence Documentation

django-teamwork Documentation

neo4django Documentation

Roman Numeral Converter Documentation

To use this booklet, just find the task or language topic you need a quick reminder on in the table of contents and jump to that section.

Watson - DB. Release 2.7.0

youckan Documentation

django-ad-code Documentation

Manual Trigger Sql Server 2008 Insert Update Delete Selection

wagtail-robots Documentation

Created by: Nicolas Melillo 4/2/2017 Elastic Beanstalk Free Tier Deployment Instructions 2017

IEMS 5722 Mobile Network Programming and Distributed Server Architecture

Cluster Upgrade Procedure with Job Queue Migration.

Manual Trigger Sql Server 2008 Insert Update Delete

coxtactoe Documentation

nucleon Documentation

nacelle Documentation

Poulpe Documentation. Release Edouard Klein

Django Extra Views Documentation

Tomasz Szumlak WFiIS AGH 23/10/2017, Kraków

yardstick Documentation

Transcription:

django-data-migration Documentation Release 0.2.1 Philipp Böhm Sep 27, 2017

Contents 1 Contents: 3 1.1 Installing................................................. 3 1.2 Writing Migrations............................................ 3 1.2.1 What is a Migration?...................................... 3 1.2.2 A complete Migration example................................. 4 1.2.3 Migration details........................................ 4 1.3 Migrating your data........................................... 9 1.4 Troubleshooting............................................. 9 1.4.1 GROUP_CONCAT column size limit in Mysql is too low................... 9 1.5 Changelog................................................ 9 1.5.1 Version 0.2.1.......................................... 9 1.5.2 Version 0.2.0.......................................... 9 1.5.3 Version < 0.2.0......................................... 10 Python Module Index 11 i

ii

django-data-migration is a reusable Django app that migrates your legacy data into your new django app. The only thing you have to supply is an appropriate SQL query that transforms your data fromthe old schema into your model structure. Dependencies between these migrations will be resolved automatically. Give it a try! Warning: missing. This documentation is a work in progress. Please open an issue on Github if any information are Contents 1

2 Contents

CHAPTER 1 Contents: Installing Installation of django-data-migration is straight forward, as it only requires the following steps (assuming you have already set up virtualenv and pip). 1. Install using pip: pip install django-data-migration 2. Add django-data-migration to your requirements.txt 3. Add to INSTALLED_APPS: 'data_migration', 4. Run./manage.py migrate or./manage.py syncdb to create the included models Writing Migrations What is a Migration? A migration is a Python class that should be placed in a file called data_migration_spec.py in one of your app-directories. django-data-migrations searches in each app, included in INSTALLED_APPS, for this file and imports all from it automatically. Your migration normally specifies the following things: A database connection to your legacy data (whereever this is) The model class, the migration should create instances for A corresponding SQL-Query, which maps the old DB-schema to the new Django-model-schema 3

You can specify what should be done with special columns, returned by the query (Many2Many-, ForeignKey-, One2One-Relations). With minimal configuration, these things can be migrated automatically. Dependencies to other models can be specified. This is used, to determine the order each migration can be applied. e.g. If a migration specifies a model as dependency, his migration will be executed before our migration will be processed. You can implement different hooks, where you normally manipulate the data returned by the query or do some things which are not possible by SQL itself. You can specify, if your migration should look for new instances on a second run. This is not the default case. A complete Migration example To give you an overview, how a common migration looks, the following listing shows a migration for a Post model. This is an excerpt from a data_migration_spec.py which can be found in a testing app, which is used by django-data-migration itself. The complete app can be found here... class PostMigration(BaseMigration): query = """ SELECT id, Title as title, Body as body, Posted as posted, Author as author, ( SELECT GROUP_CONCAT(id) FROM comments c WHERE c.post = p.id ) as comments FROM posts p; """ model = Post depends_on = [ Author, Comment ] column_description = { 'author': is_a(author, search_attr="id", fk=true, prefetch=false), 'comments': is_a(comment, search_attr="id", m2m=true, delimiter=",", prefetch=false) } @classmethod def hook_after_save(self, instance, row): # because of the auto_now_add flag, we have to set it hard to this value instance.posted = row['posted'] instance.save() As you can see, PostMigration inherits from a class called BaseMigration. This is one of the classes which is listed here Setup Database Connection. Migration details 4 Chapter 1. Contents:

Setup Database Connection django-data-migration should support as many databases as possible, so the connection part is not implemented directly for each database. You have to override the open_db_connection classmethod in your migration. Tip: The connection handling should be implemented once in a BaseMigration where all other Migrations inherit from. Important: django-data-migration requires that the database returns a DictCursor, where each row is a dict with column names as keys and the row as corresponding values. SQLite The following code implements an example database connection for SQLite: import sqlite3 class BaseMigration(Migration): @classmethod def open_db_connection(self): conn = sqlite3.connect(...) def dict_factory(cursor, row): d = {} for idx, col in enumerate(cursor.description): d[col[0]] = row[idx] return d conn.row_factory = dict_factory return conn MySQL You have to install the corresponding MySQL-Python-driver by executing: pip install MySQL-python The following code implements an example database connection for MySQL. import MySQLdb class BaseMigration(Migration): @classmethod def open_db_connection(self): return MySQLdb.connect(..., cursorclass=mysqldb.cursors.dictcursor ) 1.2. Writing Migrations 5

PostgreSQL You have to install the corresponding PostgreSQL-Python-driver by executing: pip install psycopg2 Important: a version of psycopg >= 2.5 is required as it allows the cursor_factory to be specified through connect() instead of get_cursor(). The following code implements an example database connection for PostgreSQL. import psycopg2 import psycopg2.extras class BaseMigration(Migration): @classmethod def open_db_connection(self): return psycopg2.connect(..., cursor_factory=psycopg2.extras.realdictcursor ) MS-SQL @aidanlister contributed a sample DB connection for MS-SQL using pyodbc, which has to be installed first: pip install pyodbc The following code implements an example database connection for MS-SQL. import pyodbc class ConnectionWrapper(object): def init (self, cnxn): self.cnxn = cnxn def getattr (self, attr): return getattr(self.cnxn, attr) def cursor(self): return CursorWrapper(self.cnxn.cursor()) class CursorWrapper(object): def init (self, cursor): self.cursor = cursor def getattr (self, attr): return getattr(self.cursor, attr) def fetchone(self): row = self.cursor.fetchone() if not row: return None 6 Chapter 1. Contents:

row)) return dict((t[0], value) for t, value in zip(self.cursor.description, row)) def fetchall(self): rows = self.cursor.fetchall() if not rows: return None dictrows = [] for row in rows: row = dict((t[0], value) for t, value in zip(self.cursor.description, dictrows.append(row) return dictrows class BaseMigration(Migration): @classmethod def open_db_connection(self): dsn = "DRIVER={SQL Server Native Client 11.0};SERVER=X;DATABASE=X;UID=X;PWD=X" cnxn = pyodbc.connect(dsn) wrapped_connection = ConnectionWrapper(cnxn) return wrapped_connection What can be configured in every migration In your migration classes you have several configuration options, which are listed below with a short description. For an in-depth explanation you can consult the paragraphs below. Writing effective Migration-queries Important: TODO Define dependencies Important: TODO Describe special columns Your query can include special columns, that are represented as special Django-relations (ForeignKey-, Many2Manyor One2One-Relations). Or you can exclude specific columns from automatic processing. You will normally define these settings with an invocation of the is_a-function, which does some tests and returns the required settings. This will then be used by django-data-migration in different places. Some examples for is_a can be found here: A complete Migration example. 1.2. Writing Migrations 7

Using Migration Hooks data_migration.migration.migration defines a number of different hook-functions which will be called at different places allowing you to customize the migration work at different levels. Error-Handling In case of an exception when creating the instances, a default error handler will be called, to print the current row to stderr and than reraise the exception. You can override this hook in your migration if it requires special handling of errors. When this method returns without an exception, the next row from the query will be processed. Hook-Flowchart The following graphic shows each Hook-method and when it is called in contrast to the model handling which is done by django-data-migration. +------------------+ hook_before_all() +--------------+---+ +-----+ +--v----v--------------------+ hook_before_transformation() +-------+--------------------+ +---v--------------------+ instance = model(**data) +---+--------------------+ +-------v----------+ hook_before_save() +-------+----------+ +---v-----------+ instance.save() +---+-----------+ +-------v---------+ hook_after_save() +-------+---------+ +-------+--+ +-----------v-----+ hook_after_all() +-----------------+ Implement updateable Migrations 8 Chapter 1. Contents:

Important: TODO Migrating your data After you wrote all of your Migrations, you can put them in action by executing the migrate_legacy_data management command:./manage.py migrate_legacy_data [--commit] If you omit the --commit-flag, the data is not saved to the DB. This is useful when you develop your migrations and have some failing migrations, but the db is not cluttered with any incomplete data. When your migrations are succesful you can add --commit and your data is saved when no errors occur. Note: In older versions of this library, the management command is called migrate_this_shit. This has been deprecated, but it is still there. migrate_legacy_data should be more appropriate. Troubleshooting GROUP_CONCAT column size limit in Mysql is too low Mysql has fairly low limit for rows that can be merged by the GROUP_CONCAT function. For large result sets, this has to be increased. This can be done with the following SQL statement, which can be executed in open_db_connection.: @classmethod def open_db_connection(self): conn = MySQLdb.connect(...) cursor = conn.cursor() cursor.execute('set SESSION group_concat_max_len = 60000000;') return conn Changelog Version 0.2.1 atomic() is now used instead of commit_on_success() when it is available. This prevents deprecation warnings that are displayed with Django >= 1.7. Version 0.2.0 Introduced some performance improvements by implementing prefetching of related objects, that reduces the number of issued SQL-Queries dramatically. There is now the opportunity to assign related objects by their id instead of a full instance, which can reduce the memory usage. 1.3. Migrating your data 9

There are two new arguments in is_a: prefetch=true and assign_by_id=false. Because prefetching is enabled by default, it should bring a massive performance boost only by upgrading to this version Switching to a new minor release because of the changed behaviour in get_object Version < 0.2.0 There is no explicit Changelog until 0.2.0. Use git log to get the information from git. 10 Chapter 1. Contents:

Python Module Index d data_migration.migration, 7 11

12 Python Module Index

Index D data_migration.migration (module), 7 13