COMP20008 Elements of Data Processing. Week 1: Lecture 2. Data format and storage

Size: px
Start display at page:

Download "COMP20008 Elements of Data Processing. Week 1: Lecture 2. Data format and storage"

Transcription

1 COMP20008 Elements of Data Processing Week 1: Lecture 2 Data format and storage

2 Announcements Lecture recordings Lecture Capture: Current Technical Issue. There are currently long delays in processing recorded lectures. We apologise for the inconvenience.

3 Announcements Assessable content Includes material from lectures, workshops and assignments Study guide describing in detail what concepts to focus on, will be released towards end of semester

4 Announcements Student representatives Nelson Chen Joanna Lee Next week s workshop Is available on the LMS

5 Today Where is the data? How is data stored and in what formats? RDB, HTML, CSV, XML, JSON, RDF. Question: Why do we have different data formats and why do we wish to transform between different formats?

6 Relational databases (INFO20003) It is good to have structure for data! Easier to analyse, easier to query Easier to store Easier to clean, maintain consistency and security, especially with multiple users Relational databases, the classic method of storing structured data (banking, sales, airlines ) Data stored in tables, each row is a data item and columns describe attributes of the data item Can query the data using a high level language such as SQL

7 Examples from Silberschatz et al Database System Concepts Attributes

8 Sample relational database

9 SQL create table branch (branch_name char(15) not null, branch_city char(30), assets integer, primary key (branch_name))

10 SQL select account.balance from depositor, account where depositor.customer_id = and depositor.account_number=account.account_number

11 Database system structure

12 Database systems In INFO20003 subject you will cover topics like SQL Specification of integrity constraints Data modelling and relational database management systems Transactions and concurrency control Storage management Web-based databases. Highly relevant to data wrangling! Useful to do INFO20003 as part of a data science specialisation

13 Challenges Once data is into a relational database, it is easier to wrangle. But maybe hard to load it there in the first place Unstructured data: text, HTML, sequences, graphs

14 NoSQL Not only SQL databases Want a highly scalable and elastic system, distributed over many servers Can flexibly store different data types (documents, sequences, graphs, objects) Sacrifice some of the nice properties of relational databases Query language Consistency and integrity guarantees Examples Key-value storage, document database, graph database Google BigTable, Amazon SimpleDB, Apache CouchDB, MongoDB,.. We will revisit these in more detail when we cover distributed & cloud topic in a few weeks time

15 Spreadsheets: CSV Huge amounts of data lives in spreadsheets Businesses Hospitals. Microsoft (Excel), OpenOffice (Calc), Google docs CSV (comma separated values) also very popular These are human readable, versus binary XLS format (Excel) CSVs lack the formatting information of an XLS file Python libraries csv xlrd, openpyxl

16 Text data (documents..) Specifying patterns in text regular expressions Good for computing statistics, checking integrity, filtering, substitutions. Specifying patterns in text. matches any character ^ matches start of string $ matches end of string * zero or more repetitions + one or more repetitions the or operator - [] a set of characters, e.g. [abcd] or [a-za-z] regex101.com

17 Exercises (3 minutes) Write regular expressions to specify each of the following Two occurrences of letter e followed immediately by one n and then at least one t An h or an e or an x, followed by at least one `a, followed by an r Any 3 characters, possibly followed by a repeated sequence of the character x, followed by a c or a d

18 address regular expression One attempt See also Python library

19 Table data As well as relational databases and csv files, table data is abundant on the Web Encoded in HTML Google fusion tables

20 HTML Hypertext Markup language Marked up with elements, delineated by start and end tags. Elements correspond to logical units, such as a heading, paragraph or itemised list. Tags: Keywords contained in pairs of angle brackets. Not case sensitive. Browser determines how to display/present the logical units Not all elements need both start and end tags. Some elements can have attributes. Ordering of attributes is not significant.

21 HTML Example <div class="icon section5"> <hh2><a href="about/index.html">about the Melbourne School of Engineering</ a></h2> <ul> <li><a href="about/dean_welcome.html">dean's Welcome</a></li> <li><a href="about/staff.html">leadership & Professional Staff</a></li> <li><a href="about/contact.html">contact Us</a></li> <li><a href=" Computer Resources</a></li> <li><a href="intranet/index.html">for Staff (intranet)</a></li> <li><a href="casual_staff/index.html">for Casual Staff</a></li> <li><a href="intranet/review/prof_staff.html">professional Staff Review</a></li> <li><a href="/about/safety/index.html">environment, Health & Safety</a></li> <li><a href="/about/committees/index.html">committees</a></li> </ul>

22 XML Extensible Markup Language Allows new elements to be defined Applications may generate and process XML Enables data exchange between different platforms Facilitates better encoding of semantics <CATALOG> <CD> <TITLE>Empire Burlesque</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia</COMPANY> <PRICE CURRENCY="USD"> 10.90</PRICE> <YEAR>1985</YEAR> </CD> <CD> <TITLE>Hide your heart</title> <ARTIST>Bonnie Tyler</ARTIST> <COUNTRY>UK</COUNTRY> <COMPANY>CBS Records</COMPANY> <PRICE CURRENCY="USD">9.90</PRICE> <YEAR>1988</YEAR> </CD> </CATALOG>

23 JSON: JavaScript Object Notation { } "Catalog": [ ] { "CD": { "title": "Empire Burlesque", }, "artist": "Bon Dylan", "Country": "USA" "price": { "Currency": "USD", "value": }, "year": 1985 } { "CD": { } "title": "Hide your heart", "artist": "Bonnie Taylor", "Country": "UK", "price": { "currency": "USD", "value": 9.90 }, "year": 1988} } JSON is simpler and more compact/ lightweight than XML. Easy to parse. Common JSON application read and display data from a webserver using javascript. json_http.asp XML comes with a large family of other standards for querying and transforming (XQuery, XML Schema, XPATH, XSLT, namespaces, )

24 Jason format (from json.org)

25 JASON format (json.org)

26 Exercise Represent the following information in JSON <Person> <FirstName>Homer</FirstName> <LastName>Simpson</LastName> <Relatives> <Relative>Grandpa</Relative> <Relative>Marge</Relative> <Relative>Lisa</Relative> <Relative>Bart</Relative> </Relatives> <FavouriteBeer>Duff</FavouriteBeer> </Person>

27 Python libraries json ElementTree html.parser

28 Other forms of data Sequences Graphs Can be represented in multiple ways

29 Sequence data: biology Biological sequences (DNA, proteins) >gi sp P01013 OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED) QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMM CMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWT NPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQ AVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFG RYWSP

30 Graphs: Social networks

31 Protein-Protein Interactions

32 The Internet Graph (

33 Graph Data - RDF RDF= Resource Description Framework Used for storing semantic data (relationships between concepts and objects) Used on the Semantic Web E.g. Freebase, Google knowledge graph

34 Graphs RDF (resource description framework) [materials from w3.org]

35 Serialisation of RDF Example Graph This graph can be serialised as XML (don t worry about syntax!) <?xml version="1.0"?> <rdf:rdf xmlns:rdf=" xmlns:contact=" contact#"> <contact:person rdf:about=" contact#me"> <contact:fullname>eric Miller</contact:fullName> <contact:mailbox rdf:resource="mailto:em@w3.org"/> <contact:personaltitle>dr.</contact:personaltitle> </contact:person>

36 Freebase A large database that connects entities together as a graph The basis of the Google Knowledge graph that is used to improve search. search/knowledge.html

37 RDF Triple Store An alternative format for storing RDF type data triple store < < 2000/10/swap/pim/contact#fullName> "Eric Miller". < < 2000/10/swap/pim/contact#mailbox> <mailto:e.miller123(at)example>. < < 2000/10/swap/pim/contact#personalTitle> "Dr.". < < 1999/02/22-rdf-syntax-ns#type> < pim/contact#person>.

38 Graphs: Matrix Representation A B A B C D A B C D C D Source A 1 in the matrix iff there is an edge from node X to node Y. Or use a relational table Destination A C D C B B

39 Next week Workshop for next week Available on the LMS Useful Unix tools Directory navigation and file manipulation, redirection, pipes, awk, sed, regex, grep... Look at Section 1a before you attend your workshop Lectures next week Data quality and data cleaning (lasting ~2 weeks)

40 Further reading Further reading Relational databases Pages of XML JSON RDF

Introduction to XML. M2 MIA, Grenoble Université. François Faure

Introduction to XML. M2 MIA, Grenoble Université. François Faure M2 MIA, Grenoble Université Example tove jani reminder dont forget me this weekend!

More information

Chapter 13: Advanced topic 3 Web 3.0

Chapter 13: Advanced topic 3 Web 3.0 Chapter 13: Advanced topic 3 Web 3.0 Contents Web 3.0 Metadata RDF SPARQL OWL Web 3.0 Web 1.0 Website publish information, user read it Ex: Web 2.0 User create content: post information, modify, delete

More information

The components of a basic XML system.

The components of a basic XML system. XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML XML is a software- and hardware-independent tool for carrying information. XML is easy to learn. XML was designed

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Web 3.0 Overview: Interoperability in the Web dimension (1) Web 3.0 Overview: Interoperability in the Web dimension (2) Metadata

Web 3.0 Overview: Interoperability in the Web dimension (1) Web 3.0 Overview: Interoperability in the Web dimension (2) Metadata Information Network I Web 3.0 Youki Kadobayashi NAIST Web 3.0 Overview: Interoperability in the Web dimension (1) Interoperability of data: Assist in interacting with arbitrary (including unknown) resources

More information

Implementing and extending SPARQL queries over DLVHEX

Implementing and extending SPARQL queries over DLVHEX Implementing and extending SPARQL queries over DLVHEX Gennaro Frazzingaro Bachelor Thesis Presentation - October 5, 2007 From a work performed in Madrid, Spain Galway, Ireland Rende, Italy How to solve

More information

Information Network I Web 3.0. Youki Kadobayashi NAIST

Information Network I Web 3.0. Youki Kadobayashi NAIST Information Network I Web 3.0 Youki Kadobayashi NAIST Web 3.0 Overview: Interoperability in the Web dimension (1) Interoperability of data: Metadata Data about data Assist in interacting with arbitrary

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

XML: Extensible Markup Language

XML: Extensible Markup Language XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 16: NoSQL and JSon CSE 414 - Spring 2016 1 Announcements Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5] Today s lecture:

More information

What's New in RDF 1.1

What's New in RDF 1.1 What's New in RDF 1.1 SemTechBiz June 2013 http://www.w3.org/2013/talks/0603-rdf11 Sandro Hawke, W3C Staff sandro@w3.org @sandhawke Overview 1. Stability and Interoperability 2. Non-XML Syntaxes Turtle

More information

CSE 344 JULY 9 TH NOSQL

CSE 344 JULY 9 TH NOSQL CSE 344 JULY 9 TH NOSQL ADMINISTRATIVE MINUTIAE HW3 due Wednesday tests released actual_time should have 0s not NULLs upload new data file or use UPDATE to change 0 ~> NULL Extra OOs on Mondays 5-7pm in

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411 1 Extensible

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Chapter 1: Introduction Purpose of Database Systems Database Languages Relational Databases Database Design Data Models Database Internals Database Users and Administrators Overall

More information

Data Formats and APIs

Data Formats and APIs Data Formats and APIs Mike Carey mjcarey@ics.uci.edu 0 Announcements Keep watching the course wiki page (especially its attachments): https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018 Ditto for

More information

DIGIT.B4 Big Data PoC

DIGIT.B4 Big Data PoC DIGIT.B4 Big Data PoC GROW Transpositions D04.01.Information System Table of contents 1 Introduction... 4 1.1 Context of the project... 4 1.2 Objective... 4 2 Technologies used... 5 2.1 Python... 5 2.2

More information

JSON - Overview JSon Terminology

JSON - Overview JSon Terminology Announcements Introduction to Database Systems CSE 414 Lecture 12: Json and SQL++ Office hours changes this week Check schedule HW 4 due next Tuesday Start early WQ 4 due tomorrow 1 2 JSON - Overview JSon

More information

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA ADMINISTRATIVE MINUTIAE HW3 due Wednesday OQ4 due Wednesday HW4 out Wednesday (Datalog) Exam May 9th 9:30-10:20 WHERE WE ARE So far we have studied the relational

More information

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online

More information

XSL Languages. Adding styles to HTML elements are simple. Telling a browser to display an element in a special font or color, is easy with CSS.

XSL Languages. Adding styles to HTML elements are simple. Telling a browser to display an element in a special font or color, is easy with CSS. XSL Languages It started with XSL and ended up with XSLT, XPath, and XSL-FO. It Started with XSL XSL stands for EXtensible Stylesheet Language. The World Wide Web Consortium (W3C) started to develop XSL

More information

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601 CSC 261/461 Database Systems Fall 2017 MW 12:30 pm 1:45 pm CSB 601 Agenda Administrative aspects Brief overview of the course Introduction to databases and SQL ADMINISTRATIVE ASPECTS Teaching Staff Instructor:

More information

Data. Notes. are required reading for the week. textbook reading and a few slides on data formats and data cleaning

Data. Notes. are required reading for the week. textbook reading and a few slides on data formats and data cleaning CS 725/825 Information Visualization Spring 2018 Data Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs725-s18/ Notes } We will not cover these slides in class, but they are required reading for

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid= 2465 1

More information

Descriptions. Robert Grimm New York University

Descriptions. Robert Grimm New York University Descriptions Robert Grimm New York University The Final Assignment! Your own application! Discussion board! Think: Paper summaries! Web cam proxy! Think: George Orwell or JenCam! Visitor announcement and

More information

Descriptions. Robert Grimm New York University

Descriptions. Robert Grimm New York University Descriptions Robert Grimm New York University The Final Assignment! Your own application! Discussion board! Think: Paper summaries! Time tracker! Think: Productivity tracking! Web cam proxy! Think: George

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

Translating XSLT into XQuery

Translating XSLT into XQuery Translating into Albin Laga, Praveen Madiraju, Darrel A. Mazzari and Gowri Dara Department of Mathematics, Statistics, and Computer Science Marquette University P.O. Box 1881, Milwaukee, WI 53201 albin.laga,

More information

B4M36DS2, BE4M36DS2: Database Systems 2

B4M36DS2, BE4M36DS2: Database Systems 2 B4M36DS2, BE4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/~svoboda/courses/171-b4m36ds2/ Lecture 2 Data Formats Mar n Svoboda mar n.svoboda@fel.cvut.cz 9. 10. 2017 Charles University in Prague,

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 12 (Wrap-up) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

WebLearning IBM Curriculum

WebLearning IBM Curriculum WebLearning IBM Curriculum WebSphere Commerce Suite Marketplace Edition Implementation Table of Contents: Overview Who Should Take This Course What You Are Taught Topics Include Prerequisites Duration:

More information

Event Stores (I) [Source: DB-Engines.com, accessed on August 28, 2016]

Event Stores (I) [Source: DB-Engines.com, accessed on August 28, 2016] Event Stores (I) Event stores are database management systems implementing the concept of event sourcing. They keep all state changing events for an object together with a timestamp, thereby creating a

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 1, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 12 (Wrap-up) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457

More information

Requirements Specification

Requirements Specification Requirements Specification Smart Scheduling Requested by: Dr. Robert Yoder Associate Professor of Computer Science Computer Science Department Head Siena College Tom Mottola Jason Czajkowski Brian Maxwell

More information

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

COMP9321 Web Application Engineering. Extensible Markup Language (XML) COMP9321 Web Application Engineering Extensible Markup Language (XML) Dr. Basem Suleiman Service Oriented Computing Group, CSE, UNSW Australia Semester 1, 2016, Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2442

More information

Acceptance Test. Smart Scheduling. Empire Unlimited. Requested by:

Acceptance Test. Smart Scheduling. Empire Unlimited. Requested by: Smart Scheduling Requested by: Dr. Robert Yoder Computer Science Department Head Siena College Department of Computer Science Prepared by: Meghan Servello Thomas Mottola Jonathan Smith Jason Czajkowski

More information

Database System Concepts

Database System Concepts s Design Chapter 1: Introduction Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2009/2010 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

M359 Block5 - Lecture12 Eng/ Waleed Omar

M359 Block5 - Lecture12 Eng/ Waleed Omar Documents and markup languages The term XML stands for extensible Markup Language. Used to label the different parts of documents. Labeling helps in: Displaying the documents in a formatted way Querying

More information

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis Informatics 1: Data & Analysis Lecture 9: Trees and XML Ian Stark School of Informatics The University of Edinburgh Tuesday 11 February 2014 Semester 2 Week 5 http://www.inf.ed.ac.uk/teaching/courses/inf1/da

More information

Introduction to JSON. Roger Lacroix MQ Technical Conference v

Introduction to JSON. Roger Lacroix  MQ Technical Conference v Introduction to JSON Roger Lacroix roger.lacroix@capitalware.com http://www.capitalware.com What is JSON? JSON: JavaScript Object Notation. JSON is a simple, text-based way to store and transmit structured

More information

Document stores using CouchDB

Document stores using CouchDB 2018 Document stores using CouchDB ADVANCED DATABASE PROJECT APARNA KHIRE, MINGRUI DONG aparna.khire@vub.be, mingdong@ulb.ac.be 1 Table of Contents 1. Introduction... 3 2. Background... 3 2.1 NoSQL Database...

More information

DIABLO VALLEY COLLEGE CATALOG

DIABLO VALLEY COLLEGE CATALOG COMPUTER SCIENCE COMSC Despina Prapavessi, Dean Math and Computer Science Division Math Building, Room 267 The computer science department offers courses in three general areas, each targeted to serve

More information

Assignment 11 (NF) - Repetition

Assignment 11 (NF) - Repetition Assignment 11 (NF) - Repetition -- no due date, no submission -- This assignment is meant to help you prepare for the exam. It is not necessary to turn in your solutions. The solutions will be discussed

More information

EAE-2037 Loading transactions into your EAE/ABSuite system. Unite 2012 Mike Bardell

EAE-2037 Loading transactions into your EAE/ABSuite system. Unite 2012 Mike Bardell EAE-2037 Loading transactions into your EAE/ABSuite system Unite 2012 Mike Bardell EAE 2037 Have you ever been asked to enter data from an external source into your application. Typically you would suggest

More information

Unit 10 Databases. Computer Concepts Unit Contents. 10 Operational and Analytical Databases. 10 Section A: Database Basics

Unit 10 Databases. Computer Concepts Unit Contents. 10 Operational and Analytical Databases. 10 Section A: Database Basics Unit 10 Databases Computer Concepts 2016 ENHANCED EDITION 10 Unit Contents Section A: Database Basics Section B: Database Tools Section C: Database Design Section D: SQL Section E: Big Data Unit 10: Databases

More information

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless Introduction to Database Systems CSE 414 Lecture 11: NoSQL 1 HW 3 due Friday Announcements Upload data with DataGrip editor see message board Azure timeout for question 5: Try DataGrip or SQLite HW 2 Grades

More information

XSL Transformation (XSLT) XSLT Processors. Example XSLT Stylesheet. Calling XSLT Processor. XSLT Structure

XSL Transformation (XSLT) XSLT Processors. Example XSLT Stylesheet. Calling XSLT Processor. XSLT Structure Transformation (T) SOURCE The very best of Cat Stevens UK 8.90 1990 Empire Burlesque Bob

More information

CSC Web Technologies, Spring Web Data Exchange Formats

CSC Web Technologies, Spring Web Data Exchange Formats CSC 342 - Web Technologies, Spring 2017 Web Data Exchange Formats Web Data Exchange Data exchange is the process of transforming structured data from one format to another to facilitate data sharing between

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.01 Fall 2017 Professor Norman White nwhite@stern.nyu.edu normwhite@twitter Teaching Assistant: Frenil Sanghavi fps241@stern.nyu.edu Administrative Assistant:

More information

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan COSC 416 NoSQL Databases NoSQL Databases Overview Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Databases Brought Back to Life!!! Image copyright: www.dragoart.com Image

More information

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Relational Databases Fall 2017 Lecture 1

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Relational Databases Fall 2017 Lecture 1 COURSE OVERVIEW THE RELATIONAL MODEL CS121: Relational Databases Fall 2017 Lecture 1 Course Overview 2 Introduction to relational database systems Theory and use of relational databases Focus on: The Relational

More information

DATA COLLECTION. Slides by WESLEY WILLETT 13 FEB 2014

DATA COLLECTION. Slides by WESLEY WILLETT 13 FEB 2014 DATA COLLECTION Slides by WESLEY WILLETT INFO VISUAL 340 ANALYTICS D 13 FEB 2014 WHERE DOES DATA COME FROM? We tend to think of data as a thing in a database somewhere WHY DO YOU NEED DATA? (HINT: Usually,

More information

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1 COURSE OVERVIEW THE RELATIONAL MODEL CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1 Course Overview 2 Introduction to relational database systems Theory and use of relational databases

More information

Automated Classification. Lars Marius Garshol Topic Maps

Automated Classification. Lars Marius Garshol Topic Maps Automated Classification Lars Marius Garshol Topic Maps 2007 2007-03-21 Automated classification What is it? Why do it? 2 What is automated classification? Create parts of a topic map

More information

15-388/688 - Practical Data Science: Data collection and scraping. J. Zico Kolter Carnegie Mellon University Spring 2017

15-388/688 - Practical Data Science: Data collection and scraping. J. Zico Kolter Carnegie Mellon University Spring 2017 15-388/688 - Practical Data Science: Data collection and scraping J. Zico Kolter Carnegie Mellon University Spring 2017 1 Outline The data collection process Common data formats and handling Regular expressions

More information

Creating an Online Catalogue Search for CD Collection with AJAX, XML, and PHP Using a Relational Database Server on WAMP/LAMP Server

Creating an Online Catalogue Search for CD Collection with AJAX, XML, and PHP Using a Relational Database Server on WAMP/LAMP Server CIS408 Project 5 SS Chung Creating an Online Catalogue Search for CD Collection with AJAX, XML, and PHP Using a Relational Database Server on WAMP/LAMP Server The catalogue of CD Collection has millions

More information

relational Key-value Graph Object Document

relational Key-value Graph Object Document NoSQL Databases Earlier We have spent most of our time with the relational DB model so far. There are other models: Key-value: a hash table Graph: stores graph-like structures efficiently Object: good

More information

Data Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting

Data Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting CS 725/825 Information Visualization Fall 2013 Data Foundations Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs725-f13/ Topic Objectives! Distinguish between ordinal and nominal values and list

More information

CS50 Quiz Review. November 13, 2017

CS50 Quiz Review. November 13, 2017 CS50 Quiz Review November 13, 2017 Info http://docs.cs50.net/2017/fall/quiz/about.html 48-hour window in which to take the quiz. You should require much less than that; expect an appropriately-scaled down

More information

Chapter 1: Introduction. Chapter 1: Introduction

Chapter 1: Introduction. Chapter 1: Introduction Chapter 1: Introduction Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 1: Introduction Purpose of Database Systems View of Data Database Languages Relational Databases

More information

Semantic Web Tools. Federico Chesani 18 Febbraio 2010

Semantic Web Tools. Federico Chesani 18 Febbraio 2010 Semantic Web Tools Federico Chesani 18 Febbraio 2010 Outline A unique way for identifying concepts How to uniquely identified concepts? -> by means of a name system... SW exploits an already available

More information

Now go to bash and type the command ls to list files. The unix command unzip <filename> unzips a file.

Now go to bash and type the command ls to list files. The unix command unzip <filename> unzips a file. wrangling data unix terminal and filesystem Grab data-examples.zip from top of lecture 4 notes and upload to main directory on c9.io. (No need to unzip yet.) Now go to bash and type the command ls to list

More information

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial. A tutorial report for SENG 609.22 Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far XML Tutorial Yanan Zhang Department of Electrical and Computer Engineering University of Calgary

More information

20762B: DEVELOPING SQL DATABASES

20762B: DEVELOPING SQL DATABASES ABOUT THIS COURSE This five day instructor-led course provides students with the knowledge and skills to develop a Microsoft SQL Server 2016 database. The course focuses on teaching individuals how to

More information

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy * Towards NewSQL Overview * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy *TowardsNewSQL NoSQL

More information

Introduction to Azure DocumentDB. Jeff Renz, BI Architect RevGen Partners

Introduction to Azure DocumentDB. Jeff Renz, BI Architect RevGen Partners Introduction to Azure DocumentDB Jeff Renz, BI Architect RevGen Partners Thank You Presenting Sponsors Gain insights through familiar tools while balancing monitoring and managing user created content

More information

Designing Database Solutions for Microsoft SQL Server 2012

Designing Database Solutions for Microsoft SQL Server 2012 Designing Database Solutions for Microsoft SQL Server 2012 Course 20465A 5 Days Instructor-led, Hands-on Introduction This course describes how to design and monitor high performance, highly available

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query

More information

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.

More information

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems. Outline CMPSCI445: Information Systems Overview of databases and DBMS s Course topics and requirements Yanlei Diao University of Massachusetts Amherst Databases and DBMS s Commercial DBMS s A database

More information

USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM. Cédric Mesnage Southampton Solent University United Kingdom

USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM. Cédric Mesnage Southampton Solent University United Kingdom USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM Cédric Mesnage Southampton Solent University United Kingdom Abstract Musicbrainz is a crowd-sourced database of music metadata. The level 6 class of Data

More information

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 4 Data models A Never-Ending Story

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 4 Data models A Never-Ending Story CSE 544 Principles of Database Management Systems Fall 2016 Lecture 4 Data models A Never-Ending Story 1 Announcements Project Start to think about class projects More info on website (suggested topics

More information

Database Management System. Fundamental Database Concepts

Database Management System. Fundamental Database Concepts Database Management System Fundamental Database Concepts CONTENTS Basics of DBMS Purpose of DBMS Applications of DBMS Views of Data Instances and Schema Data Models Database Languages Responsibility of

More information

Announcements. PS 3 is out (see the usual place on the course web) Be sure to read my notes carefully Also read. Take a break around 10:15am

Announcements. PS 3 is out (see the usual place on the course web) Be sure to read my notes carefully Also read. Take a break around 10:15am Announcements PS 3 is out (see the usual place on the course web) Be sure to read my notes carefully Also read SQL tutorial: http://www.w3schools.com/sql/default.asp Take a break around 10:15am 1 Databases

More information

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2018 - Week 4 Dr Nick Hayward XML - recap first version of XML became a W3C Recommendation in 1998 a useful format for data storage and exchange config files,

More information

A practical introduction to database design

A practical introduction to database design A practical introduction to database design Dr. Chris Tomlinson Bioinformatics Data Science Group, Room 126, Sir Alexander Fleming Building chris.tomlinson@imperial.ac.uk Computer Skills Classes 17/01/19

More information

Data Science Services Dirk Engfer Page 1 of 5

Data Science Services Dirk Engfer Page 1 of 5 Page 1 of 5 Services SAS programming Conform to CDISC SDTM and ADaM within clinical trials. Create textual outputs (tables, listings) and graphical output. Establish SAS macros for repetitive tasks and

More information

Relational Database Features

Relational Database Features Relational Features s Why has the relational model been so successful? Data independence High level query language - SQL Query optimisation Support for integrity constraints Well-understood database design

More information

XML Processing & Web Services. Husni Husni.trunojoyo.ac.id

XML Processing & Web Services. Husni Husni.trunojoyo.ac.id XML Processing & Web Services Husni Husni.trunojoyo.ac.id Based on Randy Connolly and Ricardo Hoar Fundamentals of Web Development, Pearson Education, 2015 Objectives 1 XML Overview 2 XML Processing 3

More information

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 1 Databases and Database Users Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Slide 1-2 OUTLINE Types of Databases and Database Applications

More information

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome

More information

BUILDING THE SEMANTIC WEB

BUILDING THE SEMANTIC WEB BUILDING THE SEMANTIC WEB You might have come across the term Semantic Web Applications often, during talks about the future of Web apps. Check out what this is all about There are two aspects to the possible

More information

XML. Jonathan Geisler. April 18, 2008

XML. Jonathan Geisler. April 18, 2008 April 18, 2008 What is? IS... What is? IS... Text (portable) What is? IS... Text (portable) Markup (human readable) What is? IS... Text (portable) Markup (human readable) Extensible (valuable for future)

More information

Chapter 1: Introduction

Chapter 1: Introduction This image cannot currently be displayed. Chapter 1: Introduction Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 1: Introduction Purpose of Database Systems View

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 14-15: XML CSE 414 - Spring 2013 1 Announcements Homework 4 solution will be posted tomorrow Midterm: Monday in class Open books, no notes beyond one hand-written

More information

CSCI3030U Database Models

CSCI3030U Database Models CSCI3030U Database Models CSCI3030U RELATIONAL MODEL SEMISTRUCTURED MODEL 1 Content Design of databases. relational model, semistructured model. Database programming. SQL, XPath, XQuery. Not DBMS implementation.

More information

CS / Cloud Computing. Recitation 7 October 7 th and 9 th, 2014

CS / Cloud Computing. Recitation 7 October 7 th and 9 th, 2014 CS15-319 / 15-619 Cloud Computing Recitation 7 October 7 th and 9 th, 2014 15-619 Project Students enrolled in 15-619 Since 12 units, an extra project worth 3-units Project will be released this week Team

More information

Introduction to Programming

Introduction to Programming Introduction to Programming Course ISI-1329 - Three Days - Instructor-Led Introduction This three-day, instructor-led course introduces students to computer programming. Students will learn the fundamental

More information

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Command Line and Python Introduction Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Today Assignment #1! Computer architecture Basic command line skills Python fundamentals

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Slides are slightly modified by F. Dragan Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 1: Introduction Purpose of Database Systems View

More information

8/1/2016. XSL stands for EXtensible Stylesheet Language. CSS = Style Sheets for HTML XSL = Style Sheets for XML. XSL consists of four parts:

8/1/2016. XSL stands for EXtensible Stylesheet Language. CSS = Style Sheets for HTML XSL = Style Sheets for XML. XSL consists of four parts: XSL stands for EXtensible Stylesheet Language. CSS = Style Sheets for HTML XSL = Style Sheets for XML http://www.w3schools.com/xsl/ kasunkosala@yahoo.com 1 2 XSL consists of four parts: XSLT - a language

More information

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Delivery Options: Attend face-to-face in the classroom or remote-live attendance. XML Programming Duration: 5 Days Price: $2795 *California residents and government employees call for pricing. Discounts: We offer multiple discount options. Click here for more info. Delivery Options:

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 13: XML and XPath 1 Announcements Current assignments: Web quiz 4 due tonight, 11 pm Homework 4 due Wednesday night, 11 pm Midterm: next Monday, May 4,

More information

Microsoft. [MS20762]: Developing SQL Databases

Microsoft. [MS20762]: Developing SQL Databases [MS20762]: Developing SQL Databases Length : 5 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server Delivery Method : Instructor-led (Classroom) Course Overview This five-day

More information

Designing Database Solutions for Microsoft SQL Server 2012

Designing Database Solutions for Microsoft SQL Server 2012 Course 20465 : Designing Database Solutions for Microsoft SQL Server 2012 Page 1 of 6 Designing Database Solutions for Microsoft SQL Server 2012 Course 20465: 4 days; Instructor-Led Introduction This course

More information

From the Web to the Semantic Web: RDF and RDF Schema

From the Web to the Semantic Web: RDF and RDF Schema From the Web to the Semantic Web: RDF and RDF Schema Languages for web Master s Degree Course in Computer Engineering - (A.Y. 2016/2017) The Semantic Web [Berners-Lee et al., Scientific American, 2001]

More information

Microsoft Developing SQL Databases

Microsoft Developing SQL Databases 1800 ULEARN (853 276) www.ddls.com.au Length 5 days Microsoft 20762 - Developing SQL Databases Price $4290.00 (inc GST) Version C Overview This five-day instructor-led course provides students with the

More information