Big Data with rubygems.org Download Data. Aja Hammerly
1 Big Data with rubygems.org Download Data Aja Hammerly
2 Aja Hammerly
3
4 Lawyer Cat Says: Any code is copyright Google and licensed Apache
5 @thagomizer_rb Big Data
6 DATA
7 Big Data
8 Storage is Cheap
9 Intimidating
10 OMG Statistics
11
12 Machine Learning
13 Exploratory
14 Rubygems Download
15 Overview
16 rubygems
17 Column Name   Type
   id            integer
   name          varchar
   created_at    datetime
   updated_at    datetime
   slug
19 126,007
20 gem_downloads
21 Column Name   Type
   id            integer
   rubygem_id    integer
   version_id    integer
   count
22 883,848
23 dependencies
24 Column Name       Type
   id                integer
   requirements      varchar
   rubygem_id        integer
   version_id        integer
   scope             varchar
   created_at        datetime
   updated_at        datetime
   unresolved_name
26 3,638,968
27 linksets
28 Column Name   Type
   id            integer
   rubygem_id    integer
   home          varchar
   wiki          varchar
   docs          varchar
   mail          varchar
   code          varchar
   bugs          varchar
   created_at    datetime
   updated_at
29 125,932
30 versions
31 Column Name   Type       Column Name                 Type
   id            integer    authors                     text
   rubygem_id    integer    description                 text
   size          integer    summary                     text
   position      integer    requirements                text
   number        varchar    platform                    varchar
   indexed       boolean    full_name                   varchar
   prerelease    boolean    licenses                    varchar
   latest        boolean    required_ruby_version       varchar
   yanked_at     datetime   required_rubygems_version   varchar
   built_at      datetime   info_checksum               varchar
   updated_at    datetime   metadata                    hstore
   created_at    datetime   sha256
37 757,920
38 Asking Questions
39 Domain Knowledge
40 Hypothesis
41 Examples
42 The gem with the most downloads is
43 MiniTest is more popular than
44 Gems released in the last year require ruby >
45 Rails 3 is still more popular than rails
46 Fewer gems are released during
47 Largish Data
48 BigQuery
49 What
50 Why
51 How
52 I BigQuery
53 SQL
54 Fast
55 Scales
56 Complex Enough
57 Demo
58 Vocabulary
59 Dataset
60 Table
61 Import
62 Streaming
63 gcloud
64 pg
65 require 'pg'
   require 'gcloud'

   ENV["GOOGLE_CLOUD_PROJECT"] = "rubygems-bigquery"
   ENV["GOOGLE_CLOUD_KEYFILE"] =
66 gcloud = Gcloud.new
   bigquery = gcloud.bigquery
   bq_database = bigquery.dataset
67 postgres = PG.connect dbname: "rubygems"
68 bq_table = bq_database.create_table("gems") do |s|
     s.integer   "id"
     s.string    "name"
     s.timestamp "created_at"
     s.timestamp
   end
69 columns = %w[id name created_at updated_at]
70 postgres.exec("select * FROM rubygems") do |pg_table|
     pg_table.each do |row|
       hashed_row = Hash[columns.zip(row.values)]
       bq_table.insert(hashed_row)
     end
   end
74 Zip & Hash[]
75 [key1, key2, key3, key4]
   [val1, val2, val3, val4]
76 zip
78 [key1, key2, key3, key4]
   [val1, val2, val3, val4]
   [[key1, val1], [key2, val2], [key3, val3], [key4, val4]]
79 [[key1, val1], [key2, val2], [key3, val3], [key4, val4]]
80 Hash::[]
81 Hash[[[key1, val1], [key2, val2], [key3, val3], [key4, val4]]]
82 { key1 => val1, key2 => val2, key3 => val3, key4 => val4 }
83 Hash[keys.zip(values)]
84 postgres.exec("select * FROM rubygems") do |pg_table|
     pg_table.each do |row|
       hashed_row = Hash[columns.zip(row.values)]
       bq_table.insert(hashed_row)
     end
   end
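The zip and Hash[] steps above can be tried in plain Ruby without a database. This is a minimal sketch: the values array is made up for illustration and stands in for `row.values` from the slides.

```ruby
# Column names as on the slide; the values are invented sample data
# standing in for one row returned by Postgres.
columns = %w[id name created_at updated_at]
values  = ["1", "rake", "2016-01-01 00:00:00", "2016-06-01 00:00:00"]

# zip pairs the two arrays up by position.
pairs = columns.zip(values)

# Hash[] turns an array of [key, value] pairs into a hash.
hashed_row = Hash[pairs]

# Array#to_h (Ruby 2.1 and later) is an equivalent spelling.
same_row = pairs.to_h

# zip is driven by the receiver: surplus values are dropped.
short = Hash[%w[id name].zip(["1", "rake", "extra"])]

puts hashed_row.inspect
```

Because zip follows the receiver's length, a `select *` that returns more columns than `columns` lists would silently drop the trailing values, so the column list and the query need to stay in step.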
85 Batch
86 Formats
87 CSV
88 JSON
89 Avro
90 CSV
91 require 'pg'
   require 'csv'
   require
92 postgres = PG.connect dbname: "rubygems"
   cols = %w[id requirements created_at updated_at rubygem_id version_id
93 query = "SELECT #{cols.join(',')} FROM dependencies"

   CSV.open(csv_path, "wb") do |csv|
     postgres.exec(query) do |pg_table|
       pg_table.each do |row|
         csv << row.values
       end
     end
   end
94 storage = Gcloud.new.storage
   bucket = storage.bucket "goruco2016-bg-files"
   bucket.create_file csv_path,
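The CSV-writing step can be exercised on its own with the standard library. In this sketch the `rows` array is invented sample data that stands in for the Postgres result set, and the temporary file path is an assumption; the upload to Cloud Storage is left out.

```ruby
require 'csv'
require 'tmpdir'

# Invented rows standing in for the hashes that pg yields per result row.
rows = [
  { "id" => "1", "requirements" => ">= 0",   "rubygem_id" => "10", "version_id" => "100" },
  { "id" => "2", "requirements" => "~> 1.2", "rubygem_id" => "11", "version_id" => "100" },
]

# Hypothetical output location for the sketch.
csv_path = File.join(Dir.mktmpdir, "dependencies.csv")

# Same shape as the slide: open the file once, append one array per row.
CSV.open(csv_path, "wb") do |csv|
  rows.each do |row|
    csv << row.values
  end
end

# Reading the file back gives one array of strings per row, with no header.
written = CSV.read(csv_path)
puts written.inspect
```

No header row is written, so the column order in the file has to match the order the load job expects.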
95 Import
96
97
98 What Now?
99 rubygems
100 Simple
101 Rails has the most downloads.
102 Which gem has the most
103 SELECT name, count
    FROM [rubygems.downloads]
    JOIN rubygems.gems
      ON rubygems.gems.id = rubygems.downloads.rubygem_id
    ORDER BY count DESC
    LIMIT
104 name         count
    rake         107,076,261
    rack         100,955,906
    multi_json   100,171,080
    json         95,715,131
    bundler
105 SELECT name, sum(count) as total
    FROM [rubygems.downloads]
    JOIN rubygems.gems
      ON rubygems.gems.id = rubygems.downloads.rubygem_id
    GROUP BY name
    ORDER BY total DESC
    LIMIT
106 name         count
    rake         214,152,212
    rack         201,911,759
    multi_json   200,342,260
    json         191,430,173
    bundler
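The difference between the two queries is the GROUP BY with a sum: the first returns individual download rows, the second adds up every row belonging to a gem. A small Ruby sketch of the same aggregation, assuming the downloads table can hold more than one row per gem (it carries both rubygem_id and version_id); the rows here are made up.

```ruby
# Invented download rows; names and counts are illustrative only.
downloads = [
  { name: "rake", count: 70 },
  { name: "rake", count: 30 },
  { name: "rack", count: 60 },
  { name: "json", count: 20 },
  { name: "json", count: 25 },
]

# Equivalent of: SELECT name, SUM(count) AS total ... GROUP BY name
totals = Hash.new(0)
downloads.each { |row| totals[row[:name]] += row[:count] }

# Equivalent of: ORDER BY total DESC LIMIT 2
top = totals.sort_by { |_name, total| -total }.first(2)

puts top.inspect
```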
107 How many downloads does Rails
108 SELECT name, sum(count) as total
    FROM [rubygems.downloads]
    JOIN rubygems.gems
      ON rubygems.gems.id = rubygems.downloads.rubygem_id
    WHERE name =
109 name    total
    rails
110 Minitest is more popular than
111 SELECT name, sum(count) as total
    FROM [rubygems.downloads]
    JOIN rubygems.gems
      ON rubygems.gems.id = rubygems.downloads.rubygem_id
    GROUP BY name
    HAVING name IN ('minitest',
112 name       total
    minitest
    rspec
113 Gems released in the last year require ruby >
114 SELECT required_ruby_version, COUNT(*) AS total
    FROM rubygems.versions
    WHERE created_at > DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR")
    GROUP BY required_ruby_version
    ORDER BY total
115 name total >= 0 95,857 >= ,069 >= ,624 >= 2.0 1,648 >=
116 Complex
117 Rails 3 has more downloads than the other Rails major
118 SELECT name,
           REGEXP_EXTRACT(number,r'(\d\.)') AS major,
           sum(rubygems.downloads.count) AS total
    FROM [rubygems.versions]
    JOIN rubygems.gems
      ON rubygems.gems.id = rubygems.versions.rubygem_id
    JOIN rubygems.downloads
      ON rubygems.versions.rubygem_id = rubygems.downloads.rubygem_id
    WHERE rubygems.gems.name = 'rails'
    GROUP BY name, major
    ORDER BY
120 REGEXP_EXTRACT(number,r'(\d\.)') as major
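The effect of the `(\d\.)` pattern can be checked locally with Ruby's regex engine, which treats this particular pattern the same way. The sketch below groups made-up version rows by the extracted major version; the version numbers and download counts are invented for illustration.

```ruby
# Invented version rows for a single gem.
versions = [
  { number: "3.2.22",    downloads: 50 },
  { number: "3.0.0",     downloads: 10 },
  { number: "4.2.6",     downloads: 40 },
  { number: "5.0.0.rc1", downloads: 5 },
]

# Same pattern as the slide: one digit followed by a literal dot.
major_pattern = /(\d\.)/

# Sum downloads per extracted major, like GROUP BY major.
totals = Hash.new(0)
versions.each do |v|
  major = v[:number][major_pattern, 1] # "3." for "3.2.22"
  totals[major] += v[:downloads]
end

# Caveat: the pattern takes the first "digit then dot", so a two-digit
# major such as "10.1.0" yields "0." rather than "10.".
two_digit = "10.1.0"[major_pattern, 1]

puts totals.inspect
```

The captured group includes the trailing dot, and the pattern only behaves as a major-version extractor while majors stay single-digit; anchoring with something like `^(\d+)\.` would be one way to avoid that, though it is not what the slide uses.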
121 version downloads 0 2,890,350, ,064,535, ,991,436, ,378,651, ,662,487,252 5
122 version downloads 0 2, , , , ,662 5
123 Gems released in the last year require ruby >
124 SELECT required_ruby_version, COUNT(*) AS total
    FROM rubygems.versions
    WHERE created_at > DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR")
    GROUP BY required_ruby_version
    ORDER BY total
125 SELECT REGEXP_EXTRACT(required_ruby_version, r'(.*?\d\.?)') AS version,
           COUNT(*) AS total
    FROM rubygems.versions
    WHERE created_at > DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR")
    GROUP BY version
    ORDER BY total
126 name   total
    >= 0   95,851
    >= 1   13,080
    >= 2   12,944
    ~> 2   2,040
    > 2
127 Thank You
128 @thagomizer_rb
More informationKey Terms. Attribute join Target table Join table Spatial join
Key Terms Attribute join Target table Join table Spatial join Lect 10A Building Geodatabase Create a new file geodatabase Map x,y data Convert shape files to geodatabase feature classes Spatial Data Formats
More informationHacking PostgreSQL Internals to Solve Data Access Problems
Hacking PostgreSQL Internals to Solve Data Access Problems Sadayuki Furuhashi Treasure Data, Inc. Founder & Software Architect A little about me... > Sadayuki Furuhashi > github/twitter: @frsyuki > Treasure
More informationStructured Streaming. Big Data Analysis with Scala and Spark Heather Miller
Structured Streaming Big Data Analysis with Scala and Spark Heather Miller Why Structured Streaming? DStreams were nice, but in the last session, aggregation operations like a simple word count quickly
More informationPackage rbraries. April 18, 2018
Title Interface to the 'Libraries.io' API Package rbraries April 18, 2018 Interface to the 'Libraries.io' API (). 'Libraries.io' indexes data from 36 different package managers
More informationIn this Lecture. More SQL Data Definition. Deleting Tables. Creating Tables. ALTERing Columns. Changing Tables. More SQL
In this Lecture Database Systems Lecture 6 Natasha Alechina More SQL DROP TABLE ALTER TABLE INSERT, UPDATE, and DELETE Data dictionary Sequences For more information Connolly and Begg chapters 5 and 6
More informationIn-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet
In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta iorhian@gmail.com Radu Chilom radu.chilom@gmail.com Big data analytics / machine learning 6+ years
More informationSpring 2017 EXTERNAL SORTING (CH. 13 IN THE COW BOOK) 2/7/17 CS 564: Database Management Systems; (c) Jignesh M. Patel,
Spring 2017 EXTERNAL SORTING (CH. 13 IN THE COW BOOK) 2/7/17 CS 564: Database Management Systems; (c) Jignesh M. Patel, 2013 1 Motivation for External Sort Often have a large (size greater than the available
More informationICS4U Project Development Example Discovery Day Project Requirements. System Description
ICS4U Project Development Example Discovery Day Project Requirements System Description The discovery day system is designed to allow students to register themselves for the West Carleton Discovery Day
More informationPackage bigqueryr. October 23, 2017
Package bigqueryr October 23, 2017 Title Interface with Google BigQuery with Shiny Compatibility Version 0.3.2 Interface with 'Google BigQuery', see for more information.
More informationPython, PySpark and Riak TS. Stephen Etheridge Lead Solution Architect, EMEA
Python, PySpark and Riak TS Stephen Etheridge Lead Solution Architect, EMEA Agenda Introduction to Riak TS The Riak Python client The Riak Spark connector and PySpark CONFIDENTIAL Basho Technologies 3
More informationEffective Rails Testing Practices
Effective Rails Testing Practices Mike Swieton atomicobject.com atomicobject.com 2007: 16,000 hours General testing strategies Integration tests View tests Controller tests Migration tests Test at a high
More informationDatabase Acceleration Solution Using FPGAs and Integrated Flash Storage
Database Acceleration Solution Using FPGAs and Integrated Flash Storage HK Verma, Xilinx Inc. August 2017 1 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationLearn Relational Database from Scratch. Dan Li, Ph.D. Associate Professor Computer Science Eastern Washington University
Learn Relational Database from Scratch Dan Li, Ph.D. Associate Professor Computer Science Eastern Washington University Self-Introduction Associate professor of Computer Science at EWU Area of expertise
More informationFATWORM IMPLEMENTATION. Chenyang Wu
FATWORM IMPLEMENTATION Chenyang Wu FATWORM IMPLEMENTATION Chenyang Wu OVERVIEW Keywords A Traditional Architecture A Traditional Implementation Resources KEYWORDS KEYWORDS RDBMS, scratch Simplified SQL
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More information