Using Spark SQL in Spark 2

Andrew Gardner
5 years ago
1 Using Spark SQL in Spark 2

2 Spark SQL in Spark 2: Overview and SparkSession Chapter 1

3 Course Chapters
Spark SQL in Spark 2: Overview and SparkSession
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
Copyright 2010-2017 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.

4 Course Objectives
During this course, you will learn
  The major differences between Apache Spark SQL in Spark 2 and previous versions
  How to create, configure, and use a SparkSession object
  The differences between Datasets and DataFrames
  How to create and transform Datasets in Scala
  How to use new features in the DataFrame API
  How to take advantage of performance enhancements in Spark 2

5 Chapter Topics
Spark SQL in Spark 2: Overview and SparkSession
  What's New in Spark SQL in Spark 2?
  Working with the SparkSession object

6 Spark SQL Is the Main Entry Point for Spark
Datasets and DataFrames provide the primary API for Spark
  When working with structured data
Spark SQL, DataFrames, and Datasets are built on RDDs
You can still work directly with RDDs when needed
  When working on unstructured data such as text
  When fine-tuned control is needed
  When working with legacy Spark code

7 Datasets and DataFrames
Datasets are now fully supported
  Originally introduced in Spark 1.6 as experimental
The Datasets and DataFrames APIs have been streamlined and unified
  DataFrames in Scala are implemented as Datasets of Row objects
DataFrameReader now supports CSV files

8 Improved SQL Support
Support for the SQL 2003 standard
  Native parser for ANSI SQL
  Support for subqueries
    SELECT * FROM order WHERE seller_id IN
      (SELECT seller_id FROM seller WHERE region='emea')

9 Spark SQL Performance
Major performance enhancements to Catalyst (the Spark SQL query optimizer)
Spark 2 includes the second generation of the Tungsten engine
  Generates optimized JVM bytecode for individual stages
  Referred to as whole-stage code generation
  Two to 10 times faster for common workloads

10 New SparkSession Class
Provides a unified entry point for core Spark and Spark SQL
  Replaces SQLContext and HiveContext
Configured using a Spark session builder
Language: Scala
    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.
      builder.
      appName("myapp").
      getOrCreate

11 Structured Streaming
Experimental feature added in Spark 2.0
  Not yet supported for production
Higher-level API than Spark Streaming
  Uses the Datasets and DataFrames API
  Improved consistency, fault tolerance, and handling of out-of-order events
Language: Scala
    val socketDF = spark.
      readStream.
      format("socket").
      option("host", "localhost").
      option("port", 9999).
      load()

12 Changes in Core Spark
Older versions of supported languages are deprecated
  Python 2.6: use Python 2.7+ or 3.4+
  Java 7: use Java 8
  Scala 2.10: use Scala 2.11
All features that were deprecated in Spark 1.x have been removed

13 Spark 2 and CDH
CDH 5.7+ still includes Spark 1.6
Spark 2 is available as an add-on service parcel
  Cloudera Manager is required for installation
Spark 1.6 and Spark 2 can both be installed
  Use spark2-submit for Spark 2, spark-submit for Spark 1.6

14 Chapter Topics
Spark SQL in Spark 2: Overview and SparkSession
  What's New in Spark SQL in Spark 2?
  Working with the SparkSession object

15 SparkSession Overview
The SparkSession object is the new, unified entry point to Spark and Spark SQL
Replaces SQLContext and HiveContext
  These remain in Spark 2 for backwards compatibility
Encapsulates the Spark context
  Simplifies creation and configuration of the SparkContext object

16 Creating a SparkSession Object (1)
The Spark shell automatically creates a SparkSession object called spark
Language: Python
    [Spark welcome banner]
    SparkSession available as 'spark'.
    >>> spark
    <pyspark.sql.session.SparkSession object at 0x1928b90>

17 Creating a SparkSession Object (2)
In a Spark application, you will need to create the SparkSession object yourself
  Call the object spark by convention
SparkSession.builder returns a Builder object
  Used to create and configure the SparkSession object
The getOrCreate builder function returns the existing SparkSession object if it exists
  Creates a new SparkSession if none exists
A Spark application can have multiple Spark sessions
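The builder behavior described on this slide can be sketched as follows. This is a minimal illustration, not part of the original deck: the local master, the application name, and the configuration value are assumptions chosen only so the snippet runs on its own (for example as a Scala script or pasted into the Spark shell).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode and this shuffle-partition setting are illustrative only.
val spark = SparkSession.builder.
  appName("my-spark-app").
  master("local[*]").
  config("spark.sql.shuffle.partitions", "4").
  getOrCreate()

// A second getOrCreate call finds the existing session rather than creating one.
val sameSession = SparkSession.builder.getOrCreate()
println(sameSession eq spark)

// newSession creates an additional session on top of the same Spark context.
val otherSession = spark.newSession
println(otherSession.sparkContext eq spark.sparkContext)
```

Comparing with `eq` checks object identity, which is what distinguishes "returns the existing session" from "creates an equivalent one".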

18 Example: Creating a SparkSession Object
Language: Python
    from pyspark.sql import SparkSession
    spark = SparkSession.builder. \
      appName("my-spark-app"). \
      getOrCreate()
Language: Scala
    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.builder.
      appName("my-spark-app").
      getOrCreate()

19 Creating a New Spark Session
You can create a new Spark session from an existing one
  The new one is considered a child of the first
  Both sessions refer to the same Spark context
  The second session can have a different configuration from the first
Example: Create a child SparkSession object with a default file format of JSON
Language: Scala
    val sparkChild = spark.newSession
    sparkChild.conf.set("spark.sql.sources.default","json")

20 Working with SparkContext and SparkSession
Creating a Spark session also creates an underlying Spark context if none exists
  Reuses the existing Spark context if one does exist
The Spark shell automatically exposes this as sc
In a Spark application, use spark.sparkContext to access it
Language: Scala
    spark.sparkContext.setLogLevel("ERROR")
    val myRDD = spark.sparkContext.textFile("myfile")
Language: Python
    spark.sparkContext.setLogLevel("ERROR")
    myRDD = spark.sparkContext.textFile("myfile")

21 Using the Spark SQL API with SparkSession
The SparkSession object provides access to the DataFrames and Datasets APIs
  Works similarly to SQLContext and HiveContext for creating and querying DataFrames and Datasets
For example, use SparkSession.read to return a DataFrameReader
Language: Scala
    val myDF = spark.read.json("myfile.json")
Language: Python
    myDF = spark.read.json("myfile.json")

22 Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries Chapter 2

23 Course Chapters
Spark SQL in Spark 2: Overview and SparkSession
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries

24 Chapter Objectives
During this chapter, you will learn
  The differences between Datasets and DataFrames
  How to create and transform Datasets in Scala
  How to use new features in the DataFrame API
  How to use the Catalog API to manage SQL query tables and views

25 Chapter Topics
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
  Dataset Overview
  Creating Datasets
  Dataset Operations
  DataFrames
  SQL Queries and the Catalog API

26 What Is a Dataset?
A distributed collection of strongly typed objects
  Primitive types such as Int or String
  Product objects based on case classes
Mapped to a relational schema
  The schema is defined by an encoder
Built on RDDs
Combines the type safety of RDDs with the structure of DataFrames
  Transformations can be expressed as SQL-like queries
Implemented only in Scala and Java, not Python or R
  Python and R are dynamically typed rather than statically typed, compiled languages, so compile-time type checking of Dataset elements does not apply

27 Comparing Datasets in Spark 2.0 and Spark 1.6
Datasets were introduced as experimental in Spark 1.6 and are fully supported in Spark 2.0
DataFrame and Dataset APIs are now unified
  DataFrame is now an alias for Dataset[Row] (in Scala and Java)
  In Spark 1.6, DataFrame and Dataset were separate classes
  The Spark 2 docs do not include an API for DataFrame

28 Datasets and DataFrames
DataFrames and Datasets represent different types of data
  DataFrames (Datasets of type Row) represent tabular data
  Datasets represent typed, object-oriented data
DataFrame transformations are referred to as untyped
  Rows can hold elements of any type
  Schemas defining column types are not applied until run time
Dataset transformations are typed
  Object properties are inherently typed at compile time

29 Chapter Topics
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
  Dataset Overview
  Creating Datasets
  Dataset Operations
  DataFrames
  SQL Queries and the Catalog API

30 Creating Datasets: A Simple Example
Use spark.createDataset(Seq) to create a Dataset from in-memory data (experimental)
  The Dataset type is the type of the elements of the sequence
Example: Create a Dataset of type String (Dataset[String])
Language: Scala
    val strings = Seq("a string","another string")
    val stringDS = spark.createDataset(strings)
    stringDS.show
    +--------------+
    |         value|
    +--------------+
    |      a string|
    |another string|
    +--------------+

31 Datasets and Case Classes (1)
Scala case classes are a useful way to represent data in a Dataset
  They are often used for creating simple data-holding objects in Scala
  Instances of case classes are called products
Language: Scala
    case class Name(firstName: String, lastName: String)
    val names = Seq(Name("Fred","Flintstone"), Name("Barney","Rubble"))
    names.foreach(name => println(name.firstName))
    Fred
    Barney
Note: example continues on next slide

32 Datasets and Case Classes (2)
Encoders define a Dataset's schema using reflection on the object type
  Case class arguments are treated as columns
Language: Scala
    import spark.implicits._ // required if not running in shell
    val namesDS = spark.createDataset(names)
    namesDS.show
    +---------+----------+
    |firstName|  lastName|
    +---------+----------+
    |     Fred|Flintstone|
    |   Barney|    Rubble|
    +---------+----------+

33 Creating a Dataset from a DataFrame (1)
Use DataFrame.as[Type] to create a Dataset from a DataFrame
  Encoders convert Row elements to the Dataset's type
  The DataFrame.as function is experimental
Example: Read a JSON file into a Dataset of type Name
Data File: names.json
    {"firstName":"Grace","lastName":"Hopper"}
    {"firstName":"Alan","lastName":"Turing"}
    {"firstName":"Ada","lastName":"Lovelace"}
    {"firstName":"Charles","lastName":"Babbage"}
Example continued on next slide

34 Creating a Dataset from a DataFrame (2)
Language: Scala
    val namesDF = spark.read.json("names.json")
    namesDF: org.apache.spark.sql.DataFrame = [firstName: string, lastName: string]
    namesDF.show
    +---------+--------+
    |firstName|lastName|
    +---------+--------+
    |    Grace|  Hopper|
    |     Alan|  Turing|
    |      Ada|Lovelace|
    |  Charles| Babbage|
    +---------+--------+

35 Creating a Dataset from a DataFrame (3)
Language: Scala
    val namesDS = namesDF.as[Name]
    namesDS: org.apache.spark.sql.Dataset[Name] = [firstName: string, lastName: string]
    namesDS.show
    +---------+--------+
    |firstName|lastName|
    +---------+--------+
    |    Grace|  Hopper|
    |     Alan|  Turing|
    |      Ada|Lovelace|
    |  Charles| Babbage|
    +---------+--------+

36 Creating Datasets from RDDs
Datasets can be created based on RDDs
  Useful with unstructured or semi-structured data such as text
Language: Scala
    val namesRDD = spark.sparkContext.textFile("names.txt").
      map(line => line.split(",")).
      map(fields => Name(fields(1),fields(0)))
    val namesDS = spark.createDataset(namesRDD)
    namesDS.show
    +---------+--------+
    |firstName|lastName|
    +---------+--------+
    |    Grace|  Hopper|
    |     Alan|  Turing|
    |      Ada|Lovelace|
    |  Charles| Babbage|
    +---------+--------+

37 Type Safety: Datasets and DataFrames
Type safety means that type errors are found at compile time rather than run time
Example: assigning a String value to an Int variable
Language: Scala
    val i:Int = namesDS.first.lastName // Name(Grace,Hopper)
    Compilation: error: type mismatch; found: String / required: Int
Language: Scala
    val row = namesDF.first // Row(Grace,Hopper)
    val i:Int = row.getInt(row.fieldIndex("lastName"))
    Run time: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

38 Chapter Topics
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
  Dataset Overview
  Creating Datasets
  Dataset Operations
  DataFrames
  SQL Queries and the Catalog API

39 Typed and Untyped Transformations (1)
Typed transformations create a new Dataset based on an existing Dataset
  Typed transformations can be used on Datasets based on any type (including Row)
Untyped transformations return DataFrames (Datasets containing Row objects) or untyped Columns, regardless of the type of the parent Dataset

40 Typed and Untyped Transformations (2)
Untyped operations include
  join
  groupBy
  col
  drop
  select (using column names or Columns)
Typed operations include
  filter (and its alias, where)
  distinct
  limit
  sort (and its alias, orderBy)
  groupByKey (experimental)
  Lambda operations such as map, flatMap, reduce, and foreach (experimental)
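The lambda-based typed operations listed above are not shown elsewhere in these slides, so here is a minimal sketch of a few of them next to one untyped operation. The Person class mirrors the one used in the following example; the second row, the local master, and the application name are made-up values for illustration.

```scala
import org.apache.spark.sql.SparkSession

case class Person(pcode: String, lastName: String, firstName: String, age: Int)

// Assumption: local mode, so the sketch runs without a cluster.
val spark = SparkSession.builder.appName("typed-ops").master("local[*]").getOrCreate()
import spark.implicits._  // supplies encoders for case classes and primitives

// Made-up sample rows.
val peopleDS = spark.createDataset(Seq(
  Person("02134", "Hopper", "Grace", 48),
  Person("94020", "Turing", "Alan", 32)))

// Typed: each lambda receives Person objects, so field access is compile-checked.
val over40DS = peopleDS.filter(p => p.age > 40)                        // Dataset[Person]
val fullNamesDS = peopleDS.map(p => s"${p.firstName} ${p.lastName}")   // Dataset[String]
val countsByPcodeDS = peopleDS.groupByKey(p => p.pcode).count()        // Dataset[(String, Long)]

// Untyped: selecting by column name returns a DataFrame (Dataset[Row]).
val agesDF = peopleDS.select("age")

fullNamesDS.show()
```

A misspelled field inside a lambda (for example `p.agee`) would be rejected by the compiler, whereas a misspelled column name passed to `select` would only fail when Spark analyzes the query at run time.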

41 Example: Typed and Untyped Transformations (1)
Language: Scala
    case class Person(pcode:String, lastName:String, firstName:String, age:Int)
    val people = Seq(Person("02134","Hopper","Grace",48), ...)
    val peopleDS = spark.createDataset(people)
    peopleDS: org.apache.spark.sql.Dataset[Person] = [pcode: string, lastName: string ... 2 more fields]
Note: example continues on next slide

42 Example: Typed and Untyped Transformations (2)
Typed operations return Datasets based on the starting Dataset
Untyped operations return DataFrames (Datasets of Rows)
Language: Scala
    val sortedDS = peopleDS.sort("age")
    sortedDS: org.apache.spark.sql.Dataset[Person] = [pcode: string, lastName: string ... 2 more fields]
    val firstLastDF = peopleDS.select("firstName","lastName")
    firstLastDF: org.apache.spark.sql.DataFrame = [firstName: string, lastName: string]

43 Example: Combining Typed and Untyped Operations
Language: Scala
    val combineDF = peopleDS.sort("lastName").
      where("age > 40").select("firstName","lastName")
    combineDF: org.apache.spark.sql.DataFrame = [firstName: string, lastName: string]
    combineDF.show
    +---------+--------+
    |firstName|lastName|
    +---------+--------+
    |  Charles| Babbage|
    |    Grace|  Hopper|
    +---------+--------+

44 Chapter Topics
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
  Dataset Overview
  Creating Datasets
  Dataset Operations
  DataFrames
  SQL Queries and the Catalog API

45 What's New in Spark 2 DataFrames?
DataFrames are now implemented as Datasets of type Row
  Rows may contain columns of any type
  DataFrames support all Dataset operations, including typed operations
No change in basic functionality
Improvements and streamlining of the API
  DataFrames can be created from an RDD of case class objects using reflection
    Class fields are used to define the schema
  DataFrameWriter now supports bucketing (bucketBy, sortBy) for Parquet, JSON, and ORC format data
  DataFrameReader and DataFrameWriter now support CSV format
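The bucketing support mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than material from the deck: the table name, the sample rows, the bucket count, and the temporary warehouse directory are all made up, and the snippet assumes a local session without Hive.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway warehouse directory so the sketch leaves no files behind
// in the working directory. It only takes effect when this call creates the session.
val warehouseDir = Files.createTempDirectory("spark-warehouse").toString

val spark = SparkSession.builder.
  appName("bucketing-sketch").
  master("local[*]").
  config("spark.sql.warehouse.dir", warehouseDir).
  getOrCreate()
import spark.implicits._

// Made-up sample rows; toDF assigns the column names.
val peopleDF = Seq(("Hopper", "Grace", 48), ("Turing", "Alan", 32)).
  toDF("lastName", "firstName", "age")

// bucketBy and sortBy apply when saving as a table (saveAsTable),
// not when saving to a plain path.
peopleDF.write.
  format("parquet").
  bucketBy(4, "lastName").
  sortBy("age").
  saveAsTable("people_bucketed")

spark.table("people_bucketed").show()
```

Rows are hashed on lastName into four buckets and sorted by age within each bucket, which can let later joins or aggregations on lastName avoid a shuffle.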

46 Reading and Writing CSV Files
DataFrameReader and DataFrameWriter now support CSV format files, in addition to JSON, Parquet, and so on
There are several configuration options, including
  header: use the first line of the file to determine column names (defaults to false)
  inferSchema: attempt to determine the schema by reading through the file before loading (defaults to false)
  schema: override the default or inferred schema with the specified schema
    This avoids the extra file pass required for an inferred schema
  sep: sets the separator character (defaults to a comma)
  dateFormat: specifies the format for parsing date and time values (defaults to null, which will use the standard Java date and time parser methods)
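The examples in this chapter rely on header and inferSchema; the schema, sep, and dateFormat options can be sketched like this. The file contents, the semicolon separator, and the date pattern are made-up values for illustration, and the snippet writes its own temporary file so it does not depend on users.csv.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Assumption: local mode, reading from the local file system.
val spark = SparkSession.builder.appName("csv-schema").master("local[*]").getOrCreate()

// Write a small semicolon-separated file so the sketch is self-contained.
val csvPath = Files.createTempFile("users", ".csv")
Files.write(csvPath,
  ("lastname;firstname;age;startdate\n" +
   "Hopper;Grace;46;01/03/2015\n" +
   "Turing;Alan;30;15/07/2016\n").getBytes("UTF-8"))

// An explicit schema avoids the extra pass over the file that inferSchema needs.
val usersSchema = StructType(Seq(
  StructField("lastname", StringType),
  StructField("firstname", StringType),
  StructField("age", IntegerType),
  StructField("startdate", DateType)))

val usersDF = spark.read.
  option("header", "true").             // first line holds column names, skip it as data
  option("sep", ";").                   // non-default separator
  option("dateFormat", "dd/MM/yyyy").   // pattern used to parse the date column
  schema(usersSchema).
  csv(csvPath.toUri.toString)

usersDF.printSchema
usersDF.show()
```

With an explicit schema the column types come from the StructType, so the age column is an integer without Spark having to scan the file to infer it.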

47 Example: Reading a CSV File with a Header
File: users.csv
    lastname,firstname,age,startdate
    Hopper,Grace,46,
    Turing,Alan,30,
    Lovelace,Ada,29,
    Babbage,Charles,48,
Language: Scala
    val usersDF = spark.read.option("header","true").
      option("inferSchema","true").csv("users.csv")
    usersDF.printSchema
    root
     |-- lastname: string (nullable = true)
     |-- firstname: string (nullable = true)
     |-- age: integer (nullable = true)
     |-- startdate: timestamp (nullable = true)

48 Example: Reading a CSV File without a Header
File: users2.csv
    Hopper,Grace,46,
    Turing,Alan,30,
    Lovelace,Ada,29,
    Babbage,Charles,48,
Language: Scala
    val usersDF = spark.read.option("inferSchema","true").csv("users2.csv")
    usersDF.printSchema
    root
     |-- _c0: string (nullable = true)
     |-- _c1: string (nullable = true)
     |-- _c2: integer (nullable = true)
     |-- _c3: timestamp (nullable = true)

49 Chapter Topics
Spark SQL in Spark 2: Datasets, DataFrames, and SQL Queries
  Dataset Overview
  Creating Datasets
  Dataset Operations
  DataFrames
  SQL Queries and the Catalog API

50 SQL Queries on Tables
Spark SQL allows you to query tables using SQL
    val myDF = spark.
      sql("SELECT acct_num,last_name FROM accounts")
Tables are either Hive metastore tables or in-memory tables, depending on configuration
  Set the spark.sql.catalogImplementation application property to either hive or in-memory
  Note that this property is undocumented

51 Creating Tables
To create a table from a Dataset, use the DataFrameWriter.saveAsTable operation
    namesDS.write.saveAsTable(tablename)
In Spark 1, this option was only available for Hive tables
In Spark 2, you can also save the data in in-memory tables
The data is saved to the Hive warehouse (or spark-warehouse if Hive is not configured)
  Override the location by setting the spark.sql.warehouse.dir application property
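Putting the last two slides together, the following sketch saves a Dataset as a table in the in-memory catalog and queries it with SQL. The table name, the sample rows, and the temporary warehouse directory are assumptions for illustration; both configuration properties only take effect when this builder call is the one that creates the session.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

case class Name(firstName: String, lastName: String)

// Assumption: a temporary directory stands in for a real warehouse location.
val warehouseDir = Files.createTempDirectory("warehouse").toString

val spark = SparkSession.builder.
  appName("save-as-table").
  master("local[*]").
  config("spark.sql.catalogImplementation", "in-memory").  // no Hive metastore
  config("spark.sql.warehouse.dir", warehouseDir).         // where table data is written
  getOrCreate()
import spark.implicits._

val namesDS = spark.createDataset(Seq(Name("Grace", "Hopper"), Name("Alan", "Turing")))

// Hypothetical table name; the data files land under the warehouse directory.
namesDS.write.saveAsTable("names")

val hoppersDF = spark.sql("SELECT firstName, lastName FROM names WHERE lastName = 'Hopper'")
hoppersDF.show()
```

Because the in-memory catalog is not persisted, the table definition disappears when the application ends even though the data files remain in the warehouse directory.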

52 SQL Queries on Views
You can also query a view
  Views provide the temporary ability to perform SQL queries on a Dataset
Views are the equivalent of temporary tables in Spark 1
  Creating a temporary table in Spark 1
    DataFrame.registerTempTable(tablename)
  Creating a temporary view in Spark 2
    Dataset.createTempView(viewname)
    Dataset.createOrReplaceTempView(viewname)

53 Example: Querying a View
    peopleDS.createTempView("people")
    spark.sql("SELECT firstName,lastName FROM people").show
    +---------+--------+
    |firstName|lastName|
    +---------+--------+
    |    Grace|  Hopper|
    |     Alan|  Turing|
    |      Ada|Lovelace|
    |  Charles| Babbage|
    |  Niklaus|   Wirth|
    +---------+--------+

54 The Catalog API
Spark 2 introduced the new Catalog API for managing views and the underlying tables
The entry point for the Catalog API is spark.catalog
Methods include
  listDatabases: returns a Dataset (Scala) or list (Python) of existing databases
  setCurrentDatabase(dbname): sets the current default database for the session
  listTables: returns a Dataset (Scala) or list (Python) of tables and views in the current database
  listColumns(tablename): returns a Dataset (Scala) or list (Python) of the columns in the specified table or view
  dropTempView(viewname): removes a temporary view
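The methods listed above can be exercised together as in the sketch below. The view name and sample rows are made up, and the snippet assumes a Spark 2.x release in which listColumns accepts the name of a temporary view.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode with the default in-memory catalog.
val spark = SparkSession.builder.appName("catalog-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample rows registered under a hypothetical view name.
val peopleDF = Seq(("Grace", "Hopper"), ("Alan", "Turing")).toDF("firstName", "lastName")
peopleDF.createOrReplaceTempView("people_view")

spark.catalog.listDatabases.show()
spark.catalog.setCurrentDatabase("default")
spark.catalog.listTables.show()

// Column metadata for the view, collected to the driver as plain names.
val columnNames = spark.catalog.listColumns("people_view").collect().map(_.name)
println(columnNames.mkString(", "))

// Compare the catalog contents before and after dropping the view.
val namesBefore = spark.catalog.listTables.collect().map(_.name)
spark.catalog.dropTempView("people_view")
val namesAfter = spark.catalog.listTables.collect().map(_.name)
```

In Scala each list method returns a Dataset, so the results can be shown, filtered, or collected like any other Dataset.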

55 Example: Listing Tables and Views with the Catalog API
Language: Scala
    spark.catalog.listTables.show
    +--------+--------+-----------+---------+-----------+
    |    name|database|description|tableType|isTemporary|
    +--------+--------+-----------+---------+-----------+
    |accounts| default|       null|  MANAGED|      false|
    | devices| default|       null|  MANAGED|      false|
    +--------+--------+-----------+---------+-----------+
Language: Python
    for table in spark.catalog.listTables():
        print table
    Table(name=u'accounts', database=u'default', description=None, tableType=u'MANAGED', isTemporary=False)
    Table(name=u'devices', database=u'default', description=None, tableType=u'MANAGED', isTemporary=False)
