Enable Spark SQL on NoSQL HBase Tables with HSpark. IBM Code Tech Talk, February 13, 2018


>> MARC-ARTHUR PIERRE LOUIS: I want to welcome you to another of our tech talk series. Today we have another great presentation about technology. My name is Marc-Arthur Pierre Louis, your friendly host. Last week we spoke about a [inaudible] connector that lets you do streaming, a requirement imposed by modern applications on an older technology, JDBC. It's great: it allows you to do streaming and nonblocking access, which is required today for responsive websites. Today we have something else in the realm of database technology: HSpark, which enables you to do SQL on a database that is not SQL. HSpark is the project that allows you to leverage HBase on top of the Spark technology. To talk about this interesting technology is Bo Meng from the HSpark team, who is going to take us through what HSpark is and how you can run SQL on a [inaudible] database through Spark.

>> BO MENG: Hi, thank you for having me. This is a project I worked on before, and it is still going on. Let me introduce myself. My name is Bo Meng. I'm a senior software engineer at IBM, working on open technologies such as Spark and Hadoop. My main interests are big data, Hadoop, HBase and Spark. I also have my colleague here, a big data architect; he is the lead architect for this project and has [inaudible] years of experience in the database domain. He will be my company for today. (distorted audio). Let's get started.

Today's agenda is as follows. First we will give a short introduction to HBase: what it is and what it is not. We will also compare HBase with traditional SQL databases, and I'm going to show some key features that explain why HBase has a performance advantage; that will also help us understand how we designed HSpark. After that we will switch to the HSpark part: we will talk about how it maps the traditional SQL data types to HBase, and we will discuss how HSpark works. Finally we will give a short demo showing how to use HSpark.

This part will be a review for some of you, but I will go through it and add comments as we go. HBase is an open source technology, and right now it is a top-level project at Apache. HBase is essentially a distributed, sorted map; the model originally comes from Google's Bigtable. HBase has an Apache 2.0 license and is part of the Hadoop ecosystem. HBase is widely used by some big companies, such as Adobe and Facebook; if you are using the messaging system in Facebook, it is backed by HBase. I checked the current stable version just a few days ago.

From the user perspective, HBase is really just a key-value store that uses HDFS for the actual storage. As I said, it is modeled on Google's Bigtable and it is column oriented. It has version support, which means the same column can hold the data in multiple versions. Compared to a traditional SQL database, HBase is not a SQL database: it has no joins, no query engine, no data types and no SQL queries, as we will see a little bit later. It also does not have a schema, and the traditional DBA tools will not be usable. That is what HBase is and is not.

Compared to a traditional database, HBase has some advantages. The first category is high performance. The key feature here is that rowkeys are always sorted. Whenever you want to retrieve data by its rowkey, or by a certain range of rowkeys, retrieval is very fast because the rowkeys are already sorted; it is very efficient. The rowkey scan can also be controlled by different filters, and the columns and column families can be filtered as well; HBase supports many kinds of filters (a short client-side sketch follows this overview). Because it is built on the Hadoop stack, it uses Hadoop and ZooKeeper, so it has high scalability. The other advantages of HBase are consistency, near-realtime performance, and the other benefits it inherits from the Hadoop ecosystem.
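To make the sorted-rowkey point concrete, here is a minimal sketch, not shown in the talk, of a range scan with a server-side filter using the standard HBase 1.x client API from Scala; the table name and keys are placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.PrefixFilter
import org.apache.hadoop.hbase.util.Bytes

object RangeScanSketch extends App {
  val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("test"))   // placeholder table

  // Rowkeys are stored sorted, so a [start, stop) scan touches only the
  // regions that can contain matching keys instead of the whole table.
  val scan = new Scan()
  scan.setStartRow(Bytes.toBytes("row1"))
  scan.setStopRow(Bytes.toBytes("row3"))
  scan.setFilter(new PrefixFilter(Bytes.toBytes("row"))) // evaluated server-side

  val scanner = table.getScanner(scan)
  var result  = scanner.next()
  while (result != null) {
    println(Bytes.toString(result.getRow))
    result = scanner.next()
  }
  scanner.close(); table.close(); conn.close()
}
```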

Now we can compare SQL and HBase side by side. Suppose we have a normal user table that stores some user information, like name, age and gender. The upper table on the slide is the SQL table; usually you can find something like this. The second row shows what happens if somebody wants to change the name from Bob to Andrew: you have to update that column in place. But when this happens you may lose information, because if we want to trace the history, it is gone.

The next thing is how you would design the same thing in HBase. As I said, HBase has a rowkey. A rowkey is basically a binary array that represents the row, and it is always ordered. For example, you have the rowkeys ordered here, then you have a column key that identifies the actual column, a timestamp that gives the version of each cell, and the value associated with it.

>> MARC-ARTHUR PIERRE LOUIS: Question for you, Bo. On the rowkey: they are not unique? There can be multiple keys that are the same?

>> BO MENG: Actually, this is just for illustration. There will be one rowkey, and that row has multiple column keys.

>> MARC-ARTHUR PIERRE LOUIS: Row keys [inaudible] okay.

>> BO MENG: It's like one row. For example, the first row may have one rowkey with three column keys here, and then you have the values; the second row may have the rowkey 02 and two column keys. It means that in HBase the columns in a row can vary; the schema is not fixed. You can add columns -- (overlapping speakers).

>> MARC-ARTHUR PIERRE LOUIS: Got it. I failed to see one row that had several columns; you were describing how each of these entries is actually part of one row.

>> BO MENG: Yeah, exactly. Another thing I want to mention here is how you change the name. When you change the name from Bob to Andrew in the second row, the timestamp changes. So if you want to retrieve the information, you can retrieve the history and see how this value got changed.

>> MARC-ARTHUR PIERRE LOUIS: Got it.

>> BO MENG: HBase also has a tool called the HBase shell. It provides the DDL: how you create tables, and also how you retrieve information. Let's walk through it quickly. The first line is how you create the table: test is the table name, and cf is the column family name. Then we put three records into this table: each put gives the table name test, the row name (row1, row2, row3), the column (the cf family plus the name qualifier), and the value (Bob, Tom, Andrew). list will show all the tables in the current namespace.

Here it will show the test table. scan test will list all the records in test. get will fetch all the [inaudible] in the test table where the rowkey equals row1. With delete you can delete from the test table, for row1, the column whose name equals the cf column name. (A code sketch of this whole walkthrough follows below.)

>> MARC-ARTHUR PIERRE LOUIS: Quick question for you there. Once you create a row and you append to it, the old value isn't deleted; so if you get the row, will it return all the versions, and do you use the timestamp to find out which one is the most current? (cavernous audio).

>> BO MENG: By default HBase gives you the latest version, so you get the latest value of that row. But you can also configure how many versions you want to keep, meaning if you only want to keep the last record, that is okay too --

>> MARC-ARTHUR PIERRE LOUIS: Got it. Interesting, yeah.

>> BO MENG: disable does what it says: before you can drop a table you first need to disable it, make sure [inaudible] has the table disabled, and then you can drop it. That is a little bit different. What you should see here is that this is quite different from the traditional SQL way of creating a table and getting all the records, because HBase is not a traditional database; SQL is not supported.

The next step is how you map the terminology of HBase to SQL. The table here should help you map the concepts between the two worlds. An HBase namespace roughly equals a database; a table is the same; and the HBase rowkey plays the role of the key columns in SQL terms. The important point is that the key can consist of multiple columns, which in SQL terms is called a composite key: multiple columns combined into one key. In HBase, a column family and a qualifier combine to [inaudible] (distorted audio), which in SQL terms would be a non-key column. In SQL a column can have whatever data type, like int, float or double; in HBase the only data type is the byte array. That is the only type, meaning that whatever the SQL value is, you need to convert it into a byte array to store the data.
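The shell commands themselves are only described, not shown, in the transcript, so as a hedged reconstruction, here is the same walkthrough through the HBase 1.x client API in Scala, with the shell equivalent noted above each step.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object ShellWalkthroughSketch extends App {
  val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val admin = conn.getAdmin
  val name  = TableName.valueOf("test")
  val cf    = Bytes.toBytes("cf")

  // create 'test', 'cf'
  val desc = new HTableDescriptor(name)
  desc.addFamily(new HColumnDescriptor("cf"))
  admin.createTable(desc)

  // put 'test', 'rowN', 'cf:name', value
  val table = conn.getTable(name)
  Seq("row1" -> "Bob", "row2" -> "Tom", "row3" -> "Andrew").foreach { case (row, v) =>
    table.put(new Put(Bytes.toBytes(row)).addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes(v)))
  }

  // list -- all tables in the current namespace
  println(admin.listTableNames().mkString(", "))

  // get 'test', 'row1' -- returns the latest version of each cell by default
  val row1 = table.get(new Get(Bytes.toBytes("row1")))
  println(Bytes.toString(row1.getValue(cf, Bytes.toBytes("name"))))

  // delete 'test', 'row1', 'cf:name'
  table.delete(new Delete(Bytes.toBytes("row1")).addColumn(cf, Bytes.toBytes("name")))

  // disable 'test' then drop 'test' -- a table must be disabled before dropping
  admin.disableTable(name)
  admin.deleteTable(name)
  table.close(); conn.close()
}
```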

Here is another example. In SQL we want to create a table customer, which has id, name, age and salary columns, and the primary key is id plus name; that is a composite key, where id and name combine as the primary key. In HBase, if we create the corresponding table, the logical name could be customer, while in physical storage we can call it hcustomer or whatever name we like in HBase; the rowkey will be id plus name in byte array format, and column family plus qualifier will represent the non-key columns, age and salary. (We will see this in DDL form in the sketch below.)

Now we can switch our focus to HSpark, and why this project fits this scenario. HSpark is a combination of the Spark optimizer and HBase's filtering and pruning capabilities. In the logical layer we use Spark to optimize the query, and in the physical layer we push the predicates down to HBase and leverage its filtering and pruning capabilities for performance. From this perspective HSpark is a very good fit. It also fits into the Spark ecosystem, because users can still use Datasets to query the database. And from the HBase user perspective, we provide Spark SQL and the Dataset API, so it is easy to use SQL to query that unstructured data in a structured way.

There are also some similar technologies in the market. One of them is Apache Phoenix, which builds a SQL layer on top of HBase, but it does not leverage Spark; without Spark's very advanced logical-level optimizer, the performance won't be as good. There are also Spark connectors for HBase, but they do not leverage the performance gain of physically pushing predicates down into HBase.

So, here is the summary of the HSpark project. HSpark basically runs Spark on HBase. We leverage Spark's framework, such as the parser, the optimizer and the execution engine: when you run DDL or a query, the parser is already there, we reuse the Spark parser and the logical-level optimizer from Spark, and we push the physical execution down to HBase. HBase acts as the actual data source, and for the metadata we create an extra table in HBase that stores information such as the mapping from the logical table name to the physical table name. For the predicates, we prune them based on the region information, and some of the data statistics are included as well, so additional filters can be used to avoid a full scan in HBase.
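Putting the customer mapping into concrete syntax: a sketch of HSpark-style DDL issued through a SparkSession. The option names here (tableName, hbaseTableName, keyCols, colsMapping) follow the shape of the project's published examples, but treat them as illustrative and check the HSpark README for the exact form.

```scala
import org.apache.spark.sql.SparkSession

object CreateCustomerSketch extends App {
  val spark = SparkSession.builder().appName("hspark-ddl").getOrCreate()

  // Logical table "customer", physical HBase table "hcustomer"; the composite
  // key (id, name) becomes the rowkey and the non-key columns map to cf.qualifier.
  spark.sql(
    """CREATE TABLE customer (id STRING, name STRING, age INT, salary FLOAT)
      |USING org.apache.spark.sql.hbase.HBaseSource
      |OPTIONS (tableName "customer",
      |         hbaseTableName "hcustomer",
      |         keyCols "id,name",
      |         colsMapping "age=cf.age, salary=cf.salary")""".stripMargin)
}
```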

Another feature in HSpark is bulk load. A common scenario is that you want to load historical data into HBase to do data analysis, so we allow users to load historical data into HBase by using HSpark.

On data types: currently we support a subset of the Spark data types, as listed: string, byte, float, double, boolean, date and timestamp. The reason we do not support every Spark data type yet is that, as I said before, the data in HBase is actually stored as a byte array. For every data type we want to support, we need to provide our own algorithm to convert the actual value to a byte array, and because the rowkey is ordered, the conversion has to preserve the ordering. Let me give you an example. Suppose you have an integer type with two values, minus 1 and plus 1. Naively converted to binary, minus 1 has its highest bit set to 1, while plus 1 converts to 0000...0001, so in byte order the two values would come out reversed instead of in the correct order. So we provide a conversion algorithm that keeps the original ordering in the byte array. That is one of the challenges, and that is why we are still working on some of the data types.

Another point is that when we convert a value to a byte array, we want to use the fewest bytes possible, so we use less storage to store the data. That also improves performance: when the data shuffles and travels through the network, there is less of it if we use fewer bytes. Here is an example of how you can improve performance by using less data. Suppose we have three columns: a date, a string and a boolean, and we want these three columns as a composite key converted to a rowkey, which is a byte array, as I said. One solution is to use an index: first we store the column count, which is 3, then the offset of the first value, the length of the first column, the offset of the second column, the length of the second column, and so on. The index provides the location of the actual data, and we convert the whole thing into a byte array. The second solution is to just store the data: the date converts to a fixed four bytes; the string can be any length, so we add 0x00 to the end of the string as an end indicator; and then we add the third value, the boolean, which takes just one byte. Comparing solution 2 to solution 1, we get far fewer bytes, so we save a lot of storage. The slide shows an example of how we implement this conversion.
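A minimal sketch, my own and not HSpark's actual code, of the two ideas just described: flipping an Int's sign bit so byte-wise order matches numeric order, and packing a compact composite key with a 0x00 terminator after the variable-length string.

```scala
import java.nio.ByteBuffer

object RowKeyEncodingSketch {
  // -1 becomes 0x7FFFFFFF and +1 becomes 0x80000001, so comparing the raw
  // bytes as unsigned values now preserves the signed integer ordering.
  def encodeInt(v: Int): Array[Byte] =
    ByteBuffer.allocate(4).putInt(v ^ Int.MinValue).array()

  // "Solution 2": fixed four bytes for the date, string ended by a 0x00
  // marker, one byte for the boolean -- no offset/length index required.
  def encodeKey(epochDay: Int, s: String, flag: Boolean): Array[Byte] = {
    val boolByte: Byte = if (flag) 1 else 0
    encodeInt(epochDay) ++ s.getBytes("UTF-8") ++ Array[Byte](0, boolByte)
  }
}
```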

Another thing in HSpark is how we convert the predicates and push them down to HBase. At the logical level the query is automatically optimized by Spark. At the physical level the predicates are divided into two groups: the ones that can be handled by HBase, and the rest, which cannot (a small conceptual sketch of this split follows this section). For the HBase-doable predicates, we first eliminate the NOTs, then reduce the predicates by HBase region, then use different filters to handle the non-key and the key columns, and scan the data with those filters. Finally we construct the result based on the scan, because there is also the part that HBase could not handle.

Currently, HSpark has syntax similar to Spark's: create database, create table, query and drop, the usual DDL and queries. The key point is that we want to leverage the Spark parser as much as possible, so the HBase-specific information is added as table options. We will see shortly in the demo how it works with the current Spark parser. SQL queries are supported unchanged: once the table is created, all SQL queries are supported.

>> MARC-ARTHUR PIERRE LOUIS: Bo, I have a quick question for you. The HSpark team, which you are part of: is it made up of IBMers only, or do we have external people on that team?

>> BO MENG: Right now it's IBM only.

>> MARC-ARTHUR PIERRE LOUIS: What release are you guys on right now?

>> BO MENG: I will talk about that shortly.

>> MARC-ARTHUR PIERRE LOUIS: Good.

>> BO MENG: With each HSpark release we also want developers and users to be able to test quickly, so we provide two tools to help them. One is the HSpark SQL shell; if you use Spark you know how the shell works. You can try SQL in the shell and it will immediately give you the response: the result of a query, or of a table creation. We also provide a Python shell for people familiar with the Python language.
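As a conceptual sketch of that split (the types here are invented for illustration, not HSpark's real classes): partition the conjuncts into predicates HBase filters can evaluate during the scan and a residual that Spark evaluates afterwards.

```scala
object PredicateSplitSketch extends App {
  sealed trait Pred
  case class Cmp(column: String, op: String, value: Int) extends Pred // e.g. id > 100
  case class Opaque(description: String) extends Pred                 // e.g. a UDF call

  // Conjuncts on columns HBase knows about become filters pushed into the scan;
  // everything else is left for Spark to apply to the scan's output.
  def split(conjuncts: Seq[Pred], hbaseCols: Set[String]): (Seq[Pred], Seq[Pred]) =
    conjuncts.partition {
      case Cmp(col, _, _) => hbaseCols.contains(col)
      case _              => false
    }

  val (pushed, residual) =
    split(Seq(Cmp("id", ">", 100), Opaque("udf(name)")), Set("id", "age"))
  println(s"pushed to HBase: $pushed; evaluated by Spark: $residual")
}
```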

Here are some statistics about HSpark. The HSpark project is on GitHub right now. Its version number tracks the Spark version, because Spark [inaudible] some code changes between releases and we want to make sure HSpark works with a specific version of Spark. So the current release works with the matching Spark version, and with the same environment, like the JDK and Scala versions listed on GitHub. The developers right now are Yan Zhou and me. You can follow the README to set up the environment, run the tests and do all these things with Spark. The project is also published on Spark Packages, which is like a host for third-party contributions to Apache Spark; you can find the project there too.

>> MARC-ARTHUR PIERRE LOUIS: A quick question for you, and this is the question where the rubber meets the road where these tech talks are concerned: if somebody in the audience wanted to kick the tires of HSpark, how easy would it be for them to set it up and see what you are talking about?

>> BO MENG: Along with this presentation we also have a pattern to help you set up the environment and do a quick test of this project. I think following that pattern makes it quite easy to understand how to set everything up.

>> MARC-ARTHUR PIERRE LOUIS: Excellent. It's nice to be talking to the developers themselves. This project is your baby, right?

>> BO MENG: This project can be used as a Spark package for developers who want to use HBase through Spark. It can also be used by data scientists to query information using SQL while the data resides in HBase.

>> MARC-ARTHUR PIERRE LOUIS: Good.

>> BO MENG: Any contribution, any testing, any improvement is very welcome on this project. Our main communication channel is GitHub, so you can file any issues through GitHub. As for to-dos: there is still some ongoing performance testing on clusters, to compare with other approaches. We want to support more data types; as I said, each data type needs an algorithm to convert the actual data into an ordered byte array. Another big part is the HBase coprocessor, which we want to support in a future release. We also want to improve the documentation and the code: as I showed on the last page, we have the actual code and also the test code, and the test coverage is pretty good right now, but we still want to improve it.

The final to-do is to update the Spark Packages entry to the latest version, with the link to GitHub.

So, here I'm going to show a demo of how to use the HSpark shell to access the data: how to create tables, import data and run queries. It will be a video, but I will comment along with it. Another thing I want to mention is that you can find this demo in the IBM code pattern web page; let me show you. We have a pattern here called use Spark [inaudible] HBase tables. From the link you can view the video and work through how to set up the environment and run the queries, but I will briefly show the video here.

Suppose you have followed the pattern to set up the environment. You can invoke HSpark, which is the HSpark shell; it is released along with HSpark and you can find it in the folder. It gives you this environment; this is the shell. In the shell, we first want to create some tables. I put the schemas below to show the tables; we are using TPC-DS, which is a benchmark, for this. I show the table schemas here so you can understand each table, and also the relations between the tables. You can also find the actual scripts in this pattern: there is a script folder that already has the table creation scripts, in the create-table text file. It first has a commented, nicely formatted version of the script, and then a one-line version, because if you want to cut and paste into this shell, you want just one line. Here I paste it into the shell and it executes and creates a table. Following this script you create five tables, each with different key columns and non-key columns. After you create the tables you will want to take a look at how the table schemas got created, and because the benchmark schema and the actual data may use different data types, you may want to double-check them.

The second step is to load the data into the structure. As I said in the presentation, HSpark has bulk load; we have syntax here to load the data into the table. One thing you want to make sure of is that in the path parameter you point to the actual location of the data. I put in the actual location, and you will see the data get loaded into the table. After the data is loaded, we can query the table using the standard SQL language. In the scripts we also have some examples of how you query a table.
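For reference, a hedged sketch of the same demo flow driven from code rather than the shell; the Hive-style LOAD DATA form, the path and the table name are placeholders to be checked against the pattern's actual scripts.

```scala
import org.apache.spark.sql.SparkSession

object DemoSessionSketch extends App {
  val spark = SparkSession.builder().appName("hspark-demo").getOrCreate()

  // Bulk load historical data into a previously created HSpark table
  // (placeholder path and table name).
  spark.sql("LOAD DATA INPATH '/path/to/store_sales.dat' INTO TABLE store_sales")

  // Once loaded, plain Spark SQL works; compare the output against the
  // expected results shipped alongside the query scripts.
  spark.sql("SELECT COUNT(*) FROM store_sales").show()
}
```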

The nice thing here is that I put the SQL in, and I also put in the expected results, so you can verify that the result is what you expect; sometimes when something is wrong you still get a result, just not the expected one. The first query counts how many rows are in the table; it gets a hundred. We can also try other things, like query 3, which is more complicated; you get the results here and can compare them with the expected results shown in the examples. Another thing I want to mention: if you are a developer and want to use code to query the tables, you can go to the project on GitHub. There is a test folder with a lot of test code showing how you can use actual code to access and use HSpark. That will help you get started if you are a developer.

>> MARC-ARTHUR PIERRE LOUIS: I've got a quick question for you, Bo.

>> BO MENG: Yes.

>> MARC-ARTHUR PIERRE LOUIS: As with JDBC, where we have a connector that we can use to issue [inaudible] (cavernous audio). Is there a connector in HSpark?

>> BO MENG: Yeah. With HBase you also need a connection, to the HMaster. In the test code you will see how to set up a connection to the HMaster. The HSpark shell is set up to connect to your local HMaster, so it just connects locally, but in code you can configure whichever machine is the HMaster.

>> MARC-ARTHUR PIERRE LOUIS: Thanks.
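To make that answer concrete, a small sketch (standard HBase client configuration, not HSpark-specific code) of pointing a client at a remote cluster rather than the local HMaster; the hostnames are placeholders.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

object RemoteConnectionSketch extends App {
  val conf = HBaseConfiguration.create()
  // Clients find the HMaster and region servers through ZooKeeper, so the
  // quorum address is the setting that selects which cluster you talk to.
  conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
  conf.set("hbase.zookeeper.property.clientPort", "2181")

  val conn = ConnectionFactory.createConnection(conf)
  println(s"connected: ${!conn.isClosed}")
  conn.close()
}
```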

>> BO MENG: That is the demo, and that also concludes my presentation on HSpark. Any questions? I think we still have some time to answer questions. As I mentioned before, we have Yan Zhou here; he can answer questions too if any are coming in.

>> MARC-ARTHUR PIERRE LOUIS: I'm looking into the chat room here. I don't see remaining questions. Do you have any questions that we should answer [inaudible]

>> You are asking if I have any questions?

>> MARC-ARTHUR PIERRE LOUIS: In the chat room; I'm wondering if there are some questions left to be answered, or if all of them have been answered already.

>> I answered a couple of questions: one regarding how to load data directly into HSpark, another regarding data uniqueness and how it is guaranteed. The third question I'm not sure about; it's why we are passing the cf name for delete.

>> MARC-ARTHUR PIERRE LOUIS: Yeah, no, I was just wondering whether there were any other questions left. It doesn't seem there are any left in the chat room.

>> I'm not seeing any. I'm only seeing three questions.

>> MARC-ARTHUR PIERRE LOUIS: Okay. All right. Well, we thank you, Bo and Yan, for this presentation. It is certainly an interesting technology that people can take a look at and use for their applications. I have a question for you, Bo.

>> BO MENG: Yeah.

>> MARC-ARTHUR PIERRE LOUIS: What about bindings? You said there's a Python shell, but how about Java and other languages; are there bindings available?

>> BO MENG: Yeah, basically HSpark can be used that way; we also have the test code to show you how to use Java to access HBase through HSpark. Just take a look. Currently I think we support... (buzzing). -- Java and Scala --

>> And Python.

>> MARC-ARTHUR PIERRE LOUIS: Yes, Python. Okay, thank you. Certainly people can take a look at the pattern if they want to try it; you will see how you can use this technology. Do they need to have a Spark cluster, or can they load it on their laptops?

>> BO MENG: It can be loaded on just a single laptop; you don't have to use a cluster. In a real environment you may need a cluster, but for test purposes you can just use one machine; a laptop is okay.

>> MARC-ARTHUR PIERRE LOUIS: Very good. Thank you. People can certainly take a look at the pattern if they want to replicate what you have done. There is always use for a new language and a new technology to access Spark. Thank you for your time; I'm looking forward to seeing how [inaudible] this presentation.

We've got three things I want to discuss real quick before we let you go. Index is next week, the 20th through the 22nd, at the [inaudible] center in San Francisco, California. We invite you to be there. So far we have about 1,000 people registered, and we need more than that. It's a great conference: go register, be there, and learn from developers; it's put together by developers, so it is going to be great. Index is the first of its kind at IBM. The next thing we want to talk about is our pattern: next week we have a pattern about how you can use Watson Natural Language Understanding, NLTK (the Natural Language Toolkit) and DSX to gain insights on data. That is going to be a presentation that shows you how to use these to gain insights into the [inaudible] world.

That is going to be at 1:00 next week. Then the week after, the week of the 26th, we are going to have [inaudible] tech talks, which will be open source week. We are going to have four days of great tech talks, offered in blocks of two hours, and each two-hour block will have four tech talks. We are going to talk about OpenWhisk and other open source projects: OCI, ODPi, all those great projects will be there, plus Java, Python, Jupyter Notebooks and all of that great stuff, for that week of the 26th. We invite you to come and [inaudible] tech talks. We have a lot lined up for you next week and the week after. Some of you are going to be at Index; while you are at Index we are going to do one tech talk, and the week after, the week of the 26th, we have a bunch of tech talks for you. Thank you for your time and we hope to see you next week [inaudible] tech talk. Thank you for your time and we will see you next week. Bye-bye.

>> BO MENG: Thank you, bye-bye.

(end of call at 12:51 p.m. CST)
