Slice Intelligence!

Similar documents
Data Science. Data Analyst. Data Scientist. Data Architect

Business Analytics Nanodegree Syllabus

Data Analyst Nanodegree Syllabus

CSE 140 wrapup. Michael Ernst CSE 140 University of Washington

Data Analyst Nanodegree Syllabus

Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI)

Dr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R

Function Algorithms: Linear Regression, Logistic Regression

Using the Palladium Business Intelligence Functionality

Data Visualization via Conditional Formatting

Excel: Tables, Pivot Tables & More

Performance Evaluation

Commonwealth Computer Training Featured Workshop: WORD 2010 DOCUMENT MANAGEMENT

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)

Workbooks (File) and Worksheet Handling


CS434 Notebook. April 19. Data Mining and Data Warehouse

Microsoft Analyzing and Visualizing Data with Power BI.

Supporting Information

SOFTWARE DEVELOPMENT: DATA SCIENCE

Michigan State University Team MSUFCU Banking with Amazon s Alexa and Apple s Siri Project Plan Spring 2017

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

Become strong in Excel (2.0) - 5 Tips To Rock A Spreadsheet!

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Creating a Classifier for a Focused Web Crawler

SQL Server Reporting Services

Sample Exam. Advanced Test Automation - Engineer

I Shopping on mobile / KSA

2. In Video #6, we used Power Query to append multiple Text Files into a single Proper Data Set:

As a reference, please find a version of the Machine Learning Process described in the diagram below.

Data Modeling in Looker

Outline. Project Update Data Mining: Answers without Queries. Principles of Information and Database Management 198:336 Week 12 Apr 25 Matthew Stone

Data Warehouse and Data Mining

Nine Essential Excel 2010 Skills

More on MS Access queries

Sitecore Experience Platform 8.0 Rev: September 13, Sitecore Experience Platform 8.0

The Microsoft Excel Course is divided into 4 levels

Excel Wizardry. Presented By: Kevin Lorentzen. City of Bellevue

8 Organizing and Displaying

Shawn Dorward, MVP. Getting Started with Power Query

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

COMM 391 Winter 2014 Term 1. Tutorial 1: Microsoft Excel - Creating Pivot Table

Key concepts through Excel Basic videos 01 to 25

DATA WAREHOUSING AND MINING UNIT-V TWO MARK QUESTIONS WITH ANSWERS

Basic Concepts Weka Workbench and its terminology

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

TRACKER DASHBOARD USER GUIDE. Version 1.0

1 Topic. Image classification using Knime.

May 2015 Version 1.0 Sample Assessment for Functional Skills English Reading Level 2 Mobile Phones Source Documents

Prototyping Data Intensive Apps: TrendingTopics.org

Research a technology domain with Smart Search

Data warehouses Decision support The multidimensional model OLAP queries

Centering and Interactions: The Training Data

Heuristic Evaluation of [Slaptitude]

Relevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline

Week 1 Unit 1: Introduction to Data Science

DO EVEN MORE WITH TABLEAU. At BlueGranite, our unique approach and extensive expertise helps you get the most from your Tableau products.

Applied Regression Modeling: A Business Approach

emetrics Study Llew Mason, Zijian Zheng, Ron Kohavi, Brian Frasca Blue Martini Software {lmason, zijian, ronnyk,

SPSS TRAINING SPSS VIEWS

I Shopping on mobile / RU

Customize. Building a Customer Portal Using Business Portal. Microsoft Dynamics GP. White Paper

How does online survey mode affect answers to a customer feedback loyalty survey? PAPOR 2013 Aarti Gupta & Jason Lee

Data Analysis and Data Science

1. Two types of sheets used in a workbook- chart sheets and worksheets

SAS (Statistical Analysis Software/System)

REDCap Overview. REDCap questions to Last updated 12/12/2018 for REDCap v6.13.1

Case Study Don t Guess Where Your Performance Problem Is

Songtao Hei. Thesis under the direction of Professor Christopher Jules White

VERSION EIGHT PRODUCT PROFILE. Be a better auditor. You have the knowledge. We have the tools.

All Excel Topics Page 1 of 11

Exploratory Data Analysis with R. Matthew Renze Iowa Code Camp Fall 2013

ECDL Advanced Module 4 Spreadsheets (AM4) Syllabus Version 1.5 (UK Only)

DOIS MSP BASICS. June 1, 2006

Correctly Compute Complex Samples Statistics

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

I Travel on mobile / FR

Excel. Dashboard Creation. Microsoft # KIRSCHNER ROAD KELOWNA, BC V1Y4N TOLL FREE:

Preparation Centre Performance Reports

Objective: Class Activities

CERTIFICATE IN BIG DATA TECHNIQUES ON SMALL DATA FORMAT UTILIZING MICROSOFT EXCEL

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

September 2016 Version 1.2 Sample Assessment for Functional Skills English Reading Level 2 Mobile Phones Source Documents

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

Data Sheet - Site and User Analytics for SharePoint PRODUCT BROCHURE.

CICS VSAM Transparency

COURSE CONTENT Excel with VBA Training

DATA MINING AND WAREHOUSING

GOOGLE S MOST-SEARCHED ONLINE PRODUCTS AND SERVICES JULY Perfect Search Media

Who we are: Database Research - Provenance, Integration, and more hot stuff. Boris Glavic. Department of Computer Science

I Travel on mobile / UK

Excel and Tableau. A Beautiful Partnership. Faye Satta, Senior Technical Writer Eriel Ross, Technical Writer

Excel Shortcuts Increasing YOUR Productivity

COMP 465 Special Topics: Data Mining

Big Data Analytics CSCI 4030

Techno Expert Solutions An institute for specialized studies!

Related reading: Effectively Prioritizing Tests in Development Environment Introduction to Software Engineering Jonathan Aldrich

Andrei Dîcă. Personal Statement. Education

Transcription:

Intern @ Slice Intelligence! Wei1an(Wu( September(8,(2014(

Outline!! Details about the job!! Skills required and learned!! My thoughts regarding the internship!

About the company!! Slice, which we call the smart shopping assistant, is a start-up revolutionizing the way digital commerce measured and specializing in package tracking, online spending history recording, and e-receipts extracting.! Business Insider has called Slice one of their "Top 10 Productivity apps", and Inc.; magazine named Slice to their list of seven "Startups to Watch in 2012.!

About my job!! I was working with a team of smart data analysts to support the sales and marketing by mining Slice s rich data set to reveal new insights about online transactions and shopping behavior. The majority of my task was to perform training data cleaning and data quality check as well as ad hoc tasks.! I knew nothing about SQL and only knew some simple functions in EXCEL (e.g., sum, average functions, pivot table and pivot chart) before my interview. During the interview, I was asked to give self-assessment score of my skills on SQL and EXCEL.! After two months internship at Slice, I learned important skills in composing string, operating functions in MS.

Some major projects!! Used Excel to compare item descriptions from a text file with those from an excel file by using VLOOKUP function and returned the quantity of the items.! Used SQL Server to extract 3700 items which have specific conditions to Excel and looked through these item titles to assign them into existed branches. During the assignment, I learned how to use nested IF function and VLOOKUP function to increase the efficiency and accuracy; I also created pivot tables for other data analysts for further use based on the result. I achieved the goal to reduce the percentage of Other category and cleaned the data.! Used MySQL to help engineering team clean the training data set that has 172,720 rows. I wrote SQL queries to select items with relevant titles and then updated items whose category paths were incorrect with the correct paths. After three weeks of interactive work with the engineering team, I achieved a significant improvement in precision and recall indexes.

Precision and Recall! In pattern recognition and information retrieval with binary classification: Precision Recall also called: positive predictive value sensitivity the fraction of retrieved instances that are relevant the fraction of relevant instances that are retrieved can be seen as: a measure of exactness or quality a measure of completeness or quantity in simple terms: high precision means that an algorithm returned substantially more relevant results than irrelevant high recall means that an algorithm returned most of the relevant results! Both precision and recall are therefore based on an understanding and measure of relevance.

Precision and Recall! Machine learning?! -> Statistical learning Regression vs. Classification Regression Classification Y s (output): numerical categorical a different way: involves estimating or predicting a response identifying group membership! In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative).! For more info, please refer to http://en.wikipedia.org/wiki/precision_and_recall

My thoughts Data quality issues:! There might be duplicates in item titles but with different category paths.! I might not cover all the training data set so that the data could not be totally cleaned.! The Other category has a large proportion in the training data set.!... My gains:! After two months practical training, I learned to online Google things that I don t know, learned the power and conveniences of Excel functions, and learned how to write SQL queries with nested queries.! In addition to the programming skills, I also practiced my communication skills, especially my spoken and listening English.

More about Slice Technologies, Inc.:! LinkedIn: https://www.linkedin.com/company/project-slice! App: download the Slice app to help you save time and save money!

!