Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems

1 Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems
Jianting Zhang (1,2), Simin You (2)
(1) Department of Computer Science, CUNY City College (CCNY)
(2) Department of Computer Science, CUNY Graduate Center

2 Outline Introduction Spatial data, GIS, BigData and HPC Taxi trip data in NYC and Global Biodiversity Applications Spatial query processing on GPUs ISP-GPU Architecture and Implementations Experiment Results Alternative Techniques SpatialSpark Lightweight Distributed Execution (LDE) Engine Summary and Future Work

3 Geographical Information System
Social Studies, Computational Geometry, Computer Graphics, Spatial Databases (data modeling, indexing, query processing), Scientific Data/Information Visualization, Statistics/Machine learning, Image Processing/Computer Vision, GIS, Remote Sensing
Social-Economic Modeling, Environmental Modeling, Census/Taxation, Urban planning, Transportation, Air quality, Hydrology, Ecology

4 Big Geospatial Data Challenges
Event locations, trajectories and O-D data
E.g., taxi trip records (GPS traces or O-D locations): 0.5 million in NYC (medallion taxi cab only) and 1.2 million in Beijing per day
From O-D locations to trajectories to frequent patterns
Satellite: e.g., from GOES to GOES-R (2015/2016) [$11B]
Spectral (3X) * spatial (4X) * temporal (5X) = 60X
2km*2km*5min*16bands: (360*60)*(180*60)*(12*24)*16 ~ 1+ trillion pixels per day
Derived thematic data products (vector)
Species distributions
E.g., million occurrence records (GBIF)
E.g., 717,057 polygons and 78,929,697 vertices for 4148 birds distribution data (NatureServe)
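The pixel count quoted on this slide can be checked directly. The following is a minimal sketch, assuming the factors describe a global grid of one-arc-minute cells (360*60 by 180*60, which is close to 2 km at the equator), one scan every 5 minutes, and 16 spectral bands; the variable names are illustrative only.

```python
# Factors as written on the slide; their interpretation is an assumption.
lon_cells = 360 * 60       # one-arc-minute columns around the globe
lat_cells = 180 * 60       # one-arc-minute rows from pole to pole
scans_per_day = 12 * 24    # one scan every 5 minutes
bands = 16

pixels_per_day = lon_cells * lat_cells * scans_per_day * bands
print(f"{pixels_per_day:,} pixels per day")
```

The product is slightly above 10^12, which matches the "1+ trillion pixels per day" estimate.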

5 [Figure: hardware architecture diagram with the labels Cloud computing+MapReduce+Hadoop, CPU Host (CMP), GPU (SIMD), MIC, GDRAM, DRAM, HDD, SSD, PCI-E, Ring Bus, Local Cache, Shared Cache, Thread Block, 4-Threads In-Order]
16 Intel Sandy Bridge CPU cores + 128GB RAM + 8TB disk + GTX TITAN + Xeon Phi 3120A ~ $9,994

6 ASCI Red: 1997
First 1 Teraflops (sustained) system with 9298 Intel Pentium II Xeon processors (in 72 cabinets)
Feb billion transistors (551mm²)
2,688 processors
4.5 TFLOPS SP and 1.3 TFLOPS DP
Max bandwidth GB/s
PCI-E peripheral device
250 W (17.98 GFLOPS/W, SP)
Suggested retail price: $999
What can we do today using a device that is more powerful than ASCI Red 16 years ago?

7 Geospatial Technologies and Environmental CyberInfrastructure (GeoTECI) Lab
Dr. Jianting Zhang, Department of Computer Science, The City College of New York
Students: Simin You (Ph.D., 2009-), Siyu Liao (Ph.D., 2014-), Costin Vicoveanu (Undergraduate, 2014-), Bharat Rosanlall (Undergraduate, 2014), Jay Yao (MS thesis, 2011-2012), Chandrashekar Singh (MS, 2013), Agniva Banerjee (MS, 2012), Roger King (MS, 2012), Wahyu Nugroho (MS, 2011), Xiao Quan Cen Feng (MS, 2011), Chetram Dasrat (Undergraduate, 2008)
[Figure: logos of affiliated institutions and collaborating institutions]

8 $449,845/4yr (08/01/ /31/2017)
HIGHEST-DB: HIgh-performance GrapHics units based Engine for Spatial-Temporal data
Spatial and spatiotemporal indexing, query processing and optimization
Trajectory data management on GPUs
Segmentation/simplification/compression/aggregation/warehousing
Map matching with road networks
Data mining (moving cluster, convoy, swarm...)
...when yellow cabs, green cabs and MTA buses meet with multicore CPUs, GPUs and MICs in NYC

9 ...when GOES-R satellites, extratropical cyclones and hummingbirds meet with TITAN
[Figure: data flow diagram with the labels Temporal Trends, High-resolution Satellite Imagery, Data Assimilation, In-situ Observation Sensor Data, Zonal Statistics, Ecological, environmental and administrative zones, ROIs, Global and Regional Climate Model Outputs, High-End Computing Facility, Thread Block]

10 ...building a highly-configurable experimental computing environment for innovative BigData technologies
[Figure: lab network diagram with the labels CCNY Computer Science LAN, GeoTECI@CCNY, CUNY HPCC, KVM, Web Server/Linux App Server, Windows App Server]
Brawny GPU cluster and servers: SGI Octane III, Microway, DIY, Dell T5400, HP 8740w, HP 8740w, Lenovo T400s
Configurations, in the order shown: Dual Quadcore, 48GB memory *2, Nvidia C2050*2, 8 TB storage; Dual 8-core, 128GB memory, Nvidia GTX Titan, Intel Xeon Phi 3120A, 8 TB storage; Dual-core, 8GB memory, Nvidia GTX Titan, 3 TB storage; Dual Quadcore, 16GB memory, Nvidia Quadro TB storage; Quadcore, 8 GB memory, Nvidia Quadro 5000m
Wimpy GPU cluster: Dell T7500, Dell T7500, Dell T5400, DIY
Configurations, in the order shown: Dual 6-core, 24 GB memory, Nvidia Quadro 6000; Dual 6-core, 24 GB memory, Nvidia GTX 480; Dual Quadcore, 16GB memory, Nvidia FX3700*2; Quadcore (Haswell), 16 GB memory, AMD/ATI 7970

11 Taxi trip data in NYC
Taxicabs: 13,000 medallion taxi cabs; license priced at > $1M; car services and taxi services are separate
Taxi trip records: ~170 million trips (300 million passengers) in 2009; 1/5 of that of subway riders and 1/3 of that of bus riders in NYC

12 Taxi trip data in NYC
Overall distributions of trip distance, time, speed and fare (2009)
[Figure: four histograms. Count-Distance Distribution (x-axis: Trip Distance, mile), Count-Time Distribution (x-axis: Trip Time, minute), Count-Speed Distribution (x-axis: Speed, MPH), Count-Fare Distribution (x-axis: Fare, $); y-axis: Count]

13 Taxi trip data in NYC
How to manage taxi trip data? Geographical Information System (GIS), Spatial Databases (SDB), Moving Object Databases (MOD)
How good are they? Pretty good for small amounts of data, but rather poor for large-scale data

14 Example 1: Taxi trip data in NYC
Loading 170 million taxi pickup locations into PostgreSQL:
UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
hours!
Example 2: Finding the nearest tax blocks for 170 million taxi pickup locations using open source libspatialindex+GDAL: 30.5 hours!
Intel Xeon 2.26 GHz processors with 48 GB memory
I do not have time to wait... Can we do better?

15 Global Biodiversity Data at GBIF
SELECT aoi_id, sp_id, sum(ST_Area(inter_geom))
FROM (
  SELECT aoi_id, sp_id, ST_Intersection(sp_geom, qw_geom) AS inter_geom
  FROM SP_TB, QW_TB
  WHERE ST_Intersects(sp_geom, qw_geom)
) AS inter
GROUP BY aoi_id, sp_id
HAVING sum(ST_Area(inter_geom)) > T;
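To make the semantics of this query concrete, here is a small sketch that mimics its join, GROUP BY and HAVING steps in plain Python. It is an illustration under simplifying assumptions: axis-aligned rectangles stand in for the species and zone polygons, one input is assumed to carry sp_id and the other aoi_id, and all function names are hypothetical.

```python
from collections import defaultdict

def rect_intersection_area(a, b):
    """Overlap area of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def overlap_by_zone_and_species(species_rows, zone_rows, threshold):
    """species_rows: (sp_id, rect); zone_rows: (aoi_id, rect)."""
    totals = defaultdict(float)
    for sp_id, sp_rect in species_rows:           # nested loop = cross join
        for aoi_id, zone_rect in zone_rows:
            area = rect_intersection_area(sp_rect, zone_rect)
            if area > 0:                           # pairs that merely touch add no area
                totals[(aoi_id, sp_id)] += area    # GROUP BY aoi_id, sp_id with sum()
    # HAVING sum(area) > T
    return {key: total for key, total in totals.items() if total > threshold}

species = [("sp1", (0, 0, 4, 4)), ("sp1", (5, 0, 7, 2)), ("sp2", (3, 3, 6, 6))]
zones = [("aoi1", (0, 0, 6, 3)), ("aoi2", (2, 2, 8, 8))]
result = overlap_by_zone_and_species(species, zones, threshold=5.0)
print(result)
```

With real polygons the nested loop and the exact intersection are what make the query expensive, which is the motivation for the filtering and GPU refinement discussed on the following slides.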

16 Spatial Data Processing on GPUs

17 Spatial query processing on GPUs
Single-level grid-file based spatial filtering
Nested-loop based refinement
Points; vertices (polygon/polyline)
Perfectly coalesced memory accesses
Utilizing GPU floating point computing power
J. Zhang, S. You and L. Gruenwald, "Parallel Online Spatial and Temporal Aggregations on Multi-core CPUs and Many-Core GPUs," Information Systems, vol. 44, p , 2014.
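The filter-and-refine idea named on this slide can be sketched sequentially. This is not the authors' GPU code, which the slides do not show; it is a minimal CPU illustration, assuming a uniform grid for filtering and a ray-casting point-in-polygon test for refinement, with hypothetical function names. On a GPU, the inner loops over points and polygon vertices are the part that would be spread across threads.

```python
from collections import defaultdict

def point_in_polygon(px, py, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def grid_pip_join(points, polygons, cell):
    """Return sorted (point_index, polygon_index) pairs."""
    # Filtering: register each polygon in every cell its bounding box overlaps.
    grid = defaultdict(list)
    for j, poly in enumerate(polygons):
        xs = [x for x, _ in poly]
        ys = [y for _, y in poly]
        for cx in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for cy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                grid[(cx, cy)].append(j)
    # Refinement: nested loop over the candidates in the point's own cell.
    pairs = []
    for i, (px, py) in enumerate(points):
        for j in grid.get((int(px // cell), int(py // cell)), []):
            if point_in_polygon(px, py, polygons[j]):
                pairs.append((i, j))
    return sorted(pairs)

polygons = [[(0, 0), (4, 0), (4, 4), (0, 4)], [(5, 5), (9, 5), (7, 9)]]
points = [(1, 1), (7, 6), (4.5, 4.5), (8, 8.5)]
matches = grid_pip_join(points, polygons, cell=2.0)
print(matches)
```

The grid keeps the number of exact tests per point small; only polygons whose bounding box shares the point's cell reach the refinement step.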

18 Spatial query processing on GPUs
Top: grid size = 256*256, resolution = 128 feet
Right: grid size = 8192*8192, resolution = 4 feet
Spatial Aggregation: 9,424/326 = 30X (8192*8192)
Temporal Aggregation: 1709/198 = 8.6X (minute); 1598/165 = 9.7X (hour)
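As an illustration of what the two aggregations compute, the sketch below bins points into grid cells and timestamps into time bins. It is a sequential stand-in rather than the GPU implementation; the coordinate units (feet), the grid origin, and the function names are assumptions. Both grids on the slide span the same extent, since 256*128 = 8192*4 = 32,768 feet.

```python
from collections import Counter

def spatial_aggregate(points, origin, cell_size, grid_dim):
    """Count points per (row, col) grid cell; points outside the grid are skipped."""
    counts = Counter()
    ox, oy = origin
    for x, y in points:
        col = int((x - ox) // cell_size)
        row = int((y - oy) // cell_size)
        if 0 <= col < grid_dim and 0 <= row < grid_dim:
            counts[(row, col)] += 1
    return counts

def temporal_aggregate(timestamps, bin_seconds):
    """Count events per time bin (60 for minutes, 3600 for hours)."""
    return Counter(t // bin_seconds for t in timestamps)

pickups = [(10.0, 10.0), (100.0, 20.0), (130.0, 5.0), (300.0, 300.0)]
cells = spatial_aggregate(pickups, origin=(0.0, 0.0), cell_size=128.0, grid_dim=256)

pickup_times = [5, 59, 60, 3599, 3600]  # seconds since an arbitrary start
per_minute = temporal_aggregate(pickup_times, bin_seconds=60)
per_hour = temporal_aggregate(pickup_times, bin_seconds=3600)
print(cells, per_minute, per_hour)
```

Switching from the 256*256 grid to the 8192*8192 grid only changes cell_size and grid_dim, but it multiplies the number of cells by 1,024, which is where a parallel implementation pays off.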

19 Spatial query processing on GPUs
P2N-D: 147,011 street segments
P2P-T: 38,794 census blocks (470,941 points)
P2P-D: 735,488 tax blocks (4,698,986 points)

            P2N-D     P2P-T     P2P-D
CPU time    -         h         30.5 h
GPU time    10.9 s    11.2 s    33.1 s
Speedup     -         4,900X    3,200X

Algorithmic improvement: 3.7X
Using main-memory data structures: 37.4X
GPU acceleration: 24.3X
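The reported numbers can be cross-checked with a few lines of arithmetic. This sketch assumes that the three listed factors are meant to multiply into a single end-to-end speedup; the slide does not state which experiment the breakdown refers to, so the comparison with P2P-D is an assumption.

```python
# P2P-D timings taken from the table above.
cpu_seconds = 30.5 * 3600
gpu_seconds = 33.1
measured_speedup = cpu_seconds / gpu_seconds

# Assumption: the three factors compose multiplicatively.
algorithmic = 3.7
main_memory = 37.4
gpu_acceleration = 24.3
combined = algorithmic * main_memory * gpu_acceleration

print(f"measured: {measured_speedup:,.0f}X, product of factors: {combined:,.0f}X")
```

Both values land in the low thousands, the same order of magnitude as the speedup reported for P2P-D.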

20 Outline Introduction Spatial data, GIS, BigData and HPC Taxi trip data in NYC and Global Biodiversity Applications Spatial query processing on GPUs ISP-GPU Architecture and Implementations Experiment Results Alternative Techniques SpatialSpark Lightweight Distributed Execution (LDE) Engine Summary and Future Work

21 ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters

22 ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
Attractive Features
SQL frontend: translates SQL queries into execution plans
C/C++ backend with SSE4 support (for string operations)
Efficient implementations of hash-joins (partitioned and non-partitioned)
LLVM-based JIT
Extension is challenging!

23 ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
// Spatial join operator, declared as a subclass of the blocking join node.
class SpatialJoinNode : public BlockingJoinNode {
 public:
  SpatialJoinNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl& descs);
  virtual Status Prepare(RuntimeState* state);
  // Produces the next batch of output rows; sets *eos when the output is exhausted.
  virtual Status GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos);
  virtual void Close(RuntimeState* state);
 protected:
  // Initializes build-side state for the first row of the left (probe) input.
  virtual Status InitGetNext(TupleRow* first_left_row);
  // Consumes the build-side input before any output is produced.
  virtual Status ConstructBuildSide(RuntimeState* state);
 private:
  boost::scoped_ptr<TPlanNode> thrift_plan_node_;
  RuntimeState* runtime_state_;
};
create_rtree( ), pip_join( ), nearest_join( )
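The method names in this class suggest the usual blocking-join pattern: the build side is consumed completely, and output is then produced batch by batch while the probe side streams through. The sketch below illustrates that pattern only. It is not the ISP-GPU implementation; the class name, the rectangle-based predicate, and the plain list standing in for a spatial index are all hypothetical.

```python
class SketchSpatialJoin:
    """Hypothetical stand-in for a blocking spatial join operator."""

    def __init__(self, left_rows, right_boxes, batch_size=2):
        self.left_rows = iter(left_rows)  # probe side: (id, x, y) points
        self.right_boxes = right_boxes    # build side: (id, xmin, ymin, xmax, ymax)
        self.batch_size = batch_size
        self.index = None

    def construct_build_side(self):
        # Blocking step: consume the whole right input before any output.
        # A plain list stands in for a spatial index such as an R-tree.
        self.index = list(self.right_boxes)

    def get_next(self):
        """Probe up to batch_size left rows; return (matched_pairs, eos)."""
        if self.index is None:
            self.construct_build_side()
        batch, consumed = [], 0
        for pid, x, y in self.left_rows:
            for bid, xmin, ymin, xmax, ymax in self.index:
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    batch.append((pid, bid))
            consumed += 1
            if consumed == self.batch_size:
                return batch, False
        return batch, True  # left input exhausted

node = SketchSpatialJoin(
    left_rows=[(1, 1, 1), (2, 5, 5), (3, 9, 9)],
    right_boxes=[(10, 0, 0, 2, 2), (20, 4, 4, 10, 10)],
)
out, eos = [], False
while not eos:
    batch, eos = node.get_next()
    out.extend(batch)
print(out)
```

In a GPU-backed variant, the per-batch probing loop is the natural unit of work to hand to the device.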

24 ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
Scalable and Efficient Spatial Data Management on Multi-Core CPU and GPU Clusters. IEEE HardBD'15 Workshop

25 ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
Single-node results: 16-core CPU/128 GB, GTX Titan
[Table: runtimes in seconds of ISP-GPU, ISP-MC+, GPU-Standalone and MC-Standalone on taxi-nycb and GBIF-WWF]
taxi-nycb: ~170 million points, ~40 thousand polygons (9 vertices/polygon)
GBIF-WWF: ~375 million points, ~15 thousand polygons (279 vertices/polygon)
Cluster results: 2-10 nodes, each with 8 vCPU cores/15 GB and 1536 CUDA cores/4 GB (50 million species locations used due to memory constraints)

26 Outline Introduction Spatial data, GIS, BigData and HPC Taxi trip data in NYC and Global Biodiversity Applications Spatial query processing on GPUs ISP-GPU Architecture and Implementations Experiment Results Alternative Techniques SpatialSpark Lightweight Distributed Execution (LDE) Engine Summary and Future Work

27 Alternative Techniques
SpatialSpark: Just Open-Sourced
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.util.Try
// WKTReader comes from the JTS library
val sc = new SparkContext(conf)
// read left-side data from HDFS and pre-process it
val leftData = sc.textFile(leftFile, numPartitions).map(x => x.split(separator)).zipWithIndex()
val leftGeometryById = leftData.map(x => (x._2, Try(new WKTReader().read(x._1.apply(leftGeometryIndex)))))
  .filter(_._2.isSuccess).map(x => (x._1, x._2.get))
// similarly for right-side data
// ready for spatial query (broadcast-based)
val joinPredicate = SpatialOperator.Within // NearestD can be applied similarly
var matchedPairs: RDD[(Long, Long)] = BroadcastSpatialJoin(sc, leftGeometryById, rightGeometryById, joinPredicate)
Large-Scale Spatial Join Query Processing in Cloud (Comparison with ISP-MC). IEEE CloudDM'15 Workshop
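The broadcast-based join used above can be illustrated without Spark. The sketch below is a single-process simulation, assuming the right side is small enough to copy to every worker; rectangles and a containment test stand in for real geometries and the Within predicate, and the function name is hypothetical rather than part of SpatialSpark.

```python
def broadcast_spatial_join(left_partitions, right_boxes):
    """left_partitions: lists of (id, x, y); right_boxes: (id, xmin, ymin, xmax, ymax)."""
    broadcast = list(right_boxes)  # every worker receives the full right side

    def join_partition(partition):
        # Runs independently per partition; no shuffle of left-side rows is needed.
        return [(pid, bid)
                for pid, x, y in partition
                for bid, xmin, ymin, xmax, ymax in broadcast
                if xmin <= x <= xmax and ymin <= y <= ymax]

    pairs = []
    for partition in left_partitions:  # a cluster would process these in parallel
        pairs.extend(join_partition(partition))
    return pairs

left = [[(0, 1, 1), (1, 5, 5)], [(2, 9, 9), (3, 20, 20)]]
right = [(100, 0, 0, 2, 2), (200, 4, 4, 10, 10)]
matched = broadcast_spatial_join(left, right)
print(matched)
```

The design trades memory for communication: each worker holds a full copy of the smaller dataset, so the larger dataset never has to be repartitioned.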

28 Alternative Techniques Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processing

29 Spatial Data Processing and IoT Cell-phone based sensing and querying 3D world (personal navigation) Crowd-sourcing 3D urban infrastructure/traffic monitoring using RGB-D videos Building Information System and energy control Emergency response and disaster relief

30 Summary and Future Work Designs and implementations of an in-memory spatial data management system on multi-core CPU and many-core GPU clusters by extending Cloudera Impala for distributed spatial join query processing Experiments on the initial implementations have revealed both advantages and disadvantages of extending a tightly-coupled big data system to support spatial data types and their operations. Alternative techniques are being developed to further improve efficiency, scalability, extensibility and portability.

31 Q&A
