Aadhaar. Scalability & Data Management Challenges. Dr. Pramod K. Varma. Chief Architect, UIDAI

Similar documents
May MapR Technologies 2014 MapR Technologies 1

The Aadhaar Project NANDAN NILEKANI CHAIRMAN UNIQUE IDENTIFICTION AUTHORITY OF INDIA

AADHAAR AUTHENTICATION API SPECIFICATION VERSION 1.0

Aadhaar enabled Public Distribution System in Delhi

Aadhaar enabled Public Distribution System in Haryana

UIDAI, RO Guwahati. Application & Authentication Division Unique Identification Authority of India New Delhi

Aadhaar Authentication Failure in the Public Distribution System of Andhra Pradesh

Extremely-Large-Scale Biometric Authentication System - Its Practical Implementation

Demographic Update through Update Client Lite (UCL)

V.V. COLLEGE OF ENGINEERING

THE DATA PLATFORM FOR BETTER BIOMETRICS

USER MANUAL/INSTALLATION GUIDE

Aaple Sarkar DBT Portal

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Central Depository Services (India) Limited

Axway Validation Authority Suite

Role of Biometric Technology in Aadhaar Authentication

National IAS Academy Current Affairs: Contact: PRELIMS. Impeachment of a SC judge. Ice sheets found in Mars

6 facts about GenKey s ABIS

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

From data center OS to Cloud architectures The future is Open Syed M Shaaf

Distributed Systems 16. Distributed File Systems II

Challenges in Aadhaar Implementation : Opportunities for products and innovations Proposal Presentation

Face recognition for enhanced security.

API 2.0 Error Handling document

DigitalPersona Altus. Solution Guide

Mobile ID, the Size Compromise

BEYOND AUTHENTICATION IDENTITY AND ACCESS MANAGEMENT FOR THE MODERN ENTERPRISE

QuickSpecs HP Archiving software for Microsoft Exchange 2.2

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

Caching patterns and extending mobile applications with elastic caching (With Demonstration)

EMC ISILON HARDWARE PLATFORM

AtoS IT Solutions and Services. Microsoft Solutions Summit 2012

Best Practices in Designing Cloud Storage based Archival solution Sreenidhi Iyangar & Jim Rice EMC Corporation

Introduction. Deployment Models. IBM Watson on the IBM Cloud Security Overview

Big Data and Cloud Computing

<Insert Picture Here> Enterprise Data Management using Grid Technology

esign - Evolving Opportunities and Applications C E N T R E F O R D E V ELOPMENT O F A D VANCED C O MPUTING N O V E M B E R 1 5,

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

AEBAS HELP FILE. 2. Kathua Satender Singh Udhampur Sanjeev

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Cloud Analytics and Business Intelligence on AWS

Copyright 2012 EMC Corporation. All rights reserved.

Public Key Infrastructure & esign in India November 2015

Aadhar card photo id software. Aadhar card photo id software.zip

Applying biometric authentication to physical access control systems

e-sign and TimeStamping

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

Watson Developer Cloud Security Overview

Mobile Financial Services. Ms. Vinod Kotwal Advisor (F&EA), TRAI

Caching Use Cases in the Enterprise

Technical Brief SUPPORTPOINT TECHNICAL BRIEF MARCH

Tender Schedule No. Figure: Active-Active Cluster with RAC

Table 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti

Backup & Recovery on AWS

TECHED USER CONFERENCE MAY 3-4, 2016

Smart Cards and Authentication. Jose Diaz Director, Technical and Strategic Business Development Thales Information Systems Security

Universal Storage Consistency of DASD and Virtual Tape

GST Registration Guide

Citizen Biometric Authentication based on e-document verification. e-government perspective. Mindshare Ruslans Arzaniks Head of Development

How Next Generation Trusted Identities Can Help Transform Your Business

Lecture 41 Blockchain in Government III (Digital Identity)

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

arxiv: v2 [cs.dc] 12 Mar 2016

MERGING NPR and UID???

Details for obtaining Class 3 Digital Signature Certificate Sign + Encrypt Organization (Foreign National) as per Indian IT Act.

FlexPod. The Journey to the Cloud. Technical Presentation. Presented Jointly by NetApp and Cisco

FAQs. Steps to do different types of transactions on M-Pesa. By Web Portal ( Types of Transaction Using*400# By Mobile App

Developing Enterprise Cloud Solutions with Azure

Sourcefire Solutions Overview Security for the Real World. SEE everything in your environment. LEARN by applying security intelligence to data

Storage for HPC, HPDA and Machine Learning (ML)

Big Data Issues for Federal Records Managers

A National e-authentication Service

Technology Solutions for Financial Inclusion-Indian Experiences

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Busting the top 5 myths of cloud-based authentication

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

USE CASES. See how Polygon s Biometrid can be used in different usage settings

Streaming Integration and Intelligence For Automating Time Sensitive Events

Survey Guide: Businesses Should Begin Preparing for the Death of the Password

EMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content

EMC s IT TRANSFORMATION

An Introduction to GPFS

AKAMAI CLOUD SECURITY SOLUTIONS

DigiDocs; A Proposed Architecture for e-government

Quality of Service Management

CIAM: Need for Identity Governance & Assurance. Yash Prakash VP of Products

IBM Active Cloud Engine centralized data protection

ArcGIS Server Architecture Considerations. Andrew Sakowicz

Put Identity at the Heart of Security

Mobile Money. Lecture 9: CSE 490c. 10/15/2018 University of Washington, Autumn

The Transformative Power of Mobile to Accelerate Digital Identity

Kinetic Open Storage Platform: Enabling Break-through Economics in Scale-out Object Storage PRESENTATION TITLE GOES HERE Ali Fenn & James Hughes

Datacenter replication solution with quasardb

Identity Proofing Standards and Beyond

Oracle Banking Digital Experience

Linux Automation.

Low Latency Data Grids in Finance

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

Transcription:

Aadhaar Scalability & Data Management Challenges Dr. Pramod K. Varma Chief Architect, UIDAI twitter.com/pramodkvarma pramodkvarma.com

Understanding Aadhaar System 2

Establishing ID is a Challenge Issue rations A resident typically accesses multiple service providers, at different times Mobile Phone NREGA Passport jobs Bank A/C Needs to repeatedly re-establish ID = problem for the poor Birth records Address proof Money to beat the system = No or limited access to entitlements and opportunities 3

Why Aadhaar? Difficulty in establishing ID exclusion Weak authentication inefficient delivery Entitlements Ghost Entries Food, fuel, fertilizer subsidy = ~Rs. 1 lac crore Social Security Net Financial 60% unbanked (~700mn) Duplication Multiple layers 45% BPL do not have a ration card biometric-based unique identity has the potential to address both these dimensions simultaneously. - Thirteenth Finance Commission 4

Enroll Once Aadhaar Number -Unique, lifetime, biometric based identity Demographic Data Compulsory data: Name, Age/Date of Birth, Gender and Address of the resident. Conditional data: Parents/Guardian details Optional data: Phone no., email address Biometric Data Resident s Photograph Resident s Finger Prints Resident s Iris 5

authenticate many times Online service to verify the claim are you who you claim to be? 1:1 check only a yes/no answer Authenticate online Anytime, anywhere, multi-factor Always responds with yes or no Open identity platform Can be used in any service, any domain using any protocol, any device, any network 6

Enrolment Application Modules Geographically Distributed Client (mostly offline) Enrolment Server with Multi modal, Multi-vendor ABIS Authentication Geographically Distributed Servers Geographically Distributed Devices (several millions) Multi-factor support Supporting Systems Business Intelligence Fraud Detection 7

Enrolment Process 3 Enrolment Processing Biometric De-duplication CIDR UID Assignment Enrolment Service Letter Delivery & Verification Registrar 4 Aadhaar Number And rejection data Automatic Synch (software/data) Logistics Partner (India Post) 2 (Option B) Enrolment Data to CIDR 1 Data Capture 2 (Option A) Enrolment Data to CIDR 5 Aadhaar letter or Rejection letter Enrolment Agency Information/ Issue resolution Customer Contact Center 8

Authentication Process 9

Enrolment Server Manages complete Aadhaar enrolment and lifecycle process Features Data validation Operator, supervisor verification Biometric de-duplication (1:N matching) Manual inspection Aadhaar number allocation / rejection Letter generation and delivery tracking Registrar integration 10

Biometric De-duplication Multi-modal matching 1:N matching (Every resident is matched using his/her biometrics against every entry in the ABIS system) Multi-vendor interface through ABIS API Dynamic allocation to ABIS vendor based on their accuracy and performance Multi-DC architecture adds complexity Exception handling Mostly automated and manual Volumes require highly automated and learning systems to handle exceptions in an effective manner 11

Authentication Supports answering the question is a resident the person he/she claims to be Verifies resident information (demographics, biometrics) for a given Aadhaar number against the stored data Online service that is lightweight, ubiquitous, and secure Only responds with a yes/no and no personal identity information is returned as part of the response Supports multi-factor authentication using biometrics, PIN, OTP and combinations thereof Supports multiple protocols and devices Personal computer, mobile, PoS terminals, etc. Many protocols (USSD, SMS, HTTPS) over data and mobile connections Works with assisted and self-service applications 12

Scalability and Data Management Challenges 13

Architecture Highlights Support large scaling of enrolments and authentications No vendor lock-in across the system Use of open-source technologies wherever available and prudent Use of open standards to ensure interoperability Ensure wide device driver support for biometric devices through standardization Use of widely adopted technology platforms and tools Make all performance metrics (no PII) public through business intelligence portal for transparency Build strong end-to-end security upfront 14

Enrolment Server Architecture Throughput is the key Fully distributed compute platform Data shardedacross multiple RDBMS instances and DFS Highly asynchronous using a high speed messaging layer SEDA (Staged Even Driven Architecture) allows smarter failure handling Multi-DC architecture for near-zero RTO and zero RPO (adds complexity in biometric d-deuplication) 15

Enrolment Volume 600 to 800 million UIDs in 4 years 1 to 4 million enrolments a day When we cover half the country, we will end up doing 4 m * 12 * 500 m * 12 biometric matches a day!!! Data updates and new enrolments will continue for ever Enrolment data moves from very hot to cold needing multi-layered storage architecture 16

Enrolment Data Management Enrolment require handling of large binary data for all residents ~5 MB per resident biometrics ~3 MB for supporting docs Maps to about 8 PB of raw data! With replication, it means managing about 25 PB of source data Replication and backup across DCs of 4+ TB of incremental data every day for near-zero RTO Additional workflow/process/event data 15+ million events on an average moving through async channels Needing complete update and insert guarantees across data stores Lifetime updates adds several more petabytes 17

Authentication Server Authentication poses response time issue Match demographics (partial, fuzzy, Indian language matching) Match biometrics (balancing FPIR) Needs to scale to handle 100 s of million requests every day with sub-sec response Edge cached, in-memory operation Async data updates to the cache Stateless service Audits maintained asynchronously on HDFS 18

Authentication Volume Few 100 million authentications per day mostly during 10 hr period High variance on peak and average Requires async request handling on HTTP server Sub second response with support for OTP, guaranteed audits Multi-DC architecture Fully load balanced Mostly reads with some updates (OTP, Audit) All changes needs to be propagated from enrolment data stores to all authentication sites PIN updates, OTP requests, and less occasional demographic data updates 19

Authentication Data Management Minutiae based authentication request is about 1 K Image based ones are about 10 K on an average 100 million authentications / day means 1 billion audit records in 10 days 1 TB encrypted audit logs in 10 days Need to keep recent audits online accessible any time and older ones in achieve until deleted Audit write must be guaranteed 20

Analytics/Mining Architecture Analyzing terabytes of data generated out of billion+ events every day Constantly aggregating data across billions of records on a distributed compute grid to analyze and create patterns for operational and strategic decision making Fraud detection Detecting fraud during enrolment Detecting identity fraud scenarios near real-time during authentication Building mining, clustering, learning tools to work on top of billions of events 21

Technology Stack Java application deployed on Linux stack with virtualization Multiple MySQL instances as RDBMS Apache Hadoop(HDFS, Hive, HBase, Pig) stack for large scale compute and distributed storage RabbitMQ(AMQP standard) as messaging framework Drools for rules engine Several other open source libraries All 3 rd party interfaces abstracted through standard API layer (VDM, ABIS, Language Support, etc) 22

Final Thoughts Largest biometric identity system is about 120 million. Scaling needs are unprecedented. Completely built on open standards and open source platforms Scalability, Security, interoperability, and vendor neutrality a must Next generation e-governance applications require cloud based, large data-driven, open platforms Research community support required 23

Thank You! Dr. Pramod K. Varma Chief Architect, UIDAI twitter.com/pramodkvarma pramodkvarma.com 24