TECHNOLOGIES FOR INFORMATION SYSTEMS
INTRODUCTION

Prof. Fabio A. Schreiber, http://home.dei.polimi.it/schreibe/index.html
Prof. Letizia Tanca, http://tanca.dei.polimi.it
Dipartimento di Elettronica e Informazione, Politecnico di Milano

COURSE PROGRAMME (1)
THE LECTURES WILL DEAL WITH THE FOLLOWING TOPICS, NOT NECESSARILY IN THIS ORDER
- Information Systems Architectures
- Heterogeneous Systems Integration
  - model heterogeneity
  - semantic heterogeneity
- Data Analysis Tools
  - Data Warehouse Architectures and Design
  - Data Mining and its Applications
- Non-structured Data Management
  - Information Retrieval Systems
  - text analysis

F. A. Schreiber - L. Tanca, INTRODUCTION

COURSE PROGRAMME (2)
- Semi-structured Data Management
  - meta-models
  - wrappers and mediators
  - search engines
- Embedded IS
  - Introduction to Embedded Databases
  - Main Memory Databases
  - Real Time Databases
  - Introduction to Wireless Sensor Networks
- Time Representation and Management in IS
  - Time Ontology
  - Temporal Databases
CLASS HOURS: 50 lessons; 20 exercises
ESTIMATED PROJECT HOURS: 90

SUPPORT MATERIALS
COURSE WEB SITE: http://home.dei.polimi.it/tesi/index.html
AVAILABLE SUPPORT MATERIALS
- COURSE PROGRAM
- DETAILED LECTURE CALENDAR
- BIBLIOGRAPHICAL REFERENCES
- LECTURE SLIDES
- PROJECT SEMINAR SLIDES
- SOME EXERCISES
Consulting this material is a necessary, but not a sufficient, condition for passing the final examination. Individual study of the suggested bibliography is a must.

EXAMINATIONS
NON TLC STUDENTS
- A SHORT WRITTEN EXERCISE (about 1.5 hrs)
- A COLLOQUY
  A MARK WILL BE GIVEN AFTER THE FIRST TWO STEPS
- A PROJECT
  - PROJECT SEMINARS
  - GUIDED PROJECT SESSIONS
  - PROJECT REPORT AND PRESENTATION
  A MARK WILL BE GIVEN FOR THE PROJECT
THE FINAL MARK IS THE AVERAGE OF THE TWO PARTIAL EVALUATIONS
TLC STUDENTS: A COLLOQUY

EXAMINATIONS
[Flowchart. NON TLC STUDENTS: SHORT WRITTEN EXERCISE (about 1.5 hrs), COLLOQUY, MARK; PROJECT (WITHIN 1 ACADEMIC YEAR), MARK; AVG; REGISTER. TLC STUDENTS: COLLOQUY, MARK, REGISTER. Other labels in the figure: A, E, INS, 30L, and a YES/NO branch on TLC STUDENT.]

FOLLOW UP
MORE TOPICS IN THE AREA OF WIRELESS SENSOR NETWORKS AND OF PERVASIVE, MOBILE, CONTEXT-AWARE INFORMATION SYSTEMS ARE DEALT WITH IN
ADVANCED INFORMATION SYSTEMS B: PERVASIVE INFORMATION SYSTEMS

INFORMATION SYSTEMS BACKGROUND
[Diagram. ORGANIZATION: STRUCTURE AND ORGANISATION ASPECTS, FUNCTIONAL AND PLANNING ASPECTS, PROCEDURES, DATA, LEGAL AND PROFESSIONAL ASPECTS, HUMAN RELATIONS. INFORMATION SYSTEM. COMPUTER SYSTEM. COMPUTER BASED INFORMATION SYSTEM. INFORMATICS TECHNOLOGIES AND METHODOLOGIES: COMPUTER SYSTEMS ARCHITECTURES, OPERATING SYSTEMS, DATABASE MANAGEMENT SYSTEMS, SOFTWARE ENGINEERING.]

COMPLEX INFORMATION SYSTEMS

(PRE)HISTORICAL DEVELOPMENT OF INFORMATION SUPPORTING TECHNOLOGIES
- CLAY (EBLA, 24th CENTURY B.C.)
- PAPYRUS (EGYPT, 12th CENTURY B.C.)
- PARCHMENT (EUROPE, 12th CENTURY A.D.)
- PAPER FILES (17th CENTURY)
- PUNCHED CARDS (1940s)
- MAGNETIC MEDIA (1960s)
[Sample text shown in the figure: "Nos Agilulfus 1 in nostra potestate statuemus ut tibi agrum fertilissi. detur pro 40 tall."]

INFORMATION SYSTEMS AND DATA TECHNOLOGY
[Diagram. Nodes: FILE SYSTEM; QUERY ANSWERING SYSTEM; NATURAL LANGUAGE PROCESSING; INFORMATION RETRIEVAL; DATA BASE SYSTEM; MANAGEMENT INFORMATION SYSTEM; STATISTICS, OPERATIONAL RESEARCH; DECISION SUPPORT SYSTEM; KNOWLEDGE BASED SYSTEMS; MATHEMATICAL LOGIC.]

SYSTEM ARCHITECTURE DEVELOPMENT
[Chart. Axes: INFORMATION SERVICES AND TOOLS, DISTRIBUTION (LOW to HIGH) and PERSONALIZATION (LOW to HIGH). Timeline 1950, 1965, 1980, 1990, 2000: BATCH, RJE, TIME SHARING, WS / SERVER, PC + NETWORK. Eras: MANY USERS PER MACHINE, SINGLE USER MACHINES, MANY MACHINES PER USER.]
From: Cherniack, Franklin, Zdonik

THE ARCHITECTURE OF A MODERN INFORMATION SYSTEM
[Diagram. Elements: SERVER, LOCAL NETWORK, INTERNAL NETWORK, FRONT END, FIREWALL, WIDE BAND DIGITAL NETWORKS, NARROW BAND ANALOG NETWORKS, WIRELESS NETWORKS, REST OF THE WORLD, LOCAL NETWORK, SERVER.]

INFORMATION MANAGEMENT TECHNOLOGIES
- DATA WAREHOUSE, DECISION SUPPORT SYSTEMS, DATA MINING
- INFORMATION SYSTEMS ANALYSIS
- DATA INTEGRATION
- REAL-TIME, MAIN MEMORY, TEMPORAL DATABASES
- DISTRIBUTED HETEROGENEOUS DATA MANAGEMENT
- WEB INFORMATION SYSTEMS
- NON STRUCTURED, SEMISTRUCTURED AND MULTIMEDIA INFORMATION
- EMBEDDED SYSTEMS
- MOBILE AND CONTEXT-AWARE COMPONENTS
- INFORMATION RETRIEVAL SYSTEMS

THE NEW TECHNOLOGICAL ENVIRONMENT (1)
- DISTRIBUTED SYSTEMS ON COMPUTER NETWORKS
  - EASE OF ACCESS, INTEROPERABILITY
- MULTIPROCESSOR, PARALLEL SYSTEMS
  - PERFORMANCE
  - SCALABILITY

THE NEW TECHNOLOGICAL ENVIRONMENT (2)
- NEW TECHNOLOGIES FOR DATA MANAGEMENT
  - HOMOGENEOUS DISTRIBUTED DATABASE
  - DATA WAREHOUSE
- INTERNET CONNECTED SYSTEMS
  - WORLD-WIDE-WEB: THE BEST INFORMATION ACCESS INTERFACE
- HETEROGENEOUS INTERCONNECTED SYSTEMS
- MOBILE COMPONENTS

THE NEW TECHNOLOGICAL ENVIRONMENT (3)
THE MASSIVE SPREAD OF THE INTERNET AFFECTS INFORMATION SYSTEMS IN THAT
- INFORMATION SEARCH AND RETRIEVAL NO LONGER TAKE PLACE IN A SINGLE DATA BANK, BUT IN EVERY NETWORK NODE
- THE INTERNET/WWW ARCHITECTURE IS ALSO USED FOR INTRA/INTER-COMPANY INFORMATION SYSTEMS
  INTRANET + EXTRANET = INTERNET
- WEB TECHNOLOGY AND TRADITIONAL OLTP NEED TO BE INTEGRATED

THE NEW TECHNOLOGICAL ENVIRONMENT (4)
THREE-TIER ARCHITECTURE
- PRESENTATION: BROWSER (Netscape Navigator, Internet Explorer, etc.), CLIENT
- FUNCTIONAL: APPLICATION AND NETWORK MANAGEMENT FUNCTIONS, WEB SERVER
- DATA: DBMS, BACK END
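
The three tiers listed above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the DBMS back end; the table `course`, the functions `open_data_tier`, `find_course` and `render`, and the sample row are hypothetical names chosen for illustration, not part of the course material.

```python
import sqlite3

# Data tier: a DBMS back end (an in-memory SQLite database stands in for it).
def open_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
    conn.execute(
        "INSERT INTO course VALUES ('TIS', 'Technologies for Information Systems')"
    )
    conn.commit()
    return conn

# Functional tier: application logic that would run on the web server.
def find_course(conn, code):
    row = conn.execute(
        "SELECT code, title FROM course WHERE code = ?", (code,)
    ).fetchone()
    if row is None:
        return None
    return {"code": row[0], "title": row[1]}

# Presentation tier: the markup a browser would display.
def render(course):
    if course is None:
        return "<p>Course not found</p>"
    return f"<h1>{course['code']}</h1><p>{course['title']}</p>"

conn = open_data_tier()
page = render(find_course(conn, "TIS"))
print(page)
```

Each tier only talks to the one next to it: the presentation function never issues SQL, and the data tier knows nothing about markup.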

THE NEW TECHNOLOGICAL ENVIRONMENT (5)
INTRANET
- EXTENDS THE USER FRIENDLINESS OF WEB TECHNOLOGY TO THE COMPANY INFORMATION SYSTEM
- INTRANETS MUST EFFECTIVELY INTEGRATE DBMS AND WFMS TECHNOLOGIES, AND OLTP AND OLAP SYSTEMS
- INTRANETS MUST PROVIDE ACCESS SECURITY AGAINST THE EXTERNAL WORLD BY MEANS OF FIREWALLS

THE NEW TECHNOLOGICAL ENVIRONMENT (6)
USE OF MOBILE DEVICES
- TYPE AND POWER OF THE DEVICE (smart cards, cell phones, PDAs, portable PCs, ...)
- OPERATING ENVIRONMENT VARIABILITY (proprietary, intranet/internet, ...)
- ACCURATE AND COHERENT SPATIO/TEMPORAL PERCEPTION OF SERVICE STATE AND QUALITY (QoS)
- MULTI-CHANNEL ACCESS

THE NEW APPLICATION ENVIRONMENT (1)
INTERNET COMMERCIAL APPLICATIONS
[Timeline from <1990 through 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999 to >2000: e-mail, FILE TRANSFER; MARKETING; DECISION SUPPORT; GROUPWARE; COMMERCIAL AND FINANCIAL TRANSACTIONS; WORKFLOW, BUSINESS TO BUSINESS; MULTIMEDIA VIRTUAL COMMERCE.]

THE NEW APPLICATION ENVIRONMENT (2)
AN EXAMPLE: ELECTRONIC COMMERCE
[Chart. Axes: APPLICATION COMPLEXITY and DATA COMPLEXITY. Regions: PRESENTATION SITES, CATALOGUE SITES, SERVICE ORIENTED SITES (hotmail), WEB BASED INFORMATION SYSTEMS, ELECTRONIC COMMERCE. Design activities: WORKFLOW DESIGN, HYPERTEXT DESIGN.]

THE NEW APPLICATION ENVIRONMENT (3)
- A VERY LARGE NUMBER OF LARGE DATA SOURCES
- GENERALLY HIGHLY VARIABLE AND VOLATILE DATA (E.G. WEB)
- HIGHLY HETEROGENEOUS DATA SOURCES
  - DIFFERENT DATA STRUCTURING LEVELS
    - DATABASES WITH DIFFERENT UNDERLYING MODELS (RELATIONAL, OBJECT ORIENTED, LEGACY, ...)
    - SEMI-STRUCTURED DATA (XML, HTML, OTHER TAGGING SYSTEMS, ...)
    - NON-STRUCTURED DATA (TEXT, IMAGE, SOUND, ETC.)
  - DIFFERENT TERMINOLOGIES AND CONTEXTS

THE NEW APPLICATION ENVIRONMENT (4)
CONTEXT AWARENESS
THE CAPABILITY OF A SYSTEM TO IDENTIFY AN ENVIRONMENTAL SITUATION AND TO ADAPT ITSELF TO IT, SO THAT THE APPLICATION REMAINS EFFECTIVE
- POSITION
- TIME
- INTEREST TOPICS
- SOCIAL VARIABLES
- NOISE LEVEL
- PRIVACY CONSTRAINTS

UNIFYING PROBLEMS
- DISTRIBUTION AND SPATIO-TEMPORAL CONTEXT
  - OF DATA SOURCES
  - OF INFORMATION SERVERS
  - OF USERS
- DATA HETEROGENEITY
  - FORMATS
  - SEMANTICS
- POOR STRUCTURING OF BOTH DATA AND INFORMATION

COMMON TECHNIQUES
- WEB AND DATABASES INTERCONNECTION
- WORKFLOW AND DATABASES INTERCONNECTION
- DATA ANALYSIS TECHNIQUES
- DATA MINING
- WEB AND DATA BANKS SEARCH

DATA INTEGRATION: A SOLUTION
- MERGES DATA COMING FROM DIFFERENT SOURCES
- GIVES THE USER A UNIFIED VIEW
  - THE USER NO LONGER NEEDS TO DISCOVER THE DATA SOURCES RELEVANT TO A GIVEN QUERY
  - THE USER NO LONGER NEEDS TO INTERACT WITH EACH DATA SOURCE INDIVIDUALLY
- THE RESULTS OF INDIVIDUAL SUB-QUERIES ARE COMBINED INTO A SINGLE ANSWER

HOW TO DO DATA INTEGRATION
- SCHEMATA INTEGRATION AND DATA FILTERING (OFF-LINE)
- WRAPPERS AND MEDIATORS (ON-LINE)
[Diagram. Labels: SPARSE SOURCES, SCHEMATA INTEGRATION, INTEGRATED SCHEMA, DISTRIBUTED QUERY, CENTRALIZED QUERY, INDIVIDUAL ANSWERS, ANSWERS INTEGRATION, FINAL ANSWER.]
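
The on-line approach with wrappers and a mediator can be sketched as follows. This is a minimal illustration, not the method presented in the course: the two sources, their attribute names, the unified schema (`isbn`, `title`, `price`) and the "keep the cheapest offer" conflict policy are all assumptions made for the example.

```python
import csv
import io

# Two hypothetical sources with different formats and attribute names.
CSV_SOURCE = "isbn,title,price_eur\n111,Databases,40\n222,Data Mining,55\n"
DICT_SOURCE = [
    {"id": "222", "name": "Data Mining", "cost": 50.0},
    {"id": "333", "name": "Temporal Databases", "cost": 35.0},
]

def csv_wrapper(max_price):
    """Wrapper: answers the sub-query on the CSV source in the unified schema."""
    rows = csv.DictReader(io.StringIO(CSV_SOURCE))
    return [
        {"isbn": r["isbn"], "title": r["title"], "price": float(r["price_eur"])}
        for r in rows
        if float(r["price_eur"]) <= max_price
    ]

def dict_wrapper(max_price):
    """Wrapper: same sub-query, different source vocabulary."""
    return [
        {"isbn": r["id"], "title": r["name"], "price": r["cost"]}
        for r in DICT_SOURCE
        if r["cost"] <= max_price
    ]

def mediator(max_price):
    """Mediator: sends the sub-query to each wrapper and merges the answers."""
    merged = {}
    for wrapper in (csv_wrapper, dict_wrapper):
        for book in wrapper(max_price):
            best = merged.get(book["isbn"])
            # Conflict policy (an assumption): keep the cheapest offer per ISBN.
            if best is None or book["price"] < best["price"]:
                merged[book["isbn"]] = book
    return sorted(merged.values(), key=lambda b: b["isbn"])

answer = mediator(50)
for book in answer:
    print(book)
```

The caller sees one query and one answer; which sources exist and how each names its attributes stays hidden behind the wrappers.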

HOW TO DO APPLICATION INTEGRATION
- MIDDLEWARE
- ENTERPRISE APPLICATION INTEGRATION (EAI)
- WEB SERVICES

HOW TO DO APPLICATION INTEGRATION
MIDDLEWARE
- PROVIDES THE PROGRAMMER WITH FUNCTIONALITIES WHICH WOULD OTHERWISE HAVE TO BE BUILT ANEW EACH TIME
- A LARGE SOFTWARE INFRASTRUCTURE IS REQUIRED IN ORDER TO CREATE THESE PROGRAMMING ABSTRACTIONS
  - RPC (Remote Procedure Call) BASED SYSTEMS
  - TP MONITORS
  - OBJECT BROKERS
  - MESSAGE BASED SYSTEMS (asynchronous)
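
The difference between an RPC-style (synchronous) interaction and a message-based (asynchronous) one can be sketched in a single process. This is only an illustration under stated assumptions: a local function `add` stands in for the remote procedure, and two `queue.Queue` objects stand in for the middleware's message channels; no real network or middleware product is involved.

```python
import queue
import threading

def add(a, b):
    """The 'remote' procedure; a local function stands in for the server."""
    return a + b

# Synchronous, RPC-like: the caller blocks until the result comes back.
sync_result = add(2, 3)

# Asynchronous, message-based: the caller enqueues requests and moves on;
# a separate server thread consumes them and posts the replies.
requests = queue.Queue()
replies = queue.Queue()

def server():
    while True:
        message = requests.get()
        if message is None:  # sentinel: stop the server loop
            break
        replies.put((message["id"], add(*message["args"])))

thread = threading.Thread(target=server)
thread.start()
requests.put({"id": 1, "args": (2, 3)})
requests.put({"id": 2, "args": (10, 20)})
requests.put(None)
thread.join()

# Replies are matched to requests through the message id.
async_results = dict(replies.get() for _ in range(2))
print(sync_result, async_results)
```

In the asynchronous case the caller needs a correlation identifier (here `id`) because a reply is no longer tied to the call that produced it.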

HOW TO DO APPLICATION INTEGRATION
ENTERPRISE APPLICATION INTEGRATION (EAI)
- EXTENDS THE MIDDLEWARE CONCEPT FROM THE CREATION OF NEW APPLICATION LOGIC TO THE INTEGRATION OF COMPLEX APPLICATIONS
- USEFUL IN HIGHLY HETEROGENEOUS ENVIRONMENTS, WHERE THE LOWEST LEVEL MECHANISMS MUST BE ADAPTED TO EACH PARTICULAR SYSTEM
- IT MAINLY USES ASYNCHRONOUS COMMUNICATIONS
  - MESSAGE BROKERS
  - PUBLISH/SUBSCRIBE PARADIGM
  - WORKFLOW MANAGEMENT SYSTEMS

HOW TO DO APPLICATION INTEGRATION
WEB SERVICES
- A SERVICE IS A PROCEDURE, A METHOD OR AN OBJECT PROVIDED WITH A STABLE AND PUBLIC INTERFACE WHICH CAN BE INVOKED BY CLIENTS
- A WEB SERVICE IS A SOFTWARE APPLICATION PROVIDED WITH A STABLE AND PUBLIC INTERFACE
- A WEB SERVICE IS NOT A SET OF WEB PAGES
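
The publish/subscribe paradigm named under EAI can be sketched with a minimal in-process broker. The class `MessageBroker`, the topic names and the two inboxes are hypothetical; a real message broker would also deliver asynchronously and usually persist messages, which this sketch does not attempt.

```python
from collections import defaultdict

class MessageBroker:
    """Minimal in-process broker: routes each message to the topic's subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, receives the message.
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)

broker = MessageBroker()
billing_inbox, warehouse_inbox = [], []
broker.subscribe("order.created", billing_inbox.append)
broker.subscribe("order.created", warehouse_inbox.append)

delivered = broker.publish("order.created", {"order_id": 42})
ignored = broker.publish("order.cancelled", {"order_id": 42})
print(delivered, ignored)
```

Adding a third application means one more `subscribe` call; the publishing application is not touched, which is the decoupling that makes the paradigm attractive for integration.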

HOW TO DO APPLICATION INTEGRATION
WEB SERVICES
- A WAY TO EXPOSE THE FUNCTIONALITIES OF AN INFORMATION SYSTEM, MAKING THEM AVAILABLE THROUGH STANDARD WEB TECHNOLOGIES
- A SOFTWARE APPLICATION IDENTIFIED BY A URI
  - ITS INTERFACES AND CONNECTIONS CAN BE DEFINED, DESCRIBED, AND DISCOVERED BY MEANS OF XML COMPONENTS
  - IT SUPPORTS DIRECT INTERACTIONS WITH OTHER AGENTS BY MEANS OF XML MESSAGES EXCHANGED VIA INTERNET PROTOCOLS

WEB SERVICES ARCHITECTURE
[Diagram. COMPANY A (PROVIDER): WEB SERVICE, WEB SERVICE INTERFACE, ACCESS TO THE INTERNAL SYSTEM, MIDDLEWARE, INTERNAL SERVICES (INTERNAL ARCHITECTURE). COMPANY B (PROVIDER) and COMPANY C (PROVIDER): WEB SERVICES. COMPANY D (CLIENT): CLIENT. The connections among companies form the EXTERNAL ARCHITECTURE.]
From: Alonso, Casati, Kuno, Machiraju

WEB SERVICES ARCHITECTURE
- WEB SERVICES ARE USED AS SOPHISTICATED WRAPPERS IN A TIER ABOVE CONVENTIONAL MIDDLEWARE SERVICES
- THE INTERNAL ARCHITECTURE DEFINES THE CONNECTIONS WITH THE LOCAL INFORMATION SYSTEMS
- THE EXTERNAL ARCHITECTURE DEFINES HOW WEB SERVICES CAN RECOGNIZE AND INTERACT WITH EACH OTHER
  - IT REQUIRES THAT DIFFERENT ORGANIZATIONS COOPERATE VIA INTERNET
  - OFTEN THERE IS NO CENTRALIZED CONTROL
  - IT IS BASED ON THE DEFINITION OF WIDELY RECOGNIZED STANDARDS

WEB SERVICES INFRASTRUCTURE
TO IMPLEMENT A SET OF WEB SERVICES WE NEED
- A COMMON SYNTAX FOR SPECIFICATION, PROVIDED BY XML STANDARDS
- A MECHANISM ALLOWING THE INTERACTION AMONG THE INTERESTED SITES
  - A COMMON DATA FORMAT
  - AN AGREEMENT AS TO THE INTERACTION TYPE
    - ASYNCHRONOUS (MESSAGES)
    - SYNCHRONOUS (RPC)
  - A MECHANISM WHICH MAPS MESSAGES ONTO THE TRANSPORT PROTOCOL (TCP/IP, HTTP, SMTP)
  SIMPLE OBJECT ACCESS PROTOCOL (SOAP)
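
How a SOAP message wraps a call in a common XML format and is then mapped onto HTTP can be sketched as below. The envelope namespace is the SOAP 1.1 one; everything else is an assumption for illustration: the service namespace `http://example.org/courses`, the operation `GetCourseTitle`, its parameter `code` and the endpoint URL are hypothetical, and the request object is only built, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/courses"                   # hypothetical service namespace

def build_envelope(operation, params):
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

payload = build_envelope("GetCourseTitle", {"code": "TIS"})

# Mapping onto the transport protocol: the envelope becomes the body of an
# HTTP POST. The endpoint is hypothetical and the request is not sent.
request = urllib.request.Request(
    "http://example.org/soap",
    data=payload,
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{APP_NS}/GetCourseTitle"',
    },
    method="POST",
)
print(payload.decode("utf-8"))
```

The same envelope could be carried by another transport such as SMTP; only the last step, the binding to the protocol, would change.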

WEB SERVICES INFRASTRUCTURE
- A STANDARD DESCRIPTION OF SERVICES AND OF THEIR INTERFACES
  WEB SERVICES DESCRIPTION LANGUAGE (WSDL)
  - WHEN COMPILED INTO AN APPROPRIATE PROGRAMMING LANGUAGE, IT GENERATES THE STUBS AND THE SKELETONS THAT MAKE SERVICE CALLS TRANSPARENT TO THE USER
- A NAMING AND DIRECTORY MANAGEMENT SERVICE
  - A STANDARD WAY TO PUBLISH AND LOCATE SERVICES
  UNIVERSAL DESCRIPTION, DISCOVERY and INTEGRATION (UDDI)

WEB SERVICES INFRASTRUCTURE
[Diagram. REQUESTOR: APPLICATION OBJECT (CLIENT), STUB generated by the WSDL COMPILER (CLIENT SIDE), SOAP MIDDLEWARE. PROVIDER: APPLICATION OBJECT (PROVIDER), SKELETON generated by the WSDL COMPILER (PROVIDER SIDE), SOAP MIDDLEWARE. UDDI REGISTRY: SERVICE DESCRIPTION, SOAP MIDDLEWARE. Flows of SOAP MESSAGES: between requestor and provider, SERVICE SEARCH from the requestor to the registry, SERVICE PUBLICATION from the provider to the registry. The WSDL description of the PROVIDER feeds both compilers.]
From: Alonso, Casati, Kuno, Machiraju
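
The role of stubs and skeletons can be sketched as follows. This is a deliberately simplified illustration, not a WSDL compiler: a Python dictionary (`DESCRIPTION`) stands in for the WSDL document, JSON strings stand in for SOAP messages, and a direct function call replaces the network. All names (`Skeleton`, `make_stub`, `CourseCatalogue`, `get_title`) are hypothetical.

```python
import json

# Hypothetical, heavily simplified stand-in for a WSDL document:
# it only lists operation names and their parameter names.
DESCRIPTION = {"service": "CourseCatalogue", "operations": {"get_title": ["code"]}}

class Skeleton:
    """Provider side: unmarshals a request and invokes the real object."""

    def __init__(self, description, implementation):
        self._ops = description["operations"]
        self._impl = implementation

    def handle(self, raw_request):
        request = json.loads(raw_request)
        name = request["operation"]
        if name not in self._ops:
            return json.dumps({"fault": f"unknown operation {name}"})
        args = [request["params"][p] for p in self._ops[name]]
        return json.dumps({"result": getattr(self._impl, name)(*args)})

def make_stub(description, transport):
    """Client side: generates one local method per described operation."""

    class Stub:
        pass

    def make_method(name, param_names):
        def method(self, *args):
            request = {"operation": name, "params": dict(zip(param_names, args))}
            reply = json.loads(transport(json.dumps(request)))
            if "fault" in reply:
                raise RuntimeError(reply["fault"])
            return reply["result"]
        return method

    for name, param_names in description["operations"].items():
        setattr(Stub, name, make_method(name, param_names))
    return Stub()

class CourseCatalogue:
    """The provider's application object."""

    def get_title(self, code):
        return {"TIS": "Technologies for Information Systems"}.get(code, "unknown")

skeleton = Skeleton(DESCRIPTION, CourseCatalogue())
catalogue = make_stub(DESCRIPTION, skeleton.handle)  # direct call replaces the network
title = catalogue.get_title("TIS")
print(title)
```

To the client, `catalogue.get_title("TIS")` looks like a local call; marshalling, transport and dispatch are hidden in the generated stub and in the skeleton, which is the transparency the slide refers to.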

NEW FRONTIERS
CLOUD COMPUTING: DELIVERING HOSTED SERVICES THROUGH THE INTERNET
- IaaS (Infrastructure as a Service)
  Provides virtual server instances with unique IP addresses and storage on demand (e.g. Amazon Web Services)
- PaaS (Platform as a Service)
  Set of software and development tools hosted on the provider's infrastructure (e.g. GoogleApps)
- SaaS (Software as a Service)
  The service provider supplies both the application and the data. The user operates from a front-end portal

CLOUD COMPUTING
CLOUD STORAGE
DATA IS STORED ON MULTIPLE (THIRD-PARTY) VIRTUAL SERVERS
- MAIN PROS
  - DEVICE INDEPENDENCE
  - RELIABILITY (RESOURCES REPLICATION)
  - SCALABILITY (RESOURCES USED ON DEMAND)
  - MAINTENANCE
- MAIN CONS
  - LOSS OF CONTROL
  - PRIVACY LEAKS