BIG DATA Business Performance
Big Data Integration
Big data is often about doing things that weren't widely possible before, either because the technology was not advanced enough or because the cost was prohibitive. An important aspect of big data is that you often don't need to own all the data you will use: you may be leveraging social media data, data coming from third-party industry statistics, or even data coming from satellites. It therefore becomes necessary to integrate different sources. The problem is that you won't get the business value if you treat this variety of data sources as a set of disconnected silos of information. To solve this, two components are needed: connectors and metadata.
Connectors
We need connectors that enable us to pull data in from various big data sources, such as a Twitter connector or a Facebook connector. We may also need to integrate our data warehouse with a big data source that is off our premises so that we can analyze both sources of data together.
Metadata
Metadata is the definitions, mappings, and other characteristics used to describe how to find, access, and use a company's data (and software) components; it is commonly defined as "data about data." One example is the metadata for an account number, which might include the number, description, data type, name, address, phone number, and privacy level. Software designers have been using metadata for decades under several names, including data dictionary, metadata directory, and, more recently, tags.
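The account-number example above can be made concrete with a small sketch. This is a minimal illustration, not any standard metadata schema; the field names and the privacy-level values are assumptions chosen for the example.

```python
# Hypothetical metadata record describing the "account number" field from the
# text. Keys and values are illustrative, not from any metadata standard.
account_number_metadata = {
    "name": "account_number",
    "description": "Unique identifier for a customer account",
    "data_type": "string",
    "source_system": "CRM",           # assumption: the system that owns the field
    "privacy_level": "confidential",  # drives access-control decisions
}

def is_restricted(meta: dict) -> bool:
    """Return True when the described field needs special access handling."""
    return meta.get("privacy_level") in {"confidential", "restricted"}
```

A metadata directory is essentially a searchable collection of such records, which is what lets tools locate, interpret, and protect data without inspecting the data itself.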
Distributed Computing
Distributed computing is not a new technology concept; it has been around for almost 50 years. Initially, the technology was used in computer science research as a way to scale computing tasks and attack complex problems without the expense of massive computing systems. In some situations, individual computing entities simply pass messages to each other; in others, a distributed computing environment may share resources ranging from memory to networks and storage. All distributed computing models have a common attribute: they are a group of networked computers that work together to execute a workload or process.
The most well-known distributed computing model, the Internet, is the foundation for everything from e-commerce to cloud computing to service management and virtualization. The Internet was conceived as a research project funded by the U.S. DARPA. It was designed to create an interconnecting networking system that would support noncommercial, collaborative research among scientists. In the early days of the Internet, these computers were often connected by telephone lines.
As the technology matured over the next decade, common protocols such as the Transmission Control Protocol (TCP) helped to proliferate the technology and the network. When the Internet Protocol (IP) was added, the project moved from a closed network for a collection of scientists to a potentially commercial platform for transferring e-mail across the globe. Throughout the 1980s, new Internet-based services began to spring up in the market as a commercial alternative to the DARPA network. In 1992, the U.S. Congress passed the Scientific and Advanced-Technology Act, which for the first time allowed commercial use of this powerful networking technology. With its continued explosive growth, the Internet is truly a global distributed network and remains the best example of the power of distributed computing.
Distributed Computing - Internet
Before the commercialization of the Internet, hundreds of companies and organizations were creating software infrastructure intended to provide a common platform to support a highly distributed computing environment. However, each vendor or standards organization came up with its own remote procedure calls (RPCs) that all of its customers, commercial software developers, and partners would have to adopt and support. With vendors implementing proprietary RPCs, it became impractical to imagine that any one company would be able to create a universal standard for distributed computing. By the mid-1990s, the Internet protocols replaced these primitive approaches and became the foundation for distributed computing as it exists today.
Big Data Decision: https://www.youtube.com/watch?v=nxel5f4_aka
Big Data Technology Components
Layer 0: Redundant Physical Infrastructure
Big data implementations have very specific requirements on all elements in the reference architecture, so you need to examine these requirements on a layer-by-layer basis to ensure that your implementation will perform and scale according to the demands of your business.
Performance: Also called latency, often measured end to end, based on a single transaction or query request.
Availability: How long can your business wait in the case of a service interruption or failure?
Scalability: How much disk space is needed today and in the future?
Flexibility: How quickly can you add more resources to the infrastructure?
Cost: Because the infrastructure is a set of components, you might be able to buy the best networking and decide to save money on storage (or vice versa).
As big data is all about high velocity, high volume, and high data variety, the physical infrastructure will literally make or break the implementation. Networks should be redundant and must have enough capacity to accommodate the anticipated volume and velocity of the inbound and outbound data, in addition to the normal network traffic experienced by the business. As you begin making big data an integral part of your computing strategy, it is reasonable to expect volume and velocity to increase. The hardware (storage and server) assets must have sufficient speed and capacity to handle all expected big data capabilities. It is of little use to have a high-speed network with slow servers, because the servers will most likely become a bottleneck.
Layer 1: Security Infrastructure
Security and privacy requirements for big data are similar to the requirements for conventional data environments, and they have to be closely aligned to specific business needs. Some unique challenges arise when big data becomes part of the strategy, which we briefly describe in this list:
Data access: User access to data.
Application access: Application access to data.
Data encryption: Data encryption is the most challenging aspect of security in a big data environment.
Threat detection: The inclusion of mobile devices and social networks exponentially increases both the amount of data and the opportunities for security threats.
Interfaces and APIs (application programming interfaces).
Layer 2: Operational Databases
At the core of any big data environment are the database engines containing the collections of data elements relevant to your business. These engines need to be fast, scalable, and rock solid. They are not all created equal, and certain big data environments will fare better with one engine than another, or more likely with a mix of database engines. No single right choice exists regarding database languages. Although SQL is the most prevalent database query language in use today, other languages such as Python or Java may provide a more effective or efficient way of solving your big data challenges. It is very important to understand what types of data can be manipulated by the database and whether it supports true transactional behavior. Database designers describe this behavior with the acronym ACID, which stands for:
Atomicity: A transaction is all or nothing when it is atomic. If any part of the transaction or the underlying system fails, the entire transaction fails.
Consistency: Only transactions with valid data will be performed on the database. If the data is corrupt or improper, the transaction will not complete and the data will not be written to the database.
Isolation: Multiple simultaneous transactions will not interfere with each other. All valid transactions will execute until completed and in the order they were submitted for processing.
Durability: After the data from the transaction is written to the database, it stays there permanently.
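The ACID properties above can be demonstrated with SQLite, a transactional engine in Python's standard library. This is a minimal sketch under invented table and column names; the overdraft rule is an assumed consistency constraint for the example.

```python
import sqlite3

# In-memory transactional database; "accounts" and its columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds between accounts: both updates succeed or neither does (atomicity)."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # Consistency check: never persist an overdrawn account
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        pass  # the whole transaction was rolled back; the database is unchanged

transfer(conn, 1, 2, 30)   # succeeds: balances become 70 and 80
transfer(conn, 1, 2, 500)  # fails the check: rolled back, balances unchanged
```

Using the connection as a context manager is what makes the pair of updates atomic: a failure anywhere inside the `with` block undoes both.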
Layer 3: Organizing Data Services and Tools
Organizing data services capture, validate, and assemble various big data elements into contextually relevant collections. They are, in reality, an ecosystem of tools and technologies that can be used to gather and assemble data in preparation for further processing:
A distributed file system: Necessary to accommodate the decomposition of data streams and to provide scale and storage capacity.
Serialization services: Necessary for persistent data storage and multilanguage remote procedure calls (RPCs).
Coordination services: Necessary for building distributed applications (locking and so on).
Extract, transform, and load (ETL) tools: Necessary for the loading and conversion of structured and unstructured data into Hadoop.
Workflow services: Necessary for scheduling jobs and providing a structure for synchronizing process elements across layers.
Layer 4: Analytical Data Warehouses
The data warehouse, and its companion the data mart, have long been the primary techniques that organizations use to optimize data to support decision making. A data warehouse assembles data dispersed in various data sources across the enterprise and helps business stakeholders manage their operations by making better informed decisions.
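The extract-transform-load pattern named in Layer 3 can be sketched in miniature. This is a toy run, assuming an invented CSV source and invented table names; real ETL tools operate at far larger scale but follow the same three steps.

```python
import csv
import io
import sqlite3

# Invented raw source: a small CSV with messy whitespace in one field.
raw_csv = "customer,amount\nAlice, 120 \nBob,85\nAlice,40\n"

# Extract: read rows from the source.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: clean whitespace and convert the amount to an integer.
clean = [(r["customer"].strip(), int(r["amount"].strip())) for r in rows]

# Load: write the cleaned rows into a target table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
db.commit()
```

The transform step is where most real ETL effort goes: type conversion, cleansing, and reconciling formats across sources before anything reaches the target store.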
https://www.youtube.com/watch?v=kghby_sales
The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
https://www.youtube.com/watch?v=ogjrtkrozwi
Data warehouses and marts simplify the creation of reports and the visualization of disparate data items. They are generally created from relational databases, multidimensional databases, flat files, and object databases. Because the organization of the data and its readiness for analysis are key, most data warehouse implementations are kept current via batch processing. And because many data warehouses and data marts comprise data gathered from various sources within a company, the costs associated with cleansing and normalizing the data must also be addressed.
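The warehouse-to-mart relationship described above can be sketched as a batch aggregation: the department-level mart is rebuilt from the enterprise fact table, as a nightly job might do. Table and column names are invented for the example.

```python
import sqlite3

# Invented enterprise "fact" table holding sales for all departments.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE fact_sales (department TEXT, amount REAL)")
wh.executemany("INSERT INTO fact_sales VALUES (?, ?)",
               [("marketing", 100.0), ("marketing", 50.0), ("finance", 70.0)])

# Batch refresh: recompute the single-department mart from scratch,
# the way a scheduled warehouse job keeps marts current.
wh.execute("DROP TABLE IF EXISTS mart_marketing")
wh.execute("""CREATE TABLE mart_marketing AS
              SELECT department, SUM(amount) AS total
              FROM fact_sales
              WHERE department = 'marketing'
              GROUP BY department""")
```

The mart holds only the marketing slice, pre-aggregated, which is what makes departmental reporting fast and simple compared with querying the full warehouse.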
Big Data Analytics
Existing analytics tools and techniques will be very helpful in making sense of big data. However, there is a catch: the algorithms that are part of these tools have to be able to work with large amounts of potentially real-time and disparate data. There are three classes of analytics tools that can be used independently or collectively by decision makers to help steer the business:
Reporting and dashboards: These tools provide a user-friendly representation of the information from various sources. Although a mainstay in the traditional data world, this area is still evolving for big data. Some of the tools being used are traditional ones that can now access the new kinds of databases collectively called NoSQL (Not Only SQL).
Visualization: These tools are the next step in the evolution of reporting. The output tends to be highly interactive and dynamic in nature. Another important distinction between reports and visualized output is animation: business users can watch the changes in the data utilizing a variety of visualization techniques, including mind maps, heat maps, infographics, and connection diagrams. Often, reporting and visualization occur at the end of the business activity; although the data may be imported into another tool for further computation or examination, this is the final step.
Analytics and advanced analytics: These tools reach into the data warehouse and process the data for human consumption. Advanced analytics should explicate trends or events that are transformative, unique, or revolutionary to existing business practice. Predictive analytics and sentiment analytics are good examples of this science.
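Predictive analytics, named above as an example of advanced analytics, can be illustrated at its simplest: fit a straight line to a historical series with ordinary least squares and extrapolate one period ahead. The sales figures are invented; real predictive models are far richer, but the idea is the same.

```python
# Invented sales series, one value per period.
sales = [10.0, 12.0, 14.0, 16.0]

# Ordinary least squares fit of sales against the period index.
n = len(sales)
xs = list(range(n))
x_mean = sum(xs) / n
y_mean = sum(sales) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# Extrapolate: predicted value for the next (unseen) period.
forecast = intercept + slope * n
```

The "advanced" part in practice is not the arithmetic but scale and variety: running such models over large, fast-arriving, heterogeneous data, which is the catch noted at the start of this section.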
Big Data Analytics - Dashboard
The fundamental challenge of dashboard design is to display all required information on a single screen, clearly and without distraction, so that it can be assimilated rapidly. A dashboard typically presents three layers of information:
Monitoring
Analysis
Management
https://www.datapine.com/dashboard-examples-and-templates/management
https://www.youtube.com/watch?time_continue=492&v=j-yqmi-gkei
What makes a good dashboard? What to look for:
Uses visual components to highlight data and exceptions requiring intervention
Is transparent to the user, requiring little training and being extremely easy to use
Combines data from a variety of systems into a single, unified enterprise view
Allows drill-down or access to the details of the underlying data sources or other reports
Provides a dynamic vision of the company with timely data
Requires little coding to implement and is easy to deploy and maintain
Benchmarks KPIs against industry standards
Includes contextualization of the data
Has its design validated by an ergonomist
Offers ranking, alerts, and exception handling
Is enriched with business feedback
Presents information at three different levels
Chooses the right visualizations
Provides guided analyses
Business Performance Management (BPM)
BPM is a real-time system that alerts managers to potential opportunities, imminent problems, and threats, and then suggests ways to respond through models and collaboration. BPM refers to the business processes, methodologies, metrics, and technologies used by companies to measure, monitor, and manage performance. BPM has three key components:
A set of integrated, closed-loop analytical and management processes, backed by technology
The tools required for companies to set objectives and to measure and manage performance against them
Methods and tools for monitoring key performance indicators (KPIs), linked to the organizational strategy
The Closed Loop - Process Steps:
Strategize
Plan
Monitor / Analyze
Act / Adjust
Strategic Planning: Where do we want to go?
Common tasks for the strategic planning process:
Conduct an analysis of the current situation
Determine the planning horizon
Conduct an environmental scan
Identify critical success factors
Perform a gap analysis
Create a strategic vision
Develop a business strategy
Identify strategic goals and objectives
Plan: How do we get there?
Operational Planning
An operational plan translates the strategic objectives of an organization into a set of well-defined tactics and initiatives, the resources they require, and the expected results for a certain future period (usually one year). Operational planning can be:
Tactic-centric (operations-oriented)
Budget-centric (financially targeted)
Monitoring / Analysis: How are we performing?
A global framework for performance monitoring must address two key issues:
What to monitor: critical success factors, strategic goals and objectives
How to monitor
Act / Adjust
Success (or simple survival) depends on new projects: creating new products, entering new markets, acquiring new customers (or enterprises), or rationalizing certain processes. Many new projects and businesses fail. What is the risk of failure? By some estimates, 60% of Hollywood films fail and 70% of major IT projects fail.
Business Performance Measurement
A performance measurement system assists managers in monitoring the implementation of the business strategy by comparing actual results against strategic goals and targets. It relies on systematic comparative methods that indicate progress (or the lack thereof) toward the objectives.
Operational KPIs and Metrics
A key performance indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives. Organizations use KPIs at multiple levels to evaluate their success at reaching targets. A KPI pairs a strategic goal with metrics that measure performance against that goal.
Six distinguishing characteristics of KPIs:
Strategy
Objectives (targets)
Variations (ranges)
Encodings
Time
Benchmarks
https://www.klipfolio.com/resources/kpi-examples
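The comparison of actual results against targets that a KPI embodies can be sketched directly. The function names, figures, and traffic-light thresholds below are assumptions made for illustration, not a standard; real dashboards set thresholds per KPI.

```python
def kpi_attainment(actual: float, target: float) -> float:
    """Percentage of the target achieved, the core actual-vs-target comparison."""
    return 100.0 * actual / target

def kpi_status(attainment: float) -> str:
    """Map attainment to a dashboard-style traffic light.
    The 100% / 80% thresholds are invented for this example."""
    if attainment >= 100.0:
        return "green"
    if attainment >= 80.0:
        return "amber"
    return "red"

# Invented figures: quarterly revenue of 900,000 against a 1,000,000 target.
revenue_pct = kpi_attainment(actual=900_000, target=1_000_000)
revenue_light = kpi_status(revenue_pct)
```

Encoding attainment as a color is one of the "encodings" characteristic of KPIs: it turns a raw comparison into a signal a manager can absorb at a glance.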