Big Data The New Era of Data


Ruchita H. Bajaj, Prof. P. L. Ramteke
CS-IT Department, Amravati University, India

Abstract- Every day, we create 2.5 quintillion bytes of data - so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is Big Data. As data scientists, we live in interesting times. Data has been the No. 1 fastest-growing phenomenon on the Internet for the last decade. Big Data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big Data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set.

Keywords- Big Data, petabytes, capture, curate, terabytes.

I. INTRODUCTION

As time goes by, the amount of data increases. But what does it really include? You can say that it includes your internet activities, such as social media posts and photos, but this type of data alone is not enough to create Big Data. What device did you use when you sent a tweet? When did you send it? Where were you? What was the operating system of your device? Which version was it? What other apps were installed? Did you save it as a draft and send it later, or did you send it immediately? All this information, which we call unnecessary, is recorded and stored somewhere, and together it creates Big Data.

Fig 1: We are living in the world of DATA

First of all, it is about collecting information. Whether it is big or not, information has always been the most valuable thing in human history. Companies can use Big Data to analyse consumer behaviour. Scientists can use it to discover new facts. Governments can use it for - well, you can guess. Some people say Big Data is another representation of Big Brother. Who holds this information and how can it be analysed? These are technical questions. I want to talk about why it is related to education. Today's educational technology is based on the internet. Without an internet connection, most of the things we talk about under #edtech become useless. This situation leads us to the fact that students' school life will be recorded in Big Data. Let's include their after-school time by taking their personal devices into consideration, and now you have a full profile of each and every student in Big Data. Here comes a cliché: when students grow up they will be "in the system". Actually, the same thing is true for us too, but fortunately we were not born into this technology. It is not that Big Data is pure evil. If all data about your health could be recorded and analysed, wouldn't it be useful when you try to recover from an illness?

Fig 2: A Decade of Digital Universe Growth, Storage in Exabytes

Big Data is defined as a large amount of data which requires new technologies and architectures to make it possible to extract value from it through capture and analysis. Big Data has emerged because we are living in a society which makes increasing use of data-intensive technologies. Due to the large size of the data, it becomes very difficult to perform effective analysis using existing traditional techniques.
Since Big Data is a recent and emerging technology that can bring huge benefits to business organizations, it is necessary to understand the various challenges and issues involved in adopting it. The Big Data concept refers to datasets that continue to grow so much that they become difficult to manage using existing database management concepts and tools. Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis [1] and visualization.

Big Data is certainly not a measurement, but we should understand how much data is considered "Big". In the table below, the terabyte range is taken as the starting point of what is referred to as Big Data. [2]

TABLE I: DEFINITIONS AND ESTIMATIONS

Definitions:
- Gigabyte: 1024 megabytes
- Terabyte: 1024 gigabytes
- Petabyte: 1024 terabytes
- Exabyte: 1024 petabytes

Estimations:
- 4.7 Gigabytes: a single DVD
- 1 Terabyte: about two years' worth of non-stop MP3s (assuming one megabyte per minute of music)
- 10 Terabytes: the printed collection of the U.S. Library of Congress
- 1 Petabyte: the amount of data stored on a stack of CDs about 2 miles high, or 13 years of HD-TV video
- 20 Petabytes: the storage capacity of all hard disk drives created in ...
- 1 Exabyte: one billion gigabytes
- 5 Exabytes: all words ever spoken by mankind

II. ARCHITECTURE

In this section, we take a closer look at the overall architecture for Big Data.

A. Traditional Information Architecture Capabilities

To understand the high-level architecture aspects of Big Data, let's first review a well-formed logical information architecture for structured data. In the illustration, you see two data sources that use integration (ELT/ETL/Change Data Capture) techniques to transfer data into a DBMS data warehouse or operational data store, which then offers a wide variety of analytical capabilities to reveal the data. Some of these analytic capabilities include: dashboards, reporting, EPM/BI applications, summary and statistical query, semantic interpretations for textual data, and visualization tools for high-density data. In addition, some organizations have applied oversight and standardization across projects, and perhaps have matured the information architecture capability by managing it at the enterprise level.

Fig 3: Traditional Information Architecture Capabilities

The key information architecture principles include treating data as an asset through a value, cost, and risk lens, and ensuring the timeliness, quality, and accuracy of data. The EA oversight responsibility is to establish and maintain a balanced governance approach, including using a center of excellence for standards management and training.

B. Adding Big Data Capabilities

The defining processing capabilities of a big data architecture are to meet the volume, velocity, variety, and value requirements. Unique distributed (multi-node) parallel processing architectures have been created to parse these large data sets. There are differing technology strategies for real-time and batch processing requirements. For real-time processing, key-value data stores such as NoSQL allow for high-performance, index-based retrieval. For batch processing, a technique known as MapReduce filters data according to a specific data discovery strategy. After the filtered data is discovered, it can be analyzed directly, loaded into other unstructured databases, sent to mobile devices, or merged into the traditional data warehousing environment and correlated to structured data.

Fig 4: Big Data Information Architecture Capabilities (from an Oracle white paper)

In addition to new unstructured data realms, there are two key differences for big data. First, due to the size of the data sets, we don't move the raw data directly to a data warehouse. However, after MapReduce processing we may integrate the reduction result into the data warehouse environment so that we can leverage conventional BI reporting, statistical, semantic, and correlation capabilities.
It is ideal to have analytic capabilities that combine a conventional BI platform with big data visualization and query capabilities. Second, to facilitate analysis in the Hadoop environment, sandbox environments can be created. For many use cases, big data needs to capture data that is continuously changing and unpredictable, and to analyze that data a new architecture is needed. In retail, a good example is capturing real-time foot traffic with the intent of delivering in-store promotions. To track the effectiveness of floor displays and promotions, customer movement and behavior must be interactively explored with visualization or query tools. In other use cases, the analysis cannot be complete until you correlate it with other enterprise data - structured data. In the example of consumer sentiment analysis, capturing a positive or negative social media comment has some value, but associating it with your most or least profitable customer makes it far more valuable. So the needed capability with Big Data BI is context and understanding. Using powerful statistical and semantic tools allows you to find the needle in the haystack and helps you predict the future. In summary, the Big Data architecture challenge is to meet the rapid-use and rapid-interpretation requirements while at the same time correlating the data with other enterprise data.
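To make the batch path just described concrete, below is a minimal sketch of the MapReduce idea in plain Python; no Hadoop cluster is assumed, and the social-media records and keyword filter are hypothetical stand-ins for a real data discovery strategy. Only the small reduction result produced at the end is what would be integrated into the data warehouse.

from collections import defaultdict

# Hypothetical raw records: (user_id, free-text post) pairs standing in
# for an unstructured social-media feed.
RAW_POSTS = [
    ("u1", "love the new store layout"),
    ("u2", "checkout queue was terrible today"),
    ("u1", "great in-store promotion on coffee"),
]

def map_phase(records, keywords):
    # Map step: emit (keyword, 1) for every keyword occurrence in a post.
    for user_id, text in records:
        for word in text.lower().split():
            if word in keywords:
                yield word, 1

def reduce_phase(pairs):
    # Reduce step: sum the counts emitted for each keyword.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

if __name__ == "__main__":
    keywords = {"store", "promotion", "queue"}
    reduction_result = reduce_phase(map_phase(RAW_POSTS, keywords))
    # This small reduction result is what moves into the warehouse for
    # conventional BI reporting; the raw posts never do.
    print(reduction_result)

In a real deployment a framework such as Hadoop would distribute the same map and reduce functions across many nodes; only the reduced output would be merged into the warehouse and correlated with structured data.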

What is important is that the key information architecture principles are the same, but the tactics for applying these principles differ. For example, how do we look at big data as an asset? We all agree there is value hiding within massive high-density data sets. But how do we evaluate one set of big data against another? How do we prioritize? The key is to think in terms of the end goal. Focus on the business values and understand how critical they are in supporting business decisions, as well as the potential risks of not knowing the hidden patterns. Another example of applying architecture principles differently is data governance. The quality and accuracy requirements of big data can vary tremendously. Using strict data precision rules on user sentiment data might filter out too much useful information, whereas data standards and common definitions are still going to be critical for fraud detection scenarios. To reiterate, it is important to leverage your core information architecture principles and practices, but apply them in a way that is relevant to big data. In addition, the EA responsibility remains the same for big data: to optimize success, centralize training, and establish standards.

C. An Integrated Information Architecture

One of the obstacles observed in enterprise Hadoop adoption is the lack of integration with the existing BI ecosystem. At present, the traditional BI and big data ecosystems are separate, causing integrated data analysis headaches. As a result, they are not ready for use by the typical business user or executive. Early adopters of big data have often written custom code to move the processed results of big data back into a database, or developed custom solutions to report and analyze on them. These options might not be feasible or economical for enterprise IT. First of all, this creates a proliferation of one-off code and differing standards. Architecture impacts IT economics: Big Data done independently runs the risk of redundant investments. In addition, most businesses simply do not have the staff and skill level for such custom development work. A better option is to incorporate the Big Data results into the existing data warehousing platform. The power of information lies in our ability to make associations and correlations. What we need is the ability to bring different data sources and processing requirements together for timely and valuable analysis. Below is Oracle's holistic capability map that bridges the traditional information architecture and the big data architecture: as various data are captured, they can be stored and processed in a traditional DBMS, simple files, or distributed-clustered systems such as NoSQL and the Hadoop Distributed File System (HDFS). Architecturally, the critical component that breaks down the divide is the integration layer in the middle.

Fig 5: Oracle Integrated Information Architecture Capabilities

This integration layer needs to extend across all of the data types and domains, and bridge the gap between the traditional and new data acquisition and processing frameworks. The data integration capability needs to cover the entire spectrum of velocity and frequency. It needs to handle extreme and ever-growing volume requirements. And it needs to bridge the variety of data structures.
You need to look for technologies that allow you to integrate Hadoop/MapReduce with your data warehouse and transactional data stores in a bi-directional manner. The next layer is where you load the reduction results from Big Data processing into your data warehouse for further analysis. You also need the ability to access your structured data, such as customer profile information, while you process your big data to look for patterns such as fraudulent activity. The Big Data processing output will be loaded into the traditional ODS, data warehouse, and data marts for further analysis, just like the transaction data. The additional component in this layer is the Complex Event Processing engine, used to analyse stream data in real time. The Business Intelligence layer will be equipped with advanced analytics, in-database statistical analysis, and advanced visualization, on top of the traditional components such as reports, dashboards, and queries. Governance, security, and operational management also cover the entire spectrum of the data and information landscape at the enterprise level. With this architecture, the business users do not see a divide. They don't even need to be made aware that there is a difference between traditional transaction data and Big Data. The data and analysis flow is seamless as they navigate through various data and information sets, test hypotheses, analyse patterns, and make informed decisions. [3]
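As a minimal illustration of that integration layer, the following sketch loads a hypothetical Big Data reduction result into a relational store (SQLite is used here purely for portability) and correlates it with structured customer-profile data, mirroring the bi-directional bridge described above. The table names, columns, and figures are invented for the example.

import sqlite3

# Hypothetical reduction result from big data processing:
# per-customer counts of negative social-media mentions.
REDUCTION_RESULT = [("c1", 4), ("c2", 0), ("c3", 7)]

conn = sqlite3.connect(":memory:")

# Structured data already in the warehouse: customer profiles.
conn.execute("CREATE TABLE customer_profile (customer_id TEXT, annual_profit REAL)")
conn.executemany("INSERT INTO customer_profile VALUES (?, ?)",
                 [("c1", 12000.0), ("c2", 300.0), ("c3", 45000.0)])

# Integration step: land the big data output next to the warehouse data.
conn.execute("CREATE TABLE sentiment_summary (customer_id TEXT, negative_mentions INTEGER)")
conn.executemany("INSERT INTO sentiment_summary VALUES (?, ?)", REDUCTION_RESULT)

# Correlate unstructured-data output with structured profile data:
# which of the most profitable customers attract the most negative comments?
rows = conn.execute("""
    SELECT p.customer_id, p.annual_profit, s.negative_mentions
    FROM customer_profile AS p
    JOIN sentiment_summary AS s USING (customer_id)
    ORDER BY p.annual_profit DESC
""").fetchall()

for customer_id, profit, mentions in rows:
    print(customer_id, profit, mentions)

A Complex Event Processing engine or BI tool would sit on top of the same joined view; the point of the sketch is only that once the reduction result lands beside the structured data, ordinary SQL is enough to correlate the two.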

Fig 6: An example of Big Data Architecture [4]

III. BIG DATA ANALYSIS

The analysis of Big Data involves multiple distinct phases, as shown in the figure below, each of which introduces challenges. Many people unfortunately focus just on the analysis/modelling phase: while that phase is crucial, it is of little use without the other phases of the data analysis pipeline. Even in the analysis phase, which has received much attention, there are poorly understood complexities in the context of multi-tenanted clusters where several users' programs run concurrently. Many significant challenges extend beyond the analysis phase. For example, Big Data has to be managed in context, which may be noisy, heterogeneous and not include an upfront model. Doing so raises the need to track provenance and to handle uncertainty and error: topics that are crucial to success, and yet rarely mentioned in the same breath as Big Data. Similarly, the questions put to the data analysis pipeline will typically not all be laid out in advance. We may need to figure out good questions based on the data. Doing this will require smarter systems and also better support for user interaction with the analysis pipeline. In fact, we currently have a major bottleneck in the number of people empowered to ask questions of the data and analyze it [NYT2012]. We can drastically increase this number by supporting many levels of engagement with the data, not all requiring deep database expertise. Solutions to problems such as this will not come from incremental improvements to business as usual, such as industry may make on its own. Rather, they require us to fundamentally rethink how we manage data analysis. Fortunately, existing computational techniques can be applied, either as is or with some extensions, to at least some aspects of the Big Data problem. For example, relational databases rely on the notion of logical data independence: users can think about what they want to compute, while the system (with skilled engineers designing those systems) determines how to compute it efficiently. Similarly, the SQL standard and the relational data model provide a uniform, powerful language to express many query needs and, in principle, allow customers to choose between vendors, increasing competition. The challenge ahead of us is to combine these healthy features of prior systems as we devise novel solutions to the many new challenges of Big Data. In this paper, we consider each of the boxes in the figure above, and discuss both what has already been done and what challenges remain as we seek to exploit Big Data. We begin by considering the five stages in the pipeline, then move on to the five cross-cutting challenges, and end with a discussion of the architecture of the overall system that combines all these functions.

Fig 7: Phases in the Processing Pipeline

A. Data Acquisition and Recording

Big Data does not arise out of a vacuum: it is recorded from some data-generating source. For example, consider our ability to sense and observe the world around us, from the heart rate of an elderly citizen and the presence of toxins in the air we breathe, to the planned Square Kilometre Array telescope, which will produce up to 1 million terabytes of raw data per day. Similarly, scientific experiments and simulations can easily produce petabytes of data today. Much of this data is of no interest, and it can be filtered and compressed by orders of magnitude. One challenge is to define these filters in such a way that they do not discard useful information.
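A minimal sketch of such an on-the-fly filter appears below: readings are reduced as they stream in, and only values that deviate strongly from the running statistics are kept. The sensor stream and threshold are hypothetical, and the prose example that follows highlights exactly the risk such a filter runs: an apparent fault may in fact be an artifact that deserves attention.

def stream_filter(readings, threshold=3.0):
    # Keep only readings that deviate strongly from what has been seen so far.
    # The running mean and variance are updated incrementally (Welford's
    # method), so nothing is stored first and reduced afterward.
    count, mean, m2 = 0, 0.0, 0.0
    for value in readings:
        if count > 2:
            std = (m2 / (count - 1)) ** 0.5
            if std > 0 and abs(value - mean) > threshold * std:
                yield value  # pass the unusual reading downstream
        # Update the running statistics with the new reading.
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)

# Hypothetical traffic-sensor stream: mostly steady, with one large spike.
sensor_stream = [42, 41, 43, 42, 40, 44, 43, 120, 42, 41]
print(list(stream_filter(sensor_stream)))   # only the spike survives the reduction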
For example, suppose one sensor reading differs substantially from the rest: it is likely to be due to the sensor being faulty, but how can we be sure that it is not an artifact that deserves attention? In addition, the data collected by these sensors is most often spatially and temporally correlated (e.g., traffic sensors on the same road segment). We need research into the science of data reduction that can intelligently process this raw data to a size that its users can handle while not missing the needle in the haystack. Furthermore, we require on-line analysis techniques that can process such streaming data on the fly, since we cannot afford to store first and reduce afterward. The second big challenge is to automatically generate the right metadata to describe what data is recorded and how it is recorded and measured. For example, in scientific experiments, considerable detail regarding specific experimental conditions and procedures may be required to be able to interpret the results correctly, and it is important that such metadata be recorded with the observational data.

Metadata acquisition systems can minimize the human burden of recording metadata. Another important issue here is data provenance. Recording information about the data at its birth is not useful unless this information can be interpreted and carried along through the data analysis pipeline. For example, a processing error at one step can render subsequent analysis useless; with suitable provenance, we can easily identify all subsequent processing that depends on this step. Thus we need research both into generating suitable metadata and into data systems that carry the provenance of data and its metadata through data analysis pipelines.

B. Information Extraction and Cleaning

Frequently, the information collected will not be in a format ready for analysis. For example, consider the collection of electronic health records in a hospital, comprising transcribed dictations from several physicians, structured data from sensors and measurements (possibly with some associated uncertainty), and image data such as x-rays. We cannot leave the data in this form and still effectively analyze it. Rather, we require an information extraction process that pulls out the required information from the underlying sources and expresses it in a structured form suitable for analysis. Doing this correctly and completely is a continuing technical challenge. Note that this data also includes images and will in the future include video; such extraction is often highly application dependent (e.g., what you want to pull out of an MRI is very different from what you would pull out of a picture of the stars, or a surveillance photo). In addition, due to the ubiquity of surveillance cameras and the popularity of GPS-enabled mobile phones, cameras, and other portable devices, rich and high-fidelity location and trajectory (i.e., movement in space) data can also be extracted. We are used to thinking of Big Data as always telling us the truth, but this is actually far from reality. For example, patients may choose to hide risky behavior and caregivers may sometimes mis-diagnose a condition; patients may also inaccurately recall the name of a drug, or even that they ever took it, leading to missing information in (the history portion of) their medical record. Existing work on data cleaning assumes well-recognized constraints on valid data or well-understood error models; for many emerging Big Data domains these do not exist.

C. Data Integration, Aggregation, and Representation

Given the heterogeneity of the flood of data, it is not enough merely to record it and throw it into a repository. Consider, for example, data from a range of scientific experiments. If we just have a bunch of data sets in a repository, it is unlikely anyone will ever be able to find, let alone reuse, any of this data. With adequate metadata there is some hope, but even so, challenges will remain due to differences in experimental details and in data record structure. Data analysis is considerably more challenging than simply locating, identifying, understanding, and citing data. For effective large-scale analysis, all of this has to happen in a completely automated manner. This requires differences in data structure and semantics to be expressed in forms that are computer understandable, and then robotically resolvable. There is a strong body of work in data integration that can provide some of the answers.
However, considerable additional work is required to achieve automated error-free difference resolution. Even for simpler analyses that depend on only one data set, there remains an important question of suitable database design. Usually, there will be many alternative ways in which to store the same information. Certain designs will have advantages over others for certain purposes, and possibly drawbacks for other purposes. Witness, for instance, the tremendous variety in the structure of bioinformatics databases with information regarding substantially similar entities, such as genes. Database design is today an art, and is carefully executed in the enterprise context by highly paid professionals. We must enable other professionals, such as domain scientists, to create effective database designs, either by devising tools to assist them in the design process or by forgoing the design process completely and developing techniques so that databases can be used effectively in the absence of intelligent database design.

D. Query Processing, Data Modeling, and Analysis

Methods for querying and mining Big Data are fundamentally different from traditional statistical analysis on small samples. Big Data is often noisy, dynamic, heterogeneous, inter-related and untrustworthy. Nevertheless, even noisy Big Data can be more valuable than tiny samples, because general statistics obtained from frequent patterns and correlation analysis usually overpower individual fluctuations and often disclose more reliable hidden patterns and knowledge. Further, interconnected Big Data forms large heterogeneous information networks, with which information redundancy can be explored to compensate for missing data, to cross-check conflicting cases, to validate trustworthy relationships, to disclose inherent clusters, and to uncover hidden relationships and models. Mining requires integrated, cleaned, trustworthy, and efficiently accessible data; declarative query and mining interfaces; scalable mining algorithms; and big-data computing environments. At the same time, data mining itself can also be used to help improve the quality and trustworthiness of the data, understand its semantics, and provide intelligent querying functions. As noted previously, real-life medical records have errors, are heterogeneous, and are frequently distributed across multiple systems. The value of Big Data analysis in health care, to take just one example application domain, can only be realized if it can be applied robustly under these difficult conditions. On the flip side, knowledge developed from data can help in correcting errors and removing ambiguity. For example, a physician may write "DVT" as the diagnosis for a patient. This abbreviation is commonly used for both deep vein thrombosis and diverticulitis, two very different medical conditions. A knowledge base constructed from related data can use associated symptoms or medications to determine which of the two the physician meant.

Big Data is also enabling the next generation of interactive data analysis with real-time answers. In the future, queries over Big Data will be automatically generated for content creation on websites, to populate hot-lists or recommendations, and to provide an ad hoc analysis of the value of a data set, to decide whether to store it or to discard it. Scaling complex query processing techniques to terabytes while enabling interactive response times is a major open research problem today. A problem with current Big Data analysis is the lack of coordination between database systems, which host the data and provide SQL querying, and analytics packages that perform various forms of non-SQL processing, such as data mining and statistical analyses. Today's analysts are impeded by a tedious process of exporting data from the database, performing a non-SQL process, and bringing the data back. This is an obstacle to carrying over the interactive elegance of the first generation of SQL-driven OLAP systems into the data-mining style of analysis that is in increasing demand. A tight coupling between declarative query languages and the functions of such packages will benefit both the expressiveness and the performance of the analysis.

E. Interpretation

Having the ability to analyze Big Data is of limited value if users cannot understand the analysis. Ultimately, a decision-maker, provided with the result of an analysis, has to interpret these results. This interpretation cannot happen in a vacuum. Usually, it involves examining all the assumptions made and retracing the analysis. Furthermore, as we saw above, there are many possible sources of error: computer systems can have bugs, models almost always have assumptions, and results can be based on erroneous data. For all of these reasons, no responsible user will cede authority to the computer system. Rather, she will try to understand, and verify, the results produced by the computer. The computer system must make it easy for her to do so. This is particularly a challenge with Big Data due to its complexity. There are often crucial assumptions behind the data recorded. Analytical pipelines can involve multiple steps, again with assumptions built in. The recent mortgage-related shock to the financial system dramatically underscored the need for such decision-maker diligence: rather than accept the stated solvency of a financial institution at face value, a decision-maker has to examine critically the many assumptions at multiple stages of analysis. In short, it is rarely enough to provide just the results. Rather, one must provide supplementary information that explains how each result was derived, and based upon precisely what inputs. Such supplementary information is called the provenance of the (result) data. By studying how best to capture, store, and query provenance, in conjunction with techniques to capture adequate metadata, we can create an infrastructure to provide users with the ability both to interpret the analytical results obtained and to repeat the analysis with different assumptions, parameters, or data sets. Furthermore, with a few clicks the user should be able to drill down into each piece of data that she sees and understand its provenance, which is a key feature for understanding the data [5]. That is, users need to be able to see not just the results, but also understand why they are seeing those results.
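A minimal sketch of the kind of provenance capture argued for here: each pipeline step wraps its output with a record of the operation, parameters, and inputs that produced it, so a user can later drill down from a result to how it was derived. The step names, parameters, and data are illustrative only.

from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Provenance:
    # What produced a value: the step, its parameters, and its inputs.
    step: str
    params: dict
    inputs: List["TrackedValue"] = field(default_factory=list)

@dataclass
class TrackedValue:
    value: Any
    provenance: Provenance

def apply_step(step_name, func, inputs, **params):
    # Run a pipeline step and attach provenance to its result.
    raw = func([t.value for t in inputs], **params)
    return TrackedValue(raw, Provenance(step_name, params, list(inputs)))

def explain(tracked, depth=0):
    # Drill down: print how a result was derived, step by step.
    pad = "  " * depth
    print(pad + tracked.provenance.step + " params=" + str(tracked.provenance.params))
    for parent in tracked.provenance.inputs:
        explain(parent, depth + 1)

def drop_outliers(inputs, limit):
    # Single-input cleaning step: keep readings below the limit.
    return [x for x in inputs[0] if x < limit]

def mean(inputs):
    values = inputs[0]
    return sum(values) / len(values)

source = TrackedValue([42, 41, 120, 43], Provenance("ingest", {"sensor": "road-7"}))
cleaned = apply_step("drop_outliers", drop_outliers, [source], limit=100)
result = apply_step("mean", mean, [cleaned])
print(result.value)   # the analytical result
explain(result)       # ...and the derivation behind it, down to the raw ingest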
However, raw provenance, particularly regarding the phases in the analytics pipeline, is likely to be too technical for many users to grasp completely. One alternative is to enable the users to play with the steps in the analysis: make small changes to the pipeline, for example, or modify values for some parameters. The users can then view the results of these incremental changes. By these means, users can develop an intuitive feeling for the analysis and also verify that it performs as expected in corner cases. Accomplishing this requires the system to provide convenient facilities for the user to specify analyses.

IV. DIMENSIONS

The convergence of these four dimensions helps both to define and to distinguish big data:

A. Volume

The amount of data. Perhaps the characteristic most associated with big data, volume refers to the mass quantities of data that organizations are trying to harness to improve decision-making across the enterprise. Data volumes continue to increase at an unprecedented rate. However, what constitutes truly high volume varies by industry and even geography, and is often smaller than the petabytes and zettabytes frequently referenced. Just over half of respondents consider datasets between one terabyte and one petabyte to be big data, while another 30 percent simply didn't know how big "big" is for their organization. Still, all can agree that whatever is considered high volume today will be even higher tomorrow.

B. Variety

Different types of data and data sources. Variety is about managing the complexity of multiple data types, including structured, semi-structured and unstructured data. Organizations need to integrate and analyse data from a complex array of both traditional and non-traditional information sources, from within and outside the enterprise. With the explosion of sensors, smart devices and social collaboration technologies, data is being generated in countless forms, including text, web data, tweets, sensor data, audio, video, click streams, log files and more.

C. Velocity

Data in motion. The speed at which data is created, processed and analyzed continues to accelerate. Contributing to higher velocity is the real-time nature of data creation, as well as the need to incorporate streaming data into business processes and decision making. Velocity impacts latency - the lag time between when data is created or captured and when it is accessible. Today, data is continually being generated at a pace that is impossible for traditional systems to capture, store and analyse. For time-sensitive processes such as real-time fraud detection or multi-channel instant marketing, certain types of data must be analysed in real time to be of value to the business.

D. Veracity

Data uncertainty. Veracity refers to the level of reliability associated with certain types of data. Striving for high data quality is an important big data requirement and challenge, but even the best data cleansing methods cannot remove the inherent unpredictability of some data, like the weather, the economy, or a customer's actual future buying decisions. The need to acknowledge and plan for uncertainty is a dimension of big data that has been introduced as executives seek to better understand the uncertain world around them (see sidebar, "Veracity, the fourth V").[6]

Fig 8: Big Data in Dimensions

V. BIG DATA CHARACTERISTICS

A. Data Volume

The "Big" word in Big Data itself defines the volume. At present the existing data is in petabytes (10^15 bytes) and is expected to increase to zettabytes (10^21 bytes) in the near future. Data volume measures the amount of data available to an organization, which does not necessarily have to own all of it as long as it can access it. [7]

B. Data Velocity

Velocity in Big Data is a concept which deals with the speed of the data coming from various sources. This characteristic is not limited to the speed of incoming data but also covers the speed at which the data flows and is aggregated.

C. Data Variety

Data variety is a measure of the richness of the data representation: text, images, video, audio, etc. The data being produced is not of a single category, as it includes not only traditional data but also semi-structured data from various resources like web pages, web log files, social media sites, e-mail, and documents.

D. Data Value

Data value measures the usefulness of data in making decisions. Data science is exploratory and useful in getting to know the data, but analytic science encompasses the predictive power of big data. Users can run queries against the stored data, deduce important results from the filtered data obtained, and rank them according to the dimensions they require. These reports help people find the business trends according to which they can change their strategies.

Fig 9: Era of Big Data

E. Complexity

Complexity measures the degree of interconnectedness (possibly very large) and interdependence in big data structures, such that a small change (or combination of small changes) in one or a few elements can ripple across or cascade through the system and substantially affect its behavior, or produce no change at all (Katal, Wazid, & Goudar, 2013).

VI. CHALLENGES IN BIG DATA

The challenges in Big Data are the real implementation hurdles which require immediate attention. Any implementation that does not handle these challenges may lead to the failure of the technology implementation and some unpleasant results.

A. Privacy and Security

This is the most important challenge with Big Data; it is sensitive and has conceptual, technical as well as legal significance. The personal information of a person (e.g. in the database of a merchant or social networking website), when combined with external large data sets, leads to the inference of new facts about that person, and it is possible that these kinds of facts are secretive and the person might not want the data owner, or any other person, to know about them. Information regarding people is collected and used in order to add value to the business of the organization. This is done by creating insights into their lives of which they are unaware.
Another important consequence would be social stratification, where a data-literate person would take advantage of Big Data predictive analysis, while on the other hand the underprivileged would be easily identified and treated worse. Big Data used by law enforcement will increase the chances of certain tagged people suffering adverse consequences, without the ability to fight back or even the knowledge that they are being discriminated against.

B. Data Access and Sharing of Information

If the data in a company's information systems is to be used to make accurate decisions in time, it becomes necessary that it be available in an accurate, complete and timely manner. This makes the data management and governance process a bit complex, adding the necessity to make data open and available to government agencies in a standardized manner, with standardized APIs, metadata and formats, thus leading to better decision making, business intelligence and productivity improvements.

Expecting companies to share data between themselves is awkward because of the need to get an edge in business: sharing data about their clients and operations threatens the culture of secrecy and competitiveness.

C. Analytical Challenges

The main challenging questions are: What if the data volume gets so large and varied that it is not known how to deal with it? Does all the data need to be stored? Does all the data need to be analyzed? How do we find out which data points are really important? How can the data be used to best advantage? Big Data brings with it some huge analytical challenges. The type of analysis to be done on this huge amount of data, which can be unstructured, semi-structured or structured, requires a large number of advanced skills. Moreover, the type of analysis needed depends highly on the results to be obtained, i.e. decision making. This can be done using one of two techniques: either incorporate massive data volumes in the analysis, or determine upfront which Big Data is relevant.

D. Human Resources and Manpower

Since Big Data is in its youth as an emerging technology, it needs to attract organizations and young people with diverse new skill sets. These skills should not be limited to technical ones but should also extend to research, analytical, interpretive and creative ones. These skills need to be developed in individuals, which requires training programs to be held by organizations. Moreover, universities need to introduce curricula on Big Data to produce skilled employees with this expertise.

E. Technical Challenges

1) Fault Tolerance: With the arrival of new technologies like cloud computing and Big Data, the intent is that whenever a failure occurs, the damage done should be within an acceptable threshold rather than requiring the whole task to begin again from scratch. Fault-tolerant computing is extremely hard, involving intricate algorithms. It is simply not possible to devise absolutely foolproof, 100% reliable fault-tolerant machines or software. Thus the main task is to reduce the probability of failure to an acceptable level. Unfortunately, the more we strive to reduce this probability, the higher the cost. Two methods which seem to increase fault tolerance in Big Data are: first, divide the whole computation into tasks and assign these tasks to different nodes for computation; second, assign one node the work of observing that these nodes are working properly, and restart a particular task if something goes wrong. But sometimes it is quite possible that the whole computation cannot be divided into such independent tasks: some tasks may be recursive in nature, where the output of the previous computation is the input to the next. Restarting the whole computation then becomes a cumbersome process. This can be avoided by applying checkpoints, which record the state of the system at certain intervals of time. In case of any failure, the computation can restart from the last checkpoint maintained. (A minimal sketch of this idea appears at the end of this section.)

2) Scalability: The scalability issue of Big Data has led towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters.
This requires a high level of sharing of resources, which is expensive and also brings with it various challenges, such as how to run and execute various jobs so that the goal of each workload can be met cost-effectively. It also requires dealing with system failures in an efficient manner, which occur more frequently when operating on large clusters. These factors combined raise the concern of how to express programs, even complex machine learning tasks. There has also been a huge shift in the technologies being used: hard disk drives (HDDs) are being replaced by solid-state drives and phase-change memory, which do not show the same performance difference between sequential and random data transfer. Thus, what kinds of storage devices to use is again a big question for data storage.

3) Quality of Data: Collecting a huge amount of data and storing it comes at a cost. More data, if used for decision making or for predictive analysis in business, will definitely lead to better results. Business leaders will always want more and more data storage, whereas IT leaders will consider all the technical aspects before storing all the data. Big Data basically focuses on quality data storage rather than having very large amounts of irrelevant data, so that better results and conclusions can be drawn. This further leads to questions such as how to ensure which data is relevant, how much data is enough for decision making, and whether the stored data is accurate enough to draw conclusions from (Katal, Wazid, & Goudar, 2013).

4) Heterogeneous Data: Unstructured data represents almost every kind of data being produced, from social media interactions, to recorded meetings, to PDF documents, fax transfers, e-mails and more. Working with unstructured data is cumbersome and of course costly too. Converting all this unstructured data into structured form is also not feasible. Structured data is always organized in a highly mechanized and manageable way and integrates well with databases, whereas unstructured data is completely raw and unorganized.
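Returning to the checkpointing idea from VI.E.1, the sketch below saves the state of a long, step-dependent computation to disk at intervals, so that a restart resumes from the last checkpoint rather than from scratch. The file name, interval, and workload are all illustrative.

import json
import os

CHECKPOINT_FILE = "checkpoint.json"   # illustrative location
CHECKPOINT_EVERY = 1000               # save state every N iterations

def load_checkpoint():
    # Resume from the last saved state, if any.
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"i": 0, "acc": 0}

def save_checkpoint(state):
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)

def run(total_iterations=10_000):
    # Each step depends on the previous one, so without checkpoints a
    # failure would force the whole computation to start again from zero.
    state = load_checkpoint()
    for i in range(state["i"], total_iterations):
        state["acc"] = (state["acc"] + i) % 1_000_003
        state["i"] = i + 1
        if state["i"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(state)
    save_checkpoint(state)
    return state["acc"]

if __name__ == "__main__":
    print(run())

If the process is killed partway through and started again, load_checkpoint picks up from the last saved iteration instead of iteration zero.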

VII. ISSUES IN BIG DATA

The issues in Big Data are some of the conceptual points that should be understood by an organization in order to implement the technology effectively. Big Data issues should not be confused with problems, but they are important to know and crucial to handle.

A. Issues Related to the Characteristics

Data Volume: As data volume increases, the value of different data records will decrease in proportion to age, type, richness, and quantity, among other factors. Existing social networking sites themselves produce data on the order of terabytes every day, and this amount of data is definitely difficult to handle using existing traditional systems. [7]

Data Velocity: Our traditional systems are not capable of performing analytics on data which is constantly in motion. E-commerce has rapidly increased the speed and richness of data used for different business transactions (for example, web-site clicks). Data velocity management is much more than a bandwidth issue.

Data Variety: All this data is totally different, consisting of raw, structured, semi-structured and even unstructured data, which is difficult to handle with existing traditional analytic systems. From an analytic perspective, variety is probably the biggest obstacle to effectively using large volumes of data. Incompatible data formats, non-aligned data structures, and inconsistent data semantics represent significant challenges that can lead to analytic sprawl.

Fig 10: Explosion in size of Data (Hewlett-Packard Development Company, 2012)

Data Value: The data stored by different organizations is used by them for data analytics. This produces a kind of gap between the business leaders and the IT professionals: the main concern of business leaders is simply adding value to their business and gaining more and more profit, unlike the IT leaders, who have to be concerned with the technicalities of storage and processing.

Data Complexity: One current difficulty of big data is that working with it using relational databases and desktop statistics/visualization packages requires massively parallel software running on tens, hundreds, or even thousands of servers. It is quite an undertaking to link, match, cleanse and transform data across systems coming from various sources. It is also necessary to connect and correlate relationships, hierarchies and multiple data linkages, or data can quickly spiral out of control (Katal, Wazid, & Goudar, 2013).[8]

B. Storage and Transport Issues

The quantity of data has exploded each time we have invented a new storage medium. What is different about the most recent data explosion, mainly due to social media, is that there has been no new storage medium. Moreover, data is now being created by everyone and everything (from mobile devices to supercomputers), not just, as heretofore, by professionals such as scientists, journalists, writers, etc. Current disk technology limits are about 4 terabytes (10^12 bytes) per disk, so 1 Exabyte (10^18 bytes) would require 25,000 disks. Even if an Exabyte of data could be processed on a single computer system, the system would be unable to directly attach the requisite number of disks. Access to that data would overwhelm current communication networks. Assuming that a 1 gigabyte per second network has an effective sustainable transfer rate of 80%, the sustainable bandwidth is about 100 megabytes per second. Thus, transferring an Exabyte would take about 2800 hours, if we assume that a sustained transfer could be maintained.
It would take longer to transmit the data from a collection or storage point to a processing point than to actually process it. To handle this issue, the data should be processed in place, transmitting only the resulting information. In other words, bring the code to the data, unlike the traditional method of bringing the data to the code (Kaisler, Armour, Espinosa, & Money, 2013).

C. Data Management Issues

Data management will perhaps be the most difficult problem to address with big data. Resolving issues of access, utilization, updating, governance, and reference (in publications) has proven to be a major stumbling block. The sources of the data are varied - by size, by format, and by method of collection. Individuals contribute digital data in mediums comfortable to them, like documents, drawings, pictures, sound and video recordings, models, software behaviors, user interface designs, etc., with or without adequate metadata describing what, when, where, who, why and how it was collected, and its provenance. Unlike the collection of data by manual methods, where rigorous protocols are often followed in order to ensure accuracy and validity, digital data collection is much more relaxed. Given the volume, it is impractical to validate every data item, so new approaches to data qualification and validation are needed. The richness of digital data representation prohibits a personalized methodology for data collection. To summarize, there is no perfect big data management solution yet. This represents an important gap in the research literature on big data that needs to be filled.

D. Processing Issues

Assume that an Exabyte of data needs to be processed in its entirety. For simplicity, assume the data is chunked into blocks of 8 words, so 1 Exabyte = 1K petabytes. [9]

Assuming a processor expends 100 instructions on one block at 5 gigahertz, the time required for end-to-end processing of one block would be 20 nanoseconds. To process 1K petabytes would require a total end-to-end processing time of roughly 635 years. Thus, effective processing of an Exabyte of data will require extensive parallel processing and new analytics algorithms in order to provide timely and actionable information (Kaisler, Armour, Espinosa, & Money, 2013).

VIII. SECURITY: A BIG QUESTION FOR BIG DATA

Big security for big data. We are children of the information generation. No longer tied to large mainframe computers, we now access information via applications, mobile devices, and laptops to make decisions based on real-time data. It is because information is so pervasive that businesses want to capture this data and analyze it for intelligence.

A. Data Explosion

The multitude of devices, users, and generated traffic all combine to create a proliferation of data that is being created with incredible volume, velocity, and variety. As a result, organizations need a way to protect, utilize, and gain real-time insight from big data. This intelligence is not only valuable to businesses and consumers, but also to hackers. Robust information marketplaces have arisen for hackers to sell credit card information, account usernames, passwords, national secrets (WikiLeaks), as well as intellectual property. How does anyone keep secrets anymore? How does anyone keep secrets protected from hackers? In the past, when the network infrastructure was straightforward and perimeters used to exist, controlling access to data was much simpler. If your secrets rested within the company network, all you had to do to keep the data safe was to make sure you had a strong firewall in place. However, as data became available through the Internet, mobile devices, and the cloud, having a firewall was not enough. Companies tried to solve each security problem in a piecemeal manner, tacking on more security devices like patching holes in a wall. But because these products did not interoperate, you could not coordinate a defense against hackers. In order to meet the current security problems faced by organizations, a paradigm shift needs to occur. Businesses need the ability to secure data, collect it, and aggregate it into an intelligent format, so that real-time alerting and reporting can take place. The first step is to establish complete visibility, so that your data, and who accesses it, can be monitored. Next, you need to understand the context, so that you can focus on the valued assets which are critical to your business. Finally, utilize the intelligence gathered so that you can harden your attack surface and stop attacks before the data is exfiltrated. So, how do we get started?

B. Data Collection

Your first job is to aggregate all the information from every device into one place. This means collecting information from cloud, virtual, and physical appliances: network devices, applications, servers, databases, desktops, and security devices. With Software-as-a-Service (SaaS) applications deployed in the cloud, it is important to collect logs from those applications as well, since data stored in the cloud can contain information spanning from human resource management to customer information. Collecting this information gives you visibility into who is accessing your company's information, what information they are accessing, and when this access is occurring.
The goal is to capture usage patterns and look for signs of malicious behavior. Typically, data theft proceeds in five stages. First, hackers research their target in order to find a way to enter the network. After infiltrating the network, they may install an agent to lie dormant and gather information until they discover where the payload is hosted and how to acquire it. Once the target is captured, the next step is to exfiltrate the information out of the network. Most advanced attacks progress through these five stages, and having this understanding helps you look for clues on whether an attack is taking place in your environment, and how to stop the attacker from reaching their target. The key to determining what logs to collect is to focus on records where an actor is accessing information or systems.

C. Data Integration

Once the machine data is collected, it needs to be parsed to derive intelligence from cryptic log messages. Automation and rule-based processing are needed, because having a person review logs manually would make the problem of finding an attacker quite difficult, since the security analyst would need to manually separate attacks from logs of normal behaviour. The solution is to normalize machine logs so that queries can pull context-aware information from log data. For example, HP ArcSight connectors normalize and categorize log data into over 400 meta fields. Logs that have been normalized become more useful because you no longer need an expert on a particular device to interpret the log. By enriching logs with metadata, you can turn strings of text into information that can be indexed and searched.

D. Data Analytics

Normalized logs are indexed and categorized to make it easy for a correlation engine to process them and identify patterns based on heuristics and security rules. It is here that the art of combining logs from multiple sources and correlating events together helps to create real-time alerts. This preprocessing also speeds up correlation and makes event logs vendor-agnostic, which gives analysts the ability to build reports and filters with simple English queries. In real time versus after the fact: catching a hacker and being able to stop them as the attack is taking place is more useful to a company than using forensics to piece together an attack that already took place. [10] However, in order to have that as part of your arsenal, we have to resolve four problems: How do you insert data faster into your data store? How do you store all this data? How do you quickly process events? How do you return results faster?
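A minimal sketch of the normalization and rule-based correlation steps described above: raw, vendor-specific log lines are parsed into a handful of common meta fields, and a simple rule flags repeated failed logins followed by a success for the same user. The log formats, field names, and rule threshold are invented for the illustration and are far simpler than a product such as HP ArcSight.

import re
from collections import defaultdict

# Hypothetical raw logs from two different devices, in two different formats.
RAW_LOGS = [
    "2024-05-01T10:00:01 fw01 LOGIN_FAIL user=alice src=10.0.0.5",
    "2024-05-01T10:00:03 fw01 LOGIN_FAIL user=alice src=10.0.0.5",
    "2024-05-01T10:00:05 fw01 LOGIN_FAIL user=alice src=10.0.0.5",
    "May  1 10:00:09 vpn02: accepted login for alice from 10.0.0.5",
]

FW_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<device>\S+) (?P<event>LOGIN_\w+) user=(?P<user>\S+) src=(?P<src>\S+)$")
VPN_PATTERN = re.compile(
    r"^(?P<ts>\w+ +\d+ [\d:]+) (?P<device>\S+): accepted login for (?P<user>\S+) from (?P<src>\S+)$")

def normalize(line):
    # Map a raw line onto common meta fields, regardless of vendor format.
    m = FW_PATTERN.match(line)
    if m:
        d = m.groupdict()
        d["outcome"] = "failure" if d.pop("event") == "LOGIN_FAIL" else "success"
        return d
    m = VPN_PATTERN.match(line)
    if m:
        d = m.groupdict()
        d["outcome"] = "success"
        return d
    return None  # unparsed lines would be kept aside for later review

def correlate(events, fail_threshold=3):
    # Rule: several failures followed by a success for the same user.
    failures = defaultdict(int)
    alerts = []
    for e in events:
        if e["outcome"] == "failure":
            failures[e["user"]] += 1
        elif failures[e["user"]] >= fail_threshold:
            alerts.append("possible credential attack: " + e["user"] + " from " + e["src"])
    return alerts

events = [e for e in (normalize(line) for line in RAW_LOGS) if e]
print(correlate(events))

Once the lines are reduced to common fields such as user, src, and outcome, the same rule works no matter which device produced the log, which is the point of normalization.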


More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

Big Data Integration BIG DATA 9/15/2017. Business Performance

Big Data Integration BIG DATA 9/15/2017. Business Performance BIG DATA Business Performance Big Data Integration Big data is often about doing things that weren t widely possible because the technology was not advanced enough or the cost of doing so was prohibitive.

More information

Exploiting and Gaining New Insights for Big Data Analysis

Exploiting and Gaining New Insights for Big Data Analysis Exploiting and Gaining New Insights for Big Data Analysis K.Vishnu Vandana Assistant Professor, Dept. of CSE Science, Kurnool, Andhra Pradesh. S. Yunus Basha Assistant Professor, Dept.of CSE Sciences,

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor:2.114

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor:2.114 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Review of Major Challenges in Data Analysis of Big- Data Mr. Rahul R. Papalkar *, Mr. Pravin R. Nerkar, Mr Umesh V. Nikam Information

More information

Full file at

Full file at Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits

More information

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX 1 Successful companies know that analytics are key to winning customer loyalty, optimizing business processes and beating their

More information

The Emerging Data Lake IT Strategy

The Emerging Data Lake IT Strategy The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,

More information

Shine a Light on Dark Data with Vertica Flex Tables

Shine a Light on Dark Data with Vertica Flex Tables White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,

More information

SIEM: Five Requirements that Solve the Bigger Business Issues

SIEM: Five Requirements that Solve the Bigger Business Issues SIEM: Five Requirements that Solve the Bigger Business Issues After more than a decade functioning in production environments, security information and event management (SIEM) solutions are now considered

More information

Using the Network to Optimize a Virtualized Data Center

Using the Network to Optimize a Virtualized Data Center Using the Network to Optimize a Virtualized Data Center Contents Section I: Introduction The Rise of Virtual Computing. 1 Section II: The Role of the Network. 3 Section III: Network Requirements of the

More information

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual IT1105 Information Systems and Technology BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing Student Manual Lesson 3: Organizing Data and Information (6 Hrs) Instructional Objectives Students

More information

New Approach to Unstructured Data

New Approach to Unstructured Data Innovations in All-Flash Storage Deliver a New Approach to Unstructured Data Table of Contents Developing a new approach to unstructured data...2 Designing a new storage architecture...2 Understanding

More information

Improving the ROI of Your Data Warehouse

Improving the ROI of Your Data Warehouse Improving the ROI of Your Data Warehouse Many organizations are struggling with a straightforward but challenging problem: their data warehouse can t affordably house all of their data and simultaneously

More information

BI Moves Operational - The Case for High-Performance Aggregation Infrastructure

BI Moves Operational - The Case for High-Performance Aggregation Infrastructure WHITE PAPER BI Moves Operational - The Case for High-Performance Aggregation Infrastructure MARCH 2005 This white paper will detail the requirements for operational business intelligence, and will show

More information

Continuous Processing versus Oracle RAC: An Analyst s Review

Continuous Processing versus Oracle RAC: An Analyst s Review Continuous Processing versus Oracle RAC: An Analyst s Review EXECUTIVE SUMMARY By Dan Kusnetzky, Distinguished Analyst Most organizations have become so totally reliant on information technology solutions

More information

SIEM Solutions from McAfee

SIEM Solutions from McAfee SIEM Solutions from McAfee Monitor. Prioritize. Investigate. Respond. Today s security information and event management (SIEM) solutions need to be able to identify and defend against attacks within an

More information

ELTMaestro for Spark: Data integration on clusters

ELTMaestro for Spark: Data integration on clusters Introduction Spark represents an important milestone in the effort to make computing on clusters practical and generally available. Hadoop / MapReduce, introduced the early 2000s, allows clusters to be

More information

Best practices in IT security co-management

Best practices in IT security co-management Best practices in IT security co-management How to leverage a meaningful security partnership to advance business goals Whitepaper Make Security Possible Table of Contents The rise of co-management...3

More information

Finding a needle in Haystack: Facebook's photo storage

Finding a needle in Haystack: Facebook's photo storage Finding a needle in Haystack: Facebook's photo storage The paper is written at facebook and describes a object storage system called Haystack. Since facebook processes a lot of photos (20 petabytes total,

More information

CLEARING THE PATH: PREVENTING THE BLOCKS TO CYBERSECURITY IN BUSINESS

CLEARING THE PATH: PREVENTING THE BLOCKS TO CYBERSECURITY IN BUSINESS CLEARING THE PATH: PREVENTING THE BLOCKS TO CYBERSECURITY IN BUSINESS Introduction The world of cybersecurity is changing. As all aspects of our lives become increasingly connected, businesses have made

More information

When, Where & Why to Use NoSQL?

When, Where & Why to Use NoSQL? When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),

More information

Big Data - Some Words BIG DATA 8/31/2017. Introduction

Big Data - Some Words BIG DATA 8/31/2017. Introduction BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means

More information

THE FUTURE OF BUSINESS DEPENDS ON SOFTWARE DEFINED STORAGE (SDS)

THE FUTURE OF BUSINESS DEPENDS ON SOFTWARE DEFINED STORAGE (SDS) THE FUTURE OF BUSINESS DEPENDS ON SOFTWARE DEFINED STORAGE (SDS) How SSDs can fit into and accelerate an SDS strategy SPONSORED BY TABLE OF CONTENTS Introduction 3 An Overview of SDS 4 Achieving the Goals

More information

The Hadoop Paradigm & the Need for Dataset Management

The Hadoop Paradigm & the Need for Dataset Management The Hadoop Paradigm & the Need for Dataset Management 1. Hadoop Adoption Hadoop is being adopted rapidly by many different types of enterprises and government entities and it is an extraordinarily complex

More information

Hierarchy of knowledge BIG DATA 9/7/2017. Architecture

Hierarchy of knowledge BIG DATA 9/7/2017. Architecture BIG DATA Architecture Hierarchy of knowledge Data: Element (fact, figure, etc.) which is basic information that can be to be based on decisions, reasoning, research and which is treated by the human or

More information

The data quality trends report

The data quality trends report Report The 2015 email data quality trends report How organizations today are managing and using email Table of contents: Summary...1 Research methodology...1 Key findings...2 Email collection and database

More information

The Seven Steps to Implement DataOps

The Seven Steps to Implement DataOps The Seven Steps to Implement Ops ABSTRACT analytics teams challenged by inflexibility and poor quality have found that Ops can address these and many other obstacles. Ops includes tools and process improvements

More information

APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE

APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE FINDING WHAT YOU NEED IN YOUR IN-HOUSE VIDEO STORAGE SECTION 1 You need ways to generate metadata for stored videos without time-consuming manual

More information

Accelerate your SAS analytics to take the gold

Accelerate your SAS analytics to take the gold Accelerate your SAS analytics to take the gold A White Paper by Fuzzy Logix Whatever the nature of your business s analytics environment we are sure you are under increasing pressure to deliver more: more

More information

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,

More information

ACCELERATE YOUR ANALYTICS GAME WITH ORACLE SOLUTIONS ON PURE STORAGE

ACCELERATE YOUR ANALYTICS GAME WITH ORACLE SOLUTIONS ON PURE STORAGE ACCELERATE YOUR ANALYTICS GAME WITH ORACLE SOLUTIONS ON PURE STORAGE An innovative storage solution from Pure Storage can help you get the most business value from all of your data THE SINGLE MOST IMPORTANT

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

HOW WELL DO YOU KNOW YOUR IT NETWORK? BRIEFING DOCUMENT

HOW WELL DO YOU KNOW YOUR IT NETWORK? BRIEFING DOCUMENT HOW WELL DO YOU KNOW YOUR IT NETWORK? BRIEFING DOCUMENT ARE YOU REALLY READY TO EXECUTE A GLOBAL IOT STRATEGY? Increased demand driven by long-term trends of the Internet of Things, WLAN, connected LED

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

Investing in a Better Storage Environment:

Investing in a Better Storage Environment: Investing in a Better Storage Environment: Best Practices for the Public Sector Investing in a Better Storage Environment 2 EXECUTIVE SUMMARY The public sector faces numerous and known challenges that

More information

The Future of Business Depends on Software Defined Storage (SDS) How SSDs can fit into and accelerate an SDS strategy

The Future of Business Depends on Software Defined Storage (SDS) How SSDs can fit into and accelerate an SDS strategy The Future of Business Depends on Software Defined Storage (SDS) Table of contents Introduction 2 An Overview of SDS 3 Achieving the Goals of SDS Hinges on Smart Hardware Decisions 5 Assessing the Role

More information

Massive Data Analysis

Massive Data Analysis Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that

More information

The #1 Key to Removing the Chaos. in Modern Analytical Environments

The #1 Key to Removing the Chaos. in Modern Analytical Environments October/2018 Advanced Data Lineage: The #1 Key to Removing the Chaos in Modern Analytical Environments Claudia Imhoff, Ph.D. Sponsored By: Table of Contents Executive Summary... 1 Data Lineage Introduction...

More information

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value KNOWLEDGENT INSIGHTS volume 1 no. 5 October 7, 2011 Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value Today s growing commercial, operational and regulatory

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous

More information

Evolution of the ICT Field: New Requirements to Specialists

Evolution of the ICT Field: New Requirements to Specialists Lumen 2/2017 ARTICLE Evolution of the ICT Field: New Requirements to Specialists Vladimir Ryabov, PhD, Principal Lecturer, School of Business and Culture, Lapland UAS Tuomo Lindholm, MBA, Senior Lecturer,

More information

Massive Scalability With InterSystems IRIS Data Platform

Massive Scalability With InterSystems IRIS Data Platform Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special

More information

78% of CIOs can t guarantee application performance 1. What s holding your apps back?

78% of CIOs can t guarantee application performance 1. What s holding your apps back? INSIGHTS PAPER What s holding your apps back? 5 culprits behind sluggish network performance and what you can do to stop them Whether they re your customers or employees, today s end-users demand applications

More information

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems Data Analysis and Design for BI and Data Warehousing Systems Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

QLIKVIEW ARCHITECTURAL OVERVIEW

QLIKVIEW ARCHITECTURAL OVERVIEW QLIKVIEW ARCHITECTURAL OVERVIEW A QlikView Technology White Paper Published: October, 2010 qlikview.com Table of Contents Making Sense of the QlikView Platform 3 Most BI Software Is Built on Old Technology

More information

IBM Software IBM InfoSphere Information Server for Data Quality

IBM Software IBM InfoSphere Information Server for Data Quality IBM InfoSphere Information Server for Data Quality A component index Table of contents 3 6 9 9 InfoSphere QualityStage 10 InfoSphere Information Analyzer 12 InfoSphere Discovery 13 14 2 Do you have confidence

More information

10 Steps to Building an Architecture for Space Surveillance Projects. Eric A. Barnhart, M.S.

10 Steps to Building an Architecture for Space Surveillance Projects. Eric A. Barnhart, M.S. 10 Steps to Building an Architecture for Space Surveillance Projects Eric A. Barnhart, M.S. Eric.Barnhart@harris.com Howard D. Gans, Ph.D. Howard.Gans@harris.com Harris Corporation, Space and Intelligence

More information

Networking for a dynamic infrastructure: getting it right.

Networking for a dynamic infrastructure: getting it right. IBM Global Technology Services Networking for a dynamic infrastructure: getting it right. A guide for realizing the full potential of virtualization June 2009 Executive summary June 2009 Networking for

More information

Solving the Enterprise Data Dilemma

Solving the Enterprise Data Dilemma Solving the Enterprise Data Dilemma Harmonizing Data Management and Data Governance to Accelerate Actionable Insights Learn More at erwin.com Is Our Company Realizing Value from Our Data? If your business

More information

An Overview of Smart Sustainable Cities and the Role of Information and Communication Technologies (ICTs)

An Overview of Smart Sustainable Cities and the Role of Information and Communication Technologies (ICTs) An Overview of Smart Sustainable Cities and the Role of Information and Communication Technologies (ICTs) Sekhar KONDEPUDI Ph.D. Vice Chair FG-SSC & Coordinator Working Group 1 ICT role and roadmap for

More information

WHAT CIOs NEED TO KNOW TO CAPITALIZE ON HYBRID CLOUD

WHAT CIOs NEED TO KNOW TO CAPITALIZE ON HYBRID CLOUD WHAT CIOs NEED TO KNOW TO CAPITALIZE ON HYBRID CLOUD 2 A CONVERSATION WITH DAVID GOULDEN Hybrid clouds are rapidly coming of age as the platforms for managing the extended computing environments of innovative

More information

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g Vlamis Software Solutions, Inc. Founded in 1992 in Kansas City, Missouri Oracle Partner and reseller since 1995 Specializes

More information

Traditional Security Solutions Have Reached Their Limit

Traditional Security Solutions Have Reached Their Limit Traditional Security Solutions Have Reached Their Limit CHALLENGE #1 They are reactive They force you to deal only with symptoms, rather than root causes. CHALLENGE #2 256 DAYS TO IDENTIFY A BREACH TRADITIONAL

More information

Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure

Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure An IDC InfoBrief, Sponsored by IBM April 2018 Executive Summary Today s healthcare organizations

More information

Tips for Effective Patch Management. A Wanstor Guide

Tips for Effective Patch Management. A Wanstor Guide Tips for Effective Patch Management A Wanstor Guide 1 Contents + INTRODUCTION + UNDERSTAND YOUR NETWORK + ASSESS THE PATCH STATUS + TRY USING A SINGLE SOURCE FOR PATCHES + MAKE SURE YOU CAN ROLL BACK +

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

Big Data The end of Data Warehousing?

Big Data The end of Data Warehousing? Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort

More information

Why the Threat of Downtime Should Be Keeping You Up at Night

Why the Threat of Downtime Should Be Keeping You Up at Night Why the Threat of Downtime Should Be Keeping You Up at Night White Paper 2 Your Plan B Just Isn t Good Enough. Learn Why and What to Do About It. Server downtime is an issue that many organizations struggle

More information

Transforming Security from Defense in Depth to Comprehensive Security Assurance

Transforming Security from Defense in Depth to Comprehensive Security Assurance Transforming Security from Defense in Depth to Comprehensive Security Assurance February 28, 2016 Revision #3 Table of Contents Introduction... 3 The problem: defense in depth is not working... 3 The new

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

SOLUTION BRIEF RSA ARCHER IT & SECURITY RISK MANAGEMENT

SOLUTION BRIEF RSA ARCHER IT & SECURITY RISK MANAGEMENT RSA ARCHER IT & SECURITY RISK MANAGEMENT INTRODUCTION Organizations battle growing security challenges by building layer upon layer of defenses: firewalls, antivirus, intrusion prevention systems, intrusion

More information

ETL is No Longer King, Long Live SDD

ETL is No Longer King, Long Live SDD ETL is No Longer King, Long Live SDD How to Close the Loop from Discovery to Information () to Insights (Analytics) to Outcomes (Business Processes) A presentation by Brian McCalley of DXC Technology,

More information

OUTSMART ADVANCED CYBER ATTACKS WITH AN INTELLIGENCE-DRIVEN SECURITY OPERATIONS CENTER

OUTSMART ADVANCED CYBER ATTACKS WITH AN INTELLIGENCE-DRIVEN SECURITY OPERATIONS CENTER OUTSMART ADVANCED CYBER ATTACKS WITH AN INTELLIGENCE-DRIVEN SECURITY OPERATIONS CENTER HOW TO ADDRESS GARTNER S FIVE CHARACTERISTICS OF AN INTELLIGENCE-DRIVEN SECURITY OPERATIONS CENTER 1 POWERING ACTIONABLE

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information

The 7 Habits of Highly Effective API and Service Management

The 7 Habits of Highly Effective API and Service Management 7 Habits of Highly Effective API and Service Management: Introduction The 7 Habits of Highly Effective API and Service Management... A New Enterprise challenge has emerged. With the number of APIs growing

More information

Healthcare IT Optimization: 6 Mistakes to Avoid Along the Way

Healthcare IT Optimization: 6 Mistakes to Avoid Along the Way Healthcare IT Optimization: 6 Mistakes to Avoid Along the Way Healthcare IT: Transforming for tomorrow s needs Healthcare organizations face a sea of change in what will soon be required of them. Great

More information

The security challenge in a mobile world

The security challenge in a mobile world The security challenge in a mobile world Contents Executive summary 2 Executive summary 3 Controlling devices and data from the cloud 4 Managing mobile devices - Overview - How it works with MDM - Scenario

More information

Evolution For Enterprises In A Cloud World

Evolution For Enterprises In A Cloud World Evolution For Enterprises In A Cloud World Foreword Cloud is no longer an unseen, futuristic technology that proves unattainable for enterprises. Rather, it s become the norm; a necessity for realizing

More information

data-based banking customer analytics

data-based banking customer analytics icare: A framework for big data-based banking customer analytics Authors: N.Sun, J.G. Morris, J. Xu, X.Zhu, M. Xie Presented By: Hardik Sahi Overview 1. 2. 3. 4. 5. 6. Why Big Data? Traditional versus

More information

Informatica Enterprise Information Catalog

Informatica Enterprise Information Catalog Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with

More information

Using Threat Analytics to Protect Privileged Access and Prevent Breaches

Using Threat Analytics to Protect Privileged Access and Prevent Breaches Using Threat Analytics to Protect Privileged Access and Prevent Breaches Under Attack Protecting privileged access and preventing breaches remains an urgent concern for companies of all sizes. Attackers

More information

Fundamental Shift: A LOOK INSIDE THE RISING ROLE OF IT IN PHYSICAL ACCESS CONTROL

Fundamental Shift: A LOOK INSIDE THE RISING ROLE OF IT IN PHYSICAL ACCESS CONTROL Fundamental Shift: A LOOK INSIDE THE RISING ROLE OF IT IN PHYSICAL ACCESS CONTROL Shifting budgets and responsibilities require IT and physical security teams to consider fundamental change in day-to-day

More information

12 Minute Guide to Archival Search

12 Minute Guide to  Archival Search X1 Technologies, Inc. 130 W. Union Street Pasadena, CA 91103 phone: 626.585.6900 fax: 626.535.2701 www.x1.com June 2008 Foreword Too many whitepapers spend too much time building up to the meat of the

More information

The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications

The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications By Donna Burbank Managing Director, Global Data Strategy, Ltd www.globaldatastrategy.com Sponsored

More information