Crawling Rich Internet Applications


1 Crawling Rich Internet Applications. Gregor v. Bochmann (in collaboration with the SSRG group), University of Ottawa, Canada. Oldenburg, December 16, 2013

2 Overview: Background (The evolving web, Why crawling, Our research project); Web Crawling (Traditional web crawling, RIA crawling, Performance objectives and assumptions); Crawling strategies (Breadth-first, Depth-first, Greedy; Model-based strategies: Hypercube, Menu; Probabilistic strategy; Component-based crawling); Distributed crawling (Different architectures, Experimental results); Conclusions

3 Web crawling is exploring web applications automatically: discovering the pages of a web application, and emulating the user behaviour to retrieve the states of a web application. Web crawling is as old as the web itself! From the early days of the web, keeping pace with its expansion has been a challenge.

4 The evolving Web. Traditional Web: static HTML pages stored as separate files, identified by a URL. Deep Web: a server application accesses a database; the user fills in request forms; HTML pages are dynamically created by the server and identified by a URL including the request parameters. Rich Internet Applications (RIA, Web 2.0): pages contain executable code (e.g. JavaScript, Silverlight, Adobe Flex...) that is executed in response to user interactions or timeouts (so-called events); a script may change the displayed page (the state of the application changes) under the same URL. AJAX: a script may interact asynchronously with the server to update the page.

5 Example of a traditional web application: my web site. Simplified model of the web site. [Diagram: pages Bochmann, Pub, DSRG, Hobbies, Painter B as nodes (pages with URL); links (events) labeled publications, hobbies, research group.]

6 RIA examples: TestRIA, Altoro Mutual

7 RIA example: Clipmarks

8 RIA example: Google Mail

9 The graph model of a web application. Graph model: a web page (a client state of the application) is a node; it is encoded in HTML, in a structure called the DOM. An event (click, mouse-over, etc.) is an edge: an event triggers a transition between states. [Diagram: the example site graph with pages (URL) as nodes and links (events) as edges.]

10 RIA vs. Traditional Web (Web-1). Graph model comparison: in Web-1, each web page (state) has a URL, and each event (link) includes the next URL; in a RIA, few pages have a URL, and an event triggers code execution. [Diagram: the example site graph shown twice, once with pages identified by URL and links as events, once with states that have no URL.]

11 Why crawling. Objective A: find all (or all important) pages: for content indexing for search engines, for security testing and vulnerability assessment, for accessibility testing. Objective B: find all links between pages: for ranking pages (e.g. Google ranking in search queries), for automated testing and model checking of the web application, and for assuring that all pages have been found.

12 Software Security Research Group (SSRG), University of Ottawa, in collaboration with IBM. University of Ottawa: Prof. Guy-Vincent Jourdan, Prof. Gregor v. Bochmann, Suryakant Choudhary (Master student), Emre Dincturk (PhD student), Khaled Ben Hafaiedh (PhD student), Seyed M. Mir Taheri (PhD student), Ali Moosavi (Master student). IBM R&D (Ottawa): Iosif Viorel Onut (PhD), AppScan product team.

13 [Product screenshot: view detailed security issue reports. Security issues identified with static analysis (white-box view) and with dynamic analysis (black-box view); aggregated and correlated results; remediation tasks; security risk assessment.]

14 Overview: Background (The evolving web, Why crawling, Our research project); Web Crawling (Traditional web crawling, RIA crawling, Performance objectives and assumptions); Crawling strategies (Breadth-first, Depth-first, Greedy; Model-based strategies: Hypercube, Menu; Probabilistic strategy; Component-based crawling); Distributed crawling (Different architectures, Experimental results); Conclusions

15 Traditional Web Crawling. An HTML page is a tree data structure, called the DOM. It includes information about: the display by the browser; the events that can be activated by the user (for instance, clicking on certain displayed fields); and, for each event, the URL to be requested from the server through an HTTP request (the link to the next page). The page returned by the server for a given URL depends, in general, on the server state and the values of cookies. The displayed page is identified by its URL if we ignore server state and cookies.

16 Traditional web crawling algorithm. Given: an initial seed URL, and a domain (or list of domains) defining the limit of the web space to be explored. Crawler variables (of type set of URLs): exploredURLs = empty; unexploredURLs = {seedURL}. Algorithm: while unexploredURLs is not empty, take a URL from unexploredURLs, add it to exploredURLs, request it from the server, analyse the returned page (according to the purpose of the crawl), extract the links in the page, and add the corresponding URLs (if they are new, and if they are in the domain) to unexploredURLs.
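The loop above translates directly into Python. The sketch below is an illustration only, not the crawler used in this work: link extraction uses a naive regular expression instead of a proper HTML parser, and the `domains` check is a simplified stand-in for the domain limit.

```python
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

def crawl(seed_url, domains):
    # Crawler variables: sets of explored and unexplored URLs.
    explored_urls = set()
    unexplored_urls = {seed_url}
    while unexplored_urls:
        url = unexplored_urls.pop()
        explored_urls.add(url)
        try:
            page = urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that cannot be retrieved
        # analyse(page) would go here, depending on the purpose of the crawl.
        for href in re.findall(r'href="([^"]+)"', page):
            link = urljoin(url, href)  # resolve relative links
            if link not in explored_urls and urlparse(link).netloc in domains:
                unexplored_urls.add(link)
    return explored_urls

# Example: crawl("http://example.com/", {"example.com"})
```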

17 RIA Crawling. Difference from the traditional web: most pages have no URL and are therefore not directly accessible. When an event triggers the execution of a script, the script may change the DOM structure, which may lead to a new display and a new set of enabled events, that is, a new state of the application. Crawling means: finding all URLs that are part of the application, plus, for each URL, finding all states reached (from the seed URL) by executing any sequence of events. Important note: only the seed states are directly accessible by a URL. [Diagram: the example site graph, with most states having no URL.]

18 Difficulties for crawling RIAs. State identification: a state cannot be identified by a URL; instead, we consider that a state is identified by the current DOM in the browser. Most links (events) do not contain a URL: an event included in the DOM may not explicitly identify the next state reached when the event is executed; to determine the state reached by such an event, we have to execute it. (In traditional crawling, the event (link) contains the URL, which identifies the next state reached.) Accessibility of states: most states are not directly accessible (no URL), only through a seed URL and a sequence of events (and intermediate states).

19 Important consequence. For a complete crawl (a crawl that ensures that all states of the application are found), the crawler has to execute all events in all states of the application, since for any of these events we do not know, a priori, whether its execution in the current state will lead to a new state or not. Note: in the case of traditional web crawling, it is not necessary to execute all events on all pages; it is sufficient to extract the URLs from these events and get the page for each URL only once.

20 Example. The links publications in the pages Bochmann and DSRG have the same URL: the page Pub will be retrieved only once. The events publications in the pages Bochmann and DSRG have no URL: both events publications must be executed, and the crawler finds out that they both lead to the same client state. [Diagram: the example site graph shown for the Web-1 case and for the RIA case.]

21 AJAX: asynchronous interactions with the server. We ignore the intermediate states in our current work, by simply waiting until a new stable state is reached after each user input.
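One simple way to "wait until a new stable state is reached" is to poll the DOM until it stops changing for a short quiet period. The sketch below assumes a Selenium-style driver that exposes the current DOM as `page_source`; the slides do not prescribe any particular mechanism.

```python
import hashlib
import time

def wait_for_stable_state(driver, quiet_period=0.5, timeout=10.0):
    # Poll the DOM (driver.page_source, as in Selenium WebDriver) until it
    # stays unchanged for quiet_period seconds, or give up after timeout.
    deadline = time.time() + timeout
    last_hash = None
    stable_since = time.time()
    while time.time() < deadline:
        dom_hash = hashlib.sha1(driver.page_source.encode("utf-8")).hexdigest()
        if dom_hash != last_hash:
            last_hash = dom_hash
            stable_since = time.time()
        elif time.time() - stable_since >= quiet_period:
            return True   # DOM unchanged for the whole quiet period
        time.sleep(0.1)
    return False          # the page was still changing when we gave up
```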

22 RIA: need for DOM equivalence. A given page often contains information that changes frequently, e.g. advertising or time-of-day information. This information is usually of no importance for the purpose of crawling. In the traditional web, the page identification (i.e. the URL) does not change when this information changes. In a RIA, states are identified by their DOM; therefore, similar states with different advertising would be identified as different states (which leads to a state space that is too large). We would like a state identifier that is independent of the unimportant changing information. We introduce a DOM equivalence, and all states with equivalent DOMs have the same identifier.

23 DOM equivalence. The DOM equivalence depends on the purpose of the crawl: in the case of security testing, we are not interested in the textual content of the DOM, whereas for content indexing this content is important. The DOM equivalence relation is realized by a DOM reduction algorithm, which produces (from a given DOM) a reduced, canonical representation of the information that is considered relevant for the crawl. If the reduced DOMs obtained from two given DOMs are the same, then the given DOMs are considered equivalent, that is, they represent the same application state (for this purpose of the crawl).

24 Form of the state identifiers. The reduced DOM could itself be used as the state identifier; however, it is quite voluminous, and we have to store the application model in memory during its exploration: each edge in the graph contains the identifiers of the current and next states. This is necessary to check whether a state obtained after the execution of some event is a new state or a known one. Condensed state identifier: a hash of the reduced DOM. The crawler also stores, for each state, the list of events included in the DOM and whether they have been executed or not; this is used to select the next event to be executed during the crawl.
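As an illustration of DOM reduction followed by hashing, the sketch below keeps only the tag structure and a few event-related attributes, dropping all text, which is one plausible reduction for security testing (slide 23); the chosen attribute set (KEEP_ATTRS) is a hypothetical example, since the real reduction depends on the purpose of the crawl.

```python
import hashlib
from html.parser import HTMLParser

class ReducingParser(HTMLParser):
    # Keep only the tag structure and event-related attributes; drop all text.
    KEEP_ATTRS = {"id", "href", "onclick", "onmouseover"}  # hypothetical choice

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        kept = sorted((k, v or "") for k, v in attrs if k in self.KEEP_ATTRS)
        self.tokens.append("<%s %s>" % (tag, kept))

    def handle_endtag(self, tag):
        self.tokens.append("</%s>" % tag)

def state_id(dom_html):
    # Condensed state identifier: a hash of the reduced DOM.
    parser = ReducingParser()
    parser.feed(dom_html)
    return hashlib.sha1("".join(parser.tokens).encode("utf-8")).hexdigest()
```

Two DOMs that differ only in text content (e.g. advertising) then map to the same identifier.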

25 Performance objectives. Execution speed: how many events (state transitions) can be executed per hour? Complete crawl: given enough time, the strategy terminates the crawl when all states of the application have been found. Efficiency of finding states ("finding states fast"): if the crawl is terminated by the user before a complete crawl is attained, the number of discovered states should be as large as possible. For many applications, a complete crawl cannot be obtained within a reasonable length of time; therefore, the third objective is very important.

26 Our working assumptions. Deterministic RIA: the crawled RIA is deterministic from the point of view of the client (e.g. no dependence on updated database content). Given user input: we are provided a set of user inputs for the text fields and build the model that corresponds to these inputs. Reliable reset: we can reliably reset the application by reloading the seed URL (thus the graph is strongly connected).

27 Overview: Background (The evolving web, Why crawling, Our research project); Web Crawling (Traditional web crawling, RIA crawling, Performance objectives); Crawling strategies (Breadth-first, Depth-first, Greedy; Model-based strategies: Hypercube, Menu; Probabilistic strategy; Component-based crawling); Distributed crawling (Different architectures, Experimental results); Conclusions

28 Crawling Strategies. Most work on crawling RIAs does not intend to build a complete model of the application. Some consider standard strategies for the exploration of the graph model, such as Depth-First and Breadth-First. We have developed more efficient strategies based on the assumed structure of the application (model-based strategies, see below).

29 Example of a crawling sequence (Depth-First strategy): getURL(Bochmann); analyse DOM; execute(publications) and find new state Pub; analyse DOM; go back (reset): getURL(Bochmann); execute(research group) and find new state DSRG; analyse DOM; execute(publications) and find known state Pub; go back (reset): getURL(Bochmann); execute(hobbies) and find new state Hobbies; analyse DOM and find new URL PainterB; getURL(PainterB); analyse DOM; etc. [Diagram: the example site graph.] Such a systematic approach will execute all events and eventually find all states.
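For illustration, this Depth-First exploration with resets can be sketched as follows; the `browser` object and its methods (load, execute, enabled_events, dom) are a hypothetical abstraction of the browser automation layer, and `state_id` is the reduced-DOM hash sketched earlier.

```python
def depth_first_crawl(browser, seed_url):
    browser.load(seed_url)
    model = {}   # state id -> list of not-yet-executed events of that state
    path = []    # events executed from the seed to reach the current state
    while True:
        sid = state_id(browser.dom())
        if sid not in model:
            model[sid] = browser.enabled_events()   # new state discovered
        if model[sid]:
            event = model[sid].pop()    # go deeper along an unexecuted event
            browser.execute(event)
            path.append(event)
        elif path:
            path.pop()                  # go back: reset and replay the path
            browser.load(seed_url)      # a reset costs much more than an event
            for event in path:
                browser.execute(event)
        else:
            return model   # all events of all known states have been executed
```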

30 Resets. Each time there is a "go back" in the crawling sequence, the crawler has to go back to the seed URL (which takes more time than executing an event) and possibly execute several events in order to reach the desired state. For instance, in the Breadth-First strategy, the crawler has to later go back to the state DSRG in order to execute the event publications. Resets are much more expensive (in terms of execution time) than event executions; the number of resets should be minimized. [Diagram: the example site graph.]

31 Disadvantages of standard strategies. Breadth-First: no long sequences of event executions; very many resets. Depth-First: advantage: long sequences of event executions; disadvantage: when reaching a known state, the strategy takes a path back to a specific previous state for further event exploration; this path through known edges is often long and may involve a reset (overhead), whereas going back to another state with non-executed events may be much more efficient.

32 Greedy and model-based crawling. The Greedy strategy: forward exploration until a state with no unexecuted events is encountered; then find the closest state with an unexecuted event, and continue. Model-based crawling: a meta-model is an assumed structure of the application; the crawling strategy is optimized for the case that the application follows these assumptions, but it must be able to adapt to applications that do not satisfy the meta-model.
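A minimal sketch of the Greedy step "find the closest state with an unexecuted event": a breadth-first search over the already-explored part of the graph. The data structures (`unexecuted`, `transitions`) are hypothetical names, not taken from the slides.

```python
from collections import deque

def closest_state_with_work(current, unexecuted, transitions):
    # unexecuted:  state id -> list of not-yet-executed events
    # transitions: state id -> list of (already-executed event, next state id)
    queue = deque([(current, [])])
    seen = {current}
    while queue:
        state, path = queue.popleft()
        if unexecuted.get(state):
            return state, path   # path = known events leading to that state
        for event, nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None, []              # no unexecuted event left: crawl is complete
```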

33 Model-based crawling: two phases. State exploration phase: finding all states, assuming that the application follows the assumptions of the meta-model. Transition exploration phase: executing, in all known states, all remaining events that have not been executed during the state exploration phase. Order of execution: first state exploration, then transition exploration. Adaptation: if new states are discovered during the transition exploration phase, go back to the state exploration phase, and so on.

34 Comparing efficiency of finding states. [Chart: cost (number of event executions + reset cost, log scale) versus number of states discovered; 129 states in total.] This is for a specific application; such comparisons should be done for many different types of applications. Note: Hypercube gives results similar to Greedy.

35 Comparing efficiency of exploring all edges. [Chart: cost (number of event executions + reset cost) versus number of edges explored; 10364 edges in total.]

36 Model-based crawling: the Hypercube model. Assumptions: the state reached by a sequence of events from the initial state is independent of the order of the events; the enabled events at a state are those of the initial state minus those executed to reach that state. Pro: one can find optimal paths for the state and transition exploration phases. Con: very few applications follow the hypercube model. [Figure: a 4-dimensional hypercube.]

37 Model-based crawling: the Menu model. Example web site: Ikebana-Ottawa (ikebanaottawa.ca). Hypothesis: there are three types of events: menu events (the next state obtained is independent of the state where the event is executed), normal events (the next state depends on the current page), and self-loop events (the next state is equal to the current state). Crawling strategy: explore normal events before menu events, because menu events are not expected to find any new states. To classify the events, they must be executed from two different states.

38 Menu strategy: state exploration. From the current state, choose the next event according to the following event priority: 1. unclassified events not yet executed; 2. unclassified events executed once, from a different state; 3. normal events; 4. menu events (we do not expect to find a new state); 5. self-loop events (we do not expect to find a new state). If all events have already been executed on the current page: find a short path to a page with an event of high priority.
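The priority rule reduces to picking the enabled event with the smallest rank; the classification labels below are hypothetical names for the five categories of slide 38, maintained by the crawler as events get executed.

```python
PRIORITY = {
    "unclassified_unexecuted": 1,  # never executed anywhere
    "unclassified_once": 2,        # executed once, from a different state
    "normal": 3,
    "menu": 4,                     # not expected to find a new state
    "self_loop": 5,                # not expected to find a new state
}

def pick_event(enabled_events):
    # enabled_events: list of (event, classification) pairs.
    # Returns the pair with the best (lowest) rank, or None if the list is empty.
    return min(enabled_events, key=lambda pair: PRIORITY[pair[1]], default=None)
```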

39 Menu model: finding a path to the next event. Find a path on the current application model, based on executed edges and predicted edges (events that are locally non-executed, but globally executed once, are predicted to be of type menu). [Diagram: a model graph with executed and predicted edges.]

40 Probability strategy. This is a variation of the Greedy strategy. Inspired by the Menu strategy, we introduce event priorities. The priority of an event is based on statistical observations (made during the crawl of the application) of the number of new states discovered when executing the given event. The strategy is based on the belief that an event which was often observed to lead to new states in the past is more likely to lead to new states in the future.

41 Probability strategy: event priorities. Priority of events from the current state: the probability of a given event e finding a new state from the current state is estimated as P(e) = (S(e) + pS) / (N(e) + pN), where S(e) is the number of new states found by e and N(e) is the number of times e has been executed. This is a Bayesian formula; with pS = 1 and pN = 2, the initial probability is 0.5. If the current state s has no non-executed event: find a locally non-executed event e at some nearby state s' such that P(e) is high and the path from s to s' is short (note: the path from s to s' goes through events already executed). How to find an optimal combination of a high-priority event and a nearby state is described in our paper at ICWE 2012.
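The estimate is easy to state in code; the parameter names follow the formula on slide 41.

```python
def event_probability(s_e, n_e, p_s=1.0, p_n=2.0):
    # P(e) = (S(e) + pS) / (N(e) + pN), where S(e) is the number of new
    # states found by e and N(e) is the number of times e has been executed.
    return (s_e + p_s) / (n_e + p_n)

print(event_probability(0, 0))  # 0.5: initial probability of a fresh event
print(event_probability(3, 4))  # ~0.67: e found 3 new states in 4 executions
```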

42 Experiments. We did experiments with the different crawling strategies using the following web sites: Periodic table (local version), Clipmarks (local version), TestRIA, and Altoro Mutual.

43 Results: state exploration. [Chart.]

44 Results: transition exploration. Cost for a complete crawl: Cost = number of event executions + R * number of resets, with R = 18 for the Clipmarks web site. [Chart.]

45 Component-based crawling. In many web sites, the number of pages is immense because of the different orderings of elements or the combinations of several components: a complete crawl is not feasible. Revised coverage criterion: cover all components of the pages in the application (but not all combinations or orderings of these components). Assumption: components are independent of one another.

46 Examples of components. [Screenshots.]

47 Assumed structure of a page. [Diagram.]

48 Example: the Bebop application. [Screenshot.]

49 Performance. [Chart.]

50 Scalability. Execution time of the crawl as a function of the number of items stored in the application. As expected, normal crawling has exponential complexity; the component-based crawl appears to have quadratic complexity.

51 Overview: Background (The evolving web, Why crawling, Our research project); Web Crawling (Traditional web crawling, RIA crawling, Performance objectives); Crawling strategies (Breadth-first, Depth-first, Greedy; Model-based strategies: Hypercube, Menu; Probabilistic strategy; Component-based crawling); Distributed crawling (Different architectures, Experimental results); Conclusions

52 Distributed crawling. Observation: on average, executing an event and analysing the next state discovered take about 20 times longer than deciding on the next event to be executed. Question: can the crawling of a complex application be accelerated by distributing the crawling over several computers / cores?

53 Different distributed architectures. Architecture 1: a central coordinator keeps the information about the discovered application model. (1.1) Dynamic event allocation to crawlers: each crawler contacts the coordinator after each execution of an event and obtains the next event to be executed (the coordinator performs the crawling strategy). (1.2) Static event allocation to crawlers: the crawlers obtain the application model from the coordinator and perform the crawling strategy locally, only for their allocated events. Architecture 2: several coordinators share the information about the application model. A distributed hash table is used to allocate the states of the model to the different coordinators; each coordinator is associated with approximately 20 crawlers. The coordinators perform the crawling strategy, but using partial model information; different sharing schemes can be envisioned for exchanging information between the coordinators.
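As a toy illustration of architecture 2, a state can be allocated to a coordinator by hashing its identifier; this is a simple stand-in for the distributed hash table mentioned above, not the actual scheme used.

```python
import hashlib

def coordinator_for(sid, num_coordinators):
    # Map a state identifier to one of the coordinators deterministically,
    # so that every crawler agrees on which coordinator owns each state.
    digest = hashlib.sha1(sid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_coordinators
```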

54 Experimental results (architecture 1.2, BF strategy). Notes: the BF strategy has poor performance, but has the advantage that only the states of the model (not the transitions) must be shared with the crawlers. One sees the expected decrease in crawling time. The delay due to the coordinator is negligible, even for 15 crawlers. The static allocation of events leads to unequal loads; dynamic load sharing among crawlers may be useful.

55 Experimental results (architecture 1.1, Greedy strategy). Notes: the Greedy strategy has good performance. In this architecture, the model information is not shared with the crawlers. Again, one sees the expected decrease in crawling time.

56 Simulation results (architecture 2, Greedy strategy): performance depends on the sharing scheme. In case there is no unexecuted event from the current state, the coordinator has to find another state with an unexecuted event. Reset-only: use a reset to reach a different state. Local knowledge: find the shortest path (SP) to a new state based on local knowledge of the application model. Shared knowledge: use an SP based on knowledge sharing, piggy-backed on other messages. Forward exploration: a distributed algorithm for finding the SP. Notes: fixed number of crawlers, varying number of coordinators (overload ignored).

57 Overview: Background (The evolving web, Why crawling, Our research project); Web Crawling (Traditional web crawling, RIA crawling, Performance objectives); Crawling strategies (Breadth-first, Depth-first, Greedy; Model-based strategies: Hypercube, Menu; Probabilistic strategy; Component-based crawling); Distributed crawling (Different architectures, Experimental results); Conclusions

58 Conclusions. RIA crawling is quite different from traditional web crawling. Different crawling strategies can improve the efficiency of crawling. The crawling of a RIA can be effectively distributed over several crawling engines. We have developed prototypes of our crawling strategies, integrated with the IBM AppScan product.

59 References.
Background: Mesbah, A., van Deursen, A. and Lenselink, S., Crawling Ajax-based Web Applications through Dynamic Analysis of User Interface State Changes, ACM Transactions on the Web (TWEB), 6(1), a23.
Our papers:
Mirtaheri, S.M., Dincturk, M.E., Hooshmand, S., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., A Brief History of Web Crawlers, in Proceedings of CASCON 2013, November 2013.
Mirtaheri, S.M., Zou, D., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Dist-RIA Crawler: A Distributed Crawler for Rich Internet Applications, in Proceedings of the 8th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2013), Compiegne, France, October 2013.
Choudhary, S., Dincturk, M.E., Mirtaheri, S.M., Jourdan, G.-V., Bochmann, G.v. and Onut, I.V., Building Rich Internet Applications Models: Example of a Better Strategy, in Proceedings of the 13th International Conference on Web Engineering (ICWE 2013), Aalborg, Denmark, July 2013.
Choudhary, S., Dincturk, M.E., Mirtaheri, S.M., Moosavi, A., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Crawling Rich Internet Applications: The State of the Art, in Proceedings of CASCON 2012, November 2012.
Dincturk, M.E., Choudhary, S., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., A Statistical Approach for Efficient Crawling of Rich Internet Applications, in Proceedings of the 12th International Conference on Web Engineering (ICWE 2012), Berlin, Germany, July 2012.
Choudhary, S., Dincturk, M.E., Bochmann, G.v., Jourdan, G.-V., Onut, I.V. and Ionescu, P., Solving Some Modeling Challenges when Testing Rich Internet Applications for Security, in The Third International Workshop on Security Testing (SECTEST 2012), Montreal, Canada, April 2012.
Benjamin, K., Bochmann, G.v., Dincturk, M.E., Jourdan, G.-V. and Onut, I.V., A Strategy for Efficient Crawling of Rich Internet Applications, in Proceedings of the 11th International Conference on Web Engineering (ICWE 2011), Paphos, Cyprus, July 2011.
Benjamin, K., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Some Modeling Challenges when Testing Rich Internet Applications for Security, in the First International Workshop on Modeling and Detection of Vulnerabilities (MDV 2010), Paris, France, April 2010.
Dincturk, M.E., Jourdan, G.-V., Bochmann, G.v. and Onut, I.V., A Model-Based Approach for Crawling Rich Internet Applications, submitted to a journal.

60 Questions? Comments? These slides can be downloaded from
