Crawling Rich Internet Applications
1 Crawling Rich Internet Applications Gregor v. Bochmann (in collaboration with the SSRG group) University of Ottawa, Canada Oldenburg, December 16, 2013
2 Overview Background The evolving web Why crawling Our research project Web Crawling Traditional web crawling RIA crawling Performance objectives, assumptions Crawling strategies Breadth-first - Depth-first - Greedy Model-based strategies (Hypercube - Menu) Probabilistic strategy Component-based crawling Distributed crawling Different architectures Experimental results Conclusions 2
3 Web Crawling is Exploring Web Applications automatically Discovering the pages of a Web application Emulating the user behaviour to retrieve the states of a web application. Web crawling is as old as the web itself! From the early days of the web, keeping pace with its expansion has been a challenge 3
4 Traditional Web The evolving Web static HTML pages stored as separate files, identified by a URL Deep Web Server application accesses a database, user fills request forms HTML pages dynamically created by the server, identified by a URL including request parameters Rich Internet Applications (RIA, Web 2.0) pages contain executable code (e.g. JavaScript, Silverlight, Adobe Flex...); executed in response to user interactions or timeouts (so-called events); the script may change the displayed page (the state of the application changes) while the URL remains the same. AJAX: the script may interact asynchronously with the server to update the page 4
5 Example of a traditional web application Show my web site ( ) Simplified model of the web site Bochmann publications Pub publications hobbies research group DSRG Hobbies Gregor von Bochmann Painter B page with URL link (event) 5
6 6 RIA examples TestRIA, AltroMutual
7 7 RIA example - Clipmarks
8 8 RIA example Google Mail
9 The Graph Model of a web application Graph model: Web page (client state of the application) node; the state is encoded in an HTML tree structure called the DOM Event (click, mouse-over, etc.) edge An event triggers a transition between states Bochmann publications Pub publications hobbies research group DSRG Hobbies Gregor von Bochmann Painter B 9 page with URL link (event)
10 RIA vs. Traditional Web (Web-1) Graph model: Web-1 RIA Web page (state) : has URL few pages have a URL Event includes next URL code execution Bochmann Bochmann publications hobbies Hobbies publications hobbies Hobbies Pub publications research group DSRG Gregor von Bochmann Painter B Pub publications research group DSRG Gregor von Bochmann Painter B 10 page with URL link (event) State (no URL)
11 Why crawling Objective A: find all (or all important ) pages for content indexing for search engines for security testing and vulnerability assessment for accessibility testing Objective B: find all links between pages for ranking pages, e.g. Google ranking in search queries for automated testing and model checking of the web application for assuring that all pages have been found 11
12 Software Security Research Group (SSRG), University of Ottawa in collaboration with IBM Software Security Research Group (SSRG), University of Ottawa In collaboration with IBM University of Ottawa IBM R&D (Ottawa) Prof. Guy-Vincent Jourdan Prof. Gregor v. Bochmann -- Iosif Viorel Onut (PhD) Suryakant Choudhary (Master student) -- AppScan product team Emre Dincturk (PhD student) Khaled Ben Hafaiedh (PhD student) Seyed M. Mir Taheri (PhD student) Ali Moosavi (Master student) 12
13 View detailed security issues reports Security Issues Identified with Static Analysis (white-box view) Security Issues Identified with Dynamic Analysis (black-box view) Aggregated and correlated results Remediation Tasks Security Risk Assessment 13
14 Overview Background The evolving web Why crawling Our research project Web Crawling Traditional web crawling RIA crawling Performance objectives and assumptions Crawling strategies Breadth-first, Depth-first, Greedy Model-based strategies (Hypercube - Menu) Probabilistic strategy Component-based crawling Distributed crawling Different architectures Experimental results Conclusions 14
15 Traditional Web Crawling An HTML page is a tree data structure, called the DOM. It includes information about display by the browser events that can be activated by the user (for instance, clicking on certain displayed fields); for each event, the URL to be requested from the server through an HTTP Request (link to next page) The page returned by the server for a given URL, in general, depends on the server state and the values of cookies The displayed page is identified by its URL if we ignore server state and cookies 15
16 Traditional web crawling algorithm Given: an initial seed URL a domain (or list of domains) defining the limit of the web space to be explored Crawler variables (of type set of URLs): exploredUrls = empty; unexploredUrls = {seedUrl} Algorithm: while unexploredUrls is not empty, take a URL from unexploredUrls, add it to exploredUrls, request it from the server, analyse the returned page (according to the purpose of the crawl), extract the links in the page and add the corresponding URLs (if they are new, and if they are in the domain) to unexploredUrls 16
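The algorithm on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the crawler used in the project: `fetch` and `extract_links` are hypothetical caller-supplied helpers standing in for an HTTP client and an HTML parser.

```python
from urllib.parse import urljoin, urlparse

def crawl(seed_url, domains, fetch, extract_links):
    """Traditional crawl: visit every in-domain URL exactly once.

    fetch(url) -> page content; extract_links(page) -> iterable of hrefs.
    Both are supplied by the caller (HTTP client / HTML parser).
    """
    explored = set()
    unexplored = {seed_url}
    while unexplored:
        url = unexplored.pop()
        explored.add(url)
        page = fetch(url)
        for href in extract_links(page):
            link = urljoin(url, href)          # resolve relative links
            if link not in explored and urlparse(link).netloc in domains:
                unexplored.add(link)
    return explored
```

Note that each URL is requested at most once; as the later slides explain, this is exactly what no longer works for RIAs, where most states have no URL.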
17 RIA Crawling Difference from traditional web Most pages have no URL and therefore are not directly accessible When an event triggers the execution of a script, the script may change the DOM structure which may lead to a new display and a new set of enabled events that is a new state of the application. Crawling means: finding all URLs that are part of the application, plus for each URL, find all states reached (from the seed URL) by executing any sequence of events Important note: only the seed states are directly accessible by a URL publications Pub publications Bochmann hobbies research group DSRG Hobbies Gregor von Bochmann Painter B 17 State (no URL)
18 18 Difficulties for crawling RIAs State identification A state cannot be identified by a URL. Instead, we consider that the state is identified by the current DOM in the browser. Most links (events) do not contain a URL An event included in the DOM may not explicitly identify the next state reached when this event is executed. To determine the state reached by such an event, we have to execute that event. In traditional crawling, the event (link) contains the URL - the identification of the next state reached Accessibility of states Most states are not directly accessible (no URL), only through the seed URL and a sequence of events (and intermediate states)
19 Important consequence For a complete crawl (a crawl that ensures that all states of the application are found), the crawler has to execute all events in all states of the application, since for any of these events we do not know, a priori, whether its execution in the current state will lead to a new state or not. Note: In the case of traditional web crawling, it is not necessary to execute all events on all pages; it is sufficient to extract the URLs from these events, and get the page for each URL only once. 19
20 Example The links publications in the pages Bochmann and DSRG have the same URL: the page Pub will be retrieved only once. The events publications in the pages Bochmann and DSRG have no URL: both events publications must be executed, and the crawler finds out that they both lead to the same client state. Bochmann publications hobbies Hobbies Pub publications research group DSRG Gregor von Bochmann Painter B 20
21 AJAX: asynchronous interactions with the server We ignore the intermediate states in our current work, by simply waiting until a new stable state is reached after each user input 21
22 RIA: Need for DOM equivalence A given page often contains information that changes frequently, e.g. advertising, time-of-day information. This information is usually of no importance for the purpose of crawling. In the traditional web, the page identification (i.e. the URL) does not change when this information changes. In RIAs, states are identified by their DOM. Therefore similar states with different advertising would be identified as different states (which leads to a state space that is too large). We would like to have a state identifier that is independent of the unimportant changing information. We introduce a DOM equivalence, and all states with equivalent DOMs have the same identifier. 22
23 DOM equivalence The DOM equivalence depends on the purpose of the crawl. In the case of security testing, we are not interested in the textual content of the DOM, however, this is important for content indexing. The DOM equivalence relation is realized by a DOM reduction algorithm which produces (from a given DOM) a reduced canonical representation of the information that is considered relevant for the crawl. If the reduced DOMs obtained from two given DOMs are the same, then the given DOMs are considered equivalent, that is, they represent the same application state (for this purpose of crawl). 23
24 Form of the state identifiers The reduced DOM could be used as a state identifier; however, it is quite voluminous, and we have to store the application model in memory during its exploration: each edge in the graph contains the identifiers of the current and next states. This is necessary to check whether a state obtained after the execution of some event is a new state or a known one Condensed state identifier: a hash of the reduced DOM The crawler also stores for each state the list of events included in the DOM, and whether they have been executed or not; this is used to select the next event to be executed during the crawl 24
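The reduce-then-hash idea of slides 23 and 24 can be sketched as follows. This is only an illustration of the principle, assuming a crawl purpose (such as security testing) where text content is irrelevant; the actual reduction rules used by the project's DOM reduction algorithm are not specified here, so the choice of what to keep is a made-up example.

```python
import hashlib
import xml.etree.ElementTree as ET

def reduce_dom(element):
    """Reduce a DOM subtree to what is relevant for the crawl: keep tag
    names and event-handler attributes (onclick, onmouseover, ...), drop
    text content. Two DOMs with equal reductions are equivalent states."""
    kept_attrs = {k: v for k, v in element.attrib.items()
                  if k.startswith("on")}
    children = [reduce_dom(child) for child in element]
    return (element.tag, tuple(sorted(kept_attrs.items())), tuple(children))

def state_id(dom_string):
    """Condensed state identifier: a hash of the reduced DOM."""
    reduced = repr(reduce_dom(ET.fromstring(dom_string)))
    return hashlib.sha1(reduced.encode()).hexdigest()
```

With this reduction, two pages that differ only in advertising text get the same identifier, while a page whose event handlers differ gets a different one.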
25 Performance objectives Execution speed: How many events (state transitions) can be executed per hour? Complete crawl: Given enough time, the strategy terminates the crawl when all states of the application have been found. Efficiency of finding states - finding states fast : If the crawl is terminated by the user before a complete crawl is attained, the number of discovered states should be as large as possible. For many applications, a complete crawl cannot be obtained within a reasonable length of time. Therefore the third objective is very important. 25
26 Our working assumptions Deterministic RIA : the crawled RIA is deterministic from the point of view of the client (e.g. no dependence on updated database content) Given user input : we are provided a set of user inputs for text fields and build the model that corresponds to these inputs Reliable reset : we can reliably reset the system by reloading the seed URL (thus the graph is strongly connected) 26
27 Overview Background The evolving web Why crawling Our research project Web Crawling Traditional web crawling RIA crawling Performance objectives Crawling strategies Breadth-first, Depth-first, Greedy Model-based strategies (Hypercube - Menu) Probabilistic strategy Component-based crawling Distributed crawling Different architectures Experimental results Conclusions 27
28 Crawling Strategies Most work on crawling RIAs does not intend to build a complete model of the application. Some consider standard strategies for the exploration of the graph model, such as Depth-First and Breadth-First. We have developed more efficient strategies based on the assumed structure of the application (model-based strategies, see below) 28
29 Example of crawling sequence Depth-first strategy geturl(bochmann); analysedom; execute(publications) and find new state Pub; analysedom; - go back (reset) - geturl(bochmann); execute(research group) and find new state DSRG; analysedom; execute(publications) and find known state Pub; - go back (reset) - geturl(bochmann); execute(hobbies) and find new state Hobbies; analysedom and find new URL PainterB; geturl(painterb); analysedom; etc. Bochmann publications Pub publications hobbies research group DSRG Hobbies Gregor von Bochmann Painter B Such a systematic approach will execute all events and eventually find all states. 29
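The systematic exploration on this slide can be sketched in Python. This is a deliberately naive sketch, not the project's implementation: the hypothetical helpers `events(state)` (events enabled in a state) and `execute(path)` (reset to the seed URL and replay an event sequence) model the deterministic RIA, and the sketch resets before every single event, which the next slide explains is the expensive part real strategies try to minimize.

```python
def dfs_crawl(events, execute):
    """Systematic (depth-first) crawl of a deterministic RIA.

    events(state) -> list of events enabled in that state;
    execute(path) -> state reached by reloading the seed URL (a reset)
    and replaying the event sequence `path` from the seed state.
    Executes every event of every discovered state, as a complete
    crawl requires. Returns (discovered states, number of resets).
    """
    discovered = set()
    resets = 0

    def visit(state, path):
        nonlocal resets
        if state in discovered:
            return
        discovered.add(state)
        for event in events(state):
            resets += 1                 # naive: reset + replay before each event
            next_state = execute(path + [event])
            visit(next_state, path + [event])

    visit(execute([]), [])              # execute([]) just loads the seed state
    return discovered, resets
```

On the slide's example (Bochmann, Pub, DSRG, Hobbies), this finds all four states but pays one reset per executed event, which motivates the reset-minimization discussion that follows.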
30 30 Resets Each time there is a go back in the crawling sequence, the crawler has to go back to a seed-url (which takes more time than executing an event) and possibly execute several events in order to reach the desired state. For instance, in the Breadth-First strategy, the crawler has to later go back to the state DSRG in order to execute the event publications Resets are much more expensive (in terms of execution times) than event executions The number of resets should be minimized. publications Pub publications Bochmann hobbies research group DSRG Hobbies Gregor von Bochmann Painter B
31 Disadvantages of standard strategies Breadth-First: no long sequences of event executions very many resets Depth-First: Advantage: has long sequences of event executions Disadvantage: when reaching a known state, the strategy takes a path back to a specific previous state for further event exploration. This path through known edges is often long and may involve a reset (overhead); going back to another state with non-executed events may be much more efficient. 31
32 Greedy and model-based crawling The Greedy strategy Forward exploration until a state with no unexecuted events is encountered then find closest state with an unexecuted event, and continue Model-based crawling Meta-model: assumed structure of the application Crawling strategy is optimized for the case that the application follows these assumptions Crawling strategy must be able to adapt to applications that do not satisfy the meta-model 32
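The "find closest state with an unexecuted event" step of the Greedy strategy is a shortest-path search over the part of the model discovered so far. A minimal sketch (the data structures `known_edges` and `unexecuted` are illustrative, not the project's actual ones):

```python
from collections import deque

def closest_unexplored(current, known_edges, unexecuted):
    """Greedy step: BFS over the already-executed transitions to find the
    shortest event path from `current` to any state that still has an
    unexecuted event.

    known_edges: dict state -> list of (event, next_state) pairs already
    executed; unexecuted: set of states with at least one unexecuted event.
    Returns the event path to follow, or None if no such state is reachable.
    """
    queue = deque([(current, [])])
    seen = {current}
    while queue:
        state, path = queue.popleft()
        if state in unexecuted:
            return path
        for event, nxt in known_edges.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None
```

Because the search runs only over edges the crawler has already executed, replaying the returned path is guaranteed to reach the target state (the RIA is assumed deterministic).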
33 Model-based crawling: Two phases 33 State exploration phase finding all states assuming that the application follows the assumptions of the meta-model Transition exploration phase executing all remaining events in all known states (that have not been executed during the state exploration phase) Order of execution First state exploration; then transition exploration Adaptation: If new states are discovered during transition exploration phase, go back to state exploration phase, etc.
34 Comparing efficiency of finding states [Chart: cost (number of event executions + reset cost, log scale) vs. number of states discovered (total: 129), for one specific application; such comparisons should be done for many different types of applications. Note: Hypercube gives similar results to Greedy]
35 Comparing efficiency of exploring all edges [Chart: cost (number of event executions + reset cost) vs. number of edges explored (total: 10364)]
36 Model-based crawling: Hypercube Hypercube The state reached by a sequence of events from the initial state is independent of the order of the events. The enabled events at a state are those at the initial state minus those executed to reach that state. ++ : One can find optimal paths for state and transition exploration phases -- : very few applications follow the hypercube model 36 Example: 4-dim. Hypercube
37 Model-based crawling: Menu model Example web site: Ikebana-Ottawa ( ikebanaottawa.ca ) Hypothesis: There are three types of events: Menu events: The next state obtained is independent of the state where the event is executed Normal events: Next state depends on current page Self-loop events: Next state is equal to current state Crawling strategy Explore Normal events before Menu events, because menu events do not find any new states To classify the events, they must be executed from two different states
38 Menu strategy: state exploration From the current state, choose the next event according to the following event priority 1. Unclassified events not yet executed 2. Unclassified events once executed from a different state 3. Normal events 4. Menu events (we do not expect to find a new state) 5. Self-loop events (we do not expect to find a new state) If all events have already been executed on the current page: find a short path to a page with an event of high priority 38
39 Menu model: finding a path to next event Find a path on the current application model, based on executed edges and predicted edges: edges locally non-executed, but globally executed once, are predicted to be of type menu Predicted edges Executed edges 39
40 Probability strategy This is a variation of the Greedy strategy. Inspired by the Menu strategy, we introduce event priorities. The priority of an event is based on statistical observations (during the crawl of the application) about the number of new states discovered when executing the given event. The strategy is based on the belief that an event which was often observed to lead to new states in the past will be more likely to lead to new states in the future. 40
41 Probability strategy: event priorities Priority of events from the current state: the probability that a given event e finds a new state from the current state is P(e) = (S(e) + p_S) / (N(e) + p_N), where S(e) is the number of states found by e and N(e) is the number of times e was executed. This is a Bayesian formula; p_S = 1 and p_N = 2 give an initial probability of 0.5 If the current state s has no non-executed event: find a locally non-executed event e on some nearby state s' such that P(e) is high and the path from s to s' is short Note: the path from s to s' is through events already executed How to find an optimal combination of a high-priority event and a nearby state is described in our paper at ICWE 2012 41
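The priority formula on this slide is straightforward to compute; a minimal sketch (the `stats` bookkeeping structure is illustrative):

```python
def event_probability(s_e, n_e, p_s=1, p_n=2):
    """P(e) = (S(e) + p_S) / (N(e) + p_N): Bayesian estimate of the
    probability that event e discovers a new state. With p_s=1, p_n=2
    an event that was never executed (S=N=0) starts at 0.5."""
    return (s_e + p_s) / (n_e + p_n)

def best_event(stats):
    """stats: event -> (new_states_found, times_executed).
    Return the event with the highest estimated probability."""
    return max(stats, key=lambda e: event_probability(*stats[e]))
```

For example, an event executed 5 times that never found a new state gets P = 1/7, while one that found 3 new states in 4 executions gets P = 4/6, so the latter is tried first.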
42 Experiments We did experiments with the different crawling strategies using the following web sites: Periodic table (local version), Clipmarks (local version), TestRIA ( ), Altoro Mutual ( ) 42
43 43 Results: State exploration
44 Results: Transition exploration Cost for a complete crawl Cost = number of event executions + R * number of resets R = 18 for the Clipmarks web site 44
45 Component-based crawling In many web sites, the number of pages is immense because of different orderings of elements or combinations of several components: a complete crawl is not feasible Revised coverage criterion: cover all components of pages in the application (but not all combinations or orderings of these components) Assumption: components are independent of one another. 45
46 46 Examples of components
47 47 Assumed structure of a page
48 48 Example: The Bebop application
49 49 Performance
50 Scalability Execution time of crawl as a function of items stored in the application As expected: normal crawling has exponential complexity Component-based crawl appears to have quadratic complexity 50
51 Overview Background The evolving web Why crawling Our research project Web Crawling Traditional web crawling RIA crawling Performance objectives Crawling strategies Breadth-first, Depth-first, Greedy Model-based strategies (Hypercube - Menu) Probabilistic strategy Component-based crawling Distributed crawling Different architectures Experimental results Conclusions 51
52 Distributed crawling Observation: On average, event execution and analysis of the next state discovered takes about 20 times more time than deciding on the next event to be executed. Question: Can the crawling of a complex application be accelerated by distributing the crawling over several computers / cores? 52
53 53 Different distributed architectures 1. A central coordinator keeps information about the discovered application model 1.1 Each crawler contacts the coordinator after each execution of an event and obtains the next event to be executed (the coordinator performs the crawling strategy): dynamic event allocation to crawlers 1.2 Static event allocation to crawlers (crawlers obtain the application model from the coordinator and perform the crawling strategy locally, only for the allocated events) 2. Several coordinators share the information about the application model A distributed hash table is used to allocate the states of the model to the different coordinators Each coordinator is associated with approximately 20 crawlers Coordinators perform the crawling strategy, but using partial model information; different sharing schemes can be envisioned for exchanging information between the coordinators
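The distributed-hash-table allocation of states to coordinators (architecture 2) can be sketched as consistent, locally computable ownership; this is an illustrative sketch, not the project's protocol, and `coordinator_for` is a hypothetical helper.

```python
import hashlib

def coordinator_for(state_id, num_coordinators):
    """DHT-style allocation: each application state is owned by exactly
    one coordinator, chosen by hashing the state's identifier. Any
    crawler or coordinator can compute the owner locally, without a
    central directory lookup."""
    digest = hashlib.sha1(state_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_coordinators
```

Because the owner is a pure function of the state identifier, a crawler that discovers a state knows immediately which coordinator to report it to; the price is that each coordinator sees only a partial model, which is why the sharing schemes discussed below matter.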
54 Experimental results (architecture 1.2, BF strategy) Notes: The BF strategy has bad performance, but has the advantage that only the states of the model must be shared with the crawlers (not the transitions). One sees the expected decrease in crawling time The delay due to the coordinator is negligible, even for 15 crawlers The static allocation of events leads to unequal loads; dynamic load sharing among crawlers may be useful 54
55 Experimental results (architecture 1.1 Greedy strategy) Notes: The greedy strategy has good performance. In this architecture, the model information is not shared with the crawlers. Again, one sees the expected decrease in crawling time 55
56 Simulation results (architecture 2, Greedy strategy): performance depends on the sharing scheme In case there is no unexecuted event from the current state, the coordinator has to find another state with an unexecuted event Reset-only: use reset to reach a different state Local Knowledge: find the shortest path (SP) to a new state based on local knowledge of the application model Shared Knowledge: use SP based on knowledge sharing, piggy-backed on other messages Forward Exploration: a distributed algorithm for finding SP Notes: fixed number of crawlers, varying number of coordinators (overload ignored) 56
57 Overview Background The evolving web Why crawling Our research project Web Crawling Traditional web crawling RIA crawling Performance objectives Crawling strategies Breadth-first, Depth-first, Greedy Model-based strategies (Hypercube - Menu) Probabilistic strategy Component-based crawling Distributed crawling Different architectures Experimental results Conclusions 57
58 58 Conclusions RIA crawling is quite different from traditional web crawling Different crawling strategies can improve the efficiency of crawling The crawling of a RIA can be effectively distributed over several crawling engines We have developed prototypes of our crawling strategies, integrated with the IBM AppScan product
59 References Background: Mesbah, A., van Deursen, A. and Lenselink, S., Crawling Ajax-based Web Applications through Dynamic Analysis of User Interface State Changes, ACM Transactions on the Web (TWEB), 6(1), 2012. Our Papers: Mirtaheri, S.M., Dincturk, M.E., Hooshmand, S., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., A Brief History of Web Crawlers, in Proceedings of CASCON 2013, November 2013. Mirtaheri, S.M., Zou, D., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Dist-RIA Crawler: A Distributed Crawler for Rich Internet Applications, in Proceedings of the 8th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2013), Compiegne, France, October 2013. Choudhary, S., Dincturk, M.E., Mirtaheri, S.M., Jourdan, G.-V., Bochmann, G.v. and Onut, I.V., Building Rich Internet Applications Models: Example of a Better Strategy, in Proceedings of the 13th International Conference on Web Engineering (ICWE 2013), Aalborg, Denmark, July 2013. Choudhary, S., Dincturk, M.E., Mirtaheri, S.M., Moosavi, A., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Crawling Rich Internet Applications: The State of the Art, in Proceedings of CASCON 2012, November 2012. Dincturk, M.E., Choudhary, S., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., A Statistical Approach for Efficient Crawling of Rich Internet Applications, in Proceedings of the 12th International Conference on Web Engineering (ICWE 2012), Berlin, Germany, July 2012. Choudhary, S., Dincturk, M.E., Bochmann, G.v., Jourdan, G.-V., Onut, I.V. and Ionescu, P., Solving Some Modeling Challenges when Testing Rich Internet Applications for Security, in the Third International Workshop on Security Testing (SECTEST 2012), Montreal, Canada, April 2012. Benjamin, K., Bochmann, G.v., Dincturk, M.E., Jourdan, G.-V. and Onut, I.V., A Strategy for Efficient Crawling of Rich Internet Applications, in Proceedings of the 11th International Conference on Web Engineering (ICWE 2011), Paphos, Cyprus, July 2011. Benjamin, K., Bochmann, G.v., Jourdan, G.-V. and Onut, I.V., Some Modeling Challenges when Testing Rich Internet Applications for Security, in the First International Workshop on Modeling and Detection of Vulnerabilities (MDV 2010), Paris, France, April 2010. Dincturk, M.E., Jourdan, G.-V., Bochmann, G.v. and Onut, I.V., A Model-Based Approach for Crawling Rich Internet Applications, submitted to a journal. 59
60 Questions?? Comments?? These slides can be downloaded from
웹소프트웨어의신뢰성 Instructor: Gregg Rothermel Institution: 한국과학기술원 Dictated: 김윤정, 장보윤, 이유진, 이해솔, 이정연 [0:00] Hello everyone My name is Kyu-chul Today I m going to talk about this paper, IESE 09, name is "Invariant-based
More informationEvaluation of Long-Held HTTP Polling for PHP/MySQL Architecture
Evaluation of Long-Held HTTP Polling for PHP/MySQL Architecture David Cutting University of East Anglia Purplepixie Systems David.Cutting@uea.ac.uk dcutting@purplepixie.org Abstract. When a web client
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More informationIn this Lecture you will Learn: Testing in Software Development Process. What is Software Testing. Static Testing vs.
In this Lecture you will Learn: Testing in Software Development Process Examine the verification and validation activities in software development process stage by stage Introduce some basic concepts of
More informationA crawler is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index.
A crawler is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program,
More informationCloud Computing Service Discovery Framework for IaaS and PaaS Models
Cloud Computing Service Discovery Framework for IaaS and PaaS Models Farzad Firozbakht Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for
More informationFinding Vulnerabilities in Web Applications
Finding Vulnerabilities in Web Applications Christopher Kruegel, Technical University Vienna Evolving Networks, Evolving Threats The past few years have witnessed a significant increase in the number of
More informationInformation Retrieval Spring Web retrieval
Information Retrieval Spring 2016 Web retrieval The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically infinite due to the dynamic
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues
The web should be fast. Executive Summary Performance Report for: http://wkladki.net/porady/jak-usunac-zarysowa Report generated: Test Server Region: Using: Fri, Jan 22, 2016, 4:30 PM -0800 Vancouver,
More informationBusiness white paper. Setting the pace. Testing performance on modern applications
Business white paper Setting the pace Testing performance on modern applications Table of contents 3 Keeping up in a complex era 3 Riding the 2.0 wave 4 Adjusting for modern methods 4 Out with the old:
More informationAGENCE WEB MADE IN DOM
AGENCE WEB MADE IN DOM https://madeindom.com/ Création de site internet dans les DROM GUADELOUPE - MARTINIQUE GUYANE-MAYOTTE LA REUNION RAPPORT DE VITESSE SITE INTERNET The web should be fast. Executive
More informationSERG. Crawl-Based Analysis of Web Applications: Prospects and Challenges
Delft University of Technology Software Engineering Research Group Technical Report Series Crawl-Based Analysis of Web Applications: Prospects and Challenges Arie van Deursen, Ali Mesbah, and Alex Nederlof
More informationPerformance Report for: Report generated: Tuesday, June 30, 2015, 3:21 AM -0700
The web should be fast. Executive Summary Performance Report for: http://smallbusinessfirststep.com/ Report generated: Tuesday, June 30, 2015, 3:21 AM -0700 Test Server Region: Vancouver, Canada Using:
More informationDomain Specific Search Engine for Students
Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam
More informationLesson 12: JavaScript and AJAX
Lesson 12: JavaScript and AJAX Objectives Define fundamental AJAX elements and procedures Diagram common interactions among JavaScript, XML and XHTML Identify key XML structures and restrictions in relation
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues
The web should be fast. Executive Summary Performance Report for: http://magento-standard.eworld-accelerator.com Report generated: Test Server Region: Using: Tue, Sep 22, 2015, 11:12 AM +0200 London, UK
More informationSoftware Architecture and Engineering: Part II
Software Architecture and Engineering: Part II ETH Zurich, Spring 2016 Prof. http://www.srl.inf.ethz.ch/ Framework SMT solver Alias Analysis Relational Analysis Assertions Second Project Static Analysis
More informationExecutive Summary. Flex Bounty Program Overview. Bugcrowd Inc Page 2 of 7
CANVAS by Instructure Bugcrowd Flex Program Results December 01 Executive Summary Bugcrowd Inc was engaged by Instructure to perform a Flex Bounty program, commonly known as a crowdsourced penetration
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues. How does this affect me?
The web should be fast. Executive Summary Performance Report for: https://www.weebly.com/ Report generated: Test Server Region: Using: Mon, Jul 30, 2018, 2:22 PM -0500 Vancouver, Canada Chrome (Android,
More informationImplementation of Enhanced Web Crawler for Deep-Web Interfaces
Implementation of Enhanced Web Crawler for Deep-Web Interfaces Yugandhara Patil 1, Sonal Patil 2 1Student, Department of Computer Science & Engineering, G.H.Raisoni Institute of Engineering & Management,
More informationAutomatic Wrapper Adaptation by Tree Edit Distance Matching
Automatic Wrapper Adaptation by Tree Edit Distance Matching E. Ferrara 1 R. Baumgartner 2 1 Department of Mathematics University of Messina, Italy 2 Lixto Software GmbH Vienna, Austria 2nd International
More informationEtanova Enterprise Solutions
Etanova Enterprise Solutions Front End Development» 2018-09-23 http://www.etanova.com/technologies/front-end-development Contents HTML 5... 6 Rich Internet Applications... 6 Web Browser Hardware Acceleration...
More informationFinancial. AngularJS. AngularJS.
Financial http://killexams.com/exam-detail/ Section 1: Sec One (1 to 50) Details:This section provides a huge collection of Angularjs Interview Questions with their answers hidden in a box to challenge
More informationCS6200 Information Retreival. Crawling. June 10, 2015
CS6200 Information Retreival Crawling Crawling June 10, 2015 Crawling is one of the most important tasks of a search engine. The breadth, depth, and freshness of the search results depend crucially on
More informationWeb 2.0 Käyttöliittymätekniikat
Web 2.0 Käyttöliittymätekniikat ELKOM 07 Sami Ekblad Projektipäällikkö Oy IT Mill Ltd What is Web 2.0? Social side: user generated contents: comments, opinions, images, users own the data The Long Tail:
More informationCompetitive Intelligence and Web Mining:
Competitive Intelligence and Web Mining: Domain Specific Web Spiders American University in Cairo (AUC) CSCE 590: Seminar1 Report Dr. Ahmed Rafea 2 P age Khalid Magdy Salama 3 P age Table of Contents Introduction
More informationFinancial. AngularJS. AngularJS. Download Full Version :
Financial AngularJS AngularJS Download Full Version : https://killexams.com/pass4sure/exam-detail/angularjs Section 1: Sec One (1 to 50) Details:This section provides a huge collection of Angularjs Interview
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationWeb Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India
Web Crawling Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India - 382 481. Abstract- A web crawler is a relatively simple automated program
More informationEvaluation Methods for Focused Crawling
Evaluation Methods for Focused Crawling Andrea Passerini, Paolo Frasconi, and Giovanni Soda DSI, University of Florence, ITALY {passerini,paolo,giovanni}@dsi.ing.unifi.it Abstract. The exponential growth
More informationHuman vs Artificial intelligence Battle of Trust
Human vs Artificial intelligence Battle of Trust Hemil Shah Co-CEO & Director Blueinfy Solutions Pvt Ltd About Hemil Shah hemil@blueinjfy.net Position -, Co-CEO & Director at BlueInfy Solutions, - Founder
More informationWorkload Characterization using the TAU Performance System
Workload Characterization using the TAU Performance System Sameer Shende, Allen D. Malony, and Alan Morris Performance Research Laboratory, Department of Computer and Information Science University of
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues. How does this affect me?
The web should be fast. Executive Summary Performance Report for: http://atlantek.net/ Report generated: Test Server Region: Using: Sat, May 13, 2017, 8:24 AM -0700 Vancouver, Canada Firefox (Desktop)
More informationSelf Adjusting Refresh Time Based Architecture for Incremental Web Crawler
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 349 Self Adjusting Refresh Time Based Architecture for Incremental Web Crawler A.K. Sharma 1, Ashutosh
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationHP TruClient technology: Accelerating the path to testing modern applications. Business white paper
HP TruClient technology: Accelerating the path to testing modern applications Business white paper Table of contents Executive summary...3 Introduction...3 The challenges of 2.0 applications...4 Why traditional
More informationCollection Building on the Web. Basic Algorithm
Collection Building on the Web CS 510 Spring 2010 1 Basic Algorithm Initialize URL queue While more If URL is not a duplicate Get document with URL [Add to database] Extract, add to queue CS 510 Spring
More informationSoftware Testing
Ali Complex, 2nd block, Kormangala, Madiwala, Bengaluru-560068 Page 1 What is Software Testing? Software Testing is the process of testing software with the purpose of finding bugs and ensuring that it
More informationDistributed Web Crawling over DHTs. Boon Thau Loo, Owen Cooper, Sailesh Krishnamurthy CS294-4
Distributed Web Crawling over DHTs Boon Thau Loo, Owen Cooper, Sailesh Krishnamurthy CS294-4 Search Today Search Index Crawl What s Wrong? Users have a limited search interface Today s web is dynamic and
More informationDesign of an Agile All-Photonic Network (AAPN)
Design of an Agile All-Photonic Network (AAPN) Gregor v. Bochmann School of Information Technology and Engineering (SITE) University of Ottawa Canada http://www.site.uottawa.ca/~bochmann/talks/aapn-results
More informationRevisiting Join Site Selection in Distributed Database Systems
Revisiting Join Site Selection in Distributed Database Systems Haiwei Ye 1, Brigitte Kerhervé 2, and Gregor v. Bochmann 3 1 Département d IRO, Université de Montréal, CP 6128 succ Centre-Ville, Montréal
More informationFocused crawling: a new approach to topic-specific Web resource discovery. Authors
Focused crawling: a new approach to topic-specific Web resource discovery Authors Soumen Chakrabarti Martin van den Berg Byron Dom Presented By: Mohamed Ali Soliman m2ali@cs.uwaterloo.ca Outline Why Focused
More informationWeb Crawlers Detection. Yomna ElRashidy
Web Crawlers Detection Yomna ElRashidy yomna.elrashidi@aucegypt.com Outline A web crawler is a program that traverse the web autonomously with the purpose of discovering and retrieving content and knowledge
More informationPerformance Testing: Respect the Difference
Performance Testing: Respect the Difference Software Quality Days 2014 January 16, 2014 Alexander Podelko apodelko@yahoo.com http://alexanderpodelko.com/blog @apodelko About Me Have specialized in performance
More informationSimile Tools Workshop Summary MacKenzie Smith, MIT Libraries
Simile Tools Workshop Summary MacKenzie Smith, MIT Libraries Intro On June 10 th and 11 th, 2010 a group of Simile Exhibit users, software developers and architects met in Washington D.C. to discuss the
More informationAdministrivia. Crawlers: Nutch. Course Overview. Issues. Crawling Issues. Groups Formed Architecture Documents under Review Group Meetings CSE 454
Administrivia Crawlers: Nutch Groups Formed Architecture Documents under Review Group Meetings CSE 454 4/14/2005 12:54 PM 1 4/14/2005 12:54 PM 2 Info Extraction Course Overview Ecommerce Standard Web Search
More informationINTERRACTION COMPONENT STATE-OF-THE-ART
INTERRACTION COMPONENT STATE-OF-THE-ART DELIVERABLE D6.1.1 By C2TECH Due date of deliverable : t0+ 6 Actual submission date: t0+ xxx Version :01 State : Draft/For approval/approved/obsolete Dissemination
More informationEXTRACTION OF RELEVANT WEB PAGES USING DATA MINING
Chapter 3 EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING 3.1 INTRODUCTION Generally web pages are retrieved with the help of search engines which deploy crawlers for downloading purpose. Given a query,
More informationTest Automation to the Limit
Test Automation to the Limit Arie van Deursen Delft University of Technology Test Automation Day, 23 June, 2011 1 Outline 1. Background Joint work with Ali Mesbah (UBC), Danny Roest (TU Delft) Michaela
More informationA Source Code History Navigator
A Source Code History Navigator Alexander Bradley (awjb@cs.ubc.ca) CPSC 533C Project Proposal University of British Columbia October 30, 2009 1 Problem Description This project will address a problem drawn
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues. How does this affect me?
The web should be fast. Executive Summary Performance Report for: http://paratiboutique.com.br/ Report generated: Test Server Region: Using: Wed, Mar 7, 2018, 11:36 AM -0800 Vancouver, Canada Chrome (Desktop)
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues. How does this affect me?
The web should be fast. Executive Summary Performance Report for: https://lightshop1.899themes.ru/ Report generated: Test Server Region: Using: Thu, May 17, 2018, 4:02 AM -0700 Vancouver, Canada Chrome
More informationSite Audit Virgin Galactic
Site Audit 27 Virgin Galactic Site Audit: Issues Total Score Crawled Pages 59 % 79 Healthy (34) Broken (3) Have issues (27) Redirected (3) Blocked (2) Errors Warnings Notices 25 236 5 3 25 2 Jan Jan Jan
More informationPlantSimLab An Innovative Web Application Tool for Plant Biologists
PlantSimLab An Innovative Web Application Tool for Plant Biologists Feb. 17, 2014 Sook S. Ha, PhD Postdoctoral Associate Virginia Bioinformatics Institute (VBI) 1 Outline PlantSimLab Project A NSF proposal
More informationThe influence of caching on web usage mining
The influence of caching on web usage mining J. Huysmans 1, B. Baesens 1,2 & J. Vanthienen 1 1 Department of Applied Economic Sciences, K.U.Leuven, Belgium 2 School of Management, University of Southampton,
More informationThe Analysis and Proposed Modifications to ISO/IEC Software Engineering Software Quality Requirements and Evaluation Quality Requirements
Journal of Software Engineering and Applications, 2016, 9, 112-127 Published Online April 2016 in SciRes. http://www.scirp.org/journal/jsea http://dx.doi.org/10.4236/jsea.2016.94010 The Analysis and Proposed
More informationExecutive Summary. Performance Report for: The web should be fast. Top 5 Priority Issues. How does this affect me?
The web should be fast. Executive Summary Performance Report for: http://www.element-roofing.com/ Report generated: Test Server Region: Using: Wed, Nov 2, 2016, 10:31 PM -0700 Vancouver, Canada Firefox
More informationISSN: [Zade* et al., 7(1): January, 2018] Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT METHOD FOR DEEP WEB CRAWLER BASED ON ACCURACY -A REVIEW Pranali Zade 1, Dr.S.W.Mohod 2 Student 1, Professor 2 Computer
More informationSemantic Website Clustering
Semantic Website Clustering I-Hsuan Yang, Yu-tsun Huang, Yen-Ling Huang 1. Abstract We propose a new approach to cluster the web pages. Utilizing an iterative reinforced algorithm, the model extracts semantic
More informationUNCLASSIFIED R-1 ITEM NOMENCLATURE FY 2013 OCO
Exhibit R-2, RDT&E Budget Item Justification: PB 2013 Office of Secretary Of Defense DATE: February 2012 0400: Research,, Test & Evaluation, Defense-Wide BA 3: Advanced Technology (ATD) COST ($ in Millions)
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling
More informationSite Audit SpaceX
Site Audit 217 SpaceX Site Audit: Issues Total Score Crawled Pages 48 % -13 3868 Healthy (649) Broken (39) Have issues (276) Redirected (474) Blocked () Errors Warnings Notices 4164 +3311 1918 +7312 5k
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing
More informationAUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS
AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,
More informationDesign and Implementation of Search Engine Using Vector Space Model for Personalized Search
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,
More informationCMMI Version 1.2. Josh Silverman Northrop Grumman
CMMI Version 1.2 Josh Silverman Northrop Grumman Topics The Concept of Maturity: Why CMMI? CMMI Overview/Aspects Version 1.2 Changes Sunsetting of Version 1.1 Training Summary The Concept of Maturity:
More informationWeb Usage Mining: A Research Area in Web Mining
Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining
More information