Chapter-IV WEBOMETRICS

4.1-Introduction

Webometrics is the quantitative analysis of web phenomena, drawing upon informetric methods and typically addressing problems related to bibliometrics. Webometrics was triggered by the realization that the web is an enormous document repository, many of these documents being academic-related. Moreover, the web has its own citation indexes in the form of commercial search engines, and so it is ready for researchers to exploit. In fact, several major search engines can also deliver their results automatically to investigators' computer programs, allowing large-scale investigations. One of the most visible outputs of webometrics is the ranking of world universities based upon their web sites and online impact. Webometrics includes link analysis, web citation analysis, search engine evaluation and purely descriptive studies of the web. These are reviewed below, in addition to one recent application: the analysis of Web 2.0 phenomena. Note that there is also some research into developing web-based metrics for web sites to evaluate various aspects of their construction, such as usability and information content, but this will not be reviewed here. Since the mid-1990s, increasing efforts have been made to investigate the nature and properties of the World Wide Web, called simply the Web in this chapter, by applying modern informetric methodologies to its space of contents, link structures, and search engines. Studies of the Web have been named "webometrics" by Almind and Ingwersen (1997) or "cybermetrics", as in the electronic journal of that name (1997). This chapter attempts to point to selected areas of webometric research that demonstrate interesting progress and room for development, as well as to some currently less promising areas. The contribution is not an exhaustive review, but rather a view on the specialty.
Webometrics displays several similarities to informetric and scientometric studies and applies common bibliometric methods. For instance, simplistic counts and content analysis of web pages resemble traditional publication analysis; counts and analyses of outgoing links from web pages, here named outlinks, and of links pointing to web pages, called inlinks, can be

seen as reference and citation analyses, respectively. Outlinks and inlinks are then similar to references and citations, respectively, in scientific articles. However, due to its dynamic and distributed nature, the Web often demonstrates web pages simultaneously linking to each other, a case not possible in the traditional paper-based citation world. The coverage of search engines of the total Web can be investigated in the same way as the coverage of domain and citation databases in the total document landscape, and possible overlaps between engines detected. Since the Web consists of contributions from anyone who wishes to contribute, the quality of information or knowledge value is opaque due to the lack of peer reviewing; but citation-like link analyses may reveal clusters of sites worth reviewing. Patterns of Web search behavior can be investigated as in traditional information seeking studies. Issue tracking on the Web is carried out and knowledge discovery attempts are made, similar to common data or text mining in administrative or textual (bibliographic) databases. Since the Web is an information space quite different from the common scientific or professional databases, the similarities mentioned above may sometimes be superficial. For example, we do not know for sure why people on the Web link up to other pages. There exists no convention of citation as in the scientific world. Further, time plays a different role on the Web. On the other hand, because the Web is a highly complex conglomerate of all types of information carriers produced by all kinds of people and searched by all kinds of users, it is tempting to investigate; and informetrics indeed offers some methodologies to start from.
However, one must be aware that, as with online application of the ISI citation databases, for instance by means of the Dialog command language, data collection on the Web depends on the retrieval features of the various search engines and web robots. Prior to the appearance of the "set postings on" command feature in Dialog during the 1990s, online citation counts were not possible; one would have to download all the citing documents to be analyzed locally for the actual number of citations within the ISI-defined information space. At present this is exactly the case in most Web engines, as demonstrated by Rousseau (1997; 1999). The engines do not index the entire Web, their overlaps are not substantial (Lawrence & Giles, 1998), and their retrieval features are too simplistic for extensive webometric analyses online.

Webometrics, the quantitative study of web-related phenomena, originated in the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the discipline and gave it its name, although the basic issue had been identified simultaneously by Rodriguez Gairin (1997) and was pursued in Spain by Aguillo (1998). Larson (1996) is also a pioneer with his early exploratory link structure analysis, as is Rousseau (1997) with the first pure informetric analysis of the Web. We interpret webometrics in a broad sense encompassing research from disciplines outside of Information Science, such as Communication Studies, Statistical Physics and Computer Science. In this review we will concentrate on types of link analysis but also cover other webometric areas that Information Scientists have been involved with, including web log file analysis. One theme that runs through this chapter is the messiness of web data and the need for heuristics to cleanse it. This is a problem even at the most basic level of defining the Web. The uncontrolled Web creates numerous problems in the interpretation of results, for instance from the automatic creation or replication of links and deliberately misleading publishing. The loose connection between the apparent usage of top-level domains and their actual content is also a frustrating problem, for example with the extensive non-commercial content hosted on .com sites. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all web analyses have little value. As will be seen below, one response to this perspective - also a recurrent theme for critics of evaluative bibliometrics - is to demonstrate significant correlation statistics to prove that information is present.
A practical response has been to develop increasingly sophisticated data cleansing strategies and multiple data analysis techniques. The immense importance of the Web to scholars and the wider society means that it is essential to build an understanding of it, however difficult. This review is split into four parts: basic concepts and methods; scholarly communication on the Web; general and commercial web use; and topological modelling and mining of the Web. As a new field based around analyzing a new data source, methods of collecting and processing the data have been prominent in many studies. The second part, scholarly communication on the Web, is predominantly

concerned with using link analysis to identify patterns in academic or scholarly web spaces. Almost all of these studies have direct analogies in traditional bibliometrics, and have drawn from this area a concern with developing effective methods and validating results, the latter being an issue of particular concern on the Web. A key question that still does not have a satisfactory answer is how to interpret counts of links to academic web spaces. For example, if one university web site attracts double the links of another, what conclusions should be drawn? The general and commercial web use section reviews link analysis studies that have used techniques similar to those applied to academic web spaces. Some have origins in Social Network Analysis rather than Information Science, producing an interesting complementary perspective. The section also includes quantitative studies of the size of the 'whole' Web and web server log analysis. The final section, topological modelling and mining of the Web, covers mathematical approaches to modelling the growth of the Web or its internal link structure, mostly the product of Computer Science and Statistical Physics research. It culminates with an exciting new information science contribution to this area, providing detailed interpretations of small-world linking phenomena.

4.2-Definition

The origin of Webometrics can be found in the field of Information Science. Thelwall, Vaughan and Bjorneborn (2005) point out that the discipline "emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data". In fact, the idea that a link pointing to a webpage represents a 'vote' for that webpage or document is based on bibliometric methods to rank scientific production (Garfield, 1979).
The term Webometrics was first coined by Tomas Almind and Peter Ingwersen (1997) and seems to be widely accepted by the research community, together with the term Cybermetrics. Bjorneborn (2004) defined both terms by delimiting their research areas. Webometrics is "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches" (Bjorneborn & Ingwersen, 2004), while Cybermetrics does the same but for the whole Internet. Hence, Cybermetrics is more

focused on the study of non web-based Internet phenomena, e.g., e-mails, chat, newsgroup studies, etc. Recent developments within the field suggest a move in the scope of the definition towards a more general social science research approach instead of an approach that is mainly based on an informetric and bibliometric perspective. Thelwall (2009) defines Webometrics as "the study of web-based content with primarily quantitative methods for social science research goals using techniques that are not specific to one field of study". Interdisciplinary research is becoming more significant, enlarging both the types of subjects studied and the techniques used. This evolution aligns with the definition of Internet research given by Hine (2008): "Internet research itself is not a discipline but an interdiscipline, a field or a research network populated by heterogeneous perspectives."

4.3-Basic concepts

Bjorneborn and Ingwersen (2004) carried out the first attempt to develop a consistent terminology for the webometric field. Some years later, Thelwall and Wilkinson (2008) proposed a generic lexical framework that, building on the previous work, intended to unify and extend existing methods through abstract notions of link lists and URL lists.

4.4-The Entire Web

Letters in the diagram represent any type of document on the Web, whether a webpage or a website, for instance. The following basic webometric terms from Bjorneborn and Ingwersen are discussed below.
Inlink: B has an inlink from A.
Outlink: A has an outlink to B.
Self-link: C has a self-link.
Isolated page or site: K is isolated, as it has neither inlinks nor outlinks.
Reciprocal links: I and J have reciprocal links.
Transversal link: A has a transversal outlink to H. This type refers to a link that joins two different areas of the Web that are not well interconnected.
Co-inlinks: 1 and 4 have a co-inlink, as B links to both of them.
Co-outlinks: 1 and 3 have co-outlinks, as both link to G.
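These link relations can be made concrete with a small, hypothetical link graph in Python. The node names below mirror the terms above, but the graph itself is invented for illustration:

```python
from itertools import combinations

# Hypothetical web graph: each key is a page, each value the set of
# pages it links to (its outlinks).
outlinks = {
    "A": {"B", "H"},   # A outlinks to B, and transversally to H
    "B": {"1", "4"},   # B links to both 1 and 4
    "C": {"C"},        # C has a self-link
    "I": {"J"},
    "J": {"I"},        # I and J have reciprocal links
    "K": set(),        # K is isolated: no inlinks, no outlinks
    "1": {"G"},
    "3": {"G"},        # 1 and 3 have co-outlinks: both link to G
}

# Inlinks are found by inverting the outlink mapping.
inlinks = {}
for src, targets in outlinks.items():
    for t in targets:
        inlinks.setdefault(t, set()).add(src)

# Co-inlinked pairs: pages that share at least one linking source.
co_inlinked = {
    frozenset(pair)
    for targets in outlinks.values()
    for pair in combinations(sorted(targets), 2)
}

print(inlinks.get("B"))                      # {'A'}
print(frozenset({"1", "4"}) in co_inlinked)  # True: B links to both
```

The inversion step shows why inlink counts are harder to obtain in practice than outlink counts: outlinks can be read from a page itself, whereas inlinks require knowledge of every other page, which is why search engines or crawlers are needed.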

4.5-History of Webometrics

The information science field of webometrics is "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the web drawing on bibliometric and informetric approaches" or, more generally, "the study of web-based content with primarily quantitative methods for social science research goals using techniques that are not specific to one field of study". While the former definition emphasizes the informetric heritage of many bibliometric methods, the latter focuses on the value that webometrics could provide to the wider social sciences, reflecting a shift in webometrics over time from more theoretical studies to more applied studies, though retaining an emphasis on methods development. Webometrics currently provides a range of methods and software for various kinds of quantitative analyses of the web and, despite initial concerns that web data would always be easily manipulated because they are not quality-controlled, the advocates of webometrics claim that it is useful both for studies of aspects of the web itself, such as hyperlinking among academic websites, and for studies of offline phenomena that might be reflected online, such as political attitudes reflected in blogs. The term webometrics was coined in 1997 by Tomas Almind and Peter Ingwersen in recognition that informetric analyses could be applied to the web. The field really took off, however, with the introduction of the Web Impact Factor (WIF), a metric to assess the impact of a website or other area of the web based upon the number of hyperlinks pointing to it. WIFs seemed to make sense because more useful or important areas of the web would presumably attract more hyperlinks than average.
The logic of this metric was derived from the use of citations in journal impact factors, but WIFs had the advantage that they could be easily calculated using the new advanced search queries introduced by AltaVista, a leading commercial search engine at the time. Webometrics subsequently rose to become a large coherent field within information science, at least from a bibliometric perspective, encompassing link analysis, web citation analysis and a range of other web-based quantitative techniques. In addition, webometrics became useful in various applied contexts, such as constructing the world webometrics ranking of universities and for scientometric evaluations or investigations of bodies of research or research areas. This section reviews a few key areas of webometrics and summarizes its contribution to information science research.
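The WIF described above divides the number of hyperlinks pointing at a site by the size of the site, much as a journal impact factor divides citations by articles. A minimal sketch, with invented counts rather than real search engine data:

```python
# Illustrative Web Impact Factor (WIF) calculation: inlinks to a site
# divided by its own page count. The counts below are invented for the
# example, not real measurements.
def web_impact_factor(inlink_count: int, page_count: int) -> float:
    if page_count == 0:
        raise ValueError("site has no pages")
    return inlink_count / page_count

# A small site attracting many links scores higher than a large site
# attracting the same number, so size is normalized away.
print(web_impact_factor(1200, 400))   # 3.0
print(web_impact_factor(1200, 4000))  # 0.3
```

The normalization is the point of the metric: raw inlink counts reward sheer size, whereas the ratio is intended to reflect impact per page.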

4.6-Exploring the web

The first web pages emerged in the faraway era of the early 1990s. E-mail and the Internet were already becoming well known, but the web, which like e-mail uses the Internet's global computer network to share information in commonly agreed-upon ways, had its start among physicists. It moved into the mainstream in 1993 when the National Center for Supercomputing Applications (NCSA) at the University of Illinois released Mosaic, an easy-to-use graphical web browser that ran on most standard computers. Between mid-1993 and mid-1995 the number of servers - the computers that house web sites - jumped from 130 to 22,000. Even with the user-friendly Mosaic encouraging a major expansion of this new medium, only a few historians ventured out on the web frontier. Many of the pioneers already had some technical interests or background. In November 1994 Morris Pierce, an engineer who had recently earned a history Ph.D., created one of the first departmental websites for the University of Rochester. It "seemed like a natural thing to do," he recalls. George Welling already worked in a department of humanities computing at the University of Groningen (Netherlands). In the fall of 1994, Welling developed a course in computer skills for American history students and asked them to construct an American Revolution website. Other History Web pioneers came to the medium out of experience with earlier Internet applications, particularly e-mail. In the late 1980s, Joni Makivirta, a student at the University of Jyvaskyla, Finland, started an online history discussion list because he noticed lists on other topics and thought a history list would allow him "to get ideas from professional historians around the world" for his master's thesis.
The participants included George Welling; Thomas Zielke, who later took over the list; Richard Jensen, who went on to found H-Net in 1993; Don Mabry, a Latin American historian at Mississippi State University; and Lynn Nelson, a medievalist at the University of Kansas. In 1991, Mabry - responding to the difficulty of circulating large documents via e-mail - began to make available primary sources and other materials of interest to historians via "anonymous FTP", a "file transfer protocol" that allows anyone with an Internet connection to download the files to their own computers.

Nelson created his own site and then had the idea of linking together the emerging set of history FTP sites into HNSource using Gopher, a hierarchical, menu-driven system for navigating the Internet that was much more popular than the web in the early 1990s. In September 1993, just after Mosaic was released, Nelson made HNSource available through the new web protocols, and it became one of the first, if not the very first, historical sites on the web. In the 1980s and early 1990s, the most intense energy in digital history centered not on the possibilities of online networks but rather on fixed-media products like laser disks and CD-ROM. In 1982, the Library of Congress began its Optical Disk Pilot Project, which placed text and images from its massive collections on laser disks and later CD-ROM. With a large amount of material already in digital form, the library could quickly take advantage of the newly emerging web. In 1992, it started to offer its exhibits through FTP sites. Two years later, the library posted its first web-based collection, Selected Civil War Photographs. Around the time that these early settlers carved out primitive digital history homesteads, the first signs emerged that this new frontier might feature more than noncommercial exchange. In October 1994 Marc Andreessen and some of his colleagues who had developed Mosaic at the government-funded NCSA released the first version of a commercially funded browser they called Netscape. Within months, Mosaic was, as they say, history, and Netscape was king of the World Wide Web. The Netscape era saw the History Web come into its own. In mid-1995, the first published guide to the web for historians, in the American Historical Association's (AHA) Perspectives, announced, "the explosion in Web sites has brought with it an explosion in materials relevant to historians."
Earlier that year, the Center for History and New Media (CHNM) had helped the venerable AHA launch its website; by that summer forty-five history departments had posted home pages. The online presence of the AHA and the Library of Congress provided an official imprimatur to the History Web. But in those early years, amateurs, not professional historical organizations, provided the crucial energy for much of its growth. Starting in 1995, for example, Larry Stevens, a telephone company worker from Newark, Ohio, established a series of websites on Ohio in the Civil War. The

sites combined his two hobbies of history and computers, and, he explained, he "decided to carve a niche into the net before the big boys, aka Ohio Historical Society, Ohio State University, etc., entered the field." Since the mid-1990s, the History Web has spun its threads with astonishing speed. In 2004, the same search yields 640,000 hits. In the fall of 1996, additional history searches produced what were thought at the time to be even more remarkable results: 200 hits for the Civil War general George B. McClellan and 300 for the socialist Eugene V. Debs. Even by 1996 the "walking city" that was the History Web a year earlier had become a sprawling megalopolis that no one person could fully explore. Yahoo counted 873 U.S. history websites in an incomplete census that fall. But seven years later, an even less complete tally returned almost ten times as many American history websites.

4.7-Webometrics, Bibliometrics & Informetrics

Being a global document network initially developed for scholarly use (Berners-Lee & Cailliau, 1990) and now inhabited by a diversity of users, the Web constitutes an obvious research area for bibliometrics, scientometrics and informetrics. A range of new terms for the emerging research area has been proposed since the mid-1990s, for instance: net metrics (Bossy, 1995); web metrics (Abraham, 1996); internet metrics (Almind & Ingwersen, 1996); webometrics (Almind & Ingwersen, 1997); Cybermetrics, the journal started in 1997 by Isidro Aguillo; web bibliometrics (Chakrabarti et al., 2002); and web metrics, the term used in Computer Science (e.g., Dhyani, Keong & Bhowmick, 2002). Webometrics and cybermetrics are currently the two most widely adopted terms in Information Science, often used as synonyms. Bjorneborn & Ingwersen (in press) have proposed a differentiated terminology distinguishing between studies of the Web and studies of all Internet applications.
They used an Information Science related definition of webometrics as "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the WWW drawing on bibliometric and informetric approaches" (Bjorneborn & Ingwersen, 2004). This definition thus covers quantitative aspects of both the construction side and the usage side of the Web, embracing the four main

areas of present webometric research: web page content analysis; web link structure analysis; web usage analysis (e.g., exploiting log files of users' searching and browsing behavior); and web technology analysis (including search engine performance). This includes hybrid forms, for example, Pirolli et al. (1996), who explored web analysis techniques for automatic categorization utilizing link graph topology, text content and metadata similarity, as well as usage data. All four main research areas include longitudinal studies of changes on the dynamic Web, for example, of page contents, link structures and usage patterns. So-called web archaeology (Bjorneborn & Ingwersen, 2001) could in this webometric context be important for recovering historical web developments, for instance by means of the Internet Archive, an approach already used in webometrics (Bjorneborn, 2003; Vaughan & Thelwall, 2003; Thelwall & Vaughan, 2004). Furthermore, Bjorneborn & Ingwersen have proposed cybermetrics as a generic term for "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the whole Internet, drawing on bibliometric and informetric approaches". Cybermetrics thus encompasses statistical studies of discussion groups, mailing lists, and other computer-mediated communication on the Internet (e.g., Bar-Ilan, 1997; Hernandez-Borges et al., 1997; Matzat, 1998; Herring, 2002), including the Web. Besides covering all computer-mediated communication using Internet applications, this definition of cybermetrics also covers quantitative measures of the Internet backbone technology, topology and traffic (Molyneux & Williams, 1999). The breadth of coverage of cybermetrics and webometrics implies large overlaps with proliferating Computer-Science-based approaches to analyses of web contents, link structures, web usage and web technologies.
A range of such approaches has emerged since the mid-1990s with names like Cyber Geography / Cyber Cartography (Girardin, 1996; Dodge, 1999; Dodge & Kitchin, 2001), Web Ecology (e.g., Chi et al., 1998; Huberman, 2001), Web Mining (e.g., Etzioni, 1996; Kosala & Blockeel, 2000; Chen & Chau, 2004), Web Graph Analysis (e.g., Chakrabarti et al., 1999; Kleinberg et al., 1999; Broder et al., 2000), and Web Intelligence (e.g., Yao et al., 2001). The rationale for using the term webometrics in this context is to denote a heritage from bibliometrics and informetrics and to stress an Information Science perspective on Web studies.

There are different conceptions of informetrics, bibliometrics and scientometrics. The diagram in Fig. 1.1 (Bjorneborn & Ingwersen) shows the field of informetrics embracing the overlapping fields of bibliometrics and scientometrics, following widely adopted definitions by, e.g., Brookes (1990), Egghe & Rousseau (1990) and Tague-Sutcliffe (1992). According to Tague-Sutcliffe (1992), informetrics is "the study of the quantitative aspects of information in any form, not just records or bibliographies, and in any social group, not just scientists". Bibliometrics is defined as "the study of the quantitative aspects of the production, dissemination and use of recorded information" and scientometrics as "the study of the quantitative aspects of science as a discipline or economic activity" (Tague-Sutcliffe, 1992). In the figure, political-economical aspects of scientometrics are covered by the part of the scientometric ellipse lying outside the bibliometric one. In this context, the field of webometrics may be seen as entirely encompassed by bibliometrics, because web documents, whether text or multimedia, are recorded information (cf. Tague-Sutcliffe's above-mentioned definition of bibliometrics) stored on web servers. This recording may be temporary only, just as not all paper documents are properly archived. In the diagram, webometrics is partially covered by scientometrics, as many scholarly activities today are web-based. Furthermore, webometrics is totally included within the field of cybermetrics as defined above. In the diagram, the field of cybermetrics exceeds the boundaries of bibliometrics, because some activities in cyberspace are normally not recorded, but communicated synchronously, as in chat rooms. Cybermetric studies of such activities still fit in the generic field of informetrics as the study of the quantitative aspects of information in any form. Webometrics, then, is a scientific discipline that studies the quantitative aspects of information sources and their use.
In other words, webometrics tries to measure the World Wide Web, analyses technology usage and allows simple content analysis. As Figure 1.1 shows, webometrics is informed by several scientific disciplines:
Bibliometrics - the quantitative analysis of documents in scientific communication; the documents reflect the state of scientific knowledge.
Cybermetrics - the quantitative study of information sources, structures and technologies on the Internet, including studies of discussion groups and communication.

Informetrics - focused on information streams in networks, demonstrating on the basis of mathematical and statistical methods a variety of relations between them.
Scientometrics - focused on evaluating the efficiency of scientific research or individual researchers through citation counts.

4.8-Scholarly Communication on the Web

The hope that web links could be used to provide similar kinds of information to that extracted from journal citations has been a key factor in motivating much webometrics research (Larson, 1996; Rodriguez Gairin, 1997; Rousseau, 1997; Ingwersen, 1998; Davenport & Cronin, 2000; Cronin, 2001; Borgman & Furner, 2002; Thelwall, 2002). But can this hope be fulfilled? Although structurally very similar, journal citations appear in refereed documents, their production is subject to quality control, and they are part of the mainstream of academic endeavor, whereas hyperlinks are none of these things, causing problems for the early hyperlink-citation analogies, as also noted by, for instance, Meyer (2000), Egghe (2000), van Raan (2001), Bjorneborn & Ingwersen (2001) and Prime et al. (2002). In this section we will summarize the results of a series of studies, organized by the scale of units analyzed, before considering the fundamental issue of why links are created, which is essential for interpreting the results. Finally, we will conclude with a discussion of how far the early hopes have been realized. The goal underlying almost all of the research reported here is to validate links as a new information source, as a preliminary step to extracting useful information from them. Such a task entails several different strategies, as reported by Oppenheim (2000) in the related context of patent citations. One of the key tasks is to compare the link data with other related data in order to establish the degree of correlation and overlap between the two.
With links between university web sites, for instance, a positive correlation between link counts and a measure of research would provide some evidence that link creation was not completely random and could be useful for studying scholarly activities. An important methodological issue is that, given the typically skewed nature of web link data, nonparametric Spearman correlation tests are normally more appropriate than Pearson tests. Note also that many of the studies reported below have developed improved methods that were reported in the previous section.
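The point about skewed data can be made concrete with a small, self-contained sketch. Spearman's correlation is simply Pearson's correlation computed on ranks, so a single extreme "star" site drags down the Pearson value but leaves the Spearman value untouched. The data below are invented for illustration:

```python
from statistics import mean

def ranks(xs):
    """1-based average ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in order[i:j + 1]:
            r[k] = avg
        i = j + 1
    return r

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman = Pearson on the rank-transformed data.
    return pearson(ranks(xs), ranks(ys))

# Invented data: inlink counts are heavily skewed by one "star" site,
# while the research measure grows steadily.
inlink_counts = [3, 5, 8, 12, 900]
research_scores = [1.0, 1.5, 2.0, 2.5, 3.0]

print(round(spearman(inlink_counts, research_scores), 3))  # 1.0
print(round(pearson(inlink_counts, research_scores), 3))   # well below 1
```

Because both variables increase together, the rank-based Spearman coefficient reports a perfect monotonic relationship, while Pearson is distorted by the outlier, which is why webometric studies of skewed link counts prefer the former.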

4.9-Link Analysis: Impact Measurements and Networks

Link analysis drove early webometrics research, primarily through a combination of the development of improved methods and applications to a range of different contexts. Two types of studies emerged: link impact analyses and link network analyses. Link impact studies essentially compare the numbers of hyperlinks pointing to each website within a pre-defined set, such as all universities in a country or all departments within a discipline in a country. Links to university websites and, in some cases, departmental websites were found to correlate significantly with measures of research productivity or prestige, giving evidence of the validity of using link impact metrics as a research-related indicator. They have been used in this role to provide an indication of the most important organizations or websites within specific groups. In addition, a breakdown of the sources of links used in the calculations has been used to identify the sources of the impact, such as the countries and organization types that host most of the links. Link network research created network diagrams of the links among specified collections of websites in order to identify connectivity patterns. In addition to networks based upon direct links between pairs of sites, co-inlinks have also been used to indicate connections between pairs of sites. A co-inlink to a pair of websites A and B is a third website C that contains a hyperlink to both A and B. This relation is similar to co-citation in bibliometrics and is particularly useful when investigating websites that are similar but do not necessarily hyperlink to each other. Figure 1 is an example of a co-inlink network diagram for ASIS&T. A direct link network diagram would be likely to exclude links between pairs of sites that were similar in some way but were not directly related to each other.
The nodes in the network are the websites most highly linked to from a set of 741 pages reported by Bing as containing a URL citation to "asis.org." Lines between websites indicate co-inlinks between them from the 741 pages. All the organizations represented should be in some way related to ASIS&T. Green nodes are general international sites and pink nodes are university sites in the United States. Two important components of link analysis are the software and the methods used to extract link data. Researchers were for many years able to gather hyperlink information from commercial search engines like Bing, AltaVista and Yahoo! via their advanced link search commands, but these tools were all eventually withdrawn. Link data can still be obtained by the use of specialist link analysis web crawlers, including free programs like SocSciBot
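The construction behind such a diagram can be sketched as follows: given, for each source page, the set of sites it links to, every unordered pair of co-linked sites counts as one co-inlink, and the accumulated pair counts become the edge weights of the network. The site names below are invented for illustration, not taken from the actual ASIS&T data:

```python
from itertools import combinations
from collections import Counter

# Hypothetical source pages (e.g., pages citing "asis.org") and the
# sites each one links to; all names are invented for the example.
pages = [
    {"asist.org", "alise.org", "ifla.org"},
    {"asist.org", "alise.org"},
    {"asist.org", "ifla.org"},
]

# Each unordered pair of sites linked from the same page is one
# co-inlink; the count becomes the weight of the network edge.
edges = Counter(
    frozenset(pair)
    for linked in pages
    for pair in combinations(sorted(linked), 2)
)

for pair, weight in edges.most_common():
    print(sorted(pair), weight)
```

Thicker (heavier) edges in a diagram such as Figure 1 correspond to site pairs that many source pages link to together, which is how similar but unconnected sites end up adjacent in the network.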

14 ( and Issue Crawler ( as well as a range of other crawlers developed by individual researchers. The Issue Crawler initiative from sociology seems to have been particularly successful at spreading link analysis methods to the wider areas of social sciences and the humanities.within information science, hyperlink-based network diagrams have been used to investigate the interconnections between large. Groups of organizations, such as universities in Europe and organizations within a specific knowledge sector. Some link analysis research has focused on the links themselves, investigating why they are created and why some sites or pages attract more links than others. These studies seem to have focused exclusively on links in academic contexts. Content analyses have shown that links between academic websites tend to be created for scholarly or educational reasons, a partial similarity with citation analysis. Statistical tests have also been used to see which attributes of the website owners (other than research productivity or production, which was already a known factor) tend to associate with higher inlink counts, for example finding that research group website owner gender is unimportant. A recent quite comprehensive study used the most advanced statistical modeling approach yet on a large dataset to gain significant insights into the factors behind academic website interlinking in Europe. Among the findings were that country, region, domain specialism and level (whether awarding doctoral degrees or not) were the most important factors predicting hyperlinks, while reputation was 4.10-Web Citation Analysis to Altmetrics The second type of webometrics to become popular was web citation analysis: counting online citations to published academic documents like refereed journal articles. 
The rationale behind early research was to assess whether the web could replace traditional citation databases for assessing the impact of articles in open access online journals, and subsequently for all journals. This early research found that although counts of web citations correlated with citation counts from traditional databases, many of the web citations derived from non-academic sources, such as online library catalogues. As a result, the web appeared to be an inferior source of citation impact evidence for journals or individual journal articles. This strand of webometric research gave way to more specialized investigations into particular types of web citations to academic publications, such as citations from PowerPoint presentations, online syllabi and Google Books, on the basis that within these restricted domains, web-based citation counts could reveal different types of impact from the scholarly impact reflected by traditional citation counts. For example, online syllabus citations could reflect the educational impact or value of articles. This line of research was subsequently overtaken by the altmetrics initiative, discussed elsewhere. A promising but relatively little studied type of webometrics is the analysis of mentions of keywords or phrases, not necessarily citations. This type of analysis was started by an investigation into the context of online mentions of academics, but the keyword approach has also been used to map concepts online, and interactions between concepts online, by tracking co-words in web pages.

4.11-Theoretical Perspectives and Information-Centered Research

Webometrics has been a methods-centered field, developing methods to gather and analyze data from the web. Perhaps as a result of this focus, the theoretical component of most webometric studies has typically been drawn from citation analysis rather than being created specifically for web data. For example, many early studies assessed whether web citation counts or web link counts correlated with traditional citation counts, drawing upon Robert Merton's theoretical discussion of citation norms in science. Hence, such studies assessed to some extent how well web data fitted Merton's theory. The lack of specialist theory for the most developed area of webometrics, link analysis, reflects the web being a far more varied and complex space than academic journal databases, with theory development in the latter being recognized as problematic and controversial. One partial exception to the lack of native theory for webometrics is information-centered research, a style of research theorized to be particularly appropriate to webometrics.
An information-centered research study focuses on a new information source, such as a type of web data, and attempts to identify the social science research problems that the data is most suited to address, rather than using a priori intuitions to match the data with a research problem and then assessing the value of the data for that problem. This theory was used to justify the development of a range of different methods to analyze web data and to match the methods to a variety of social science problem areas.

4.12-Web Data Analysis

Webometrics research has expanded from general or academic web analyses to investigations of social websites, often by automatically downloading data from those websites, either through a web crawler or through data requests sent through permitted routes (application programming interfaces). For example, exploiting the information-centered research approach, blogs and RSS feeds have been analyzed to detect public fears about science, while social network sites have been investigated to detect friendship patterns and language use. Twitter has been analyzed for the sentiment of public reactions to major media events, and YouTube for the factors associated with discussions attached to online videos. In all cases, the methods of the research have been webometric - large-scale data gathering and analysis for social science purposes - but the findings of the research have been targeted at disciplines outside information science, such as media studies, politics and science communication. Many of the programs used are now publicly available in the free software Webometric Analyst.

4.13-Link Analysis

Link analysis is the quantitative study of hyperlinks between web pages. The use of links in bibliometrics was triggered by Ingwersen's web impact factor (WIF), created by analogy to the journal Impact Factor (JIF), and by the potential that hyperlinks might be usable by bibliometricians in ways analogous to citations. The standard WIF measures the average number of links per page to a web space (e.g. a web site or a whole country) from external pages. The hypothesis underlying early link analysis was that the number of links targeting an academic web site might be proportional to the research productivity of the owning organization, at the level of universities, departments, research groups or individual scientists. Essentially the two are related because more productive researchers seem to produce more web content, on average, although this content does not attract more links per page.
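The standard WIF described above reduces to a single division. A minimal sketch, with invented figures rather than data from any real study:

```python
def web_impact_factor(external_inlinks, pages_in_space):
    """Standard WIF: the number of links pointing into a web space
    from external pages, divided by the number of pages in the space."""
    if pages_in_space == 0:
        raise ValueError("web space has no pages")
    return external_inlinks / pages_in_space

# Invented figures: 1,200 external inlinks to a 400-page site.
wif = web_impact_factor(1200, 400)
```

In early studies both numbers would typically have come from advanced search engine queries, which is what made the calculation practical at the time.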
Nevertheless, the pattern is likely to be obscured in all except large-scale studies because of the often indirect relationship between research productivity and web visibility. For example, some researchers produce highly visible web resources as the main output of their research, whilst others with equally high-quality offline research attract less online attention. Subsequent hyperlink research has introduced new metrics and applications as well as improved counting methods, such as the alternative document models. In most cases this research has focused on method development or case studies. The wide variety of reasons why links are created, and the fact that, unlike citing, linking is not central to any area of science, have led to hyperlinks rarely being used in an evaluative role.

Nevertheless, they can be useful in describing the evolution or connectivity of research groups within a field, especially in comparison with other sources of similar information, such as citations or patents. Links are also valuable for gaining insights into web use in a variety of contexts, such as by departments in different fields. A generic problem with link analysis is that the web is continually changing and seems to be constantly expanding, so that webometric findings might become rapidly obsolete. A series of longitudinal investigations into university web sites in Australia, New Zealand and the UK has addressed this issue. These university web sites seem to have stabilized in size from 2001, after several years of rapid growth. A comparison of links between the web sites from year to year found that this stabilization in site size concealed changes in the individual links, but concluded that typical quantitative studies could nevertheless have a shelf life of many years.
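Year-to-year comparisons of the kind described above can be expressed as a set overlap between two crawls. A sketch using invented link data (the Jaccard coefficient here is one reasonable choice of overlap measure, not necessarily the one used in those studies):

```python
def link_overlap(links_year1, links_year2):
    """Jaccard overlap between the sets of (source, target) hyperlinks
    observed in two crawls of the same group of web sites."""
    union = links_year1 | links_year2
    if not union:
        return 0.0
    return len(links_year1 & links_year2) / len(union)

# Invented crawl data: two of four distinct links persist between years.
year1 = {("a", "b"), ("a", "c"), ("b", "c")}
year2 = {("a", "b"), ("b", "c"), ("b", "d")}
overlap = link_overlap(year1, year2)
```

A stable site size with a low overlap value would indicate exactly the phenomenon reported: constant totals concealing turnover in the individual links.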

4.14-Web Citation Analysis

A number of webometric investigations have focused not on web sites but on academic publications, using the web to count how often journal articles are cited. The rationale behind this is partly to give a second opinion on the traditional ISI data, and partly to see whether the web can produce evidence of wider use of research, including informal scholarly communication and commercial applications. A number of studies have shown that web-based citation counts correlate significantly with ISI citation counts across a range of disciplines, with web citations typically being more numerous. Nevertheless, many of the online citations are relatively trivial, for example appearing in journal contents lists rather than in the reference sections of academic articles. If the filtering out of such trivial citations could be automated, then web citation counting would offer an interesting alternative to the ISI citation indexes.

4.15-Search Engines

A significant amount of webometrics research has evaluated commercial search engines. The two main investigation topics have been the extent of the coverage of the web and the accuracy of the reported results. Research into developing search engine algorithms (information retrieval) and into how search engines are used (information seeking) is not part of webometrics. The two audiences for webometric search engine research are researchers who use the engines for data gathering (e.g. the link counts above) and web searchers wanting to understand their results. Search engines have been a main portal to the web for most users since the early years. Hence, it has been logical to assess how much of the web they cover. In 1999, a survey of the main search engines estimated that none covered more than 17.5% of the 'indexable' web and that the overlap between search engines was surprisingly low.
Here the 'indexable' web is roughly the set of pages that a perfect search engine could be expected to find if it found all web site home pages and followed links to find the remainder of the pages in those sites. The absence of comparable figures after 1999 is due to three factors: first, an obscure Hypertext Transfer Protocol technology, the virtual server, has rendered the sampling method of Lawrence and Giles ineffective; second, the rise of dynamic pages means that it is no longer reasonable to talk in terms of the 'total number of web pages'; finally, given that search engine coverage of the web is only partial, the exact percentage is not particularly relevant unless it has changed substantially. One outcome of this research, however, was clear evidence that meta-search engines could give more results by combining multiple engines. Nevertheless, these have lost out to Google, presumably because the key task of a search engine is to deliver relevant results on the first results page rather than a comprehensive list of pages. Given that web coverage is partial, is it biased in any important ways? This question matters because the key role of search engines as intermediaries between web users and content gives them considerable economic power in the online economy. In fact, coverage is biased internationally in favor of countries that were early adopters of the web. This is a side effect of the way search engines find pages rather than a policy decision. The issue of the accuracy of search engine results is multifaceted, relating to the extent to which a search engine correctly reports its own knowledge of the web. Bar-Ilan and Peritz have shown that search engines are not internally consistent in the way they report results to users. Through a longitudinal analysis of the results of the query 'Informetric OR Informetrics' in Google, they showed that search engines reported only a fraction of the pages in their databases. Although some of the omitted pages duplicated other returned results, this was not always the case, and so some information would be lost to the user. A related analysis with Microsoft Live Search suggested that one reason for lost information could be the search engine policy of returning a maximum of two pages per site. Many webometric studies have used the hit count estimates provided by search engines on their results pages (e.g. the '50,000' in 'Results 1-10 of about 50,000') rather than the list of matching URLs. For example, Ingwersen used these to estimate the number of hyperlinks between pairs of countries.
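Studies of this kind typically scraped the estimate out of the results-page text. A minimal sketch, assuming the 'Results 1-10 of about 50,000' phrasing quoted above (real engines varied in wording, so the pattern would need adapting):

```python
import re

def parse_hit_count(results_line):
    """Pull the hit count estimate out of a results-page line such as
    'Results 1-10 of about 50,000' (the phrasing is an assumed format)."""
    match = re.search(r"of about ([\d,]+)", results_line)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

estimate = parse_hit_count("Results 1-10 of about 50,000")
```

The fragility of this approach is one reason such estimates were treated with caution, as discussed next.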
The problem with using these estimates is that they can be unreliable and can even lead to inconsistencies, such as expanded queries giving fewer results. In the infancy of webometrics these estimates could be highly variable, and so techniques were proposed to smooth out the inconsistencies, although the estimates subsequently became much more stable. A recent analysis of the accuracy of hit count estimates for Live Search found a surprising pattern.

4.16-Measuring Web 2.0

Web 2.0 is a term coined by the publisher Tim O'Reilly mainly to refer to web sites that are driven by consumer content, such as blogs, Wikipedia and social network sites. The growth in the volume of web content created by ordinary users has spawned a market intelligence industry and much measurement research. The idea behind these is data mining: since so many people have recorded informal thoughts online in various formats, such as blogs, chat rooms, bulletin boards and social network sites, it should be possible to extract patterns such as consumer reactions to products or world events. In order to address issues like these, new software has been developed by large companies, such as IBM's WebFountain and Microsoft's Pulse. In addition, specialist web intelligence companies like Nielsen BuzzMetrics and Market Sentinel have been created or adapted. A good example of a research initiative to harness consumer-generated media (CGM) is an attempt to predict sales patterns for books based upon the volume of blog discussions of them. The predictions had only limited success, however, perhaps because people often blogged about books after reading them, when it would be too late to predict a purchase. Other similar research has had less commercial goals. Gruhl et al. analysed the volume of discussion for a selection of topics in blog space, finding several different patterns. For example, some topics were discussed for one short period of time only, whereas others were discussed continuously, with or without occasional bursts of extra debate. A social sciences-oriented study sought to build retrospective timelines for major events from blog and news discussions, finding this to be possible to a limited extent. Problems occurred, for example, when a long-running series of similar, relatively minor events received little individual discussion, but omitting them all from a timeline would omit an important aspect of the overall event. In addition to the data mining style of research, there have been many studies of Web 2.0 sites in order to describe their contents and explain user behavior in them.
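The discussion-volume analyses described above amount to a keyword time series over dated posts. A toy sketch with invented posts (real studies used far larger corpora and more sophisticated matching):

```python
from collections import Counter

# Invented blog posts as (date, text) pairs.
posts = [
    ("2006-03-01", "Reading a great new book about the web"),
    ("2006-03-01", "The web conference was fascinating"),
    ("2006-03-02", "Nothing much happened today"),
    ("2006-03-02", "More thoughts on the web and its growth"),
]

def discussion_volume(posts, keyword):
    """Count the posts per day that mention the keyword,
    case-insensitively."""
    volume = Counter()
    for date, text in posts:
        if keyword.lower() in text.lower():
            volume[date] += 1
    return volume

volume = discussion_volume(posts, "web")
```

Plotting such daily counts over time is what reveals the burst and continuous-discussion patterns reported by Gruhl et al.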
Here, research into social network sites is reviewed. A large-scale study of the early years of Facebook provides the most comprehensive overview of user activities. The data came from February 2004 to March 2006, when Facebook was a social network site exclusively for US college students. Users seemed to fit their Facebook use into their normal pattern of computer use whilst studying, rather than allocating separate times. In terms of the geography of friendship, members mainly used Facebook to communicate with other students at the same college rather than with school friends at distant universities. This suggests that social networking is an extension of offline communication rather than a promoter of radically new geographies of communication, although the latter is enabled by the technology of Facebook. This conclusion is supported by qualitative research into another popular site, MySpace. A webometric study of MySpace indirectly investigated activity levels but focused on member profiles. Amongst other findings, this showed that about a third of registered members accessed the site weekly and that the average reported age was 21. Although other research found that MySpace close friends tended to reflect offline friendships, both male and female users preferred to have a majority of female friends. Another study looked at the geography of friendship, finding that the majority of friends tended to live within a hundred miles, although only a minority lived in the same town or city. Finally, many statistics about Web 2.0 have been published by market research companies. Despite the uncertain provenance of this data, the results sometimes seem reasonable and, because of the cost of obtaining the data, seem unlikely to be duplicated by academic researchers. An example is the announcement by Hitwise that MySpace had supplanted Google as the most visited web site by US users by December. The data for this was reported to come from two million US web users via an agreement between Hitwise and the users' internet service providers. Making the results of overview analyses public gives useful publicity to Hitwise and valuable insights to web researchers.

4.17-The Development of Policy-Relevant Webometrics

As introduced above, early link analysis webometrics developed methods and indicators but no clear practical applications. Early studies began with the Web Impact Factor, a type of calculation based on counting links to a web site or other web space (Ingwersen, 1998). This calculation was practical because links to a web space could be easily counted and listed using an advanced query in the web search engine AltaVista.
It gave the promise that the impact of whole areas of the web, including entire countries, could be assessed, and was inspired by the journal Impact Factor (Garfield, 1999). Subsequent research found problems, including the unreliability of search engines (Bar-Ilan, 1999; Mettrop & Nieuwenhuysen, 2001) and the existence of links created for spam or recreational reasons (Smith, 1999). This may have prevented the early adoption of Web Impact Factors as policy-relevant indicators, and they subsequently attracted less interest. After the initial research there was a period of methodological development in which webometrics defined its key terminology (Bjorneborn & Ingwersen, 2004) and developed specialist data collection and analysis software (Cothey, 2004; Heimeriks


More information

Using the Internet and the World Wide Web

Using the Internet and the World Wide Web Using the Internet and the World Wide Web Computer Literacy BASICS: A Comprehensive Guide to IC 3, 3 rd Edition 1 Objectives Understand the difference between the Internet and the World Wide Web. Identify

More information

Enhanced retrieval using semantic technologies:

Enhanced retrieval using semantic technologies: Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008

More information

For Attribution: Developing Data Attribution and Citation Practices and Standards

For Attribution: Developing Data Attribution and Citation Practices and Standards For Attribution: Developing Data Attribution and Citation Practices and Standards Board on Research Data and Information Policy and Global Affairs Division National Research Council in collaboration with

More information

Ranking Web of Repositories Metrics, results and a plea for a change. Isidro F. Aguillo Cybermetrics Lab CCHS - CSIC

Ranking Web of Repositories Metrics, results and a plea for a change. Isidro F. Aguillo Cybermetrics Lab CCHS - CSIC Ranking Web of Repositories Metrics, results and a plea for a change Isidro F. Aguillo Cybermetrics Lab CCHS - CSIC Isidro.aguillo@cchs.csic.es 1 Agenda 00:00 A complex scenario: Open Access Initiatives

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

Ambiguity Handling in Mobile-capable Social Networks

Ambiguity Handling in Mobile-capable Social Networks Ambiguity Handling in Mobile-capable Social Networks Péter Ekler Department of Automation and Applied Informatics Budapest University of Technology and Economics peter.ekler@aut.bme.hu Abstract. Today

More information

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm K.Parimala, Assistant Professor, MCA Department, NMS.S.Vellaichamy Nadar College, Madurai, Dr.V.Palanisamy,

More information

ICT-U CAMEROON, P.O. Box 526 Yaounde, Cameroon. Schools and Programs DETAILED ICT-U PROGRAMS AND CORRESPONDING CREDIT HOURS

ICT-U CAMEROON, P.O. Box 526 Yaounde, Cameroon. Schools and Programs DETAILED ICT-U PROGRAMS AND CORRESPONDING CREDIT HOURS Website: http:// ICT-U CAMEROON, P.O. Box 526 Yaounde, Cameroon Schools and Programs DETAILED ICT-U PROGRAMS AND CORRESPONDING CREDIT HOURS Important note on English as a Second Language (ESL) and International

More information

Motivations for URL citations to open access library and information science articles 1

Motivations for URL citations to open access library and information science articles 1 Motivations for URL citations to open access library and information science articles 1 KAYVAN KOUSHA PhD Student, Department of Library and Information Science, University of Tehran, Jalal-Al-e-Ahmed,

More information

Received: 15/04/2012 Reviewed: 26/04/2012 Accepted: 30/04/2012

Received: 15/04/2012 Reviewed: 26/04/2012 Accepted: 30/04/2012 Exploring Deep Web Devendra N. Vyas, Asst. Professor, Department of Commerce, G. S. Science Arts and Commerce College Khamgaon, Dist. Buldhana Received: 15/04/2012 Reviewed: 26/04/2012 Accepted: 30/04/2012

More information

= a hypertext system which is accessible via internet

= a hypertext system which is accessible via internet 10. The World Wide Web (WWW) = a hypertext system which is accessible via internet (WWW is only one sort of using the internet others are e-mail, ftp, telnet, internet telephone... ) Hypertext: Pages of

More information

The future of UC&C on mobile

The future of UC&C on mobile SURVEY REPORT The future of UC&C on mobile Published by 2018 Introduction The future of UC&C on mobile report gives us insight into how operators and manufacturers around the world rate their unified communication

More information

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Evaluating the Usefulness of Sentiment Information for Focused Crawlers Evaluating the Usefulness of Sentiment Information for Focused Crawlers Tianjun Fu 1, Ahmed Abbasi 2, Daniel Zeng 1, Hsinchun Chen 1 University of Arizona 1, University of Wisconsin-Milwaukee 2 futj@email.arizona.edu,

More information

Running head: ASSESSING THE CHALLENGES OF LOCAL HISTORY DIGITIZATION 1. Assessing the Challenges of Local History Archive Digitization Projects

Running head: ASSESSING THE CHALLENGES OF LOCAL HISTORY DIGITIZATION 1. Assessing the Challenges of Local History Archive Digitization Projects Running head: ASSESSING THE CHALLENGES OF LOCAL HISTORY DIGITIZATION 1 Assessing the Challenges of Local History Archive Digitization Projects Alexander P. Merrill Wayne State University ASSESSING THE

More information

Evaluating Web Ranking Metrics for Saudi Universities

Evaluating Web Ranking Metrics for Saudi Universities Evaluating Web Ranking Metrics for Saudi Universities Ahmad Albhaishi 1, Heider A. Wahsheh 1, Tami Alghamdi 1 1 King Khalid University/ College of Computer Science, Computer Science Department Abha, Saudi

More information

1 von 5 13/10/2005 17:44 high graphics home search browse about whatsnew submit sitemap Factors affecting the quality of an information source The purpose of this document is to explain the factors affecting

More information

Consistent Measurement of Broadband Availability

Consistent Measurement of Broadband Availability Consistent Measurement of Broadband Availability By Advanced Analytical Consulting Group, Inc. September 2016 Abstract This paper provides several, consistent measures of broadband availability from 2009

More information

The ebuilders Guide to selecting a Web Designer

The ebuilders Guide to selecting a Web Designer The ebuilders Guide to selecting a Web Designer With the following short guide we hope to give you and your business a better grasp of how to select a web designer. We also include a short explanation

More information

The Tagging Tangle: Creating a librarian s guide to tagging. Gillian Hanlon, Information Officer Scottish Library & Information Council

The Tagging Tangle: Creating a librarian s guide to tagging. Gillian Hanlon, Information Officer Scottish Library & Information Council The Tagging Tangle: Creating a librarian s guide to tagging Gillian Hanlon, Information Officer Scottish Library & Information Council Introduction Scottish Library and Information Council (SLIC) advisory

More information

Economics of Information Networks

Economics of Information Networks Economics of Information Networks Stephen Turnbull Division of Policy and Planning Sciences Lecture 4: December 7, 2017 Abstract We continue discussion of the modern economics of networks, which considers

More information

Characteristics of Students in the Cisco Networking Academy: Attributes, Abilities, and Aspirations

Characteristics of Students in the Cisco Networking Academy: Attributes, Abilities, and Aspirations Cisco Networking Academy Evaluation Project White Paper WP 05-02 October 2005 Characteristics of Students in the Cisco Networking Academy: Attributes, Abilities, and Aspirations Alan Dennis Semiral Oncu

More information

1. The Best Practices Section < >

1. The Best Practices Section <   > DRAFT A Review of the Current Status of the Best Practices Project Website and a Proposal for Website Expansion August 25, 2009 Submitted by: ASTDD Best Practices Project I. Current Web Status A. The Front

More information

Meaning & Concepts of Databases

Meaning & Concepts of Databases 27 th August 2015 Unit 1 Objective Meaning & Concepts of Databases Learning outcome Students will appreciate conceptual development of Databases Section 1: What is a Database & Applications Section 2:

More information

You need to start your research and most people just start typing words into Google, but that s not the best way to start.

You need to start your research and most people just start typing words into Google, but that s not the best way to start. Academic Research Using Google Worksheet This worksheet is designed to have you examine using various Google search products for research. The exercise is not extensive but introduces you to things that

More information

Identifying user behavior in domain-specific repositories

Identifying user behavior in domain-specific repositories Information Services & Use 34 (2014) 249 258 249 DOI 10.3233/ISU-140745 IOS Press Identifying user behavior in domain-specific repositories Wilko van Hoek, Wei Shen and Philipp Mayr GESIS Leibniz Institute

More information

Digital Library on Societal Impacts Draft Requirements Document

Digital Library on Societal Impacts Draft Requirements Document Table of Contents Digital Library on Societal Impacts Draft Requirements Document Eric Scharff Introduction... 1 System Description... 1 User Interface... 3 Infrastructure... 3 Content... 4 Work Already

More information

A Study on Website Quality Models

A Study on Website Quality Models International Journal of Scientific and Research Publications, Volume 4, Issue 12, December 2014 1 A Study on Website Quality Models R.Anusha Department of Information Systems Management, M.O.P Vaishnav

More information

data elements (Delsey, 2003) and by providing empirical data on the actual use of the elements in the entire OCLC WorldCat database.

data elements (Delsey, 2003) and by providing empirical data on the actual use of the elements in the entire OCLC WorldCat database. Shawne D. Miksa, William E. Moen, Gregory Snyder, Serhiy Polyakov, Amy Eklund Texas Center for Digital Knowledge, University of North Texas Denton, Texas, U.S.A. Metadata Assistance of the Functional Requirements

More information

Research Report: Voice over Internet Protocol (VoIP)

Research Report: Voice over Internet Protocol (VoIP) Research Report: Voice over Internet Protocol (VoIP) Statement Publication date: 26 July 2007 Contents Section Page 1 Executive Summary 1 2 Background and research objectives 3 3 Awareness of VoIP 5 4

More information

WWW Hyperlink Networks

WWW Hyperlink Networks WWW Hyperlink Networks Robert Ackland Australian Demographic and Social Research Institute (ADSRI) The Australian National University robert.ackland@anu.edu.au http://voson.anu.edu.au Notes prepared for

More information

Data Curation Profile Human Genomics

Data Curation Profile Human Genomics Data Curation Profile Human Genomics Profile Author Profile Author Institution Name Contact J. Carlson N. Brown Purdue University J. Carlson, jrcarlso@purdue.edu Date of Creation October 27, 2009 Date

More information

Interim Report Technical Support for Integrated Library Systems Comparison of Open Source and Proprietary Software

Interim Report Technical Support for Integrated Library Systems Comparison of Open Source and Proprietary Software Interim Report Technical Support for Integrated Library Systems Comparison of Open Source and Proprietary Software Vandana Singh Assistant Professor, School of Information Science, University of Tennessee,

More information

Evaluating User Behavior on Data Collections in a Digital Library

Evaluating User Behavior on Data Collections in a Digital Library Evaluating User Behavior on Data Collections in a Digital Library Michalis Sfakakis 1 and Sarantos Kapidakis 2 1 National Documentation Centre / National Hellenic Research Foundation 48 Vas. Constantinou,

More information

The Knowledge Portal, or, the Vision of Easy Access to Information

The Knowledge Portal, or, the Vision of Easy Access to Information The Knowledge Portal, or, the Vision of Easy Access to Information Wolfram Neubauer and Arlette Piguet ETH Library and Collections, Swiss Federal Institute of Technology, Zurich, Switzerland Abstract:

More information

Scuola di dottorato in Scienze molecolari Information literacy in chemistry 2015 SCOPUS

Scuola di dottorato in Scienze molecolari Information literacy in chemistry 2015 SCOPUS SCOPUS ORIGINAL RESEARCH INFORMATION IN SCIENCE is published (stored) in PRIMARY LITERATURE it refers to the first place a scientist will communicate to the general audience in a publicly accessible document

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

Using Scopus. Scopus. To access Scopus, go to the Article Databases tab on the library home page and browse by title.

Using Scopus. Scopus. To access Scopus, go to the Article Databases tab on the library home page and browse by title. Using Scopus Databases are the heart of academic research. We would all be lost without them. Google is a database, and it receives almost 6 billion searches every day. Believe it or not, however, there

More information

The Internet and the Web. recall: the Internet is a vast, international network of computers

The Internet and the Web. recall: the Internet is a vast, international network of computers The Internet and the Web 1 History of Internet recall: the Internet is a vast, international network of computers the Internet traces its roots back to the early 1960s MIT professor J.C.R. Licklider published

More information

CSC105, Introduction to Computer Science I. Introduction and Background. search service Web directories search engines Web Directories database

CSC105, Introduction to Computer Science I. Introduction and Background. search service Web directories search engines Web Directories database CSC105, Introduction to Computer Science Lab02: Web Searching and Search Services I. Introduction and Background. The World Wide Web is often likened to a global electronic library of information. Such

More information

AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES

AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES Journal of Defense Resources Management No. 1 (1) / 2010 AN OVERVIEW OF SEARCHING AND DISCOVERING Cezar VASILESCU Regional Department of Defense Resources Management Studies Abstract: The Internet becomes

More information

A Balanced Introduction to Computer Science, 3/E David Reed, Creighton University 2011 Pearson Prentice Hall ISBN

A Balanced Introduction to Computer Science, 3/E David Reed, Creighton University 2011 Pearson Prentice Hall ISBN A Balanced Introduction to Computer Science, 3/E David Reed, Creighton University 2011 Pearson Prentice Hall ISBN 978-0-13-216675-1 Chapter 3 The Internet and the Web 1 History of Internet recall: the

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Comparison of Generalized Webometrics to the Institutional Webometrics Ranking

Comparison of Generalized Webometrics to the Institutional Webometrics Ranking Volume-5, Issue-3, June-2015 International Journal of Engineering and Management Research Page Number: 209-214 Comparison of Generalized Webometrics to the Institutional Webometrics Ranking Deepak Patidar

More information

Consistent Measurement of Broadband Availability

Consistent Measurement of Broadband Availability Consistent Measurement of Broadband Availability FCC Data through 12/2015 By Advanced Analytical Consulting Group, Inc. December 2016 Abstract This paper provides several, consistent measures of broadband

More information

Module 1: Internet Basics for Web Development (II)

Module 1: Internet Basics for Web Development (II) INTERNET & WEB APPLICATION DEVELOPMENT SWE 444 Fall Semester 2008-2009 (081) Module 1: Internet Basics for Web Development (II) Dr. El-Sayed El-Alfy Computer Science Department King Fahd University of

More information

Integrating Lecture Recordings with Social Networks

Integrating Lecture Recordings with Social Networks Integrating Lecture Recordings with Social Networks Patrick Fox, Johannes Emden, Nicolas Neubauer and Oliver Vornberger Institute of Computer Science University of Osnabru ck Germany, 49069 Osnabru ck

More information

Information Retrieval May 15. Web retrieval

Information Retrieval May 15. Web retrieval Information Retrieval May 15 Web retrieval What s so special about the Web? The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically

More information

The Tangled Web We Weave Managing the Social Identity Crisis on Mobile Devices

The Tangled Web We Weave Managing the Social Identity Crisis on Mobile Devices W3C Workshop on the Future of Social Networking January 15-16, 2009 Barcelona, Spain The Tangled Web We Weave Managing the Social Identity Crisis on Mobile Devices Greg Howard

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Managing a large Academic CD-ROM Network

Managing a large Academic CD-ROM Network Managing a large Academic CD-ROM Network Wolfram Seidler and Otto Oberhauser The growing number of implementations shows that CD-ROM networking has finally come of age. The particular benefits of providing

More information

Information Push Service of University Library in Network and Information Age

Information Push Service of University Library in Network and Information Age 2013 International Conference on Advances in Social Science, Humanities, and Management (ASSHM 2013) Information Push Service of University Library in Network and Information Age Song Deng 1 and Jun Wang

More information

The explosion of computer and telecommunications technology in the. The Usability of On-line Archival Resources: The Polaris Project Finding Aid

The explosion of computer and telecommunications technology in the. The Usability of On-line Archival Resources: The Polaris Project Finding Aid The Usability of On-line Archival Resources: The Polaris Project Finding Aid Burt Altman and John R. Nemmers Abstract This case study examines how the Florida State University Libraries Claude Pepper Library

More information

Empirical Validation of Webometrics based Ranking of World Universities

Empirical Validation of Webometrics based Ranking of World Universities Empirical Validation of Webometrics based Ranking of World Universities R. K. Pandey University Institute of Computer Science and Applications (UICSA) R. D. University,Jabalpur Abstract Webometrics is

More information

2015 Search Ranking Factors

2015 Search Ranking Factors 2015 Search Ranking Factors Introduction Abstract Technical User Experience Content Social Signals Backlinks Big Picture Takeaway 2 2015 Search Ranking Factors Here, at ZED Digital, our primary concern

More information

The Complex Network Phenomena. and Their Origin

The Complex Network Phenomena. and Their Origin The Complex Network Phenomena and Their Origin An Annotated Bibliography ESL 33C 003180159 Instructor: Gerriet Janssen Match 18, 2004 Introduction A coupled system can be described as a complex network,

More information

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

More information

IMPROVING CUSTOMER GENERATION BY INCREASING WEBSITE PERFORMANCE AND INTEGRATING IT SYSTEMS

IMPROVING CUSTOMER GENERATION BY INCREASING WEBSITE PERFORMANCE AND INTEGRATING IT SYSTEMS IMPROVING CUSTOMER GENERATION BY INCREASING WEBSITE PERFORMANCE AND INTEGRATING IT SYSTEMS S Ramlall*, DA Sanders**, H Powell* and D Ndzi ** * Motiontouch Ltd, Dunsfold Park, Cranleigh, Surrey GU6 8TB

More information

Cookies, fake news and single search boxes: the role of A&I services in a changing research landscape

Cookies, fake news and single search boxes: the role of A&I services in a changing research landscape IET White Paper Cookies, fake news and single search boxes: the role of A&I services in a changing research landscape November 2017 www.theiet.org/inspec 1 Introduction Searching for information on the

More information

Contractors Guide to Search Engine Optimization

Contractors Guide to Search Engine Optimization Contractors Guide to Search Engine Optimization CONTENTS What is Search Engine Optimization (SEO)? Why Do Businesses Need SEO (If They Want To Generate Business Online)? Which Search Engines Should You

More information