Finding similar academic web sites with links, bibliometric couplings and colinks
- Information Processing & Management
, 2004
Cited by 8 (2 self)
A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken, and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but, surprisingly, this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increase in the likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage, in the sense that a much larger number of pairs of sites are connected by these measures, rather than an increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
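To make the three concepts concrete, here is a minimal sketch that computes them for every pair of sites in a toy link graph. It assumes a site-level graph given as a dict from each site to the set of sites it links to. Direct links are counted in both directions, a colink is a third site linking to both members of the pair, and a coupling is a third site that both members link to. The function name and the data layout are illustrative assumptions, not the paper's code.

```python
from itertools import combinations


def pair_measures(outlinks):
    """Return {(a, b): (links, colinks, couplings)} for every unordered site pair.

    outlinks maps each site to the set of sites it links to (hypothetical input).
    links     -- direct links between a and b, counted in both directions (0, 1 or 2)
    colinks   -- number of other sites linking to both a and b
    couplings -- number of other sites that both a and b link to
    """
    sites = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    # Invert the graph so that colinks can be read off the inlink sets.
    inlinks = {s: set() for s in sites}
    for src, targets in outlinks.items():
        for t in targets:
            inlinks[t].add(src)

    result = {}
    for a, b in combinations(sorted(sites), 2):
        out_a, out_b = outlinks.get(a, set()), outlinks.get(b, set())
        links = int(b in out_a) + int(a in out_b)
        colinks = len((inlinks[a] & inlinks[b]) - {a, b})
        couplings = len((out_a & out_b) - {a, b})
        result[(a, b)] = (links, colinks, couplings)
    return result


if __name__ == "__main__":
    toy = {"a": {"b", "c"}, "b": {"c"}, "d": {"a", "b"}}
    for pair, counts in pair_measures(toy).items():
        print(pair, counts)
```

In the toy graph the pair `(a, d)` has no colink but one coupling (both link to `b`). This shows how the indirect measures can connect pairs that one concept alone would miss, which is the coverage effect the abstract describes.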
Methods for reporting on the targets of links from national systems of university Web sites
- Information Processing and Management
, 2003
Cited by 7 (3 self)
Whilst hyperlinks within Web sites may be primarily created for navigation purposes, those between sites are a rich source of information about the content and use of the Web. As a result there is a need to derive descriptive statistics about them, both to help understand the underlying communication processes and so that policy makers can gain insights into the use of online information by those located within their constituency. It is known, however, that using the individual web link source page as the basic unit of counting is problematic because of the number and size of link anomalies. The challenge addressed in this paper is that of developing methods to assess techniques for counting links from groups of large university web sites (site outlinks). Two methods to assess the reliability of link counts are developed and applied to judge which of seven advanced document models are most appropriate in each case. The most generally applicable method used is an internal consistency test based upon a highly simplified model of web linking behaviour. The data used come from crawls of UK, Australian and New Zealand universities. The standard domain advanced web document model emerges as the logical choice for comparison purposes within this set. Some descriptive statistics concerning Top Level Domain link targets are given, and it is demonstrated that the choice of model can affect the final results.
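The effect of the document model on link counts can be illustrated with a small sketch. It collapses source and target URLs to a chosen unit (page, directory, domain or site) and then counts distinct source-to-target pairs per target Top Level Domain. The four models here are simplified stand-ins, not the seven models evaluated in the paper. The rule that a UK academic "site" is the last three host labels (for example `wlv.ac.uk`), and any other site the last two, is an assumption made for this illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def doc_key(url, model):
    """Map a URL to the document it belongs to under a simplified document model."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if model == "page":
        return host + parts.path
    if model == "directory":
        # Drop the final path segment, keeping the enclosing directory.
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if model == "domain":
        return host
    if model == "site":
        # Assumption: .ac.uk sites are the last three labels, others the last two.
        labels = host.split(".")
        return ".".join(labels[-3:] if host.endswith(".ac.uk") else labels[-2:])
    raise ValueError(f"unknown document model: {model}")


def target_tld(url):
    """Top Level Domain of a link target, e.g. 'com' or 'uk'."""
    return (urlparse(url).hostname or "").rsplit(".", 1)[-1]


def tld_link_counts(links, model):
    """Count distinct (source document, target document) pairs per target TLD."""
    pairs = {(doc_key(s, model), doc_key(t, model), target_tld(t)) for s, t in links}
    return Counter(tld for _, _, tld in pairs)


if __name__ == "__main__":
    links = [  # hypothetical crawl data
        ("http://www.wlv.ac.uk/a/1.html", "http://www.example.com/x.html"),
        ("http://www.wlv.ac.uk/a/2.html", "http://www.example.com/y.html"),
        ("http://scit.wlv.ac.uk/b.html", "http://www.example.com/x.html"),
        ("http://www.wlv.ac.uk/a/1.html", "http://www.mit.edu/"),
    ]
    for m in ("page", "directory", "domain", "site"):
        print(m, dict(tld_link_counts(links, m)))
```

With these four hypothetical links, the `.com` count falls from 3 under the page model to 2 under the directory and domain models and to 1 under the site model. Even on this toy scale, the choice of model changes the reported statistics.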
Interpreting social science link analysis research: A theoretical framework
- Journal of the American Society for Information Science and Technology
, 2006
Cited by 7 (1 self)
Link analysis in various forms is now an established technique in many different subjects, reflecting the perceived importance of links and of the web. A critical but very difficult issue is how to interpret the results of social science link analyses. It is argued that the dynamic nature of the web, its lack of quality control and the online proliferation of copying and imitation mean that methodologies operating within a highly positivist, quantitative framework are ineffective. Conversely, the sheer variety of the web makes qualitative methodologies and pure reason very problematic to apply to large-scale studies. Methodological triangulation is consequently advocated, together with a warning that the web is incapable of giving definitive answers to large-scale link analysis research questions concerning social factors underlying link creation. Finally, it is claimed that whilst theoretical frameworks with which to guide research are appropriate, a Theory of Link Analysis is not possible.
An initial exploration of the link relationship between UK university Web sites
- ASLIB Proceedings
Cited by 6 (4 self)
Aggregates of links are of interest to information scientists in the same way as citation counts are: as potential sources of data from which new knowledge can be mined. The recent discovery of a correlation between a web link count measure and the research quality of British universities is built upon by applying a range of multivariate statistical techniques to counts of links between pairs of universities. This represents an initial attempt at developing an understanding of this phenomenon. Plausible results are extracted, including the high degree of similarity between the Scottish universities and limited evidence of a dichotomy between new and traditional universities. Outliers in the data were also identified by the techniques, some of which were verified by being tracked down to identifiable web phenomena. This is an important outcome because successful anomaly identification is a precondition to more effective analysis of this kind of data. The identification of groupings is encouraging evidence that web links between universities can be mined for significant results, although it is clear that more methodological development is needed if any but the simplest patterns are to be extracted. Finally, based upon the types of patterns extracted, it is argued that none of the methods used is capable of fully analysing link structures on its own.
Webometrics and the Self-Organization of the European Information Society <http://hyperion.math.upatras.gr/webometrics/> (last accessed 29
- Measuring the Web, Computer Networks and ISDN Systems, 28(7-11): 993
, 1999
Cited by 5 (0 self)
Virtual space is a concrete material development of the information society. What could be revealed about this social structure? In parallel to bibliometrics and scientometrics, the first technique applied to books and the second to scientific articles, we explore webometrics as a methodology for the World-Wide Web. We first present the technique and show how it can be used, and then use the European and the global information society as two examples to illustrate the methodology of webometrics. Moreover, we demonstrate the “triple-helix-ness”, that is, the inter-relationship and connectivity between universities, governments and industries, in both Europe and the US. Among other findings, we conclude that in the US the government is by far the most triple-helixed, while in Europe the universities appear to be the most triple-helixed.
A layered approach for investigating the topological structure of communities in the Web
- Journal of Documentation
, 2003
Cited by 5 (4 self)
A layered approach for identifying communities in the Web is presented and explored by applying the Flake Exact Community Identification Algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: (a) the application of Alternative Document Models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; (b) the removal of internal site links; and (c) the adaptation of a new fast algorithm to allow fully automated community identification using all possible single starting points. The overall topology of the graphs in the three least aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, the exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of the communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of (a) improved results for information retrieval algorithms that can exploit this additional structure, and (b) the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country’s universities.
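The preliminary topology check described above (counting isolated points and measuring the largest connected component) can be sketched with a union-find structure. This covers only that first step; it does not implement the Flake community identification algorithm. Links are treated as undirected, and self-links are ignored. This is assumed here to stand in for links internal to one aggregated document after internal site links are removed. All edges must refer to nodes in the given node set.

```python
from collections import Counter


def component_summary(nodes, edges):
    """Return (isolated_count, largest_component_size), treating links as undirected.

    Self-links are skipped, so a node whose only link points to itself counts
    as isolated. Every edge endpoint must appear in nodes.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = {n: 0 for n in nodes}
    for a, b in edges:
        if a == b:
            continue
        degree[a] += 1
        degree[b] += 1
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    isolated = sum(1 for n in nodes if degree[n] == 0)
    sizes = Counter(find(n) for n in nodes)
    largest = max(sizes.values(), default=0)
    return isolated, largest


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e", "f"]  # hypothetical aggregated documents
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "d"), ("e", "c")]
    print(component_summary(nodes, edges))
```

Running the same summary on each aggregation layer (directory, domain, site) would give the per-layer proportions of isolated points and of the giant component that the abstract reports.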
The Web Impact Factor: a critical review
- The Electronic Library
Cited by 4 (0 self)
Purpose – We analyse the link-based web site impact measure known as the Web Impact Factor (WIF). It is a quantitative tool for evaluating and ranking web sites, top-level domains and sub-domains. We also discuss the WIF's advantages and disadvantages, data collection problems, and the validity and reliability of WIF results. Design/methodology/approach – A key to webometric studies has been the use of large-scale search engines, such as Yahoo and AltaVista, that allow measurements to be made of the total number of pages in a web site and the total number of backlinks to the web site. These search engines provide similar possibilities for the investigation of links between web sites/pages to those provided by the academic journal citation databases from the Institute of Scientific Information (ISI). But the content of the Web is not of the same nature and quality as the databases maintained by the ISI. Findings – This paper reviews how the WIF has been developed and applied. It has been suggested that Web Impact Factors can be calculated as a way of comparing the attractiveness of web sites or domains on the Web. It is concluded that, while the WIF is arguably useful for quantitative intra-country comparison, its application beyond this (i.e., to inter-country assessment) has little value. Originality/value – The paper attempts a critical review of the literature on the WIF and associated indicators.
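In its commonly used form, the WIF divides the number of pages linking to a site by the number of pages in the site. Both figures are the kind of counts the search engines above were used to estimate. The sketch below computes an overall WIF and an external WIF, where the external variant excludes pages hosted on the site itself. The class, field names and example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical search-engine estimates for one site."""
    name: str
    pages: int               # pages indexed in the site
    inlinking_pages: int     # all pages linking to the site, including its own
    self_linking_pages: int  # the subset of inlinking pages hosted on the site itself

    def overall_wif(self) -> float:
        return self._ratio(self.inlinking_pages)

    def external_wif(self) -> float:
        # Exclude self-links so only pages outside the site are counted.
        return self._ratio(self.inlinking_pages - self.self_linking_pages)

    def _ratio(self, numerator: int) -> float:
        if self.pages <= 0:
            raise ValueError(f"{self.name}: page count must be positive")
        return numerator / self.pages


def rank_by_external_wif(sites):
    """Order sites from highest to lowest external WIF."""
    return sorted(sites, key=lambda s: s.external_wif(), reverse=True)


if __name__ == "__main__":
    a = SiteCounts("a.ac.uk", pages=200, inlinking_pages=500, self_linking_pages=300)
    b = SiteCounts("b.ac.uk", pages=1000, inlinking_pages=1500, self_linking_pages=200)
    for s in (a, b):
        print(s.name, s.overall_wif(), s.external_wif())
    print([s.name for s in rank_by_external_wif([a, b])])
```

In this made-up example the two variants rank the sites in opposite orders. This is one reason the choice of numerator, and the reliability of the underlying search-engine counts, matters when WIF values are compared.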
Does Topic Metadata Help With Web Search?
, 2005
Cited by 4 (2 self)
It has been claimed that topic metadata can be used to improve the accuracy of text searches. Here, we test this claim by examining the contribution of metadata to effective searching within websites published by a university with a strong commitment to and substantial investment in metadata. We use four sets of queries, 463 in total, extracted from the university's official query logs and from the university's site map. The results are clear: the available metadata is of little value in ranking answers to those queries. A follow-up experiment using queries on the websites published in a particular government jurisdiction confirms that this conclusion is not specific to the particular university. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of websites. Our experiment ...
Can Google's PageRank be used to find the most important academic Web pages?
- Journal of Documentation
Cited by 4 (0 self)
Google’s PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared to simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method, and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text-matching techniques in order to get the quality of information retrieval results provided by Google.
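The difference between PageRank and simple inlink counts can be seen on a tiny graph. The sketch below is a standard power-iteration PageRank with a damping factor of 0.85, and pages with no outlinks spread their rank uniformly. It is the textbook formulation, not Google's production system or the exact variant used in the paper, and the graph is hypothetical. Page `q` has only one inlink, but that inlink comes from a page that is itself well linked, so `q` outranks `r`, which has two inlinks from otherwise unlinked pages.

```python
from collections import Counter


def pagerank(outlinks, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank; outlinks maps each page to the set of pages it links to."""
    nodes = sorted(set(outlinks) | {t for ts in outlinks.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without outlinks is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not outlinks.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in outlinks.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def inlink_counts(outlinks):
    """Number of distinct pages linking to each page."""
    counts = Counter()
    for targets in outlinks.values():
        for t in targets:
            counts[t] += 1
    return counts


if __name__ == "__main__":
    graph = {"x1": {"p"}, "x2": {"p"}, "x3": {"p"}, "p": {"q"},
             "y1": {"r"}, "y2": {"r"}}
    pr, inl = pagerank(graph), inlink_counts(graph)
    for page in sorted(pr, key=pr.get, reverse=True):
        print(page, round(pr[page], 4), inl[page])
```

The ordering by PageRank (`q`, `p`, `r`) differs from the ordering by inlinks (`p`, `r`, `q`). This is a small-scale version of the observation that the two metrics are genuinely different.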
Transposition of the cocitation method with a view to classifying web pages
- Journal of the American Society for Information Science and Technology
, 2004
Cited by 3 (1 self)
The Web is a huge source of information, and one of the main problems facing users is finding documents which correspond to their requirements. Apart from the problem of thematic relevance, the documents retrieved by search engines do not always meet the users’ expectations. The document may be too general, or conversely too specialized, or of a different type from what the user is looking for, and so forth. We think that adding metadata to pages can considerably improve the process of searching for information on the Web. This article presents a possible typology for Web sites and pages, as well as a method for propagating metadata values, based on the study of the Web graph and more specifically the method of cocitation in this graph.
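As a rough illustration of cocitation-based propagation, the sketch below counts how often two pages are linked to by the same page and then assigns each unlabelled page the metadata value with the highest total cocitation weight among its labelled neighbours. The article's actual typology and propagation rules are not reproduced here. The weighted majority vote, the label names and the toy graph are all assumptions. When votes tie, the value seen first wins, which is arbitrary.

```python
from collections import Counter, defaultdict


def cocitation_counts(outlinks):
    """Number of pages linking to both pages of each (sorted) pair."""
    co = Counter()
    for targets in outlinks.values():
        ts = sorted(set(targets))
        for i in range(len(ts)):
            for j in range(i + 1, len(ts)):
                co[(ts[i], ts[j])] += 1
    return co


def propagate_labels(outlinks, labels):
    """Guess a metadata value for unlabelled cited pages by a cocitation-weighted vote."""
    votes = defaultdict(Counter)
    for (a, b), weight in cocitation_counts(outlinks).items():
        if a in labels and b not in labels:
            votes[b][labels[a]] += weight
        elif b in labels and a not in labels:
            votes[a][labels[b]] += weight
    return {page: c.most_common(1)[0][0] for page, c in votes.items()}


if __name__ == "__main__":
    graph = {  # hypothetical citing pages and the pages they link to
        "s1": {"u", "h1"},
        "s2": {"u", "h1", "c1"},
        "s3": {"u", "h1"},
        "s4": {"c1", "v"},
    }
    known = {"h1": "homepage", "c1": "course"}  # hypothetical metadata values
    print(propagate_labels(graph, known))
```

Page `u` is cocited three times with the "homepage" page and once with the "course" page, so it inherits "homepage". Page `v` is cocited only with the "course" page.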

