DEVELOPMENT AND EVALUATION OF TEXT-BASED INDEXING FOR LECTURE VIDEOS
, 2014
"... iii ..."
(Show Context)
Web Identity Translator Behavioral Advertising and Identity Privacy with WIT
"... ABSTRACT Online Behavioral Advertising (OBA) is an important revenue source for online publishers and content providers. However, the extensive user tracking required to enable OBA raises valid privacy concerns. Existing and proposed solutions either block all tracking, therefore breaking OBA entir ..."
Abstract
- Add to MetaCart
(Show Context)
ABSTRACT Online Behavioral Advertising (OBA) is an important revenue source for online publishers and content providers. However, the extensive user tracking required to enable OBA raises valid privacy concerns. Existing and proposed solutions either block all tracking, thereby breaking OBA entirely, or require significant changes to the current advertising infrastructure, making adoption hard. We propose the Web Identity Translator (WIT), a new privacy service running as a proxy or middlebox. WIT stops the original tracking cookies from being set on the user's browser and instead substitutes private cookies that it controls. By manipulating the mapping between tracking and private cookies, WIT permits transparent OBA to continue while simultaneously protecting the identity of users from attacks based on behavioral analysis of browsing patterns.
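A minimal sketch of the cookie-substitution idea in Python (illustrative only; the class and method names are our assumptions, not the paper's implementation):

    import secrets

    class CookieTranslator:
        """Maps tracker-issued cookie values to private values shown downstream."""
        def __init__(self):
            self._to_private = {}   # tracker value -> private value
            self._to_tracker = {}   # private value -> tracker value

        def outbound(self, tracker_value):
            # Replace the tracker's cookie with a private one before it
            # reaches the browser.
            if tracker_value not in self._to_private:
                private = secrets.token_hex(16)
                self._to_private[tracker_value] = private
                self._to_tracker[private] = tracker_value
            return self._to_private[tracker_value]

        def inbound(self, private_value):
            # Translate the private cookie back so OBA keeps working upstream.
            return self._to_tracker.get(private_value)

        def remap(self, private_value):
            # Break linkability: drop the old tracker-side association so the
            # private cookie can be bound to a fresh identity.
            old = self._to_tracker.pop(private_value, None)
            if old is not None:
                self._to_private.pop(old, None)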
EasySOC: Making Web Service Outsourcing Easier
"... Abstract 6 Service-Oriented Computing has been widely recognized as a revolutionary paradigm for software development. Despite the important benefits this paradigm provides, current approaches for service-enabling applications still lead to high costs for outsourcing services with regard to two pha ..."
Abstract
- Add to MetaCart
Abstract Service-Oriented Computing has been widely recognized as a revolutionary paradigm for software development. Despite the important benefits this paradigm provides, current approaches for service-enabling applications still lead to high service-outsourcing costs in two phases of the software life cycle. During the implementation phase, developers must invest considerable effort into manually discovering services and then writing code to invoke them. Typically, the outcome of the second task is software containing service-aware code, which is therefore more difficult to modify and test during the maintenance phase. This paper describes EasySOC, an approach that aims to decrease the costs of creating and maintaining service-oriented applications. EasySOC combines text mining, machine learning, and best practices from component-based software development to allow developers to quickly discover and non-invasively invoke services. We evaluated the performance of the EasySOC discovery mechanism using 391 services. In addition, through a case study, we conducted a comparative analysis of the technical software quality achieved with and without EasySOC.
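As a rough illustration of the non-invasive invocation idea, the Python sketch below (all names hypothetical) hides service-aware plumbing behind a plain interface so the business logic stays easy to modify and test:

    from abc import ABC, abstractmethod

    class CurrencyConverter(ABC):
        """Service-agnostic interface the application codes against."""
        @abstractmethod
        def convert(self, amount: float, src: str, dst: str) -> float: ...

    class SoapCurrencyAdapter(CurrencyConverter):
        """Adapter confining the service-aware code (stubs, endpoints,
        message mapping) behind the interface."""
        def __init__(self, client):
            self._client = client  # e.g. a SOAP/REST client built elsewhere
        def convert(self, amount, src, dst):
            return self._client.call("Convert", amount=amount, frm=src, to=dst)

    def business_logic(converter: CurrencyConverter):
        # No service-aware code here; a mock CurrencyConverter suffices
        # in unit tests.
        return converter.convert(100.0, "USD", "EUR")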
Clustering and Topic Aggregation
, 2015
"... This Theses, Masters is brought to you for free and open access by the ..."
Gender Perception From Faces Using Boosted LBPH (Local Binary Pattern Histograms)
"... Abstract—Automatic Gender classification from faces has several applications such as surveillance, human computer interaction, targeted advertisement etc. Humans can recognize gender from faces quite accurately but for computer vision it is a difficult task. Many studies have targeted this problem b ..."
Abstract
- Add to MetaCart
Abstract—Automatic gender classification from faces has several applications, such as surveillance, human-computer interaction, and targeted advertisement. Humans can recognize gender from faces quite accurately, but for computer vision it is a difficult task. Many studies have targeted this problem, but most of them used images of faces taken under constrained conditions. Real-world applications, however, must process real-world images, which exhibit significant variation in lighting and pose, making gender classification very difficult. We examine the problem of automatic gender classification from faces in real-world images. Faces are extracted from images using a face detector, aligned, and represented using Local Binary Pattern Histograms. Discriminative features are selected using AdaBoost, and the boosted LBP features are used to train a support vector machine that achieves a recognition rate of 93.29%.
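The described pipeline could be sketched as follows in Python (the grid size, kernel, and use of scikit-image's local_binary_pattern are illustrative assumptions, not necessarily the paper's setup):

    import numpy as np
    from skimage.feature import local_binary_pattern
    # from sklearn.svm import SVC  # classifier stage, see note below

    def lbp_histogram(gray_face, grid=(7, 7), P=8, R=1):
        """Concatenate uniform-LBP histograms over a grid of face regions."""
        lbp = local_binary_pattern(gray_face, P, R, method="uniform")
        n_bins = P + 2  # uniform patterns plus one "non-uniform" bin
        h, w = lbp.shape
        feats = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                cell = lbp[i*h//grid[0]:(i+1)*h//grid[0],
                           j*w//grid[1]:(j+1)*w//grid[1]]
                hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
                feats.append(hist / max(hist.sum(), 1))
        return np.concatenate(feats)

    # With X as aligned grayscale faces and y as gender labels, AdaBoost-based
    # selection of histogram bins would go between feature extraction and:
    # clf = SVC(kernel="rbf").fit([lbp_histogram(f) for f in X], y)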
Document Stream Clustering using GPUs
"... The Web is constantly generating streams of textual information in the form of News articles and Tweets. In order for Information Retrieval systems to make sense of all this data partitional clustering algorithms are used to create groups of similar documents. Traditional clustering algorithms, like ..."
Abstract
- Add to MetaCart
(Show Context)
The Web is constantly generating streams of textual information in the form of news articles and tweets. In order for information retrieval systems to make sense of all this data, partitional clustering algorithms are used to create groups of similar documents. Traditional clustering algorithms, like K-means, are not well suited for stream processing, where the dataset is constantly changing as new documents are published. In this paper we present a clustering algorithm designed to work with streaming documents. These documents, described by their TF-IDF (term frequency-inverse document frequency) [15] term vectors, are incrementally assigned to appropriate clusters based on the cosine similarity metric. We provide an efficient implementation of this algorithm on a GPU using CUDA that achieves speedups of over 43X compared to its serial CPU implementation and can cluster a document within just 0.01 seconds after its term vector is received, even when there are 1.6 million clusters. Our implementation scales to clustering 5.5 million documents on a single GTX 480 GPU in 16.1 hours and can easily be extended to run on a system containing large numbers of GPUs.
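A serial Python sketch of the incremental assignment step (the paper's contribution is the CUDA implementation; the similarity threshold below is an illustrative assumption):

    import numpy as np

    class StreamClusterer:
        def __init__(self, threshold=0.5):
            self.threshold = threshold
            self.centroids = []   # L2-normalized centroid vectors
            self.sizes = []

        def add(self, tfidf_vec):
            """Assign one incoming TF-IDF vector; return its cluster id."""
            v = tfidf_vec / np.linalg.norm(tfidf_vec)
            if self.centroids:
                sims = np.array([c @ v for c in self.centroids])  # cosine sims
                best = int(sims.argmax())
                if sims[best] >= self.threshold:
                    n = self.sizes[best]
                    c = (self.centroids[best] * n + v) / (n + 1)
                    self.centroids[best] = c / np.linalg.norm(c)
                    self.sizes[best] = n + 1
                    return best
            self.centroids.append(v)  # nothing close enough: new cluster
            self.sizes.append(1)
            return len(self.centroids) - 1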
A NEW TERM WEIGHTING SCHEME FOR DOCUMENT CLUSTERING
"... Abstract — In this paper, we present a Cluster-Based Term weighting scheme (CBT) for document clustering algorithms based on Term Frequency- Inverse Document Frequency (T F − IDF). Our method assigns the term weights using the information obtained from the generated clusters and the collection. It i ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract — In this paper, we present a Cluster-Based Term weighting scheme (CBT) for document clustering algorithms based on Term Frequency-Inverse Document Frequency (TF-IDF). Our method assigns term weights using information obtained from the generated clusters and the collection. It identifies the terms that are specific to each cluster and increases their weight according to their importance. We used the K-means partitional clustering algorithm to compare our method with three widely used term weighting schemes: Norm-TF, TF-IDF, and TF-IDF-ICF. Our experimental results show that the new method outperforms the existing term weighting schemes and improves the results of the clustering algorithm.
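One possible reading of the cluster-based boost, sketched in Python (the paper's exact CBT formula is not reproduced here; the alpha parameter and the boost form are assumptions):

    import math

    def tf_idf(term_counts, doc_freq, n_docs):
        """Standard TF-IDF weights for one document."""
        return {t: c * math.log(n_docs / doc_freq[t])
                for t, c in term_counts.items()}

    def cluster_boost(weights, cluster_df, cluster_size, alpha=1.0):
        """Hypothetical boost: terms frequent within the document's own
        cluster get their weight increased proportionally."""
        return {t: w * (1 + alpha * cluster_df.get(t, 0) / cluster_size)
                for t, w in weights.items()}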
Redeye: A Digital Library for Forensic Document Triage
"... Forensic document analysis has become an important aspect of investigation of many different kinds of crimes from money laundering to fraud and from cybercrime to smuggling. The current workflow for analysts includes powerful tools, such as Palantir and Analyst’s Notebook, for moving from evidence t ..."
Abstract
- Add to MetaCart
(Show Context)
Forensic document analysis has become an important aspect of investigating many different kinds of crimes, from money laundering and fraud to cybercrime and smuggling. The current workflow for analysts includes powerful tools for moving from evidence to actionable intelligence, such as Palantir and Analyst’s Notebook, and tools for finding documents among the millions of files on a hard disk, such as Forensic Toolkit (FTK). However, sorting through collections of seized documents to filter out noise from actual evidence is still left largely to labor-intensive manual effort. This paper presents the Redeye Analysis Workbench, a tool that helps analysts move from manually sorting a collection of documents to performing intelligent document triage over a digital library. We discuss the tools and techniques we build upon, followed by an in-depth discussion of our tool and how it addresses two major use cases we observed analysts performing. Finally, we include a new layout algorithm for radial graphs that is used to visualize clusters of documents in our system.
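For intuition only, a generic radial placement of document clusters might look like the Python sketch below; this is not the paper's layout algorithm, and all parameters are assumptions:

    import math

    def radial_layout(clusters, R=300.0, r=60.0):
        """Place cluster centers on one circle and each cluster's documents
        on a smaller circle around its center. Returns {doc_id: (x, y)}."""
        pos = {}
        for i, docs in enumerate(clusters):
            a = 2 * math.pi * i / len(clusters)
            cx, cy = R * math.cos(a), R * math.sin(a)
            for j, doc in enumerate(docs):
                b = 2 * math.pi * j / max(len(docs), 1)
                pos[doc] = (cx + r * math.cos(b), cy + r * math.sin(b))
        return pos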
Multi-year Content Analysis of User Facility Related Publications
"... Scientific user facilities provide resources and support that enable scientists to conduct experiments or simulations pertinent to their respective research. Consequently, it is critical to have an informed understanding of the impact and contributions that these facilities have on scientific discov ..."
Abstract
- Add to MetaCart
(Show Context)
Scientific user facilities provide resources and support that enable scientists to conduct experiments or simulations pertinent to their respective research. Consequently, it is critical to have an informed understanding of the impact and contributions that these facilities have on scientific discoveries. Insight into the scientific publications that acknowledge the use of these facilities enables facility management and sponsors to make more informed decisions about policy and resource allocation, to influence the direction of science, and to understand more clearly the impact of a scientific user facility. This work discusses preliminary results of mining scientific publications that utilized resources at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL). These results show promise in identifying and leveraging multi-year trends and in providing a higher-resolution view of the impact that a scientific user facility may have on scientific discoveries.
A New Weighting Scheme and Discriminative Approach for Information Retrieval in Static and Dynamic Document Collections
"... Abstract—This paper introduces a new weighting scheme in information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval as it enables good dis ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—This paper introduces a new weighting scheme for information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval, as it enables good discrimination between documents. In the context of a machine learning application, namely unsupervised document indexing and retrieval, we compared the effectiveness of the proposed weighting scheme to Term Frequency-Inverse Document Frequency (TF-IDF), which is commonly used and considered one of the best existing weighting schemes. The paper shows how the document centroid is used to remove less significant weights from documents and how this helps to achieve better retrieval effectiveness. Most existing weighting schemes in information retrieval research assume that the whole document collection is static. The results presented in this paper show that the proposed weighting scheme can produce higher retrieval effectiveness than TF-IDF in both static and dynamic document collections. The results also show the variation in retrieval effectiveness achieved for static and dynamic document collections using a specific weighting scheme; this type of comparison has not been presented in the literature before.
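A minimal Python sketch of the centroid-thresholding step as described; treating the centroid as the mean weight of a document's terms is our reading of the abstract, not a formula taken from the paper:

    def centroid_normalize(doc_weights):
        """doc_weights: {term: weight}. Keep only terms whose weight is at
        or above the document's mean (centroid) weight."""
        if not doc_weights:
            return {}
        centroid = sum(doc_weights.values()) / len(doc_weights)
        return {t: w for t, w in doc_weights.items() if w >= centroid}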