Results 1 
3 of
3
Active Exploration in Networks: Using Probabilistic Relationships for Learning and Inference
"... Many interesting domains in machine learning can be viewed as networks, with relationships (e.g., friendships) connecting items (e.g., individuals). The Active Exploration (AE) task is to identify all items in a network with a desired trait (i.e., positive labels) given only partial information abou ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Many interesting domains in machine learning can be viewed as networks, with relationships (e.g., friendships) connecting items (e.g., individuals). The Active Exploration (AE) task is to identify all items in a network with a desired trait (i.e., positive labels) given only partial information about the network. The AE process iteratively queries for labels or network structure within a limited budget; thus, accurate predictions prior to making each query is critical to maximizing the number of positives gathered. However, the targeted AE query process produces partially observed networks that can create difficulties for predictive modeling. In particular, we demonstrate that these partial networks can exhibit extreme label correlation bias, which makes it difficult for conventional relational learning methods to accurately estimate relational parameters. To overcome this issue, we model the joint distribution of possible edges and labels to improve learning and inference. Our proposed method, Probabilistic Relational Expectation Maximization (PREM), is the first AE approach to accurately learn the complex dependencies between attributes, labels, and structure to improve predictions. PREM utilizes collective inference over the missing relationships in the partial network to jointly infer unknown item traits. Further, we develop a linear inference algorithm to facilitate efficient use of PREM in large networks. We test our approach on four real world networks, showing that AE with PREM gathers significantly more positive items compared to stateoftheart methods.
Active learning for streaming networked data
 in CIKM, 2014
"... Mining highspeed data streams has become an important topic due to the rapid growth of online data. In this paper, we study the problem of active learning for streaming networked data. The goal is to train an accurate model for classifying networked data that arrives in a streaming manner by query ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Mining highspeed data streams has become an important topic due to the rapid growth of online data. In this paper, we study the problem of active learning for streaming networked data. The goal is to train an accurate model for classifying networked data that arrives in a streaming manner by querying as few labels as possible. The problem is extremely challenging, as both the data distribution and the network structure may change over time. The query decision has to be made for each data instance sequentially, by considering the dynamic network structure. We propose a novel streaming active query strategy based on structural variability. We prove that by querying labels we can monotonically decrease the structural variability and better adapt to concept drift. To speed up the learning process, we present a network sampling algorithm to sample instances from the data stream, which provides a way for us to handle large volume of streaming data. We evaluate the proposed approach on four datasets of different genres: Weibo, Slashdot, IMDB, and ArnetMiner. Experimental results show that our model performs much better (+510% by F1score on average) than several alternative methods for active learning over streaming networked data.
Discovering Valuable Items from Massive Data
"... Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important p ..."
Abstract
 Add to MetaCart
(Show Context)
Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multiarm bandits, active search and the knapsack problem. We present an algorithm, GPSelect, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GPSelect uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GPSelect to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GPSelect and apply it to three realworld case studies of industrial relevance: (1) Refreshing a repository of prices in a Global Distribution System for the travel industry, (2) Identifying diverse, bindingaffine peptides in a vaccine design task and (3) Maximizing clicks in a webscale recommender system by recommending items to users.