ACM TIST Special Issue on Visual Analytics
This paper presents a high-precision multi-modal approach for localizing moving cameras using monocular videos, which has wide potentials in many intelligent applications, e.g., robotics, autonomous vehicles, etc. Existing visual odometry methods often suffer from symmetric or repetitive scene patterns, e.g., windows on buildings or parking stalls. To address this issue, we introduce a robust camera localization method that contributes in two aspects. First, we formulate feature tracking, the critical step of visual odometry, as a hierarchical min-cost network flow optimization task, and regularize the formula with flow constraints, cross-scale consistencies, and motion heuristics. The proposed formula can adaptively select features or feature combinations over scale-space that are most distinctive, which is different from traditional methods that need to detect and group repetitive patterns in a separate step. Second, we further develop a joint formula for integrating dense visual odometry and sparse GPS readings in a shared reference coordinate. The fusion process is guided with high-order statistics knowledge to suppress the impacts of drifting issues. We evaluate the proposed method on both public video datasets and a newly created dataset that includes scenes full of repetitive patterns. Results with comparisons show that our method can clearly outperform the alternative methods and is effective for addressing repetitive pattern issues.
In urban transportation systems, transfer stations refer to hubs connecting a variety of bus and subway lines and, thus, are the most important nodes in transportation networks. The pervasive availability of large-scale travel traces of passengers, collected from automated fare collection (AFC) systems, has provided unprecedented opportunities for understanding citywide transfer patterns, which can benefit smart transportation, such as smart route recommendation to avoid crowded lines, and dynamic bus scheduling to enhance transportation efficiency. To this end, in this paper, we provide a systematic study of the measurement, patterns, and modeling of spatiotemporal dynamics of passenger transfers. Along this line, we develop a data-driven analytical system for modeling the transfer volumes of each transfer station. More specifically, we first identify and quantify the discriminative patterns of spatiotemporal dynamics of passenger transfers by utilizing heterogeneous sources of transfer related data for each station. Also, we develop a multi-task spatiotemporal learning model for predicting the transfer volumes of a specific station at a specific time period. Moreover, we further leverage the predictive model of passenger transfers to provide crowdedness-aware route recommendations. Finally, we conduct the extensive evaluations with a variety of real-world data. Experimental results demonstrate the effectiveness of our proposed modeling method and its applications for smart transportation.
Neural networks have become very popular in recent years because of the astonishing success of deep learning in various domains such as image and speech recognition. In many of these domains, specific architectures of neural networks, such as convolutional networks, seem to fit the particular structure of the problem domain very well, and can therefore perform in an astonishingly effective way. However, the success of neural networks is not universal across all domains. Indeed, for learning problems without any special structure, or in cases where the data is somewhat limited, neural networks are known not to perform well with respect to traditional machine learning methods such as random forests. In this paper, we show that a carefully designed neural network with random forest structure can have better generalization ability. In fact, this architecture is more powerful than random forests, because the back-propagation algorithm reduces to a more powerful and generalized way of constructing a decision tree. Furthermore, the approach is efficient to train and requires a small constant factor of the number of training examples. This efficiency allows the training of multiple neural networks in order to improve the generalization accuracy. Experimental results on real-world benchmark datasets demonstrate the effectiveness of the proposed enhancements for classification and regression.
Understanding real-world event participation behavior has been a subject of active research and can offer valuable insights for event-related recommendation and advertisement. The emergence of event-based social networks (EBSNs), which attract online users to host/attend offline events, has enabled exciting new research in this domain. However, most existing works focus on understanding or predicting individual users' event participation behavior or recommending events to individual users. Few study has addressed the problem of event popularity from the event organizer's point of view. In this work, we study the latent factors for determining event popularity using large-scale datasets collected from the popular Meetup.com EBSN in five major cities around the world. We analyze and model four contextual factors: spatial factor using location convenience, quality, popularity density and competitiveness; group factor using group member entropy and loyalty; temporal factor using temporal preference and weekly event patterns; and semantic factor using readability, sentiment, part-of-speech and text novelty. In addition, we have developed a group-based social influence propagation network to model group-specific influences on events. By combining the COntextual features and Social Influence NEtwork, our integrated prediction framework COSINE can capture the diverse influential factors of event participation and can be used by event organizers to predict/improve the popularity of their events. Detailed evaluations demonstrate that our COSINE framework achieves high accuracy for event popularity prediction in all five cities with diverse cultures and user event behaviors.
Predicting users' proficiencies is a critical component of AI-powered personal assistants. This paper introduces a novel approach for prediction based on users' diverse, noisy, and passively generated application usage histories. We propose a novel Bi-directional Recurrent Neural Network with multi-layer attention mechanism (m-ATT-BiRNN) to extract sequential patterns and distinguish informative traces from noise. Our model is able to attend to the most discriminative actions and sessions to make more accurate and directly interpretable predictions while requiring 50x less training data than the state-of-the-art sequential learning approach. We evaluate our model with two large scale datasets collected from 68K Photoshop users: a design skill dataset where the user skill is determined by the quality of the end products; and a software skill dataset where users self-disclose their software usage skill levels. The empirical results demonstrate our model's superior performance compared to existing user representation learning techniques that leverage action frequencies and sequential patterns. In addition, we qualitatively illustrate the model's significant interpretative power. The proposed approach is broadly relevant to applications that generate user time-series analytics.
To date, factorization machines (FM) have emerged as a powerful model in many applications. In this work, we study the training of FM with the logistic loss for binary classification, which is a non-linear extension of the linear model with the logistic loss (i.e., logistic regression). For the training of large-scale logistic regression, Newton methods have been shown to be an effective approach, but it is difficult to apply such methods to FM because of the non-convexity. We consider a modification of FM that is multi-block convex and propose an alternating minimization algorithm based on Newton methods. Some novel optimization techniques are introduced to reduce the running time. Our experiments demonstrate that the proposed algorithm is more efficient than stochastic gradient algorithms and coordinate descent methods. The parallelism of our method is also investigated for the acceleration in multi-threading environments.
We address the problem of exploring and comparing large collections of scored, directed networks for understanding inferred Bayesian networks used in biology. In this field, heuristic algorithms explore the space of possible network solutions, sampling this space based on algorithm parameters and a network score that encodes the statistical fit to the data. The goal of the analyst is to guide the heuristic search and decide how to determine a final consensus network structure, usually by selecting the top scoring network or constructing the consensus network from a collection of high scoring networks. BayesPiles, our visualisation tool, helps with understanding the structure of the solution space and supporting the construction of a final consensus network that is representative of the underlying data set. BayesPiles builds upon and extends MultiPiles to meet our domain requirements. We developed BayesPiles in conjunction with computational biologists who have used this tool on data sets used in their research. The biologists found our solution provides them with new insights and helps them achieve results that are representative of the underlying data.
Learning from very few samples is a challenge for machine learning tasks, such as text and image classification. Transfer learning attempts to address this problem by transferring prior knowledge from related domains to enhance the learning performance in the target domain. In previous transfer learning works, instance transfer learning algorithms mostly focus on selecting the source domain instances similar to the target domain instances for transfer. However, the selected instances usually do not directly contribute to the learning performance in the target domain.Hypothesis transfer learning algorithms focus on the model/parameter level transfer. They treat the source hypotheses as well-trained and transfer their knowledge in terms of parameters to learn the target hypothesis. Such algorithms directly optimize the target hypothesis by the observable performance improvements. However, they fail to consider the problem that instances contribute to the source hypotheses may be harmful for the target hypothesis, as instance transfer learning analyzed.To relieve the aforementioned problems, we propose a novel transfer learning algorithm which follows an analogical strategy. Particularly, the proposed algorithm first learns a revised source hypothesis with only instances contribute to the target hypothesis. Then, the proposed algorithm transfers both the revised source hypothesis and the target hypothesis (only trained with a few samples) to learn an analogical hypothesis. We denote our algorithm as Analogical Transfer Learning.Extensive experiments on one synthetic dataset and three real-world benchmark datasets demonstrate the superior performance of the proposed algorithm.
This paper aims to develop a new and robust approach to feature representation. Motivated by the success of Auto-Encoders, we frst theoretically analyze and summarize the general properties of all algorithms that are based on traditional Auto-Encoders: 1) The reconstruction error of the input can not be lower than a lower bound, which can be viewed as a guiding principle for reconstructing the input. Additionally, when the input is corrupted with noises, the reconstruction error of the corrupted input also can not be lower than a lower bound. 2) The reconstruction of a hidden representation achieving its ideal situation is the necessary condition for the reconstruction of the input to reach the ideal state. 3) Minimizing the Frobenius norm of the Jacobian matrix of the hidden representation has a defciency and may result in a much worse local optimum value. We believe that minimizing the reconstruction error of the hidden representation is more robust than minimizing the Frobenius norm of the Jacobian matrix of the hidden representation. Based on the above analysis, we propose a new model termed Double Denoising Auto-Encoders (DDAEs), which uses corruption and reconstruction on both the input and the hidden representation. We demonstrate that the proposed model is highly exible and extensible and has a potentially better capability to learn invariant and robust feature representations. We also show that for dealing with noises or inessential features, our model is more robust than Denoising Auto-Encoders (DAEs). Furthermore, we will detail how to train DDAEs with two di erent pre-training methods by optimizing the objective function in a combined and separate manner, respectively. Comparative experiments illustrate that the proposed model is signifcantly better for representation learning than the state-of-the-art models.
Nonnegative matrix factorization (NMF) is one widely used feature extraction technology in the tasks of image clustering and image classification. For the former task, various unsupervised NMF methods based on the data distribution structure information have been proposed. While for the later task, the label information of the dataset is one very important guiding. However, most previous proposed supervised NMF methods emphasis on imposing the discriminant constraints on the coefficient matrix. When dealing with new coming samples, the transpose or the pseudoinverse of the basis matrix is used to project these samples to the low dimension space. In this way, the label influence to the basis matrix is indirect. Although, there are also some methods try to constrain the basis matrix in NMF framework, either they only restrict within-class samples or impose improper constraint on the basis matrix. To Address these problems, in this paper a novel NMF framework named discriminative and orthogonal subspace constraints based nonnegative matrix factorization (DOSNMF) is proposed. In DOSNMF, the discriminative constraints are imposed on the projected subspace instead of the directly learned representation. In this manner, the discriminative information is directly connected with the projected subspace. At the same time, an orthogonal term is incorporated in DOSNMF to adjust the orthogonality of the learned basis matrix, which can ensure the orthogonality of the learned subspace and improve the sparseness of the basis matrix at the same time. This framework can be implemented in two ways. The first way is based on the manifold learning theory, in this way, two graphs, the intrinsic graph and the penalty graph, are constructed to capture the intraclass structure and the inter-class distinctness. In this way, both the manifold structure information and the discriminative information of the dataset are utilized. For convenience, we name this method as the name of the framework, i.e. DOSNMF. The second way is based on the Fishers criterion, we name it as Fishers criterion based DOSNMF (FDOSNMF). The object functions of DOSNMF and FDOSNMF can be easily optimized using multiplicative update (MU) rules. The new methods are tested on five datasets and compared with several supervised and unsupervised variants of NMF. The experimental results reveal the effectiveness of the proposed methods.
The paper provides new techniques for optimizing domain design for goal and plan recognition using plan libraries. We define two new problems: Goal Recognition Design for Plan Libraries (GRD-PL) and Plan Recognition Design (PRD). Solving the GRD-PL helps to infer which goal the agent is trying to achieve, while solving PRD can help to infer how the agent is going to achieve its goal. For each problem, we define a worst-case distinctiveness measure that is an upper bound on the number of observations that are necessary to unambiguously recognize the agent's goal or plan. The paper studies the relationship between these measures, showing that the worst-case distinctiveness of GRD-PL is a lower bound of the worst-case plan distinctiveness of PRD, and that they are equal under certain conditions. We provide two complete algorithms for minimizing the worst-case distinctiveness of plan libraries without reducing the agent's ability to complete its goals: One is a brute force search over all possible plans and one a constraint-based search that identifies plans that are most difficult to distinguish in the domain. These algorithms are evaluated in three hierarchical plan recognition settings from the literature. We were able to reduce the worst case distinctiveness of the domains using our approach, in some cases reaching 100% improvement within a predesignated time window. Our iterative algorithm outperforms the brute force approach by an order of magnitude in terms of runtime.
At the heart of multi-agent systems is the ability to cooperate in order to improve the performance of individual agents and/or the system as a whole. While a widespread assumption in the literature is that such cooperation is essentially unrestricted, in many realistic settings this assumption does not hold. A highly-influential approach for modelling such scenarios are graph-restricted games introduced by Myerson. In this approach, agents are represented by nodes in a graph, edges represent communication channels, and a group can generate an arbitrary value only if there exists a direct or indirect communication channel between every pair of agents within the group. Two fundamental solution concepts that were proposed for such games are the Myerson value and the Shapley value. While an algorithm has been developed to compute the Shapley value in arbitrary graph-restricted games, no such general-purpose algorithm has been developed for the Myerson value to date. With this in mind, we set to develop for such games a general-purpose algorithm to compute the Myerson value, and a more efficient algorithm to compute the Shapley value. Since the computation of either value involves enumerating all connected induced subgraphs of the games underlying graph, we start by developing an algorithm dedicated to this enumeration, and show empirically that it is faster than the state of the art in the literature. Finally, we present a sample application of both algorithms, in which we test the Myerson value and the Shapley value as advanced measures of node centrality in networks.
Recommender systems are common in the e-commerce platforms in recent years. Recommender systems are able to help users find preferential items among a large amount of products so that users' time is saved and sellers' profits are increased. Cross-domain recommender systems aim to recommend items based on users' different tastes across domains. While recommender systems usually suffer from the user cold-start problem the leads to unsatisfying recommendation performance, cross-domain recommendation can remedy such problem. This paper proposes a novel cross-domain recommendation model based on regression analysis, partial least squares regression (PLSR). The proposed recommendation models, PLSR-CrossRec and PLSR-Latent, are able to purely use source-domain ratings predict the ratings for cold-start users who never rated items in the target domains. Experiments conducted on the Epinions dataset with ten various domains' rating records demonstrate that PLSR-Latent can outperform several matrix factorization-based competing methods under a variety of cross-domain settings. The time efficiency of PLSR-Latent is also satisfactory.
Massive public resume data emerging on the Internet indicates individual-related characteristics in terms of profile and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as recruitment trend predict, talent seeking and evaluation. Existing RA studies either largely rely on the knowledge of domain experts, or leverage classic statistical or data mining models to identify and filter explicit attributes based on pre-defined rules. However, they fail to discover the latent semantic information from semi-structured resume text, i.e., individual career progress trajectory and social-relations, which are otherwise vital to comprehensive understanding of peoples career evolving patterns. Besides, when dealing with massive resumes, how to properly visualize such semantic information to reduce the information load and to support better human cognition is also challenging. To tackle these issues, we propose a visual analytics system ResumeVis to mine and visualize resume data. Firstly, a text-mining based approach is presented to extract semantic information. Then, a set of visualizations are devised to represent the semantic information in multiple perspectives. By interactive exploration on ResumeVis performed by domain experts, the following tasks can be accomplished: to trace individual career evolving trajectory; to mine latent social-relations among individuals; and to hold the full picture of massive resumes' collective mobility. Case studies with over 2500 online officer resumes demonstrate the effectiveness of our system.
One-class support vector machine (OCSVM) has been widely used in the area of structural health monitoring, where only data from one class (i.e. healthy) are available. Incremental learning of OCSVM is critical for online applications in which huge data streams continuously arrive and the healthy data distribution may vary over time. This paper proposes a novel adaptive self-advised online OCSVM, which incrementally tunes the kernel parameter and decides whether a model update is required or not. As opposed to existing methods, this novel online algorithm does not rely on any fixed threshold, but it uses the slack variables in the OCSVM to determine which new data points should be included in the training set and trigger a model update. The algorithm also incrementally tunes the kernel parameter of OCSVM automatically based on the spatial locations of the edge and interior samples in the training data with respect to the constructed hyperplane of OCSVM. This new online OCSVM algorithm was extensively evaluated using synthetic data and real data from case studies in structural health monitoring. The results showed that the proposed method significantly improved the classification error rates, was able to assimilate the changes in the positive data distribution over the time, and maintained a high damage detection accuracy in all case studies.