Learning urban community structures refers to the effort of quantifying, summarizing, and representing an urban community's (i) static structures, e.g., Point-Of-Interest (POI) buildings and their geographic allocations, and (ii) dynamic structures, e.g., human mobility patterns among POIs. By learning community structures, we can better quantitatively represent urban communities and understand how they evolve as cities develop. This can help us boost commercial activities, enhance public security, foster social interactions, and, ultimately, yield livable, sustainable, and viable environments. However, due to the complex nature of urban systems, learning the structures of urban communities is traditionally challenging. To address this problem, in this paper, we propose a collective embedding framework to learn community structure from multiple periodic spatial-temporal graphs of human mobility. Specifically, we first exploit a probabilistic-propagation-based approach to create a set of mobility graphs from periodic human mobility records. In these mobility graphs, the static POIs are regarded as vertices, the dynamic mobility connectivity between each POI pair is regarded as an edge, and the edge weights evolve periodically over time. A collective deep auto-encoder method is then developed to collaboratively learn the embeddings of POIs from multiple spatial-temporal mobility graphs. In addition, we develop an Unsupervised Graph-based Weighted Aggregation (UGWA) method to align and aggregate the POI embeddings into a representation of the community structure. As an application, we apply the proposed embedding framework to rank high-rated residential communities in order to evaluate the performance of our method. Extensive experimental results on real-world urban communities and human mobility data demonstrate the effectiveness of the proposed collective embedding framework.
This paper presents a high-precision multi-modal approach for localizing moving cameras using monocular videos, which has wide potential in many intelligent applications, e.g., robotics and autonomous vehicles. Existing visual odometry methods often suffer from symmetric or repetitive scene patterns, e.g., windows on buildings or parking stalls. To address this issue, we introduce a robust camera localization method that contributes in two aspects. First, we formulate feature tracking, the critical step of visual odometry, as a hierarchical min-cost network flow optimization task, and regularize the formulation with flow constraints, cross-scale consistencies, and motion heuristics. The proposed formulation can adaptively select the features or feature combinations over scale-space that are most distinctive, in contrast to traditional methods that need to detect and group repetitive patterns in a separate step. Second, we further develop a joint formulation for integrating dense visual odometry and sparse GPS readings in a shared reference coordinate system. The fusion process is guided by high-order statistics to suppress the impact of drifting. We evaluate the proposed method on both public video datasets and a newly created dataset that includes scenes full of repetitive patterns. Results with comparisons show that our method clearly outperforms the alternative methods and is effective in addressing repetitive-pattern issues.
High utility sequential pattern (HUSP) mining is an emerging topic in pattern mining, and only a few algorithms have been proposed to address it. In practice, most sequence databases grow over time, and it is inefficient for existing algorithms to mine HUSPs from scratch when a database grows by a small portion of updates. In view of this, we propose the IncUSP-Miner+ algorithm to mine HUSPs incrementally. Specifically, to avoid redundant re-computations, we propose a tighter upper bound on the utility of a sequence, called TSU (Tight Sequence Utility), and then design a novel data structure, called the candidate pattern tree, to buffer the sequences whose TSU values are greater than or equal to the minimum utility threshold in the original database. Accordingly, to avoid keeping a huge amount of utility information for each sequence, a set of concise utility information is designed to be stored in each tree node. To improve mining efficiency, several strategies are proposed to reduce the amount of computation for utility updates and the scope of database scans. Moreover, several strategies are also proposed to properly adjust the candidate pattern tree to support multiple database updates. Experimental results on several real and synthetic datasets show that IncUSP-Miner+ is able to mine HUSPs incrementally and efficiently.
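The TSU bound and the candidate pattern tree are specific to IncUSP-Miner+; as a rough illustration of the buffering idea, the sketch below uses the classic SWU (Sequence-Weighted Utilization) upper bound, a looser but standard bound in utility mining, to decide which patterns would be kept as candidates. The database layout and function names here are illustrative assumptions, not the paper's implementation.

```python
# Sketch of buffering candidate patterns by a utility upper bound.
# The paper's TSU bound is tighter; the classic SWU bound is used
# here as an illustrative stand-in.

def swu(pattern, database):
    """Upper-bound utility: total utility of every sequence containing the pattern."""
    total = 0
    for seq, utilities in database:
        items = set(seq)
        if all(item in items for item in pattern):
            total += sum(utilities)
    return total

def buffer_candidates(patterns, database, min_util):
    """Keep only patterns whose upper bound reaches the threshold
    (these would populate the candidate pattern tree)."""
    return {p for p in patterns if swu(p, database) >= min_util}

# Toy database: (sequence of items, per-item utilities)
db = [
    (["a", "b", "c"], [5, 3, 2]),   # total utility 10
    (["a", "c"],      [4, 1]),      # total utility 5
    (["b", "d"],      [2, 2]),      # total utility 4
]
kept = buffer_candidates([("a",), ("b",), ("d",)], db, min_util=9)
```

Patterns failing the bound (here `("d",)` with SWU 4) can be safely excluded, since the true utility never exceeds the bound.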
From reducing stress and loneliness to boosting productivity and overall well-being, pets are believed to play a significant role in people's daily lives. Many traditional studies have found that frequent interactions with pets can make individuals healthier and more optimistic, and ultimately help them enjoy a happier life. However, most of those studies are not only limited in scale, but may also carry biases from using subjective self-reports, interviews, and questionnaires as their major instruments. In this paper, we leverage large-scale data collected from social media and state-of-the-art deep learning technologies to study this phenomenon in depth and breadth. Our study includes five major steps: 1) collecting timeline posts from around 20,000 Instagram users; 2) using face detection and recognition on 2 million photos to infer users' demographics, relationship status, and whether they have children; 3) analyzing a user's degree of happiness based on images and captions via smiling classification and textual sentiment analysis; 4) applying transfer learning techniques to retrain the final layer of the Inception v3 model for pet classification; and 5) analyzing the effects of pets on happiness in terms of multiple factors of user demographics. Our main results demonstrate the efficacy of the proposed method and yield many new insights. We believe this method is also applicable to other domains as a scalable, efficient, and effective methodology for modeling and analyzing social behaviors and psychological well-being.
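Retraining only the final layer amounts to fitting a softmax classifier on fixed "bottleneck" features produced by the frozen network body. The sketch below illustrates this with a plain NumPy softmax layer trained on synthetic features standing in for Inception v3 activations; all names and data here are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def retrain_final_layer(feats, labels, n_classes, epochs=200, lr=0.5):
    """Fit only a softmax output layer on fixed bottleneck features,
    as in transfer learning with a frozen network body."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]            # one-hot targets
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n                   # softmax cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy "bottleneck features" for two separable classes (a pet/no-pet stand-in)
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-1.0, 0.3, (40, 5)), rng.normal(1.0, 0.3, (40, 5))])
labels = np.array([0] * 40 + [1] * 40)
W, b = retrain_final_layer(feats, labels, 2)
acc = ((feats @ W + b).argmax(axis=1) == labels).mean()
```

Because the network body is frozen, only this small linear layer is optimized, which is what makes retraining on modest datasets feasible.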
Machine learning and artificial intelligence techniques have recently been applied to construct online portfolio selection strategies. A popular, state-of-the-art family of strategies explores the reversion phenomenon through online learning algorithms and statistical prediction models. Despite gaining promising results on some benchmark datasets, these strategies often adopt a single model, chosen by a selection criterion (e.g., breakdown point), for predicting future prices. However, such model selection is often unstable and may cause unnecessarily high variability in the final estimation, leading to poor prediction performance on real datasets and thus non-optimal portfolios. To overcome these drawbacks, in this paper, we propose to exploit the reversion phenomenon using combination forecasting estimators, and design a novel online portfolio selection strategy, named Combination Forecasting Reversion (CFR), which outputs optimal portfolios based on the improved reversion estimator. We further present efficient CFR implementations based on the online Newton step (ONS) and online gradient descent (OGD) algorithms, and theoretically analyze the regret bounds of the proposed algorithms, which guarantee that the online CFR model performs as well as the best CFR model in hindsight. We evaluate the proposed algorithms on various real markets with extensive experiments. Empirical results show that CFR can effectively overcome the drawbacks of existing reversion strategies and achieves state-of-the-art performance.
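As a minimal illustration of the OGD variant, the sketch below learns a convex combination of several price forecasters with online gradient descent, projecting the weights back onto the probability simplex after every step. The squared loss and the toy forecasters are assumptions made for illustration; CFR's actual estimator and loss follow the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ogd_combination(expert_preds, truth, eta=0.1):
    """Online gradient descent over convex combinations of expert
    price forecasts under squared loss; returns the final weights."""
    T, K = expert_preds.shape
    w = np.full(K, 1.0 / K)
    for t in range(T):
        pred = w @ expert_preds[t]
        grad = 2.0 * (pred - truth[t]) * expert_preds[t]
        w = project_simplex(w - eta * grad)
    return w

rng = np.random.default_rng(0)
truth = rng.normal(size=200)                                  # stand-in price series
experts = np.stack([truth + rng.normal(scale=0.1, size=200),  # accurate forecaster
                    rng.normal(size=200)], axis=1)            # noise forecaster
w = ogd_combination(experts, truth)
```

The weights stay on the simplex throughout, so the combined forecast is always a valid convex combination, and online learning shifts the mass toward the forecaster that has been accurate so far.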
The increased accessibility of urban sensor data and the popularity of social network applications are enabling the discovery of crowd mobility and personal communication patterns. However, studying an individual's egocentric relationships can be very challenging because the available data may refer to direct contacts, such as phone calls between individuals, or indirect contacts, such as paired location presence. In this paper, we develop methods to integrate three facets extracted from heterogeneous urban data (timelines, calls, and locations) through a progressive visual reasoning and inspection scheme. Our approach uses a detect-and-filter scheme such that, prior to visual refinement and analysis, a coarse detection is performed to extract the target individual and construct his or her timeline. It then detects spatio-temporal co-occurrences or call-based contacts to develop the egocentric network of the individual. The filtering stage is enhanced with a line-based visual reasoning interface that facilitates flexible and comprehensive investigation of egocentric relationships and connections in terms of time, space, and social networks. The integrated system, RelationLines, is demonstrated using a dataset that contains taxi GPS data, cell-based mobility data, mobile calling data, microblog data, and POI data from a city with millions of citizens. We conduct three case studies to examine the effectiveness and efficiency of our system.
Deep convolutional neural networks (CNNs) have achieved remarkable success in various fields. However, training an excellent CNN is practically a trial-and-error process that consumes a tremendous amount of time and computing resources. To accelerate the training process and reduce the number of trials, experts need to understand what has occurred during training and why the resulting CNN behaves as it does. However, current popular training platforms, such as TensorFlow, provide only limited and generic information, such as training/validation errors, which is far from sufficient for this purpose. To bridge this gap and help domain experts with their training tasks in a practical environment, we propose a visual analytics system, DeepTracker, to facilitate the exploration of the rich dynamics of CNN training processes and to identify the unusual patterns hidden in the huge amount of training logs. Specifically, we combine a hierarchical index mechanism and a set of hierarchical small multiples to help experts explore the entire training log at different levels of detail. We also introduce a novel cube-style visualization to reveal the complex correlations among multiple types of heterogeneous training data, including neuron weights, validation images, and training iterations. Three case studies are conducted to demonstrate how DeepTracker provides its users with valuable knowledge in an industry-level CNN training process, namely, in our case, training ResNet-50 on the ImageNet dataset. We show that our method can be easily applied to other state-of-the-art "very deep" CNN models.
For real-world learning tasks (e.g., classification), graph-based models are commonly used to fuse the information distributed across diverse data sources, which can be heterogeneous, redundant, and incomplete. These models represent the relations in different datasets as pairwise links. However, such links cannot capture high-order relations that connect multiple objects (e.g., a relation connecting more than two patient groups admitted by the same hospital in 2014). In this paper, we propose a visual analytics approach for the classification of heterogeneous datasets using the hypergraph model. The hypergraph is an extension of the traditional graph in which a hyperedge connects multiple vertices instead of just two. We model various high-order relations in heterogeneous datasets as hyperedges and fuse different datasets into a unified hypergraph structure. A hypergraph learning algorithm is used to predict the missing labels in the datasets. To allow users to inject their domain knowledge into the model-learning process, we augment the traditional learning algorithm in a number of ways. We also propose a set of visualizations that enable the user to construct the hypergraph structure and set the parameters of the learning model interactively during the analysis. We demonstrate the capability of our approach via two real-world cases.
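A standard choice of hypergraph learning algorithm for filling in missing labels is regularized label propagation on the normalized hypergraph operator (in the style of Zhou et al.). The sketch below implements that classic formulation on a toy incidence matrix; the unit hyperedge weights and the specific value of alpha are illustrative assumptions, and the paper's user-guided augmentations are omitted.

```python
import numpy as np

def hypergraph_propagate(H, y, alpha=0.9):
    """Hypergraph label propagation: build the normalized hypergraph
    operator Theta and solve the regularized propagation in closed form.
    H: (n_vertices, n_edges) incidence matrix (every vertex must lie in
    at least one hyperedge); y: initial labels (+1 / -1 / 0 = unknown)."""
    w = np.ones(H.shape[1])                 # unit hyperedge weights (assumption)
    dv = H @ w                              # vertex degrees
    de = H.sum(axis=0)                      # hyperedge degrees
    Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_isqrt @ H @ np.diag(w / de) @ H.T @ Dv_isqrt
    n = H.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * theta, (1.0 - alpha) * y)

# Two hyperedges, each grouping three vertices; one labeled vertex per group
H = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1],
              [0, 1],
              [0, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])
f = hypergraph_propagate(H, y)
```

The two labeled vertices spread their labels through the hyperedges, so the unlabeled vertices in each group receive the sign of their group's seed.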
For 2-dimensional (2D) data, current clustering algorithms usually need to convert the data to vectors in a pre-processing step, which, unfortunately, severely damages 2D spatial information and discards the inherent structures and correlations in the original data. In this paper, we develop a novel clustering method that addresses these issues to enhance clustering capability. The proposed method mutually reinforces three goals, namely seeking projections, learning manifolds, and constructing data representations, in a seamlessly integrated model. In particular, we seek two projection matrices with the optimal number of directions to project the data into low-rank, noise-reduced, and most expressive subspaces, in which manifolds are constructed and data representations are sought. The manifolds are adaptively updated according to the projections, and the new data representations are sought with respect to the projected data. Consequently, the learned manifolds are clean and more expressive, and the new data representations are representative and robust. Extensive experimental results verify the effectiveness of the proposed method in clustering.
Recognizing human activities using supervised learning methods has been widely studied in the literature. However, for some applications such as elderly care, the activities to be identified for analysis are often unknown. In this paper, we focus on the automatic extraction of behavioral patterns, as the representations of activities, from the trajectory data of an individual. The underlying challenges lie in the need to model the long-range dependency and spatio-temporal variations within the trajectory data. We propose to first represent the trajectory data using a behavior-aware flow graph, which is a probabilistic finite-state automaton whose nodes and edges are attributed with local behavioral features. We then identify the underlying subflows as the behavioral patterns using the kernel k-means algorithm. With the activities automatically identified, we propose a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily activity routines. The performance of the proposed methodology has been compared with a number of existing methods using both synthetic and publicly available real smart-home datasets, with promising results obtained. We also discuss how the proposed unsupervised methodology can be used to support exploratory behavior analysis for elderly care.
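Kernel k-means needs only a kernel (similarity) matrix rather than explicit feature vectors, which suits settings like flow-graph subflows where features are implicit. The sketch below is a generic kernel k-means with farthest-point initialization, applied to an RBF kernel over toy 2D data; it illustrates the clustering step only, not the paper's flow-graph construction, and the kernel choice is an assumption.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=20):
    """Kernel k-means: cluster points using distances computed purely
    from a kernel matrix K (no explicit feature vectors needed)."""
    n = K.shape[0]
    d = np.diag(K)
    # Farthest-point initialization in the implicit feature space
    anchors = [0]
    while len(anchors) < k:
        dist_to_anchors = np.min(
            [d + d[a] - 2.0 * K[:, a] for a in anchors], axis=0)
        anchors.append(int(dist_to_anchors.argmax()))
    labels = np.argmin([d + d[a] - 2.0 * K[:, a] for a in anchors], axis=0)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue
            # ||phi(x) - mu_c||^2 expanded via the kernel trick
            dist[:, c] = (d - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        labels = dist.argmin(axis=1)
    return labels

# Two well-separated blobs; an RBF kernel should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
labels = kernel_kmeans(K, 2)
```

The squared distance to each cluster centroid is expanded entirely in terms of kernel entries, which is what makes the algorithm applicable when only pairwise similarities between subflows are available.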
Recommendation applications can guide users in making important life choices by referring to the activities of similar peers. For example, students making academic plans may learn from the data of similar students, while patients and their physicians may explore data from similar patients to select the best treatment. Selecting an appropriate peer group has a strong impact on the value of the guidance that can result from analyzing the peer group's data. In this paper, we describe a visual interface that helps users review the similarities and differences between a seed record and a group of similar records, and refine the selection. We introduce the LikeMeDonuts, Ranking Glyph, and History Heatmap visualizations. The interface was refined through three rounds of formative usability evaluation with 12 target users, and its usefulness was evaluated in a case study with a student review manager using real student data. We describe three analytic workflows observed during use and summarize how users' input shaped the final design.
Nonnegative matrix factorization (NMF) is a widely used feature extraction technique for image clustering and image classification. For the former task, various unsupervised NMF methods based on the structure of the data distribution have been proposed, while for the latter task, the label information of the dataset provides important guidance. However, most previously proposed supervised NMF methods focus on imposing discriminant constraints on the coefficient matrix. When dealing with newly arriving samples, the transpose or the pseudoinverse of the basis matrix is used to project these samples into the low-dimensional space, so the influence of the labels on the basis matrix is only indirect. Although some methods do attempt to constrain the basis matrix within the NMF framework, they either restrict only within-class samples or impose improper constraints on the basis matrix. To address these problems, in this paper a novel NMF framework named discriminative and orthogonal subspace constraints-based nonnegative matrix factorization (DOSNMF) is proposed. In DOSNMF, the discriminative constraints are imposed on the projected subspace instead of on the directly learned representation, so the discriminative information is directly connected with the projected subspace. At the same time, an orthogonal term is incorporated in DOSNMF to adjust the orthogonality of the learned basis matrix, which ensures the orthogonality of the learned subspace and improves the sparseness of the basis matrix. The framework can be implemented in two ways. The first is based on manifold learning theory: two graphs, the intrinsic graph and the penalty graph, are constructed to capture the intra-class structure and the inter-class distinctness, so that both the manifold structure information and the discriminative information of the dataset are utilized.
For convenience, we refer to this method by the name of the framework, i.e., DOSNMF. The second way is based on Fisher's criterion, and we name it Fisher's-criterion-based DOSNMF (FDOSNMF). The objective functions of DOSNMF and FDOSNMF can be easily optimized using multiplicative update (MU) rules. The new methods are tested on five datasets and compared with several supervised and unsupervised variants of NMF. The experimental results demonstrate the effectiveness of the proposed methods.
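The MU optimization builds on the classic Lee-Seung multiplicative updates for plain NMF, which the sketch below implements; DOSNMF's discriminative and orthogonality terms would add extra factors to these update rules, and those terms are omitted here.

```python
import numpy as np

def nmf_mu(X, r, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates minimizing
    ||X - WH||_F^2 (the base of DOSNMF's MU optimization)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates keep W and H elementwise nonnegative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.random.default_rng(1).random((20, 15))   # toy nonnegative data matrix
W, H = nmf_mu(X, r=5)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Because the updates multiply by nonnegative ratios, nonnegativity of `W` and `H` is preserved automatically, with no explicit projection step.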
Hashing techniques have recently attracted increasing research interest in multimedia studies. Most existing hashing methods employ only a single feature for hash code learning. Multi-view data, with each view corresponding to a type of feature, generally provide more comprehensive information; however, how to efficiently integrate multiple views for learning compact hash codes remains challenging. In this paper, we propose a novel unsupervised hashing method, dubbed multi-view discrete hashing (MvDH), that effectively explores multi-view data. Specifically, MvDH performs matrix factorization to generate the hash codes as the latent representations shared by multiple views, during which spectral clustering is performed simultaneously. The joint learning of hash codes and cluster labels enables MvDH to generate more discriminative hash codes, which are optimal for classification. An efficient alternating algorithm is developed to solve the proposed optimization problem with guaranteed convergence and low computational complexity. The binary codes are optimized via the discrete cyclic coordinate descent (DCC) method to reduce the quantization errors. Extensive experimental results on three large-scale benchmark datasets demonstrate the superiority of the proposed method over several state-of-the-art methods in terms of both accuracy and scalability.
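As a rough sketch of the shared matrix factorization step, the code below learns one latent representation common to two views via alternating least squares and binarizes it by sign. MvDH's spectral clustering term and its DCC discrete optimization are omitted; everything here is an illustrative simplification, not the paper's algorithm.

```python
import numpy as np

def shared_factorization_hash(views, n_bits, n_iter=50, seed=0):
    """Toy multi-view hashing sketch: find B minimizing
    sum_i ||V_i - B W_i||_F^2 by alternating least squares,
    then take sign(B) as the hash codes."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    B = rng.normal(size=(n, n_bits))
    Ws = [rng.normal(size=(n_bits, V.shape[1])) for V in views]
    for _ in range(n_iter):
        # Per-view projections given the shared representation B
        for i, V in enumerate(views):
            Ws[i] = np.linalg.lstsq(B, V, rcond=None)[0]
        # Shared B given all projections: one stacked least-squares solve
        W_all = np.hstack(Ws)
        V_all = np.hstack(views)
        B = np.linalg.lstsq(W_all.T, V_all.T, rcond=None)[0].T
    return np.where(B >= 0, 1, -1)

rng = np.random.default_rng(2)
view1 = rng.normal(size=(30, 8))    # e.g., color features (toy)
view2 = rng.normal(size=(30, 12))   # e.g., texture features (toy)
codes = shared_factorization_hash([view1, view2], n_bits=4)
```

Each alternating step is an exact least-squares minimization, so the relaxed objective is non-increasing; the discrete sign step at the end is where methods like DCC would instead optimize the binary codes directly.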
Popular social media platforms can rapidly propagate vital information to a significant number of people over social networks. In this work, we present D-Map+ (Diffusion Map), a novel visualization method that supports the exploration and analysis of social behaviors during such information diffusion and propagation on typical social media through a map metaphor. In D-Map+, users who participated in reposting (i.e., resending a message initially posted by others) one central user's posts (i.e., a series of original tweets) are collected and mapped to a hexagonal grid based on their behavioral similarities and the chronological order of the repostings. With additional interaction and linking, D-Map+ is capable of providing visual profiles of influential users, describing their social behaviors, and analyzing the evolution of significant events in social media. A comprehensive visual analysis system is developed to support interactive exploration with D-Map+. We evaluate our work with real-world social media data and find interesting patterns among users. Key players, important information diffusion paths, and interactions among social communities can be identified.
Learning-to-Rank (LtR) solutions are commonly used in large-scale information retrieval systems, such as Web search engines, where high-quality documents must be returned in response to a user query within a fraction of a second. The most effective LtR algorithms, e.g., λ-MART, adopt a gradient boosting approach to build an additive ensemble of weighted regression trees. Since the required ranking effectiveness is achieved with very large ensembles, the impact of these solutions on response time and query throughput is not negligible. In this paper, we propose X-CLEaVER, an iterative meta-algorithm able to build more efficient and effective ranking ensembles. X-CLEaVER interleaves the iterations of a given ensemble learning algorithm with pruning and re-weighting phases. First, redundant trees are removed from the generated ensemble; then, the weights of the remaining trees are fine-tuned by optimizing the desired ranking loss function. We propose and analyse several pruning strategies and assess their benefits, showing that interleaving pruning and re-weighting phases during learning is more effective than applying a single post-learning optimization step. Experiments conducted using two publicly available LtR datasets show that X-CLEaVER is very effective in optimizing λ-MART models in terms of both effectiveness and efficiency.
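A single pruning + re-weighting pass can be sketched as: fit weights for the current trees, drop the trees whose weights contribute least, and re-fit the survivors' weights. For simplicity this sketch optimizes squared loss with least squares rather than a ranking loss, and the per-tree prediction matrix is a toy stand-in; X-CLEaVER's actual pruning strategies and loss are those in the paper.

```python
import numpy as np

def prune_and_reweight(tree_preds, y, keep_frac=0.5):
    """One pruning + re-weighting pass (sketch): weight the trees by
    least squares, keep the most useful ones, re-fit their weights.
    tree_preds: (n_trees, n_samples) per-tree predictions."""
    w, *_ = np.linalg.lstsq(tree_preds.T, y, rcond=None)
    n_keep = max(1, int(len(w) * keep_frac))
    keep = np.argsort(-np.abs(w))[:n_keep]       # trees with largest weights
    w_new, *_ = np.linalg.lstsq(tree_preds[keep].T, y, rcond=None)
    return keep, w_new

rng = np.random.default_rng(0)
y = rng.normal(size=100)                          # toy regression targets
good = y + rng.normal(scale=0.05, size=100)       # one tree that tracks y
noise = rng.normal(size=(5, 100))                 # five redundant trees
preds = np.vstack([good[None, :], noise])
keep, w = prune_and_reweight(preds, y, keep_frac=0.5)
```

The smaller re-weighted ensemble scores documents with fewer tree traversals, which is the source of the efficiency gain at serving time.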
This paper presents GraphVis, a platform for interactive graph mining and relational learning. The platform combines interactive visual representations with state-of-the-art graph mining and relational machine learning techniques to help reveal important insights quickly and to learn an appropriate, highly predictive model for a particular task (e.g., classification, link prediction, discovering the roles of nodes, finding influential nodes). Visual representations, interaction techniques, and tools are developed for simple, fast, and intuitive real-time interactive exploration, mining, and modeling of graph data. In particular, we propose techniques for interactive relational learning (e.g., node/link classification), interactive link prediction and weighting, role discovery and community detection, and higher-order network analysis (via graphlets and network motifs), among others. GraphVis also allows for the refinement and tuning of graph mining and relational learning methods for specific application domains and constraints via an end-to-end interactive visual analytic pipeline that learns, infers, and provides rapid interactive visualization with immediate feedback at each change or prediction. Other key features include interactive filtering, querying, ranking, manipulation, and export, as well as tools for dynamic network analysis and visualization, interactive graph generators and models (including new block model approaches), and a variety of multi-level network analysis techniques.
Making machines understand human expressions enables various useful applications in human-machine interaction. In this paper, we present a novel facial expression recognition approach based on a 3D Mesh Convolutional Neural Network (3DMCNN), together with a visual-analytics-guided 3DMCNN design and optimization scheme. From an RGBD camera, we first reconstruct a 3D face model of a subject with facial expressions and then compute the geometric properties of the surface. Instead of using a regular Convolutional Neural Network (CNN) to learn intensities of the facial images, we convolve the geometric properties on the surface of the 3D model using the 3DMCNN. We design a geodesic-distance-based convolution method to overcome the difficulties arising from the irregular sampling of the face surface mesh. We further present an interactive visual analytics approach for designing and modifying the network, which analyzes the learned features and clusters similar nodes in the 3DMCNN. By removing low-activity nodes in the network, the performance of the network is greatly improved. We compare our method with the regular CNN-based method by interactively visualizing each layer of the networks, and we analyze the effectiveness of our method by studying representative cases. In tests on public datasets, our method achieves higher recognition accuracy than traditional image-based CNNs and other 3D CNNs. The proposed framework, including the 3DMCNN and the interactive visual analytics of the CNN, can be extended to other applications.
Smog causes low visibility on the road and can compromise traffic safety, so modeling traffic in smog matters greatly for realistic traffic simulation. Most existing traffic models assume that drivers have optimal vision, which makes them unsuitable for modeling smog weather conditions. In this paper, we introduce the smog full velocity difference model (SMOG-FVDM) for realistic simulation of traffic in smog weather conditions. In this model, we present a stadia model for drivers in smog weather and introduce it into a car-following traffic model through a ``Psychological Force'' and a ``Body Force'', yielding the SMOG-FVDM. Considering the large number of parameters in the SMOG-FVDM, we design a visual verification system based on it that shows visual simulation results for different road scenarios and different smog densities, so that an adequate solution can be found by tuning the parameters. Experimental results show that our model gives realistic and efficient traffic simulations in smog weather conditions.
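The underlying full velocity difference model (FVDM) updates a follower's acceleration from an optimal-velocity term and a velocity-difference term. The sketch below implements the standard FVDM with a crude visibility cut-off standing in for the paper's stadia, psychological-force, and body-force terms; all parameter values and the optimal-velocity function are illustrative assumptions.

```python
import numpy as np

def optimal_velocity(gap, v_max=30.0, d_safe=25.0):
    """A common tanh-shaped optimal-velocity function."""
    return v_max * (np.tanh(gap / d_safe - 1.0) + np.tanh(1.0)) / (1.0 + np.tanh(1.0))

def fvdm_step(x_f, v_f, x_l, v_l, dt=0.1, kappa=0.6, lam=0.5, visibility=np.inf):
    """One Euler step of the full velocity difference model:
    dv/dt = kappa * (V(gap) - v) + lam * (v_leader - v).
    As a crude stand-in for the stadia model, the follower perceives
    the leader only within the visibility range."""
    gap = x_l - x_f
    if gap > visibility:                 # leader invisible in smog
        gap, dv = visibility, 0.0
    else:
        dv = v_l - v_f
    a = kappa * (optimal_velocity(gap) - v_f) + lam * dv
    return x_f + v_f * dt, max(0.0, v_f + a * dt)

def simulate(visibility, steps=1500, dt=0.1):
    """A fast follower approaching a slower constant-speed leader."""
    x_f, v_f = 0.0, 20.0
    x_l, v_l = 60.0, 10.0
    for _ in range(steps):
        x_f, v_f = fvdm_step(x_f, v_f, x_l, v_l, dt=dt, visibility=visibility)
        x_l += v_l * dt
    return v_f

v_clear = simulate(np.inf)   # with clear vision the follower settles to the leader's speed
```

With unlimited visibility the follower relaxes to the leader's speed at an equilibrium gap; shrinking `visibility` changes the perceived gap and hence the car-following behavior, which is the effect the SMOG-FVDM models in a principled way.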
When looking at an image, humans shift their attention towards interesting regions, making sequences of eye fixations. When describing an image, they also come up with simple sentences that highlight the key elements in the scene. What is the correlation between where people look and what they describe in an image? To investigate this problem intuitively, we develop CapVis, a visual analytics system for looking into eye fixations and image captions, two types of subjective annotations that are relatively task-free and natural. From these annotations, we propose a word-weighting scheme to extract visual and verbal saliency ranks and compare them against each other. In our approach, a number of low-level and semantic-level features relevant to visual-verbal saliency consistency are proposed and visualized in multiple facets for a better understanding of image content. Our method also shows the different ways in which humans and computational models look at and describe images, which provides reliable information for diagnosing captioning models. Experiments also show that the visualized features can be integrated into a computational model to effectively predict the consistency between the two modalities on image datasets with both types of annotations.
This paper deals with trajectory planning suitable for nonholonomic, differentially driven wheeled mobile robots. The path is approximated with a spline that consists of multiple Bernstein-Bézier curves merged together so that the spline has continuous curvature. The paper presents an approach for optimizing the velocity profile of the Bernstein-Bézier spline subject to velocity and acceleration constraints. For the purpose of optimization, velocity points and turning points are introduced. Based on these singular points, local segments are defined on which local velocity profiles are optimized independently of each other. The globally optimal velocity profile is then determined from the locally optimal velocity profiles. The proposed optimization approach is experimentally evaluated and validated in a simulation environment and on real mobile robots.
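The acceleration constraint translates into a pointwise speed bound along the curve, v(t) ≤ min(v_max, sqrt(a_rad,max / κ(t))), where κ is the curvature. The sketch below evaluates a Bernstein-Bézier curve with de Casteljau's algorithm, estimates curvature numerically, and computes this bound; the segment splitting at velocity and turning points is omitted, and all parameter values are illustrative assumptions.

```python
import numpy as np

def de_casteljau(ctrl, t):
    """Evaluate a Bernstein-Bézier curve at parameter t."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def curvature(ctrl, t, h=1e-4):
    """Numerical curvature of the planar curve at parameter t."""
    p0, p1, p2 = (de_casteljau(ctrl, t + k * h) for k in (-1, 0, 1))
    d1 = (p2 - p0) / (2.0 * h)               # first derivative (central diff)
    d2 = (p2 - 2.0 * p1 + p0) / h ** 2       # second derivative
    num = abs(d1[0] * d2[1] - d1[1] * d2[0])
    return num / (np.linalg.norm(d1) ** 3 + 1e-12)

def velocity_limit(ctrl, ts, v_max=2.0, a_rad_max=1.0):
    """Pointwise speed bound from the radial acceleration
    constraint v^2 * kappa <= a_rad_max."""
    return np.array([min(v_max, np.sqrt(a_rad_max / max(curvature(ctrl, t), 1e-9)))
                     for t in ts])

ctrl = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]   # S-shaped cubic
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]   # straight-line cubic
ts = np.linspace(0.05, 0.95, 19)
v_lim = velocity_limit(ctrl, ts)
v_line = velocity_limit(line, ts)
```

On the straight segment the curvature vanishes and the bound is simply `v_max`; on curved segments the bound tightens, which is exactly what drives the segment-wise velocity profile optimization.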