ACM Transactions on Intelligent Systems and Technology (TIST)

Latest Articles

Combating Fake News: A Survey on Identification and Mitigation Techniques

The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users’ engagements with...

Co-saliency Detection with Graph Matching

Recently, co-saliency detection, which aims to automatically discover common and salient objects appearing in several relevant images, has attracted increased interest in the computer vision community. In this article, we present a novel graph-matching based model for co-saliency detection in image pairs. A solution of graph matching is proposed to...

Location-Specific Influence Quantification in Location-Based Social Networks

Location-based social networks (LBSNs) such as Foursquare offer a platform for users to share and be aware of each other’s physical movements....

Predicting Academic Performance for College Students: A Campus Behavior Perspective

Detecting abnormal behaviors of students in time and providing personalized intervention and guidance at the early stage is important in educational...

Motion-Aware Compression and Transmission of Mesh Animation Sequences

With the increasing demand in using 3D mesh data over networks, supporting effective compression and efficient transmission of meshes has caught lots...

Online Heterogeneous Transfer Learning by Knowledge Transition

In this article, we study the problem of online heterogeneous transfer learning, where the objective is to make predictions for a target data sequence...

CNNs Based Viewpoint Estimation for Volume Visualization

Viewpoint estimation from 2D rendered images is helpful in understanding how users select viewpoints for volume visualization and guiding users to...

A Semi-Boosted Nested Model With Sensitivity-Based Weighted Binarization for Multi-Domain Network Intrusion Detection

Effective network intrusion detection techniques are required to thwart evolving cybersecurity...

A Local Mean Representation-based K-Nearest Neighbor Classifier

K-nearest neighbor classification method (KNN), as one of the top 10 algorithms in data mining, is a very simple and yet effective nonparametric...


Recent TIST News: 

ACM Transactions on Intelligent Systems and Technology (TIST) ranks among the best of all ACM journals in citations received per paper. Each paper published in TIST from 2010 to 2018 has received 12.8 citations on average in the ACM Digital Library.

ACM Transactions on Intelligent Systems and Technology (TIST) has been a success story. Submissions to the journal increased 76 percent from 2013 to 2015, from 278 original papers and revisions to 488. Despite this increase, the journal acceptance rate has remained steady at approximately 24 percent. Furthermore, the TIST Impact Factor increased from 1.251 in 2014 to 3.19 in 2016.

Journal Metric (2018)

  • Impact Factor: 3.19
  • 5-year Impact Factor: 10.47
  • Avg. Citations in ACM DL: 12.8

About TIST

ACM Transactions on Intelligent Systems and Technology (ACM TIST) is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.

Forthcoming Articles

Measuring Conditional Independence by Independent Residuals for Causal Discovery

Local Learning Approaches for Finding Effects of a Specified Cause and Their Causal Paths

Causal networks are used to describe and to discover causal relationships among variables and data generating mechanisms. There have been many approaches for learning a global causal network of all observed variables. In many applications, we may be interested in finding what are the effects of a specified cause variable and what are the causal paths from the cause variable to its effects. Instead of learning a global causal network, we propose several local learning approaches for finding all effects (or descendants) of the specified cause variable and the causal paths from the cause variable to some effect variable of interest. We discuss the identifiability of the effects and the causal paths from observed data and prior knowledge. For the case that the causal paths are not identifiable, our approaches try to find a path set which contains the causal paths of interest.

Crowdsourcing Mechanism for Trust Evaluation in CPCS based on Intelligent Mobile Edge Computing

The combination of Cyber Physical Systems and Cloud Computing has received tremendous research interest and effort from both academia and industry. It enables a new breed of applications and services and can fundamentally change the way that people interact with the physical world. However, due to the relatively long distance between the remote cloud and sensors, Cloud Computing cannot provide real-time service and fine-grained management for the end devices. Meanwhile, untrustworthy nodes may endanger the whole system. In this paper, we apply Intelligent Mobile Edge Computing to solve these problems. We first introduce the Mobile Crowdsourcing-Based Trust Evaluation Mechanism, where mobile edge users apply Artificial Intelligence to evaluate the trustfulness of sensor nodes. We then design two incentive mechanisms, i.e., the Trustworthy Incentive Mechanism and the Quality-Aware Trustworthy Incentive Mechanism. The first aims to impel edge users to upload real information about their capability and costs. The purpose of the second is to motivate edge users to conduct tasks and report results honestly. Detailed theoretical analysis is performed to certify the effectiveness of the proposed mechanisms, demonstrating the validity of the Quality-Aware Trustworthy Incentive Mechanism in terms of data trustfulness, effort trustfulness and quality trustfulness. Extensive experiments are carried out to validate the proposed mechanisms. The results corroborate that the proposed mechanisms can efficiently stimulate mobile edge users to perform evaluation tasks honestly.

Short text analysis based on dual semantic extension and deep hashing in microblog

Short text analysis is a challenging task because of the sparsity and limited semantics of short texts. The semantic extension approach learns the meaning of a short text by introducing external knowledge. However, because of the randomness of short text descriptions in microblogs, traditional extension methods cannot accurately mine the semantics suitable for the microblog theme. Therefore, we use the prominent and refined hashtag information in microblogs, as well as complex social relationships, to provide implicit guidance for semantic extension of short text. Specifically, we design a deep hash model based on social and conceptual semantic extension, which consists of dual semantic extension and deep hashing representation. In the extension method, the short text is first conceptualized to construct a hashtag graph under conceptual space. Then, the associated hashtags are generated by correlation calculation based on the integration of social relationships and concepts to extend the short text. In the deep hash model, we use the semantic hashing model to encode the abundant semantic features and form a compact and meaningful binary encoding. Finally, extensive experiments demonstrate that our method can learn and represent short texts well by using more meaningful semantic signals. It can effectively enhance and guide the semantic analysis and understanding of short text in microblogs.

Deep Reinforcement Learning for Vehicular Edge Computing: An Intelligent Offloading System

The development of smart vehicles brings drivers and passengers a comfortable and safe environment. Various emerging applications promise to enrich users’ traveling experiences and daily life. However, executing computing-intensive applications on resource-constrained vehicles still poses huge challenges. In this paper, we construct an intelligent offloading system for vehicular edge computing by leveraging deep reinforcement learning. First, both the communication and computation states are modelled by finite Markov chains. Moreover, the task scheduling and resource allocation strategy is formulated as a joint optimization problem to maximize the revenue of network operators. Due to its complexity, the original problem is further divided into two sub-optimization problems. A two-sided matching scheme and a deep reinforcement learning approach are developed to schedule offloading requests and allocate network resources, respectively. Performance evaluations illustrate the effectiveness and superiority of our constructed system.

Edge-enabled Disaster Rescue: A Case Study of Searching for Missing People

In the aftermath of earthquakes, floods and other disasters, photos are increasingly playing more significant roles, such as finding missing people and assessing disasters, in disaster rescue and recovery efforts. These disaster photos are taken in real time by the crowd, unmanned aerial vehicles and wireless sensors. However, communications equipment is often damaged in disasters, and the very limited communication bandwidth restricts the upload of photos to the cloud center, seriously impeding disaster rescue endeavors. Based on edge computing, we propose Echo, a highly time-efficient disaster rescue framework. By utilizing the computing, storage and communication abilities of edge servers, disaster photos are preprocessed and analyzed in real time, and more specific visuals are immensely helpful for conducting emergency response and rescue. This paper takes the search for missing people as a case study to show that Echo can be more advantageous in terms of disaster rescue. To greatly conserve valuable communication bandwidth, only significantly associated images are extracted and uploaded to the cloud center for subsequent facial recognition. Furthermore, an adaptive photo detector is designed to utilize the precious and unstable communication bandwidth effectively, as well as ensuring the photo detection precision and recall rate. The effectiveness and efficiency of the proposed method are demonstrated by simulation experiments.

Multi-modal Curriculum Learning over Graphs

Curriculum Learning (CL) is a recently proposed learning paradigm that aims to achieve satisfactory performance by properly organizing the learning sequence from simple curriculum examples to more difficult ones. Up to now, little work has been done to explore CL for data with graph structure. Therefore, this paper proposes a novel CL algorithm that can be utilized to guide Label Propagation (LP) over graphs, of which the target is to “learn” the labels of unlabeled examples on the graphs. Specifically, we assume that different unlabeled examples have different levels of difficulty for propagation, and their label learning should follow a simple-to-difficult sequence with the updated curriculums. Furthermore, considering that practical data are often characterized by multiple modalities, every modality in our method is associated with a “teacher” that not only evaluates the difficulties of examples from its own viewpoint, but also cooperates with other teachers to generate the overall simplest curriculum examples for propagation. By taking the curriculums suggested by the teachers as a whole, the common preference (i.e. commonality) of teachers on selecting the simplest examples can be discovered by a row-sparse matrix, and their distinct opinions (i.e. individuality) are captured by a sparse noise matrix. As a result, an accurate curriculum sequence can be established and the propagation quality can thus be improved. Theoretically, we prove that the propagation risk bound is closely related to the examples’ difficulty information, and empirically, we show that our method can generate higher accuracy than the state-of-the-art CL approach and LP algorithms on various multi-modal tasks.
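The simple-to-difficult idea in the abstract above can be illustrated with a minimal, single-modality sketch. This is not the authors' algorithm: the difficulty measure used here (the margin between the strongest and second-strongest label vote from already-labelled neighbours), the function name `curriculum_label_propagation` and the input format are all assumptions made for illustration, and the multi-teacher, matrix-based machinery of the paper is omitted.

```python
from collections import defaultdict


def curriculum_label_propagation(weights, seeds):
    """Greedy simple-to-difficult label propagation on an undirected graph.

    weights: dict mapping (u, v) node pairs to a positive edge weight
             (hypothetical input format).
    seeds:   dict mapping a node to its known label.
    Returns (labels, order): labels for every node reachable from a seed,
    and the order in which unlabelled nodes were labelled.
    """
    # Build a symmetric adjacency map.
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w

    labels = dict(seeds)
    order = []
    unlabeled = set(adj) - set(labels)

    while unlabeled:
        best = None  # (margin, node, label) of the "simplest" candidate
        for node in sorted(unlabeled):
            # Sum edge weights to already-labelled neighbours, per label.
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                if nbr in labels:
                    votes[labels[nbr]] += w
            if not votes:
                continue  # no labelled neighbour yet
            ranked = sorted(votes.items(), key=lambda kv: (-kv[1], str(kv[0])))
            top_label, top = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
            margin = top - runner_up  # assumed proxy: large margin = easy
            if best is None or margin > best[0]:
                best = (margin, node, top_label)
        if best is None:
            break  # remaining nodes are disconnected from every seed
        _, node, label = best
        labels[node] = label
        order.append(node)
        unlabeled.remove(node)
    return labels, order
```

On a chain `a-b-c-d-e` with seeds at both ends, the weakly connected middle node is treated as the most difficult and is labelled last, once both of its neighbours can vote.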

A Trust Computing based Security Routing Scheme for Cyber Physical Systems

Security is a pivotal issue for the development of Cyber Physical Systems (CPS). The trusted computing base of CPS includes the complete protection mechanisms, such as hardware, firmware and software, the combination of which is responsible for enforcing a system security policy. A Trust Detection Based Secured Routing (TDSR) scheme is proposed to establish secured routes from source nodes to the data center in a malicious environment and to achieve a satisfactory security level in CPS. In the TDSR scheme, a sensor node on the route from the data center sends detection routing to identify the trust of relay nodes. Thus, the trust of each node can be obtained, and data packets are then routed securely through trustworthy nodes to the sink. Because detection routing is executed on nodes that have abundant energy, the network lifetime is not affected. Performance evaluation through simulation is carried out for successful routing ratio, compromised node detection ratio, and detection routing overhead. We found that performance can be improved in TDSR compared to previous schemes.

Privacy-Aware Tag Recommendation for Accurate Image Privacy Prediction

Online images’ tags are very important for indexing, sharing, and searching of images, as well as surfacing images with private or sensitive content, which needs to be protected. Social media sites such as Flickr generate these metadata from user-contributed tags. However, as the tags are at the sole discretion of users, these tags tend to be noisy and incomplete. In this paper, we present a privacy-aware approach to automatic image tagging, which aims at improving the quality of user annotations, while also preserving the images’ original privacy sharing patterns. Precisely, we recommend potential tags for each target image by mining privacy-aware tags from the most similar images of the target image, which are obtained from a large collection. Experimental results show that, although the user-input tags comprise noise, our privacy-aware approach is able to predict accurate tags that can improve the performance of a downstream application on image privacy prediction, and outperforms an existing privacy-oblivious approach to image tagging. The results also show that, even for images that do not have any user tags, our proposed approach can recommend accurate tags. Crowd-sourcing the predicted tags exhibits the quality of our privacy-aware recommended tags. Our code, features, and the dataset used in experiments are available at:

Distributed Deep Forest and its Application to Automatic Detection of Cash-out Fraud

Internet companies need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large scale tasks are widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its building blocks, and it has achieved highly competitive results on various domains of tasks. However, it has not been tested on extremely large scale tasks. In this work, based on our parameter server system, we developed a distributed version of deep forest. To meet the needs of real-world tasks, many improvements are introduced to the original deep forest model, including MART as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results showed that the deep forest model has the best performance according to evaluation metrics from different perspectives, even with very little effort for parameter tuning. This model can block fraudulent transactions involving a large amount of money (details are business confidential) each day. Even compared with the best deployed model, the deep forest model can additionally bring a significant decrease in economic loss each day.

Stable specification search in structural equation models with latent variables

In our previous study, we introduced stable specification search for cross-sectional data (S3C). It is an exploratory causal method that combines the concept of stability selection and multi-objective optimization to search for stable and parsimonious causal structures across the entire range of model complexities. S3C, however, is designed to model causal relations among observed variables. In this study, we extended S3C to S3C-Latent, to model causal relations between latent variables that are measured through observed proxies. We evaluated S3C-Latent on simulated data and compared the results to those of PC-MIMBuild, an extension of the PC algorithm, the state-of-the-art causal discovery method. The comparison showed that S3C-Latent achieved better performance. We also applied S3C-Latent to real-world data of children with attention deficit/hyperactivity disorder and data about measuring mental abilities among pupils. The results are consistent with those of previous studies.

Trajectory Data Classification: A Review

This paper comprehensively surveys the development of trajectory data classification. Considering the critical role of trajectory data classification in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis and traffic control, trajectory data classification has attracted growing attention. According to the availability of manual labels, which is critical to classification performance, the methods can be classified into three categories, i.e., unsupervised, semi-supervised and supervised. Furthermore, classification methods are divided into sub-categories according to which extracted features are used. We provide a holistic understanding of and deep insight into the three types of trajectory data classification methods and present some promising future directions.

Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring

Predictive business process monitoring methods exploit historical process execution logs to generate predictions about running instances (called cases) of a business process, such as the prediction of the outcome, next activity or remaining cycle time of a given process case. These insights could be used to support operational managers in taking remedial actions as business processes unfold, e.g. shifting resources from one case onto another to ensure this latter is completed on time. A number of methods to tackle the remaining cycle time prediction problem have been proposed in the literature. However, due to differences in their experimental setup, choice of datasets, evaluation measures and baselines, the relative merits of each method remain unclear. This article presents a systematic literature review and taxonomy of methods for remaining time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 16 real-life datasets originating from different industry domains.

Spatial Ensemble Learning for Heterogeneous Geographic Data with Class Ambiguity

Class ambiguity refers to the phenomenon whereby samples with similar features belong to different classes at different locations. Given heterogeneous geographic data with class ambiguity, the spatial ensemble learning (SEL) problem aims to find a decomposition of the geographic area into disjoint zones such that class ambiguity is minimized and a local classifier can be learned in each zone. The SEL problem is important for applications such as land cover mapping from heterogeneous earth observation data with spectral confusion. However, the problem is challenging due to its high computational cost (finding an optimal zone partition is NP-hard). Related work in ensemble learning either assumes an identical sample distribution (e.g., bagging, boosting, random forest) or decomposes multi-modular input data in the feature vector space (e.g., mixture of experts, multimodal ensemble), and thus cannot effectively minimize class ambiguity. In contrast, our spatial ensemble framework explicitly partitions input data in geographic space. Our approach first preprocesses data into homogeneous spatial patches and uses a greedy heuristic to allocate pairs of patches with high class ambiguity into different zones. We further extend our spatial ensemble learning framework with spatial dependency between nearby zones based on the spatial autocorrelation effect. Both theoretical analysis and experimental evaluations on two real-world wetland mapping datasets show the feasibility of the proposed approach.

BAMB: A Balanced Markov Blanket Discovery Approach to Feature Selection

The discovery of the Markov blanket (MB) for feature selection has attracted much attention in recent years, since the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, instead of both. In this paper, we propose a novel MB discovery algorithm for balancing efficiency and accuracy, called BAMB (BAlanced Markov Blanket discovery). To achieve this goal, given a class attribute of interest, BAMB finds candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and spouse set to remove false positives from the current MB set. This keeps the set of PC and spouses of the target as small as possible, and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we use 10 real-world datasets and compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. On prediction accuracy, BAMB outperforms the 12 feature selection algorithms compared. On computational efficiency in MB discovery, BAMB is close to the IAMB algorithm while being much faster than the remaining seven MB discovery algorithms.

A Visual Analysis Approach for Understanding Durability Test Data of Automotive Products

In the current era of Industry 4.0, people are facing data-rich manufacturing environments. Visual analytics, as an important technology for explaining and understanding complex data, has been increasingly introduced into industrial data analysis scenarios. Taking the durability test of automotive starters as background, this paper proposes a visual analysis approach for understanding large-scale and long-term starter durability test data. Guided by detailed scenario and requirement analyses, we first propose a migration-adapted DBSCAN algorithm to identify starting modes and abnormal tests. This algorithm adopts a segmentation strategy and a group of matching and updating operations to achieve an efficient and accurate clustering analysis on the data. Next, we design and implement a visual analysis system that provides a set of user-friendly visual designs and lightweight interactions to help people gain data insights on test process overview, test data patterns and durability performance dynamics. Finally, we conduct a quantitative algorithm evaluation, a case study and a user interview using real-world starter durability test datasets. The results demonstrate the effectiveness of the approach and its possible inspiration for the durability test data analysis of other similar industrial products.

Large-scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks of sequence mining research. Existing FEM studies suffer from unsatisfactory scalability when facing complex sequences, as testing whether an episode occurs in a sequence is an NP-complete problem. In this paper, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” means that an event sequence is either very long or contains masses of simultaneous events. Meanwhile, the events in this paper are arranged in a predefined hierarchy, which yields abstract events that can form episodes but may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy, and to the best of our knowledge, this is the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.

DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services

Automatic estimation of the relative difficulty of a pair of questions is an important and challenging problem in community question answering (CQA) services. Only a limited number of studies have addressed this problem. Past studies mostly leveraged the expertise of users answering the questions and barely considered other properties of CQA services such as metadata of users and posts, temporal information and textual content. In this paper, we propose DiffQue, a novel system that maps this problem to a network-aided edge directionality prediction problem. DiffQue starts by constructing a novel network structure that captures different notions of difficulty between a pair of questions. It then measures the relative difficulty of two questions by predicting the direction of a (virtual) edge connecting these two questions in the network. It leverages features extracted from the network structure, metadata of users/posts and textual description of questions and answers. Experiments on datasets obtained from two CQA sites (further divided into four datasets) with human annotated ground-truth show that DiffQue outperforms four state-of-the-art methods by a significant margin (28.77% higher F1 score and 28.72% higher AUC than the best baseline). As opposed to the other baselines, (i) DiffQue appropriately responds to the training noise, (ii) DiffQue is capable of adapting to multiple domains (CQA datasets), and (iii) DiffQue can efficiently handle the “cold start” problem which may arise due to the lack of information for newly posted questions or newly arrived users. We also design a prototype system of DiffQue and make it available at

ALERA: Accelerated Reinforcement Learning Driven Adaptation to Electro-Mechanical Degradation in Nonlinear Control Systems Using Encoded State Space Error Signatures

The successful deployment of autonomous real-time systems is contingent on their ability to recover from performance degradation of sensors, actuators and other electro-mechanical subsystems. This must be achieved with low latency and high coverage of multi-parameter failure mechanisms. In this paper, we introduce ALERA, a novel framework for real-time control law adaptation in nonlinear control systems assisted by system state encodings. These state encodings generate an error signal when the code properties are violated in the presence of failures. The core idea of this research is based on the premise that the time-domain error signal response contains perturbed system parameters’ diagnostic information that can be used for quick control law adaptation to failure conditions. Quick adaptation is performed via reinforcement learning algorithms that relearn the control law of the perturbed system from a starting condition dictated by the diagnostic information. The fast (up to 80X faster than traditional reinforcement learning paradigms) performance recovery enabled by ALERA is demonstrated on an inverted pendulum balancing problem, a brake-by-wire system and a self-balancing robot.

Spatio-Temporal Adaptive Pricing for Balancing Mobility-on-Demand Networks

Pricing in mobility-on-demand (MOD) networks, such as Uber, Lyft and connected taxicabs, is done adaptively by leveraging the price responsiveness of drivers (supplies) and passengers (demands) to achieve such goals as maximizing drivers’ incomes, improving riders’ experience and sustaining platform operation. Existing pricing policies only respond to short-term demand fluctuations without accurate trip forecast and spatial demand-supply balancing, thus mismatching drivers to riders and resulting in loss of profit. We propose CAPrice, a novel adaptive pricing scheme for urban MOD networks. It uses a new spatio-temporal deep capsule network (STCapsNet) that accurately predicts ride demands and driver supplies with vectorized neuron capsules while accounting for comprehensive spatio-temporal and external factors. Given accurate perception of zone-to-zone traffic flows in a city, CAPrice formulates a joint optimization problem by considering spatial equilibrium to balance the platform, drivers and riders/passengers with proactive pricing “signals.” We have conducted an extensive experimental evaluation upon over 4 × 10^8 MOD trips (Uber, Didi Chuxing, and connected taxicabs) in New York City, Beijing and Chengdu, validating the accuracy, effectiveness and profitability (often 20% ride prediction accuracy and 30% profit improvements over the state of the art) of CAPrice in managing urban MOD networks.

Take a Look Around: Using Street View and Satellite Images to Estimate House Prices

When an individual purchases a home, they simultaneously purchase its structural features, its accessibility to work, and the neighborhood amenities. Some amenities, such as air quality, are measurable whilst others, such as the prestige or the visual impression of a neighborhood, are difficult to quantify. Despite the well-known impacts intangible housing features have on house prices, limited attention has been given to systematically quantifying these difficult-to-measure amenities. Two issues have led to this neglect: few quantitative methods exist that can measure the urban environment, and the collection of such data is both costly and subjective. We show that street image and satellite image data can capture these urban qualities and improve the estimation of house prices. We propose a pipeline that uses a deep neural network model to automatically extract visual features from images to estimate house prices in London, UK. We make use of traditional housing features such as age, size and accessibility as well as visual features from Google Street View images and Bing aerial images in estimating the house price model. We find encouraging results where learning to characterize the urban quality of a neighborhood improves house price prediction, even when generalizing to previously unseen London boroughs. We explore the use of non-linear vs. linear methods to fuse these cues with conventional models of house pricing, and show how the interpretability of linear models allows us to directly extract proxy variables for visual desirability of neighborhoods that are both of interest in their own right, and could be used as inputs to other econometric methods. This is particularly valuable as once the network has been trained with the training data, it can be applied elsewhere, allowing us to generate vivid dense maps of the visual appeal of London streets.

Efficient User Guidance for Validating Participatory Sensing Data

Participatory sensing has become a new data collection paradigm that leverages the wisdom of the crowd for big data applications, without the cost of buying dedicated sensors. It collects data from human sensors through their own devices, such as cell phone accelerometers, cameras, and GPS devices. This benefit comes with a drawback: human sensors are arbitrary and inherently uncertain due to the lack of quality guarantees. Moreover, participatory sensing data are time series that exhibit not only highly irregular dependencies on time but also high variance between sensors. To overcome these limitations, we formulate the problem of validating uncertain time series collected by participatory sensors. In this paper, we approach the problem with an iterative validation process on top of a probabilistic time series model. First, we generate a series of probability distributions from raw data by tailoring a state-of-the-art dynamical model, namely GARCH, to our joint time series setting. Second, we design a feedback process that consists of an adaptive aggregation model to unify the joint probabilistic time series and an efficient user guidance model to validate aggregated data with minimal effort. Through extensive experimentation, we demonstrate the efficiency and effectiveness of our approach on both real data and synthetic data. Highlights from our experiences include the fast running time of the probabilistic model, the robustness of the aggregation model to outliers, and the significant effort savings of the guidance model.

Comparison and Modelling of Country-Level Micro-blog User Behaviour and Activity in Cyber-Physical-Social Systems using Weibo and Twitter Data

With the rapid development of social media technologies, the cyber-physical-social system (CPSS) has become a hot topic in many industrial applications. The use of "micro-blogging" services, such as Twitter, has rapidly become an influential way to share information. While recent studies have revealed that understanding and modelling micro-blog user behavior from massive user behavior data in social media in CPSS is key to the success of many practical applications, a key challenge in the literature is that geographic and cultural diversity strongly affects micro-blog user behavior and activity. The motivation of this paper is to understand the differences and similarities between the behaviors of users from different countries on social networking platforms, and to build a Country-Level Micro-Blog User (CLMB) behavior and activity model for CPSS applications. We propose the CLMB model for analyzing micro-blogging users' behavior across different countries in CPSS applications. The model considers three important user behavior characteristics: the content of micro-blogging, a user emotion index and the user relationship network. Based on the CLMB model, 16 countries with the largest number of representative and active users in the world were selected from the sample dataset, and the characteristics of user micro-blog behavior in these 16 countries were analyzed. The experimental results show that in countries with small populations and strong cohesiveness, users pay more attention to the social function of micro-blogging; in contrast, in countries with large, loose social groups, users use micro-blogging as a news dissemination platform. Further analysis of the micro-blog characterization data of these countries shows that users in countries whose social network structure exhibits reciprocity rather than hierarchy use more linguistic elements to express happiness in micro-blogging.

An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation

The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set; in the creative track, however, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, the best team obtained an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of 1.785. This article provides an overview of the challenge, including motivation, task definition, dataset description, and evaluation. We further report and analyze the results obtained by the top performing teams in each track and explore the approaches taken by the winners. We finally summarize our key findings and list the open avenues and possible future directions in the area of automatic playlist continuation.
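
For readers unfamiliar with the R-precision metric quoted above, the following is a minimal sketch of its textbook definition: the fraction of the R known relevant tracks that appear among the top R recommendations. The function name `r_precision` and the track identifiers are illustrative assumptions, and the challenge's official scoring may differ in detail from this plain version.

```python
def r_precision(recommended, relevant):
    """Textbook R-precision: hits in the top-R recommendations divided by R,
    where R is the number of known relevant items.

    `recommended` is a ranked list of track IDs; `relevant` is the held-out
    ground truth. Both names are hypothetical, chosen for illustration.
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # no ground truth to match against
    top_r = recommended[:len(relevant_set)]
    hits = len(set(top_r) & relevant_set)
    return hits / len(relevant_set)


# Three relevant tracks, so only the first three recommendations count.
score = r_precision(["a", "b", "c", "d"], ["a", "c", "x"])
print(round(score, 4))
```

In this example two of the three held-out tracks ("a" and "c") appear in the top three recommendations, so the score is 2/3.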

PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach

The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework using a massive dataset of soccer logs consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players' evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank (i.e., searching players and player versatility), showing its flexibility and efficiency, which make it worth using in the design of a scalable platform for soccer analytics.

Energy-efficient Static Task Scheduling on VFI based NoC-HMPSoCs for Intelligent Edge Devices in Cyber-Physical Systems

The interlinked processing units in modern Cyber-Physical Systems (CPS) create a large network of connected embedded computing systems. Network-on-Chip (NoC) based multiprocessor system-on-chip (MPSoC) architecture is becoming a de facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). The number of processors in the multiprocessor systems used in CPS has increased significantly; therefore, Voltage Frequency Islands (VFI) have recently been adopted as an effective energy management mechanism in large-scale multiprocessor chip designs. In this paper, we investigate energy- and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI based NoC-MPSoCs with DVFS-enabled processors. Unlike the existing population-based optimization algorithms, we propose a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARSH-FATI collectively performs task mapping, scheduling, and voltage scaling. Consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI based NoC-MPSoCs. We also develop a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and a gradient descent inspired voltage scaling algorithm called Energy Gradient Descent (EGD). We introduce a notion of Energy Gradient (EG) that guides EGD in its search for island voltage settings that minimize the total energy consumption. We conducted experiments on 8 real benchmarks adopted from the Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed the state-of-the-art technique and achieved an average energy-efficiency improvement of approximately 24% and 30% over CA-TMES-Search and CA-TMES-Quick, respectively.

Efficient and Privacy-preserving Fog-assisted Health Data Sharing Scheme

Pervasive data collected from e-healthcare devices possesses significant medical value when shared with professional healthcare service providers. However, health data sharing poses several security issues, such as access control and privacy leakage, and faces critical challenges in achieving efficient data analysis and services. In this paper, we propose an efficient and privacy-preserving fog-assisted health data sharing (PFHDS) scheme for e-healthcare systems. Specifically, we integrate the fog server to classify the shared data into different categories according to disease risks for efficient health data analysis. Meanwhile, we design an enhanced attribute-based encryption method that combines a personal access policy on patients with a professional access policy on the fog server for effective medical service provision. Furthermore, we achieve a significant reduction in encryption cost for patients by offloading a portion of the computation and storage burden from patients to the fog server. Security discussions show that PFHDS realizes data confidentiality and fine-grained access control with collusion resistance. Performance evaluations demonstrate cost-efficient encryption computation, storage and energy consumption.

Multi-view Fusion with Extreme Learning Machine for Clustering

Unlabeled, multi-view data presents a considerable challenge in many real-world data analysis tasks. However, these data are worth exploring because they often contain complementary information that improves the quality of the analysis results. Clustering with multi-view data is a particularly challenging problem, as revealing the complex data structures across many feature spaces demands discriminative features that are specific to the task and, when too few of these features are present, performance suffers. Extreme learning machines (ELMs) are an emerging form of learning model that has shown outstanding representation ability and superior performance in a range of different learning tasks. Motivated by the promise of this advancement, we have developed a novel multi-view fusion clustering framework based on an ELM, called MVEC. MVEC learns the embeddings from each individual view of the data via the ELM network, then constructs a single unified embedding according to the correlations and dependencies among the embeddings, automatically weighting the contribution of each. This process exposes the underlying clustering structures embedded within multi-view data with a high degree of accuracy. A simple yet efficient solution is also provided to solve the optimization problem within MVEC. Experiments and comparisons on eight diverse benchmarks from different domains confirm MVEC's clustering accuracy.

Lightweight Convolution Neural Networks for Mobile Edge Computing in Transportation Cyber Physical Systems

Cloud computing extends Transportation Cyber-Physical Systems (T-CPS) with enhanced computing and storage capability by offloading computing tasks to remote cloud servers. However, cloud computing cannot fulfill requirements such as low latency and context awareness in T-CPS. Mobile Edge Computing (MEC) can overcome these limitations of cloud computing by offloading the computing tasks to edge servers in proximity to users, thereby reducing latency and improving context awareness. Although MEC has the potential to improve T-CPS, it is incapable of processing computation-intensive tasks such as deep learning algorithms due to its intrinsic storage and computing-capability constraints. Therefore, we design and develop a lightweight deep learning model to support MEC applications in T-CPS. In particular, we put forth a stacked convolutional neural network (CNN) consisting of factorization convolutional layers alternating with compression layers (namely, lightweight CNN-FC). Extensive experimental results show that, in contrast to conventional CNN models, our proposed lightweight CNN-FC can greatly decrease the number of unnecessary parameters, thereby reducing the model size while maintaining high accuracy. We also evaluate the performance of our proposed model on a realistic MEC platform. Specifically, experimental results on this MEC platform show that our model can maintain high accuracy while preserving a portable model size.

Using Sparse Representation to Detect Anomalies in Complex WSNs

In recent years, wireless sensor networks (WSNs) have become an active area of research for monitoring physical and environmental conditions. Due to the interdependence of sensors, a functional anomaly in one sensor can cause a functional anomaly in another sensor, which can further lead to the malfunctioning of the entire sensor network. Existing research offers ways to analyse faulty sensor anomalies, but fails to show their effectiveness throughout the entire interdependent network system. This gap in research on sensor network dependency can be filled by studying the abnormal nodes of the sensor network. In this paper, a dictionary learning algorithm based on a non-negative constraint is developed, and a sparse representation method for detecting anomalous nodes in sensor networks is proposed based on the dictionary learning. Compared with other anomaly detection approaches, our method is more robust. To verify this robustness, the handling of abnormal nodes is compared against four commonly used methods. Furthermore, experiments are conducted on the obtained abnormal nodes to demonstrate the interdependence of multi-layer sensor networks and reveal the conditions and causes of a system crash.

Accounting for hidden common causes when inferring cause and effect from observational data

Hidden common causes make it difficult to infer causal relationships from observational data. Here, we consider a new method to account for a hidden common cause that infers its presence from the data. As with other approaches that can account for common causes, this approach is successful only in some cases. We describe such a case taken from the field of genomics, wherein one tries to identify which genomic markers causally influence a trait of interest.

Detecting causal relationships in simulation models using intervention-based counterfactual analysis

Central to explanatory simulation models is their capability to show not just that but also why particular things happen. Explanation is closely related to the detection of causal relationships and is, in a simulation context, typically done by means of controlled experiments. However, for complex simulation models, conventional 'blackbox' experiments may be too coarse-grained to cope with spurious relationships. We present an intervention-based causal analysis methodology that exploits the manipulability of computational models and detects and circumvents spurious effects. The core of the methodology is a formal model that maps basic causal assumptions to causal observations and allows for the identification of combinations of assumptions that have a negative impact on observability. First experiments indicate that the methodology can successfully deal with notoriously tricky situations involving asymmetric and symmetric overdetermination and detect fine-grained causal relationships between events in the simulation. As illustrated in the paper, the methodology can be easily integrated into an existing simulation environment.

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

Multi-label classification is defined as the problem of identifying the multiple labels or categories of new observations based on labeled training data. Multi-labeled data poses several challenges, including class imbalance, label correlation, incomplete multi-label matrices, and noisy and irrelevant features. In this paper, we propose an integrated multi-label approach (ML-CIB) for simultaneously training the multi-label classification model and addressing the aforementioned challenges. Because it is difficult to find a complete label vector for each instance in real-world data, the model learns a new label matrix and captures new label correlations. We also propose a label regularization to handle the imbalance issue in the new label matrix, and an L1-norm regularization term is incorporated in the objective function to select the relevant sparse features. A multi-label feature selection (ML-CIB-FS) method is presented as a variant of the proposed ML-CIB to show the efficacy of the proposed method in selecting the relevant features. ML-CIB is formulated as a constrained objective function. We use the accelerated proximal gradient method to solve the proposed optimisation problem. Lastly, extensive experiments are conducted on fifteen regular-scale and large-scale imbalanced multi-labeled datasets. The promising results show that our method significantly outperforms the state of the art.
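
As background for the L1 regularization and proximal gradient method mentioned above, the sketch below shows the soft-thresholding operator, which is the standard proximal operator of the L1 norm and the step that drives small coefficients to exactly zero. It is a generic illustration, not the ML-CIB solver itself; the function name `soft_threshold` and the threshold `lam` are assumptions introduced here.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1, applied elementwise.

    Each entry is shrunk toward zero by `lam`; entries whose magnitude is
    at most `lam` become exactly zero, which is what yields sparsity.
    (`values` and `lam` are illustrative names, not from the paper.)
    """
    shrunk = []
    for v in values:
        magnitude = max(abs(v) - lam, 0.0)
        if magnitude == 0.0:
            shrunk.append(0.0)
        elif v > 0:
            shrunk.append(magnitude)
        else:
            shrunk.append(-magnitude)
    return shrunk


# Entries with |v| <= 1.0 are zeroed; the rest move 1.0 closer to zero.
print(soft_threshold([3.0, -0.5, -2.0, 0.0], 1.0))
```

In a proximal gradient iteration this operator would be applied after each gradient step on the smooth part of the objective, with `lam` equal to the step size times the regularization weight.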

RecRules: Recommending IF-THEN Rules for End-User Development

End users can nowadays personalize their smart devices and web applications by defining or reusing IF-THEN rules through dedicated End-User Development (EUD) tools. Within such tools, however, the spread of new devices and web applications, as well as the growing number of shared rules, results in information overload and difficulty in discovering new rules and functionality. We envision that recommendation techniques could help end users find rules shared by others, and could suggest new functionality to adopt while they define their own rules. In this paper, we tackle the emerging problem of recommending IF-THEN rules to end users by presenting RecRules, a hybrid and semantic recommendation system. Through a mixed content and collaborative approach, the goal of RecRules is to recommend by functionality, i.e., to suggest rules based on their final purposes rather than on details like manufacturers and brands. The algorithm uses a semantic reasoning process to enrich rules with semantic information, with the aim of uncovering hidden connections between rules in terms of shared functionality. Then, it builds a collaborative semantic graph and exploits different types of path-based features to train a learning-to-rank algorithm and compute top-N recommendations. We evaluate RecRules through different experiments on real user data extracted from IFTTT, one of the most popular EUD tools. Results are promising: they show the effectiveness of our approach with respect to other state-of-the-art algorithms, and open the way for a new class of recommender systems for EUD based on the actual functionality end users need.
