
ACM Transactions on Intelligent Systems and Technology (TIST)

Latest Articles

Survey and Cross-benchmark Comparison of Remaining Time Prediction Methods in Business Process Monitoring

Predictive business process monitoring methods exploit historical process execution logs to generate...

Multi-Modal Curriculum Learning over Graphs

Curriculum Learning (CL) is a recently proposed learning paradigm that aims to achieve satisfactory performance by properly organizing the learning sequence from simple curriculum examples to more difficult ones. Up to now, little work has been done to explore CL for data with a graph structure. Therefore, this article proposes a novel CL...

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks...

Efficient User Guidance for Validating Participatory Sensing Data

Participatory sensing has become a new data collection paradigm that leverages the wisdom of the crowd for big data applications without spending cost...

Spatio-temporal Adaptive Pricing for Balancing Mobility-on-Demand Networks

Pricing in mobility-on-demand (MOD) networks, such as Uber, Lyft, and connected taxicabs, is done adaptively by leveraging the price responsiveness of...

Personalized Reason Generation for Explainable Song Recommendation

Personalized recommendation has received a lot of attention as a highly practical research topic. However, existing recommender systems provide the...

DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services

Automatic estimation of the relative difficulty of a pair of questions is an important and challenging problem in community question answering (CQA) services. Few studies have addressed this problem. Past studies mostly leveraged the expertise of users answering the questions and barely considered other properties of CQA services such as...

ALERA: Accelerated Reinforcement Learning Driven Adaptation to Electro-Mechanical Degradation in Nonlinear Control Systems Using Encoded State Space Error Signatures

The successful deployment of autonomous real-time systems is contingent on their ability to recover from performance degradation of sensors, actuators, and other electro-mechanical subsystems with low latency. In this article, we introduce ALERA, a novel framework for real-time control law adaptation in nonlinear control systems assisted by system...

NEWS

Recent TIST News: 

ACM Transactions on Intelligent Systems and Technology (TIST) ranks among the best of all ACM journals in terms of citations received per paper. Each paper published in TIST from 2010 to 2018 has received 12.8 citations on average in the ACM Digital Library.

ACM Transactions on Intelligent Systems and Technology (TIST) has been a success story. Submissions to the journal increased 76 percent from 2013 to 2015, from 278 original papers and revisions to 488. Despite this increase, the journal's acceptance rate has remained steady at approximately 24 percent. Furthermore, the TIST Impact Factor increased from 1.251 in 2014 to 3.19 in 2016.


Journal Metrics (2018)

  • Impact Factor: 3.19
  • 5-year Impact Factor: 10.47
  • Avg. Citations in ACM DL: 12.8

About TIST

ACM Transactions on Intelligent Systems and Technology (ACM TIST) is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.

Forthcoming Articles

Measuring Conditional Independence by Independent Residuals for Causal Discovery

Crowdsourcing Mechanism for Trust Evaluation in CPCS based on Intelligent Mobile Edge Computing

The combination of Cyber Physical Systems and Cloud Computing has received tremendous research interest and effort from both academia and industry, as it enables a new breed of applications and services and can fundamentally change the way people interact with the physical world. However, due to the relatively long distance between the remote cloud and the sensors, cloud computing cannot provide real-time service and fine-grained management for end devices. Meanwhile, untrustworthy nodes may endanger the whole system. In this paper, we apply Intelligent Mobile Edge Computing to solve these problems. We first introduce the Mobile Crowdsourcing-Based Trust Evaluation Mechanism, in which mobile edge users apply Artificial Intelligence to evaluate the trustfulness of sensor nodes. We then design two incentive mechanisms, i.e., a Trustworthy Incentive Mechanism and a Quality-Aware Trustworthy Incentive Mechanism. The first aims to impel edge users to report truthful information about their capabilities and costs. The purpose of the second is to motivate edge users to conduct tasks and report results honestly. Detailed theoretical analysis certifies the effectiveness of the proposed mechanisms and demonstrates the validity of the Quality-Aware Trustworthy Incentive Mechanism in terms of data trustfulness, effort trustfulness, and quality trustfulness. Extensive experiments are carried out to validate the proposed mechanisms. The results corroborate that the proposed mechanisms can efficiently stimulate mobile edge users to perform evaluation tasks honestly.

Short text analysis based on dual semantic extension and deep hashing in microblog

Short text analysis is a challenging task owing to the sparsity and limited semantics of short texts. Semantic extension approaches learn the meaning of a short text by introducing external knowledge. However, because of the randomness of short text descriptions in microblogs, traditional extension methods cannot accurately mine the semantics suited to the microblog theme. Therefore, we use the prominent and refined hashtag information in microblogs, as well as complex social relationships, to provide implicit guidance for the semantic extension of short texts. Specifically, we design a deep hash model based on social and conceptual semantic extension, which consists of dual semantic extension and deep hashing representation. In the extension method, the short text is first conceptualized to construct a hashtag graph in the conceptual space. Then, associated hashtags are generated by a correlation calculation that integrates social relationships and concepts to extend the short text. In the deep hash model, we use semantic hashing to encode the abundant semantic features into a compact and meaningful binary encoding. Finally, extensive experiments demonstrate that our method can learn and represent short texts well by using more meaningful semantic signals. It can effectively enhance and guide the semantic analysis and understanding of short texts in microblogs.
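
As an illustration of the deep hashing step described above, the sketch below binarizes real-valued text embeddings into compact codes and compares them by Hamming distance. This is a generic illustration, not the authors' model; the median-thresholding rule, code length, and random stand-in embeddings are assumptions.

import numpy as np

def to_hash_codes(embeddings):
    """Binarize real-valued semantic embeddings into compact binary codes.
    Sketch only: thresholding around the per-dimension median."""
    thresholds = np.median(embeddings, axis=0)
    return (embeddings > thresholds).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes."""
    return int(np.sum(code_a != code_b))

# Toy usage: 4 short texts embedded in a 16-d semantic space (random stand-in).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
codes = to_hash_codes(emb)
print(hamming_distance(codes[0], codes[1]))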

Deep Reinforcement Learning for Vehicular Edge Computing: An Intelligent Offloading System

The development of smart vehicles brings drivers and passengers a comfortable and safe environment, and various emerging applications promise to enrich users' traveling experiences and daily life. However, executing computation-intensive applications on resource-constrained vehicles still faces huge challenges. In this paper, we construct an intelligent offloading system for vehicular edge computing by leveraging deep reinforcement learning. First, both the communication and computation states are modelled by finite Markov chains. Moreover, the task scheduling and resource allocation strategy is formulated as a joint optimization problem to maximize the revenue of network operators. Due to its complexity, the original problem is further divided into two sub-optimization problems. A two-sided matching scheme and a deep reinforcement learning approach are developed to schedule offloading requests and allocate network resources, respectively. Performance evaluations illustrate the effectiveness and superiority of our constructed system.
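
For intuition only, the sketch below shows a tabular Q-learning update on a toy offload-or-not decision; the paper itself uses deep reinforcement learning together with a two-sided matching scheme, and the state space, reward, and transition model assumed here are illustrative.

import numpy as np

# Toy MDP: states = discretized channel quality, actions = {0: local, 1: offload}.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

rng = np.random.default_rng(1)
Q = np.zeros((N_STATES, N_ACTIONS))

def step(state, action):
    """Assumed environment: offloading pays off when the channel is good."""
    latency = 1.0 if action == 0 else 2.0 - 0.3 * state
    reward = -latency                    # toy revenue proxy: negative latency
    next_state = rng.integers(N_STATES)  # toy Markovian channel transition
    return reward, next_state

state = 0
for _ in range(5000):
    action = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[state]))
    reward, nxt = step(state, action)
    Q[state, action] += ALPHA * (reward + GAMMA * Q[nxt].max() - Q[state, action])
    state = nxt

print(Q)  # learned preference for offloading per channel state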

Edge-enabled Disaster Rescue: A Case Study of Searching for Missing People

In the aftermath of earthquakes, floods, and other disasters, photos play increasingly significant roles in disaster rescue and recovery efforts, such as finding missing people and assessing damage. These disaster photos are taken in real time by the crowd, unmanned aerial vehicles, and wireless sensors. However, communications equipment is often damaged in disasters, and the very limited communication bandwidth restricts the upload of photos to the cloud center, seriously impeding disaster rescue endeavors. Based on edge computing, we propose Echo, a highly time-efficient disaster rescue framework. By utilizing the computing, storage, and communication abilities of edge servers, disaster photos are preprocessed and analyzed in real time, and the resulting, more specific visuals are immensely helpful for emergency response and rescue. This paper takes the search for missing people as a case study to show the advantages of Echo for disaster rescue. To conserve valuable communication bandwidth, only significantly associated images are extracted and uploaded to the cloud center for subsequent facial recognition. Furthermore, an adaptive photo detector is designed to utilize the precious and unstable communication bandwidth effectively while ensuring photo detection precision and recall. The effectiveness and efficiency of the proposed method are demonstrated by simulation experiments.

A Trust Computing based Security Routing Scheme for Cyber Physical Systems

Security is a pivotal issue in the development of Cyber Physical Systems (CPS). The trusted computing base of CPS includes the complete set of protection mechanisms, such as hardware, firmware, and software, whose combination is responsible for enforcing a system security policy. We propose a Trust Detection Based Secured Routing (TDSR) scheme to establish secure routes from source nodes to the data center in malicious environments and thus achieve a satisfactory security level in CPS. In the TDSR scheme, sensor nodes on the routes to the data center send detection routing messages to identify the trust of relay nodes. The trust of each node can thereby be obtained, and data packets are then routed to the sink securely through trustworthy nodes. Because detection routing is executed on nodes with abundant energy, the network lifetime is not affected. Performance evaluation through simulation is carried out for routing success ratio, compromised node detection ratio, and detection routing overhead. We found that TDSR improves performance compared to previous schemes.

Forecasting Price Trend of Bulk Commodities Leveraging Cross-domain Open Data Fusion

Forecasting the price trend of bulk commodities is important in international trade, not only for market participants to schedule production and marketing plans but also for government administrators to adjust policies. Previous studies cannot support accurate fine-grained short-term prediction, since they mainly focus on coarse-grained long-term prediction using historical data. Recently, cross-domain open data has made fine-grained price forecasting possible, since it can be leveraged to extract various direct and indirect factors of the price. In this paper, we predict the price trend over upcoming days by leveraging cross-domain open data fusion. More specifically, we formulate the price trend into three classes (rise, slight-change, and fall) and then predict the class in which the price trend of a future day lies. We take three factors into consideration: (1) a supply factor considering sources providing bulk commodities, (2) a demand factor focusing on vessel transportation as a reflection of short-term needs, and (3) an expectation factor encompassing indirect features (e.g., air quality) with latent influences. A hybrid classification framework is proposed for price trend forecasting. Evaluation conducted on nine real-world cross-domain open datasets shows that our framework can forecast the price trend accurately, outperforming multiple state-of-the-art baselines.
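
A minimal sketch of the three-class trend formulation follows; the one-day horizon and the 0.5% threshold separating "slight-change" from "rise" and "fall" are assumptions, not the paper's settings.

import numpy as np

def label_trend(prices, threshold=0.005):
    """Map day-over-day relative price changes to {fall, slight-change, rise}."""
    prices = np.asarray(prices, dtype=float)
    change = np.diff(prices) / prices[:-1]
    labels = np.where(change > threshold, "rise",
             np.where(change < -threshold, "fall", "slight-change"))
    return labels

print(label_trend([100.0, 100.2, 101.5, 100.4]))  # ['slight-change' 'rise' 'fall']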

Distributed Deep Forest and its Application to Automatic Detection of Cash-out Fraud

Internet companies face the need to handle large-scale machine learning applications on a daily basis, so distributed implementations of machine learning algorithms that can handle extra-large-scale tasks are widely needed. Deep forest is a recently proposed deep learning framework that uses tree ensembles as its building blocks, and it has achieved highly competitive results on tasks from various domains. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we develop a distributed version of deep forest. To meet the needs of real-world tasks, many improvements are introduced to the original deep forest model, including MART as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results show that the deep forest model has the best performance according to evaluation metrics from different perspectives, even with very little effort spent on parameter tuning. This model can block fraudulent transactions amounting to a large sum of money each day (the details are business confidential). Even compared with the best deployed model, the deep forest model brings an additional significant decrease in economic loss each day.

Stable specification search in structural equation models with latent variables

In our previous study, we introduced stable specification search for cross-sectional data (S3C), an exploratory causal method that combines the concept of stability selection and multi-objective optimization to search for stable and parsimonious causal structures across the entire range of model complexities. S3C, however, is designed to model causal relations among observed variables. In this study, we extended S3C to S3C-Latent, to model causal relations between latent variables that are measured through observed proxies. We evaluated S3C-Latent on simulated data and compared the results to those of PC-MIMBuild, an extension of the PC algorithm, the state-of-the-art causal discovery method. The comparison showed that S3C-Latent achieved better performance. We also applied S3C-Latent to real-world data of children with attention deficit/hyperactivity disorder and to data on the measurement of mental abilities among pupils. The results are consistent with those of previous studies.

Mixture of Joint Nonhomogeneous Markov Chains to Cluster and Model Water Consumption Behavior Sequences

The emergence of smart meters has fostered the collection of massive data that support a better understanding of consumer behaviors and better management of water resources and networks. The main focus of this paper is to analyze consumption behavior over time; thus, we first identify the main weekly consumption patterns. This approach allows each meter to be represented by a categorical series, where each category corresponds to a weekly consumption behavior. By considering the resulting consumption behavior sequences, we propose a new methodology based on a mixture of nonhomogeneous Markov models to cluster these categorical time series. Using this method, the meters are described by the Markovian dynamics of their cluster. The latent variable that controls cluster membership is estimated alongside the parameters of the Markov model using a novel classification expectation maximization (CEM) algorithm. A specific entropy measure is formulated to evaluate the quality of the estimated partition by considering the joint Markovian dynamics. The proposed clustering model can also be used to predict future consumption behaviors within each cluster. Numerical experiments using real water consumption data provided by a water utility in France and gathered over nineteen months are conducted to evaluate the performance of the proposed approach in terms of both clustering and prediction. The results demonstrate the effectiveness of the proposed method.
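
The classification EM idea can be sketched for a mixture of Markov chains as below; for brevity the chains here are homogeneous (a single transition matrix per cluster), whereas the paper's model is nonhomogeneous, and the Laplace smoothing and toy data are assumptions.

import numpy as np

def cem_markov_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0):
    """Classification EM for a mixture of Markov chains (homogeneous sketch)."""
    rng = np.random.default_rng(seed)
    z = rng.integers(n_clusters, size=len(seqs))   # initial hard cluster assignment
    for _ in range(n_iter):
        # M-step: per-cluster initial distribution and transitions (Laplace smoothing).
        pi = np.ones((n_clusters, n_states))
        A = np.ones((n_clusters, n_states, n_states))
        for s, k in zip(seqs, z):
            pi[k, s[0]] += 1
            for a, b in zip(s[:-1], s[1:]):
                A[k, a, b] += 1
        pi /= pi.sum(axis=1, keepdims=True)
        A /= A.sum(axis=2, keepdims=True)
        # C-step: reassign each sequence to the cluster maximizing its log-likelihood.
        new_z = np.array([
            np.argmax(np.log(pi[:, s[0]]) +
                      sum(np.log(A[:, a, b]) for a, b in zip(s[:-1], s[1:])))
            for s in seqs])
        if np.array_equal(new_z, z):
            break
        z = new_z
    return z, pi, A

# Toy usage: two behaviours over 3 weekly consumption states.
seqs = [[0, 0, 1, 0, 0], [0, 1, 0, 0, 1], [2, 2, 1, 2, 2], [2, 1, 2, 2, 1]]
print(cem_markov_mixture(seqs, n_states=3, n_clusters=2)[0])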

Secure Deduplication System with Active Key Update and Its Application in IoT

The rich cloud services in the Internet of Things create needs for edge computing, in which devices should be able to handle storage tasks securely, reliably, and efficiently. When processing storage requests from edge devices, a cloud server is supposed to eliminate duplicate copies of repeating data to reduce the amount of storage space and save bandwidth. However, to protect data confidentiality while supporting such deduplication in edge computing, we need to tackle two main challenges when encrypting data before uploading: (i) the fingerprint of the encrypted data should be indeterministic to prevent brute-force attacks; and (ii) the encryption key should be efficiently updatable by power-constrained devices when a key leakage happens on some edge nodes. In this paper, we propose a deduplication system that provides active key update in a practical manner. We introduce random keys into the convergent encryption while retaining deduplication over encrypted files across users, and then design a novel approach to transform a ciphertext encrypted with a revoked key into a ciphertext encrypted with a new key without online interactions between devices. The security analysis is given in terms of the proposed security model. The experimental analysis shows that the scheme is also practical.
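
For background, plain convergent encryption derives both the key and the deduplication fingerprint from the file content itself, as sketched below with the standard library; the paper's scheme additionally introduces random keys and active key updates, which are not shown, precisely because the deterministic fingerprint illustrated here is vulnerable to brute-force guessing.

import hashlib

def convergent_key(data: bytes) -> bytes:
    """Deterministic key: the SHA-256 digest of the plaintext."""
    return hashlib.sha256(data).digest()

def fingerprint(key: bytes) -> str:
    """Deduplication tag: hash of the key, so identical files collide on the server."""
    return hashlib.sha256(key).hexdigest()

a = convergent_key(b"same file content")
b = convergent_key(b"same file content")
print(fingerprint(a) == fingerprint(b))   # True -> the server keeps a single copy
# Note: this determinism is what allows brute-force guessing of predictable files,
# the weakness that randomized, updatable keys are designed to mitigate.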

BAMB: A Balanced Markov Blanket Discovery Approach to Feature Selection

The discovery of the Markov blanket (MB) for feature selection has attracted much attention in recent years, since the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, instead of both. In this paper, we propose a novel MB discovery algorithm for balancing efficiency and accuracy, called BAMB (BAlanced Markov Blanket discovery). To achieve this goal, given a class attribute of interest, BAMB finds the candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and spouse set to remove false positives from the current MB set. This keeps the set of PC and spouses of the target as small as possible and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we use 10 real-world datasets and compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. On prediction accuracy, BAMB outperforms the 12 compared feature selection algorithms. On computational efficiency in MB discovery, BAMB is close to the IAMB algorithm while being much faster than the remaining seven MB discovery algorithms.

A Visual Analysis Approach for Understanding Durability Test Data of Automotive Products

In the current era of Industry 4.0, people face data-rich manufacturing environments. Visual analytics, as an important technology for explaining and understanding complex data, has been increasingly introduced into industrial data analysis scenarios. Taking the durability test of automotive starters as background, this paper proposes a visual analysis approach for understanding large-scale and long-term starter durability test data. Guided by detailed scenario and requirement analyses, we first propose a migration-adapted DBSCAN algorithm to identify starting modes and abnormal tests. This algorithm adopts a segmentation strategy and a group of matching and updating operations to achieve efficient and accurate clustering analysis of the data. Next, we design and implement a visual analysis system that provides a set of user-friendly visual designs and lightweight interactions to help people gain insights into the test process overview, test data patterns, and durability performance dynamics. Finally, we conduct a quantitative algorithm evaluation, a case study, and a user interview using real-world starter durability test datasets. The results demonstrate the effectiveness of the approach and its potential to inspire durability test data analysis for other similar industrial products.

Discovering Interesting Sub-Paths with Statistical Significance from Spatio-temporal Datasets

Given a path in a spatial or temporal framework, we aim to find all contiguous sub-paths that are both interesting (e.g., abrupt changes) and statistically significant (i.e., persistent trends rather than local fluctuations). Discovering interesting sub-paths can provide meaningful information for a variety of domains, including Earth science, environmental science, and urban planning. Existing methods are limited to detecting individual points of interest along an input path but cannot find interesting sub-paths. Our preliminary work provided a Sub-path Enumeration and Pruning (SEP) algorithm to detect interesting sub-paths of arbitrary length. However, SEP is not effective at avoiding sub-paths that are random variations rather than meaningful trends, which hampers clear and proper interpretation of the results. In this paper, we extend our previous work by proposing a statistical significance test framework to eliminate these random variations. To compute statistical significance, we first present a baseline Monte Carlo method based on our previous work and then propose a Dynamic Search-and-Prune (D-SAP) algorithm to improve its computational efficiency. Our experiments show that the significance testing can greatly suppress noisy detections in the output and that D-SAP can greatly reduce the execution time.
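
In the spirit of the baseline Monte Carlo test mentioned above, the sketch below permutes per-step change scores to assess whether the best-scoring contiguous sub-path could arise by chance; the sum-of-scores interestingness measure and the permutation null model are assumptions, not the paper's exact definitions.

import numpy as np

def best_subpath_score(scores):
    """Maximum-sum contiguous sub-path (Kadane's algorithm)."""
    best = cur = scores[0]
    for x in scores[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def monte_carlo_pvalue(scores, n_perm=1000, seed=0):
    """P(best sub-path score under permuted scores >= observed best score)."""
    rng = np.random.default_rng(seed)
    observed = best_subpath_score(scores)
    hits = sum(best_subpath_score(rng.permutation(scores)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

scores = np.array([0.1, -0.2, 1.5, 1.2, 0.9, -0.1, 0.05])
print(monte_carlo_pvalue(scores))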

Take a Look Around: Using Street View and Satellite Images to Estimate House Prices

When individuals purchase a home, they simultaneously purchase its structural features, its accessibility to work, and the neighborhood amenities. Some amenities, such as air quality, are measurable, whilst others, such as the prestige or the visual impression of a neighborhood, are difficult to quantify. Despite the well-known impact intangible housing features have on house prices, limited attention has been given to systematically quantifying these difficult-to-measure amenities. Two issues have led to this neglect: few quantitative methods exist that can measure the urban environment, and the collection of such data is both costly and subjective. We show that street image and satellite image data can capture these urban qualities and improve the estimation of house prices. We propose a pipeline that uses a deep neural network model to automatically extract visual features from images to estimate house prices in London, UK. We make use of traditional housing features such as age, size, and accessibility, as well as visual features from Google Street View images and Bing aerial images, in estimating the house price model. We find encouraging results: learning to characterize the urban quality of a neighborhood improves house price prediction, even when generalizing to previously unseen London boroughs. We explore the use of non-linear vs. linear methods to fuse these cues with conventional models of house pricing, and show how the interpretability of linear models allows us to directly extract proxy variables for the visual desirability of neighborhoods that are both of interest in their own right and could be used as inputs to other econometric methods. This is particularly valuable because, once the network has been trained, it can be applied elsewhere, allowing us to generate vivid, dense maps of the visual appeal of London streets.

Market Clearing based Dynamic Multi-Agent Task Allocation

Realistic multi-agent team applications often feature dynamic environments with soft deadlines that penalize late execution of tasks. This puts a premium on quickly allocating tasks to agents, but finding the optimal allocation is NP-hard due to temporal and spatial constraints that require tasks to be executed sequentially by agents. We propose FMC_TA, a novel task allocation algorithm that allows tasks to be easily sequenced to yield high-quality solutions. FMC_TA first finds allocations that are fair (envy-free), balancing the load and sharing important tasks among agents, and efficient (Pareto optimal) in a simplified version of the problem. It computes such allocations in polynomial or pseudo-polynomial time (centrally or distributedly, respectively) using a Fisher market with agents as buyers and tasks as goods. It then heuristically schedules the allocations, taking into account inter-agent constraints on shared tasks. We empirically compare our algorithm to state-of-the-art incomplete methods, both centralized and distributed, on law enforcement problems inspired by real police logs. The results show a clear advantage for FMC_TA both in total utility and in the measures by which law enforcement authorities assess their own performance. The dominance of the proposed algorithm remains consistent as the problem scales.

Graph-based recommendation meets Bayes and similarity measures

Graph-based approaches provide an effective memory-based alternative to latent factor models for collaborative recommendation. Modern approaches rely on either sampling short walks or enumerating short paths starting from the target user in a user-item bipartite graph. While the effectiveness of random walk sampling heavily depends on the underlying path sampling strategy, path enumeration is sensitive to the strategy adopted for scoring each individual path. In this paper, we demonstrate how both strategies can be improved through Bayesian reasoning. In particular, we propose to improve random walk sampling by exploiting distributional aspects of items' ratings on the sampled paths. Likewise, we extend existing path enumeration approaches to leverage categorical ratings and to scale the score of each path proportionally to the affinity of pairs of users and pairs of items on the path. Experiments on several publicly available datasets demonstrate the effectiveness of our proposed approaches compared to state-of-the-art graph-based recommenders.
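
A bare-bones illustration of random walk sampling on a user-item bipartite graph follows; the uniform transition probabilities and toy graph are assumptions, and the rating-aware Bayesian weighting proposed in the paper is omitted.

import random
from collections import Counter

# Toy bipartite graph: users -> items rated, items -> users who rated them.
user_items = {"u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i3", "i4"]}
item_users = {}
for u, items in user_items.items():
    for i in items:
        item_users.setdefault(i, []).append(u)

def recommend(target, n_walks=2000, walk_len=3, seed=0):
    """Score items by how often short user-item-user walks starting at the
    target user end on them."""
    random.seed(seed)
    counts = Counter()
    for _ in range(n_walks):
        user = target
        item = None
        for _ in range(walk_len):
            item = random.choice(user_items[user])
            user = random.choice(item_users[item])
        if item not in user_items[target]:   # do not re-recommend known items
            counts[item] += 1
    return counts.most_common()

print(recommend("u1"))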

Comparison and Modelling of Country-Level Micro-blog User Behaviour and Activity in Cyber-Physical-Social Systems using Weibo and Twitter Data

With the rapid development of social media technologies, cyber-physical-social systems (CPSS) have become a hot topic in many industrial applications. The use of "micro-blogging" services, such as Twitter, has rapidly become an influential way to share information. While recent studies have revealed that understanding and modelling micro-blog user behavior from massive user behavior data in social media is key to the success of many practical CPSS applications, a key challenge in the literature is that the diversity of geographies and cultures strongly affects micro-blog user behavior and activity. The motivation of this paper is to understand the differences and similarities between the behaviors of users from different countries on social networking platforms. To this end, we propose a Country-Level Micro-Blog User (CLMB) behavior and activity model for analyzing micro-blog user behavior across different countries in CPSS applications. The CLMB model considers three important user behavior characteristics: the content of micro-blogs, a user emotion index, and the user relationship network. Based on the CLMB model, the 16 countries with the largest numbers of representative and active users in the sample dataset were selected, and the characteristics of user micro-blog behavior in these 16 countries were analyzed. The experimental results show that in countries with small populations and strong cohesiveness, users pay more attention to the social function of micro-blogging; on the contrary, in countries with large, loose social groups, users use micro-blogging as a news dissemination platform. Further analysis of these countries' micro-blog characterization data shows that users in countries whose social network structure exhibits reciprocity rather than hierarchy use more linguistic elements to express happiness in micro-blogging.

An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation

The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set, however, in the creative track, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of 1.785 was obtained by the best team. This article provides an overview of the challenge, including motivation, task definition, dataset description, and evaluation. We further report and analyze the results obtained by the top performing teams in each track and explore the approaches taken by the winners. We finally summarize our key findings and list the open avenues and possible future directions in the area of automatic playlist continuation.
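
For reference, the R-precision metric reported above can be computed as in the sketch below (track-level only; the challenge's official implementation may differ in details such as partial credit for artist matches).

def r_precision(recommended, withheld):
    """Fraction of the withheld ground-truth tracks found in the top-|withheld|
    recommendations (order within the cut-off does not matter)."""
    relevant = set(withheld)
    top = recommended[:len(relevant)]
    return len(relevant.intersection(top)) / len(relevant)

print(r_precision(["t1", "t5", "t2", "t9"], ["t1", "t2", "t3"]))  # 0.666...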

Strategic Attack & Defense in Security Diffusion Games

Security games model the confrontation between a defender protecting a set of targets and an attacker who tries to capture them. A variant of these games assumes security interdependence between targets, facilitating contagion of an attack. So far only stochastic spread of an attack has been considered. In this work, we introduce a version of security games where the attacker strategically drives the entire spread of attack and where interconnections between nodes affect their susceptibility to be captured. We find that the strategies effective in the settings without contagion or with stochastic contagion are no longer feasible when spread of attack is strategic. While in the former settings it was possible to efficiently find optimal strategies of the attacker, doing so in the latter setting turns out to be an NP-complete problem for an arbitrary network. However, for some simpler network structures, such as cliques, stars, and trees, we show that it is possible to efficiently find optimal strategies of both players. Next, for arbitrary networks, we study and compare the efficiency of various heuristic strategies. As opposed to previous works with no or stochastic contagion, we find that centrality-based defense is often effective when spread of attack is strategic.

PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach

The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework on a massive dataset of soccer-logs consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of player evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes top players from the others. Finally, we explore some applications of PlayeRank, i.e., player search and player versatility, showing its flexibility and efficiency, which make it suitable for use in the design of a scalable platform for soccer analytics.

Energy-efficient Static Task Scheduling on VFI based NoC-HMPSoCs for Intelligent Edge Devices in Cyber-Physical Systems

The interlinked processing units in modern Cyber-Physical Systems (CPS) create a large network of connected embedded computing systems. Network-on-Chip (NoC) based multiprocessor system-on-chip (MPSoC) architecture is becoming a de-facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). The number of processors in the multiprocessor systems of CPS has increased significantly; therefore, Voltage Frequency Island (VFI) design has recently been adopted as an effective energy management mechanism in large-scale multiprocessor chip designs. In this paper, we investigate energy- and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI-based NoC-MPSoCs with DVFS-enabled processors. Unlike existing population-based optimization algorithms, we propose a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARSH-FATI collectively performs task mapping, scheduling, and voltage scaling; consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI-based NoC-MPSoCs. We also developed a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and a gradient-descent-inspired voltage scaling algorithm called Energy Gradient Descent (EGD). We introduce a notion of Energy Gradient (EG) that guides EGD in its search for island voltage settings that minimize the total energy consumption. We conducted experiments on 8 real benchmarks adopted from the Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed the state-of-the-art techniques, achieving an average energy-efficiency improvement of approximately 24% and 30% over CA-TMES-Search and CA-TMES-Quick, respectively.

Efficient and Privacy-preserving Fog-assisted Health Data Sharing Scheme

Pervasive data collected from e-healthcare devices possesses significant medical value through data sharing with professional healthcare service providers. However, health data sharing poses several security issues such as access control and privacy leakage, as well as faces critical challenges to obtain efficient data analysis and services. In this paper, we propose an efficient and privacy-preserving fog-assisted health data sharing (PFHDS) scheme for e-healthcare systems. Specifically, we integrate the fog server to classify the shared data into different categories according to disease risks for efficient health data analysis. Meanwhile, we design an enhanced attribute-based encryption method through combination of a personal access policy on patients and a professional access policy on the fog server for effective medical service provision. Furthermore, we achieve significant encryption consumption reduction for patients by offloading a portion of the computation and storage burden from patients to the fog server. Security discussions show that PFHDS realizes data confidentiality and fine-grained access control with collusion resistance. Performance evaluations demonstrate cost-efficient encryption computation, storage and energy consumption.

Multi-view Fusion with Extreme Learning Machine for Clustering

Unlabeled, multi-view data presents a considerable challenge in many real-world data analysis tasks. However, these data are worth exploring because they often contain complementary information that improves the quality of the analysis results. Clustering with multi-view data is a particularly challenging problem, as revealing the complex data structures between many feature spaces demands discriminative features that are specific to the task; when too few of these features are present, performance suffers. Extreme learning machines (ELMs) are an emerging form of learning model that has shown an outstanding representation ability and superior performance in a range of different learning tasks. Motivated by the promise of this advancement, we have developed a novel multi-view fusion clustering framework based on an ELM, called MVEC. MVEC learns the embeddings from each individual view of the data via the ELM network, then constructs a single unified embedding according to the correlations and dependencies between the embeddings, automatically weighting the contribution of each. This process exposes the underlying clustering structures embedded within multi-view data with a high degree of accuracy. A simple yet efficient solution is also provided to solve the optimization problem within MVEC. Experiments and comparisons on eight diverse benchmarks from different domains confirm MVEC's clustering accuracy.

Lightweight Convolution Neural Networks for Mobile Edge Computing in Transportation Cyber Physical Systems

Cloud computing extends Transportation Cyber-Physical Systems (T-CPS) with enhanced computing and storage capability by offloading computing tasks to remote cloud servers. However, cloud computing cannot fulfill requirements such as low latency and context awareness in T-CPS. The emergence of Mobile Edge Computing (MEC) can overcome these limitations by offloading computing tasks to edge servers in proximity to users, consequently reducing latency and improving context awareness. Although MEC has the potential to improve T-CPS, it is incapable of processing computation-intensive tasks such as deep learning algorithms due to intrinsic storage and computing-capability constraints. Therefore, we design and develop a lightweight deep learning model to support MEC applications in T-CPS. In particular, we put forth a stacked convolutional neural network (CNN) consisting of factorization convolutional layers alternating with compression layers (namely, lightweight CNN-FC). Extensive experimental results show that our proposed lightweight CNN-FC can greatly decrease the number of unnecessary parameters, consequently reducing the model size while maintaining high accuracy, in contrast to conventional CNN models. In addition, we evaluate the performance of our proposed model on a realistic MEC platform. Experimental results on this MEC platform show that our model can maintain high accuracy while preserving a portable model size.
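
One common way to factorize a convolution into a cheap per-channel spatial filter followed by a 1x1 compression layer is the depthwise-separable block sketched below in PyTorch; this is an illustrative assumption, not the paper's exact CNN-FC architecture.

import torch.nn as nn

class FactorizedConvBlock(nn.Module):
    """Depthwise 3x3 convolution (one filter per channel) followed by a
    pointwise 1x1 convolution that compresses the channel dimension."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

block = FactorizedConvBlock(32, 16)
params = sum(p.numel() for p in block.parameters())
print(params)  # far fewer parameters than a standard 3x3 convolution from 32 to 16 channels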

Using Sparse Representation to Detect Anomalies in Complex WSNs

In recent years, wireless sensor networks (WSNs) have become an active area of research for monitoring physical and environmental conditions. Due to the interdependence of sensors, a functional anomaly in one sensor can cause a functional anomaly in another sensor, which can further lead to the malfunctioning of the entire sensor network. Existing research provides ways to analyse faulty sensor anomalies but fails to show their effectiveness throughout the entire interdependent network system. The gap in research on sensor network dependency can be filled by analysing the abnormal nodes of the sensor network. In this paper, a dictionary learning algorithm based on a non-negative constraint is developed, and a sparse representation anomaly node detection method for sensor networks is then proposed based on this dictionary learning. Compared with other anomaly detection approaches, our method is more robust. The detected abnormal nodes are further analysed and compared against four commonly used approaches to verify the robustness of our proposed method. Furthermore, experiments are conducted on the obtained abnormal nodes to prove the interdependence of multi-layer sensor networks and to reveal the conditions and causes of a system crash.
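
To illustrate the reconstruction-error idea behind sparse-representation anomaly detection, the sketch below uses scikit-learn's non-negative matrix factorization as a stand-in for the paper's non-negative dictionary learning algorithm; the synthetic data, model size, and scoring rule are assumptions.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
train = rng.random((200, 20))                  # readings from normally behaving nodes
test = np.vstack([rng.random((10, 20)),        # new normal nodes
                  rng.random((3, 20)) + 4.0])  # three nodes with shifted readings

# Learn a non-negative "dictionary" (components_) from normal data only.
model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
model.fit(train)

# Encode new readings and measure how well the dictionary reconstructs them.
codes = model.transform(test)
errors = np.linalg.norm(test - codes @ model.components_, axis=1)
print(np.argsort(errors)[-3:])  # worst-reconstructed nodes, expected to be the shifted ones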

Estimating and Controlling the False Discovery Rate of the PC Algorithm Using Edge-Specific P-Values

Many causal discovery algorithms infer graphical structure from observational data. The PC algorithm in particular estimates a completed partially directed acyclic graph (CPDAG), or an acyclic graph containing directed edges identifiable with conditional independence testing. However, few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence (CI) tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
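
The Benjamini-Yekutieli step referenced above can be sketched as follows over a generic vector of edge p-values; the paper's contribution of bounding edge-specific p-values from conditional independence tests is not reproduced here.

import numpy as np

def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a boolean mask of rejected hypotheses under the BY procedure,
    which controls the FDR under arbitrary dependence between tests."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))        # harmonic correction factor
    thresholds = np.arange(1, m + 1) * alpha / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest i with p_(i) <= i*alpha/(m*c(m))
        reject[order[: k + 1]] = True
    return reject

edge_p = [0.0004, 0.009, 0.02, 0.31, 0.76]
print(benjamini_yekutieli(edge_p, alpha=0.05))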

Detecting causal relationships in simulation models using intervention-based counterfactual analysis

Central to explanatory simulation models is their capability to show not just that but also why particular things happen. Explanation is closely related to the detection of causal relationships and is, in a simulation context, typically done by means of controlled experiments. However, for complex simulation models, conventional 'black-box' experiments may be too coarse-grained to cope with spurious relationships. We present an intervention-based causal analysis methodology that exploits the manipulability of computational models and detects and circumvents spurious effects. The core of the methodology is a formal model that maps basic causal assumptions to causal observations and allows for the identification of combinations of assumptions that have a negative impact on observability. First experiments indicate that the methodology can successfully deal with notoriously tricky situations involving asymmetric and symmetric overdetermination and detect fine-grained causal relationships between events in the simulation. As illustrated in the paper, the methodology can be easily integrated into an existing simulation environment.

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

Multi-label classification is defined as the problem of identifying the multiple labels or categories of new observations based on labeled training data. Multi-labeled data poses several challenges, including class imbalance, label correlation, incomplete label matrices, and noisy and irrelevant features. In this paper, we propose an integrated multi-label approach (ML-CIB) for simultaneously training the multi-label classification model and addressing the aforementioned challenges. The model learns a new label matrix and captures new label correlations, because it is difficult to find a complete label vector for each instance in real-world data. We also propose a label regularization to handle the class-imbalance issue in the new label matrix, and an L1 regularization norm is incorporated in the objective function to select the relevant sparse features. A multi-label feature selection (ML-CIB-FS) method is presented as a variant of the proposed ML-CIB to show the efficacy of the proposed method in selecting relevant features. ML-CIB is formulated as a constrained objective function, and we use the accelerated proximal gradient method to solve the resulting optimisation problem. Lastly, extensive experiments are conducted on fifteen regular-scale and large-scale imbalanced multi-label datasets. The promising results show that our method significantly outperforms the state-of-the-art.
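
The L1-regularized subproblem solved by a proximal gradient method corresponds to a soft-thresholding proximal operator, sketched here on a toy least-squares problem (plain ISTA; the accelerated variant adds a momentum step, and the ML-CIB objective itself is more involved).

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrinks each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(X, y, lam=0.1, n_iter=500):
    """Plain ISTA for min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.normal(size=50)
print(np.round(proximal_gradient_lasso(X, y), 2))  # sparse estimate close to w_true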

RecRules: Recommending IF-THEN Rules for End-User Development

End users can nowadays personalize their smart devices and web applications by defining or reusing IF-THEN rules through dedicated End-User Development (EUD) tools. Within such tools, however, the spread of new devices and web applications, as well as the growing number of shared rules, results in an overload of information and a difficulty in discovering new rules and functionality. We envision that recommendation techniques could help end users find rules shared by others and suggest new functionality to be adopted during the definition of their own rules. In this paper, we tackle the emerging problem of recommending IF-THEN rules to end users by presenting RecRules, a hybrid and semantic recommendation system. Through a mixed content-based and collaborative approach, the goal of RecRules is to recommend by functionality, i.e., to suggest rules based on their final purposes rather than on details like manufacturers and brands. The algorithm uses a semantic reasoning process to enrich rules with semantic information, with the aim of uncovering hidden connections between rules in terms of shared functionality. Then, it builds a collaborative semantic graph and exploits different types of path-based features to train a learning-to-rank algorithm and compute top-N recommendations. We evaluate RecRules through different experiments on real user data extracted from IFTTT, one of the most popular EUD tools. The results are promising: they show the effectiveness of our approach with respect to other state-of-the-art algorithms and open the way for a new class of recommender systems for EUD based on the actual functionality end users need.

