Measuring Conditional Independence by Independent Residuals for Causal Discovery
The combination of Cyber Physical Systems and Cloud Computing has received tremendous research interest and effort from both academia and industry; it enables a new breed of applications and services and can fundamentally change the way that people interact with the physical world. However, due to the relatively long distance between the remote cloud and sensors, Cloud Computing cannot provide real-time service and fine-grained management for end devices. Meanwhile, untrustworthy nodes may endanger the whole system. In this paper, we apply Intelligent Mobile Edge Computing to solve these problems. We first introduce the Mobile Crowdsourcing-Based Trust Evaluation Mechanism, where mobile edge users apply Artificial Intelligence to evaluate the trustfulness of sensor nodes. We then design two incentive mechanisms, i.e., the Trustworthy Incentive Mechanism and the Quality-Aware Trustworthy Incentive Mechanism. The first aims to impel edge users to upload their real information about their capability and costs. The purpose of the second is to motivate edge users to honestly conduct tasks and report results. Detailed theoretical analysis is performed to certify the effectiveness of the proposed mechanisms, demonstrating the validity of the Quality-Aware Trustworthy Incentive Mechanism in terms of data trustfulness, effort trustfulness and quality trustfulness. Extensive experiments are carried out to validate the proposed mechanisms. The results corroborate that the proposed mechanisms can efficiently stimulate mobile edge users to perform evaluation tasks honestly.
In the aftermath of earthquakes, floods and other disasters, photos are increasingly playing more significant roles, such as finding missing people and assessing disasters, in disaster rescue and recovery efforts. These disaster photos are taken in real time by the crowd, unmanned aerial vehicles and wireless sensors. However, communications equipment is often damaged in disasters, and the very limited communication bandwidth restricts the upload of photos to the cloud center, seriously impeding disaster rescue endeavors. Based on edge computing, we propose Echo, a highly time-efficient disaster rescue framework. By utilizing the computing, storage and communication abilities of edge servers, disaster photos are preprocessed and analyzed in real time, and more specific visuals are immensely helpful for conducting emergency response and rescue. This paper takes the search for missing people as a case study to show that Echo can be more advantageous in terms of disaster rescue. To greatly conserve valuable communication bandwidth, only significantly associated images are extracted and uploaded to the cloud center for subsequent facial recognition. Furthermore, an adaptive photo detector is designed to utilize the precious and unstable communication bandwidth effectively, as well as ensuring the photo detection precision and recall rate. The effectiveness and efficiency of the proposed method are demonstrated by simulation experiments.
Security is a pivotal issue for the development of Cyber Physical Systems (CPS). The trusted computing base of CPS includes the complete set of protection mechanisms, such as hardware, firmware, and software, the combination of which is responsible for enforcing a system security policy. A Trust Detection Based Secured Routing (TDSR) scheme is proposed to establish secured routes from source nodes to the data center in a malicious environment and to achieve a satisfactory security level in CPS. In the TDSR scheme, sensor nodes on the route from the data center send detection routing to identify the trust of relay nodes. Once the trust of each node has been obtained, data packets are routed securely through trustworthy nodes to the sink. Because detection routing is executed in nodes that have abundant energy, the network lifetime is not affected. Performance evaluation through simulation is carried out for success routing ratio, compromised node detection ratio, and detection routing overhead. We found that performance can be improved in TDSR compared to previous schemes.
Forecasting the price trend of bulk commodities is important in international trade, not only for market participants to schedule production and marketing plans, but also for government administrators to adjust policies. Previous studies cannot support accurate fine-grained short-term prediction, since they mainly focus on coarse-grained long-term prediction using historical data. Recently, cross-domain open data has made fine-grained price forecasting possible, since it can be leveraged to extract various direct and indirect factors of the price. In this paper, we predict the price trend over upcoming days by leveraging cross-domain open data fusion. More specifically, we formulate the price trend into three classes (rise, slight-change and fall), and then predict the specific class in which the price trend of the future day lies. We take three factors into consideration: (1) a supply factor considering the sources providing bulk commodities, (2) a demand factor focusing on vessel transportation, which reflects short-term needs, and (3) an expectation factor encompassing indirect features (e.g., air quality) with latent influences. A hybrid classification framework is proposed for the price trend forecasting. Evaluation conducted on nine real-world cross-domain open datasets shows that our framework can forecast the price trend accurately, outperforming multiple state-of-the-art baselines.
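As an illustration of the three-class formulation above, the following minimal Python sketch turns a daily price series into rise, slight-change and fall labels. The function name `label_trend` and the 0.5% relative-change threshold are hypothetical choices made for this example; the abstract does not state how the class boundaries are defined.

```python
from typing import List


def label_trend(prices: List[float], threshold: float = 0.005) -> List[str]:
    """Label each day-over-day move as 'rise', 'slight-change' or 'fall'.

    `threshold` is an assumed relative-change cutoff (0.5% here); it is not
    taken from the paper.
    """
    labels = []
    for prev, curr in zip(prices, prices[1:]):
        change = (curr - prev) / prev  # relative change from one day to the next
        if change > threshold:
            labels.append("rise")
        elif change < -threshold:
            labels.append("fall")
        else:
            labels.append("slight-change")
    return labels


if __name__ == "__main__":
    print(label_trend([100.0, 102.0, 102.1, 100.0]))
```

Labels of this kind would serve as targets for a classifier; the features (supply, demand and expectation factors) would come from the cross-domain datasets described in the abstract.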
The emergence of smart meters has fostered the collection of massive data that support a better understanding of consumer behaviors and better management of water resources and networks. The main focus of this paper is to analyze consumption behavior over time; thus, we first identify the main weekly consumption patterns. This approach allows each meter to be represented by a categorical series, where each category corresponds to a weekly consumption behavior. By considering the resulting consumption behavior sequences, we propose a new methodology based on a mixture of nonhomogeneous Markov models to cluster these categorical time series. Using this method, the meters are described by the Markovian dynamics of their cluster. The latent variable that controls cluster membership is estimated alongside the parameters of the Markov model using a novel classification expectation maximization (CEM) algorithm. A specific entropy measure is formulated to evaluate the quality of the estimated partition by considering the joint Markovian dynamics. The proposed clustering model can also be used to predict future consumption behaviors within each cluster. Numerical experiments using real water consumption data provided by a water utility in France and gathered over nineteen months are conducted to evaluate the performance of the proposed approach in terms of both clustering and prediction. The results demonstrate the effectiveness of the proposed method.
Many real-world human behaviors can be modeled and characterized as sequential decision making processes, such as taxi drivers' choices of working regions and times. Each driver possesses unique preferences on the sequential choices over time and improves their working efficiency. Understanding the dynamics of such preferences helps accelerate the learning process of taxi drivers. Prior works on taxi operation management mostly focus on finding optimal driving strategies or routes, lacking in-depth analysis of what the drivers learned during the process and how it affects the performance of the driver. In this work, we make the first attempt to establish Dynamic Human Preference Analytics (DHPA). We inversely learn the taxi drivers' preferences from data and characterize the dynamics of such preferences over time. We extract two types of features, i.e., profile features and habit features, to model the decision space of drivers. Then, through inverse reinforcement learning, we learn the preferences of drivers with respect to these features. The results illustrate that self-improving drivers tend to keep adjusting their preferences to habit features to increase their earning efficiency, while keeping the preferences to profile features invariant. On the other hand, experienced drivers have stable preferences over time. The exploring drivers tend to randomly adjust the preferences over time.
Transfer learning aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Since the source and the target domains usually follow different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions. However, in real applications, the marginal and conditional distributions usually contribute differently to the domain discrepancy. Existing methods fail to quantitatively evaluate the relative importance of these two distributions, which results in unsatisfactory transfer performance. In this paper, we propose a novel concept called Dynamic Distribution Adaptation (DDA), which is capable of quantitatively evaluating the relative importance of each distribution. DDA can be easily incorporated into the framework of structural risk minimization to solve transfer learning problems. On the basis of DDA, we propose two novel learning algorithms: (1) Manifold Dynamic Distribution Adaptation (MDDA) for traditional transfer learning, and (2) Dynamic Distribution Adaptation Network (DDAN) for deep transfer learning. Extensive experiments demonstrate that MDDA and DDAN significantly improve the transfer learning performance and set up a strong baseline over the latest deep and adversarial methods on digit recognition, sentiment analysis, and image classification. More importantly, it is shown that marginal and conditional distributions contribute differently to the domain divergence, and that DDA provides a good quantitative evaluation of their relative importance, which leads to better performance. We believe this observation can be helpful for future research in transfer learning.
Introduction to the ACM TIST Special Issue on Advances in Causal Discovery and Inference
The rich cloud services in the Internet of Things create certain needs for edge computing, in which devices should be capable enough to handle storage tasks securely, reliably, and efficiently. When processing the storage requests from edge devices, a cloud server is supposed to eliminate duplicate copies of repeated data to reduce the amount of storage space and save bandwidth. However, to protect data confidentiality while supporting such deduplication in edge computing, we need to tackle two main challenges when encrypting data before uploading: (i) the fingerprint of the encrypted data should be non-deterministic to prevent brute-force attacks; and (ii) the encryption key should be efficiently updatable by power-constrained devices when key leakage occurs on some edge nodes. In this paper, we propose a deduplication system which provides active key update in a practical manner. We introduce random keys into convergent encryption while retaining deduplication over encrypted files across users, and then design a novel approach to transform a ciphertext encrypted with a revoked key into a ciphertext encrypted with a new key without on-line interactions between devices. The security analysis is given in terms of the proposed security model. The experimental analysis shows that the scheme is also practical.
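For background on the deduplication abstract above, the Python sketch below shows plain convergent encryption, in which the key is derived from the content itself. It is not the proposed scheme: it illustrates why identical files yield identical ciphertext fingerprints (which enables deduplication) and why that determinism is the brute-force weakness the abstract sets out to remove. All names (`derive_key`, `toy_encrypt`, `upload`) are hypothetical, and the hash-based XOR keystream is a toy stand-in for a real cipher and must not be used for actual protection.

```python
import hashlib
from typing import Dict


def derive_key(data: bytes) -> bytes:
    """Convergent key: a hash of the plaintext itself."""
    return hashlib.sha256(data).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256 in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def fingerprint(ciphertext: bytes) -> str:
    """Deduplication tag computed over the ciphertext."""
    return hashlib.sha256(ciphertext).hexdigest()


def upload(store: Dict[str, bytes], data: bytes) -> str:
    """Encrypt, then store the ciphertext only if its tag is new."""
    ciphertext = toy_encrypt(derive_key(data), data)
    tag = fingerprint(ciphertext)
    store.setdefault(tag, ciphertext)  # a duplicate upload adds nothing
    return tag


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    first = upload(store, b"sensor reading 42")
    second = upload(store, b"sensor reading 42")
    print(first == second, len(store))
```

Because the tag depends only on the plaintext, anyone who can guess a candidate file can confirm the guess by recomputing the tag. This is the reason the abstract asks for a non-deterministic fingerprint.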
The discovery of the Markov blanket (MB) for feature selection has attracted much attention in recent years, since the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, instead of both. In this paper, we propose a novel MB discovery algorithm for balancing efficiency and accuracy, called BAMB (BAlanced Markov Blanket discovery). To achieve this goal, given a class attribute of interest, BAMB finds candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and spouse set to remove false positives from the current MB set. This keeps the set of PC and spouses of the target as small as possible, and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we use 10 real-world datasets and compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. On prediction accuracy, BAMB outperforms all 12 compared feature selection algorithms. On computational efficiency in MB discovery, BAMB is close to the IAMB algorithm while being much faster than the remaining seven MB discovery algorithms.
In the current era of Industry 4.0, people are facing data-rich manufacturing environments. Visual analytics, as an important technology for explaining and understanding complex data, has been increasingly introduced into industrial data analysis scenarios. Taking the durability test of automotive starters as background, this paper proposes a visual analysis approach for understanding large-scale and long-term starter durability test data. Guided by detailed scenario and requirement analyses, we first propose a migration-adapted DBSCAN algorithm to identify starting modes and abnormal tests. This algorithm adopts a segmentation strategy and a group of matching and updating operations to achieve an efficient and accurate clustering analysis of the data. Next, we design and implement a visual analysis system that provides a set of user-friendly visual designs and lightweight interactions to help people gain data insights on test process overview, test data patterns and durability performance dynamics. Finally, we conduct a quantitative algorithm evaluation, a case study and a user interview using real-world starter durability test datasets. The results demonstrate the effectiveness of the approach and its possible inspiration for the durability test data analysis of other similar industrial products.
Given a path in a spatial or temporal framework, we aim to find all contiguous sub-paths that are both interesting (e.g., abrupt changes) and statistically significant (i.e., persistent trends rather than local fluctuations). Discovering interesting sub-paths can provide meaningful information for a variety of domains, including Earth science, environmental science and urban planning. Existing methods are limited to detecting individual points of interest along an input path but cannot find interesting sub-paths. Our preliminary work provided a Sub-path Enumeration and Pruning (SEP) algorithm to detect interesting sub-paths of arbitrary length. However, SEP is not effective in avoiding sub-paths that are random variations rather than meaningful trends, which hampers clear and proper interpretation of the results. In this paper, we extend our previous work by proposing a statistical significance test framework to eliminate these random variations. To compute the statistical significance, we first show a baseline Monte-Carlo method based on our previous work and then propose a Dynamic Search-and-Prune (D-SAP) algorithm to improve its computational efficiency. Our experiments show that the significance testing can greatly suppress the noisy detections in the output and that D-SAP can greatly reduce the execution time.
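The idea of testing a candidate sub-path for significance by Monte-Carlo simulation can be sketched generically as follows. This is not the paper's baseline method or D-SAP: the interest score (absolute change between the sub-path endpoints), the null model (randomly shuffling the step-to-step differences of the path), and the names `subpath_score` and `monte_carlo_p_value` are all assumptions made for illustration.

```python
import random
from typing import Sequence


def subpath_score(values: Sequence[float], start: int, end: int) -> float:
    """Assumed interest score: absolute change between the two endpoints."""
    return abs(values[end] - values[start])


def monte_carlo_p_value(values: Sequence[float], start: int, end: int,
                        trials: int = 999, seed: int = 0) -> float:
    """Estimate how often a shuffled path yields an equally large change
    over a window of the same position and length."""
    rng = random.Random(seed)
    observed = subpath_score(values, start, end)
    # Step-to-step differences; the sub-path change is the sum of a slice of them.
    diffs = [b - a for a, b in zip(values, values[1:])]
    hits = 0
    for _ in range(trials):
        shuffled = diffs[:]
        rng.shuffle(shuffled)  # null model: the same steps in random order
        if abs(sum(shuffled[start:end])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (1 + hits) / (1 + trials)


if __name__ == "__main__":
    path = [0] * 20 + [1, 2, 3, 4, 5] + [5] * 20
    print(monte_carlo_p_value(path, 19, 24))
```

A sub-path would then be reported only if its estimated p-value falls below a chosen significance level. Evaluating every candidate this way is expensive, which is the cost the abstract says D-SAP is designed to reduce.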
In this paper, we propose a novel representation learning framework, namely TRajectory EMBedding via Road networks (Trembr), to learn trajectory embeddings (low-dimensional feature vectors) for use in a variety of trajectory applications. The novelty of Trembr lies in 1) the design of a recurrent neural network (RNN) based encoder-decoder model, namely Traj2Vec, that encodes spatial and temporal properties inherent in trajectories into trajectory embeddings, while exploiting the underlying road networks to constrain the learning process, and 2) the design of a neural network based model, namely Road2Vec, to learn road segment embeddings in road networks that capture various relationships amongst road segments in preparation for trajectory representation learning. In addition to model design, several unique technical issues arising in Trembr, including data preparation in Road2Vec, the road segment relevance-aware loss and the network topology constraint in Traj2Vec, are examined. To validate our ideas, we learn trajectory embeddings using multiple large-scale real-world trajectory datasets, and use them in three tasks: trajectory similarity measure, travel time prediction and destination prediction. Empirical results show that Trembr soundly outperforms the state-of-the-art trajectory representation learning models, trajectory2vec and t2vec, by at least one order of magnitude in terms of mean rank in trajectory similarity measure, by 23.3% to 41.7% in mean absolute error (MAE) in travel time prediction, and by 39.6% to 52.4% in MAE in destination prediction.
Realistic multi-agent team applications often feature dynamic environments with soft deadlines that penalize late execution of tasks. This puts a premium on quickly allocating tasks to agents, but finding the optimal allocation is NP-hard due to temporal and spatial constraints that require tasks to be executed sequentially by agents. We propose FMC_TA, a novel task allocation algorithm that allows tasks to be easily sequenced to yield high-quality solutions. FMC_TA first finds allocations that are fair (envy-free), balancing the load and sharing important tasks among agents, and efficient (Pareto optimal) in a simplified version of the problem. It computes such allocations in polynomial or pseudo-polynomial time (centrally or distributedly, respectively) using a Fisher market with agents as buyers and tasks as goods. It then heuristically schedules the allocations, taking into account inter-agent constraints on shared tasks. We empirically compare our algorithm to state-of-the-art incomplete methods, both centralized and distributed, on law enforcement problems inspired by real police logs. The results show a clear advantage for FMC_TA in total utility and in measures in which law enforcement authorities measure their own performance. The domination of the proposed algorithm is consistent when the problem scales.
Graph-based approaches provide an effective memory-based alternative to latent factor models for collaborative recommendation. Modern approaches rely on either sampling short walks or enumerating short paths starting from the target user in a user-item bipartite graph. While the effectiveness of random walk sampling heavily depends on the underlying path sampling strategy, path enumeration is sensitive to the strategy adopted for scoring each individual path. In this paper, we demonstrate how both strategies can be improved through Bayesian reasoning. In particular, we propose to improve random walk sampling by exploiting distributional aspects of items' ratings on the sampled paths. Likewise, we extend existing path enumeration approaches to leverage categorical ratings and to scale the score of each path proportionally to the affinity of pairs of users and pairs of items on the path. Experiments on several publicly available datasets demonstrate the effectiveness of our proposed approaches compared to state-of-the-art graph-based recommenders.
With the rapid development of social media technologies, the cyber-physical-social system (CPSS) has become a hot topic in many industrial applications. The use of "micro-blogging" services, such as Twitter, has rapidly become an influential way to share information. While recent studies have revealed that understanding and modelling micro-blog user behavior from massive user behavior data in social media in CPSS is key to the success of many practical applications, a key challenge in the literature is that geographic and cultural diversity strongly affects micro-blog user behavior and activity. The motivation of this paper is to understand the differences and similarities between the behaviors of users from different countries on social networking platforms, and to build a Country-Level Micro-Blog User (CLMB) behavior and activity model for CPSS applications. We propose the CLMB model for analyzing micro-blogging users' behavior across different countries in CPSS applications. The model considers three important user behavior characteristics: micro-blog content, user emotion index, and user relationship network. Based on the CLMB model and the sample dataset, the 16 countries with the largest number of representative and active users in the world were selected, and the characteristics of user micro-blog behavior in these 16 countries were analyzed. The experimental results show that in countries with small populations and strong cohesiveness, users pay more attention to the social function of micro-blogging; in contrast, in countries with large, loose social groups, users use micro-blogging as a news dissemination platform. Further analysis of the micro-blog characterization data of these countries shows that users in countries whose social network structure exhibits reciprocity rather than hierarchy use more linguistic elements to express happiness in micro-blogging.
The facility relocation (FR) problem, which aims to optimize the placement of facilities to accommodate changes in users' locations, has a broad spectrum of applications. Despite the significant progress made by existing solutions to the FR problem, they all assume each user is stationary and represented as a single point. Unfortunately, in reality, objects (e.g., people, animals) are mobile. Consequently, these efforts may fail to identify superior solutions to the FR problem. In this paper, for the first time, we take into account the movement history of users and introduce a novel FR problem, called MOTION-FR, to address the above limitation. Specifically, we present a framework called FROST to address it. FROST comprises two exact algorithms, index-based and index-free. The former is designed to address the scenario when facilities and objects are known a priori, whereas the latter solves the MOTION-FR problem by jettisoning this assumption. Further, we extend the index-based algorithm to solve the general k-MOTION-FR problem, which aims to relocate k inferior facilities. We devise an approximate solution due to the NP-hardness of the problem. An experimental study over both real-world and synthetic datasets demonstrates the superiority of our framework in comparison to state-of-the-art FR techniques in efficiency and effectiveness.
The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., the main and creative tracks. Participants in the main track were only allowed to use the provided training set; in the creative track, however, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average recommended songs clicks value of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a clicks value of 1.785 were obtained by the best team. This article provides an overview of the challenge, including motivation, task definition, dataset description, and evaluation. We further report and analyze the results obtained by the top performing teams in each track and explore the approaches taken by the winners. We finally summarize our key findings and list the open avenues and possible future directions in the area of automatic playlist continuation.
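To make the three reported metrics concrete, here is a minimal Python sketch of track-level R-precision, NDCG with binary relevance, and a clicks measure that counts pages of ten recommendations before the first relevant track. These are common textbook formulations and may differ in detail from the challenge's official definitions. The page size of 10 and the value 51 returned when no relevant track is found are assumptions, as are the function names, and the recommendation list is assumed to contain no duplicates.

```python
import math
from typing import List, Set


def r_precision(recommended: List[str], relevant: Set[str]) -> float:
    """Fraction of the top-R recommendations that are relevant, R = |relevant|."""
    if not relevant:
        return 0.0
    top = recommended[:len(relevant)]
    return len(set(top) & relevant) / len(relevant)


def ndcg(recommended: List[str], relevant: Set[str]) -> float:
    """Binary-relevance NDCG with the common 1 / log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, track in enumerate(recommended, start=1)
              if track in relevant)
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def clicks(recommended: List[str], relevant: Set[str],
           page_size: int = 10, miss_value: int = 51) -> int:
    """Number of full pages skipped before the first relevant track appears."""
    for index, track in enumerate(recommended):
        if track in relevant:
            return index // page_size
    return miss_value  # assumed penalty when nothing relevant is recommended


if __name__ == "__main__":
    recs = ["a", "b", "c", "d"]
    truth = {"a", "c"}
    print(r_precision(recs, truth), ndcg(recs, truth), clicks(recs, truth))
```

Higher is better for R-precision and NDCG, while lower is better for clicks, which is why the two winning teams' clicks values of about 1.78 sit alongside NDCG values near 0.39.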
Security games model the confrontation between a defender protecting a set of targets and an attacker who tries to capture them. A variant of these games assumes security interdependence between targets, facilitating contagion of an attack. So far only stochastic spread of an attack has been considered. In this work, we introduce a version of security games where the attacker strategically drives the entire spread of attack and where interconnections between nodes affect their susceptibility to be captured. We find that the strategies effective in the settings without contagion or with stochastic contagion are no longer feasible when spread of attack is strategic. While in the former settings it was possible to efficiently find optimal strategies of the attacker, doing so in the latter setting turns out to be an NP-complete problem for an arbitrary network. However, for some simpler network structures, such as cliques, stars, and trees, we show that it is possible to efficiently find optimal strategies of both players. Next, for arbitrary networks, we study and compare the efficiency of various heuristic strategies. As opposed to previous works with no or stochastic contagion, we find that centrality-based defense is often effective when spread of attack is strategic.
The interlinked processing units in modern Cyber-Physical Systems (CPS) create a large network of connected embedded computing systems. The Network-on-Chip (NoC) based multiprocessor system-on-chip (MPSoC) architecture is becoming a de facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). The number of processors in the multiprocessor systems in CPS has increased significantly; therefore, Voltage Frequency Islands (VFI) have recently been adopted as an effective energy management mechanism in large-scale multiprocessor chip designs. In this paper, we investigate energy and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI based NoC-MPSoCs with DVFS-enabled processors. Unlike the existing population-based optimization algorithms, we propose a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARSH-FATI collectively performs task mapping, scheduling, and voltage scaling. Consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI based NoC-MPSoCs. We also developed a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and a gradient descent inspired voltage scaling algorithm called Energy Gradient Descent (EGD). We have introduced a notion of Energy Gradient (EG) that guides EGD in its search for island voltage settings that minimize the total energy consumption. We conducted the experiments on 8 real benchmarks adopted from the Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed the state-of-the-art technique and achieved an average energy-efficiency of approximately 24% and 30% over CA-TMES-Search and CA-TMES-Quick, respectively.
Pervasive data collected from e-healthcare devices possesses significant medical value through data sharing with professional healthcare service providers. However, health data sharing poses several security issues, such as access control and privacy leakage, and faces critical challenges in achieving efficient data analysis and services. In this paper, we propose an efficient and privacy-preserving fog-assisted health data sharing (PFHDS) scheme for e-healthcare systems. Specifically, we integrate the fog server to classify the shared data into different categories according to disease risks for efficient health data analysis. Meanwhile, we design an enhanced attribute-based encryption method that combines a personal access policy on patients with a professional access policy on the fog server for effective medical service provision. Furthermore, we achieve a significant reduction in encryption consumption for patients by offloading a portion of the computation and storage burden from patients to the fog server. Security discussions show that PFHDS realizes data confidentiality and fine-grained access control with collusion resistance. Performance evaluations demonstrate cost-efficient encryption computation, storage and energy consumption.
Unlabeled, multi-view data presents a considerable challenge in many real-world data analysis tasks. However, these data are worth exploring because they often contain complementary information that improves the quality of the analysis results. Clustering with multi-view data is a particularly challenging problem as revealing the complex data structures between many feature spaces demands discriminative features that are specific to the task and, when too few of these features are present, performance suffers. Extreme learning machines (ELMs) are an emerging form of learning model that have shown an outstanding representation ability and superior performance in a range of different learning tasks. Motivated by the promise of this advancement, we have developed a novel multi-view fusion clustering framework based on an ELM, called MVEC. MVEC learns the embeddings from each individual view of the data via the ELM network, then constructs a single unified embedding according to the correlations and dependencies between each embedding and automatically weighting the contribution of each. This process exposes the underlying clustering structures embedded within multi-view data with a high degree of accuracy. A simple yet efficient solution is also provided to solve the optimization problem within MVEC. Experiments and comparisons on eight diverse benchmarks from different domains confirm MVEC's clustering accuracy.
Cloud computing extends Transportation Cyber-Physical Systems (T-CPS) with enhanced computing and storage capability by offloading computing tasks to remote cloud servers. However, cloud computing cannot fulfill requirements such as low latency and context awareness in T-CPS. Mobile Edge Computing (MEC) can overcome these limitations of cloud computing by offloading computing tasks to edge servers in proximity to users, thereby reducing latency and improving context awareness. Although MEC has the potential to improve T-CPS, it is incapable of processing computation-intensive tasks such as deep learning algorithms due to its intrinsic storage and computing-capability constraints. Therefore, we design and develop a lightweight deep learning model to support MEC applications in T-CPS. In particular, we put forth a stacked convolutional neural network (CNN) consisting of factorization convolutional layers alternating with compression layers (namely lightweight CNN-FC). Extensive experimental results show that, in contrast to conventional CNN models, our proposed lightweight CNN-FC can greatly decrease the number of unnecessary parameters, thereby reducing the model size while maintaining high accuracy. In addition, we evaluate the performance of our proposed model on a realistic MEC platform. Specifically, experimental results on this MEC platform show that our model can maintain high accuracy while preserving a portable model size.
In recent years, wireless sensor networks (WSNs) have become an active area of research for monitoring physical and environmental conditions. Due to the interdependence of sensors, a functional anomaly in one sensor can cause a functional anomaly in another sensor, which can further lead to the malfunctioning of the entire sensor network. Existing research can analyse anomalies in faulty sensors, but fails to demonstrate effectiveness across the entire interdependent network system. The gap in research on sensor network dependency can be filled by studying the abnormal nodes of the sensor network. In this paper, a dictionary learning algorithm based on a non-negative constraint is developed, and a sparse representation anomaly node detection method for sensor networks is then proposed based on the dictionary learning. Compared with other anomaly detection approaches, our method is more robust. The method is compared with four commonly used approaches on the abnormal nodes to verify its robustness. Furthermore, experiments are conducted on the obtained abnormal nodes to demonstrate the interdependence of multi-layer sensor networks and reveal the conditions and causes of a system crash.
Many causal discovery algorithms infer graphical structure from observational data. The PC algorithm in particular estimates a completed partially directed acyclic graph (CPDAG), or an acyclic graph containing directed edges identifiable with conditional independence testing. However, few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence (CI) tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
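The Benjamini-Yekutieli step that the PC-p abstract relies on can be sketched in a few lines of Python. This shows only the standard FDR procedure applied to a list of p-values; how PC-p derives its edge-specific upper-bounded p-values is not reproduced here, and the function name and example values are hypothetical.

```python
from typing import List


def benjamini_yekutieli(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether its hypothesis is rejected at FDR level q.

    Standard step-up procedure: with m tests and c(m) = sum_{i=1..m} 1/i,
    find the largest rank k such that p_(k) <= k * q / (m * c(m)) and reject
    the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical edge p-values; rejection means the edge is kept as a discovery.
    print(benjamini_yekutieli([0.001, 0.2, 0.004, 0.9], q=0.05))
```

The harmonic factor c(m) is what makes the procedure valid under arbitrary dependence between tests, a reasonable fit for edge-level tests in a graph, which are unlikely to be independent of one another.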
Central to explanatory simulation models is their capability to not just show that but also why particular things happen. Explanation is closely related with the detection of causal relationships and is, in a simulation context, typically done by means of controlled experiments. However, for complex simulation models, conventional 'blackbox' experiments may be too coarse-grained to cope with spurious relationships. We present an intervention-based causal analysis methodology that exploits the manipulability of computational models and detects and circumvents spurious effects. The core of the methodology is a formal model that maps basic causal assumptions to causal observations and allows for the identification of combinations of assumptions that have a negative impact on observability. First experiments indicate that the methodology can successfully deal with notoriously tricky situations involving asymmetric and symmetric overdetermination and detect fine-grained causal relationships between events in the simulation. As illustrated in the paper, the methodology can be easily integrated into an existing simulation environment.