Total: 36

We introduce normalized nonnegative models (NNM) for explorative data analysis. NNMs are partial convexifications of models from probability theory. We demonstrate their value at the example of item recommendation. We show that NNM-based recommender systems satisfy three criteria that all recommender systems should ideally satisfy: high predictive power, computational tractability, and expressive representations of users and items. Expressive user and item representations are important in practice to succinctly summarize the pool of customers and the pool of items. In NNMs, user representations are expressive because each user's preference can be regarded as normalized mixture of preferences of stereotypical users. The interpretability of item and user representations allow us to arrange properties of items (e.g., genres of movies or topics of documents) or users (e.g., personality traits) hierarchically.

Classical inconsistency-tolerant query answering relies on selecting maximal components of an ABox/database which are consistent with the ontology. However, some rules in ontologies might be unreliable if they are extracted from ontology learning or written by unskillful knowledge engineers. In this paper we present a framework of handling inconsistent existential rules under stable model semantics, which is defined by a notion called rule repairs to select maximal components of the existential rules. Surprisingly, for R-acyclic existential rules with R-stratified or guarded existential rules with stratified negations, both the data complexity and combined complexity of query answering under the rule repair semantics remain the same as that under the conventional query answering semantics. This leads us to propose several approaches to handle the rule repair semantics by calling answer set programming solvers. An experimental evaluation shows that these approaches have good scalability of query answering under rule repairs on realistic cases.

Utilizing trajectories for modeling human mobility often involves extracting descriptive features for each individual, a procedure heavily based on experts' knowledge. In this work, our objective is to minimize human involvement and exploit the power of community in learning `features' for individuals from their location traces. We propose a probabilistic graphical model that learns distribution of latent concepts, named motifs, from anonymized sequences of user locations. To handle variation in user activity level, our model learns motif distributions from sequence-level location co-occurrence of all users. To handle the big variation in location popularity, our model uses an asymmetric prior, conditioned on per-sequence features. We evaluate the new representation in a link prediction task and compare our results to those of baseline approaches.

We investigate the problem of learning description logic (DL) ontologies in Angluin et al.’s framework of exact learning via queries posed to an oracle. We consider membership queries of the form “is a tuple a of individuals a certain answer to a data retrieval query q in a given ABox and the unknown target ontology?” and completeness queries of the form “does a hypothesis ontology entail the unknown target ontology?” Given a DL L and a data retrieval query language Q, we study polynomial learnability of ontologies in L using data retrieval queries in Q and provide an almost complete classification for DLs that are fragments of EL with role inclusions and of DL-Lite and for data retrieval queries that range from atomic queries and EL/ELI-instance queries to conjunctive queries. Some results are proved by non-trivial reductions to learning from subsumption examples.

One of the key uses of causes is to explain why things happen. Explanations of specific events, like an individual's heart attack on Monday afternoon or a particular car accident, help assign responsibility and inform our future decisions. Computational methods for causal inference make use of the vast amounts of data collected by individuals to better understand their behavior and improve their health. However, most methods for explanation of specific events have provided theoretical approaches with limited applicability. In contrast we make two main contributions: an algorithm for explanation that calculates the strength of token causes, and an evaluation based on simulated data that enables objective comparison against prior methods and ground truth. We show that the approach finds the correct relationships in classic test cases (causal chains, common cause, and backup causation) and in a realistic scenario (explaining hyperglycemic episodes in a simulation of type 1 diabetes).

We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work including Trans(E, H, R, and D) ignore the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs. In this paper, we propose a novel approach TranSparse to deal with the two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse outperforms Trans(E, H, R, and D) significantly, and achieves state-of-the-art performance.

Strategic argumentation provides a simple model of disputation. We investigate it in the context of Dung's abstract argumentation. We show that strategic argumentation under the grounded semantics is resistant tocorruption -- specifically, collusion and espionage — in a sense similar to Bartholdi et al's notion of a voting scheme resistant to manipulation. Under the stable semantics, strategic argumentation is resistant to espionage, but its resistance to collusion varies according to the aims of the disputants. These results are extended to a variety of concrete languages for argumentation.

Knowledge graph embedding aims to represent entities and relations in a large-scale knowledge graph as elements in a continuous vector space. Existing methods, e.g., TransE and TransH, learn embedding representation by defining a global margin-based loss function over the data. However, the optimal loss function is determined during experiments whose parameters are examined among a closed set of candidates. Moreover, embeddings over two knowledge graphs with different entities and relations share the same set of candidate loss functions, ignoring the locality of both graphs. This leads to the limited performance of embedding related applications. In this paper, we propose a locally adaptive translation method for knowledge graph embedding, called TransA, to find the optimal loss function by adaptively determining its margin over different knowledge graphs. Experiments on two benchmark data sets demonstrate the superiority of the proposed method, as compared to the-state-of-the-art ones.

Several inconsistency-tolerant semantics have been introduced for querying inconsistent description logic knowledge bases. This paper addresses the problem of explaining why a tuple is a (non-)answer to a query under such semantics. We define explanations for positive and negative answers under the brave, AR and IAR semantics. We then study the computational properties of explanations in the lightweight description logic DL-Lite_R. For each type of explanation, we analyze the data complexity of recognizing (preferred) explanations and deciding if a given assertion is relevant or necessary. We establish tight connections between intractable explanation problems and variants of propositional satisfiability (SAT), enabling us to generate explanations by exploiting solvers for Boolean satisfaction and optimization problems. Finally, we empirically study the efficiency of our explanation framework using the well-established LUBM benchmark.

Most description logics (DL) query languages allow instance retrieval from an ABox. However, SPARQL is a schema query language allowing access to the TBox (in addition to the ABox). Moreover, its entailment regimes enable to take into account knowledge inferred from knowledge bases in the query answering process. This provides a new perspective for the containment problem. In this paper, we study the containment of SPARQL queries over OWL EL axioms under entailment. OWL EL is the language used by many large scale ontologies and is based on EL++. The main contribution is a novel approach to rewriting queries using SPARQL property paths and the μ-calculus in order to reduce containment test under entailment into validity check in the μ-calculus.

Timed Failure Propagation Graphs (TFPGs) are used in the design of safety-critical systems as a way of modeling failure propagation, and to evaluate and implement diagnostic systems. TFPGs are a very rich formalism: they allow to model Boolean combinations of faults and events, also dependent on the operational modes of the system and quantitative delays between them. TFPGs are often produced manually, from a given dynamic system of greater complexity, as abstract representations of the system behavior under specific faulty conditions. In this paper we tackle two key difficulties in this process: first, how to make sure that no important behavior of the system is overlooked in the TFPG, and that no spurious, non-existent behavior is introduced; second, how to devise the correct values for the delays between events. We propose a model checking approach to automatically validate the completeness and tightness of a TFPG for a given infinite-state dynamic system, and a procedure for the automated synthesis of the delay parameters. The proposed approach is evaluated on a number of synthetic and industrial benchmarks.

Qualitative spatio-temporal reasoning is an active research area in Artificial Intelligence. In many situations there is a need to reason about intertemporal qualitative spatial relations, i.e. qualitative relations between spatial regions at different time-points. However, these relations can never be explicitly observed since they are between regions at different time-points. In applications where the qualitative spatial relations are partly acquired by for example a robotic system it is therefore necessary to infer these relations. This problem has, to the best of our knowledge, not been explicitly studied before. The contribution presented in this paper is two-fold. First, we present a spatio-temporal logic MSTL, which allows for spatio-temporal stream reasoning. Second, we define the concept of a landmark as a region that does not change between time-points and use these landmarks to infer qualitative spatio-temporal relations between non-landmark regions at different time-points. The qualitative spatial reasoning is done in RCC-8, but the approach is general and can be applied to any similar qualitative spatial formalism.

Boolean functions in Answer Set Programming have proven a useful modelling tool. They are usually specified by means of aggregates or external atoms. A crucial step in computing answer sets for logic programs containing Boolean functions is verifying whether partial interpretations satisfy a Boolean function for all possible values of its undefined atoms. In this paper, we develop a new methodology for showing when such checks can be done in deterministic polynomial time. This provides a unifying view on all currently known polynomial-time decidability results, and furthermore identifies promising new classes that go well beyond the state of the art. Our main technique consists of using an ordering on the atoms to significantly reduce the necessary number of model checks. For many standard aggregates, we show how this ordering can be automatically obtained.

One of the better studied properties for operators in judgment aggregation is independence, which essentially dictates that the collective judgment on one issue should not depend on the individual judgments given on some other issue(s) in the same agenda. Independence, although considered a desirable property, is too strong, because together with mild additional conditions it implies dictatorship. We propose here a weakening of independence, named agenda separability: a judgment aggregation rule satisfies it if, whenever the agenda is composed of several independent sub-agendas, the resulting collective judgment sets can be computed separately for each sub-agenda and then put together. We show that this property is discriminant, in the sense that among judgment aggregation rules so far studied in the literature, some satisfy it and some do not. We briefly discuss the implications of agenda separability on the computation of judgment aggregation rules.

Hashing techniques are powerful for approximate nearest neighbour (ANN) search.Existing quantization methods in hashing are all focused on scalar quantization (SQ) which is inferior in utilizing the inherent data distribution.In this paper, we propose a novel vector quantization (VQ) method named affinity preserving quantization (APQ) to improve the quantization quality of projection values, which has significantly boosted the performance of state-of-the-art hashing techniques.In particular, our method incorporates the neighbourhood structure in the pre- and post-projection data space into vector quantization.APQ minimizes the quantization errors of projection values as well as the loss of affinity property of original space.An effective algorithm has been proposed to solve the joint optimization problem in APQ, and the extension to larger binary codes has been resolved by applying product quantization to APQ.Extensive experiments have shown that APQ consistently outperforms the state-of-the-art quantization methods, and has significantly improved the performance of various hashing techniques.

We consider a new formulation of abduction in which degrees of "plausibility" of explanations, along with the rules of the domain, are learned from concrete examples (settings of attributes). Our version of abduction thus falls in the "learning to reason" framework of Khardon and Roth. Such approaches enable us to capture a natural notion of "plausibility" in a domain while avoiding the extremely difficult problem of specifying an explicit representation of what is "plausible." We specifically consider the question of which syntactic classes of formulas have efficient algorithms for abduction. We find that the class of k-DNF explanations can be found in polynomial time for any fixed k; but, we also find evidence that even weak versions of our abduction task are intractable for the usual class of conjunctions. This evidence is provided by a connection to the usual, inductive PAC-learning model proposed by Valiant. We also consider an exception-tolerant variant of abduction. We observe that it is possible for polynomial-time algorithms to tolerate a few adversarially chosen exceptions, again for the class of k-DNF explanations. All of the algorithms we study are particularly simple, and indeed are variants of a rule proposed by Mill.

This paper is aimed as a contribution to the use of formal modal languages in Artificial Intelligence. We introduce a multi-modal version of Second-order Propositional Modal Logic (SOPML), an extension of modal logic with propositional quantification, and illustrate its usefulness as a specification language for knowledge representation as well as temporal and spatial reasoning. Then, we define novel notions of (bi)simulation and prove that these preserve the interpretation of SOPML formulas. Finally, we apply these results to assess the expressive power of SOPML.

Understanding the dynamics of argumentation frameworks (AFs) is important in the study of argumentation in AI. In this work, we focus on the so-called extension enforcement problem in abstract argumentation. We provide a nearly complete computational complexity map of fixed-argument extension enforcement under various major AF semantics, with results ranging from polynomial-time algorithms to completeness for the second-level of the polynomial hierarchy. Complementing the complexity results, we propose algorithms for NP-hard extension enforcement based on constrained optimization. Going beyond NP, we propose novel counterexample-guided abstraction refinement procedures for the second-level complete problems and present empirical results on a prototype system constituting the first approach to extension enforcement in its generality.

Ontology-based data access (OBDA) is a novel paradigm facilitating access to relational data, realized by linking data sources to an ontology by means of declarative mappings. DL-Lite_R, which is the logic underpinning the W3C ontology language OWL 2 QL and the current language of choice for OBDA, has been designed with the goal of delegating query answering to the underlying database engine, and thus is restricted in expressive power. E.g., it does not allow one to express disjunctive information, and any form of recursion on the data. The aim of this paper is to overcome these limitations of DL-Lite_R, and extend OBDA to more expressive ontology languages, while still leveraging the underlying relational technology for query answering. We achieve this by relying on two well-known mechanisms, namely conservative rewriting and approximation, but significantly extend their practical impact by bringing into the picture the mapping, an essential component of OBDA. Specifically, we develop techniques to rewrite OBDA specifications with an expressive ontology to "equivalent" ones with a DL-Lite_R ontology, if possible, and to approximate them otherwise. We do so by exploiting the high expressive power of the mapping layer to capture part of the domain semantics of rich ontology languages. We have implemented our techniques in the prototype system OntoProx, making use of the state-of-the-art OBDA system Ontop and the query answering system Clipper, and we have shown their feasibility and effectiveness with experiments on synthetic and real-world data.

We study the complexity of exchanging probabilistic data between ontology-based probabilistic databases. We consider the Datalog+/- family of languages as ontology and ontology mapping languages, and we assume different compact encodings of the probabilities of the probabilistic source databases via Boolean events. We provide an extensive complexity analysis of the problem of deciding the existence of a probabilistic (universal) solution for a given probabilistic source database relative to a (probabilistic) data exchange problem for the different languages considered.

This paper focuses on LTL on finite traces (LTLf) for which satisfiability is known to be PSPACE-complete. However, little is known about the computational properties of fragments of LTLf. In this paper we fill this gap and make the following contributions. First, we identify several LTLf fragments for which the complexity of satisfiability drops to NP-complete or even P, by considering restrictions on the temporal operators and Boolean connectives being allowed. Second, we study a semantic variant of LTLf, which is of interest in the domain of business processes, where models have the property that precisely one propositional variable evaluates true at each time instant. Third, we introduce a reasoner for LTLf and compare its performance with the state of the art.

The widespread adoption of Linked Data has been driven by the increasing demand for information exchange between organisations, as well as by data publishing regulations in domains such as health care and governance. In this setting, sensitive information is at risk of disclosure since published data can be linked with arbitrary external data sources. In this paper we lay the foundations of privacy-preserving data publishing (PPDP) in the context of Linked Data. We consider anonymisations of RDF graphs (and, more generally, relational datasets with labelled nulls) and define notions of safe and optimal anonymisations. Safety ensures that the anonymised data can be published with provable protection guarantees against linking attacks, whereas optimality ensures that it preserves as much information from the original data as possible, while satisfying the safety requirement. We establish the complexity of the underpinning decision problems both under open-world semantics inherent to RDF and a closed-world semantics, where we assume that an attacker has complete knowledge over some part of the original data.

We have earlier shown that the standard mappings from action languages B and C to logic programs under answer set semantics can be captured by sets of properties on transition systems. In this paper, we consider action language BC and show that a standard mapping from BC action descriptions to logic programs can be similarly captured when the action rules in the descriptions do not have consistency conditions.

Introduced by Darwiche (2011), sentential decision diagrams (SDDs) are essentially as tractable as ordered binary decision diagrams (OBDDs), but tend to be more succinct in practice. This makes SDDs a prominent representation language, with many applications in artificial intelligence and knowledge compilation. We prove that SDDs are more succinct than OBDDs also in theory, by constructing a family of boolean functions where each member has polynomial SDD size but exponential OBDD size. This exponential separation improves a quasipolynomial separation recently established by Razgon (2014), and settles an open problem in knowledge compilation (Darwiche, 2011).

Only knowing captures the intuitive notion that the beliefs of an agent are precisely those that follow from its knowledge base. It has previously been shown to be useful in characterizing knowledge-based reasoners, especially in a quantified setting. While this allows us to reason about incomplete knowledge in the sense of not knowing whether a formula is true or not, there are many applications where one would like to reason about the degree of belief in a formula. In this work, we propose a new general first-order account of probability and only knowing that admits knowledge bases with incomplete and probabilistic specifications. Beliefs and non-beliefs are then shown to emerge as a direct logical consequence of the sentences of the knowledge base at a corresponding level of specificity.