From Taxonomies and Schemas to Knowledge Graphs – Recognizing and tackling the pitfalls of large-scale semantic modeling

Half-Day Workshop @ Connected Data London 2019



What is this about

Ever since Google announced that its knowledge graph allowed searching for “things, not strings”, the term “knowledge graph” has been widely adopted, by both academia and industry, to denote any graph-like network of interrelated typed entities and concepts that can be used to integrate, share and exploit data and knowledge.

This idea of interconnected data under common semantics is actually much older, and the term is a rebranding of several other concepts and research areas (semantic networks, knowledge bases, ontologies, semantic web, linked data, etc.). Google popularized the idea and made it more visible to the public and to industry. As a result, several prominent companies, including Airbnb, Amazon, Diffbot, LinkedIn, and Uber, now develop and use their own knowledge graphs for data integration, data analytics, semantic search, question answering and other cognitive applications.

To paraphrase a famous quote, though, “With Great Popularity Comes Great Responsibility”. As knowledge graphs grow in size and scope, and are used by bigger and more diverse audiences, their ability to represent semantic information that is accurate and consensual comes under strain. Typical semantic modeling mistakes that are controllable, and perhaps not so harmful, in small-scale taxonomies and ontologies can become seriously problematic and hard to contain in large-scale knowledge graphs.

This tutorial takes participants on an investigative journey into the semantics of knowledge graphs and teaches them how to recognize modeling pitfalls that undermine the quality and value of those graphs. More importantly, it provides them with concrete strategies and techniques for avoiding these pitfalls in their own work, both as developers and as consumers of knowledge graphs.

What will you learn

The tutorial will consist of three parts. The first part will be lecture-based and will ensure that all participants share a common terminology and mindset when talking about knowledge graphs and semantics. This is important because practitioners from different backgrounds and communities (semantic web, databases, taxonomies, linguistics, etc.) are accustomed to different terminologies for the same or very similar semantic modeling elements.

The second part will be highly collaborative and interactive. The participants will form small teams, each of which will be assigned a “semantics crime scene”, namely (a part of) a public knowledge graph with problematic semantics. Each team will then need to identify the graph’s problems, their possible causes, and ways these problems could have been avoided, and share its findings with the other teams.

The third part will summarize the teams’ findings and provide guidelines and best practices for better semantic data modeling.

Workshop Material