
Knowledge Graph Overview

Entanglement is a fully version-controlled knowledge graph built for, among other uses, storing, sharing, and reusing answers to complex data fusion questions. Many of the nodes in the graph hold references to external data and APIs, so exploring the graph goes beyond simple graph analytics.

Entanglement itself is built atop a cloud-native graph database capable of scaling to millions or even billions of nodes. The user interface lets users search for nodes by name or attribute, with a helpful type-ahead feature. Once a node is returned, you can traverse the graph from there by expanding the node to find its nearest neighbors.
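Expanding a node to find its nearest neighbors amounts to looking up every Connection that touches it. The sketch below models this with a small in-memory adjacency index. The `build_index` and `expand` functions, the simplified node names, and the `covers`/`grows` predicates are hypothetical illustrations, not part of the Entanglement API.

```python
from collections import defaultdict

# Hypothetical Connections stored as (subject, predicate, object) triples.
CONNECTIONS = [
    ("dataset:landsat-c2l2", "can-observe", "observable:heat"),
    ("dataset:landsat-c2l2", "covers", "entity:whispering-pines"),
    ("entity:whispering-pines", "grows", "concept:corn"),
]


def build_index(triples):
    """Index each Connection under both endpoints so expansion ignores direction."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[subject].add((predicate, obj))
        index[obj].add((predicate, subject))
    return index


def expand(index, node):
    """Return the sorted names of the nodes one Connection away from `node`."""
    return sorted({neighbor for _, neighbor in index.get(node, set())})


index = build_index(CONNECTIONS)
print(expand(index, "dataset:landsat-c2l2"))  # ['entity:whispering-pines', 'observable:heat']
```

Indexing both endpoints is a design choice for browsing: a user expanding a node usually wants every neighbor regardless of which side of the Predicate it sits on.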

Projects

The Global Graph

Users interact with Entanglement within the context of a Project. Each Project contains a graph that can optionally be linked to other Projects to which the user has access. By default, every User has read access to a Project called “global” (shown above), which is also each User's initial active Project. Users can create new Projects either through the Python API or through the Geodesic Web App, shown below. A user has both read and write access to any Project they create and can share read or write access to it with other users. This all but eliminates data silos: any member of your organization can join your Project, instantly access the datasets within it, and see at a glance what other information is connected.

Project Example:

(Screenshots: Geodesic Project UI; Geodesic Create Project)
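The Project access rules described above can be summarized in a small model. This is a hypothetical sketch: the `Project` class, its `share`, `can_read`, and `can_write` methods, and the user names are illustrative assumptions, not the Geodesic Python API. It also assumes that granting write access implies read access.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of Project access rules (not the Geodesic API)."""

    name: str
    owner: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)

    def __post_init__(self):
        # A user has both read and write access to any Project they create.
        self.readers.add(self.owner)
        self.writers.add(self.owner)

    def share(self, user, write=False):
        # Sharing grants read access; write access (assumed to imply read) is optional.
        self.readers.add(user)
        if write:
            self.writers.add(user)

    def can_read(self, user):
        # Every user has read access to the "global" Project.
        return self.name == "global" or user in self.readers

    def can_write(self, user):
        return user in self.writers


project = Project("crop-monitoring", owner="alice")
project.share("bob")                # read-only
project.share("carol", write=True)  # read and write
```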

Ontology

Every node in the Entanglement graph is an Object. Two Objects are connected by a Predicate to form a Connection. A Graph can be formed by combining a list of Objects and Connections. We will go into more detail later about what makes an Object an Object; for now, we will focus on how we name things: the ontology.

Rather than build an overly prescriptive ontology, we chose an extremely flexible one: suited to the needs of Geodesic, yet general enough to contain most, if not all, other ontologies. We expect the base ontology to be extended over time.

Classes

Every Object in Entanglement must have, at a minimum, a class and a name. The class defines “what” an Object is semantically. The full list of classes is:

  • Dataset - a queryable/usable dataset

  • Modelset - a usable analytics/AI/ML model

  • Entity - a person, place, or thing (think: a specific facility, a specific individual, etc.)

  • Concept - an abstract idea, not a physical entity. For example, the concept “corn” doesn’t refer to a kernel of corn, a crop of corn, a corn field, or anything physical, but merely the idea that there is something called “corn”.

  • Observable - a physical property or attribute that can be observed. This is typically something like “heat”, “red” (as in, the color red), “crop-height”, etc. Observables are typically things that can be observed by an Entity or Dataset.

  • Property - an attribute that is not directly observable. This is typically the answer to some specific question.

  • Event - something that happened at a time and, optionally, a place. An Event can have its own location or reference an Entity.

  • Link - a link to an external resource, such as a webpage or perhaps a URI to another database.

If you find that this list of classes does not fit the nodes you are trying to add, please contact SeerAI at contact@seerai.space and we will either provide guidance or consider adding a class to the ontology.

Qualifiers

Despite the small list of classes, we expect nearly any ontology can “fit” into this structure. However, the structure is very general and needs additional information to represent the rich knowledge that other ontologies provide. For that, we provide the qualifiers domain, category, and type. All of these are optional and default to a wildcard (*) to indicate that further categorization is unknown or unimportant. They are entirely up to the user, but future versions of Entanglement will enforce some level of consistency between them through Natural Language Processing. Each qualifier, along with the required name, is described below:

  • domain - refers to the overarching grouping that this node fits into. For example, a Dataset or Entity that exposes or collects remote sensing data such as satellite imagery might have the domain “remote-sensing”. An Entity representing a farm might have the domain “agriculture”. An Event representing the occurrence of a crime might have the domain “crime” or perhaps “law-enforcement”.

  • category - refers to how things are grouped together within a domain. For example, a Dataset with domain “remote-sensing” may be space-based and have the category “earth-observation”. An Entity representing a farm in the domain “agriculture” might have a category “corn” or even be left as “*”. An Event representing a crime in the domain “law-enforcement” might have a category “violence”.

  • type - you may see where this is going. Type mostly refers to “what” specifically an Object “is”. A satellite remote sensing Dataset or Entity might have type “satellite”. A farm might have the type “farm”, and a crime might have the type “crime”. If you specify any qualifier besides the required name, type is a good choice, as it disambiguates Objects with relative ease.

  • name - this is the only required qualifier. A name is very specifically “which” thing an Object is. For example, the Dataset we’ve been referring to might be named “landsat-c2l2”, referring to the level 2, collection 2 Landsat dataset hosted by USGS. The farm might be named “whispering-pines”, and the crime might be tagged with a specific crime id “crime-13C-00123” that we can use as the name (Entanglement also assigns a unique uid to everything).

These qualifiers are important because, together with the class, they uniquely identify an Object. In fact, the basic string representation of an Object in Python is "<class>:<domain>:<category>:<type>:<name>", which is used by the underlying Graph as well as the Entanglement back end to uniquely identify an Object within a Project.
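To make the identity rule concrete, the sketch below builds the "<class>:<domain>:<category>:<type>:<name>" string from a class, the required name, and the optional qualifiers defaulting to "*". The `ObjectClass` enum, the `ObjectKey` dataclass, and the lowercase class spellings are assumptions for illustration; they are not the Geodesic Python API.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectClass(Enum):
    """The eight Object classes; the lowercase spellings are an assumption."""

    DATASET = "dataset"
    MODELSET = "modelset"
    ENTITY = "entity"
    CONCEPT = "concept"
    OBSERVABLE = "observable"
    PROPERTY = "property"
    EVENT = "event"
    LINK = "link"


@dataclass(frozen=True)
class ObjectKey:
    """Hypothetical identity key for an Object (not the Geodesic API)."""

    object_class: ObjectClass
    name: str  # the only required qualifier
    domain: str = "*"  # "*" means unknown or unimportant
    category: str = "*"
    type: str = "*"

    def __str__(self):
        # "<class>:<domain>:<category>:<type>:<name>"
        return ":".join(
            [self.object_class.value, self.domain, self.category, self.type, self.name]
        )


landsat = ObjectKey(
    ObjectClass.DATASET,
    "landsat-c2l2",
    domain="remote-sensing",
    category="earth-observation",
    type="satellite",
)
farm = ObjectKey(ObjectClass.ENTITY, "whispering-pines", domain="agriculture", type="farm")
print(landsat)  # dataset:remote-sensing:earth-observation:satellite:landsat-c2l2
print(farm)     # entity:agriculture:*:farm:whispering-pines
```

Making the key a frozen dataclass means two keys with the same class and qualifiers compare equal and hash identically, so the key can serve directly as a dictionary or set key, mirroring the "uniquely identifies an Object" rule.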

Traits/Predicates

A Predicate is a named edge between two Objects. Predicates concisely describe a relationship between Objects using one or more all-lowercase words joined by hyphens (-). Like Objects, Predicates may also carry the same qualifiers. Predicates are grouped into Traits, which define what an Object can do; more on that later. Some examples of Predicates are can-observe, correlates-with, owns, and supplies. They should be concise and clearly define the relationship.
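The naming convention for Predicates can be checked with a regular expression. This is a minimal sketch that assumes a "word" is a run of lowercase ASCII letters (no digits) and that hyphens appear singly between words; `is_valid_predicate` is a hypothetical helper, not an Entanglement function.

```python
import re

# Assumption: a "word" is a run of lowercase ASCII letters (no digits).
PREDICATE_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def is_valid_predicate(name: str) -> bool:
    """Check the convention: one or more lowercase words joined by single hyphens."""
    return PREDICATE_PATTERN.fullmatch(name) is not None


for candidate in ["can-observe", "correlates-with", "owns", "Supplies", "can_observe", "can--observe"]:
    print(f"{candidate}: {is_valid_predicate(candidate)}")
```

The first three candidates pass; "Supplies" (uppercase), "can_observe" (underscore), and "can--observe" (double hyphen) fail.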

There are two main schools of thought for organizing node classes when defining an ontology - Composition vs Inheritance. With inheritance, one defines a hierarchy of what an Object “is”. Composition defines what traits an Object “has”.

Inheritance Examples:

a Police Officer is a Person which is an Entity

a Hurricane is a Severe Weather Event which is an Event

a Dog is a Canine which is a type of Animal which is an Entity

Already you can see how an inheritance-based ontology quickly becomes hard to manage, especially because an Object could be many different things at once. That is a trap you run into with deep inheritance-based ontologies, and it makes absorbing multiple ontologies very challenging.

We take a different approach. The only inheritance we have is that everything is an Object. We add Traits to the small list of Object classes. A Trait is a modifier to an Object that endows it with certain Predicate connections.
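A minimal sketch of the composition approach follows: each Trait contributes a set of Predicates, and an Object may form any Connection allowed by the union of its Traits. The `Trait` and `TraitedObject` classes and the `observer`/`supplier` Traits are hypothetical; the source does not list Entanglement's actual Traits or how they are stored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trait:
    """A named bundle of Predicates that a Trait endows (hypothetical)."""

    name: str
    predicates: frozenset


@dataclass
class TraitedObject:
    """An Object whose capabilities come from composed Traits (hypothetical)."""

    name: str
    object_class: str
    traits: list = field(default_factory=list)

    def allowed_predicates(self) -> set:
        # Composition: the union of the Predicates provided by every Trait.
        allowed = set()
        for trait in self.traits:
            allowed |= trait.predicates
        return allowed

    def can_connect(self, predicate: str) -> bool:
        return predicate in self.allowed_predicates()


OBSERVER = Trait("observer", frozenset({"can-observe"}))
SUPPLIER = Trait("supplier", frozenset({"supplies"}))

# One Entity mixes both capabilities with no class hierarchy involved.
satellite = TraitedObject("example-satellite", "entity", [OBSERVER, SUPPLIER])
farm = TraitedObject("whispering-pines", "entity", [SUPPLIER])
print(sorted(satellite.allowed_predicates()))  # ['can-observe', 'supplies']
```

Adding a new capability means attaching another Trait rather than inserting a new level into a class hierarchy.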

Objects

The Python API makes working with Objects fairly simple and the Geodesic Web App user interface dramatically simplifies how you can interact with the graph.

(Screenshots: Edit Graph UI; Add Node UI)

In addition to its class and qualifiers, an Object may have several other top-level attributes:

  • description - a free-text description of the Object. This is indexed for search, so it’s a great place to put any keywords that enable discovery of this Object.

  • geometry - a geometry for this Object. The geometry is indexed, enabling queries such as spatial intersections. Geometries of any type can be stored.

  • item - a field that stores arbitrary JSON, so feel free to put anything (except secrets!) here up to 1 MB in size. This field is used frequently to store extra metadata about an Object.
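Because `item` is capped at 1 MB, it can be worth measuring the serialized JSON before uploading. This sketch assumes the limit applies to UTF-8 encoded JSON and reads "1 MB" as 1,000,000 bytes (the service may use 1,048,576); `item_size_ok` and `MAX_ITEM_BYTES` are hypothetical names.

```python
import json

# Assumption: "1 MB" is read as 1,000,000 bytes of UTF-8 encoded JSON.
MAX_ITEM_BYTES = 1_000_000


def item_size_ok(item) -> bool:
    """Return True if `item` serializes to JSON within the assumed size limit.

    json.dumps raises TypeError for values that are not JSON-serializable.
    """
    encoded = json.dumps(item).encode("utf-8")
    return len(encoded) <= MAX_ITEM_BYTES


print(item_size_ok({"sensor": "OLI", "collection": 2}))  # True
print(item_size_ok({"blob": "x" * 2_000_000}))           # False
```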

For Events, we can have additional fields:

  • datetime - a single time instant (for example the time the event occurred)

  • start_datetime - a time to represent the start of an Event or the start of a range if the actual Event time is uncertain

  • end_datetime - a time to represent the end of an Event or the end of a range if the actual Event time is uncertain.
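One way to keep these time fields consistent is sketched below. The rules (an Event needs either `datetime` or at least one end of a range, and a range may not end before it starts) are our assumptions about sensible usage, not documented Entanglement validation; `check_event_times` is a hypothetical helper.

```python
from datetime import datetime, timezone


def check_event_times(event: dict) -> None:
    """Raise ValueError if an Event's time fields are inconsistent.

    Hypothetical rules: an Event needs either `datetime` or at least one end
    of a range, and `end_datetime` may not precede `start_datetime`.
    """
    instant = event.get("datetime")
    start = event.get("start_datetime")
    end = event.get("end_datetime")
    if instant is None and start is None and end is None:
        raise ValueError("an Event needs a datetime or a start/end range")
    if start is not None and end is not None and end < start:
        raise ValueError("end_datetime is earlier than start_datetime")


# An uncertain event time expressed as a range (timezone-aware UTC datetimes).
check_event_times({
    "start_datetime": datetime(2023, 5, 1, tzinfo=timezone.utc),
    "end_datetime": datetime(2023, 5, 3, tzinfo=timezone.utc),
})
```

Using timezone-aware datetimes throughout avoids the TypeError Python raises when comparing naive and aware values.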

Connections

A Connection is a triple composed of a subject, a predicate, and an object. Connections are just as important as Objects, since they supply a huge amount of information. If we had no Connections, we could think of each Object as a record in a database: valuable on its own, but nothing special. The value comes from the Connections. Consider the following: if I have 2 Objects, I have two pieces of information. If I draw a Connection between them, I have 3. If I have 3 Objects, I can draw a Connection between each pair, giving up to a total of 6 pieces of information. For 4 Objects, I could have 10 pieces of information, and so on. The number of possible Connections, n(n-1)/2 for n Objects, grows quadratically with the number of nodes. Storing and maintaining these Connections means our available information grows MUCH faster than if we focused on the Objects alone.
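The counting argument above can be checked directly: with n Objects and at most one Connection per pair, the maximum number of pieces of information is n + n(n-1)/2. This sketch uses `math.comb` (Python 3.8+); `max_information` is a hypothetical name.

```python
from math import comb


def max_information(n_objects: int) -> int:
    """Objects plus at most one Connection per unordered pair of Objects."""
    return n_objects + comb(n_objects, 2)  # n + n(n-1)/2


for n in (2, 3, 4, 10):
    print(n, max_information(n))  # 2 3, 3 6, 4 10, 10 55
```

This counts one undirected Connection per pair; allowing both directions, or several Predicates between the same pair, only increases the total.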

Version Control

Entanglement is fully version-controlled. Every time you add, modify, or delete an Object or Connection, that change is saved. When you execute a query against Entanglement, the default behavior is to query the latest state of the graph. You can also optionally supply a datetime to a query for Objects to retrieve them as they were at that point in time. The same is true for Connections.
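A minimal sketch of how a point-in-time ("as of") query can work: every change is appended to a log with a timestamp, and the state at time t is rebuilt by replaying changes up to t. The `VersionedStore` class and `day` helper are hypothetical; the source does not describe how Entanglement stores its history, and replaying the full log on every read is chosen here only for clarity.

```python
from datetime import datetime, timezone


class VersionedStore:
    """Append-only change log with point-in-time reads (hypothetical sketch)."""

    def __init__(self):
        self._log = []  # (timestamp, key, value); value None marks a deletion

    def put(self, key, value, at):
        self._log.append((at, key, value))

    def delete(self, key, at):
        self._log.append((at, key, None))

    def as_of(self, at=None):
        """Replay changes up to and including `at` (default: latest state)."""
        state = {}
        # sorted() is stable, so changes sharing a timestamp keep their order.
        for ts, key, value in sorted(self._log, key=lambda change: change[0]):
            if at is not None and ts > at:
                break
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


def day(d):
    return datetime(2024, 1, d, tzinfo=timezone.utc)


KEY = "entity:agriculture:*:farm:whispering-pines"
store = VersionedStore()
store.put(KEY, {"description": "corn farm"}, at=day(1))
store.put(KEY, {"description": "corn and soybean farm"}, at=day(5))
store.delete(KEY, at=day(9))

print(store.as_of(day(2))[KEY])  # {'description': 'corn farm'}
print(store.as_of())             # {}
```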