.. image:: _static/img/entanglement_light.png
    :width: 150
    :align: center


.. _entanglement-overview:

Entanglement Overview
=====================

Entanglement is a fully version-controlled graph analytics tool built for, among other uses, encoding use cases
and answers to complex data fusion questions as a traversable knowledge graph. Many of the nodes within the graph
hold references to external data and APIs, making the experience of exploring the graph go beyond simple graph
analytics. 

Entanglement itself is built atop a scalable, cloud-native graph database capable of scaling to millions or even
billions of nodes. Currently, Entanglement supports basic graph queries such as retrieving nodes based on a 
free text search query or looking them up based on their ID. More specific queries will be supported in the future,
but most nodes should be found through the search functionality. Once a node is returned, you can traverse the 
graph from there by expanding a node to find its nearest neighbors. Typically, when querying the graph, you search 
with a few keywords to limit the results.

Projects
--------

.. image:: _static/img/graph_project.png
    :width: 75%
    :align: center

Everything in Entanglement is in one large graph. This includes all previous versions of nodes and connections. This large 
graph is walled off into subgraphs, each of which we we call a :class:`~geodesic.account.projects.Project`. By default, 
every User has read access to a :class:`~geodesic.account.projects.Project` called "global" and is initialized as the 
default active :class:`~geodesic.account.projects.Project`. A user may create a new 
:class:`~geodesic.account.projects.Project` by either instantiating the class directly with a name and alias as minimum
and calling :meth:`~geodesic.account.projects.Project.create()` on that, or by using the built-in helper function
:func:`~geodesic.account.projects.create_project()`. A user has both read and write access to any ``Projects`` they 
create and can share read or write access to them them with other users.

Example:

>>> import geodesic
>>> project = geodesic.Project(name="my-test-project", alias="Test", description="a Geodesic project")
>>> project
{'name': 'my-test-project', 'alias': 'Test', 'description': 'a Geodesic project'}
>>> project.create()
>>> project
{'name': 'my-test-project', 'alias': 'Test', 'description': 'a Geodesic project', 'uid': '<autogenerated id>', 'owner': '<your subject>'}
>>> my_other_project = geodesic.create_project(name="my-other-project", alias="Other Project", description="other Geodesic project")
>>> my_other_project
{'name': 'my-other-project', 'alias': 'Other Project', 'description': 'other Geodesic project', 'uid': '<autogenerated id>', 'owner': '<your subject>'}

For more information on ``Project``, please see the documentation on the `account <account_overview.rst>`_ page or in the 
:py:mod:`geodesic.account.projects` page.

Ontology
--------

Every node in the Entanglement graph is an :class:`~geodesic.entanglement.object.Object`. Two
:class:`~geodesic.entanglement.object.Object` are connected together by a
:class:`~geodesic.entanglement.object.Predicate` to form a :class:`~geodesic.entanglement.object.Connection`. A 
:class:`~geodesic.entanglement.graph.Graph` can be formed by combining a list of 
:class:`~geodesic.entanglement.object.Object` and :class:`~geodesic.entanglement.object.Connection`. We will go into
more detail later about what makes an :class:`~geodesic.entanglement.object.Object` an 
:class:`~geodesic.entanglement.object.Object`, for now we will focus on how we name things - the ontology. 

Rather than build
an overly prescriptive ontology, we chose to use an extremely flexible ontology suited to the needs of Geodesic but
flexible enough to contain most, if not all, other ontologies. We expect the base ontology will be extended over time.

Classes
^^^^^^^

All :class:`~geodesic.entanglement.object.Object` in Entanglement must have, at a minimum, a ``class``. The ``class``
defines "what" an :class:`~geodesic.entanglement.object.Object` is semantically. They are non-specific by design and
further specfication of an :class:`~geodesic.entanglement.object.Object` can be done by adding specific qualifiers 
``domain``, ``category``, ``type``, and ``name``. They go from less specific to more specific. Only ``name`` is
required, but more on those later. A full list of ``classes`` is listed below:

* ``Dataset`` - a queryable/useable dataset
* ``Model`` - usable analytics/AI/ML model
* ``Entity`` - a person, place or thing. (Think: a specific facility, a specific individual, *etc*)
* ``Concept`` - an abstract idea, not a physical entity. For example, the concept "corn" doesn't refer to a kernel of
  corn, a crop of corn, a corn field, or anything physical, but merely the idea that there is something called 
  "corn". 
* ``Observable`` - a physical property or attribute that can be observed. This is typically something like
  "heat", "red" (as in, the color red), "crop-height", *etc*. ``Observables`` are something that can be observed by an 
  ``Entity`` or ``Dataset``. This is something that some sensor or methodology is capable of observing.
* ``Property`` - an attribute that is not directly observable. This is typically the answer to some specific question. 
  We will give a more concret example later.
* ``Event`` - something that happened at a time, and optionally, a place. An event can have its own location or 
  reference an ``Entity``
* ``Link`` - a link to an external resource, such as a webpage or perhaps a URI to another database.

If you find that this list of ``classes`` does not fit the nodes you are trying to add, please contact SeerAI at
contact@seerai.space and we will either provide guidance or consider adding a ``class`` to the ontology.

Qualifiers
^^^^^^^^^^

Despite a small list of ``classes``, we expect nearly any ontology can "fit" into this structure. However, it is very
non-specific and requires additional information to represent the rich knowledge that other ontologies provide. For
that, we provide the qualifiers ``domain``, ``category``, and ``type``. All of these are optional and default to a
wildcard (*) to represent lack of knowledge or unimportance in further categorization. These are entirely up to the 
user, but future versions of Entanglement will enforce some level of consistency between them through 
Natural Language Processing. The description of these qualifiers is listed below:

* ``domain`` - refers to the overarching grouping that this node fits into.  For example, a ``Dataset`` or ``Entity`` 
  that exposes or collects remote sensing data such as satellite imagery might have the ``domain`` "remote-sensing". An 
  ``Entity`` representing a farm might have the ``domain`` "agriculture". An ``Event`` representing the occurence of a 
  crime might have the ``domain`` "crime" or perhaps "law-enforcement". 
* ``category`` - refers to how things are grouped together *within* a ``domain``. For example, a ``Dataset`` with 
  ``domain`` "remote-sensing" may be space-based and have the ``category`` "earth-observation". An ``Entity`` 
  representing a farm in the ``domain`` "agriculture" might have a ``category`` "corn" or even be left as "*". An 
  ``Event`` representing a crime in the domain "law-enforcement" might have a ``category`` "violence". 
* ``type`` - you may see where this is going. ``type`` mostly refers to "what" specifically an ``Object`` ``class`` 
  "is". A satellite remote sensing ``Dataset`` or ``Entity`` might have type "satellite". A farm might have
  the ``type`` "farm", a crime might have the type ``crime``. If you put any qualifiers (besides required ``name``),
  ``type`` is a good one to specify as it disambiguates objects with relative ease. 
* ``name`` - this is the only required qualifier. A ``name`` is very specifically "which" thing an ``Object`` is. For
  example, the ``Dataset`` we've been refering to might be named "landsat-c2l2" refering to the level 2, collection 2
  Landsat dataset hosted by USGS. The farm might be named "whispering-pines", and the crime might be tagged with a 
  specific crime id "crime-13C-00123" that we can use as the ``name`` (Entanglement assigns a unique ``uid`` to 
  everything)

These qualifiers are important, as the combination of them uniquely identifies an 
:class:`~geodesic.entanglement.object.Object` . In fact, the basic string representation of an object in Python is:
``"<class>:<domain>:<category>:<type>:<name>"`` and is used by the underyling 
:class:`~geodesic.entanglement.graph.Graph` as well as the Entanglement back end to uniquely identify an object within 
a :class:`~geodesic.account.projects.Project` . You'll also note that all examples are lowercase, hyphen (-) delimited.
Because these qualifies can be used in a query, we enforce that they are all lowercase alphanumeric. The only allowed 
special character a hyphen (-). No spaces. You can validate using the following regular 
expression: ``r"^(?:[a-z]+[a-z\d-]*|\*)$"`` (:py:data:`~geodesic.entanglement.object.qualifier_re`). ``names`` follow 
this as well, except they cannot be ``*``.

Traits/Predicates
^^^^^^^^^^^^^^^^^

A :class:`~geodesic.entanglement.object.Predicate` refers to a named edge between two ``Objects``. ``Predicates`` 
concisely desribe a relationship between objects with one or more all-lowercase words combined with a hyphen (-). 
Similar to ``Objects``, a :class:`~geodesic.entanglement.object.Predicate` may have the same qualifiers as well.
``Predicates`` are grouped into ``Traits``, which define what an object can do. More on that later. Some examples of 
``Predicates`` are ``can-observe``, ``correlates-with``, ``owns``, ``supplies``, *etc*. They should be concise and 
clearly define what the relationship is. 

There are two main schools of thought for organizing node classes when defining an ontology - Composition vs 
Inheritance. With inheritance, one defines a hierarchy of what an ``Object`` "is". Composition defines what traits 
an ``Object`` "has". 

Inheritance Examples:

* a `Police Officer` is a `Person` which is an `Entity`
* a `Hurricane` is a `Servere Weather Event` which is an `Event`
* a `Dog` is a `Canine` which is a type of `Animal` which is an `Entity`

Already you can see how an inheritance-based ontology gets quickly hard to manage. Especially because an ``Object`` 
could be many different things. That's a trap you run into with deep inheritance-based ontologies, and it makes it
very challenging to absorb multiple ontologies within. 

We take a different approach. The only inheritance we have is that everything is an ``Object``. We add ``Traits`` to the 
small list of ``Object`` ``classes``. A ``Trait`` is a modifier to an ``Object`` that endows it with certain 
``Predicate`` connections.

Objects
^^^^^^^

.. image:: _static/img/object_hierarchy.png
    :width: 75%
    :align: center

A full ``Object`` hierarchy is shown above. Don't be alarmed if it looks complicated. The Python API makes working with 
``Objects`` fairly simple and future user interfaces will dramatically simplify how you can interact with the graph. 
In addition to all of this, an Object is allowed several other top level attributes:

* ``description`` - a free text description of the object. This is indexed for search, so it's a great place to put any
  keywords that enable discovery of this object
* ``geometry`` - a geometry field for this ``Object``. This indexes this ``Object`` for queries such as 
  spatial intersections. One small deficiency in our backend that will be resolved at a future date: only points and 
  polygons will be indexed for search. All geometries can be stored. 
* ``item`` - a field that stores arbitrary JSON, so feel free to put anything (except secrets!) here up to 1 MB in size.
  This field is used frequently to store extra metadata about an ``Object``. 

For ``Events``, we can have additional fields:

* ``datetime`` - a single time instant (for example the time the event occurred)
* ``start_datetime`` - a time to represent the start of an ``Event`` or the start of a range if the actual ``Event`` 
  time is uncertain
* ``end_datetime`` - a time to represent the end of an ``Event`` or the end of a range if the actual ``Event`` time is
  uncertain.

Connections
^^^^^^^^^^^

A :class:`~geodesic.entanglement.object.Connection` is a triple composed of a ``subject`` a ``predicate`` and an 
``object``. This is not to be confused with an `RDF <https://www.w3.org/RDF>`_
triple as each component can contain rich metadata, including edge attributes that are stored on a ``Predicate``. 
``Connections`` are just as important as ``Objects``, since they actually supply a huge amount of information. If we had
no ``Connections`` we could think of each ``Object`` as a record in a database; they are valuable on their own, but 
nothing special. The value comes from their connections. Consider the following: If I have 2 ``Objects``, I have two 
pieces of information. If I draw a connection between them, I have 3. If I have 3 ``Objects``, I can draw a connection 
between each pair, giving up to a total of 6 pieces of information. For 4 ``Objects``, I could have 10 pieces of 
information, and so on. The number of possible ``Connections`` grows quadratically with the the number of nodes. Storing
and maintaining these connections means our available information grows MUCH faster than if we just focused on the 
``Objects`` alone. 

Version Control
^^^^^^^^^^^^^^^

Entanglement is fully version-controlled. Every time you add, modify, or delete an ``Object`` or ``Connection``, that 
change is saved. When you execute a query against Entanglement, the default behavior is to query the latest state of 
the graph. You can also optionally supply a datetime to an ``Objects`` query to retrieve the objects as the were at that
point in time. The same is true for ``Connections``. 

.. note::
    While you CAN use data versioning to keep a time series of values for a particular node, it's not recommended to
    do so. We recommend storing data that changes infrequently in Entanglement. For example, I might have a node 
    representing "corn price", but we would not recommend updating that node with the daily value with version control,
    and alternatively suggest making the corn price node point to an external dataset intended for dynamic data. Use 
    your best judgement on this. 

Basic Operations
^^^^^^^^^^^^^^^^

Now that we've covered some of the design principles behind Entanglement


Enough about the theory and design, let's get practical. On the next page, you'll find examples of how to actually use 
Entanglement.