Global Data Plane

Motivation

As we discuss in a recent paper, "The Cloud is not Enough: Saving IoT from the Cloud", the widespread practice of building swarm applications by connecting them directly to the cloud comes with a variety of downsides.  With the GDP, we seek an infrastructure that enables important new use cases for the cloud while still integrating smoothly with existing cloud infrastructure.

Vision

The Global Data Plane (GDP) provides a data-centric glue for swarm applications.  The basic primitive is a secure, single-writer, append-only log.  Data inputs are timestamped and ordered by timestamp.  Data can be securely committed to the log in a variety of ways, including via an external consistent transactional model.  Data within the log can be read (either randomly or by subscription), which permits a variety of data models, including (eventually) a SQL query model.  Further, data within a log can be preserved for the long term.
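
The two read modes mentioned above, random reads and subscriptions, can be illustrated with a small in-memory sketch.  This is not the GDP API: the class name SubscribableLog, the callback signature, the optional replay via "start", and record numbers beginning at 1 are all assumptions made for illustration.

```python
from typing import Callable, List, Optional


class SubscribableLog:
    """Toy in-memory append-only log (hypothetical, not the GDP API)."""

    def __init__(self) -> None:
        self._records: List[bytes] = []   # record number n lives at index n - 1
        self._subscribers: List[Callable[[int, bytes], None]] = []

    def append(self, data: bytes) -> int:
        """Append opaque data and push it to every subscriber."""
        self._records.append(data)
        recno = len(self._records)
        for callback in self._subscribers:
            callback(recno, data)
        return recno

    def read(self, recno: int) -> bytes:
        """Random read of a single record by its number."""
        if not 1 <= recno <= len(self._records):
            raise KeyError(f"no record {recno}")
        return self._records[recno - 1]

    def subscribe(self, callback: Callable[[int, bytes], None],
                  start: Optional[int] = None) -> None:
        """Deliver future records; optionally replay existing ones first."""
        if start is not None:
            for recno in range(max(start, 1), len(self._records) + 1):
                callback(recno, self._records[recno - 1])
        self._subscribers.append(callback)
```

A reader that wants the whole history followed by live updates would call subscribe(callback, start=1); a reader that needs only one record would call read(recno).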

See also: "A Case for the Universal DataPlane" (PDF)

The Model

The GDP consists of append-only logs and a routing layer.  Each log is named by an opaque 256-bit number (a "GDPname") having no direct connection with the location of the data.  Logs are append-only and consist of a series of records; records consist of a record number, a commit timestamp (for coherency), and variable-sized opaque data.  It is important that the data be opaque since (in the longer term) all data should be encrypted, and the GDP will not hold the keys.  Logs can be replicated and migrated.
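
As a rough illustration of the data model just described, the sketch below represents a log and its records.  The field names, the use of SHA-256 to produce a 256-bit name, and record numbers starting at 1 are assumptions; the text above says only that the name is an opaque 256-bit number and that a record carries a record number, a commit timestamp, and opaque data.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Record:
    recno: int         # record number within the log
    commit_ts: float   # commit timestamp, here seconds since the epoch
    data: bytes        # opaque payload; the log never interprets it


@dataclass
class Log:
    gdpname: bytes                                   # opaque 256-bit name
    records: List[Record] = field(default_factory=list)

    def append(self, data: bytes) -> Record:
        """Add a record at the end; existing records are never modified."""
        rec = Record(len(self.records) + 1, time.time(), data)
        self.records.append(rec)
        return rec

    def read(self, recno: int) -> Record:
        if not 1 <= recno <= len(self.records):
            raise KeyError(f"no record {recno}")
        return self.records[recno - 1]


def make_gdpname(seed: bytes) -> bytes:
    """Hypothetical helper: derive a 256-bit name by hashing a seed.
    How real GDPnames are generated is not stated in the text."""
    return hashlib.sha256(seed).digest()
```

Because the payload is plain bytes, a writer can encrypt before appending and the log stores the ciphertext unchanged, which matches the requirement that the GDP not hold the keys.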

Routing allows any node in the GDP to find at least one copy of any named entity, given its GDPname.  Entities can be logs, but ultimately they may also include services and users, so that there is one shared namespace for everything.
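
The idea of location-independent names can be sketched as a lookup from a GDPname to the nodes that currently hold a copy.  A real routing layer would be distributed; the single in-memory table, the class name Router, and the methods advertise, withdraw, and locate below are all hypothetical.

```python
from typing import Dict, Set


class Router:
    """Toy name-to-location table (hypothetical, single process)."""

    def __init__(self) -> None:
        self._locations: Dict[bytes, Set[str]] = {}  # GDPname -> node addresses

    def advertise(self, gdpname: bytes, node: str) -> None:
        """Record that `node` holds a copy of the named entity."""
        if len(gdpname) != 32:
            raise ValueError("a GDPname is 256 bits (32 bytes)")
        self._locations.setdefault(gdpname, set()).add(node)

    def withdraw(self, gdpname: bytes, node: str) -> None:
        """Record that `node` no longer holds a copy."""
        nodes = self._locations.get(gdpname)
        if nodes is None:
            return
        nodes.discard(node)
        if not nodes:
            del self._locations[gdpname]

    def locate(self, gdpname: bytes) -> str:
        """Return one node that holds a copy, or raise LookupError."""
        nodes = self._locations.get(gdpname)
        if not nodes:
            raise LookupError("no known copy of this name")
        return min(nodes)  # arbitrary but deterministic choice
```

The caller only ever supplies the name; which node answers can change as copies are advertised and withdrawn, without the name changing.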

The GDP implements mechanism but not policy; policy is mediated by a separate Control Plane.  For example, the GDP handles the mechanics of replication and migration, but the choice of when and where to replicate is made by a higher-level service that resides in the Control Plane, on the basis of on-the-fly performance monitoring or other criteria.
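
The split between mechanism and policy might look like the following sketch, in which a data-plane object only knows how to replicate and a separate function decides when and where.  The names DataPlane and latency_policy, the latency table, and the 100 ms threshold are invented for illustration and do not describe the actual Control Plane.

```python
from typing import Dict, Optional, Set


class DataPlane:
    """Mechanism only: tracks and changes where copies of a log live."""

    def __init__(self) -> None:
        self.replicas: Dict[str, Set[str]] = {}  # log name -> nodes holding it

    def create(self, name: str, node: str) -> None:
        self.replicas[name] = {node}

    def replicate(self, name: str, target: str) -> None:
        self.replicas[name].add(target)

    def migrate(self, name: str, source: str, target: str) -> None:
        self.replicas[name].add(target)
        self.replicas[name].discard(source)


def latency_policy(plane: DataPlane, name: str,
                   latency_ms: Dict[str, float],
                   threshold_ms: float = 100.0) -> Optional[str]:
    """Hypothetical policy: if every current replica is slower than the
    threshold, replicate to the fastest node that has no copy yet."""
    current = plane.replicas[name]
    best_current = min(latency_ms[node] for node in current)
    if best_current <= threshold_ms:
        return None                      # an existing copy is fast enough
    candidates = [n for n in latency_ms if n not in current]
    if not candidates:
        return None
    target = min(candidates, key=lambda n: latency_ms[n])
    if latency_ms[target] >= best_current:
        return None                      # a new copy would not help
    plane.replicate(name, target)        # policy invokes the mechanism
    return target
```

Swapping in a different policy, for example one driven by storage cost, would not require any change to DataPlane.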

Access control is based on cryptography.  Write (append) access control relies on authorized writers signing each message; the GDP itself holds the public keys of authorized writers and uses them for verification.  There is no Read/Subscribe access control; the privacy of the data depends on the data being encrypted.
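
The write path described above can be sketched as a log that stores only a writer's public key and rejects any append whose signature does not verify.  The text does not say which signature scheme the GDP uses.  The sketch below substitutes Lamport one-time signatures purely because they need nothing beyond the Python standard library; a Lamport key is safe for a single message only, so a real system would use a standard scheme from a vetted library.  The class name SignedLog is hypothetical.

```python
import hashlib
import secrets


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), as a list of 0/1 integers."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]


def keygen():
    """Lamport key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest())
              for a, b in private]
    return private, public


def sign(private, message: bytes):
    """Reveal one secret per digest bit (one-time use only)."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(hashlib.sha256(part).digest() == public[i][bit]
               for i, (bit, part) in enumerate(zip(_bits(message), signature)))


class SignedLog:
    """Holds only the writer's public key, never a private key."""

    def __init__(self, writer_public_key) -> None:
        self.writer_public_key = writer_public_key
        self.records = []

    def append(self, data: bytes, signature) -> bool:
        if not verify(self.writer_public_key, data, signature):
            return False                 # unauthorized or tampered append
        self.records.append(data)
        return True
```

Nothing in this sketch restricts reads, which mirrors the model: confidentiality comes from encrypting the data before it is appended, not from the log.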

Higher-level services may be layered on the GDP; for example, a service might combine the contents of multiple logs into another log, or copy log data to specialized databases.  In these cases the original logs are the "base truth", with everything else being a form of cache.  Note, however, that these services are not part of the GDP, but rather are users of it.
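
One such layered service, combining several logs into a derived log, could be sketched as a timestamp-ordered merge.  The representation of a record as a (commit timestamp, data) tuple and the function name merge_logs are assumptions made for this example.

```python
import heapq
from typing import List, Tuple

LogRecord = Tuple[float, bytes]   # (commit timestamp, opaque data)


def merge_logs(*logs: List[LogRecord]) -> List[LogRecord]:
    """Build a derived log by merging source logs in timestamp order.

    Each source log must already be ordered by timestamp.  The sources
    are only read, never modified, so they remain the "base truth" and
    the result can be rebuilt from them at any time.
    """
    return list(heapq.merge(*logs, key=lambda record: record[0]))
```

Since the merged log is derived entirely from its sources, it can be discarded and recomputed, which is what makes it a form of cache.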

The GDP needs to be self-healing and resistant to attack.  Network partitions might in severe cases result in inaccessible data, but single node failure should not, and in no case should the GDP suffer catastrophic failure, even in the face of some nodes being compromised.

The Present and the Future

The prototype implementation of the GDP has limited functionality: a single-instance daemon with basic read/subscribe/publish primitives.  Data is completely opaque (this is a feature, since the intent is that all data will be encrypted), and the only metadata is a record number and a commit timestamp.  The prototype enforces no access control on either reads or writes.

There are several short- or medium-term projects that are either in progress or will start soon.  See https://gdp.cs.berkeley.edu/redmine/projects/gdp/wiki/GDP_Task_List for details.

GDP Code and Documentation:

The initial prototype code for the Global Data Plane is available from the U.C. Berkeley EECS repository at either of these two URIs:

https://repo.eecs.berkeley.edu/git/projects/swarmlab/gdp.git
repoman@repo.eecs.berkeley.edu:projects/swarmlab/gdp.git

Associated Faculty:

  • John Kubiatowicz
  • Edward Lee

Staff:

  • Ken Lutz
  • Eric Allman
  • Rick Pratt

Students:

  • Griffin Potrock
  • Nitesh Mor