Version 4 (modified by chase, 8 years ago)


Cosmic Todo List

This list can serve as a basis for planning and prioritizing work going forward. It began as an attempt at an exhaustive list (July 2011).

Administration and robustness

  • reset inventory delegations at the broker: forget old delegations, or try to renew them
  • AM full restart or hard reset: clean the substrate, and reissue delegations. Rejects old tickets and leases?
  • AM full restart: how to rebuild delegations?
  • slice reclamation on AM and broker: time out an empty slice
  • remoting the management interface
  • SM query/discovery of reservations in the slice; poll/request lease status
  • AM query interfaces: slices per user, leases per slice, identify by IP or vlan
  • GMOC monitoring feed: faithful AMs publish an asynchronous stream "feed" of sliver create/renewal notifications to ST.

Features to test, verify, cleanup, document

  • extension packages
    • AM side: handlers for new resource types
    • SM side: controller/handler/view
  • SM controller classes and interfaces
  • SM-side handler and stitching, e.g., for elastic Condor
  • pushbutton Euca site deploys with NDL cookbook, including IP connectivity, canned xCAT images, etc.
  • Error reporting/logging from NDL processors
  • error reporting/logging from handlers
  • error reporting distinguishes authorization failure vs. resource limit vs. internal error vs. user error
  • Ticket validation, including signature failure and oversubscription, and rejection path
  • Ticket rejection and cleanup at SM and broker
  • Stitch token validation
  • Multiple pools and multiple delegations per pool on an AM; register through portal
  • Failed renewal and cleanup in SM
  • Full SM-side slice abort (e.g., due to partial failure)
  • broker renews delegations
  • broker absorbs updated delegations
  • AM probe detects resource failure, updates containing lease
  • interdomain path computation across multiple brokers
  • group allocation
  • openflow handler
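
One way to picture the error-reporting item above (authorization failure vs. resource limit vs. internal error vs. user error) is a single exception type tagged with a category. This is only a sketch: the names ErrorCategory, ActorError, and report are hypothetical and do not come from the existing code base.

```python
from enum import Enum


class ErrorCategory(Enum):
    """The four categories named in the todo item (names are hypothetical)."""
    AUTHORIZATION = "authorization failure"
    RESOURCE_LIMIT = "resource limit"
    INTERNAL = "internal error"
    USER = "user error"


class ActorError(Exception):
    """Hypothetical exception carrying a category alongside its message."""

    def __init__(self, category, message):
        super().__init__(message)
        self.category = category


def report(exc):
    """Format an exception for a log line, prefixed with its category.

    Anything that is not an ActorError is treated as an internal error,
    so uncategorized failures are never reported as the user's fault.
    """
    if isinstance(exc, ActorError):
        category = exc.category
    else:
        category = ErrorCategory.INTERNAL
    return f"[{category.value}] {exc}"
```

A handler or NDL processor could raise ActorError at the point of failure, and the reporting path would then only need to call report() on whatever it catches.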

Advance reservations

  • Resource counts/vectors and integration with controller calendars and SM/broker policy for SPARQL (e.g., VLAN tag tracking)
  • broker-based advance reservations in conjunction with inter-domain stitching
  • auto-launch at the reservation time
  • scheduling of a group allocation around the bottleneck resource


  • PG as an aggregate: requires new handler and PG cert pass-through from XMLRPC controller. Wait for PG-ABAC?
  • GUSH (in progress with Jeannie Albrecht)


  • NDL requests and manifest using the new edge standard. Propertylist cleanup and doc

Broker resource policies: wish list

  • generic web view for admin approval of slices and reservations
  • broker resource menu with fixed prices and budgets
  • attribute-based shares or quotas, with policy plugin view to manipulate them
  • congestion pricing
  • stackable broker policies
  • euca instance size selection (small/medium/large properties/handlers)
  • map global type attributes specified by SM into candidate pool IDs (site selection)
  • VM placement (site selection) based on data location

Identity management and portal

  • multi-user web portal with per-slice access control, requester ID in outgoing requests, proper authtoken slamming
  • shib: requires the user ID to be registered locally to get into the web portal
  • users can upload certs, which are stored indexed by user

Enhancements to ABAC-based authorization (when integration is complete)

  • per-site policies: ACLs by idp.attribute (test case: RENCI cluster)
  • signed security attributes on images transported by Image Proxy
  • GMOC back door: attribute-based authorization for slice shutdown
  • SA module and/or actor

Handler infrastructure

  • Config handler invocation: threadpool, synchronization, and new scripting support
  • modify() entry point
  • sliver restart as example of modify
  • two-pass stitching and dynamic stitching
  • dynamic interposition/withdrawal of perfsonar
  • handler-driven sliver stitching: storage volume create/attach
  • small/medium/large (independent of broker policy)
  • in-progress additions to handler catalog: EBS, sunfish storage, xcat, I2/ION

Extensions to staged core workflow

  • controllers as a separate stage
  • auth checks as a separate stage
  • resourceSet and below as a separate stage

Pushbutton slices/demos

  • Hadoop
  • elastic Condor w/local DAGman
  • triangle and/or star, intradomain and interdomain cases
  • Harold's three-tier cloudscale
  • netfence

Ideas to discuss

  • integrate puppet configuration service to AM
  • integrate nagios monitoring service to AM
  • move inter-domain path computation into the broker
  • broker policy for bin-packing computons (small/medium/large)
  • multiple SMs per slice: SM owner per-reservation, not per-slice
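
As a starting point for discussing the bin-packing idea above, here is a minimal sketch using the first-fit-decreasing heuristic. The unit costs for small/medium/large and the per-host capacity are assumptions chosen for illustration, and first-fit decreasing is just one common heuristic, not a policy taken from the existing broker code.

```python
# Assumed unit costs per instance size; real computon sizes may differ.
SIZE_UNITS = {"small": 1, "medium": 2, "large": 4}


def pack_first_fit_decreasing(requests, host_capacity):
    """Place requested instance sizes onto hosts of equal capacity.

    Requests are sorted largest first, and each one goes onto the first
    host with enough free units; a new host is opened when none fits.
    Returns a list of hosts, each a list of the sizes placed on it.
    """
    hosts = []  # each entry: {"free": remaining units, "items": [sizes]}
    ordered = sorted(requests, key=lambda s: SIZE_UNITS[s], reverse=True)
    for size in ordered:
        units = SIZE_UNITS[size]
        if units > host_capacity:
            raise ValueError(f"{size} does not fit on any host")
        for host in hosts:
            if host["free"] >= units:
                host["free"] -= units
                host["items"].append(size)
                break
        else:
            # No existing host had room, so open a new one.
            hosts.append({"free": host_capacity - units, "items": [size]})
    return [host["items"] for host in hosts]
```

For example, packing ["small", "large", "medium", "small", "medium"] onto hosts with 4 units each uses three hosts under these assumptions.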

Proposed new component: secure image repo with simple web interface

  • content-addressable fetch by HTTP and bittorrent (by any ImageProxy? that knows URL and hash)
  • generate/retire a random token to allow a user to request approval to put images in the repo
  • user form to request approval to put images in the repo: requires access token
  • admin interface to approve image put, set storage quota for image elements, generate random image token
  • put image or update image, named by image token (server generates/checks hashes)
  • images optionally discoverable and browsable by short description, if allowed by user
  • we need to keep three maps: user tokens to image tokens, image tokens to image elements and storage consumed, and content hashes to their objects
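
The three maps and the token flow described above could be sketched in memory as follows. This is an illustration under stated assumptions, not a design decision: the class and method names are hypothetical, SHA-256 is assumed as the content hash, and persistence, the web interface, and the HTTP/bittorrent transports are left out.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Per-image state: element hashes, storage consumed, and quota."""
    element_hashes: list = field(default_factory=list)
    bytes_used: int = 0
    quota_bytes: int = 0


class ImageRepo:
    """Hypothetical in-memory model of the three maps."""

    def __init__(self):
        self.user_to_images = {}  # user token -> set of image tokens
        self.images = {}          # image token -> ImageRecord
        self.objects = {}         # content hash -> stored bytes

    def new_user_token(self):
        """Generate a random token a user presents when requesting approval."""
        token = secrets.token_hex(16)
        self.user_to_images[token] = set()
        return token

    def approve_image(self, user_token, quota_bytes):
        """Admin step: approve a put, set the quota, issue an image token."""
        if user_token not in self.user_to_images:
            raise KeyError("unknown user token")
        image_token = secrets.token_hex(16)
        self.user_to_images[user_token].add(image_token)
        self.images[image_token] = ImageRecord(quota_bytes=quota_bytes)
        return image_token

    def put_element(self, image_token, data):
        """Store one image element; the server computes the hash."""
        record = self.images[image_token]
        if record.bytes_used + len(data) > record.quota_bytes:
            raise ValueError("storage quota exceeded")
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data
        record.element_hashes.append(digest)
        record.bytes_used += len(data)
        return digest

    def fetch(self, digest):
        """Content-addressable fetch: re-check the hash before returning."""
        data = self.objects[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored object does not match its hash")
        return data
```

Because objects are keyed by content hash, any fetcher that knows the hash can verify what it receives regardless of which transport delivered it.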