Version 6 (modified by ibaldin, 7 years ago)

Common ORCA error handling scenarios

When creating slices using ORCA, errors may occur. These errors occur during one of two phases:

  1. Slice embedding and resource allocation, when ORCA examines the slice requests and decides on the embedding of virtual resources into the substrate.
  2. Resource provisioning, when ORCA starts issuing provisioning commands to different parts of the substrate.

Errors can be persistent or transient. Transient errors typically go away if you close and resubmit the slice request. Persistent errors require modifying the slice request or changing how the experimenter submits it.
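
The close-and-resubmit approach for transient errors can be sketched as a small retry loop. This is a minimal illustration, not ORCA's API: the exception classes and the create_slice/close_slice callables are hypothetical placeholders for whatever tool or client the experimenter uses, and the retry count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


class TransientSliceError(Exception):
    """Hypothetical: an error expected to clear on resubmission."""


class PersistentSliceError(Exception):
    """Hypothetical: an error that requires changing the request."""


def submit_with_retry(
    create_slice: Callable[[], str],
    close_slice: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a slice, closing and resubmitting it on transient errors.

    create_slice and close_slice are caller-supplied stand-ins for the
    real submission and close operations (assumed, not ORCA calls).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error: Optional[TransientSliceError] = None
    for attempt in range(1, attempts + 1):
        try:
            return create_slice()
        except PersistentSliceError:
            # Resubmitting the same request will not help; clean up and
            # let the caller fix the request or the way it is submitted.
            close_slice()
            raise
        except TransientSliceError as err:
            last_error = err
            close_slice()  # close the failed slice before resubmitting
            if attempt < attempts:
                sleep(delay_s)  # give the substrate time to recover
    raise last_error
```

A persistent error is re-raised immediately, since retrying an unchanged request would only fail again.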

Errors from the first phase (embedding and allocation) broadly fall into two types:

  1. Invalid requests, when ORCA is unable to understand the slice description. These are persistent errors and require either communicating with a different SM or modifying the request. Common scenarios include:
    • Referring to a non-existent resource domain in your request
    • Trying to create a slice that uses resources from multiple sites/racks by talking to an SM that has only limited resource visibility (e.g. a rack SM)
  2. Insufficient resources, when ORCA is unable to find the resources to satisfy the request. This is a transient error. Two solutions exist:
    • Waiting and resubmitting the request at a later time, when the resources may be available
    • Reducing the level of binding in your request to allow ORCA more freedom to decide which domains the slice can get the resources from

Errors from the second phase (provisioning) occur when the underlying substrate cannot create a resource instance. Common reasons are:

  1. OpenStack errors
    • Transient error, e.g. "resources failed to join: Error during join for unit: 3E2D1B98 [1]: unable to create instance: exit code 1"
    • Persistent error, e.g. your VM instance size is too small for the image
  2. ImageProxy errors
    • Transient error, e.g. "ImageProxy unable to retrieve image: org.apache.axis2.AxisFault: Two services can not have same name, a service with IMAGEPROXY1709684447 already exists in the system" (this results from a known Axis2 bug and cannot currently be fixed)
    • Persistent error, e.g. the slice request specifies an incorrect URL or hash of the image metafile
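
When scripting slice creation, the provisioning error messages quoted above can be matched to decide whether a resubmission is worth trying. The sketch below is illustrative only: the Diagnosis type and the diagnose function are hypothetical names, and the two patterns come solely from the example messages on this page, so real messages may differ and an unmatched message should be inspected by hand.

```python
from typing import NamedTuple, Optional


class Diagnosis(NamedTuple):
    source: str       # substrate component that reported the error
    transient: bool   # True if closing and resubmitting may help


# Fragments taken from the example messages on this page (assumed to be
# representative; this is not an exhaustive list of ORCA errors).
_KNOWN_PATTERNS = [
    ("Two services can not have same name", Diagnosis("ImageProxy", True)),
    ("unable to create instance", Diagnosis("OpenStack", True)),
]


def diagnose(message: str) -> Optional[Diagnosis]:
    """Return a diagnosis for a known message, or None if unrecognized."""
    for fragment, diagnosis in _KNOWN_PATTERNS:
        if fragment in message:
            return diagnosis
    return None


if __name__ == "__main__":
    msg = ("resources failed to join: Error during join for unit: "
           "3E2D1B98 [1]: unable to create instance: exit code 1")
    print(diagnose(msg))
```

Returning None for unknown messages keeps the script from guessing: persistent errors such as an undersized VM instance or a wrong image URL or hash are not pattern-matched here because the page gives no message text for them.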