Ticket #361 (closed defect: fixed)

Opened 5 years ago

Last modified 5 years ago

Upgrading TAMU, UCD and WVN to ORCA5

Reported by: ibaldin Owned by: jonmills
Priority: major Milestone:
Component: External: Testing and Redeployment Version: baseline
Keywords: Cc: jonmills, ckh, vjo, yxin, anirban, pruth, claris

Description (last modified by ibaldin) (diff)

Rack upgrade overview

  • RDF change (for hybrid mode) - declaring on correct ports
    • vlan-data
    • vlan-storage
    • of-data
  • Quantum neuca plugin config and quantum db (for hybrid mode)
  • xcat properties (for hybrid mode)
  • Change the net AM VLAN control (net AM only, not VM AM) to multi-homed (orca.plugins.ben.control.NdlInterfaceVLANControl instead of orca.policy.core.VlanControl). See rci config.xml
  • Change VLAN handler to quantum-vlan (also see rci config.xml)
  • Update DB schema (safe to drop and restore; I believe Victor made changes, so inventory dump no longer needed)
  • Update 8264 firmware to 7.9.x and put it in hybrid mode if necessary (match RCI config to the extent possible)
  • Make sure meso-scale VLANs are plumbed to the OpenFlow side of the switch (i.e. appear on the port of the VLAN side that is the uplink port for the OpenFlow side)
  • Make sure meso-scale VLANs are created in Quantum with the tag 'of-data' (two quick verification checks are sketched below this list)
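
Two quick post-change sanity checks (a sketch only; the config.xml path and the quantum DB/table layout are assumptions, adjust them to the rack):

# confirm the net AM control class swap took effect (config.xml location is an assumption)
grep -n 'VLANControl\|VlanControl' /opt/orca/conf/config.xml
# confirm the hybrid-mode networks exist in quantum (DB name, table and column are assumptions; add credentials as needed)
mysql quantum -e "SELECT name FROM networks" | egrep 'vlan-data|vlan-storage|of-data'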

See also https://geni-orca.renci.org/trac/wiki/releases/Eastsound-5.0

Also per-site:

  • UCD
    • link flapping issue on management network
    • GENI stitching RDF change
      • VLAN range: 1650-1669
      • remoteLinkId: urn:publicid:IDN+al2s.internet2.edu+interface+sdn-sw.sunn.net.internet2.edu:e5/1:ucd-eg
    • Certificate issue (needs host-based cert instead of 'xmlrpc-controller')
  • TAMU
    • GENI Stitching RDF change
      • VLAN range: 940-949
      • remoteLinkId: urn:publicid:IDN+al2s.internet2.edu+interface+sdn-sw.houh.net.internet2.edu:e7/1:tamu-eg
    • Certificate issue (needs host-based cert)
  • WVN
    • Certificate issue (needs host-based cert)
    • Bare-metal provisioning needs attention

Things to test in each rack

1. Basic topology embedding (with VLANs instead of OpenFlow)
2. Storage operation (with bare-metal and VMs)
3. OpenFlow slices (TAMU and UCD only). Start a controller on your laptop, create a broadcast link with a few nodes, declare the reservation as OpenFlow in Reservation Properties, and pass in the URL of your controller (typically tcp:host-name:6633); see the controller example after this list.
4. Static VLAN attachment
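
For item 3, any OpenFlow controller reachable from the rack will do. As one hedged example (POX is an assumption, not the required controller):

# on your laptop: run a simple learning-switch controller listening on the default OpenFlow port 6633
git clone https://github.com/noxrepo/pox && cd pox
./pox.py openflow.of_01 --port=6633 forwarding.l2_learning
# then set the reservation's controller URL to tcp:<your-laptop-hostname>:6633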

Change History

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by ibaldin

  • cc jonmills, ckh, vjo, yxin, anirban, pruth added
  • owner changed from ibaldin to yxin
  • component changed from Don't Know to External: Testing and Redeployment
  • description modified (diff)

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by jonmills

  • description modified (diff)

  Changed 5 years ago by ibaldin

  • owner changed from yxin to jonmills
  • status changed from new to assigned
  • description modified (diff)

All three racks have been updated to ORCA 5.0, with caveats (see below). No testing has been performed as yet.
UCD headnode was updated from CentOS 6.4 to CentOS 6.5 without issue. TAMU & WVN headnodes were updated from a very early release CentOS 6.5 kernel to the latest CentOS 6.5 kernel, both without issue.

Rack upgrade overview

  • RDF change (for hybrid mode) - declaring on correct ports
    • vlan-data
    • vlan-storage
    • of-data

Completed for all 3 racks.

Quantum neuca plugin config and quantum db (for hybrid mode)

Completed for all 3 racks.

xcat properties (for hybrid mode)

Completed for all 3 racks.

Change the net AM VLAN control (net AM only, not VM AM) to multi-homed (orca.plugins.ben.control.NdlInterfaceVLANControl instead of orca.policy.core.VlanControl). See rci config.xml

Completed for all 3 racks.

Change VLAN handler to quantum-vlan (also see rci config.xml)

Completed, all 3 racks.

Update DB schema (safe to drop and restore; I believe Victor made changes, so inventory dump no longer needed)

Completed, all 3 racks.

Update 8264 firmware to 7.9.x and put it in hybrid mode if necessary (match RCI config to the extent possible)

Not attempted yet.

It is unclear what needs to happen with the certificate situation. Is this as simple as re-creating the xmlrpc.jks file per site, or is this referring to something else entirely?

Also, the ticket makes no mention of creating the wrapper-overrides.conf file in /opt/orca/conf and /opt/orca-controller/conf. This is complete for all 3 racks.

Also, for AM and SM orca.properties, the line:

"admin.container.database.class=orca.shirako.container.db.MySqlShirakoContainerDatabase??" must be commented out, or else the containers won't load. This is remedied on all 3 racks.

There is some confusion about the of-data/vlan-data/vlan-storage ports for TAMU baremetal, since that is a non-standard one-off. Requires research.

UCD and WVN have received some updates to their RDF from Yufeng; requires validation.

  Changed 5 years ago by ibaldin

I was not aware of the need to change admin.container.database.class

Does anyone know when or why this changed?

  Changed 5 years ago by vjo

It happened a while ago, and was done by Aydan.
My apologies for not mentioning that, or the wrapper-overrides.conf.

The change for the database.class happened here:
https://geni-orca.renci.org/trac/changeset/5518

about a year ago.

As to the xmlrpc.jks, this is what should be done:
cd /etc/orca/controller-11080/config/
keytool -genkey -alias jetty -keyalg RSA -validity 3650 -keystore xmlrpc.jks
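
A quick way to verify the result afterwards (it will prompt for the keystore password chosen during -genkey):

keytool -list -v -keystore xmlrpc.jks -alias jetty   # confirm the 'jetty' key pair and its validity window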

  Changed 5 years ago by ibaldin

So it seems the database class was renamed, and perhaps has a sane default. The question is: should it be commented out, or replaced with the new valid name?

  Changed 5 years ago by jonmills

Here is the interface map for baremetal workers at TAMU:

xcat.interface.map=of-data:p4p1,vlan-data:p2p1,vlan-storage:p2p1

p4p1 == 10Gb (openflow)
p2p1 == Mellanox 40Gb (vlan)

tamu-w9:  { p4p1 => Port 49, p2p1 => Port 9 }
tamu-w10: { p4p1 => Port 50, p2p1 => Port 13 }

The switch config on 8264.tamu.xo has been updated to reflect these changes, and the TAMU baremetal stateless image is being updated to work with this also.

RDF must be updated to reflect these changes also.

  Changed 5 years ago by anirban

  • cc claris added

  Changed 5 years ago by jonmills

The IBM G8264 switches at TAMU & UCD have been updated to match the firmware on RCI rack (7.9.10.0).

No changes to WVN, obviously -- that is a Cisco switch.

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by yxin

RDFs were changed for UCD, TAMU, and WVN:
1. hybrid mode ports
2. GENI stitching

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by anirban

VMs are not coming up at UCD. ImageProxy exception:

ImageProxy unable to retrieve image: org.apache.axis2.AxisFault: Connection refused; nested exception is:
java.net.ConnectException: Connection refused; nested exception is:
org.apache.axis2.AxisFault: Connection refused; nested exception is:
java.net.ConnectException: Connection refused

  Changed 5 years ago by ibaldin

JM restarted it

  Changed 5 years ago by ibaldin

I added OF shared vlans into quantum on both UCD and TAMU (WVN doesn't have it)

  Changed 5 years ago by ibaldin

  • description modified (diff)

  Changed 5 years ago by anirban

For TAMU: in a slice with a VM attached to storage, the storage is not mounted.

Possibly this is due to tgtd (the iSCSI target software) not running or not being configured on the storage "device". The discovery seemed to work fine, but logging in did not.
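
A manual check from one of the affected nodes can separate discovery from login; a sketch, with the storage server IP and target IQN as placeholders:

# discovery (this part reportedly works)
iscsiadm -m discovery -t sendtargets -p <tamu-storage-ip>
# login to a discovered target (this is the step that appears to fail)
iscsiadm -m node -T <target-iqn> -p <tamu-storage-ip> --login
# on the storage server itself, check that tgtd is running
service tgtd status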

vjo/jonathan, can you please take a look.

  Changed 5 years ago by jonmills

Notice: I've just updated TAMU, UCD, and WVN to latest available ORCA5 from SVN.

  Changed 5 years ago by jonmills

Update on WVN bare metal:

WVN bare metal nodes boot very, very reliably. They will pass traffic against each other, or against any VM.

Storage provisioning won't work owing to two issues: 1) there's a small issue with the handler right now; and 2) even if you manually plumb a VLAN 1009 interface on a bare metal node with a correct IP address, you can't ping the NetApp at WVNet -- this implies a misconfiguration of the NetApp.
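
For reference, the manual plumbing looks roughly like this (a sketch; the interface name, storage address and NetApp IP are placeholders, and the WVN NIC name may differ from the TAMU p2p1 shown elsewhere in this ticket):

# create the tagged sub-interface for the storage VLAN and address it on the storage subnet
ip link add link p2p1 name p2p1.1009 type vlan id 1009
ip link set p2p1.1009 up
ip addr add <storage-subnet-address>/24 dev p2p1.1009
ping -c 3 <netapp-ip>   # currently fails, pointing at NetApp-side configuration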

A final issue is puzzling. Although the bare metal nodes boot correctly, I'm unable to SSH to them via their public IP address.

  Changed 5 years ago by ibaldin

Prior to returning these racks to the pool, we should remember to revert the config.xml changes that delegate bare-metal nodes to the local broker.

  Changed 5 years ago by anirban

I did some testing on the UCD rack.

TS2.2 square topology: works fine
Node attached to Nodegroup: works fine
openflow: works fine
vm attached to storage and another nodegroup: works fine
baremetal, vm, multiple storage, broadcast network: Issue with interfaces not coming up. See ticket #366

Tried modify, extend, recovery multiple times. Slices held up fine for these actions.

Claris, RCI and UCD are up for grabs in 5 minutes for DAR testing.

Moving to testing topologies on the TAMU rack now.

  Changed 5 years ago by anirban

I ran the same tests at the TAMU rack.

TS2.2 square topology: works fine
Node attached to Nodegroup: works fine
vm attached to storage and another nodegroup: works fine

Tried modify, extend, recovery multiple times. Slices held up fine for these actions.

baremetal, vm, multiple storage, broadcast network: All dataplane interfaces came up fine (unlike at UCD). There are two issues with this slice at TAMU:
1. The storage is not mounted on either the VM or the baremetal node (NOTE: storage was mounted fine in the slice with a VM attached to storage and another nodegroup)
2. The IP address on the storage interfaces is the same for the VM and the baremetal nodes
For the VM:
eth2 Link encap:Ethernet HWaddr fe:16:3e:00:b4:97

inet addr:10.104.0.6 Bcast:10.104.0.255 Mask:255.255.255.0
inet6 addr: fe80::fc16:3eff:fe00:b497/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2610 errors:0 dropped:0 overruns:0 frame:0
TX packets:4296 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:467378 (456.4 KiB) TX bytes:391252 (382.0 KiB)

For the baremetal:
p2p1.1009 Link encap:Ethernet HWaddr F4:52:14:2F:39:D0

inet addr:10.104.0.6 Bcast:10.104.0.255 Mask:255.255.255.0
inet6 addr: fe80::f652:14ff:fe2f:39d0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:37 errors:0 dropped:0 overruns:0 frame:0
TX packets:44 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2588 (2.5 KiB) TX bytes:3332 (3.2 KiB)

I will test openflow on this rack tomorrow.

Claris, you now have RCI, UCD and TAMU for DAR testing.

  Changed 5 years ago by ibaldin

This is the issue reported in #365. The two are related: the storage not being mounted and the same IP appearing on the storage interfaces of both the VM and the baremetal node.

  Changed 5 years ago by anirban

I updated rdf, redeployed and retested UCD rack.

TS2.2 square topology: works fine
Node attached to Nodegroup: works fine
vm attached to storage and another nodegroup: works fine

Tried modify, extend, recovery multiple times. Slices held up fine for these actions.

baremetal, vm, multiple storage, broadcast network:

This time, the storage and dataplane interfaces come up fine, both for vms and baremetal. The storage interface IPs are different for the vm and the baremetal. Storage is mounted on vm and baremetal. However, none of the nodes are ping-able from each other.

The handler-vm.log showed the following. NOTE the three "of-data" entries. This slice had nothing to do with OpenFlow. I suspect that somehow the quantum and xcat handlers were given the wrong data network.

2014-09-19 13:34:38,468 -- neuca-quantum-add-iface 24995 DEBUG : QUANTUM_NET_NETWORK: of-data
2014-09-19 13:35:01,688 -- neuca-quantum-add-iface 26079 DEBUG : QUANTUM_NET_NETWORK: of-data
2014-09-19 13:35:14,243 -- neuca-quantum-add-iface 26513 DEBUG : QUANTUM_NET_NETWORK: of-data
2014-09-19 13:35:16,031 -- neuca-quantum-add-iface 26548 DEBUG : QUANTUM_NET_NETWORK: vlan-storage
2014-09-19 13:36:08,874 -- neuca-quantum-add-iface 28794 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:36:16,588 -- neuca-quantum-add-iface 29101 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:36:16,789 -- neuca-quantum-add-iface 29111 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:36:29,079 -- neuca-quantum-add-iface 29850 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:36:29,091 -- neuca-quantum-add-iface 29853 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:37:20,224 -- neuca-quantum-add-iface 30759 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:37:22,010 -- neuca-quantum-add-iface 30814 DEBUG : QUANTUM_NET_NETWORK: vlan-storage
2014-09-19 13:37:52,026 -- neuca-quantum-add-iface 32256 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:37:53,817 -- neuca-quantum-add-iface 32351 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:37:55,688 -- neuca-quantum-add-iface 32462 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:04,200 -- neuca-quantum-add-iface 307 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:06,168 -- neuca-quantum-add-iface 393 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:08,051 -- neuca-quantum-add-iface 457 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:10,032 -- neuca-quantum-add-iface 514 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:26,108 -- neuca-quantum-add-iface 1017 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:26,493 -- neuca-quantum-add-iface 1037 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:27,918 -- neuca-quantum-add-iface 1515 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:28,330 -- neuca-quantum-add-iface 1605 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:29,049 -- neuca-quantum-add-iface 1639 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:29,731 -- neuca-quantum-add-iface 1665 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:30,130 -- neuca-quantum-add-iface 1684 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:30,981 -- neuca-quantum-add-iface 1710 DEBUG : QUANTUM_NET_NETWORK: vlan-data
2014-09-19 13:38:32,930 -- neuca-quantum-add-iface 1794 DEBUG : QUANTUM_NET_NETWORK: vlan-data
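
A quick way to tally which networks the handler asked quantum for (a sketch; the log path is as referenced above, adjust to its actual location):

grep -o 'QUANTUM_NET_NETWORK: [a-z-]*' handler-vm.log | sort | uniq -c
# expected for this slice: vlan-data and vlan-storage only; any of-data count means the wrong network was passed in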

  Changed 5 years ago by anirban

I redeployed and retested TAMU rack.

TS2.2 square topology: works fine
Node attached to Nodegroup: works fine

vm attached to storage and another nodegroup: Storage is not mounted. Dataplane and storage interfaces come up with correct IPs. The nodes are pingable over the dataplane. This slice had worked yesterday.

baremetal, vm, multiple storage, broadcast network:

The storage and dataplane interfaces come up fine, both for vms and baremetal. The nodes are ping-able from each other with dataplane interface. The storage interface IPs are different for the vm and the baremetal. BUT, storage is not mounted on vm and baremetal.

I didn't see any update in tamuvmsite.rdf. Yufeng, did you forget to check in any changes you made?

  Changed 5 years ago by ibaldin

I wonder if the issue at UCD is that the network names are swapped in the RDF (of-data and vlan-data) for some of the interfaces?

What happens if you do a VM-only and a bare-metal-only slice?

For TAMU, was the neuca-user-data correct for storage? Was the /tmp/xcat.XXX.bash script correct?
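
A hedged way to answer that from the baremetal node itself (the /tmp/xcat.XXX.bash naming is from this ticket; the glob and the grep terms are assumptions):

ls -lt /tmp/xcat.*.bash                     # most recent provisioning script handed to the node
grep -i 'storage\|iqn' /tmp/xcat.*.bash     # check the storage/IQN bits it was given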

  Changed 5 years ago by yxin

1. tamuvmsite.rdf: just checked in. Sorry, I didn't notice that the commit failed last night due to a DNS failure in the VM.
2. hostinterfacename: I also need to debug the controller code, which may pick up the wrong interface.

  Changed 5 years ago by ibaldin

About 2: are we sure it's not an RDF typo? We've been pounding on it for a bit and haven't noticed anything of the sort, unless your most recent changes somehow affected it.

  Changed 5 years ago by yxin

Found an error in ucdvmsite.rdf: wrong bare metal OF interface. Can you please give it a try after the update?

  Changed 5 years ago by anirban

TAMU still has exactly the same problem even after the rdf update. To reiterate, the issue for the two slices are:

vm attached to storage and another nodegroup: Storage is not mounted. Dataplane and storage interfaces come up with correct IPs. The nodes are pingable over the dataplane. This slice had worked yesterday.

baremetal, vm, multiple storage, broadcast network:
The storage and dataplane interfaces come up fine, both for vms and baremetal. The nodes are ping-able from each other with dataplane interface. The storage interface IPs are different for the vm and the baremetal. BUT, storage is not mounted on vm and baremetal.

  Changed 5 years ago by jonmills

Replying to anirban:

tamu-storage server is functional. However, its /var/log/messages contains these warnings:

Sep 19 11:05:55 tamu-storage tgtd: chap_initiator_auth_check_response(412) No valid user/pass combination for initiator iqn.2012-02.net.exogeni:742967b5-e534-445b-bc2e-b4f6393e85d0 found
Sep 19 11:05:55 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:05:55 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:05:56 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:05:59 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:02 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:02 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:05 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:08 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:11 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:14 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:16 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:16 tamu-storage tgtd: chap_initiator_auth_check_response(412) No valid user/pass combination for initiator iqn.2012-02.net.exogeni:742967b5-e534-445b-bc2e-b4f6393e85d0 found
Sep 19 11:06:16 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:17 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:17 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:20 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:23 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:23 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:24 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:26 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:27 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:29 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:30 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:32 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:33 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:35 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:36 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:37 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:38 tamu-storage tgtd: chap_initiator_auth_check_response(412) No valid user/pass combination for initiator iqn.2012-02.net.exogeni:742967b5-e534-445b-bc2e-b4f6393e85d0 found
Sep 19 11:06:38 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:38 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e4e8 1
Sep 19 11:06:38 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:39 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:41 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:42 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:44 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:45 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:47 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:48 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
Sep 19 11:06:50 tamu-storage tgtd: conn_close(101) connection closed, 0xc8e198 1
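
The CHAP errors above suggest comparing the credentials tgtd has bound to the target against what the initiator is sending; a sketch to run on tamu-storage (output layout may vary by tgt version):

tgtadm --lld iscsi --mode target --op show   # lists targets, their ACLs and bound CHAP accounts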

  Changed 5 years ago by anirban

The UCD slices work fine.

  Changed 5 years ago by ibaldin

So what's up with TAMU storage?

  Changed 5 years ago by ibaldin

Please note that TAMU doesn't have storage defined in its RDF, so I commented out LUN-related sections of config.xml to avoid exceptions on startup.

  Changed 5 years ago by ibaldin

And when I say TAMU, I really mean WVN!!!

  Changed 5 years ago by ibaldin

Despite being configured identically (and, as far as I know, running the same code), the WVN rack refuses to register with DAR, claiming no external registry has been specified. I'm blocked until this is resolved.

TAMU is claimed on NDL broker.

  Changed 5 years ago by vjo

How about now?

I changed orca.properties to look like this, near the bottom:

# ORCA registry type selection, by classname
registry.class=orca.shirako.container.DistributedRemoteRegistryCache
#registry.class=orca.shirako.container.RemoteRegistryCache

On restart, I only got one exception:

2014-09-22 21:58:20,438 [WrapperSimpleAppMain] ERROR orca - Error establishing edge from df367e73-f02a-4903-ba3a-04c40fe36465 to 14502ff0-2c8f-442e-9f8d-7fc6df97a50d : orca.shirako.common.ConfigurationException: establishEdgePrivate(): Actor df367e73-f02a-4903-ba3a-04c40fe36465 does not have a registry cache entry

I'm getting this for both AM/Broker and SM containers; the GUIDs in the exception above are the ones for wvn-broker and wvn-sm.

If it would help, I can clean the keystores at WVN.

  Changed 5 years ago by claris

Ilya,
Which site did you copy/paste the properties from?

  Changed 5 years ago by ibaldin

I've been using RCI as a template. I used it on the NDL broker, ION and DD actors today, as well as ExoSM, without any problems. I sent you email with more details. The log says the registry URL is not specified, even though as far as I can tell it is.

  Changed 5 years ago by claris

I see wvn-broker, wvn-sm and wvn-vm-am actors registered in both DARs on TAMU and WSU. They are not VERIFIED though. Do you want me to go ahead and verify them? I have not done anything. By any chance did you miss VJO's message? I think he fixed the problem using the right value for registry.class=orca.shirako.container.DistributedRemoteRegistryCache

  Changed 5 years ago by ibaldin

The issue with storage at TAMU appears to be as follows:

The controller sets the IQN property (to be used by NEuca in the guest) as well as VM_GUID and LUN_GUID properties.

  • The controller creates the IQN property out of a standard string appended with VM_GUID
  • The IBM handler constructs an IQN to pass to DS3512 by appending a standard string with VM_GUID
  • The storage handler (at TAMU) uses the LUN_GUID instead to construct the IQN

This is why the VM cannot mount the storage: the storage server constructs a different IQN from the one the node presents.
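
Illustrating the mismatch with the IQN prefix seen in the tgtd log above (a sketch; the exact join format and the shell variables are assumptions standing in for the controller properties):

PREFIX=iqn.2012-02.net.exogeni
echo "${PREFIX}:${VM_GUID}"    # what the controller and the IBM/DS3512 handler build
echo "${PREFIX}:${LUN_GUID}"   # what the TAMU storage handler builds -- a different IQN, so the login is rejected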

The question is: should we be using the VM GUID or the LUN GUID?

  Changed 5 years ago by ibaldin

Storage is reportedly working on TAMU now (anirban, please check via ExoSM).

Before TAMU is returned to users, it needs to be clean-restarted: the delegation of LUNs did not go through to the rack-local broker, so storage via the rack-local controller is not working.

  Changed 5 years ago by vjo

I'd like to update the storage_service handler (and, by corollary, storage_service server code), before returning it to users.

Not critical though - can be returned now.

  Changed 5 years ago by ibaldin

UCD requires a configuration update to delegate ucdNet VLANs to the rack-local broker (to support GENI stitching). This adds new resources rather than changing an existing delegation, but it is still not clear whether a clean restart is required or whether this can be done with recovery.

  Changed 5 years ago by ibaldin

We should remove the SONIC and UCD bridges from BEN and DD that were created to support the UCD SONIC experiments.

  Changed 5 years ago by ibaldin

Status per rack before we go live:

RCI - clean-restarted with the latest tag r6890 and cleanly delegated to the local broker and to ndl-broker. Slice publishing to blowhole finally seems to work. RDF updated to add static VLAN 907 for Ezra; it has been added in quantum and tested to at least do the right thing VM-side. The VLAN itself is not yet provisioned.

TAMU - needs an update to r6890 and a clean restart. Currently unable to do yum_sync. Storage should be tested.

WVN - requires code update to r6890, clean restart and delegation.

  Changed 5 years ago by ibaldin

TAMU and WVN are up to r6890 but have not been restarted.

  Changed 5 years ago by ibaldin

TAMUnet has been delegated to the broker again, with the VLANs increased to 20 to account for 10 for GENI stitching.

  Changed 5 years ago by ibaldin

UCD should have its code updated and its GENI stitching VLANs delegated to the local broker.

  Changed 5 years ago by ibaldin

UCD needs RDF update from r6898

  Changed 5 years ago by ibaldin

UCD config.xml updated to delegate 20 GENI stitching vlans to local broker (and 10 to ndl-broker).

  Changed 5 years ago by ibaldin

WVN URN appears to have changed. Awaiting NOC confirmation.

UCD is up and running.

  Changed 5 years ago by ibaldin

  • status changed from assigned to closed
  • resolution set to fixed

WVN URN added to urn.map in oscars.site.properties. WVN is reachable now. Ticket closed.
