Ticket #341 (closed task: fixed)

Opened 5 years ago

Last modified 5 years ago

Testing 5.0 on UNC rack

Reported by: ibaldin Owned by: pruth
Priority: major Milestone:
Component: External: Testing and Redeployment Version: baseline
Keywords: Cc: vjo, claris, anirban, yxin, jonmills, pruth

Description

Things that should be tested in UNC rack with 5.0

* Updated Nova handler

* Return of boot console

* Interface with couchdb/actor registry

* AM start/stop with state restoration under favorable conditions (no catastrophic tests)

Please add what else.

Attachments

purgeCouchDBServers.tar.gz (256.0 kB) - added by claris 5 years ago.
Script to clean up actor registry databases for a clean deployment

Change History

  Changed 5 years ago by claris

When is this work scheduled for?

  Changed 5 years ago by vjo

It hasn't been.
I am working to allocate time for this, but I have a lot on my plate right now.

  Changed 5 years ago by claris

Stitchport registry should be tested too including replication.

  Changed 5 years ago by ibaldin

Note that stitchport registry testing and deployment is on #307. This ticket should be reserved for discussions on deployment and testing of ORCA5.0

Note that one of the first things we need to do is resize MySQL - both for performance and table structure. See Migration Notes here: https://geni-orca.renci.org/trac/wiki/releases/Eastsound-5.0

  Changed 5 years ago by ibaldin

Things to test (that aren't embedding):

- Actor registry via couch db

- That slice garbage collection works (was broken before). Create a slice, delete it and then try to create another slice with the same name

- Recovery of actors (SM, AM, Broker)

- Recovery of controller. Note that in order for the controller to recover, the corresponding SM must already be running and ready to accept queries.

- The important thing is that post-recovery everything should continue working as before, including slice delete (of slices created prior to recovery), slice modify (of slices created prior to recovery), and creation of new slices (checking that previously allocated labels aren't being stepped on).

  Changed 5 years ago by anirban

Did some preliminary actor recovery testing (SM, AM, Broker). Haven't seen any issues yet. The following worked fine:

a. Slice delete after recovery of SM + recovery of AM+Broker
b. Slice renew after recovery of AM+Broker
c. Slice garbage collection
d. New slice creation after recovery of AM+Broker

- Anirban

  Changed 5 years ago by claris

I have encountered some issues with the current CouchDB setup. SSL is timing out. I am not sure whether this is due to a bug I may have introduced in shirako while cleaning up or to a problem on the server, which I found unresponsive today. Looking into it. Hopefully it will be cleared up early Thursday.

  Changed 5 years ago by claris

Problem with CouchDB solved. DistributedRemoteRegistryCache is ready for testing.

Question: By default ORCA will use the old RemoteRegistryCache, i.e., if the RegistryCache property is not set in orca.properties (see release notes). Do we have an actor registry to use for testing purposes?

  Changed 5 years ago by ibaldin

- Need to test pubsub after recovery (of the controller).

  Changed 5 years ago by ibaldin

Once the controller branch is merged, we should retest slice GC. I've modified the rules by which slices are GC-ed - a 'dead' slice (all reservations closed/closing/failed) will only be deleted if either it has been seen by the user (manifest has been retrieved) or it's been more than 24 hours since the first time we tried to delete it.
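
For anyone writing a test against the new GC rule, here is a minimal sketch of the condition as described above. The class, method and parameter names are hypothetical and are not taken from the ORCA code base; "more than 24 hours" is read as strictly greater than 24 hours.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: names and structure are assumptions, not ORCA's classes. */
final class SliceGcRule {
    static final Duration GRACE_PERIOD = Duration.ofHours(24);

    /**
     * @param allReservationsDead true if every reservation is closed, closing or failed
     * @param manifestRetrieved   true if the user has retrieved the manifest
     * @param firstDeleteAttempt  time of the first deletion attempt, or null if none yet
     * @param now                 current time
     */
    static boolean eligibleForDeletion(boolean allReservationsDead,
                                       boolean manifestRetrieved,
                                       Instant firstDeleteAttempt,
                                       Instant now) {
        if (!allReservationsDead) {
            return false; // only 'dead' slices are candidates
        }
        if (manifestRetrieved) {
            return true; // the user has seen the slice
        }
        if (firstDeleteAttempt == null) {
            return false; // the 24-hour clock has not started yet
        }
        return Duration.between(firstDeleteAttempt, now).compareTo(GRACE_PERIOD) > 0;
    }
}
```

A test can pass a fixed `now` instead of waiting a day, which is the main reason the clock is a parameter here.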

  Changed 5 years ago by claris

I have checked in all the code.
(1) Check if without any changes to the configuration file the default actor registry is working.
(2) Check if DistributedActorRegistry without replication works. Add new properties to orca.properties using only ONE actor registry.
(3) Check if DistributedActorRegistry with replication works. Add two CouchDB services in orca.properties
(4) Check if authentication is working properly. Different user per actor.

Note that 4 is required to be done separately because the user_db is a table replicated across nodes. I need to ensure that replication works perfectly before we can test this in the wild.

  Changed 5 years ago by anirban

We are continuing on testing actor recovery on the UNC rack with Paul's changes for getting console log, renew etc.

The next feature to test is the DistributedActorRegistry. As I understand, Claris's changes are all checked in. Victor wrote in a previous email that the following needed to be completed before we start testing DistributedActorRegistry.

====
1) The aydan-recovery regression build needs to be fixed, since the code relies on the ektorp jar that I need to put into Nexus.
2) Once the regression build is stable, I need to work with Jonathan to package up and distribute the underlying dependencies for the code
3) I then need to figure out what code of Claris’s needs to be distributed/version controlled via packages, and what of it needs to be managed via another process
4) I need to get the dependencies and Claris’s code on UNC rack
5) (least effort) I need to get the ORCA configuration for UNC rack altered to allow testing both with and without Claris’s code, touch off a new RPM build, and update the ORCA installation.
====

Victor, where are we with these? If we trigger an RPM rebuild with Claris's changes, do yum_sync, clean, update on the unc headnode, and add properties to orca.properties, would that be enough to start testing?

Regards,
- Anirban

  Changed 5 years ago by vjo

(1) is complete.
(2) is complete to a degree; we may need a still newer version of Erlang and CouchDB to properly support SSL (without proxying through Apache).
(3) is not started.
(4) is completed to a degree; completion relies on the answer to (2).
(5) is not started.

Your procedure would be similar to (5) and would get you close; I do not know the state of deployment of Claris's code on the existing CouchDB on unc-hn.

FYI: CCNIE/EAGER/Cisco stuff at Duke is also taking up a large fraction of my time.

  Changed 5 years ago by claris

Vjo,
regarding your question about (5). The code is already on the existing CouchDB on unc-hn.

Claris

follow-up: ↓ 17   Changed 5 years ago by vjo

OK.

I would say that the config can be added to orca.properties, and re-built RPMs installed.
You should be able to test - but, here are my remaining concerns:

1) We may need newer packages for everything (SSL) to be supported properly.
2) We need to develop our procedures/processes for managing the code that's in CouchDB, so that anyone can properly manage it, distribute it, roll it out, or roll it back when it's in production.

Whether (2) means packaging it as RPMs and management via Puppet, or distribution via CouchDB replication - I just want to make sure we have everything version controlled, we know how to verify the version running on each rack, and how to manage the software as a whole.

  Changed 5 years ago by vjo

And - as Ilya has rightly said - the testing of the CouchDB code is a separate ticket.

My concerns, as voiced here, are merely in regard to the CouchDB code being a dependency for ORCA 5 deployment.

in reply to: ↑ 15   Changed 5 years ago by anirban

Claris and I started testing with couchDB. We had an rpm built from the latest code and did yum_sync/update on the UNC head node. We changed the property files for the actors to use couchDB. However, the actors fail to come up because of a NoClassDefFoundError: the JVM is unable to find org.ektorp.CouchDbInstance, which might be because this dependency is not packaged. That's our best guess. Victor, how do we proceed from here? This is the error snippet from orca-stdout.log:

STATUS | wrapper | 2014/08/12 18:41:15 | Launching a JVM...
INFO | jvm 1 | 2014/08/12 18:41:16 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
INFO | jvm 1 | 2014/08/12 18:41:16 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
INFO | jvm 1 | 2014/08/12 18:41:16 |
INFO | jvm 1 | 2014/08/12 18:41:16 | log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
INFO | jvm 1 | 2014/08/12 18:41:16 | log4j:WARN Please initialize the log4j system properly.
INFO | jvm 1 | 2014/08/12 18:41:19 |
INFO | jvm 1 | 2014/08/12 18:41:19 | WrapperSimpleApp: Encountered an error running main: java.lang.NoClassDefFoundError: org/ektorp/CouchDbInstance
INFO | jvm 1 | 2014/08/12 18:41:19 | java.lang.NoClassDefFoundError: org/ektorp/CouchDbInstance
INFO | jvm 1 | 2014/08/12 18:41:19 | at java.lang.Class.forName0(Native Method)
INFO | jvm 1 | 2014/08/12 18:41:19 | at java.lang.Class.forName(Unknown Source)
INFO | jvm 1 | 2014/08/12 18:41:19 | at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:232)
INFO | jvm 1 | 2014/08/12 18:41:19 | at orca.shirako.container.Globals.start(Globals.java:118)
INFO | jvm 1 | 2014/08/12 18:41:19 | at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
INFO | jvm 1 | 2014/08/12 18:41:19 | at orca.server.OrcaServer.start(OrcaServer.java:114)
INFO | jvm 1 | 2014/08/12 18:41:19 | at orca.server.OrcaServer.main(OrcaServer.java:165)
INFO | jvm 1 | 2014/08/12 18:41:19 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1 | 2014/08/12 18:41:19 | at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/12 18:41:19 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/12 18:41:19 | at java.lang.reflect.Method.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/12 18:41:19 | at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
INFO | jvm 1 | 2014/08/12 18:41:19 | at java.lang.Thread.run(Unknown Source)


  Changed 5 years ago by vjo

  • cc jonmills, pruth added

OK - some explanation is in order.

The following file:

/opt/orca/conf/wrapper.conf

is marked as a configuration file in the RPM metadata.

The file is the configuration file for the native code program that acts as a "watchdog" for the JVM that runs ORCA, and that effectively daemonizes it.

I marked this as a configuration file, because I wanted to have the ability to alter things like JAVA_HOME in it, or the heap size for the JVM, without having those changes automatically overwritten by a subsequent RPM upgrade.

It includes many things - among them, a full list of jars that the native wrapper will include in the classpath of the JVM being loaded.

This configuration file is automatically generated as part of the ORCA build, and is included in the generated RPM.

Now - as to why things stopped working:
I modified the file, so that JAVA_HOME would reference a 1.7 JVM.

This marked the wrapper.conf file as changed, which meant that, according to how I wrote the spec file, it would not get replaced.

Here's the problem:
The newly built ORCA required the ektorp jar, which was included in the newly created wrapper.conf file.
The old wrapper.conf, because it was marked as changed, was not overwritten.
Because it was not overwritten, the native wrapper attempted to start ORCA without the ektorp jar it required, despite it being included in the package.

The cure is to merge the old and new configuration files; the old one remains in wrapper.conf, the new one gets placed in wrapper.conf.rpmnew.
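
To spot this situation before a restart, one could compare the classpath entries of the two files. The following is a rough sketch, assuming the usual Java Service Wrapper `wrapper.java.classpath.N=...` entries; the class and method names are made up for illustration, and the jar paths in the usage note are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative helper, not part of ORCA or of the wrapper itself. */
final class WrapperClasspathDiff {
    private static final String PREFIX = "wrapper.java.classpath.";

    /** Collects the values of wrapper.java.classpath.N entries, in file order. */
    static Set<String> classpathEntries(List<String> lines) {
        Set<String> entries = new LinkedHashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith(PREFIX)) {
                continue; // comments and unrelated properties are ignored
            }
            int eq = line.indexOf('=');
            if (eq > 0) {
                entries.add(line.substring(eq + 1).trim());
            }
        }
        return entries;
    }

    /** Returns classpath values present in the new file but absent from the old one. */
    static List<String> missingFromOld(List<String> oldConf, List<String> newConf) {
        Set<String> old = classpathEntries(oldConf);
        List<String> missing = new ArrayList<>();
        for (String entry : classpathEntries(newConf)) {
            if (!old.contains(entry)) {
                missing.add(entry);
            }
        }
        return missing;
    }
}
```

Feeding it the lines of wrapper.conf and wrapper.conf.rpmnew (for example via `java.nio.file.Files.readAllLines`) would list the jars that still need to be merged by hand. It compares values only, so a jar that merely changed its index number is not reported.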

I have done this, and restarted the actors.

This change is also required for the controller. I took care of that too.

  Changed 5 years ago by vjo

FYI - processing of the orca.properties file needs the use of String.trim() too; the cert fingerprints had spaces at the end, that caused an exception on startup.
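
A minimal sketch of what that trimming could look like, assuming the file is read with `java.util.Properties` (which strips leading whitespace from values but keeps trailing whitespace). The class name is hypothetical and this is not the actual ORCA loading code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: loads properties and trims every value. */
final class TrimmedProperties {
    static Properties loadTrimmed(String text) {
        Properties raw = new Properties();
        try {
            raw.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Properties trimmed = new Properties();
        for (String key : raw.stringPropertyNames()) {
            // Properties.load keeps trailing spaces in values, so trim explicitly.
            trimmed.setProperty(key, raw.getProperty(key).trim());
        }
        return trimmed;
    }
}
```

Trimming once at load time means fingerprints, usernames and URLs are all covered without touching each call site.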

Also, we get:
Caused by: java.net.UnknownHostException: slookup2.exogeni.net

at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)

I edited the properties to remove slookup2, and got the AM running again.

When I tried to do the same w/ the SM, I got:

2014-08-13 03:32:50,313 [WrapperSimpleAppMain] FATAL orca - Critical error: Orca failed to initialize
orca.shirako.container.ContainerInitializationException: java.lang.reflect.InvocationTargetException

at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:280)
at orca.shirako.container.Globals.start(Globals.java:118)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Unknown Source)

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:250)
... 10 more

Caused by: org.ektorp.DbAccessException: 401:Unauthorized
URI: /actor/
Response Body:
{

"error" : "unauthorized",
"reason" : "Name or password is incorrect."

}

at org.ektorp.http.StdResponseHandler.createDbAccessException(StdResponseHandler.java:50)
at org.ektorp.http.StdResponseHandler.error(StdResponseHandler.java:68)
at org.ektorp.http.RestTemplate.handleVoidResponse(RestTemplate.java:113)
at org.ektorp.http.RestTemplate.put(RestTemplate.java:39)
at org.ektorp.impl.StdCouchDbInstance.createDatabase(StdCouchDbInstance.java:58)
at org.ektorp.impl.StdCouchDbInstance.createDatabase(StdCouchDbInstance.java:50)
at org.ektorp.impl.StdCouchDbConnector.createDatabaseIfNotExists(StdCouchDbConnector.java:400)
at org.ektorp.impl.StdCouchDbInstance.createConnector(StdCouchDbInstance.java:115)
at orca.shirako.container.DistributedRemoteRegistryCache.singleQuery(DistributedRemoteRegistryCache.java:270)
... 15 more

That appears to have been caused by a space at the end of the username...

Now - to the *real* problems with the code:

2014-08-13 03:37:59,219 [WrapperSimpleAppMain] ERROR orca - Could not connect to the registry server null:
org.ektorp.UpdateConflictException: document update conflict: id: unknown rev: unknown

at org.ektorp.http.StdResponseHandler.createDbAccessException(StdResponseHandler.java:42)
at org.ektorp.http.StdResponseHandler.error(StdResponseHandler.java:68)
at org.ektorp.http.RestTemplate.handleResponse(RestTemplate.java:122)
at org.ektorp.http.RestTemplate.put(RestTemplate.java:43)
at org.ektorp.impl.StdCouchDbConnector.create(StdCouchDbConnector.java:114)
at org.ektorp.support.CouchDbRepositorySupport.add(CouchDbRepositorySupport.java:86)
at orca.shirako.container.DistributedRemoteRegistryCache.registerWithRegistry(DistributedRemoteRegistryCache.java:871)
at orca.shirako.container.OrcaContainer.registerCommon(OrcaContainer.java:873)
at orca.shirako.container.OrcaContainer.registerActor(OrcaContainer.java:813)
at orca.boot.ConfigurationProcessor.registerActors(ConfigurationProcessor.java:220)
at orca.boot.ConfigurationProcessor.process(ConfigurationProcessor.java:140)
at orca.boot.ConfigurationLoader.process(ConfigurationLoader.java:76)
at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:782)
at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:751)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:256)
at orca.shirako.container.Globals.start(Globals.java:118)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Unknown Source)

I'm getting that repeatably, on restart, for all actors.

  Changed 5 years ago by ibaldin

I tested complete shutdown and recovery in a simple case today (3 slices interdomain). AM, SM, Broker, controller went down and back up. Seems to mostly work. There was an issue with some reservation states not being properly updated after recovery - Yufeng will be looking at that.

  Changed 5 years ago by claris

Victor,

* Absolutely right, I should have done the trimming on the orca.properties. I'll check that fix in.
* The problem with slookup2.exogeni.net has to do with a fix that I checked in yesterday but that was not deployed on UNC-HN. I did that this morning and voila. Problem solved.
* The update conflict error you encountered was because couchdb was already populated with all the actors from a previous run, i.e., actor entries were already in the registry. During your run the actors tried to create new documents (records) with the same GUIDs already in the server, and the server responded with an update conflict (a key conflict, in traditional DB terms). This is what a datastore should be doing. The fix for that is either to purge the actor registry or to perform proper conflict management by allowing the actor to "update" the existing record, if authenticated. I am inclined to the latter but open to suggestions.
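
To make the second option concrete, here is a small sketch of "create, and on conflict update if authenticated". It uses an in-memory map as a stand-in for the registry database rather than real ektorp calls, and every name in it is hypothetical; the revision counter only imitates the idea that an update must name the current revision.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the actor registry; illustrative only. */
final class RegistrySketch {
    /** Raised when a write collides with an existing document or a stale revision. */
    static final class Conflict extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    private final Map<String, Integer> revisions = new HashMap<>();
    private final Map<String, String> bodies = new HashMap<>();

    /** Mimics a create: fails if a document with this GUID already exists. */
    void create(String guid, String body) {
        if (revisions.containsKey(guid)) {
            throw new Conflict();
        }
        revisions.put(guid, 1);
        bodies.put(guid, body);
    }

    /** Mimics an update: the caller must present the current revision. */
    void update(String guid, int rev, String body) {
        Integer current = revisions.get(guid);
        if (current == null || current != rev) {
            throw new Conflict();
        }
        revisions.put(guid, rev + 1);
        bodies.put(guid, body);
    }

    /** Tries create first; on conflict, updates only when the actor is authenticated. */
    boolean register(String guid, String body, boolean authenticated) {
        try {
            create(guid, body);
            return true;
        } catch (Conflict conflict) {
            if (!authenticated) {
                return false; // leave the existing record untouched
            }
            update(guid, revisions.get(guid), body);
            return true;
        }
    }

    int revisionOf(String guid) {
        return revisions.get(guid);
    }

    String bodyOf(String guid) {
        return bodies.get(guid);
    }
}
```

With this shape, restarting an actor against an already populated registry turns into an update instead of an error, while an unauthenticated caller cannot overwrite someone else's record.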

UPDATE on TESTING DISTRIBUTED REMOTE REGISTRY
The key functionality seems to work just fine. Some of the key issues I encountered had to do with incompatibilities with older version of HTTP artifacts which I resolved and poor understanding of how to enter the topology information in config.xml.

Things I looked at while testing.
- Keystores in AM and SM are properly populated once new actors are verified in DAR.
- Deployed dumb bell in UNC site under two config cases:
(1) AM-broker topology in config.xml includes edges between broker and AMs and edge between SM and AM but no certificate.
(2) AM-broker topology in config.xml includes edges between brokers and AM and no edge (therefore no certificate) between SM and AM.

Rule of thumb: Any edge including a remote actor end-point should not be included in the static topology provided via config.xml. It is learned via the DAR. However, if included it will still work. In the latter case, certificate is not required as it is learned via the DAR.

What has not been tested yet?
* Replication works as it should
* User-database is replicated properly using incremental replication mode (not continuous). The problem with continuous replication is that the app developer has no control over when the data is replicated. This becomes an issue with 1 minute heartbeats, as it requires too much logic for handling update conflicts.
* Bringing up and down actors (update conflict must be handled, as vjo's run exposed)

What needs to be done
* Check in javascript code into orca/external
* Write up script for repopulating couchdb servers with code (db/_design docs) and replication (replicator_db) rules (dump/load via curl)
* Clean logging (Reduce logger.info )
* Improve documentation : Create some documentation for testing actor registry in general (couchdb or not), test of couchdb server in isolation, X
* ... I'll be adding as I go.

  Changed 5 years ago by ibaldin

A note on slice deletion: there is new code in the controller-recovery branch, soon to be merged to aydan-recovery, that changes the behavior of slice garbage collection. Instead of being deleted immediately, slices go away 24 hours (or later, since this is lazy) after they were first ready to be deleted. This is done to support cases when every sliver in a slice fails; we want to avoid immediate GC in this case.

This complicates the testing somewhat as now, to see a slice go away you must delete it, then after 24 hours do either listResources or create another slice to trigger the actual deletion.

  Changed 5 years ago by ibaldin

While testing recovery in dev VM I came across these two exceptions:

java.lang.NullPointerException

at orca.plugins.ben.control.NdlInterfaceVLANControl.assign(NdlInterfaceVLANControl.java:105)
at orca.policy.core.AuthorityCalendarPolicy.assign(AuthorityCalendarPolicy.java:502)
at orca.policy.core.AuthorityCalendarPolicy.map(AuthorityCalendarPolicy.java:472)
at orca.policy.core.AuthorityCalendarPolicy.mapGrowing(AuthorityCalendarPolicy.java:455)
at orca.policy.core.AuthorityCalendarPolicy.mapForCycle(AuthorityCalendarPolicy.java:392)
at orca.policy.core.AuthorityCalendarPolicy.assign(AuthorityCalendarPolicy.java:361)
at orca.shirako.core.Authority.tickHandler(Authority.java:336)
at orca.shirako.core.Actor.actorTick(Actor.java:427)
at orca.shirako.core.Actor.access$000(Actor.java:60)
at orca.shirako.core.Actor$1.process(Actor.java:332)
at orca.shirako.core.Actor.actorMain(Actor.java:376)
at orca.shirako.core.Actor$4.run(Actor.java:1016)
at java.lang.Thread.run(Thread.java:745)

orca.util.OrcaException: Could not recover Reservation #4803A855 due to null

at orca.shirako.core.Actor.recoverReservation(Actor.java:796)
at orca.shirako.core.Actor.recoverReservations(Actor.java:751)
at orca.shirako.core.Actor.recoverSlice(Actor.java:729)
at orca.shirako.core.Actor.recoverSlices(Actor.java:698)
at orca.shirako.core.Actor.recover(Actor.java:676)
at orca.shirako.container.OrcaContainer.recoverActor(OrcaContainer.java:648)
at orca.shirako.container.OrcaContainer.recoverActors(OrcaContainer.java:612)
at orca.shirako.container.OrcaContainer.boot(OrcaContainer.java:287)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:194)
at orca.shirako.container.Globals.start(Globals.java:119)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NullPointerException

at orca.policy.core.AuthorityCalendarPolicy.getClose(AuthorityCalendarPolicy.java:575)
at orca.policy.core.AuthorityCalendarPolicy.revisit(AuthorityCalendarPolicy.java:209)
at orca.shirako.core.Actor.recoverReservation(Actor.java:790)

It isn't clear how reproducible they are. I have seen strange behavior without recovery in the dev VM if I put the laptop to sleep - probably because ORCA actors inside the VM aren't properly sleeping and after that problems begin. I'm simply adding these for completeness, in case anyone else sees them.

  Changed 5 years ago by claris

Not able to establish connection to couchdb instances behind Apache SSL.

Ektorp library is unable to talk to the db instances in WSU and TAMU. I am getting javax.net.ssl.SSLPeerUnverifiedException with both. Any suggestion on how to debug this?


2014-08-14 03:38:41,645 [WrapperSimpleAppMain] ERROR orca - Unable to reach Actor Registry at https://tamu-hn.exogeni.net:6984
2014-08-14 03:38:41,797 [WrapperSimpleAppMain] ERROR orca - Could not connect to the registry server null:
org.ektorp.DbAccessException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

at org.ektorp.util.Exceptions.propagate(Exceptions.java:19)
at orca.ektorp.client.OrcaStdHttpClient.executeRequest(OrcaStdHttpClient.java:195)
at orca.ektorp.client.OrcaStdHttpClient.executeRequest(OrcaStdHttpClient.java:204)
at orca.ektorp.client.OrcaStdHttpClient.head(OrcaStdHttpClient.java:158)
at org.ektorp.http.RestTemplate.head(RestTemplate.java:105)
at org.ektorp.impl.StdCouchDbInstance.checkIfDbExists(StdCouchDbInstance.java:73)
at org.ektorp.impl.StdCouchDbConnector.createDatabaseIfNotExists(StdCouchDbConnector.java:399)
at org.ektorp.support.CouchDbRepositorySupport.<init>(CouchDbRepositorySupport.java:62)
at org.ektorp.support.CouchDbRepositorySupport.<init>(CouchDbRepositorySupport.java:53)
at orca.ektorp.repository.ActorRepository.<init>(ActorRepository.java:20)
at orca.shirako.container.DistributedRemoteRegistryCache.registerWithRegistry(DistributedRemoteRegistryCache.java:856)
at orca.shirako.container.OrcaContainer.registerCommon(OrcaContainer.java:873)
at orca.shirako.container.OrcaContainer.registerActor(OrcaContainer.java:813)
at orca.boot.ConfigurationProcessor.registerActors(ConfigurationProcessor.java:220)
at orca.boot.ConfigurationProcessor.process(ConfigurationProcessor.java:140)
at orca.boot.ConfigurationLoader.process(ConfigurationLoader.java:76)
at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:782)
at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:751)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:256)
at orca.shirako.container.Globals.start(Globals.java:118)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Unknown Source)

Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

  Changed 5 years ago by ibaldin

@anirban: you need to look at deleteSlice() call in XmlrpcOrcaHandler. It calls a function to remove a slice from pubsub queue, but the comment on the function says "This method is not called when deleteSlice is called; the publishing state machine takes care of this case". Perhaps it should not be called there?

Please do it after the merge of controller-recovery branch.

  Changed 5 years ago by claris

Success with couchdb behind Apache SSL.
Apparently it has to do with the multikeymanager getting in the way of how SSL handles public CAs. I have fixed the problem but I will clean it up and check it in later.

  Changed 5 years ago by claris

Replication:
I have tested replication with wsu-hn as master node and tamu-hn as slave node. I have implemented incremental replication instead of continuous replication. The down side is that I had to write more code for that (continuous replication, by contrast, is handled by the store itself in the background). The up side is that we decide what and when to replicate. The problem I encountered with continuous replication is that it scans the whole datastore before pushing/pulling replication, and I can't trigger the replication from the application. Both replication methods reach eventual consistency, but incremental is more immediate.
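
For reference, CouchDB accepts a one-shot (non-continuous) replication request as a POST to `/_replicate` with a JSON body naming `source` and `target`. The sketch below only builds such a body. It is not the replication code described above; the class and method names are hypothetical, and the database URLs in the usage note are assumptions based on hosts mentioned in this ticket.

```java
/** Illustrative only: builds the JSON body for a one-shot CouchDB replication. */
final class ReplicationRequest {
    /** Returns a body suitable for POST /_replicate with "continuous" set to false. */
    static String oneShotBody(String sourceDbUrl, String targetDbUrl) {
        return "{\"source\":\"" + escape(sourceDbUrl) + "\","
                + "\"target\":\"" + escape(targetDbUrl) + "\","
                + "\"continuous\":false}";
    }

    /** Minimal JSON string escaping for backslashes and double quotes. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For example, `ReplicationRequest.oneShotBody("https://wsu-hn.exogeni.net:6984/actor", "https://tamu-hn.exogeni.net:6984/actor")` yields a body that the application can send whenever it decides a replication is due, which is the control that continuous mode does not give.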

Conflict Management:
Actors can come and go. An actor trying to "create" a record with the same GUID and hash as an existing one can only update it.

WorkingRegistry:
An actor will communicate with the first actor registry that is available in the RegisterURL property. Every transaction is updated and replicated. Update and replicate are not considered one single atomic operation. Replication may fail if the replica actor registry is offline.
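
A small sketch of the "first available registry" selection, assuming the property holds a comma-separated list of URLs. The names are hypothetical, and the reachability check is passed in as a predicate because the real check (an HTTPS request to the CouchDB server) is outside the scope of this example.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Illustrative only: picks the first reachable registry from a comma-separated list. */
final class WorkingRegistrySelector {
    static Optional<String> firstAvailable(String urlList, Predicate<String> isReachable) {
        for (String url : urlList.split(",")) {
            String candidate = url.trim();
            if (!candidate.isEmpty() && isReachable.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // no registry could be reached
    }
}
```

Order in the list therefore acts as priority: the second registry is used only while the first one is unreachable.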

More to do:
- I need to write code that will sync an incoming actor. This may be as simple as doing a continuous replication from all existing registries to the new one.
- Enable replication in the webpage (once actors are verified). Find code buried in couchapp.


  Changed 5 years ago by ibaldin

@claris: add some command line tools for listing, approval, and removal of actor records.

  Changed 5 years ago by anirban

I did some controller recovery testing on UNC. Other than an intermittent issue with slice state, described in ticket #349, things seem to be working.

There is an issue with verbose logging, which Victor is going to take a look at.

Next steps:

1. Victor is going to do a code update, rebuild etc. on UNC with the new code checked in this afternoon. He will pass the handle to Claris.
2. Claris will be doing testing with couchDB with her changes. When done, she will pass the handle to Paul.
3. Paul is going to beat on UNC with recovery testing after that.

- Anirban

  Changed 5 years ago by ibaldin

I will look at #349 on Monday

  Changed 5 years ago by vjo

Should I be asking folks to close their slices, before doing the code update? ;)

There's VMs outstanding...

  Changed 5 years ago by vjo

Oops. Clarified in email from Anirban. Cleaning up UNC and installing latest code.

  Changed 5 years ago by vjo

OK - actors restarted with latest code.

Claris - you have the ball. ;)

follow-up: ↓ 35   Changed 5 years ago by claris

Tested registry + replication (w and w/o fail over) + actors restart. I deployed two slices: one single vm and a dumb bell.
The current deployment at unc-hn is running the latest code with DAR activated. To deactivate DAR, comment out the last property section in orca.properties.

in reply to: ↑ 34   Changed 5 years ago by anirban

Replying to claris:

Tested registry + replication (w and w/o fail over) + actors restart. I deployed two slices: one single vm and a dumb bell.
The current deployment at unc-hn is running the latest code with DAR activated. To deactivate DAR, comment out the last property section in orca.properties.

Claris, I see that properties for old Actor registry and DAR are both uncommented in orca.properties. Looking at the logs it seems that even if both are uncommented, the DAR is picked, most probably because it is the later set of properties. Just to be consistent, we should comment out the old actor registry properties before we redeploy again.

  Changed 5 years ago by claris

We currently have two modes: DAR and nDAR.

When an actor comes up it looks for property "registry.class" which can have two values
orca.shirako.container.DistributedRemoteRegistryCache or orca.shirako.container.RemoteRegistryCache

An actor picks up whatever mode this property determines. If this property is not present it defaults to nDAR.
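
As a sketch of the lookup described above (not the actual ORCA code; the class and method names are made up), the default can be expressed like this:

```java
import java.util.Properties;

/** Illustrative only: resolves the registry mode from orca.properties values. */
final class RegistryMode {
    static final String DAR = "orca.shirako.container.DistributedRemoteRegistryCache";
    static final String NDAR = "orca.shirako.container.RemoteRegistryCache";

    /** Returns the configured registry class name, defaulting to nDAR when unset. */
    static String registryClassName(Properties props) {
        String value = props.getProperty("registry.class");
        if (value == null || value.trim().isEmpty()) {
            return NDAR;
        }
        return value.trim();
    }

    /** True only when the DAR implementation is explicitly selected. */
    static boolean isDistributed(Properties props) {
        return DAR.equals(registryClassName(props));
    }
}
```

Under this reading, commenting out `registry.class` and setting it to the RemoteRegistryCache class name lead to the same mode.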

I will clarify this better in the Release notes.

Changed 5 years ago by claris

Script to clean up actor registry databases for a clean deployment

follow-up: ↓ 38   Changed 5 years ago by claris

I have attached a script that can be used to clean up the databases related to the actor registry. The tar file includes the script, jq (a JSON processor for Linux and OS X) and a Readme file. Note that cleaning up the databases is not required for a redeployment; however, it helps when testing a fresh deployment from scratch. The script can run from any host that can reach the couchdb servers over the Internet.

I have created a username and password for managin the actor database, actor-name:aktorpazz. orca.properties must be updated in UNC-HN with these credentials. See below:

###############################################
# ORCA global actor registry
###############################################
registry.url.1=https://wsu-hn.exogeni.net,https://tamu-hn.exogeni.net
registry.certfingerprint.1=d1:b9:7f:a9:e2:29:89:8d:1a:7c:c3:f2:df:ba:b6:26
registry.couchdb.username=actor-admin
registry.couchdb.password=aktorpazz
registry.class=orca.shirako.container.DistributedRemoteRegistryCache

Finally, to verify individual actors, go to https://admin:X0admin@wsu-hn.exogeni.net:6984/actor. Find the actor you want to verify (you can use the views I have created, in the top right corner, to filter documents (actors) by type), double-click the actor, and change the "Verified" field to "Y".

If something goes wrong, disable DAR and shoot me an email. To disable DAR, change
registry.class to orca.shirako.container.RemoteRegistryCache
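
The manual verification step amounts to editing one field of a CouchDB document. Below is a minimal Python sketch of the same update expressed through CouchDB's HTTP document API, assuming the documents live in the "actor" database named in the URL above. The helper names and the sample document are hypothetical, and the request is only built here, not sent.

```python
import json
from urllib.parse import quote


def mark_verified(doc):
    """Return a copy of an actor document with its "Verified" field set to "Y"."""
    updated = dict(doc)
    updated["Verified"] = "Y"
    return updated


def build_update_request(base_url, database, doc):
    """Build (method, url, body) for the PUT that stores the document.

    CouchDB requires the current "_rev" in the body when updating an
    existing document, so the previously fetched document is sent back whole.
    """
    doc_id = quote(doc["_id"], safe="")
    url = "{}/{}/{}".format(base_url.rstrip("/"), database, doc_id)
    return "PUT", url, json.dumps(doc, sort_keys=True)


# Hypothetical document, shaped like what a GET on the actor might return.
fetched = {"_id": "unc-vm-am", "_rev": "1-abc", "Verified": "N"}
method, url, body = build_update_request(
    "https://wsu-hn.exogeni.net:6984", "actor", mark_verified(fetched)
)
print(method, url)
print(body)
```

Sending the request would additionally need the admin credentials and acceptance of the server certificate, which are left out of the sketch.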

in reply to: ↑ 37 ; follow-up: ↓ 39   Changed 5 years ago by anirban

Claris,

I was assuming that to disable DAR, I would just need to comment out all couchDB-related properties. That should use the old actor registry. Correct?

You mentioned here that this property needs to be set to use a non-distributed version of the couchDB AR (?)

registry.class=orca.shirako.container.RemoteRegistryCache

What additional properties are needed for that to work? I believe we tested that, but I don't seem to have a copy of the relevant properties.

- Anirban


in reply to: ↑ 38 ; follow-up: ↓ 40   Changed 5 years ago by claris

Anirban,

To disable DAR (old-fashioned actor registry):
registry.class=orca.shirako.container.RemoteRegistryCache
and uncomment all the properties related to the old actor registry.

To enable DAR:
registry.class=orca.shirako.container.DistributedRemoteRegistryCache
and uncomment all the properties related to couchdb (as below).

Claris

in reply to: ↑ 39   Changed 5 years ago by anirban

Claris,

Your recent check-in has broken the build: shirako core fails to compile. Can you please fix it?

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project shirako: Compilation failure
[ERROR] /Users/anirban/Documents/RENCI-research/Codes/orca-5.0-aydan-recovery/core/shirako/src/main/java/orca/shirako/util/SSLRestHttpClient.java:[31,25] error: cannot find symbol
[ERROR] -> [Help 1]
[ERROR]

Regards,
- Anirban


follow-up: ↓ 42   Changed 5 years ago by claris

Is there a way to run Maven with the -e switch so that I can see more details on the error?
The error does not say which symbol it can't find, and shirako compiles just fine in my emulation environment. Anirban did tell me over GTalk that it does not compile in his environment, so I may have added something to my environment that I can't recall.

I will look into this tomorrow morning. In the meantime, if someone can advise on how to obtain more details on the error, it would be greatly appreciated. Thanks!

in reply to: ↑ 41 ; follow-up: ↓ 43   Changed 5 years ago by anirban

Claris,

I think you might have forgotten to check in the class SSLCurl, which you imported in SSLRestHttpClient.java

import orca.ektorp.client.SSLCurl;

I don't see that class in orca.ektorp.client package

Regards,
- Anirban


in reply to: ↑ 42   Changed 5 years ago by claris

Anirban,
Indeed. My Eclipse svn plugin seems to be confused, as the Team Sync view shows SSLCurl as if it were already checked in.
The filesystem was also confused, since I had to delete the file and recreate it. I just checked it in.


  Changed 5 years ago by anirban

Claris,

Both modes are crashing on startup. When using DAR, it gives the following exception. This might have to do with always assuming existence of "registry.url.2" property, which is not there in the most recent set of properties.

2014-08-21 17:10:43,053 [WrapperSimpleAppMain] INFO orca - Container database created successfully
2014-08-21 17:10:43,063 [WrapperSimpleAppMain] FATAL orca - Critical error: Orca failed to initialize
orca.shirako.container.ContainerInitializationException: java.lang.reflect.InvocationTargetException

at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:280)
at orca.shirako.container.Globals.start(Globals.java:119)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Unknown Source)

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:243)
... 10 more

Caused by: java.lang.NullPointerException

at orca.shirako.container.DistributedRemoteRegistryCache.configureSSL(DistributedRemoteRegistryCache.java:662)
... 15 more

2014-08-21 17:10:43,068 [Thread-0] INFO orca - Jetty shutting down. Destroying Orca context

When DAR is disabled, it is throwing

2014-08-21 17:05:45,750 [WrapperSimpleAppMain] FATAL orca - Critical error: Orca failed to initialize
orca.shirako.container.ContainerInitializationException: java.lang.reflect.InvocationTargetException

at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:280)
at orca.shirako.container.Globals.start(Globals.java:119)
at orca.server.OrcaServer.startOrca(OrcaServer.java:62)
at orca.server.OrcaServer.start(OrcaServer.java:114)
at orca.server.OrcaServer.main(OrcaServer.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
at java.lang.Thread.run(Unknown Source)

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:250)
... 10 more

Caused by: java.lang.NullPointerException

at orca.util.ssl.ContextualSSLProtocolSocketFactory$HostPortPair.equals(ContextualSSLProtocolSocketFactory.java:99)
at java.util.HashMap.getEntry(Unknown Source)
at java.util.HashMap.containsKey(Unknown Source)
at orca.util.ssl.ContextualSSLProtocolSocketFactory.getSSLContext(ContextualSSLProtocolSocketFactory.java:164)
at orca.util.ssl.ContextualSSLProtocolSocketFactory.createSocket(ContextualSSLProtocolSocketFactory.java:204)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.xmlrpc.client.XmlRpcCommonsTransport.writeRequest(XmlRpcCommonsTransport.java:227)
at org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:151)
at org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
at org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:167)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:158)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:147)
at orca.shirako.container.RemoteRegistryCache.singleQuery(RemoteRegistryCache.java:271)
... 15 more

Here is what I used with DAR enabled:

#registry.certfingerprint=78:B6:1A:F0:6C:F8:C7:0F:C0:05:10:13:06:79:E0:AC
#registry.url=https://geni.renci.ben:12443/registry/
#registry.method=registryService.insert
#registry.class=orca.shirako.container.RemoteRegistryCache

registry.url.1=https://wsu-hn.exogeni.net,https://tamu-hn.exogeni.net
registry.certfingerprint.1=d1:b9:7f:a9:e2:29:89:8d:1a:7c:c3:f2:df:ba:b6:26
registry.couchdb.username=actor-admin
registry.couchdb.password=aktorpazz
registry.class=orca.shirako.container.DistributedRemoteRegistryCache

Here is what I used with DAR disabled:

registry.certfingerprint=78:B6:1A:F0:6C:F8:C7:0F:C0:05:10:13:06:79:E0:AC
registry.url=https://geni.renci.ben:12443/registry/
registry.method=registryService.insert
registry.class=orca.shirako.container.RemoteRegistryCache

#registry.url.1=https://wsu-hn.exogeni.net,https://tamu-hn.exogeni.net
#registry.certfingerprint.1=d1:b9:7f:a9:e2:29:89:8d:1a:7c:c3:f2:df:ba:b6:26
#registry.couchdb.username=actor-admin
#registry.couchdb.password=aktorpazz
#registry.class=orca.shirako.container.DistributedRemoteRegistryCache

Please fix this as soon as possible.

Regards,
- Anirban
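
The guess above is that DAR startup dereferences a property that is absent from the file. A small pre-flight check along the following lines would report such gaps before the container starts. It is a sketch only: the idea that indexes 1 and 2 are both required comes from that guess, not from the ORCA source, and the function name is hypothetical.

```python
def missing_registry_properties(props, indexes=(1, 2)):
    """List the indexed registry properties that are absent or empty.

    Assumption: every index needs both a URL and a certificate fingerprint.
    """
    missing = []
    for i in indexes:
        for prefix in ("registry.url", "registry.certfingerprint"):
            key = "{}.{}".format(prefix, i)
            if not props.get(key):
                missing.append(key)
    return missing


# The DAR properties used in the failing run: only index 1 is defined.
dar_props = {
    "registry.url.1": "https://wsu-hn.exogeni.net,https://tamu-hn.exogeni.net",
    "registry.certfingerprint.1": "d1:b9:7f:a9:e2:29:89:8d:1a:7c:c3:f2:df:ba:b6:26",
    "registry.class": "orca.shirako.container.DistributedRemoteRegistryCache",
}
print(missing_registry_properties(dar_props))
```

An empty list would mean every property the check knows about is set; it says nothing about whether the values are correct.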

follow-up: ↓ 47   Changed 5 years ago by ibaldin

I added a new python script under controllers/xmlrpc/resources/scripts/getSliverProperties.py that exercises the new controller call to get the unit properties (including boot console) of a reservation.

Paul - you should be able to now test that boot console is delivered. I will work on adding it to Flukes as well, but this should be enough.

I tested the script and it works with the latest controller code (I have not brought in Claris's changes into my local copy).

  Changed 5 years ago by anirban

The following configuration for actor registry is working at UNC. Keeping a note of the properties, just in case.

###############################################
# ORCA global actor registry (uncomment for production deployments)
###############################################
registry.certfingerprint=78:B6:1A:F0:6C:F8:C7:0F:C0:05:10:13:06:79:E0:AC
registry.url=https://geni.renci.ben:12443/registry/
registry.method=registryService.insert
registry.class=orca.shirako.container.RemoteRegistryCache

###############################################
# ORCA global distributed actor registry (couchDB)
###############################################
registry.url.1=https://wsu-hn.exogeni.net,https://tamu-hn.exogeni.net
registry.url.2=https://slookup.exogeni.net
registry.certfingerprint.1=d1:b9:7f:a9:e2:29:89:8d:1a:7c:c3:f2:df:ba:b6:26
registry.certfingerprint.2=df:5c:1d:99:46:9a:5f:a8:92:8e:15:e4:b9:82:d8:ad
registry.couchdb.username=actor-admin
registry.couchdb.password=aktorpazz
registry.class=orca.shirako.container.RemoteRegistryCache

in reply to: ↑ 45   Changed 5 years ago by pruth

Replying to ibaldin:

I added a new python script under controllers/xmlrpc/resources/scripts/getSliverProperties.py that exercises the new controller call to get the unit properties (including boot console) of a reservation.

Paul - you should be able to now test that boot console is delivered. I will work on adding it to Flukes as well, but this should be enough.

I tested the script and it works with the latest controller code (I have not brought in Claris's changes into my local copy).

I think there is something wrong with all of the Python scripts. Anirban and I both tried them with the exosm controller and on unc.

This is the error we get:

dhcp152-54-6-124:trunk pruth$ python ./controllers/xmlrpc/resources/scripts/listResources.py -s https://unc-hn.unc.ben:11443/orca/xmlrpc -c ~/Desktop/renci/renci.crt -p ~/Desktop/renci/renci.key
Querying ORCA xml-rpc server for available resources ...

Traceback (most recent call last):
  File "./controllers/xmlrpc/resources/scripts/listResources.py", line 49, in <module>
    result = server.orca.listResources(credentials, options)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1284, in single_request
    h = self.make_connection(host)
  File "./controllers/xmlrpc/resources/scripts/listResources.py", line 28, in make_connection
    return xmlrpclib.SafeTransport.make_connection(self,host_with_cert)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1484, in make_connection
    if self._connection and host == self._connection[0]:
AttributeError: SafeTransportWithCert instance has no attribute '_connection'

  Changed 5 years ago by ibaldin

This is a Python versioning issue. Make sure Python 2.6 is available and is the interpreter named on the #! (shebang) line of the script.
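
The AttributeError in the traceback is consistent with a transport subclass whose initializer never runs the base-class initializer, which is where newer xmlrpclib versions create `_connection`. Below is a hedged sketch of a fix that does not depend on pinning the interpreter, written for Python 3's xmlrpc.client (the successor of xmlrpclib). The class body is an assumption, since the script's own SafeTransportWithCert is not shown in this ticket.

```python
import ssl
import xmlrpc.client


class SafeTransportWithCert(xmlrpc.client.SafeTransport):
    """HTTPS transport that can present a client certificate."""

    def __init__(self, cert_file=None, key_file=None):
        context = ssl.create_default_context()
        if cert_file is not None:
            # Load the client certificate and private key used for authentication.
            context.load_cert_chain(cert_file, key_file)
        # Running the base initializer is what creates self._connection.
        super().__init__(context=context)


# No certificate is loaded here; this only checks that the attribute exists.
transport = SafeTransportWithCert()
print(hasattr(transport, "_connection"))
```

A proxy would then be created with xmlrpc.client.ServerProxy(url, transport=SafeTransportWithCert(cert, key)), where url, cert and key stand for the controller URL and the certificate and key paths passed to the script.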

follow-up: ↓ 58   Changed 5 years ago by anirban

Issue with synchronization between controller and SM:

I tested whether the controller and SM go out of sync in terms of reservation states when going through the following steps:

-1. Clean-restart everything.
0. Submit a request and wait until a manifest query shows it as ticketed.
1. Restart SM.
2. Restart controller.
3. Query for manifest again.

When I query for the manifest again, it still shows the "Ticketed" state, but in Pequod both SM and AM show it as "Active".

pequod:show>show reservations for 8b31b4f3-45cb-4cb7-9261-49fba026eaa2 actor unc-vm-am
83f59ef3-05ca-4e59-854a-ac9055e33867 unc-vm-am

Slice: 8b31b4f3-45cb-4cb7-9261-49fba026eaa2
1 uncvmsite.vm [ active, nascent]
Notices: Reservation 83f59ef3-05ca-4e59-854a-ac9055e33867 (Slice test-ani-1) is in state [Active,None]
Start: Fri Aug 22 14:35:11 EDT 2014 End:Sat Aug 23 14:35:12 EDT 2014

Total: 1 reservations
pequod:show>show reservations for 8b31b4f3-45cb-4cb7-9261-49fba026eaa2 actor unc-sm
83f59ef3-05ca-4e59-854a-ac9055e33867 unc-sm

Slice: 8b31b4f3-45cb-4cb7-9261-49fba026eaa2
1 uncvmsite.vm [ active, nascent]
Notices: Reservation 83f59ef3-05ca-4e59-854a-ac9055e33867 (Slice test-ani-1) is in state [Active,None]
Start: Fri Aug 22 14:35:11 EDT 2014 End:Sat Aug 23 14:35:12 EDT 2014

Total: 1 reservations

  Changed 5 years ago by ibaldin

I'm hoping Yufeng can answer this - I *think* it may be somewhere in manifest formation - almost sounds like controller does not requery the SM for state updates...

follow-up: ↓ 52   Changed 5 years ago by ibaldin

There is a new version of flukes in the beta location

http://geni-images.renci.org/webstart/0.4-SNAPSHOT/flukes.jnlp

It supports (in manifest view) querying for reservation properties (including, hopefully, the boot console) on the nodes.

in reply to: ↑ 51   Changed 5 years ago by pruth

Replying to ibaldin:

There is a new version of flukes in the beta location

http://geni-images.renci.org/webstart/0.4-SNAPSHOT/flukes.jnlp

It supports (in manifest view) querying for reservation properties (including, hopefully, the boot console) on the nodes.

I can get the console log from active VMs using reservation properties. Failed VMs cannot return reservation properties, but their console log is in the message box.

Paul

follow-up: ↓ 54   Changed 5 years ago by ibaldin

Is this the intended behavior? Sorry it wasn't clear if you are reporting an expected result or a problem.

in reply to: ↑ 53   Changed 5 years ago by pruth

Replying to ibaldin:

Is this the intended behavior? Sorry it wasn't clear if you are reporting an expected result or a problem.

This is the expected behavior. I suppose that having the console log in the same place regardless of status would be ideal, but I understand if failed nodes can't have properties.

My only suggestion would be to have the Flukes box that contains the console log be bigger or resizable.

  Changed 5 years ago by ibaldin

I'll tweak the GUI. This is the initial cut to help you test.

follow-up: ↓ 57   Changed 5 years ago by anirban

Yufeng,

I found a couple of issues with modify with recovery.

Modifying a slice before recovery works fine. After recovering AM, SM, controller, the following happens:

1. If I try to add nodes to the nodegroup, it throws NullPointerExceptions, and the logic also concludes that nothing needs to be added.

INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,448 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.XmlrpcOrcaState - This slice ID=e5ac721f-d882-44d9-a5aa-751bb97222a1 for urn=ng-n-3
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,541 [qtp693536729-34 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncvmsite/vm rt=uncvmsite.vm available units=27
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,541 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncvmsite.vm
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,590 [qtp693536729-34 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncNet/vlan rt=uncNet.vlan available units=1000
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,590 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncNet.vlan
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,638 [qtp693536729-34 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncvmsite/vlan rt=uncvmsite.vlan available units=999
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,638 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncvmsite.vlan
INFO | jvm 1 | 2014/08/23 21:18:02 | See modify http://geni-orca.renci.org/owl/modify/bba8825a-2043-4f0f-bf75-fb99b05563e9#a8430039-3543-440b-9c4f-47e618599b08# by name my-modify
INFO | jvm 1 | 2014/08/23 21:18:02 | See modify element http://geni-orca.renci.org/owl/modify/bba8825a-2043-4f0f-bf75-fb99b05563e9#a8430039-3543-440b-9c4f-47e618599b08#modifyElement/79aaed14-0f42-4c78-946c-a0646b20a35d with subject http://geni-orca.renci.org/owl/91232496-85b8-4adb-b9d4-48c4af4299ec#NodeGroup0 of type INCREASE
INFO | jvm 1 | 2014/08/23 21:18:02 | java.lang.NullPointerException
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.MappingHandler.getIPRange(MappingHandler.java:196)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.controller.ModifyHandler.addElements(ModifyHandler.java:163)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.controller.ModifyHandler.modifySlice(ModifyHandler.java:107)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.workflow.RequestWorkflow.modify(RequestWorkflow.java:162)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.controllers.xmlrpc.OrcaXmlrpcHandler.modifySlice(OrcaXmlrpcHandler.java:583)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at java.lang.reflect.Method.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.webserver.XmlRpcServletServer.execute(XmlRpcServletServer.java:112)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.webserver.XmlRpcServlet.doPost(XmlRpcServlet.java:196)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.controllers.OrcaXmlrpcServlet.doPost(OrcaXmlrpcServlet.java:151)
INFO | jvm 1 | 2014/08/23 21:18:02 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
INFO | jvm 1 | 2014/08/23 21:18:02 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:527)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:423)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:493)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:926)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:358)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:860)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.Server.handle(Server.java:331)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:588)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1046)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:764)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:217)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:418)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:476)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
INFO | jvm 1 | 2014/08/23 21:18:02 | at java.lang.Thread.run(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | java.lang.NullPointerException
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.MappingHandler.findIPRangeHole(MappingHandler.java:141)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.controller.ModifyHandler.addElements(ModifyHandler.java:185)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.cloudembed.controller.ModifyHandler.modifySlice(ModifyHandler.java:107)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.embed.workflow.RequestWorkflow.modify(RequestWorkflow.java:162)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.controllers.xmlrpc.OrcaXmlrpcHandler.modifySlice(OrcaXmlrpcHandler.java:583)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at java.lang.reflect.Method.invoke(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.webserver.XmlRpcServletServer.execute(XmlRpcServletServer.java:112)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.apache.xmlrpc.webserver.XmlRpcServlet.doPost(XmlRpcServlet.java:196)
INFO | jvm 1 | 2014/08/23 21:18:02 | at orca.controllers.OrcaXmlrpcServlet.doPost(OrcaXmlrpcServlet.java:151)
INFO | jvm 1 | 2014/08/23 21:18:02 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
INFO | jvm 1 | 2014/08/23 21:18:02 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:527)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:423)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:493)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:926)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:358)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:860)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.Server.handle(Server.java:331)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:588)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1046)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:764)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.http.HttpParser?.parseAvailable(HttpParser?.java:217)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.server.HttpConnection?.handle(HttpConnection?.java:418)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.io.nio.SelectChannelEndPoint?.run(SelectChannelEndPoint?.java:476)
INFO | jvm 1 | 2014/08/23 21:18:02 | at org.eclipse.jetty.util.thread.QueuedThreadPool?$2.run(QueuedThreadPool?.java:436)
INFO | jvm 1 | 2014/08/23 21:18:02 | at java.lang.Thread.run(Unknown Source)
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,689 [qtp693536729-34 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler? - ORCA API modifySlice() started...
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,808 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.XmlrpcOrcaState? - This slice ID=e5ac721f-d882-44d9-a5aa-751bb97222a1 for urn=ng-n-3
INFO | jvm 1 | 2014/08/23 21:18:02 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm/vm
INFO | jvm 1 | 2014/08/23 21:18:02 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm
INFO | jvm 1 | 2014/08/23 21:18:02 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm
INFO | jvm 1 | 2014/08/23 21:18:02 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm/vm
INFO | jvm 1 | 2014/08/23 21:18:02 | d=:http://geni-orca.renci.org/owl/91232496-85b8-4adb-b9d4-48c4af4299ec#Link31;resourceType=vlan:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vlan
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,822 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler? - No removed reservations in slice with urn ng-n-3 sliceId e5ac721f-d882-44d9-a5aa-751bb97222a1
INFO | jvm 1 | 2014/08/23 21:18:02 | 2014-08-23 21:18:02,822 [qtp693536729-34 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler? - No added reservations in slice with urn ng-n-3 sliceId e5ac721f-d882-44d9-a5aa-751bb97222a1

2. If I try to remove a couple of nodes from a nodegroup, nothing happens. The manifest remains the same as before. This is what I see in the logs. There are no exceptions; the logic seems to conclude that there are no reservations to add or remove.

INFO | jvm 1 | 2014/08/23 21:16:19 | 2014-08-23 21:16:19,133 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.XmlrpcOrcaState - This slice ID=e5ac721f-d882-44d9-a5aa-751bb97222a1 for urn=ng-n-3
DEBUG | wrapperp | 2014/08/23 21:16:19 | send a packet PING : ping
INFO | jvm 1 | 2014/08/23 21:16:19 | Received a packet PING : ping
INFO | jvm 1 | 2014/08/23 21:16:19 | Send a packet PING : ok
DEBUG | wrapperp | 2014/08/23 21:16:19 | read a packet PING : ok
DEBUG | wrapper | 2014/08/23 21:16:19 | Got ping response from JVM
INFO | jvm 1 | 2014/08/23 21:16:19 | 2014-08-23 21:16:19,815 [qtp693536729-28 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncvmsite/vm rt=uncvmsite.vm available units=27
INFO | jvm 1 | 2014/08/23 21:16:19 | 2014-08-23 21:16:19,816 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncvmsite.vm
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:19,911 [qtp693536729-28 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncNet/vlan rt=uncNet.vlan available units=1000
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:19,911 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncNet.vlan
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:19,974 [qtp693536729-28 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - discoverTypes: uncvmsite/vlan rt=uncvmsite.vlan available units=999
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:19,975 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - Found abstract model for resource pool: uncvmsite.vlan
INFO | jvm 1 | 2014/08/23 21:16:20 | See modify http://geni-orca.renci.org/owl/modify/b607b2e2-9e5d-490a-9215-3ccb52fa2b3d#e3ab161c-4a8c-48ca-a2f5-87626a50894f# by name my-modify
INFO | jvm 1 | 2014/08/23 21:16:20 | See modify element http://geni-orca.renci.org/owl/modify/b607b2e2-9e5d-490a-9215-3ccb52fa2b3d#e3ab161c-4a8c-48ca-a2f5-87626a50894f#modifyElement/4f1e5fa7-0d26-4620-8ff1-45854c700259 with subject http://geni-orca.renci.org/owl/91232496-85b8-4adb-b9d4-48c4af4299ec#NodeGroup0 of type REMOVE
INFO | jvm 1 | 2014/08/23 21:16:20 | See modify element http://geni-orca.renci.org/owl/modify/b607b2e2-9e5d-490a-9215-3ccb52fa2b3d#e3ab161c-4a8c-48ca-a2f5-87626a50894f#modifyElement/0e013d1d-92f6-447b-b244-fa461251c305 with subject http://geni-orca.renci.org/owl/91232496-85b8-4adb-b9d4-48c4af4299ec#NodeGroup0 of type REMOVE
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:20,033 [qtp693536729-28 - /orca/xmlrpc] INFO controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - ORCA API modifySlice() started...
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:20,149 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.XmlrpcOrcaState - This slice ID=e5ac721f-d882-44d9-a5aa-751bb97222a1 for urn=ng-n-3
INFO | jvm 1 | 2014/08/23 21:16:20 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm/vm
INFO | jvm 1 | 2014/08/23 21:16:20 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm
INFO | jvm 1 | 2014/08/23 21:16:20 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm
INFO | jvm 1 | 2014/08/23 21:16:20 | d=:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm;resourceType=vm:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vm/vm
INFO | jvm 1 | 2014/08/23 21:16:20 | d=:http://geni-orca.renci.org/owl/91232496-85b8-4adb-b9d4-48c4af4299ec#Link31;resourceType=vlan:1:null:http://geni-orca.renci.org/owl/uncvmsite.rdf#uncvmsite/Domain/vlan
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:20,162 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - No removed reservations in slice with urn ng-n-3 sliceId e5ac721f-d882-44d9-a5aa-751bb97222a1
INFO | jvm 1 | 2014/08/23 21:16:20 | 2014-08-23 21:16:20,163 [qtp693536729-28 - /orca/xmlrpc] DEBUG controller.orca.controllers.xmlrpc.OrcaXmlrpcHandler - No added reservations in slice with urn ng-n-3 sliceId e5ac721f-d882-44d9-a5aa-751bb97222a1

Can you please check?
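
For triage, the two symptoms in the log excerpts (an exception during an INCREASE, and a modify that ends with both "No removed reservations" and "No added reservations") can be pulled out of a controller wrapper log mechanically. The following is a minimal Python sketch, not part of ORCA: the names `message_of` and `summarize` and the returned keys are made up for illustration, and the parsing assumes only the line shapes visible in the excerpts (wrapper fields separated by `|`, `See modify element ... of type X`, and the two "No ... reservations" messages). It tolerates a trailing `?` after class names, which shows up in some renderings of these logs.

```python
import re

# An exception header line such as "java.lang.NullPointerException"
EXCEPTION = re.compile(r"^([\w.$]+(?:Exception|Error))\??$")
# A stack frame such as "at pkg.Class.method(Class.java:196)"
FRAME = re.compile(r"^at ([\w.$?]+)\(")
# A requested modify element, ending in its type (INCREASE, REMOVE, ...)
MODIFY_ELEMENT = re.compile(r"^See modify element .* of type (\w+)$")


def message_of(raw_line):
    """Drop the 'LEVEL | source | timestamp |' wrapper prefix if present."""
    parts = raw_line.split("|", 3)
    return (parts[3] if len(parts) == 4 else raw_line).strip()


def summarize(log_text):
    """Collect exceptions, requested modify types, and a no-op flag."""
    exceptions = []   # (exception class, first frame below it)
    requested = []    # modify element types seen in the request
    no_added = no_removed = False
    pending = None    # exception header still waiting for its first frame
    for raw in log_text.splitlines():
        msg = message_of(raw)
        found = EXCEPTION.match(msg)
        if found:
            pending = found.group(1)
            continue
        found = FRAME.match(msg)
        if found:
            if pending is not None:
                exceptions.append((pending, found.group(1).replace("?", "")))
                pending = None
            continue
        found = MODIFY_ELEMENT.match(msg)
        if found:
            requested.append(found.group(1))
        elif "No added reservations" in msg:
            no_added = True
        elif "No removed reservations" in msg:
            no_removed = True
    return {
        "exceptions": exceptions,
        "requested": requested,
        # Elements were requested, yet the handler reported nothing to do.
        "noop_modify": bool(requested) and no_added and no_removed,
    }
```

A result with `noop_modify` set and `requested` containing `REMOVE` corresponds to issue 2; a non-empty `exceptions` list whose first frame is in `MappingHandler` corresponds to issue 1. The function expects the text of a single modify attempt, since the flags are not reset between requests.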

in reply to: ↑ 56   Changed 5 years ago by anirban

Creating a new ticket for this: #351


in reply to: ↑ 49   Changed 5 years ago by anirban

Making a new ticket for this: #352

Replying to anirban:

Issue with synchronization between controller and SM:

I tested whether the controller and SM go out of sync on reservation states when we go through the following steps.

-1. Clean-restart everything.
0. Submit a request and wait until a manifest query shows it as Ticketed.
1. Restart SM.
2. Restart controller.
3. Query for manifest again.

When I query for the manifest again, it still shows the "Ticketed" state, but in Pequod both the SM and the AM show the reservation as "Active".

pequod:show>show reservations for 8b31b4f3-45cb-4cb7-9261-49fba026eaa2 actor unc-vm-am
83f59ef3-05ca-4e59-854a-ac9055e33867 unc-vm-am
Slice: 8b31b4f3-45cb-4cb7-9261-49fba026eaa2
1 uncvmsite.vm [ active, nascent]
Notices: Reservation 83f59ef3-05ca-4e59-854a-ac9055e33867 (Slice test-ani-1) is in state [Active,None]
Start: Fri Aug 22 14:35:11 EDT 2014 End:Sat Aug 23 14:35:12 EDT 2014

Total: 1 reservations
pequod:show>show reservations for 8b31b4f3-45cb-4cb7-9261-49fba026eaa2 actor unc-sm
83f59ef3-05ca-4e59-854a-ac9055e33867 unc-sm
Slice: 8b31b4f3-45cb-4cb7-9261-49fba026eaa2
1 uncvmsite.vm [ active, nascent]
Notices: Reservation 83f59ef3-05ca-4e59-854a-ac9055e33867 (Slice test-ani-1) is in state [Active,None]
Start: Fri Aug 22 14:35:11 EDT 2014 End:Sat Aug 23 14:35:12 EDT 2014

Total: 1 reservations
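
The mismatch described here can be checked mechanically by parsing the Pequod output and comparing it with the state shown in the manifest. A minimal Python sketch, assuming only the `show reservations` output format pasted in this comment; the function names are hypothetical, and the manifest state is passed in by hand because how it is fetched from the controller is not shown in this ticket.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Per-reservation header line: "<reservation id> <actor name>"
HEADER = re.compile(rf"^({UUID})\s+(\S+)$")
# Notices line: "Reservation <id> (Slice <name>) is in state [Active,None]"
NOTICE = re.compile(
    rf"Reservation ({UUID}) \(Slice [^)]*\) is in state \[(\w+),\s*\w+\]"
)


def reservation_states(pequod_text):
    """Map (actor, reservation id) to the reservation state Pequod reports."""
    states = {}
    actor = None
    for line in pequod_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            actor = header.group(2)
            continue
        notice = NOTICE.search(line)
        if notice and actor is not None:
            states[(actor, notice.group(1))] = notice.group(2)
    return states


def out_of_sync(manifest_state, states):
    """Return the actors whose reported state differs from the manifest."""
    wanted = manifest_state.lower()
    return sorted(
        {actor for (actor, _rid), state in states.items()
         if state.lower() != wanted}
    )
```

Feeding the two Pequod listings above to `reservation_states` and calling `out_of_sync("Ticketed", ...)` would list both `unc-sm` and `unc-vm-am`, which is the disagreement reported here: the controller's manifest lags behind the actors after the restarts.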


  Changed 5 years ago by anirban

Testing summary on UNC:

0. node, nodegroup, topologies, bound/unbound, postboot scripts: works

1. console log: works

2. renew behavior: works

3. slice garbage collection: works; haven't tested the 24-hour GC yet.

4. couchDB actor registry: The non-DAR version of registry works fine. Claris is independently testing DAR.

5. SM, AM, controller recovery
a. deletion of old slice: works
b. renew of old slice: works
c. states of old slice: works
d. ability to create new slice across restarts: works
e. slice surviving multiple restarts: works
f. slice surviving restarts of either AM/SM/controller: works

The issues found for recovery are in general related to manifests and states of slices on controller/SM recovery. These are now separate tickets.

#349 - slice state on delete
#351 - modify issues on recovery
#352 - sync between controller and SM

Regards,
- Anirban

  Changed 5 years ago by claris

Is anyone using UNC-HN this AM? I am not checking in new code, but I do want to test the configuration properties for both the DAR code and the couchdb server.

  Changed 5 years ago by claris

I have tested the DAR code and found no problems. The only change I made was running the couchdb servers in debug mode; I don't think that interfered with the initialization of the containers. I have shut down the actors and cleaned up the actor registry.

As a friendly reminder:
- Clean the actor registry for a truly clean restart.
- Verify the actors using the Couchdb GUI at wsu-hn: double-click the actor record and change the Verify field to "Y".

Tested:
- Including the broker-sm edge with and without a certificate
- Disabling the actor registry by commenting out ALL registry-related properties
- Anirban did some independent tests

  Changed 5 years ago by ibaldin

  • status changed from new to closed
  • resolution set to fixed

Superseded by #355.
