Ticket #252 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Blowhole reporting from ExoSM not working

Reported by: ibaldin Owned by: vjo
Priority: major Milestone:
Component: External: Blowhole/Slice Reporting Version: baseline
Keywords: Cc: anirban@…

Description

As of today (06/05), the last slice in the db is from 06/03. Other SMs appear to be reporting, but the ExoSM controller is not.

The ExoSM controller was already restarted today, yet nothing new has appeared in the db, even though I submitted at least one slice (for UvA stitching).

Change History

Changed 6 years ago by ibaldin

Does blowhole simply need to be restarted after the controller is restarted?

Changed 6 years ago by anirban

I saw the following errors in the ExoSM logs, but they have since stopped: both sliceLists and manifests are now being published correctly. It may have been a temporary Openfire hiccup.

INFO | jvm 1 | 2013/06/05 14:37:15 | 2013-06-05 14:37:15,549 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:37:15 | ERROR [PublisherTask] (XMPPPubSub.java:297) - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:37:15 | 2013-06-05 14:37:15,549 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Unable to publish sliceList: node does not exist
INFO | jvm 1 | 2013/06/05 14:37:15 | ERROR [PublisherTask] (XMPPPubSub.java:501) - Unable to publish sliceList: node does not exist
INFO | jvm 1 | 2013/06/05 14:37:42 | 2013-06-05 14:37:41,934 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:37:42 | ERROR [PublisherTask] (XMPPPubSub.java:297) - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:37:42 | 2013-06-05 14:37:41,934 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Unable to publish manifest: node does not exist
INFO | jvm 1 | 2013/06/05 14:37:42 | ERROR [PublisherTask] (XMPPPubSub.java:404) - Unable to publish manifest: node does not exist
INFO | jvm 1 | 2013/06/05 14:39:33 | ERROR [PublisherTask] (ReservationConverter.java:804) - unit.instance.config is null
INFO | jvm 1 | 2013/06/05 14:39:42 | 2013-06-05 14:39:41,966 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:39:42 | ERROR [PublisherTask] (XMPPPubSub.java:297) - Error creating XMPP pubsub node: No response from server.:
INFO | jvm 1 | 2013/06/05 14:39:42 | 2013-06-05 14:39:41,966 [PublisherTask] ERROR controller.orca.controllers.xmlrpc.pubsub.PublishManager - Unable to publish manifest: node does not exist
INFO | jvm 1 | 2013/06/05 14:39:42 | ERROR [PublisherTask] (XMPPPubSub.java:404) - Unable to publish manifest: node does not exist

We still need to check whether blowhole is catching the new manifests.
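
To see whether the errors in the excerpt above recur, they could be tallied from the wrapper log. The following is only a rough sketch: it assumes the line layout shown in the excerpt, and the name `tally_errors` and the regular expression are illustrative, not part of ORCA or blowhole.

```python
import re
from collections import Counter

# Matches the lines that carry a source location, e.g.
# "... | ERROR [PublisherTask] (XMPPPubSub.java:297) - Error creating ..."
# The layout is assumed from the log excerpt in this ticket.
LOCATED_ERROR = re.compile(
    r"ERROR \[(?P<thread>[^\]]+)\] "
    r"\((?P<file>[^:()]+):(?P<line>\d+)\) - (?P<msg>.*)$"
)

def tally_errors(lines):
    """Count located ERROR lines, keyed by (source file, line number, message)."""
    counts = Counter()
    for raw in lines:
        match = LOCATED_ERROR.search(raw)
        if match:
            key = (match["file"], int(match["line"]), match["msg"].strip())
            counts[key] += 1
    return counts
```

Only the lines with a `(File.java:NNN)` location are counted, so each error is tallied once even though the log prints it twice.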

Changed 6 years ago by vjo

Disabling the openfire restart cronjob; wondering if that's the reason for the hiccup.
Let's keep an eye on it, both for further hiccups and for openfire database size increase.

Changed 6 years ago by vjo

Continuing to keep an eye on this; new slices have been picked up by blowhole.
Need to check: expiration/cleanup of slices (in both blowhole and openfire).

Changed 6 years ago by vjo

  • status changed from new to accepted

Changed 6 years ago by vjo

  • status changed from accepted to closed
  • resolution set to fixed

Cron job to clean up and restart OpenFire was disabled last week.

New slices continue to be picked up, one week later.
Furthermore, old slices *are* being cleaned up:

21:03:46,549 INFO ManifestSubscriber - Up since Wed Jun 05 15:06:36 EDT 2013 subscribed to 2, served 488 manifest events

...

21:03:49,577 INFO ManifestSubscriber - Removing subscription from manifest /orca/sm/nicta-sm---03aad3b6-80f7-49a9-856c-378b75b9a5fc/urn:publicid:IDN+cascade:tutorial+slice+test1---00535d98-3445-436d-9f56-9909c5728446/manifest
21:03:51,041 INFO ManifestSubscriber - Trying to (re)subscribe to slice lists:
21:03:51,549 INFO ManifestSubscriber - Up since Wed Jun 05 15:06:36 EDT 2013 subscribed to 2, served 488 manifest events
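
For tracking the subscription count over time, the "Up since" status lines above could be parsed as follows. This is a sketch under the assumption that the status line keeps exactly the wording shown in this ticket; `parse_status` is a hypothetical helper, not a blowhole API.

```python
import re

# Wording assumed from the status lines quoted in this ticket.
STATUS = re.compile(
    r"Up since (?P<since>.+?) subscribed to (?P<subscribed>\d+), "
    r"served (?P<served>\d+) manifest events"
)

def parse_status(line):
    """Return the uptime start, subscription count and served-event count,
    or None if the line is not a status line."""
    match = STATUS.search(line)
    if match is None:
        return None
    return {
        "since": match["since"],
        "subscribed": int(match["subscribed"]),
        "served": int(match["served"]),
    }
```

A drop in `subscribed` between two consecutive status lines, as in the excerpt, indicates that an old manifest subscription was removed.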

And the size of the OpenFire embedded DB remains stable:

[vjo@control embedded-db]$ pwd
/opt/openfire/embedded-db
[vjo@control embedded-db]$ ls -alpsh
total 54M
4.0K drwxr-xr-x 2 daemon daemon 4.0K Jun 4 03:08 ./
4.0K drwxr-xr-x 11 daemon daemon 4.0K Nov 9 2012 ../
132K -rw-r--r-- 1 daemon daemon 131K Jun 4 03:08 openfire.backup
25M -rw-r--r-- 1 daemon daemon 32M Jun 13 13:54 openfire.data
4.0K -rw-r--r-- 1 daemon daemon 16 Jun 13 15:53 openfire.lck
29M -rw-r--r-- 1 daemon daemon 29M Jun 13 15:53 openfire.log
4.0K -rw-r--r-- 1 daemon daemon 410 Jun 4 03:08 openfire.properties
20K -rw-r--r-- 1 daemon daemon 19K Jun 4 03:08 openfire.script
32K -rw-r--r-- 1 root root 29K Feb 5 17:22 openfire.script.debug
20K -rw-r--r-- 1 root root 19K Mar 14 17:32 openfire.script.orig
20K -rw-r--r-- 1 root root 20K Mar 14 17:22 openfire.script.orig.pre_vjo
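
Rather than eyeballing the listing, the directory size could be sampled periodically and compared against an earlier sample. A minimal sketch follows; the helper names and the 1.5x growth threshold are assumptions chosen for illustration, not values taken from this deployment.

```python
import os

def dir_size_bytes(path):
    """Sum the apparent sizes of the regular files directly inside `path`."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total

def grew_beyond(previous, current, limit_ratio=1.5):
    """True if `current` exceeds `previous` by more than `limit_ratio` times.
    The default ratio is an arbitrary illustrative threshold."""
    return previous > 0 and current > previous * limit_ratio
```

For example, `dir_size_bytes("/opt/openfire/embedded-db")` could be recorded once a day and passed to `grew_beyond` together with the previous day's value.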

Closing this ticket as resolved.
