Recovering an Edge Gateway


#8

If you restore from the core image, you will need to update all the snaps before attempting to install Niagara. During the installation process, I would also recommend using the option to install Docker from the internet.

Update all snaps with the command:

sudo snap refresh

I would recommend rebooting after this is done to make sure everything is running the latest updates that just got installed.
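If you want to confirm what actually got refreshed, snapd can list the installed revisions and recent refresh activity (a quick check using standard snap tooling):

```shell
# Refresh all installed snaps to their latest revisions
sudo snap refresh

# List installed snaps with their current versions and revisions
snap list

# Show recent/pending snapd changes, including refreshes
snap changes
```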


#9

Installer question: should I assume there are no updates for the Niagara image available on the internet? Seems to ask a bunch. I’ve been denying them and just saying to install what’s included (except Docker).


#10

As an additional question: successfully connected to the platform, then tested a reboot (sudo init 6), and waited a bit. Was able to serial back in to the system with the same IP configuration, but the Niagara service seemingly did not start on its own. Is there an additional step I need to take to get that to start automatically?

I guess a recommended* additional step is more accurate. I could probably finagle some Linux stuff to get that script to auto-start, but I was wondering if there was a typical, expected procedure to follow for this.


#11

Yes, the Niagara images are available online. There are a few options in the installer that let you download these versions without needing to update the installer itself when a later version is released. These are the "select version" option and the "install latest" option. "Latest" is considered the development version and is not officially supported, so it is not intended to be used unless you are directly asked to download that version for troubleshooting purposes.

Towards the end of the installer, it asks if you wish to allow Docker to manage the auto-starting of Niagara. If you do not say yes, then you will need to do some sort of manual Linux service setup, which may be problematic with Ubuntu Core’s security model, so it is not recommended. If for some reason Docker did not start on reboot, then Niagara won’t start either, since it is tied to that service.
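If that installer prompt was missed, a restart policy can usually be applied to the existing container after the fact with the standard Docker CLI (a sketch; the container name "niagara" is an assumption, so check the actual name first):

```shell
# Find the actual container name first
sudo docker ps -a

# Apply a restart policy to an existing container
# ("niagara" is an assumed name -- substitute the real one)
sudo docker update --restart unless-stopped niagara
```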


#12

Gyah! Here’s my installation output haha:

Would you like Docker to manage the Niagara’s start/restart capability(Y,N)?In order for Niagara to correctly access TCP/IP configuration data correctly, the network stack must be configured with the NetworkManager snap.
Checking the status of NetworkManager…

So I dunno it looks like I missed it somehow. There was definitely a point at which I stuttered and hit enter twice. Might have been there.

Any ideas on how I might get that back? Other than blow away the whole thing again and start over? I assume running the install script again on top of the existing installation isn’t the greatest idea.


#13

Nothing bad or wrong about re-installing. You can select option 4 to re-install the same version and then make sure to select the option to preserve all user data.


#14

This does create a backup of the original installation, so it is a good idea to go back and clean up any unnecessary backups.


#15

Happen to know where those backups are stored? This is a fresh installation so I doubt it’s a significant file size, but I’m curious where they are haha.


#16

To get a list type:

sudo docker ps -a

The backups will be denoted by the name “backup” as well as a time stamp for when they were made.

To manually clean one up, you can run the following command, substituting the container name or ID from the list above:

sudo docker rm -v <container-name>

The -v flag will also remove the persistent data that was used to pull existing user profiles and station data into the new installation. If you don’t include that flag, it’ll only remove the Niagara software instance but leave all that data behind for future reference and/or restoring. That data is quite a process to access after the container reference is removed, so it’s best to remove both at the same time when you are ready to do so.
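The list-and-remove steps can be combined to clean up every backup container and its volumes in one pass (a sketch, assuming the "backup" naming convention described above; review the list before running the destructive step):

```shell
# Review which backup containers would be removed
sudo docker ps -a --filter "name=backup" --format '{{.Names}}'

# Remove each matching container along with its anonymous volumes (-v)
for c in $(sudo docker ps -a --filter "name=backup" --format '{{.Names}}'); do
  sudo docker rm -v "$c"
done
```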


#17

So, from my latest log:

Would you like Docker to manage the Niagara’s start/restart capability(Y,N)?Y
Unless explicitly stopped or docker itself is stopped, Docker will attempt to always restart the Niagara service.

Power cycled it and the Niagara service never came back again :confused:

Anywhere else I can check to verify Docker’s setup properly for this?


#18

Bleh, yeah. I think the Niagara container has the correct restart policy, but the Docker daemon is not starting on reboot.
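To confirm which half is failing, you can check the container’s restart policy and the Docker snap’s service state separately (a sketch; the container name "niagara" is an assumption, so check sudo docker ps -a for the real one):

```shell
# Restart policy recorded on the container ("niagara" is an assumed name)
sudo docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' niagara

# Whether the Docker snap's daemon service is enabled and currently running
snap services docker
```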


#19

There have been occasions, when switching networks, where Docker seems to fail to establish its network stack on the first power cycle.

This restarts docker:
sudo snap disable docker
sudo snap enable docker

You may also want to run an update on all current snaps to make sure everything is up to date, which sometimes helps stability with the startup process. This does require internet access.

sudo snap refresh
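After the enable, it’s worth verifying the daemon actually came back before testing Niagara again (standard snap/Docker status checks):

```shell
# Confirm the Docker snap's daemon is enabled and active again
snap services docker

# Confirm the Niagara container is running once the daemon is up
sudo docker ps
```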


#20

Welp, that was it. Rebooted again and it’s been working fine haha.

Thanks for all the help with this one. Pretty neat product.


#21

All right sorry to resurrect this one again haha. Customer had an issue today with this and it appears to have been related to disk space. After deleting the Niagara installation .tar off the thing, it restarted normally.

There’s this /dev/mmcblk0p4 partition that appears to be the primary writable drive. When we were having this issue, the df on it showed us this:

/dev/mmcblk0p4 3.9G 3.9G 0 100% /writable

Deleted the .tar, now we have this:

/dev/mmcblk0p4 3.9G 2.9G 806M 79% /writable

Not sure what’s really using the rest of the space; I assume files related to all the programs/services installed on this thing, plus the 600MB Niagara directory with all the scripts and stuff. Does this look normal? I believe the spec sheet said this thing has an 8GB SD card, perhaps it’s a 4GB? I suppose it could be an 8GB, though, going from the df total.

Anyways, looking for any input you might have on disk space. Is a gigabyte about the expected amount of usable space on these things?

I failed to get a log of the errors I was looking at (copied the wrong ones). But it was from a “sudo journalctl -u snapd.refresh.service” command, and it was a couple of these:

PANIC cannot checkpoint even after 5m0s of retries every 3s: write /var/lib/snapd/state.json.srD9QjJDy5Td~: no space left on device

Here’s the df total before deleting the .tar and rebooting:

admin@2P4W802:~$ df -h --total
Filesystem      Size  Used Avail Use% Mounted on
udev            931M     0  931M   0% /dev
tmpfs           188M  8.8M  179M   5% /run
/dev/mmcblk0p4  3.9G  3.9G     0 100% /writable
/dev/loop0       91M   91M     0 100% /
/dev/loop1      156M  156M     0 100% /lib/modules
tmpfs           936M  4.0K  936M   1% /etc/fstab
tmpfs           936M     0  936M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           936M     0  936M   0% /sys/fs/cgroup
tmpfs           936M     0  936M   0% /mnt
tmpfs           936M     0  936M   0% /tmp
tmpfs           936M     0  936M   0% /var/lib/sudo
tmpfs           936M     0  936M   0% /media
/dev/loop2       87M   87M     0 100% /snap/core/5145
/dev/loop3       19M   19M     0 100% /snap/locationd/163
/dev/loop4       91M   91M     0 100% /snap/core/6405
/dev/loop5      4.8M  4.8M     0 100% /snap/network-manager/379
/dev/loop6       21M   21M     0 100% /snap/uefi-fw-tools/11
/dev/loop7       29M   29M     0 100% /snap/wifi-ap/208
/dev/loop8      3.4M  3.4M     0 100% /snap/udisks2/100
/dev/loop9       98M   98M     0 100% /snap/docker/321
/dev/loop10     156M  156M     0 100% /snap/caracalla-kernel/102
/dev/loop12     155M  155M     0 100% /snap/caracalla-kernel/93
/dev/loop11     5.5M  5.5M     0 100% /snap/modem-manager/222
/dev/loop13     896K  896K     0 100% /snap/caracalla/52
/dev/loop14     896K  896K     0 100% /snap/caracalla/49
/dev/loop15     2.2M  2.2M     0 100% /snap/tpm2/42
/dev/loop16     3.3M  3.3M     0 100% /snap/wpa-supplicant/41
/dev/loop17     4.2M  4.2M     0 100% /snap/bluez/166
/dev/loop18      21M   21M     0 100% /snap/uefi-fw-tools/10
/dev/loop20     6.7M  6.7M     0 100% /snap/alsa-utils/68
/dev/loop19     5.2M  5.2M     0 100% /snap/network-manager/263
/dev/loop21     5.7M  5.7M     0 100% /snap/modem-manager/139
/dev/loop22      29M   29M     0 100% /snap/wifi-ap/250
/dev/mmcblk0p2   63M  3.6M   60M   6% /boot/efi
cgmfs           100K     0  100K   0% /run/cgmanager/fs
tmpfs           188M     0  188M   0% /run/user/1000
total            13G  4.9G  7.8G  39% -

Here’s the df total after deleting the .tar and rebooting:

admin@2P4W802:~$ df -h --total
Filesystem      Size  Used Avail Use% Mounted on
udev            931M     0  931M   0% /dev
tmpfs           188M  8.9M  179M   5% /run
/dev/mmcblk0p4  3.9G  2.9G  806M  79% /writable
/dev/loop0       91M   91M     0 100% /
/dev/loop1      156M  156M     0 100% /lib/modules
tmpfs           936M  4.0K  936M   1% /etc/fstab
tmpfs           936M  136K  936M   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           936M     0  936M   0% /sys/fs/cgroup
tmpfs           936M     0  936M   0% /tmp
tmpfs           936M     0  936M   0% /var/lib/sudo
tmpfs           936M     0  936M   0% /media
tmpfs           936M     0  936M   0% /mnt
/dev/loop2      3.3M  3.3M     0 100% /snap/wpa-supplicant/41
/dev/loop3       29M   29M     0 100% /snap/wifi-ap/208
/dev/loop4      5.5M  5.5M     0 100% /snap/modem-manager/222
/dev/loop5      896K  896K     0 100% /snap/caracalla/52
/dev/loop6      4.8M  4.8M     0 100% /snap/network-manager/379
/dev/loop7       21M   21M     0 100% /snap/uefi-fw-tools/10
/dev/loop8      6.7M  6.7M     0 100% /snap/alsa-utils/68
/dev/loop9      2.2M  2.2M     0 100% /snap/tpm2/42
/dev/loop10      29M   29M     0 100% /snap/wifi-ap/250
/dev/loop11      19M   19M     0 100% /snap/locationd/163
/dev/loop12     4.2M  4.2M     0 100% /snap/bluez/166
/dev/loop13      87M   87M     0 100% /snap/core/5145
/dev/loop14      21M   21M     0 100% /snap/uefi-fw-tools/11
/dev/loop15     896K  896K     0 100% /snap/caracalla/49
/dev/loop16      98M   98M     0 100% /snap/docker/321
/dev/loop17     156M  156M     0 100% /snap/caracalla-kernel/102
/dev/loop18     5.7M  5.7M     0 100% /snap/modem-manager/139
/dev/loop19     5.2M  5.2M     0 100% /snap/network-manager/263
/dev/loop20     155M  155M     0 100% /snap/caracalla-kernel/93
/dev/loop21      91M   91M     0 100% /snap/core/6405
/dev/loop22     3.4M  3.4M     0 100% /snap/udisks2/100
/dev/mmcblk0p2   63M  3.6M   60M   6% /boot/efi
cgmfs           100K     0  100K   0% /run/cgmanager/fs
tmpfs           188M     0  188M   0% /run/user/1000
total            13G  3.9G  8.6G  32% -

#22

Those numbers seem about normal for a 3000 series, as these have a fairly small storage footprint. There have been similar reports of needing to remove the installer files to clear up space for the application, so this is not an unexpected requirement. I would recommend having this device export its historical data to a larger supervisor regularly enough to keep the database small enough for consistent operation.

To your question about where the space is going: some is used by the operating system itself, as well as by the snaps required for basic functionality of the system, e.g. Docker. These two things use up the majority of the available space on these systems, while Niagara itself only uses a small portion; the station will then increase this value as more alarms, histories, components, and logic are added.
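If you want to see exactly where the space on the writable partition is going, du can break it down per directory (a sketch; /writable is the mount point from the df output above):

```shell
# Per-directory usage on the writable partition, staying on that
# filesystem (-x) and sorting so the largest entries appear last
sudo du -xh --max-depth=2 /writable 2>/dev/null | sort -h | tail -n 20
```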


#23

All right well that’s a bit of a relief haha. Glad I was able to figure it out.

Any suggestions on getting these things to restart better on their own? It has seemed like every time the customer has cut power or moved the device to a new location and booted it up, the Niagara daemon has never started successfully on the first try, and it has to be power cycled once or twice to get going. Has anyone tried any kind of delayed start that might help it be more consistent?


#24

This is a known issue, but the cause is yet to be determined. It is either within the operating system’s IP stack and/or Docker. The first time a new network is connected, Docker seems to be unable to access the new network and never starts, so Niagara never starts either. After a restart/power cycle, Docker seems to have resolved its network issues and then begins to operate normally.

I’m unaware of any attempt to delay the start of the docker snap.
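If anyone wants to gather data the next time this happens, the Docker daemon’s logs and network list from the failed boot would be the place to look (a sketch; the snap.docker.dockerd.service unit name assumes the stock Docker snap):

```shell
# Capture the Docker daemon's logs from the current (failed) boot
sudo journalctl -u snap.docker.dockerd.service -b > /tmp/dockerd-firstboot.log

# List the networks Docker believes it has configured
sudo docker network ls
```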


#25

Cool so, just to make sure I have this straight: this should only occur when moving to a new network, so you just have to be aware when doing so that you might need to let it do its thing, fail to start Niagara, then reboot it and all should be well.


#26

Yes, that is currently a requirement when moving networks.


#27

I apologize that this ticket refuses to die heh.

So, Niagara container wasn’t starting again. Disk usage related again it seems. The mmcblk0p4, which I believe is what we’re using, is once again full:

root@2P4W802:/# df -h --total
Filesystem      Size  Used Avail Use% Mounted on
udev            931M     0  931M   0% /dev
tmpfs           188M  8.9M  179M   5% /run
/dev/mmcblk0p4  3.9G  3.9G     0 100% /writable
/dev/loop0       91M   91M     0 100% /
/dev/loop1      155M  155M     0 100% /lib/modules
tmpfs           936M  4.0K  936M   1% /etc/fstab
tmpfs           936M     0  936M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           936M     0  936M   0% /sys/fs/cgroup
tmpfs           936M     0  936M   0% /var/lib/sudo
tmpfs           936M     0  936M   0% /media
tmpfs           936M     0  936M   0% /tmp
tmpfs           936M     0  936M   0% /mnt
/dev/loop2      6.7M  6.7M     0 100% /snap/alsa-utils/68
/dev/loop3       19M   19M     0 100% /snap/locationd/163
/dev/loop4      156M  156M     0 100% /snap/caracalla-kernel/102
/dev/loop5      3.4M  3.4M     0 100% /snap/udisks2/100
/dev/loop6       98M   98M     0 100% /snap/docker/321
/dev/loop7       21M   21M     0 100% /snap/uefi-fw-tools/10
/dev/loop8      155M  155M     0 100% /snap/caracalla-kernel/106
/dev/loop9      896K  896K     0 100% /snap/caracalla/52
/dev/loop10      91M   91M     0 100% /snap/core/6405
/dev/loop11     896K  896K     0 100% /snap/caracalla/49
/dev/loop12     5.2M  5.2M     0 100% /snap/network-manager/263
/dev/loop14     5.5M  5.5M     0 100% /snap/modem-manager/222
/dev/loop13      21M   21M     0 100% /snap/uefi-fw-tools/11
/dev/loop15      29M   29M     0 100% /snap/wifi-ap/250
/dev/loop16     4.2M  4.2M     0 100% /snap/bluez/166
/dev/loop17     3.3M  3.3M     0 100% /snap/wpa-supplicant/41
/dev/loop18     2.2M  2.2M     0 100% /snap/tpm2/42
/dev/loop19     5.7M  5.7M     0 100% /snap/modem-manager/139
/dev/loop20      87M   87M     0 100% /snap/core/5145
/dev/loop21     4.8M  4.8M     0 100% /snap/network-manager/379
/dev/loop22      29M   29M     0 100% /snap/wifi-ap/208
/dev/loop23     155M  155M     0 100% /snap/caracalla-kernel/93
/dev/mmcblk0p2   63M  3.6M   60M   6% /boot/efi
cgmfs           100K     0  100K   0% /run/cgmanager/fs
tmpfs           188M     0  188M   0% /run/user/1000
total            13G  5.0G  7.8G  40% -

I’m unsure what else I could delete. There’s still the initial, extracted 600MB installer directory that has the startup scripts and stuff. I feel like this is just going to keep happening, though. The customer had a station running with a few graphics and, I believe, no history collection. He said it was nothing he wouldn’t have put on the JACE, so I don’t think his station would use the 700MB I freed up last time. The end user lost the ability to log in, and I found the Niagara container perpetually trying to restart. Upon reboot it never started at all, and attempts to manually start it resulted in disk-space errors.

Any further ideas?
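A couple of generic places to look on a full /writable partition, beyond the installer files (a sketch; these are standard Docker/snapd housekeeping commands, not a product-specific procedure, so double-check anything destructive before running it):

```shell
# How much space Docker is holding in images, containers, and volumes
sudo docker system df

# snapd keeps superseded snap revisions after refreshes; list any
# disabled (old) revisions that may be eating the writable partition
snap list --all | awk '/disabled/ {print $1, $3}'

# Remove a superseded revision (name and revision are placeholders)
# sudo snap remove <snap-name> --revision=<revision>
```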