• IcyToes@sh.itjust.works · English · ↑ 11 · edited 9 hours ago

    Is it in a data centre or someone’s house? If the latter, would they let a stranger in?

    Surely they would need a backup and a replicated DB too, so that in case of hardware failure they can switch over.

    Sounds like they could improve their setup.

    Too much of a single point of failure.

    • Kris@feddit.org · English · ↑ 33 · edited 8 hours ago

      Slrpnk.net admin here.

      The failure seems to have been in the main firewall. If it had been the server itself, we could easily have restored it on another server from the backups on another machine. But as it stands, remote access is entirely cut off.

      There usually is another person with hardware access, but they are on summer holidays. This seemed like an acceptable risk at the time…

      An off-site backup would have been nice, of course, but due to the costs involved in running a Lemmy instance of that size on a rented server, it would not have been a great option either.

      I have plans to add a KVM to the main firewall via a secondary connection, but even that might not have helped in this case. I’ll know more when I have physical access again.

        • nickwitha_k (he/him)@lemmy.sdf.org · English · ↑ 2 · 1 hour ago

        I’ve done a lot of SysAdmin and DCOps work in the past, so I thought I’d offer some plausible suggestions (I haven’t dug deep into Lemmy’s DB or the DNS/federation side of the stack, so I’m not sure all of it is practical).

        Scenario 1 - Preserve and merge when access is restored

        Setup

        • Spin up two VMs/VPS (or one that has enough grunt for two Lemmy servers). Call them robak.slrpnk.net and slrpnk.net and point DNS appropriately.
        • Pull federated content from other instances and place it on robak, set as read-only.
        • Sync important comms to (new) slrpnk.net without content.
        • Allow users to sign up, vetting as possible (all mods). Keep a list of those that are vetted (call it vetted.list). Inform all users that any non-vetted users will have their content dropped when access is restored.

        Merge!

        • Once access is restored, ensure that (old) slrpnk.net is set to read-only.
        • Schedule a maintenance window (announce more time than you are likely to need).
        • During the maintenance window, put (new) slrpnk.net into R/O, or just block external access.
        • Query the db on (old) slrpnk.net for all users.
        • Subtract the vetted users (those in vetted.list) from that list.
        • Drop all records from the resulting list of non-vetted users from (new) slrpnk.net.
        • Insert the records from vetted and new users (those without conflicts) into the DB on (old) slrpnk.net.
        • Validate that everything is working.
        • Cut over DNS and spin down the new VMs/VPS.
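
        The "subtract the vetted users" step above is just a set difference, so here is a minimal Python sketch of it. Assumptions on my part, not anything Lemmy ships: vetted.list holds one username per line (blank lines and lines starting with # are ignored), and all_users is whatever list of usernames the DB query step gives you. The function names are made up for illustration.

```python
from pathlib import Path


def load_vetted(path):
    """Read vetted.list (assumed format: one username per line)."""
    names = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        # Skip blank lines and '#' comment lines (an assumed convention).
        if name and not name.startswith("#"):
            names.add(name)
    return names


def non_vetted_users(all_users, vetted):
    """Return the users that are NOT in the vetted set, sorted for review."""
    return sorted(set(all_users) - set(vetted))


if __name__ == "__main__":
    # Hypothetical usernames standing in for the result of the DB query.
    all_users = ["alice", "bob", "mallory"]
    vetted = {"alice", "bob"}
    print(non_vetted_users(all_users, vetted))
```

        The sorted output is only there so a human can eyeball the list before anything gets dropped. The actual delete and insert steps depend on the Lemmy schema, which I haven’t looked at, so they are left out.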

        Scenario 2 - Server is in DC or Admin able to facilitate access

        • Get a db dump/backup.
        • Spin up temporary slrpnk.net on a VM/VPS.
        • When possible, use a backup of the temporary server to restore its data to the original.
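
        For the "get a db dump" step, a rough sketch of wrapping pg_dump from Python (Lemmy keeps its data in PostgreSQL). The database name lemmy, the output file naming, and both function names are assumptions for illustration; connection details (host, user, password) are left to the usual libpq environment variables.

```python
import subprocess
from datetime import date


def build_dump_command(db_name, out_file):
    """Build a pg_dump call that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",       # compressed archive, restorable with pg_restore
        f"--file={out_file}",
        f"--dbname={db_name}",
    ]


def dump_database(db_name="lemmy", out_dir="."):
    """Run pg_dump and return the path of the archive it wrote."""
    # "lemmy" is an assumed database name; adjust to the real one.
    out_file = f"{out_dir}/{db_name}-{date.today().isoformat()}.dump"
    # check=True raises CalledProcessError if pg_dump exits non-zero.
    subprocess.run(build_dump_command(db_name, out_file), check=True)
    return out_file
```

        The custom format is chosen because pg_restore can read it on the temporary server, and again on the original once it is reachable. A plain SQL dump would work too.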