The Ultimate P2P Backup Software (or, the definition of insanity)
This week in “Things I Wish Existed”:
Free, easy to use, peer-to-peer data backup!
It’s a super good idea, right?
Data backup looks like a great business to be in. People are taking more digital photos and videos than ever thanks to great smartphone cameras, generating tons of data, and things that used to be physical, from mail to books, are now more likely to be electronic: email and ebooks. More data. The data backup business could be worth $5.5 billion by 2023. But data backup is an unsexy business, like selling seatbelts or toothbrushes or insurance: things you probably need or should have, but maybe don’t, at least not as much as you should.
Peer-to-peer backup makes sense to me as a software guy in an industry where lots of people have a computer or laptop (or two!), reasonably quick internet and definitely some extra storage capacity to spare, what with today’s multi-terabyte hard drives bringing the outright purchase cost of storage down to pennies per gigabyte. Heck, I’ve got three laptops, two headless servers and a few Raspberry Pis (and a partridge in a pear tree) with excess capacity running at home right now! I’ve got more than enough devices to back them all up to each other, and I could trade space with friends in Boston or family in Honolulu to ensure that my data would be protected from local disasters or theft.
The idea is not new, but the internet seems to be littered with the debris of companies and projects that tried to get “regular people” to adopt some kind of P2P backup service or software.
Now I explain that everyone who has tried this has failed
Definition of Insanity: doing the same thing over and over again and expecting different results
CrashPlan for Home had a great run: launched in 2007, it helped me personally back up hundreds of gigabytes of data to and from computers around the world. But apparently cheapskates like me (who use the software for P2P backup instead of paying for cloud backup services) and data hoarders (who accumulate terabytes of data and back it up to the all-you-can-eat flat-rate service) make consumer backup a relatively unprofitable endeavor, so the company just announced it’s shutting it down.
While CrashPlan was not the only P2P backup system, it probably was one of the most widely used due to its relative ease of use, low cost ($0 if you only back up P2P) and ubiquity (runs on Windows, macOS and Linux). For sure there were things about it which were not so good–the backup client was ugly, slow, and somewhat complicated. But at least it was P2P backup and it worked. So with the news of the P2P service’s demise, much hand wringing and soul searching has commenced. People are jumping out of the sinking CrashPlan ship and looking for another horse to hitch their data wagon to.
There apparently was a service called Wuala which did P2P backup until 2015, there was something else called BuddyBackup which hasn’t been updated since 2014, and some guy wrote about P2P backup at BackupCoop.com in 2006, and there’s an abandoned project Backup P2P still downloadable, and a P2P backup utility DataHaven.NET which the author sunsetted to start a P2P backup company called BitDust (insert joke about what his company is eventually going to do here)…
So many failed companies and abandoned products! Is this evidence that A) there is no market for it, B) the cost and hassle of setting up backup is still too high, or C) all attempts to date at cracking this problem have been substandard? What would a backup product look like that might have a chance of succeeding where so many attempts have not?
The part where I dream a little and oversimplify a lot
This is getting a little long but my first reaction as a software developer was to start imagining the wonderful future where something awesome exists, and how hard it would be to implement this, so let me just brain-dump some ideas for roughly what you’d need to do and then drop the mic:
The Ultimate P2P backup system:
- Everyone should use it! Therefore it must be easy to use, and have incentives to be used.
- It should be super cheap! So it should be P2P to utilize cheap excess capacity.
- No one should be able to hack it! Accordingly it should be end-to-end encrypted.
- Not all data needs hardcore backup! Ergo the level of redundancy should match the importance of the data.
Here are some developer-speak ideas for making this thing a reality.
Step 1: Create backup client
- I cannot stress enough that ease of use in discovering, installing and using the client is paramount to widespread adoption. Make it beautiful, simple and powerful.
- Intelligent consumer/user defaults – back up user profile directories, exclude temp/swap/transient files. Make it easy to choose what to back up and how many copies to make.
- Tech stack – do the client in something like Electron so you can reuse most UI code across platforms
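To make the "intelligent defaults" bullet a bit more concrete, here is a minimal sketch in Python of how a client might decide what to skip. The pattern list and the `should_back_up` name are my own assumptions for illustration, not taken from any existing product; a real client would tune the list per operating system.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical default excludes: temp, swap and transient files or folders.
DEFAULT_EXCLUDES = [
    "*.tmp", "*.swp", "*~",          # editor and temp leftovers
    "pagefile.sys", "swapfile.sys",  # OS swap files
    ".cache", "Temp",                # transient directories
]

def should_back_up(path, excludes=DEFAULT_EXCLUDES):
    """Return False if any component of the path matches an exclude pattern."""
    parts = PurePosixPath(path).parts
    return not any(
        fnmatch.fnmatchcase(part, pattern)
        for part in parts
        for pattern in excludes
    )
```

Matching on every path component means a single pattern like `.cache` skips everything underneath that folder, which keeps the default list short enough for a regular person to read and edit.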
Step 2: Implement Service
- The backup service code needs to be fast but portable since it’s doing a lot of stuff, so write it in something cross-platform and productive like .NET Core or Go.
- Data deduplication is super important and needs to happen at either the file level or the block level – crib something from borg or sdfs or btrfs.
- Map files to file versions to blocks to block hashes in a database (MariaDB?) – make keeping version history as cheap as possible (only changed blocks need to be stored for each new version). Also need a way of pruning old versions periodically (along with unused blocks).
- File/Directory level redundancy specification, like Storage Spaces failure tolerance levels. Limited by storage quota and number of peers.
- Should accommodate re-distribution/leveling for adding and removing peers/quota (whether intentionally or unintentionally)
- Ought to be able to run headless on Smart NAS devices like Synology or QNAP or IoT devices like Raspberry Pi, so ARM support for runtime is important
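The deduplication, version-mapping and pruning bullets above can be sketched together in a few lines of Python. Everything here is an assumption for illustration: the `BlockStore` class is hypothetical, plain dictionaries stand in for the database, and fixed-size blocks are simply the easiest scheme to show, not necessarily what borg, sdfs or btrfs do.

```python
import hashlib

BLOCK_SIZE = 4  # absurdly small so the example is readable; real blocks would be far larger

class BlockStore:
    """Hypothetical in-memory store: files -> versions -> block hashes -> blocks."""

    def __init__(self):
        self.blocks = {}    # block hash -> block bytes, stored once however often it is used
        self.versions = {}  # path -> list of versions, each a list of block hashes

    def add_version(self, path, data):
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # only unseen blocks cost storage
            hashes.append(digest)
        self.versions.setdefault(path, []).append(hashes)

    def restore(self, path, version=-1):
        """Rebuild a version (latest by default) from its block hashes."""
        return b"".join(self.blocks[h] for h in self.versions[path][version])

    def prune(self, path, keep):
        """Keep the newest `keep` versions, then drop blocks nothing references."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.versions[path] = self.versions[path][-keep:]
        live = {h for vs in self.versions.values() for v in vs for h in v}
        for h in list(self.blocks):
            if h not in live:
                del self.blocks[h]
```

Storing `b"AAAABBBBCCCC"` and then `b"AAAAXXXXCCCC"` under the same path leaves four blocks in the store rather than six, because the second version only adds the one block that changed. Pruning down to a single version then drops the now-unreferenced `BBBB` block.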
Step 3: Create P2P network
- Reuse something like Satoshi Client Node Discovery or PEX to find peers
- Gotta bust through firewalls/NATs with UPnP or NAT-PMP
- Allow users to proffer storage quotas to friends, or trade storage with strangers. Assumption is that friends are reliable/stable storage targets but strangers less so
- Store the file/version/blockhash database locally for fast access, but also replicate encrypted copies to peers for recovery in case the computer crashes
- Peers receive encrypted blocks/db-replicas and cannot read them without the key
- Controls for throttling bandwidth based on network/time/cpu conditions
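Here is a rough sketch, again in Python, of how the quota and redundancy ideas might fit together when deciding which peers get a copy of a block. The `Peer` class and `place_block` function are hypothetical names, and the policy (friends first, then whoever has the most free quota) is just one plausible reading of the "friends are more reliable than strangers" assumption above.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    free_bytes: int   # storage quota this peer still offers us
    is_friend: bool   # friends are assumed to be more stable than strangers

def place_block(block_size, copies, peers):
    """Pick up to `copies` distinct peers with room for the block.

    Friends are preferred over strangers, and within each group the peer
    with the most free quota goes first. Returns the chosen peer names,
    which may be fewer than requested when quota or peers run out.
    """
    candidates = [p for p in peers if p.free_bytes >= block_size]
    candidates.sort(key=lambda p: (not p.is_friend, -p.free_bytes))
    chosen = candidates[:copies]
    for peer in chosen:
        peer.free_bytes -= block_size  # charge the block against the peer's quota
    return [peer.name for peer in chosen]
```

A caller asking for more copies than there are eligible peers gets a shorter list back, which is the signal that the requested redundancy level cannot be met yet. The genuinely hard part, re-leveling existing blocks when peers come and go, is left out of this sketch entirely.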
There’s the master plan, will you please go make it?
It’ll be really hard (the deduplication and quota/redundancy leveling problems sound super extra hard, whereas making a passable P2P discovery/networking layer and working with various OS filesystems is only normal very hard). You might have to run and pay for a few servers to help with peer discovery. You’d definitely want to open source it to get the crypto nerd early adopters to trust it. That would also help with continuity and maintenance if you could recruit enough developers with a vested interest in seeing it succeed. And in the end you’ll be paid in zero dollars but also in the currency of satisfaction, knowing you have helped solve one of the computing world’s biggest problems. The data-safety equivalent of helping us all get a little insurance, brush our teeth and buckle our safety belts.
Addendum: Additional ideas to maybe make money (or “Step 4: Profit?”)
- Build the P2P backup software into a branded NAS or partner with a hard drive company to make drop-in self contained peers, sell them like hotcakes
- Or maybe put some hard drives into a Smart Home device like Google Home or Echo Dot and make those drop-in peers. Or do what Apple does with its AirPort Time Capsule (hard drive backup in a WiFi router), only this time with P2P backup they’d be better utilized and internet connected. (Sell them like hotcakes)
- Start tons of companies to sell backup space! With the P2P backup network, anyone could create a CrashPlan/Backblaze/Carbonite backup service with low overhead by simply firing up the software on their servers and offering to “peer” storage quota to you for a price. If lots of companies started up to do this, competition could drive the price of quota down to reasonable levels. And they could differentiate themselves by economics (making it dirt cheap by any means necessary) or security (ultra-secure datacenters deep underground) or geography (servers in politically insulated countries with strict data extradition policies), while all using a common network and software platform
- Or maybe you already have a company with a datacenter with excess capacity! Rent the full storage capacity out immediately to help recover the sunk cost, while dialing back quotas when you need the storage space back (and letting the P2P network figure out re-leveling the redundancy for those affected)
- Get Acquired! Maybe you decide to do this as a VC-funded startup, or build a prototype and then have Microsoft or Apple or Google buy you up to build the technology directly into their operating systems. That would not be super great since then the P2P backup software wouldn’t be open and free any more, but I guess you could cry about that all the way to the bank.
Mic Drop
OK, that was long enough. I really hope someone goes ahead and builds something like this. If no one else builds it by the time CrashPlan Home stops working in October 2018, maybe I will. Just because everyone else has failed doesn’t mean I would too, does it? Or was that the definition of insanity?