P2P Backup Network

From Strugglers
Revision as of 03:21, 13 November 2005 by Andy (Talk | contribs)

“Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)” — Linus Torvalds, 1996

Why?

Quite often you have some data that you would like to be widely backed up, either because you have no backups (bad idea!) or because it's hard for you to arrange for offsite backups.

If there were some sort of peer-to-peer network you could join where people would automatically take copies of your files, then that would be better than nothing, right?

Why would people download your data?

They would do so in order that other people would download their data in turn. Presumably the only people participating in the network would be those who want their own files backed up.

There would be some sort of credits system. When you connect to the network, your client would automatically download the most useful files from your peers. While you are online and hosting those files, you would earn credits from those peers for doing so. At the same time, those peers would be trying to earn credits from you by downloading your files.

The total space you set aside for hosting other people's files puts an upper limit on the number of credits you can have, so once you run out, no one else will back up any more of your files.

Whenever someone goes offline, they are no longer hosting anything, so the credits they earnt go back into the system and are returned to whoever they got them from. If they come back online, their client could authenticate that it still has these files and thus get the credits back again quickly.

A new user to the system with no files hosted for anyone would enter the system with no credits at all. Presumably this situation would be quickly remedied as their client would automatically begin downloading files from others and gaining credits.
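
The credit flow described above can be sketched as a toy ledger. This is only one possible reading of the scheme: the class name `Ledger`, its methods, and the idea of seeding a starting balance by hand are assumptions made for illustration, since the text does not say how the first credits enter the system or what happens if a host has already spent credits that later have to be returned.

```python
from collections import defaultdict


class Ledger:
    """Toy credit ledger (hypothetical; not a specification of the network)."""

    def __init__(self):
        self.balance = defaultdict(int)  # peer -> spendable credits
        # host -> {owner: credits the host earned by hosting that owner's data}
        self.held = defaultdict(lambda: defaultdict(int))
        self.suspended = {}  # host -> {owner: credits returned while offline}

    def pay_for_hosting(self, owner, host, credits):
        """Move credits from owner to host; refuse if the owner has run out."""
        if self.balance[owner] < credits:
            return False
        self.balance[owner] -= credits
        self.balance[host] += credits
        self.held[host][owner] += credits
        return True

    def go_offline(self, host):
        """Give every credit the host earned back to the peer it came from."""
        returned = dict(self.held.pop(host, {}))
        for owner, credits in returned.items():
            # Assumption: the balance may go negative if the host already
            # spent these credits; the text does not cover that case.
            self.balance[host] -= credits
            self.balance[owner] += credits
        self.suspended[host] = returned

    def come_back(self, host, verified_owners):
        """Re-earn credits for owners whose data the host proved it still has."""
        for owner, credits in self.suspended.pop(host, {}).items():
            if owner in verified_owners:
                self.pay_for_hosting(owner, host, credits)


ledger = Ledger()
ledger.balance["alice"] = 5  # seeded by hand for the example
ledger.pay_for_hosting("alice", "bob", 3)  # bob hosts alice's data
ledger.go_offline("bob")                   # credits flow back to alice
ledger.come_back("bob", {"alice"})         # bob still has the data
```

A peer that joins with a balance of zero cannot pay anyone until it has hosted something, which matches the new-user case above.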

You would always be able to download your own files from the network for free; that is how you would do a restore.

Perhaps credits are not the best mechanism but it just gives an idea of the economy of the system.

Wouldn't people be able to view and even modify my files?

No; there would obviously have to be encryption and strong crypto-based signing of files. Also it wouldn't really be files but anonymous blocks of data that are passed around. The data is useless to the people mirroring it except as a means of getting credits so their own files will be mirrored.

Each user's client would be keeping track of the hash of each block of data offered so it would know that remote blocks of data are still intact and unmodified. Existing P2P networks already do this.
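
As a minimal sketch of the hash bookkeeping, the owner's client could record a digest for each block before handing it out and compare against it when the block comes back. SHA-256 and the names `BlockIndex`, `register` and `is_intact` are assumptions chosen for the example. Note that this only checks a block once it has been fetched; proving that a remote peer still holds a block without downloading it would need some extra challenge scheme that is not covered here.

```python
import hashlib
import hmac


def block_digest(block: bytes) -> str:
    """Hex SHA-256 digest of one block of (already encrypted) data."""
    return hashlib.sha256(block).hexdigest()


class BlockIndex:
    """Owner-side record of the digest of every block offered to peers."""

    def __init__(self):
        self.digests = {}  # block_id -> hex digest

    def register(self, block_id: str, block: bytes) -> None:
        self.digests[block_id] = block_digest(block)

    def is_intact(self, block_id: str, returned_block: bytes) -> bool:
        """True only if the returned bytes match what was originally offered."""
        expected = self.digests.get(block_id)
        if expected is None:
            return False  # we never offered this block
        return hmac.compare_digest(expected, block_digest(returned_block))


index = BlockIndex()
index.register("block-0001", b"ciphertext bytes")
```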

How do you know what is mirrored at any given time?

There would have to be client-side features to tell at any given moment how many copies of a given file are online. As the number of online copies for a given file drops to zero the client should be raising the offer price of that file so that new peers will download it. As the number of online copies goes very high the price should lower, asymptotically approaching zero.
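
One simple curve with this shape is the base value divided by one plus the number of online copies. The formula and the name `offer_price` are assumptions for illustration; any function that is highest at zero copies and falls towards zero as copies increase would fit the description.

```python
def offer_price(base_value: float, online_copies: int) -> float:
    """Price offered to peers for hosting one more copy of a file.

    Highest (equal to base_value) when no copies are online, and
    approaching zero, without ever reaching it, as copies increase.
    """
    if online_copies < 0:
        raise ValueError("online_copies cannot be negative")
    return base_value / (1 + online_copies)


# Price offered for a file with base value 10 as more copies come online.
prices = [offer_price(10.0, n) for n in (0, 1, 4, 99)]
```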

Ideally this would be almost completely automatic, although it would probably be a good idea to have some knobs so that the user can designate certain files as being of higher or lower backup value than others. For example, a college student may decide that their dissertation is worth 10 times the default value, their general home directory is worth the default value, and their music collection is worth one hundredth of the default value. Given a large enough peer group, it would then be expected that the dissertation would be mirrored 10 times as widely as other files, and the music collection one hundredth as widely.
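
To see why a per-file weight could translate into proportionally wider mirroring, here is a rough sketch under two loud assumptions: the offer price is the weight divided by one plus the number of online copies, and peers keep taking copies for as long as the price is at least some going `market_price`. Neither assumption comes from the text above, and the function names are hypothetical.

```python
from fractions import Fraction


def offer_price(weight, copies, base_value=1):
    """Assumed price curve: scaled by the user's weight, falling with copies."""
    return weight * base_value / (1 + copies)


def copies_at_equilibrium(weight, market_price, base_value=1):
    """Count copies taken until the offer price drops below market_price."""
    if market_price <= 0:
        raise ValueError("market_price must be positive")
    copies = 0
    while offer_price(weight, copies, base_value) >= market_price:
        copies += 1
    return copies


# Fractions keep the comparison exact; the weights are the ones from the text.
market = Fraction(1, 100)
dissertation = copies_at_equilibrium(Fraction(10), market)
home_directory = copies_at_equilibrium(Fraction(1), market)
music = copies_at_equilibrium(Fraction(1, 100), market)
```

Under these assumptions the copy counts come out in the same 10 : 1 : 1/100 ratio as the weights, which is the behaviour expected of a large peer group.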

What about bandwidth?

Indeed, bandwidth is very important, but it's not clear how to model it. 1GB of files remotely mirrored is not so useful if the only copy is behind a 33.6Kbit/sec dialup connection.
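
To put a number on the dialup example, the short calculation below estimates how long a restore would take. It assumes 1GB means 10^9 bytes, that the link runs at its full nominal rate, and that there is no protocol overhead; the function name is made up for the example.

```python
def restore_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealised time to pull size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


seconds = restore_seconds(10**9, 33_600)  # 1GB over 33.6Kbit/sec dialup
hours = seconds / 3600
```

That works out to roughly 66 hours, or nearly three days, of continuous transfer.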

It may be possible to measure the user's bandwidth and factor that into their credit score, but as the only time the bandwidth really matters is when a peer needs a copy of their files back, this seems far too open to abuse.

please add more thoughts / ideas / questions