I'm one of the creators of the Beaker browser[1], and the reason we use Dat is that, as a p2p protocol, it offers a lot of neat properties, including making datasets more resilient. As long as at least one peer on the network is hosting a dataset, it remains reachable, even if the original author has stopped hosting it.
I won't speak authoritatively on behalf of the Dat team, but I believe one of their goals is to make it difficult for public scientific datasets to be lost, and data living on a centralized server is particularly vulnerable to that.
Because Dat is just a protocol, decentralization is a choice. For quick, ephemeral exchanges, direct p2p works brilliantly. For longer-lived datasets, sharing them with a (commercial) mirror might make sense. Or perhaps you host them yourself. The beauty is that you, as a user of the protocol, get to decide what works best for you.
We have a few approaches to the disappearing-data problem.
First, we are working with libraries, universities, and other groups with large amounts of storage and bandwidth. They would help host datasets used within their own institutions, as well as other essential datasets.
Second, we have started to work on at-home data hosting with Project Svalbard[1]. It's kind of a SETI@home idea, where people can donate storage space at home to help back up "unhealthy" data (data that doesn't have many peers).
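To make the idea concrete, here's a minimal sketch of how a volunteer node might decide which datasets to pin. Everything here is invented for illustration (the dataset records, the peer threshold, the storage budget); it's not real Project Svalbard code.

```python
# Hypothetical sketch: a volunteer node picks "unhealthy" datasets
# (few peers) to back up, fitting them into a local storage budget.

def pick_unhealthy(datasets, min_peers=3, budget_bytes=10 * 1024**3):
    """Return dataset keys to pin: fewest peers first, within the budget."""
    # Only datasets below the peer threshold count as "unhealthy".
    unhealthy = [d for d in datasets if d["peers"] < min_peers]
    # Back up the most at-risk datasets (fewest peers) first.
    unhealthy.sort(key=lambda d: d["peers"])
    picked, used = [], 0
    for d in unhealthy:
        if used + d["size"] <= budget_bytes:
            picked.append(d["key"])
            used += d["size"]
    return picked

# Invented example data: keys and sizes are placeholders.
datasets = [
    {"key": "dat://aaa", "peers": 1, "size": 2 * 1024**3},
    {"key": "dat://bbb", "peers": 12, "size": 1 * 1024**3},
    {"key": "dat://ccc", "peers": 0, "size": 5 * 1024**3},
]
print(pick_unhealthy(datasets))  # ['dat://ccc', 'dat://aaa']
```

The well-seeded dataset (12 peers) is left alone; the node spends its donated space only where the network is thin.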
Finally, for "published" data (such as data on Zenodo or Dataverse), we can use those sites as a permanent HTTP peer. So if the data isn't available from any p2p peers, you can still get it directly from the published source.
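The fallback logic above can be sketched in a few lines. The transport functions here are stand-ins, not real Dat APIs: the point is only the ordering (try peers first, then the published HTTP source).

```python
# Hypothetical sketch of p2p-first fetching with an HTTP-mirror fallback.
# fetch_from_peer and http_mirror are injected stand-in transports.

def fetch_with_fallback(key, peers, http_mirror, fetch_from_peer):
    """Try each p2p peer; if none can serve the data, use the HTTP mirror."""
    for peer in peers:
        data = fetch_from_peer(peer, key)
        if data is not None:
            return data, "p2p"
    # No peer could serve the dataset: fall back to the published source.
    return http_mirror(key), "http"

# Demo with stub transports: no peer has the data, so we fall back to HTTP.
data, source = fetch_with_fallback(
    "dat://example",
    peers=["peer-a", "peer-b"],
    http_mirror=lambda key: b"published bytes",
    fetch_from_peer=lambda peer, key: None,  # simulate unreachable peers
)
print(source)  # http
```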
As others have said, decentralization is an approach, not a solution. It gives you the flexibility to centralize or distribute data as necessary without being tied to a specific service. But we still need to solve the problem!
That's something we think about a lot. Decentralization isn't a silver bullet for data loss, but I do think it's more resilient than what we typically do now.
To counter that, you can take measures to mirror important datasets with a dedicated peer. It requires effort, but it at least makes it much, much harder for, say, a government agency to take down public data without warning.
This may not always be the case, but so far blockchains have low throughput and large datasets that you have to sync in full. Compared to other databases, they don't perform well, so if you don't need decentralized strict consensus, a blockchain isn't a good choice.
1. https://github.com/beakerbrowser/beaker