Storj Founder Shawn Wilkinson Discusses Project
Storj is a decentralized protocol, in development, designed to facilitate data persistence without the need for a centralized server. In this interview, I talk with the ...
What's up, party people? Chris DeRose here, community director of the Counterparty Foundation, and on today's episode of the show I wanted to have an interview with Shawn Wilkinson of the Storj project. Shawn Wilkinson is a phenomenal developer, somebody who's been in the crypto space for a good while now, and I think he's a good person to interview about how he's using Counterparty and about his project in general. So, let's get started. Shawn, how are you? I'm doing just fine, and yourself? I'm doing very well. We've seen each other at the conferences and such, and I've seen your presentations and you've seen mine, and we've never really done an interview together. So, I just wanted to get a little exposure to your project and answer some of my questions as well.
So, let's get started, I guess. Why don't you give the audience a very quick description of what Storj is. Sure, sure. So the basic premise behind Storj is that we're trying to do two things. We're trying to create an encrypted platform where you can store your files, store your data, but that third party, that person who's actually hosting and holding your data, doesn't have access to it. Only you have access to it.
The second part of that is that we want to more commodify data storage. So, the idea is you have a hard drive. You have some extra space that's useful to somebody somewhere. So the idea is that you could then sell that to somebody else somewhere and utilize that extra resource. So, you'd get rewarded but the person on the other end of that would get a lot cheaper service. So, they don't have to go to some company that obviously has to pay for the bandwidth and the hard drives and the maintenance and the cooling.
You just have it sitting in your computer, so why not sell it at a huge discount that will make everybody happy? Those are the two parts of the equation. Right. So I read the white paper; it's not fresh in my mind, I remember reading it a little while back. I guess, just to jump right into some of the dev stuff: how is the project progressing in terms of the original white paper's design versus the current design? Have there been many deviations? No deviations? Well, that's a tough one.
It's been about the same. I mean, the pieces it uses: essentially, you need to be able to encrypt the data locally, which is fairly simple, right? Just, I need you to encrypt this file; there's a million libraries for that. And you need to transfer that data.
So, in the early days of the protocol, the white paper really didn't go too much in depth in terms of transfer to another peer or someone who's hosting your file. So, we decided to mostly go with libtorrent and BitTorrent-like technologies to get those files out there and back. And then the other portion of this, which is the more complicated portion, is the auditing. So, the idea is you take the file, you encrypt it, you put it on someone else's computer, but you have to make sure that it's still there, available, hasn't been modified, hasn't been dropped completely out of sight. So, the less challenging parts are the first two; we have libraries for those.
The third part is what's taking up most of our time, in terms of "What's the best way to do that?" So, we've just been doing scalability tests for now. We did our first test, which we released a couple months ago; that's test group A. Those people are giving us some ideas and some numbers for how these things should look. And right then we were just doing 100-byte verification. We scaled that out in our next release; it's going to be about a gigabyte.
So, a couple factors of improvement, but locally we've got that out to about four terabytes per farmer. So, we're slowly scaling up that algorithm and learning as we go. The whole idea behind the Storj project, and how we're building these things out, is that we should do it one step at a time and that everything should be modular and very easy to plug and play. Auditing does sound hard to me, for a number of reasons. I read through the paper a while back, and I remember you were splitting files up into shards and then computing, if I'm not mistaken, hashes of those shards in some capacity. Now, the concern that I have off the bat is that someone could cheat at some level: they could start storing lots of these hashes and then reporting them back to the owner and perhaps receiving some credit.
People actually do get some credit for reporting that they have the files, no? And then what protections are in place to keep that from happening? Sure, thanks. So, the users are getting paid essentially per challenge. The basic idea is: you have my file, you want to get paid for hosting my file, so I say, "Okay, prove to me, mathematically and cryptographically, that you have my file, and then I'll give you a small payment." So, prove to me you have it, I pay you a cent, and we do this every hour or so. The protection in place to keep someone from cheating is really in the audit mechanism.
So, the idea is that I can pass you some data that you need to add to the file to give me back a response, but if you don't have that data, or you modified it in any way, that challenge will fail. So, we're really making heavy use of those basic hashing functions. But is there some kind of nonce or something that keeps the check unique? Again, what I'm trying to understand is how you're going to ensure that that person hasn't remembered the challenge and then just replays it back consistently. So, the data is the same but the challenge is unique. Got you. So, the idea is I pass you a challenge unique to that data, so I get a unique response.
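That challenge-response flow can be sketched in a few lines. This is a minimal sketch, assuming SHA-256 and a fresh random nonce per audit; the function names and parameters are mine, not Storj's:

```python
import hashlib
import os

def make_challenge() -> bytes:
    """A fresh random nonce per audit keeps every response unique."""
    return os.urandom(32)

def prove(challenge: bytes, shard: bytes) -> str:
    """Farmer's side: hash the nonce together with the full shard.
    A remembered old answer is useless against a new nonce."""
    return hashlib.sha256(challenge + shard).hexdigest()

def verify(challenge: bytes, shard: bytes, response: str) -> bool:
    """Owner's side: recompute the expected answer and compare."""
    return prove(challenge, shard) == response

shard = b"encrypted shard bytes ..."
c = make_challenge()
assert verify(c, shard, prove(c, shard))            # intact data passes
assert not verify(c, shard, prove(c, shard[:-1]))   # modified data fails
```

In this naive form the owner needs a copy of the shard to check the answer, which is exactly why the pregenerated-challenge and Merkle-tree approaches discussed in the interview matter.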
So that data stays static whereas the challenge is in flux. And the person who is requesting the file doesn't have the data, so how do they issue new challenges every time? Is that some algorithm that's available right now? It sounds cool. I don't know if that's something you're working on? Or am I misunderstanding the challenge itself? One area of thought is that you pregenerate them beforehand. That takes up a lot of computation on your side and a lot of overhead, but it's very secure.
There are no little pitfalls in that algorithm. Now, the other side of that is that you creatively use Merkle roots and Merkle trees, essentially: prove to me a subsection of my data. So there are multiple different ways of doing it: one is pregenerated and one is on the fly. You just have to balance those computations so that you're not doing your audits so often that it's not worthwhile.
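The Merkle-tree variant Shawn mentions can be sketched like this. It is a sketch under my own assumptions (SHA-256, and duplicating the last node on odd-sized levels); heartbeat's actual scheme may differ. The owner keeps only the tiny root, and the farmer proves it still holds a given block by returning that block plus its sibling hashes:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root."""
    level = leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]        # duplicate last when odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int):
    """Collect sibling hashes from the chosen leaf up to the root."""
    proof, level, i = [], leaves, index
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = i ^ 1                         # the paired node at this level
        proof.append((level[sibling], sibling < i))
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def check_proof(leaf_hash: bytes, proof, root: bytes) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    acc = leaf_hash
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root

# Owner stores only the root; farmer stores the blocks.
blocks = [b"block-%d" % i for i in range(8)]
leaves = [h(b) for b in blocks]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 5)                 # "prove you still hold block 5"
assert check_proof(h(blocks[5]), proof, root)
assert not check_proof(h(b"tampered"), proof, root)
```

The trade-off he describes is visible here: nothing is pregenerated per audit, but each audit costs the farmer a log-sized proof and the verifier a log-sized recomputation.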
So, which one of those systems have you done currently in the code base? Or is that what you're working on right now? So, we have both of those auditing systems abstracted in our downstream-node and downstream-farmer libraries, and heartbeat is the heart of that. We have a couple of those algorithms in there, so we're playing around with different methods, trying to figure out, "Okay, what's the most efficient?" Got you. Because you've got to balance the security with the overhead and the transfer speed and all that kind of stuff. Right.
So, it seems to me if you precomputed a number of these hashes in a Merkle root or something, you would still have some finite number of hashes. So, I guess the question I would have is: how many do you think you would have, and have you calculated what the recurrence interval would be? At some point, let's say four years from now, would we repeat the tape, so to speak? Or is that configurable by the user? Where are you on that? Yes, this is all configurable by the user, but let's go through an example use case. At the end of the day, the challenge is just a hash, so that's a very small bit of data. So, I calculated that for a year you'd only need about 50 kilobytes, if I remember correctly; I did this test a couple months ago, so someone can double-check my math.
So, if you did it, I think, every hour, you'd be good with about 50 kilobytes. It's 50 kilobytes for the year, for one shard, doing challenges every hour. And that's the same whether that shard, that piece of the file, is a megabyte or 50 terabytes. So, the idea is that, as much as you can, you want to combine those things into larger chunks, because there's less overhead to actually do the verifications. Okay. Conceivably, if you ran out of those hashes, you would just reupload the file, I guess.
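Shawn invites a double check, so here is the back-of-the-envelope math. The assumptions are mine, since the interview doesn't give the digest size: one challenge per hour, storing either a full 32-byte SHA-256 digest or a truncated 6-byte one per challenge:

```python
# One precomputed answer per hourly challenge, for one shard, for a year.
hours_per_year = 365 * 24                  # 8760 challenges
full = hours_per_year * 32                 # full SHA-256 digests
truncated = hours_per_year * 6             # truncated 6-byte digests

print(f"{full} bytes (~{full // 1024} KB) with 32-byte digests")
print(f"{truncated} bytes (~{truncated // 1024} KB) with 6-byte digests")
# 280320 bytes (~273 KB) with 32-byte digests
# 52560 bytes (~51 KB) with 6-byte digests
```

The truncated variant lands near his 50-kilobyte figure. Either way, the cost is independent of shard size, which is his point about combining data into larger chunks.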
Cancel the last one, recalculate it, and then spin it back up. That makes sense. Okay, interesting. One thing too, and this is a slight diversion from that: I've noticed in the project that you've really taken a very Dropbox-oriented approach to the user experience. I feel like, in your mind, what you've set out to achieve is something users can use on their laptops to replace what they're using in a Dropbox environment.
For me, it's always been the realm of decentralization and crypto to start with that which has a lot of counterparty risk. So, you apply that to the information space, and that would typically be something like copyrighted works, where trusting a centralized source, like YouTube, is a source of a lot of risk. Do you aim to tackle that end of the project? Do you not like to discuss that end of the project? What's your feeling there? I mean, the basic idea behind that is that if we want people to use...
these are replacing an existing system, existing applications, existing use cases. So, if you have a newer system that comes along that's better, faster, cheaper, more secure, whatnot, people are like, "Okay, that's great. But what is the hurdle to get me to use that?" If you look at Bitcoin in early 2010 or 2011, the user interface and the tools behind it just weren't there. So, we really want to start this off as, "Let's design the tools really nicely and make them very easy for users to use," because that's what you need to get mass adoption. That's what you need to get a million or more people into it.
A million people or more are not gonna use a command-line interface. No, they want a clicky, draggy, droppy, looks-nice, looks-shiny, looks-what-they're-used-to kind of feel. So, it's that idea of abstraction. We want to abstract away all the technical stuff, all the crypto, all the Bitcoin, all the blockchain kind of stuff. When you go to your car and you turn the key, it just starts and it goes; you don't have to worry about what's happening under the hood. You just start it and it goes.
Right. So, we want the same thing for storage. We want something where someone just downloads a client and they don't have to worry about any of the underlying aspects; they just use it for data storage. And that's how you get a user base. That's how you get people using the software. So, and this is my problem, maybe not everybody's: I have a lot of problems with a lot of DMCA YouTube takedown notices.
That's my censorship. That's the counterparty that I want to see Storj help me with. Is that the kind of environment that I could foresee, maybe? That there would be some level of HTTP gateways that interface with the Storj network, which could be used to display content that people would be paying for, or something like that? Or is that scenario not really going to be a part of the Storj initiative? Well, so, users are paid both for their storage space, for storing data, and for contributing bandwidth. Those are the two components that you need to make things work. So, in terms of distributing data: you have copyright organizations and their growing blanket requests to remove data that half the time isn't even theirs. There's a lot of misuse of the DMCA and other such provisions.
But you really get into a tricky use case, right? How do you compare, let's say, somebody who posts a parody video under fair use, and they should be able to do that, versus, let's say, ISIS deciding to host beheading videos or something like that on Storj? So, the idea is that we want to make the content distribution and hosting mechanism a voting mechanism. The idea is that if you know what that content is, well, it's up to you whether you want to display it, or host it, or share it, or broadcast it. By default, all the files on the network are encrypted and split up into multiple chunks. So, at the end of the day, we want all the files to behave like a black box, right? You're passing this piece of data to another person; you don't care what's in it.
But for content that people are publicly sharing, where I know it's this chunk and I know the decryption keys and all that kind of stuff, the users can voluntarily say, "You know what, I don't want to host this particular content that's known, or this is against my country's laws, blah blah blah." We let the user decide rather than adding in censorship mechanisms and all those kinds of things. So, the idea is the user should be able to decide for themselves what content they want to put on the network. Things like the parody videos that no one really has a problem with are going to stay on the network, but you'll probably be a little harder pressed to find someone who's willing to host ISIS beheading videos.
So, you get some mathematical string, a hash, a series of letters and numbers, that mathematically identifies cat.jpeg with this piece of data. So, you identify the file by what it is rather than where it is. This solves a lot of problems: if I'm requesting a file somewhere, how do I know I'm actually getting that file? Well, if I'm referring to that file by its contents, I can always check that. And that opens up all these gateway things, right? The idea is that I can go to any server, any node, anywhere, whether it be HTTP, FTP, torrents, any kind of protocol I can use to retrieve this file from the network, because now I can check that it's valid. So, I don't have to get it from a trusted source anymore.
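The content-addressing idea he describes (identify a file by the hash of its bytes, so any untrusted source can serve it) can be sketched as follows; the helper names are mine, for illustration only:

```python
import hashlib

def content_id(data: bytes) -> str:
    """The file's identity is the hash of its bytes, not its location."""
    return hashlib.sha256(data).hexdigest()

def fetch_verified(expected_id: str, sources) -> bytes:
    """Try any source at all (HTTP, FTP, a random peer) and keep the
    first response whose hash matches. Trust the math, not the server."""
    for get in sources:
        data = get()
        if content_id(data) == expected_id:
            return data
    raise ValueError("no source returned the authentic content")

cat = b"...cat.jpeg bytes..."
cid = content_id(cat)
# Two untrusted mirrors: one lies, one is honest. Only the honest copy survives.
sources = [lambda: b"malicious substitute", lambda: cat]
assert fetch_verified(cid, sources) == cat
```

This is why gateways over arbitrary protocols work: any transport that can deliver bytes is good enough, because authenticity is checked at the receiving end.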
I can get it from any source. And that's where these cool concepts come in, in terms of gateway nodes and getting around broken portions of the network, or censorship, or firewalls, or whatnot: you can really communicate over any protocol, as long as you can pass data through it. And then you can say, "Okay. That's my content. That's what I need." Right.
Okay, so then let's change to a couple other things too. It's not really clear to me, with the project, how you're compensating people. Have the economics behind the project been well delineated somewhere? Is that available in a paper or in code form? What are people getting paid in, and how much, and when, and so on and so forth? Sure, sure. So, there's lots of changes in terms of that. I think I've briefly touched on the reward mechanism in both the Metadisk paper and the Storj paper. I originally started the papers describing more of the rewards, but we've been working on more of the actual technical architecture and stuff. Okay. So, I can briefly summarize it. The idea is that consensus-driven networks really don't work well with file storage.
Let's say you have GHash.IO running wild and having 51% of the Bitcoin hashrate. Who cares? Unless you're transacting right then and there, it's really not going to affect you or your bitcoin. On the other side of that, imagine that kind of network where you have files, which are active, which have state. That 51% attacker could say, "Delete that file," or modify it, or do all kinds of bad things to the network. They might only be able to do that on a small scale, but the problem is that if people's files are getting deleted, you lose a lot of faith in the robustness of the system.
Of course. So, you really can't use consensus for that. The idea is that the users themselves are purchasing a service from the network, so they should be the ultimate authority. You should be the ultimate authority over your files. You get to choose what algorithms are used to determine that: what encryption algorithms, who you're hosting your data with.
So, the idea is that the users themselves can decide what happens, not a larger network. Okay. So, are you going to use Bitcoin or Storj credits? What kind of coinage would you be using? Do you have an idea of that part of the compensation side of things? So, there's two parts to that. One, we're using multiple blockchains. We're using Bitcoin, for example, and we're playing a little bit with Florincoin and Datacoin and some other coins, because we need the blockchains to essentially notarize some of this information.
So the idea is that if you and I exchange some data, or do a contract where you're holding my data, we probably want to keep track of that as it happens. So, if there's any problem we can say, "Okay, this person or that person is at fault, blah blah blah," and introduce those records as evidence. So, we need a way of keeping track of some of that information, and we make a lot of use of various blockchains that can do that. Now, the other portion is: what are the rewards delineated in?
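A minimal sketch of that notarization step, under my own assumptions: a JSON contract record hashed with SHA-256. The record fields are hypothetical, and in practice the digest would be embedded on-chain (for example in an OP_RETURN output) rather than just computed locally:

```python
import hashlib
import json

def contract_digest(record: dict) -> str:
    """Canonicalize the contract record and hash it; only this small
    digest needs to be written to a blockchain, not the record itself."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical storage contract between a renter and a farmer.
record = {
    "renter": "alice-node-id",
    "farmer": "bob-node-id",
    "shard_hash": "ab12...",
    "duration_hours": 720,
    "timestamp": 1420000000,
}
digest = contract_digest(record)

# Anchoring `digest` on-chain timestamps the agreement: later, either
# party can re-present the record and show it existed unmodified.
assert contract_digest(record) == digest
assert contract_digest({**record, "duration_hours": 1}) != digest
```

Sorting the keys matters: both parties must serialize the record identically, or honest records would produce mismatched digests.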
So, we'll use any token in the system: Bitcoin, Storjcoin. But Storjcoin provides a more specific measure of the value of the extra storage in use. The idea is that it's also mixed into our reputation system a little bit, so we're using it in various ways: you can say, "This person is not going to drop my little bitty chunk of file, and I won't have any problems with them." But technically, we're using the tokens as a reward.
But that doesn't abstract away things like using Bitcoin or all these other cryptocurrencies. So, if it came along, we'd also be open to using sidechains. But who knows if that's going to pop up, 2016-plus... Right.
So, Counterparty is definitely the best platform we can use right now in terms of function and getting things done. There are still many things that have to be built out on both sides of the fence to use it on our platform. For example, we need pretty solid micropayments. That's been built out a little bit for Bitcoin and not much else; it's been built out in bitcoinj, but not many people use it.
So, there needs to be a micropayment channel implementation; any platform that could come up with that in a nice package would be very useful to us. So, we've been playing around with Counterparty and just talking about how you could best mirror the micropayment channel support in Bitcoin over to Counterparty. But we're not to that point yet. Right. It's a complicated problem, so we'd rather deal with the easier problems for now than the micropayment support.
Makes sense. So, I hear you guys are working with Factom right now? Is that something that is new? Do you want to tell me how that started? What's going on with that? Sure, sure. So, I've known about the project for a while, back when it was called Notary Chains. Our particular use case for Factom, in terms of storage, is the metadata. Before, I was talking about how, when we do actions on the system, or on the blockchain, on the Storj network, we need to be able to notarize those, we need to be able to record those. And Factom is very useful in terms of making that happen.
If you were to do it on Bitcoin, it just wouldn't work: we don't have the throughput, and it costs you for every record insertion. Factom allows you to do that very quickly, so it solves a lot of problems in terms of dealing with metadata on Storj and on the platform. So, yeah, we're working with Factom as they go along, to use their tools to solve some of the problems we need to solve down the line. So, have you actually used any of their software yet, or is this very preliminary? So, we haven't started using the software yet, no. They're still trying to work out the kinks and working on their platform and all that kind of stuff.
So, when it's released and available and safe, then we'll use it. That makes sense. Talking about the origin of stuff: one thing that I was kind of confused by, was it Greg Maxwell who did an original Storj project? Is that something that was related to you guys? What happened there? I never got a good understanding. Am I right about that? Sure, so Gregory Maxwell's post is actually the thing that inspired the whole project. The idea that I liked behind Gregory Maxwell's post (he's a Bitcoin Core developer) is that you give your files to this autonomous computer program, and you pay it per hour or whatever for hosting these files for you, and this autonomous computer program can go ahead and pay for its own server space, hosting, and all that kind of stuff itself. So the idea is that you have a service that pays for its own upkeep and isn't run by any person or human or whatnot.
It gets into a little bit of a science-fiction kind of thing. Got you. The Terminator lives in Bitcoin, or something. But the idea is that you have this autonomous agent, this file-hosting service, that can pay for itself. And so it's like, that's a really, really cool concept, and I'd love to build something like that, but first you have to have a decentralized file-sharing protocol. And so, that was the basic idea of Storj.
We'd love to get to this cool concept where you can have these services that are completely AI-controlled, but let's build the base file storage network first, and then we can get to that portion. Got you. That's where we're trying to push toward. That's where we took the name, the concept, and everything. Cool. Okay.
Well, I mean... I encourage anyone to read that post. It's really eye-opening, and it's really short. It really opens your eyes to what's possible in terms of Bitcoin technology.
I mean, we're playing around here with all these various other coins and these meta-protocols, decentralized exchanges, smart contracts. But autonomous AI systems, that's really like five steps down the line, and that's a really, really long time away, in my opinion. Right, right, right. Well, I mean, that's what we're all working toward in this movement, though. I appreciate what you guys are doing over there.
I think you answered a lot of my questions here for the quick interview that I wanted to do. The video recording on my end isn't as great as I'd like, so maybe we'll have to finish this in person at one of the next conferences and do some more questions. Is there anything else you want to promote at this time? Is there something you'd like to suggest that people should look into? Or should they just go to your website for more info? Yeah, in terms of our project, if you want to learn more, go to STORJ.IO. That's STORJ.IO.
And so you can engage with our community, read our white papers, watch the video. I mean, if you're curious about Storj, start with the video; it's 60 seconds, and then go from there. And then if you'd like to get more involved, or really have some questions, we have an email at hello@STORJ.IO. That's H-E-L-L-O at S-T-O-R-J dot I-O. Ask us questions there. We can write you back, or chat, or help you out with any questions or comments that you might need addressed. We're open to suggestions: where should we take the protocol? What cool things would you like to see? Would you like to be part of the beta, being a farmer, all these kinds of things? So, stop by our forums.
Thank you, Shawn. I appreciate your time. And that's it, party people. I think that covers this interview and stay tuned for the next interview. Let me know if you like this video and if you have any questions for me, as usual you can leave them in the comments below or you can tweet me @DeRosetech on Twitter. And if you like this video, subscribe to the channel for more.
Later party people.