File Sharing and Content Distribution in P2P Systems
jinyang@cs.nyu.edu

Evolution of P2P systems: Napster, Gnutella, Kazaa, BitTorrent
Nodes "share" music files with each other; each node downloads from others as well as uploads to others.
What is different in these systems compared to traditional systems (NFS, the Web)?

What is p2p?
1. (p2p design) All nodes have the same functionality (completely decentralized) and all communication is symmetric
   - in contrast to systems with a single point of coordination (e.g. master/slave, client/server)
   Pros:
   * more resilient to failure (no single point of failure)
   * better load balance (no single point of coordination)
   * easy to scale (just add more nodes)
   * more suitable for wide-area distribution (a wide-area connection to a single site is not as reliable)
2. (p2p deployment) Nodes administered/owned by different users (administrative domains) cooperate for mutual benefit
   - in contrast to systems where all nodes have a single owner (e.g. datacenter distributed systems, local-area network file systems)
   - how is cooperation achieved? (with incentives? through explicit node admission?)
   Pros:
   * cheap; aggregates idle resources (no dedicated infrastructure)
   * self-reliant (dedicated infrastructure may be down for various reasons)
   Cons:
   * selfish, malicious, unreliable nodes

Paper I: Wide-area cooperative storage (CFS)
Provides a scalable wide-area storage infrastructure for content distribution.
Deployment scenario: (explicitly admitted) volunteer nodes all over the Internet
Usage: SourceForge distributes all of its software projects; book publishers distribute electronic books.

CFS exports a file system (hierarchical organization of files) interface to clients.

How to distribute a file system over many servers?
- chop every file into small (8 KB) blocks (why not at file granularity? to balance the load of retrieving popular big files and to get higher download throughput for big files)
- balance the storage load among different servers
- in a sense, CFS is like a virtual hard disk; each server holds a portion of the data blocks

How to identify data? How to authenticate data one gets back from CFS? (the publisher does not completely control the volunteer nodes)
- one publisher with a well-known public key
- identify a data block by its content hash
- use a Merkle tree of content hashes to produce a single root block
- the root block is signed with the publisher's private key and named by the publisher's public key
- read-only to clients; the publisher can update the file system
(see the first sketch at the end of this section)

How to find a data block given its identifier?
- recap of consistent hashing and Chord routing

How to deal with server failures?
- replicate each data block on its k successors
- a key's successor is in charge of regenerating new replicas when existing ones fail
(see the second sketch at the end of this section)

Load balance when different servers have different storage capacities:
- run many virtual servers on a single physical host

PAST distributes storage among servers at file granularity, and needs additional mechanisms to explicitly balance load among servers.
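To make the block-naming scheme concrete, here is a minimal Python sketch, under simplifying assumptions, of how a publisher might chop a file into 8 KB blocks, name each block by its SHA-1 content hash, and produce a single root block listing those hashes. The flat hash list stands in for CFS's Merkle tree, and the sign callable stands in for a real private-key signature; both are illustrative assumptions, not CFS's actual code.

import hashlib

BLOCK_SIZE = 8 * 1024  # CFS stores data in small (8 KB) blocks

def chop(data: bytes):
    """Split file contents into fixed-size blocks, each named by its content hash."""
    blocks, order = {}, []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        block_id = hashlib.sha1(block).hexdigest()  # content-hash naming
        blocks[block_id] = block
        order.append(block_id)
    return blocks, order

def make_root_block(block_ids, sign):
    """Build a root block listing the block hashes and sign it.

    A real CFS publisher builds a Merkle tree of hashes and signs the root
    with its private key; the root block is then named by the publisher's
    public key. `sign` is a stand-in for that signature step.
    """
    body = "\n".join(block_ids).encode()
    return body, sign(body)

def verify_block(block_id, block):
    """A client can authenticate any block it fetches just by rehashing it."""
    return hashlib.sha1(block).hexdigest() == block_id

A client that trusts the publisher's public key verifies the root block's signature once, and can then verify every data block it fetches purely from its name; this is why untrusted volunteer servers cannot tamper with content undetected.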
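The second sketch shows how a block identifier maps to servers under consistent hashing, assuming a simple sorted ring rather than real Chord finger tables. The block's successor stores it, and the next k-1 nodes on the ring hold the replicas; the Ring class and its method names are illustrative, not CFS's API.

import bisect
import hashlib

def node_id(addr: str) -> int:
    """Place a server on the identifier ring by hashing its address."""
    return int(hashlib.sha1(addr.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hashing ring; real Chord adds finger tables for O(log N) lookups."""

    def __init__(self, addrs):
        self.nodes = sorted((node_id(a), a) for a in addrs)

    def successor(self, key: int) -> int:
        """Index of the first node clockwise from `key` (the key's successor)."""
        ids = [nid for nid, _ in self.nodes]
        return bisect.bisect_left(ids, key) % len(self.nodes)

    def replicas(self, key: int, k: int):
        """The key's successor stores the block; the following nodes hold replicas."""
        start = self.successor(key)
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(k)]

# A block named by its content hash maps onto the same ring as the servers:
# ring = Ring(["srv1.example.org", "srv2.example.org", "srv3.example.org"])
# ring.replicas(int(hashlib.sha1(block).hexdigest(), 16), k=3)

Virtual servers fall out naturally from this scheme: a physical host with more storage capacity simply joins the ring under several identities and therefore attracts proportionally more blocks.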
Paper II: CoDoNS
CoDoNS: cooperative storage for DNS records.
Why is a p2p design better than the hierarchical organization of the current DNS?
* limited redundancy: a few name servers (<= 2) are responsible for most domains, and the redundant name servers often belong to the same site
* the static hierarchy results in load imbalance and concentrates failures at a few points (e.g. failure of the TLD servers)
* the TTL-based caching scheme trades update speed for lookup latency: long TTLs lower latency but make updates slow to propagate

Goal: a p2p design for DNS.
Deployment scenario: a large number of volunteer nodes over the Internet, e.g. all of the currently running name servers.

Why not use CFS or PAST for this purpose?
* passive caching in CFS/PAST is not effective
  - it takes a long time to warm up the cache
  - caching along the lookup path results in many unnecessary cached copies
* updating cached records is slow in CFS/PAST (TTL-based)
  - the same read-latency/update-latency tradeoff as in legacy DNS

Design:
* What is the key for a DNS record?
* Each DNS record corresponds to a (set of) home node(s).
* The home node decides on a replication level i based on object popularity; popular records get more replicas
  - DNS record popularity follows a Zipf distribution
  - DNS popularity does not change suddenly
* The home node proactively pushes replicas to all nodes that share the first i digits of the record's key; lower i --> more replicas
(a sketch of this prefix-based replication appears at the end of these notes)

Other issues:
* Who is authorized to update a DNS record?

Paper III: CoralCDN
Many web sites are not well provisioned and suffer from the "slashdot effect".
* over-provisioning is expensive (similarly, using a commercial CDN like Akamai is costly)
* the popularity of the content served is unpredictable

How to design a content distribution network out of numerous cooperating volunteer nodes?
(Who are the volunteer nodes? e.g. all the under-provisioned web sites cooperate, or all ISPs volunteer a number of nodes.)

Overall setup of CoralCDN:
* DNS redirection
* Coral proxies cache web content and try to serve requests out of each other's caches

Why passive caching instead of proactive replication (as in CoDoNS or Akamai)?
* web content popularity is hard to predict (much harder than DNS records)

Goals of the Coral design:
1. Always try to serve (cached) content from near the web client
   - lower latency
   - higher throughput
   How to find a proxy near the client? Explicit hierarchical clustering of nodes based on network latency.
2. Try to avoid contacting the origin server by obtaining content from other proxies' caches whenever possible.
   - store pointers to cached web objects in the DHT rather than the objects themselves
     * key --> hash of the URL
     * value --> a set of IP addresses of Coral proxies that have a cached copy of the URL's content
   - in Coral, numerous proxies may want to store/update pointers, i.e. one key corresponds to many values --> the key's root becomes a hot spot for store operations
     (in contrast, CFS/PAST/CoDoNS only allow the owner of a data object to store/update it)
   Key insights (see the sloppy-store sketch at the end of these notes):
   - updates/stores can be sloppy: a subset, not all, of the stored values is enough to retrieve a cached web object
   - a store iteratively contacts nodes closer and closer to the key's root, stopping not necessarily at the root but at the first node that is already full and loaded for that key
   - the pointer is then stored at the last node contacted before that full and loaded node, i.e. one DHT hop farther from the key
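As a rough illustration of CoDoNS's prefix-based proactive replication (not its actual implementation, which uses Beehive's analytical model to choose levels from the Zipf popularity distribution), the sketch below assumes hex-encoded identifiers and replicates a record at level i onto every node whose identifier shares at least i leading digits with the record's key. Level 0 therefore means every node, so a lower level gives more replicas; the names ident and replica_set are hypothetical.

import hashlib

def ident(name: str) -> str:
    """Map a node address or DNS name onto the identifier space (assumption: hex SHA-1)."""
    return hashlib.sha1(name.encode()).hexdigest()

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def replica_set(record_key: str, node_ids, level: int):
    """Nodes that hold a record replicated at the given level.

    Level i = every node sharing >= i leading digits with the key.
    Level 0 = all nodes; the home node (longest shared prefix) always qualifies.
    Lower level --> more replicas, so popular (Zipf-head) records get low levels.
    """
    return [n for n in node_ids if shared_prefix_len(n, record_key) >= level]

# nodes = [ident(addr) for addr in node_addresses]
# replica_set(ident("example.com"), nodes, level=2)

CoDoNS picks each record's level analytically so that the expected lookup path length meets a latency target; the sketch simply takes the chosen level as an input.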
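Finally, a simplified sketch of a Coral-style sloppy store, assuming each hop on the path toward a key's root is a node with a bounded value list and a per-key load counter; the names Node, is_full, is_loaded, the threshold constants, and the path abstraction are illustrative assumptions, not Coral's API. The put walks toward the root but stops early at a node that is already full and loaded for the key, leaving the pointer at the previous node on the path, so pointers for a popular URL spread across the nodes leading to the root instead of piling up on the root itself.

MAX_VALUES_PER_KEY = 4    # "full": enough proxy pointers already stored for this key
MAX_RECENT_STORES = 12    # "loaded": too many recent store requests for this key

class Node:
    def __init__(self, name):
        self.name = name
        self.values = {}         # key -> list of proxy addresses holding a cached copy
        self.recent_stores = {}  # key -> count of recent store requests

    def is_full(self, key):
        return len(self.values.get(key, [])) >= MAX_VALUES_PER_KEY

    def is_loaded(self, key):
        return self.recent_stores.get(key, 0) >= MAX_RECENT_STORES

    def store(self, key, value):
        self.recent_stores[key] = self.recent_stores.get(key, 0) + 1
        self.values.setdefault(key, []).append(value)

def sloppy_put(path_to_root, key, proxy_addr):
    """Walk the DHT path toward the key's root (nodes ordered from the
    requester toward the root). Stop at the first node that is already
    full and loaded for this key and leave the pointer at the last node
    seen before it; if no such node exists, the pointer reaches the root."""
    target = None
    for node in path_to_root:
        if node.is_full(key) and node.is_loaded(key):
            break
        target = node
    if target is not None:
        target.store(key, proxy_addr)
    return target

A matching sloppy get can stop at the first node on the path that has any pointers for the key, which is sufficient because any proxy's cached copy of the URL will do.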