Topic: Routing in wireless and overlay networks jinyang, rtm Paper III: Scalable P2P Lookups (Chord) Internet routing, wireless routing: routing (looking up) for nodes by their IP addresses/destination P2P routing: routing (looking up) for data by names Characteristics of P2P systems * huge number of nodes connected by Internet (e.g. Skype ~5 million nodes) cannot know all nodes and all data items. * nodes belong to different independent users, no natural central authority central authority is prone to failure and expensive to maintain Problems: * routing: Where to find a particular data item? - how to name data? - how to name nodes? how to organize nodes? * storage: How to prevent data from being lost and keeping it available? consistent? * security and incentives How to organize nodes to help find data? * centralized e.g. Napster * hierarchy DNS * flat Distributed Hashing Tables (DHTs) DHT Interface: put(key, value) i.e. put (name_of_the_document, document) get(key, value) i.e. get (name_of_the_document, document); DHT basics: * routing layer handles efficient name to responsible node mappping * storage layer handles putting data and maintaining data Potential Applications: * publishing a data item: DNS, searching for files in a file sharing app. * file system storage: DHT is like a big virtual disk * location tracking: updating a node's current location w/ DHTs What does a complete algorithm need to do? 1. Name nodes and documents 2. Define document ID to node ID assignment 3. Define per-node routing table contents and how to lookup using tables 4. How to include new nodes in existing nodes' tables (Join procedure) 5. Failure recovery 6. Move data around when nodes join 7. Make new replicas when nodes fail Chord Naming: - Each node has a unique (flat) ID e.g. SHA-1 of their IP addresses - Each document has a unique (flat) ID (document key) e.g. SHA-1 of document contents Assignment: - a ring of ID-space - a key is assigned to its successor node (if node and document IDs are random, there's reasonable load balance) Other assignments: Tree based (Tapestry, Pastry), Hypercube (CAN) etc. Routing (lookup): - Query is destined for a document key - Node needs to forward the query to a node closer to the key in ID-space Strawman: - Each node knows its immediately successor Query travels clockwise to get closer to its successor node in O(n) time Make it faster: - Each node knows a few further-away nodes (finger table) In Chord, each node knows O(log n) exponentially further away nodes Hand off query to the next hop that is closest to the destination Intuitively, there's a binary tree rooted at every node Starts at that node's row 0, Continues through another node's row 1, etc. Since every node is a root, there's no root hotspot How to acquire/maintain routing table? General approach: Assume system starts out w/ correct routing tables. Use existing routing tables to help the new node find information. Add new node in a way that maintains correctness. To initialize a new node's routing table: New node issues a lookup for its own node ID to any existing node Finds new node's successor. Ask that node for its finger table. To include new node in others' routing table: Each node keeps track of its current predecessor. New node tells its successor that its predecessor has changed. Periodically ask your successor who its predecessor is; switch if that node is closer to you What about concurrent joins? Two nodes with similar IDs find the same existing successor upon join. What about node failures? Node crash/leave without warning. Problematic: Other nodes' finger table refer to the dead node. Dead nodes' predessor refers to a dead successor If you try to route via dead node, detect timeout, treat as empty table entry. I.e. route to numerically closer entry instead. Repair: ask any node on same row for a copy of its corresponding entry. Or any node on rows below. All these share the right prefix. For missing successor: Failed node might have been closest to key ID! Need to know next-closest. Maintain a _list_ of successors: r successors. If you expect really bad luck, maintain O(log N) successors. We can route around failure. The system is effectively self-correcting. Locality: Lookup takes log(n) hop, but each hop is often half way across the world! Can we route through nodes close to us on underlying network? This boils down to whether we have choices: If multiple correct next hops, we can try to choose closest. Chord doesn't allow much choice. Observe: Strict successor for finger not necessary. Sample nodes in successor list of true finger, pick closest. Security: Self-authenticating data, e.g. key = SHA1(value) Malicious nodes cannot forge data. Data is immutable. Can malicious nodes claim that data does not exist? yes, if they can choose node IDs maliciously.. Discussions: Why not keep complete routing table as all nodes so one can always route in one hop? How about persistent data storage? Are DHTs only suitable for storing small "pointer"-type data in systems w/ churn? Paper IV: Virtual Ring Routing Differences between DHT routing and Internet/Wireless routing DHT looks for data, Internet(wireless) routing looks for node DHT relies on Internet routing (1 overlay hop requires multiple hops in the underlying net) Key idea of VRR: Make use of ID-ring to guide routing progress But a node must know (i.e. be able to route to) its successor, which is multiple hops away A VRR node explicitly sets up a path to its ring neighbors upon joining. Why does it work? VRR can always find a destination by getting closer in ID-space VRR's stretch is not bad because intermediate nodes are likely to shortcut to the destination node How much per-node state? (Recall that DHT has log n per-node routing state) O(sqrt n) A constant density wireless network of n nodes has average path length of O(sqrt n). Each node sets up routes to r ID-space neighbors, each O(sqrt n) hops away; hence n * r * sqrt n such routing state in total ==> n * r * sqrt n / n = r * sqrt n per-node state VRR uses hard state instead of soft state Soft state: relies on periodic updating, okay to get out of date. Hard state: explicitly setup and tear down, should always be correct. Each path between two ID-space neighbors involve sqrt n hops ==> vulnerable; if any sqrt n links change, the path needs to be torn down and re-set up Local repair make use of massive redundancy in wireless network to locally repair a broken link.