P2P Technologies MTAT.08.002 (2 AP)
Properties and P2P Models
Ilja Livenson, ilja@nicpb.ee
Last time
- First deadline: 02.10.2006 (but the earlier, the better!)
- Servent: an entity that can both issue a query and respond to one
- Pure vs. hybrid P2P systems
- Layers: Communication -> Group Management -> Robustness -> Class-specific -> Application-specific
This time
- P2P properties
- P2P models: Centralized Directory Model, Flooded Requests Model, Document Routing Model
- Chord, CAN, Tapestry, Pastry
- Projects (the earlier you start, the easier the exam session!)
Decentralization
Decentralization
- Pros: price, scalability, performance
- Cons: security, joining the system
Scalability
- Synchronization of central services
- Maintenance of state
- Programming model of the computation
Anonymity
Anonymity forms
- Author: the author of a document cannot be determined
- Publisher: the publisher of a document cannot be determined
- Reader: the user who downloads a document cannot be determined
- Server: the servers hosting a document cannot be determined from the document
- Document: servers do not know which files they store
- Query: a server does not know which document it is serving when it answers a query
Techniques
Self-organization
- OceanStore: routing
- Pastry: file replicas
- FastTrack, Skype: supernodes
Cost of Ownership
- Very low compared with client-server applications
Ad-hoc Connectivity
- The resource pool in a P2P system is unstable
- Access to files is unstable
- Even under an SLA, some of the service providers may be down
- Collaboration systems
- Mobile devices
- Transparent communication with offline systems (proxies, sender relays, ...)
Performance
- Processing
- Storage
- Networking
Performance
- Centrally coordinated systems: DNS
- Distributed systems: message forwarding; network traffic grows
Performance
- Replication: copies are created closer to the requester; updates must be propagated (consistency)
- Caching: in FreeNet, once a file is found and returned to the requester, every intermediate node caches the returned data
Performance
- Intelligent routing: we need to understand how nodes communicate (from a sociological viewpoint)
- Small-world phenomenon (Milgram 1967)
- Nodes with similar interests could be linked directly: network costs fall, search speed grows
Security
- Multi-key encryption: public key, multiple private keys
- Sandboxing: running foreign code on a node is unsafe; we must ensure the code does nothing harmful (virtual machines, proof-carrying code, certifying compilers)
Security
- Digital Rights Management: the author must always be determinable; watermarking (steganography): a signature is embedded in the file
- Reputation and accountability: we need to measure how good a node is (share a lot of music -> good; freeloader -> bad)
Security
- Firewalls: P2P needs direct connections between nodes, but inbound TCP is very often blocked
- NAT: if both nodes are hidden behind a NAT/firewall, a third node can be used as a relay
Transparency
Fault-Resilience
- Central design point: avoid a central point of failure!
- Dedicated nodes: relays
- Groove: message queues
Interoperability
- Peer-to-Peer Working Group (Internet2): not very active
- JXTA: an attempt at a de facto standard; the topic of the next lecture; a good basis for the project (a C/C++ implementation exists as well)!
P2P Properties
- Decentralization, Scalability, Anonymity, Self-organization, Cost of ownership, Ad-hoc connectivity, Performance, Transparency, Security, Fault-resilience, Interoperability :)
P2P Models
- Centralized Directory Model
- Flooded Requests Model
- Document Routing Model
Centralized Directory
- Nodes publish information about themselves to a central server
- When a query arrives, the server selects the best peer from the set
- Some scalability problems; still, Napster's example shows this is not that big a problem
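A minimal sketch of the model (the class name, `register`/`lookup` helpers, and the bandwidth-based ranking are illustrative, not Napster's actual protocol):

```python
class Directory:
    """Central server: peers register their files, queries pick the 'best' peer."""

    def __init__(self):
        self.index = {}  # filename -> list of (peer, bandwidth)

    def register(self, peer, files, bandwidth):
        # a joining peer advertises its files and some quality metric
        for f in files:
            self.index.setdefault(f, []).append((peer, bandwidth))

    def lookup(self, filename):
        # "best" here = highest advertised bandwidth (an illustrative choice)
        candidates = self.index.get(filename, [])
        return max(candidates, key=lambda c: c[1])[0] if candidates else None
```

The server only brokers the lookup; the actual file transfer happens directly between peers.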
Flooded Requests
- The Gnutella model
- Network load is very high
- Super-peers can help
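A toy flooding search, sketching why the network load grows: every query is forwarded to all neighbors until a TTL runs out (the message counter and the exact bookkeeping are illustrative):

```python
def flood(start, links, has_file, ttl):
    """Forward the query to all neighbors, decrementing a TTL.

    links: node -> set of neighbor nodes; has_file: predicate on a node.
    Returns (set of peers that answered, number of query messages sent).
    """
    frontier, seen, hits = {start}, {start}, set()
    messages = 0
    while frontier and ttl > 0:
        nxt = set()
        for node in frontier:
            for nb in links[node] - seen:   # don't revisit peers
                messages += 1
                seen.add(nb)
                nxt.add(nb)
                if has_file(nb):
                    hits.add(nb)
        frontier, ttl = nxt, ttl - 1
    return hits, messages
```

Even with duplicate suppression, the message count grows with the whole reachable neighborhood, not with the number of hits.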
Document Routing
- FreeNet's approach
- Each peer gets an ID_P
- Each peer knows a certain set of other peers
- When a document is published, it also gets an ID: ID_D = h(content, name)
- The document is then forwarded until it reaches the peer whose ID_P is most similar to ID_D
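The forwarding rule can be sketched as follows (a simplification, assuming numeric IDs and "most similar" = smallest absolute difference; the helper names are hypothetical, not FreeNet's API):

```python
import hashlib

def doc_id(content: bytes, name: str) -> int:
    # ID_D = h(content, name), reduced to a small ID space for illustration
    digest = hashlib.sha1(name.encode() + content).hexdigest()
    return int(digest, 16) % 2**16

def closest(ids, target):
    # "most similar" interpreted as smallest numeric distance
    return min(ids, key=lambda i: abs(i - target))

def publish_step(my_id, neighbor_ids, d):
    # forward toward the neighbor closest to the document ID;
    # store locally once no neighbor is closer than we are
    best = closest(neighbor_ids | {my_id}, d)
    return "store" if best == my_id else best
```

Each hop repeats `publish_step` at the chosen neighbor until some peer decides to store the document.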
Document Routing
Document Routing
- Search: the query goes to the peer with the most similar ID until the document is found
- The document is routed back; every peer participating in the transaction stores its own copy
- Problems: the ID must be known before searching; islanding problem (segmentation)
Document Routing
- Chord, CAN, Tapestry, and Pastry
- Main goal: reduce the number of hops during lookup
- These algorithms either guarantee, or claim with high probability, that lookup has O(log n) complexity
- The following slides are taken from: http://www.cs.bgu.ac.il/~ccsh032/
CAN
- CAN: Content-Addressable Network
- Interface: insert(key, value); value = retrieve(key)
- Properties: scalable, operationally simple, good performance
CAN: basic idea
- insert(K1, V1) is routed to the node responsible for K1, which stores (K1, V1)
- retrieve(K1) is routed to that same node
CAN: solution
- Virtual Cartesian coordinate space
- The entire space is partitioned amongst all the nodes; every node owns a zone in the overall space
- Abstraction: can store data at points in the space; can route from one point to another
- A point maps to the node that owns the enclosing zone
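A minimal sketch of the zone abstraction in 2-d (zones as half-open rectangles; the helper names are illustrative):

```python
def owns(zone, point):
    # zone = (x0, x1, y0, y1), half-open so zones tile without overlap
    x0, x1, y0, y1 = zone
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1

def owner(zones, point):
    # a point maps to whichever node owns the enclosing zone
    for node, zone in zones.items():
        if owns(zone, point):
            return node
    return None
```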
CAN: simple example
- Nodes 1-4 join in turn; each join splits an existing node's zone in two
CAN: simple example
node I::insert(K, V):
- (1) a = h_x(K); b = h_y(K)
- (2) route (K, V) to point (a, b), following the straight-line path from source to destination
- (3) the node owning (a, b) stores (K, V)
node J::retrieve(K):
- (1) a = h_x(K); b = h_y(K)
- (2) route retrieve(K) to (a, b)
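The key-to-point mapping above can be sketched like this (using SHA-256 as a stand-in for the per-axis hash functions h_x and h_y, which the slides do not specify):

```python
import hashlib

def h_coord(key: str, axis: str) -> float:
    # an independent hash per axis, mapped into [0, 1)
    digest = hashlib.sha256(f"{axis}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def key_to_point(key: str):
    # (a, b) = (h_x(K), h_y(K)): every node computes the same point,
    # so insert and retrieve meet at the same zone owner
    return (h_coord(key, "x"), h_coord(key, "y"))
```

Because the mapping is deterministic, node J recomputes exactly the point where node I stored the value.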
CAN
- Data stored in the CAN is addressed by name (i.e., key), not location (i.e., IP address)
CAN: routing table
- Each node keeps its 2d neighbors (in d dimensions)
CAN: routing
- To route from (a, b) toward (x, y), forward greedily through neighboring zones
- A node only maintains state for its immediate neighboring nodes
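One common way to realize the greedy step (a sketch; CAN implementations may pick the next hop differently, e.g. by zone boundaries rather than centers):

```python
import math

def center(zone):
    # zone = (x0, x1, y0, y1)
    x0, x1, y0, y1 = zone
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def next_hop(neighbor_zones, dest):
    # greedy step: forward to the neighbor whose zone center
    # lies closest to the destination point
    return min(neighbor_zones,
               key=lambda n: math.dist(center(neighbor_zones[n]), dest))
```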
CAN: node insertion
- 1) The new node discovers some node I already in the CAN (via a bootstrap node)
- 2) It picks a random point (p, q) in the space
- 3) I routes to (p, q) and discovers node J, the owner of that zone
- 4) J's zone is split in half; the new node owns one half and obtains its routing table from J
- Periodic updates: each node sends its zone id to its neighbors
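Step 4 in 2-d can be sketched as follows (splitting along the longer side so zones stay roughly square is an assumption, not specified on the slide):

```python
def split_zone(zone):
    """J keeps one half, the new node takes the other.

    zone = (x0, x1, y0, y1); split along the longer side.
    """
    x0, x1, y0, y1 = zone
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) / 2
        return (x0, mid, y0, y1), (mid, x1, y0, y1)
    mid = (y0 + y1) / 2
    return (x0, x1, y0, mid), (x0, x1, mid, y1)
```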
CAN: node failures
- Need to repair the space
- Explicit: hand over the zone
- Recovering the database: soft-state updates; use replication and rebuild the database from replicas
- Repairing routing: takeover algorithm
CAN: takeover algorithm
- Simple failures: know your neighbor's neighbors; when a node fails, one of its neighbors takes over its zone
- Periodic updates include: zone id + neighbors; absence signals failure
- A neighbor sends a TAKEOVER message to all of the failed node's neighbors and sets a takeover timer
- On receipt of TAKEOVER: compare zone volumes and either cancel or reissue the TAKEOVER message
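The effect of the volume comparison can be sketched like this (a simplification of the timer-and-message protocol, keeping only its outcome):

```python
def volume(zone):
    # zone = (x0, x1, y0, y1)
    x0, x1, y0, y1 = zone
    return (x1 - x0) * (y1 - y0)

def takeover_winner(neighbor_zones):
    # each neighbor of the failed node starts a takeover timer; because
    # TAKEOVER receipts compare zone volumes (smaller wins), the neighbor
    # with the smallest zone ends up taking over the failed zone
    return min(neighbor_zones, key=lambda n: volume(neighbor_zones[n]))
```

Preferring the smallest-volume neighbor keeps the merged zones from growing unevenly.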
CAN: takeover algorithm
- More complex failure modes: simultaneous failure of multiple adjacent nodes
- Scoped flooding to discover neighbors
- Hopefully a rare event
CAN: node failures
- Only the failed node's immediate neighbors are required for recovery
Design recap
- Basic CAN: completely distributed; self-organizing; nodes only maintain state for their immediate neighbors
- Additional design features: multiple, independent spaces (realities); background load-balancing algorithm; simple heuristics to improve performance
Multi-Dimensional Spaces
- Increase the number of dimensions
- Result: reduced path length
- A node now has more neighbors
Realities
- Multiple coordinate spaces; a node is assigned r coordinate zones, one per reality
- Content is replicated to all realities
- Result: a query can route to (x, y, z) on any reality, and at each hop can switch to a different reality
- Each value is kept at r nodes, and each node has r neighbor sets
Outline
- Introduction
- Design
- Evaluation
- Ongoing Work
Evaluation
- Scalability
- Low latency
- Load balancing
- Robustness
CAN: scalability
- For a uniformly partitioned space with n nodes and d dimensions:
- per node, the number of neighbors is 2d
- the average routing path is (d/4) * n^(1/d) hops
- simulations show the above results hold in practice
- Can scale the network without increasing per-node state
- Chord/Plaxton/Tapestry/Buzz: log(n) neighbors with log(n) hops
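Plugging numbers into the formulas above (a small worked example):

```python
def can_state_and_path(n, d):
    # per-node state: 2d neighbors; average path: (d/4) * n**(1/d) hops
    return 2 * d, (d / 4) * n ** (1 / d)
```

For n = 65536 nodes: with d = 2, each node keeps 4 neighbors but paths average 128 hops; with d = 4, state doubles to 8 neighbors while the average path drops to 16 hops. Raising d trades a little extra state for much shorter paths.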
CAN: low latency
- Problem: latency stretch = (CAN routing delay) / (IP routing delay); application-level routing may lead to high stretch
- Solutions: increase dimensions; heuristics: RTT-weighted routing, multiple nodes per zone (peer nodes), deterministically replicated entries
Overloading Zones
- Multiple nodes per zone, up to MAXPEERS; a zone is split only if it exceeds MAXPEERS
- Each peer in a zone knows all the others in it, but still keeps one neighbor per neighboring zone
- Periodically: request the list of peers from a neighbor and select the new neighbor with the best RTT
- Content: divided or replicated among the peers
CAN: load balancing
- Two pieces:
- Dealing with hot-spots (popular (key, value) pairs): nodes cache recently requested entries; an overloaded node replicates popular entries at its neighbors
- Uniform coordinate-space partitioning: uniformly spread (key, value) entries and routing load
Uniform Partitioning
- Added check at join time: pick a zone, check the neighboring zones, and split the largest of them
Uniform Partitioning
[Figure: histogram of zone volumes for 65,000 nodes in 3 dimensions, with V = total volume / n; with the join-time check most nodes end up with a volume near V, while without it volumes spread from V/16 up to 8V.]
CAN: Robustness
- Completely distributed: no single point of failure
- Not exploring database recovery
- Resilience of routing: can route around trouble
Routing resilience
Node X::route(D):
- if X cannot make progress toward D:
- check whether any neighbor of X can make progress
- if yes, forward the message to one such neighbor
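The rule above can be sketched as follows ("progress" interpreted as strictly decreasing Euclidean distance to the destination; the two-hop neighbor view is assumed to come from the periodic updates):

```python
import math

def route_step(pos, neighbors, neighbor_neighbors, dest):
    """One routing step with the fallback around failed regions.

    neighbors: name -> coordinate; neighbor_neighbors: name -> list of
    that neighbor's neighbor coordinates (learned via periodic updates).
    """
    d_here = math.dist(pos, dest)
    closer = [n for n, p in neighbors.items() if math.dist(p, dest) < d_here]
    if closer:
        # normal case: greedy progress toward the destination
        return min(closer, key=lambda n: math.dist(neighbors[n], dest))
    # stuck: forward to any neighbor that can itself make progress
    for n, p in neighbors.items():
        if any(math.dist(q, dest) < math.dist(p, dest)
               for q in neighbor_neighbors.get(n, [])):
            return n
    return None  # no way around the failed region
```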
CAN: node insertion
- Inserting a new node affects only a single other node and its immediate neighbors: O(d) nodes in total
The end?
Next time: JXTA