P2P Technologies MTAT.08.002 (2 AP)
Properties and P2P Models
Ilja Livenson, ilja@nicpb.ee
Last time
- First deadline: 02.10.2006 (but the earlier, the better!)
- Servent: an entity that can both issue a query and respond to one
- Pure vs. hybrid P2P systems
- Layers: Communication -> Group Management -> Robustness -> Class-specific -> Application-specific
This time
- P2P properties
- P2P models: Centralized Directory Model, Flooded Requests Model, Document Routing Model
- Chord, CAN, Tapestry, Pastry
- Projects (the earlier you start, the easier the exam session!)
Decentralization
Decentralization
- Pros: price, scalability, performance
- Cons: security, joining the system
Scalability
- Synchronization of central services
- Maintenance of state
- Programming model of the computation
Anonymity
Anonymity forms
- Author: the author of a document cannot be determined
- Publisher: the publisher of a document cannot be determined
- Reader: the user who downloads a document cannot be determined
- Server: the servers hosting a document cannot be determined from the document
- Document: servers do not know which files they store
- Query: a server does not know which document it is serving when it answers a query
Techniques
Self-organization
- OceanStore: routing
- Pastry: file replicas
- FastTrack, Skype: supernodes
Cost of Ownership
- Very low compared with client-server applications
Ad-hoc Connectivity
- The resource pool in a P2P system is unstable
- Access to files is unstable
- Even under an SLA, some of the service providers may be down
- Collaboration systems
- Mobile devices
- Transparent communication with offline systems (proxies, sender relays, ...)
Performance
- Processing
- Storage
- Networking
Performance
- Centrally coordinated systems: DNS
- Distributed systems: message forwarding; network traffic grows
Performance
- Replication: copies are created closer to the requester; updates must be propagated (consistency)
- Caching: in FreeNet, once a file is found and returned to the requester, every intermediate node caches the returned data
Performance
- Intelligent routing: we need to understand how nodes communicate (from a sociological viewpoint)
- Small-world phenomenon (Milgram 1967)
- Nodes with similar interests could be linked directly: network costs fall, search speed grows
Security
- Multi-key encryption: public key, multiple private keys
- Sandboxing: running foreign code on a node is unsafe; we must ensure the code does nothing harmful (virtual machines, proof-carrying code, certifying compilers)
Security
- Digital Rights Management: the author must always be determinable; watermarking (steganography): a signature is embedded in the file
- Reputation and accountability: we need to measure how good a node is (share a lot of music -> good; freeloader -> bad)
Security
- Firewalls: P2P needs direct connections between nodes, but inbound TCP is very often blocked
- NAT: if both nodes are hidden behind a NAT/firewall, a third node can be used as a relay
Transparency
Fault-Resilience
- Central design point: avoid a central point of failure!
- Dedicated nodes: relays
- Groove: message queues
Interoperability
- Peer-to-Peer Working Group (Internet2): not very active
- JXTA: an attempt at a de facto standard; the topic of the next lecture; a good basis for the project (a C/C++ implementation exists as well)!
P2P Properties
- Decentralization, Scalability, Anonymity, Self-organization, Cost of ownership, Ad-hoc connectivity, Performance, Transparency, Security, Fault-resilience, Interoperability :)
P2P Models
- Centralized Directory Model
- Flooded Requests Model
- Document Routing Model
Centralized Directory
- Nodes publish information about themselves to a central server
- When a query arrives, the server selects the best peer from the set
- Some scalability problems; still, Napster's example shows this is not that big a problem
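A minimal sketch of the model (the class name, `register`/`lookup` helpers, and the bandwidth-based ranking are illustrative, not Napster's actual protocol):

```python
class Directory:
    """Central server: peers register their files, queries pick the 'best' peer."""

    def __init__(self):
        self.index = {}  # filename -> list of (peer, bandwidth)

    def register(self, peer, files, bandwidth):
        # a joining peer advertises its files and some quality metric
        for f in files:
            self.index.setdefault(f, []).append((peer, bandwidth))

    def lookup(self, filename):
        # "best" here = highest advertised bandwidth (an illustrative choice)
        candidates = self.index.get(filename, [])
        return max(candidates, key=lambda c: c[1])[0] if candidates else None
```

The server only brokers the lookup; the actual file transfer happens directly between peers.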
Flooded Requests
- The Gnutella model
- Network load is very high
- Super-peers can help
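A toy flooding search, sketching why the network load grows: every query is forwarded to all neighbors until a TTL runs out (the message counter and the exact bookkeeping are illustrative):

```python
def flood(start, links, has_file, ttl):
    """Forward the query to all neighbors, decrementing a TTL.

    links: node -> set of neighbor nodes; has_file: predicate on a node.
    Returns (set of peers that answered, number of query messages sent).
    """
    frontier, seen, hits = {start}, {start}, set()
    messages = 0
    while frontier and ttl > 0:
        nxt = set()
        for node in frontier:
            for nb in links[node] - seen:   # don't revisit peers
                messages += 1
                seen.add(nb)
                nxt.add(nb)
                if has_file(nb):
                    hits.add(nb)
        frontier, ttl = nxt, ttl - 1
    return hits, messages
```

Even with duplicate suppression, the message count grows with the whole reachable neighborhood, not with the number of hits.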
Document Routing
- FreeNet's approach
- Each peer gets an ID_P
- Each peer knows a certain set of other peers
- When a document is published, it also gets an ID: ID_D = h(content, name)
- The document is then forwarded until it reaches the peer whose ID_P is most similar to ID_D
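The forwarding rule can be sketched as follows (a simplification, assuming numeric IDs and "most similar" = smallest absolute difference; the helper names are hypothetical, not FreeNet's API):

```python
import hashlib

def doc_id(content: bytes, name: str) -> int:
    # ID_D = h(content, name), reduced to a small ID space for illustration
    digest = hashlib.sha1(name.encode() + content).hexdigest()
    return int(digest, 16) % 2**16

def closest(ids, target):
    # "most similar" interpreted as smallest numeric distance
    return min(ids, key=lambda i: abs(i - target))

def publish_step(my_id, neighbor_ids, d):
    # forward toward the neighbor closest to the document ID;
    # store locally once no neighbor is closer than we are
    best = closest(neighbor_ids | {my_id}, d)
    return "store" if best == my_id else best
```

Each hop repeats `publish_step` at the chosen neighbor until some peer decides to store the document.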
Document Routing
Document Routing
- Search: the query goes to the peer with the most similar ID until the document is found
- The document is routed back; every peer participating in the transaction stores its own copy
- Problems: the ID must be known before searching; islanding problem (segmentation)
Document Routing
- Chord, CAN, Tapestry, and Pastry
- Main goal: reduce the number of hops during lookup
- These algorithms either guarantee, or claim with high probability, that lookup has O(log n) complexity
- The following slides are taken from: http://www.cs.bgu.ac.il/~ccsh032/
CAN
- CAN: Content-Addressable Network
- Interface: insert(key, value); value = retrieve(key)
- Properties: scalable, operationally simple, good performance
CAN: basic idea
- insert(K1, V1) is routed to the node responsible for K1, which stores (K1, V1)
- retrieve(K1) is routed to that same node
CAN: solution
- Virtual Cartesian coordinate space
- The entire space is partitioned amongst all the nodes; every node owns a zone in the overall space
- Abstraction: can store data at points in the space; can route from one point to another
- A point maps to the node that owns the enclosing zone
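A minimal sketch of the zone abstraction in 2-d (zones as half-open rectangles; the helper names are illustrative):

```python
def owns(zone, point):
    # zone = (x0, x1, y0, y1), half-open so zones tile without overlap
    x0, x1, y0, y1 = zone
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1

def owner(zones, point):
    # a point maps to whichever node owns the enclosing zone
    for node, zone in zones.items():
        if owns(zone, point):
            return node
    return None
```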
CAN: simple example
- Nodes 1-4 join in turn; each join splits an existing node's zone in two
CAN: simple example
node I::insert(K, V):
- (1) a = h_x(K); b = h_y(K)
- (2) route (K, V) to point (a, b), following the straight-line path from source to destination
- (3) the node owning (a, b) stores (K, V)
node J::retrieve(K):
- (1) a = h_x(K); b = h_y(K)
- (2) route retrieve(K) to (a, b)
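The key-to-point mapping above can be sketched like this (using SHA-256 as a stand-in for the per-axis hash functions h_x and h_y, which the slides do not specify):

```python
import hashlib

def h_coord(key: str, axis: str) -> float:
    # an independent hash per axis, mapped into [0, 1)
    digest = hashlib.sha256(f"{axis}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def key_to_point(key: str):
    # (a, b) = (h_x(K), h_y(K)): every node computes the same point,
    # so insert and retrieve meet at the same zone owner
    return (h_coord(key, "x"), h_coord(key, "y"))
```

Because the mapping is deterministic, node J recomputes exactly the point where node I stored the value.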
CAN
- Data stored in the CAN is addressed by name (i.e., key), not location (i.e., IP address)
CAN: routing table
- Each node keeps its 2d neighbors (in d dimensions)
CAN: routing
- To route from (a, b) toward (x, y), forward greedily through neighboring zones
- A node only maintains state for its immediate neighboring nodes
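One common way to realize the greedy step (a sketch; CAN implementations may pick the next hop differently, e.g. by zone boundaries rather than centers):

```python
import math

def center(zone):
    # zone = (x0, x1, y0, y1)
    x0, x1, y0, y1 = zone
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def next_hop(neighbor_zones, dest):
    # greedy step: forward to the neighbor whose zone center
    # lies closest to the destination point
    return min(neighbor_zones,
               key=lambda n: math.dist(center(neighbor_zones[n]), dest))
```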
CAN: node insertion
- 1) The new node discovers some node I already in the CAN (via a bootstrap node)
- 2) It picks a random point (p, q) in the space
- 3) I routes to (p, q) and discovers node J, the owner of that zone
- 4) J's zone is split in half; the new node owns one half and obtains its routing table from J
- Periodic updates: each node sends its zone id to its neighbors
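Step 4 in 2-d can be sketched as follows (splitting along the longer side so zones stay roughly square is an assumption, not specified on the slide):

```python
def split_zone(zone):
    """J keeps one half, the new node takes the other.

    zone = (x0, x1, y0, y1); split along the longer side.
    """
    x0, x1, y0, y1 = zone
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) / 2
        return (x0, mid, y0, y1), (mid, x1, y0, y1)
    mid = (y0 + y1) / 2
    return (x0, x1, y0, mid), (x0, x1, mid, y1)
```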
CAN: node failures
- Need to repair the space
- Explicit: hand over the zone
- Recovering the database: soft-state updates; use replication and rebuild the database from replicas
- Repairing routing: takeover algorithm
CAN: takeover algorithm
- Simple failures: know your neighbor's neighbors; when a node fails, one of its neighbors takes over its zone
- Periodic updates include: zone id + neighbors; absence signals failure
- A neighbor sends a TAKEOVER message to all of the failed node's neighbors and sets a takeover timer
- On receipt of TAKEOVER: compare zone volumes and either cancel or reissue the TAKEOVER message
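The effect of the volume comparison can be sketched like this (a simplification of the timer-and-message protocol, keeping only its outcome):

```python
def volume(zone):
    # zone = (x0, x1, y0, y1)
    x0, x1, y0, y1 = zone
    return (x1 - x0) * (y1 - y0)

def takeover_winner(neighbor_zones):
    # each neighbor of the failed node starts a takeover timer; because
    # TAKEOVER receipts compare zone volumes (smaller wins), the neighbor
    # with the smallest zone ends up taking over the failed zone
    return min(neighbor_zones, key=lambda n: volume(neighbor_zones[n]))
```

Preferring the smallest-volume neighbor keeps the merged zones from growing unevenly.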
CAN: takeover algorithm
- More complex failure modes: simultaneous failure of multiple adjacent nodes
- Scoped flooding to discover neighbors
- Hopefully a rare event
CAN: node failures
- Only the failed node's immediate neighbors are required for recovery
Design recap
- Basic CAN: completely distributed; self-organizing; nodes only maintain state for their immediate neighbors
- Additional design features: multiple, independent spaces (realities); background load-balancing algorithm; simple heuristics to improve performance
Multi-Dimensional Spaces
- Increase the number of dimensions
- Result: reduced path length
- A node now has more neighbors
Realities
- Multiple coordinate spaces; a node is assigned r coordinate zones, one per reality
- Content is replicated to all realities
- Result: a query can route to (x, y, z) on any reality, and at each hop can switch to a different reality
- Each value is kept at r nodes, and each node has r neighbor sets
Outline
- Introduction
- Design
- Evaluation
- Ongoing Work
Evaluation
- Scalability
- Low latency
- Load balancing
- Robustness
CAN: scalability
- For a uniformly partitioned space with n nodes and d dimensions:
- per node, the number of neighbors is 2d
- the average routing path is (d/4) * n^(1/d) hops
- simulations show the above results hold in practice
- Can scale the network without increasing per-node state
- Chord/Plaxton/Tapestry/Buzz: log(n) neighbors with log(n) hops
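Plugging numbers into the formulas above (a small worked example):

```python
def can_state_and_path(n, d):
    # per-node state: 2d neighbors; average path: (d/4) * n**(1/d) hops
    return 2 * d, (d / 4) * n ** (1 / d)
```

For n = 65536 nodes: with d = 2, each node keeps 4 neighbors but paths average 128 hops; with d = 4, state doubles to 8 neighbors while the average path drops to 16 hops. Raising d trades a little extra state for much shorter paths.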
CAN: low latency
- Problem: latency stretch = (CAN routing delay) / (IP routing delay); application-level routing may lead to high stretch
- Solutions: increase dimensions; heuristics: RTT-weighted routing, multiple nodes per zone (peer nodes), deterministically replicated entries
Overloading Zones
- Multiple nodes per zone, up to MAXPEERS; a zone is split only if it exceeds MAXPEERS
- Each peer in a zone knows all the others in it, but still keeps one neighbor per neighboring zone
- Periodically: request the list of peers from a neighbor and select the new neighbor with the best RTT
- Content: divided or replicated among the peers
CAN: load balancing
- Two pieces:
- Dealing with hot-spots (popular (key, value) pairs): nodes cache recently requested entries; an overloaded node replicates popular entries at its neighbors
- Uniform coordinate-space partitioning: uniformly spread (key, value) entries and routing load
Uniform Partitioning
- Added check at join time: pick a zone, check the neighboring zones, and split the largest of them
Uniform Partitioning
[Figure: histogram of zone volumes for 65,000 nodes in 3 dimensions, with V = total volume / n; with the join-time check most nodes end up with a volume near V, while without it volumes spread from V/16 up to 8V.]
CAN: Robustness
- Completely distributed: no single point of failure
- Not exploring database recovery
- Resilience of routing: can route around trouble
Routing resilience
Node X::route(D):
- if X cannot make progress toward D:
- check whether any neighbor of X can make progress
- if yes, forward the message to one such neighbor
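The rule above can be sketched as follows ("progress" interpreted as strictly decreasing Euclidean distance to the destination; the two-hop neighbor view is assumed to come from the periodic updates):

```python
import math

def route_step(pos, neighbors, neighbor_neighbors, dest):
    """One routing step with the fallback around failed regions.

    neighbors: name -> coordinate; neighbor_neighbors: name -> list of
    that neighbor's neighbor coordinates (learned via periodic updates).
    """
    d_here = math.dist(pos, dest)
    closer = [n for n, p in neighbors.items() if math.dist(p, dest) < d_here]
    if closer:
        # normal case: greedy progress toward the destination
        return min(closer, key=lambda n: math.dist(neighbors[n], dest))
    # stuck: forward to any neighbor that can itself make progress
    for n, p in neighbors.items():
        if any(math.dist(q, dest) < math.dist(p, dest)
               for q in neighbor_neighbors.get(n, [])):
            return n
    return None  # no way around the failed region
```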
CAN: node insertion
- Inserting a new node affects only a single other node and its immediate neighbors: O(d) nodes in total
The end?
Next time: JXTA