# **Distributed Systems**

# Fundamentals – Shared Store

Fabienne Boyer Reprise du cours d'Olivier Gruber

Université Joseph Fourier

Projet ERODS (LIG)

### **Message Fundamentals**

Today's Lecture

 The problem:
 Sharing data between multiple processes (read and write access)
 When accessing something means acquiring a copy through messages
 When updating a local copy requires sending notification messages
 Discussing consistency
 How are the different processes seeing each other reads and writes?

**Multi-cores – memory** Illustration Could be **JVM – object store** 





## **Useful Notations**

Notations
 Read operation at process *Pi* on data *x* returning the value *a* -Ri(x)a
 Write operation at process *Pi* on data *x* writing the value *b* -Wi(x)b
 Time flows from left to write
 All data items are initialized to *NIL*



λProcesses
 λWe have N processes
 λWe have a shared data store with data items



 $_{\lambda}$ Discussion

λLike a normal shared memory, requires the use of synchronization λPoor performance (latencies)



Introducing local caches
 One local cache per process
 Shorter latencies, but we need a consistency protocol
 Store must remember who has a cached copy of each data item
 Must be able to callback caches to install updates



alntroducing cache flushes

Caches may overflow, we need to flush some cached data items



λCache Design
 λHow do we maintain the store copy lists?
 λSo to avoid unnecessary update messages





Cache Design
 Use FIFO communication channels!
 With TCP/IP, requires to keep sockets open
 With UDP, you need to implement FIFO/lossless



λOverview
 λShared data store
 λCache with flush and consistency protocol over FIFO channels



## The problem

| P1: W1(            | x)a                            |                                   |                                   |              |                |
|--------------------|--------------------------------|-----------------------------------|-----------------------------------|--------------|----------------|
| P2:                | R2(x)a W2(                     | x)b                               | •                                 | W1(x)a       |                |
| P3:                |                                | R3(x)a R3(x)                      | )b                                |              |                |
| P4:                |                                | R4(x)a R4(x)                      | b                                 | P1 P2<br>x=a | P <sub>3</sub> |
| <b>P</b> 1<br>x=a; | <b>P2</b><br>if (x==a)<br>x=b; | <b>P3</b><br>switch(x)<br>case a: | <b>P4</b><br>switch(x)<br>case a: | P4           | · · · · · ·    |
|                    |                                | case b:                           | case b:                           |              |                |
|                    |                                |                                   |                                   |              |                |

## The problem

| P1: W1(x)a         |                                 |                                   |                                   |     |
|--------------------|---------------------------------|-----------------------------------|-----------------------------------|-----|
| P2:                | R2(x)a W2(x                     | ()b                               | <b>P</b>                          |     |
| P3:                |                                 | R3(x)a R3(x                       | )b                                |     |
| P4:                |                                 | R4(x)a R4(x                       | )b                                | P1  |
| <b>P</b> 1<br>x=a; | <b>P</b> 2<br>if (x==a)<br>x=b; | <b>P3</b><br>switch(x)<br>case a: | <b>P4</b><br>switch(x)<br>case a: | x=a |
|                    |                                 | case b:<br>                       | case b:<br>                       |     |

write request R2(x)a

W2(x)b

x=b

P3

P2

x=a

x=b

## **Consistency Models**

Consistency definition
 Lessentially a contract between processes and a data store
 Different models are possible
 *Low would we describe the behavior of our shared store?*



Defined by Lamport (1979)

Memory works as expected with multiple processes

The result of any execution is the same as if the read and write operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.

| P1: | W1(x)a |        |        |
|-----|--------|--------|--------|
| P2: | W2(x)b |        |        |
| P3: |        | R3(x)b | R3(x)a |
| P4: |        | R4(x)b | R4(x)a |

sequentially consistent

possible equivalent sequential order:

W2(x)b R3(x)b R4(x)b W1(x)a R3(x)a R4(x)a

| P1: | W1(x)a |        |        |
|-----|--------|--------|--------|
| P2: | W2(x)b |        |        |
| P3: |        | R3(x)b | R3(x)a |
| P4: |        | R4(x)b | R4(x)a |



Different possible equivalent sequential orders, as if executed by a single processor, on the same memory content



Different possible equivalent sequential orders, as if executed by a single processor, on the same memory content



The result of any execution is the same as if the read and write operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.











**Non-Sequential Executions** 

λlmpossible when a totally-ordered multicast is used to multicast updates λln other words, the system we just built only permit sequential executions



**Weaker Consistency** 

A Harder to use, potentially more parallelism

 $_{\lambda}$ Causality

-If an event E2 is caused or may be influenced by an event E1

-Causality requires that everyone sees the event E1 before the event E2

Writes that are **potentially** causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.





removed the causal dependency: R2(x)a

| P1: | W1(x)a             |                    |        |                | _                                 |
|-----|--------------------|--------------------|--------|----------------|-----------------------------------|
| P2: |                    | W2(x)b             |        |                | <b>F</b>                          |
| P3: |                    |                    | R3(x)b | R3(x)a         |                                   |
| P4: |                    |                    | R4(x)a | R4(x)b         |                                   |
|     | <b>P</b> 1<br>x=a; | <b>P</b> 2<br>x=b; |        | tch(x)<br>e a: | <b>P4</b><br>switch(x)<br>case a: |
|     |                    |                    | cas    | e b:           | case b:                           |

#### **Sequential Consistency:**

both **P3** and **P4** see (a,b) both **P3** and **P4** see (b,a)

#### **Causal Consistency:**

**P**<sub>3</sub> see either (a,b) or (b,a) **P**<sub>4</sub> see either (a,b) or (b,a) Causal consistency covers 99% of the natural programming cases...

#### If the assignments of P1 and P2 are not ordere how could the rest of the program depends on that order?



Obviously, the two writes are causally depende Hence they must be seen in the order...

Since the assignments of P1 and P2 are ordere it would be obviously wrong to see the order (b on process P3 and P4

## **Peterson's Algoritm**

```
void lock(int i) {
    flags[i] = true;
    last = i;
    while (flags[1-i] && last==i);
}
```

```
void unlock(int i) {
    flags[i] = false;
```

An example of a code that does not work on a memory with causal consistency...

Because writes are causally independent, they can be seen in different orders by different processes



### **Peterson's Algorithm**



**Design** 

Requires that each process keeps tracks of which write operations it has seen -One may use vector clocks for this

 $_{\lambda}$ Replica coherence

#### -One vector clock per data item

Clock ticks on writes (as in causally-ordered multicast)
 On local writes, multicast update messages
 Causally-ordered multicast of the value
 Timestamped with the vector clock







## **Eventual Consistency**

**AEventual Consistency** 

Weaker consistency, but rather easy to use

-Corresponds to a class of systems with simpler requirements DNS example:

-Everybody reads, only the domain owner updates DNS records

alt is ok to read out of date records for a while

λUse lazy background update messages

-Eventually, copies will get consistent

Web example:

-Same reality about the Web and web page updates

-Even including in-network caching à la Akamai

Are they related?

<sup>a</sup>Consistency defines the behavior of your "distributed memory"

<sup>λ</sup>Synchronization provides tools to control a "distributed execution" λAbsolutely

<sup>λ</sup>Even with a sequentially consistent memory, one needs mutual exclusion <sup>λ</sup>Consistency protocols applied on every memory operations are costly <sup>λ</sup>Considered memory consistency and synchronization is more efficient

Let's discuss the Compare-And-Swap instruction...

Jused in multi-core environment to implement mutual exclusion and other locks

 $_{\lambda}$  To illustrate the difference between concurrency control and memory consistency  $_{\lambda}CAS$  principle

 $_{\lambda}$ C-like code given below

aOnly works if the CAS instruction is atomic (hardware locking the memory bus)

```
boolean CAS(int *lock, int value, int new_value) {
    if (*lock==value) {
        *lock = new_value;
        return true;
        }
    return false;
}
void lock(int *lock, int owner) {
    while(!CAS(*lock, EMPTY, owner);
}
```



**ABasic Idea** 

Avoid double work... since we need both consistency and synchronization protocols For instance, we have since that acquiring a lock and sequential consistency both require something based on the same technique as our totally-ordered multicast based on logical clocks

 $_{\lambda}$ Principles

Associate a monitor with one or more data items

-Called protected data items

aCoordinate consistency and synchronization protocols

-Monitor operations must respect sequential consistency

Enter and leave critical sections are seen in the same order by all processes -Data consistency

All writes on protected items must be visible locally before one aenters the critical section

Accesses (reads or writes) on protected items outside the critical asection are undefined

#### Examples

| р1 | E(x) W(x)a W(x)b L(x) |                  |
|----|-----------------------|------------------|
| p2 | R(x)a                 | R(x)b E(x) R(x)b |
| р3 |                       | R(x)a E(x) R(x)b |
| p4 |                       | R(x)b E(x) R(x)b |

possible execution



#### **Release Consistency**



## **Entry Consistency**



P1: E(x) W1(x)a L(x)

## **Discussing Consistency**

P3:

**JUndefined Semantics** Accessing protected data items outside critical sections stale read because it is not within a critical section for data item y P1: E(x)E(y) W1(x)a W1(y)b L(y)L(x) **x(a)** y(b) P2:  $R_2(x)a R_2(y)NIL E(y)$ E(x) R2(y)b L(y)L(x)

E(y)

**y(b)** 

R2(y)b L(y)