Alluxio notes (work in progress)

What is Alluxio?

Memory-speed Virtual Distributed Storage

Memory-speed: memory-speed access to data
Virtual: virtualized across different storage types under a unified namespace
Distributed: scale out architecture
Storage: software-only, exposed through a file system API

Architecture

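At its core, an Alluxio deployment consists of a single primary master and multiple workers (the Alluxio servers).
It has three main components: master, worker, and client.
- master, workers: together they form the Alluxio servers, the part a system admin maintains and manages.
- clients: live inside applications (such as Spark or MR jobs, or the FUSE layer) and let them communicate with the Alluxio servers.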

arch-overview

Master

The master consists of one primary master and several standby masters (for fault tolerance).
When the primary master goes down, one of the standby masters is elected leader and becomes the new primary master.

In short
1. Maintains the unified namespace (e.g. the file system tree)
2. Keeps track of the available workers

arch-master

Primary Master

Manages the metadata of the whole system (e.g. file system metadata, block metadata, and worker metadata).
Clients talk to the leader master to read or modify metadata.
Every worker periodically sends heartbeat information to the primary master to maintain its membership in the cluster.
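
The heartbeat bookkeeping described above can be pictured with a small toy model. This is only a sketch under assumptions: the class name WorkerRegistry, the timeout value, and the worker ids are hypothetical and do not come from Alluxio itself.

```python
import time


class WorkerRegistry:
    """Toy model of a master that tracks workers by their last heartbeat."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # hypothetical liveness timeout in seconds
        self.last_seen = {}          # worker id -> time of the last heartbeat

    def heartbeat(self, worker_id, now=None):
        # Record the heartbeat; `now` can be injected to make the demo deterministic.
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def live_workers(self, now=None):
        # A worker counts as live if it reported within the timeout window.
        now = time.monotonic() if now is None else now
        return sorted(w for w, t in self.last_seen.items()
                      if now - t <= self.timeout_s)


registry = WorkerRegistry(timeout_s=10.0)
registry.heartbeat("worker-1", now=0.0)
registry.heartbeat("worker-2", now=0.0)
registry.heartbeat("worker-1", now=8.0)    # worker-2 stops reporting after t=0
live_at_9 = registry.live_workers(now=9.0)
live_at_15 = registry.live_workers(now=15.0)
```

In this model a worker that stops sending heartbeats simply drops out of the live set once the timeout passes, which is the membership idea the notes describe.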

Standby Master

A standby master
1. replays the journal written by the primary master (the journal is how Alluxio persists metadata operations),
2. periodically squashes the journal entries,
3. writes checkpoints for faster recovery in the future.
It does not process requests from any Alluxio component.
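
Why checkpoints speed up recovery can be shown with a toy journal. This is a minimal sketch, assuming a made-up entry format of (operation, path) tuples; it is not Alluxio's actual journal format.

```python
def apply_entry(state, entry):
    """Apply one journal entry (op, path) to the set of known paths."""
    op, path = entry
    if op == "create":
        state.add(path)
    elif op == "delete":
        state.discard(path)
    return state


def replay(checkpoint, journal):
    """Rebuild metadata from a checkpoint plus the entries written after it."""
    state = set(checkpoint)
    for entry in journal:
        apply_entry(state, entry)
    return state


journal = [("create", "/a"), ("create", "/b"), ("delete", "/a"), ("create", "/c")]

# Full replay from an empty state touches every entry.
full = replay(set(), journal)

# A checkpoint taken after three entries squashes them into one snapshot,
# so recovery only needs to replay the remaining tail.
checkpoint = replay(set(), journal[:3])
recovered = replay(checkpoint, journal[3:])
```

Both paths end in the same state; the checkpointed one just replays fewer entries.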

Worker

Manages the user-configurable local resources allocated to Alluxio (memory, SSD, HDD, etc.).
An Alluxio worker stores data as blocks.
Workers serve client requests that read or write data by reading or creating blocks within their local resources.
A worker is responsible only for the data inside blocks; the actual mapping from files to blocks is stored only in the master.
Alluxio workers perform the data operations on the under storage.
- Key points:
  1. Data read from the under storage is stored in the worker, so it is immediately available to the client.
  2. The client can be lightweight and does not depend on the under storage connectors.
In short
1. Manages its local resources
2. Stores data
3. Fetches data from the under storage
4. Responds to client requests
5. Periodically reports heartbeats to the master

Because RAM is limited, blocks in a worker are evicted when the space is full. Workers use eviction policies to decide which data to keep in Alluxio space.
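
As an illustration of eviction, here is a minimal sketch of a block store that evicts the least recently used block when space runs out. LRU is an assumption chosen for the example (the notes do not say which policy is in use), and the class and block names are hypothetical.

```python
from collections import OrderedDict


class LruBlockStore:
    """Toy worker storage that evicts least recently used blocks when full."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.blocks = OrderedDict()   # block id -> size, least recently used first

    def read(self, block_id):
        # A read marks the block as most recently used.
        if block_id not in self.blocks:
            return False
        self.blocks.move_to_end(block_id)
        return True

    def write(self, block_id, size):
        if size > self.capacity_bytes:
            raise ValueError("block is larger than the whole store")
        if block_id in self.blocks:
            self.used_bytes -= self.blocks.pop(block_id)
        evicted = []
        # Evict from the cold end until the new block fits.
        while self.used_bytes + size > self.capacity_bytes:
            old_id, old_size = self.blocks.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_id)
        self.blocks[block_id] = size
        self.used_bytes += size
        return evicted


store = LruBlockStore(capacity_bytes=100)
store.write("b1", 40)
store.write("b2", 40)
store.read("b1")                    # b1 becomes the most recently used block
evicted = store.write("b3", 40)     # no room left, so the coldest block goes
```

Here the read of b1 protects it, so the write of b3 pushes out b2 instead.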

arch-worker

Client

The Alluxio client gives users a gateway for interacting with the Alluxio servers.
It talks to the primary master to carry out metadata operations.
It talks to workers to read and write data stored in Alluxio.
The native file system API is in Java, and client bindings exist for other languages (REST, Go, and Python).
Alluxio also supports APIs that are compatible with the HDFS API and the Amazon S3 API.

Summary

Decoupling
- Applications can be decoupled from the physical storage.
- An application only needs a connection to Alluxio and automatically gains access to every physical storage that Alluxio supports.
- Alluxio offers several interfaces (HDFS, key/value, file system interface), which keeps integration simple.
Speed
- Alluxio sits between applications and the physical storage, and it can store and serve data in memory as if the data came from the actual storage.
- Besides memory, Alluxio also supports tiered storage across SSD and disk.
Names
- Unified naming works the same way as mounting a disk into a file system.
  - alluxio://hostname:port
  - hdfs://hostname:port
  - s3n://hostname:port
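
The mount analogy can be sketched as a small mount table that maps a path in the unified namespace to a location in an under storage by longest-prefix match. This is a toy model, not Alluxio's implementation; the class name, host names, and bucket name are hypothetical.

```python
class MountTable:
    """Toy unified namespace: maps namespace paths to under storage URIs."""

    def __init__(self):
        self.mounts = {}   # namespace path prefix -> under storage URI

    def mount(self, namespace_path, ufs_uri):
        self.mounts[namespace_path.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path):
        # Pick the longest mount point that contains the path.
        best = None
        for prefix in self.mounts:
            inside = (prefix == "/" or path == prefix
                      or path.startswith(prefix + "/"))
            if inside and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(path)
        rest = path if best == "/" else path[len(best):]
        return self.mounts[best] + rest


table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")    # hypothetical root mount
table.mount("/s3data", "s3n://bucket/data")         # hypothetical nested mount
in_s3 = table.resolve("/s3data/logs/a.txt")
in_hdfs = table.resolve("/tmp/x")
```

The application only ever sees paths such as /s3data/logs/a.txt; which storage system backs them is decided by the table.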

Data Flow in Alluxio

How Alluxio basically behaves on reads and writes.

Data Read

Alluxio sits between the under storage and the computation framework and serves as a caching layer for data reads.

Local Cache Hit

The data is on the same node as the local worker.
A local cache hit occurs when the requested data resides in the local Alluxio worker (the computation gets a local cache hit).
When an application requests data access through the Alluxio client, the client checks with the Alluxio master which worker holds the data.
If the data is local, the Alluxio client uses a short-circuit read to bypass the Alluxio worker and reads the file directly from the local file system (RAM).
A short-circuit read avoids transferring data over a TCP socket and provides memory-speed data access.
Short-circuit reads are the most efficient way to read data from Alluxio.
By default, short-circuit reads use local file system operations, which require permissive permissions.
This is sometimes impossible when the worker and client are dockerized, because of incorrect resource accounting.
When the default short circuit is not feasible, Alluxio provides a domain-socket-based short circuit, in which the worker transfers data to the client through a predesignated domain socket path. See Running Alluxio on Docker.

dataflow-local

Remote Cache Hit

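The data is not in the local worker but in another worker.
If the data is not in the local Alluxio worker but resides in another Alluxio worker in the cluster, the Alluxio client reads it from the worker on the other machine.
The client checks with the master and finds that the data is available from a remote worker.
The local worker reads the data from the remote worker and passes it on to the client.
The worker also writes a copy locally, so later reads of the same data are served locally from memory.
A remote cache hit provides network-speed data reads.
Alluxio prefers reading from a remote worker over reading from the under storage, because the network between workers is typically faster than the link between a worker and the under storage.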

dataflow-remote

Cache Miss

The data is not available anywhere in Alluxio.
The client delegates the read to the local worker, and that worker reads the data from the under storage.
The worker caches the data in local memory for future reads and relays it to the client.
Cache misses mostly happen the first time data is read.
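
The three read cases (local cache hit, remote cache hit, cache miss) can be summarized in one lookup function. This is a simplified sketch that models caches as plain dictionaries; the function name and block ids are hypothetical, and it ignores the master lookup and short-circuit details.

```python
def read_block(block_id, local, remotes, under_storage):
    """Return (data, source). Each cache is a dict of block id -> bytes."""
    if block_id in local:
        return local[block_id], "local cache hit"
    for cache in remotes:
        if block_id in cache:
            data = cache[block_id]
            local[block_id] = data      # keep a local copy for later reads
            return data, "remote cache hit"
    data = under_storage[block_id]      # slowest path: go to the under storage
    local[block_id] = data              # cache it for future reads
    return data, "cache miss"


local_worker = {}
remote_workers = [{"b1": b"one"}]
ufs = {"b1": b"one", "b2": b"two"}

sources = [
    read_block("b1", local_worker, remote_workers, ufs)[1],
    read_block("b1", local_worker, remote_workers, ufs)[1],
    read_block("b2", local_worker, remote_workers, ufs)[1],
    read_block("b2", local_worker, remote_workers, ufs)[1],
]
```

The order of the checks mirrors the priority in the notes: local worker first, then remote workers, and the under storage only as a last resort.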

dataflow-cache-miss

Data Write

The write type is chosen through the Alluxio API or through the client property alluxio.user.file.writetype.default.

MUST_CACHE (default)

MUST_CACHE is the default write type: the Alluxio client writes only to the local worker, and nothing is written to the under storage.
Before writing, the client creates the metadata on the master.
The client then writes directly to a file on the local RAM disk, bypassing the worker to avoid the slower network transfer. This short-circuit write runs at memory speed.
Because the data is never written to the under storage, it can be lost if the machine fails or if the space has to be freed up for newer writes.

dataflow-must-cache

CACHE_THROUGH

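Data is written synchronously to both the Alluxio worker and the under storage.
The client delegates the write to the local worker, and that worker writes to local memory and the under storage at the same time.
Writing to the under storage is typically much slower than writing to local storage,
so the client's write speed matches the write speed of the under storage.
CACHE_THROUGH is the recommended write type when the data must be persisted.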

dataflow-cache-through

ASYNC_THROUGH

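ASYNC_THROUGH is an experimental write type.
Data is written synchronously to an Alluxio worker and asynchronously to the under storage.
Writes run at memory speed while the data is still persisted in the background.
Because it is experimental, ASYNC_THROUGH has some limitations:
1. Data is lost if the machine crashes before it has been asynchronously persisted to the under storage.
2. All blocks must reside on the same worker.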
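
The durability trade-off between the three write types can be made concrete with a toy writer. This is a sketch under assumptions: the ToyWriter class, its flush and crash methods, and the paths are hypothetical, and a real worker persists asynchronously in the background rather than on an explicit flush call.

```python
from collections import deque


class ToyWriter:
    """Toy model of MUST_CACHE, CACHE_THROUGH and ASYNC_THROUGH writes."""

    def __init__(self):
        self.worker = {}          # data cached in Alluxio
        self.under_storage = {}   # data that has been persisted
        self.pending = deque()    # async persists that have not run yet

    def write(self, path, data, write_type):
        self.worker[path] = data
        if write_type == "CACHE_THROUGH":
            self.under_storage[path] = data      # synchronous persist
        elif write_type == "ASYNC_THROUGH":
            self.pending.append(path)            # persist later
        elif write_type != "MUST_CACHE":
            raise ValueError(write_type)

    def flush(self):
        # Stand-in for the background persist of ASYNC_THROUGH data.
        while self.pending:
            path = self.pending.popleft()
            if path in self.worker:
                self.under_storage[path] = self.worker[path]

    def crash(self):
        # A machine failure loses everything that only lived on the worker.
        self.worker.clear()
        self.pending.clear()


w = ToyWriter()
w.write("/a", b"1", "MUST_CACHE")
w.write("/b", b"2", "CACHE_THROUGH")
w.write("/c", b"3", "ASYNC_THROUGH")
w.flush()                               # /c gets persisted in time
w.write("/d", b"4", "ASYNC_THROUGH")
w.crash()                               # /d was still waiting to be persisted
survivors = sorted(w.under_storage)
```

After the simulated crash only the CACHE_THROUGH file and the already flushed ASYNC_THROUGH file remain, which matches the data-loss caveats listed for MUST_CACHE and ASYNC_THROUGH.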

dataflow-async-through

Caching

Config: alluxio.user.file.readtype.default = CACHE_PROMOTE (default), CACHE, or NO_CACHE

default (CACHE_PROMOTE)

If the data is already in Alluxio storage, it is moved to the highest tier.
If the data has to be read from the under storage, it is written to the highest tier of the local Alluxio worker.
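
Promotion to the highest tier can be sketched with tiers modeled as dictionaries. The tier names MEM, SSD, and HDD and the function name are assumptions made for this example, and capacity limits are left out.

```python
TIERS = ["MEM", "SSD", "HDD"]   # assumed tier names, highest tier first


def read_with_promote(block_id, tiers, under_storage):
    """Read a block and leave it in the highest tier afterwards.

    tiers maps a tier name to a dict of block id -> data.
    """
    top = TIERS[0]
    for name in TIERS:
        if block_id in tiers[name]:
            data = tiers[name].pop(block_id)   # remove from its current tier
            tiers[top][block_id] = data        # and place it in the top tier
            return data
    data = under_storage[block_id]             # not cached: read the under storage
    tiers[top][block_id] = data                # and write to the top tier
    return data


tiers = {"MEM": {}, "SSD": {"b1": b"one"}, "HDD": {}}
ufs = {"b1": b"one", "b2": b"two"}
first = read_with_promote("b1", tiers, ufs)    # promoted from SSD to MEM
second = read_with_promote("b2", tiers, ufs)   # loaded from the under storage
```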

Partial Caching (CACHE)

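When a block cannot be read from the local worker, the local worker reads and caches the whole block even if the client requested only part of it.
Only the requested part of the block is returned to the client.
Before version 1.7:
- With partial caching enabled, the whole block is read and cached synchronously, so the client's read has to wait until the block is fully cached in the local worker.
- The client delegates the read to the worker, and the worker reads the block from start to end and writes it to the local RAM disk.
- The worker then sends the client the part of the block it asked for.
Partial caching is on by default; to turn it off, set alluxio.user.file.cache.partially.read.block to false.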
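
The synchronous flow described for versions before 1.7 can be sketched as follows. It is a toy model with hypothetical names: the worker caches the whole block first and only then returns the slice the client asked for.

```python
def read_range(block_id, offset, length, local_cache, under_storage):
    """Cache the full block on a miss, then return only the requested range."""
    if block_id not in local_cache:
        # The worker reads the block from start to end before answering.
        local_cache[block_id] = under_storage[block_id]
    return local_cache[block_id][offset:offset + length]


cache = {}
ufs = {"b1": b"0123456789"}
part = read_range("b1", 2, 3, cache, ufs)
```

The client receives three bytes, but the cache now holds all ten, so a later read of any other range of the same block is served locally.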

No Caching (NO_CACHE)

Alluxio caching can be turned off so that the client reads data directly from the under storage, by setting the client property alluxio.user.file.readtype.default to NO_CACHE.
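
To keep the two client properties from these notes together, here is a small helper that renders them as properties-file lines. The helper itself is hypothetical, and it only accepts the read and write types listed in these notes; where the resulting lines go depends on the deployment.

```python
READ_TYPES = {"CACHE_PROMOTE", "CACHE", "NO_CACHE"}
WRITE_TYPES = {"MUST_CACHE", "CACHE_THROUGH", "ASYNC_THROUGH"}


def render_properties(read_type="CACHE_PROMOTE", write_type="MUST_CACHE"):
    """Return properties-file text for the client read and write types."""
    if read_type not in READ_TYPES:
        raise ValueError("unknown read type: " + read_type)
    if write_type not in WRITE_TYPES:
        raise ValueError("unknown write type: " + write_type)
    lines = [
        "alluxio.user.file.readtype.default=" + read_type,
        "alluxio.user.file.writetype.default=" + write_type,
    ]
    return "\n".join(lines)


text = render_properties(read_type="NO_CACHE", write_type="CACHE_THROUGH")
```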

Storage Unification and Abstraction

As big data has grown, large volumes of data end up in different kinds of storage systems from different vendors. The problem is that it is very hard to get a unified view of that data efficiently.
Building a data lake is a common solution, but keeping permanent copies of the data there is expensive.
Alluxio's unified namespace feature makes other systems easier to reach and connects computation frameworks to under storage seamlessly.
Applications talk only to Alluxio to access data stored in any under storage.
Alluxio works like a "virtual data lake": it provides a unified view of all data from different data sources without creating permanent copies of that data.

alluxio-unifies-access

Advantages of using Alluxio as a "virtual data lake"

Unified access
- Applications talk to a single system and a single namespace for all data.
- Applications do not need to care how data in the other systems is accessed.
No ETL
- Alluxio pulls data from the existing storage systems only on demand.
Configuration Management
- Applications and storage systems need no special configuration; applications only connect to Alluxio.
Modern, flexible architecture
- The Alluxio unified namespace helps separate compute from storage.
- This type of architecture gives more flexibility in allocating resources for modern data processing.
Storage API independence
- Alluxio supports common storage interfaces such as HDFS and S3.
- With the Alluxio unified namespace, an application can access all data through the interface it prefers, regardless of the API of the source data.
Performance
- Local caching and eviction strategies give fast local access to important and frequently used data (without permanent copies of the data).

Remote Data Acceleration

A coupled compute-storage architecture lets the compute engine fetch data from close by. However, maintaining and managing this architecture ...

Alluxio

Install / Use

README