
OpenStack Swift Concepts


Intro - What is Swift?

Swift consists of a collection of services (the OpenStack Object Storage services) that work together to provide object storage and retrieval through a REST API. It is ideal for storing data that can grow without bound: it is built for high scalability and optimized for durability, availability and concurrency.

Components

A Swift cluster consists of two types of nodes: Proxy nodes and Storage nodes.

  • Proxy nodes provide an external interface to the Swift backend, performing authentication and providing a REST API to access or alter the data. You can set up multiple proxy nodes and load-balance them using a standard load balancer.

  • Storage nodes consist of a number of storage devices and store the actual Swift entities (data items): accounts, containers and objects. It is recommended that you make these nodes accessible only via the proxy nodes. A Swift cluster can have multiple accounts, an account can have multiple containers, and a container can have multiple objects. Containers cannot be nested, so a container can only contain objects, not other containers. This hierarchy maps directly onto the REST API's URL structure, as shown in the sketch after this list.
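
The following is a minimal sketch of the account/container/object hierarchy as it appears in the REST API, using Python's requests library. The storage URL and token below are placeholder values; how to obtain real ones is shown in the TempAuth example further down.

```python
import requests

# Hypothetical values; in practice both come back from the auth request.
STORAGE_URL = "http://proxy.example.com:8080/v1/AUTH_myaccount"
TOKEN = {"X-Auth-Token": "AUTH_tk_example"}

# /v1/<account>/<container> -- create a container inside the account.
requests.put(f"{STORAGE_URL}/photos", headers=TOKEN)

# /v1/<account>/<container>/<object> -- upload an object into the container.
requests.put(f"{STORAGE_URL}/photos/cat.jpg", headers=TOKEN,
             data=open("cat.jpg", "rb"))

# Containers cannot be nested, but object names may contain slashes,
# which clients often render as pseudo-directories.
requests.put(f"{STORAGE_URL}/photos/2015/dog.jpg", headers=TOKEN, data=b"...")
```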

A Proxy node includes the following components:

  • Proxy Servers that handle HTTP requests to create containers, upload or download objects (files) and modify metadata. They can use an optional cache (memcache) to improve performance. The proxy service relies on an authentication and authorization mechanism such as the OpenStack Identity service (Keystone) or middleware such as Swauth or Swift3; it also ships with an internal mechanism, TempAuth, that allows it to operate without any other OpenStack services. The flow works like this: you make an authentication request and receive an auth token, then send that token as a request header to authorize subsequent requests. The token does not change from request to request, but it does expire. Swift validates the token against the auth system, which responds with an overall expiration time (in seconds from now), and caches the token up to that expiration time (see the TempAuth sketch after this list).

  • The Rings determine where data resides in the cluster. There is a separate ring for accounts, containers and objects, but they all work the same way. Rings are built and managed by a utility called swift-ring-builder, and they are little more than static lookup tables that map (among other things) the hash of a Swift entity's name to its physical locations. Storage nodes can (and should) be grouped into failure domains (based on physical location, network separation or any other attribute that reduces the chance of multiple replicas being unavailable at the same time) using regions and zones, so that replicas of the same data end up in different zones whenever possible (see the partition-mapping sketch after this list).
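
As a concrete illustration of the token flow, here is a sketch of authenticating against TempAuth with Python's requests library. The endpoint, user and key are example values matching a default TempAuth setup.

```python
import requests

# Example TempAuth credentials (the defaults from a SAIO-style setup).
resp = requests.get("http://proxy.example.com:8080/auth/v1.0",
                    headers={"X-Auth-User": "test:tester",
                             "X-Auth-Key": "testing"})
token = resp.headers["X-Auth-Token"]         # opaque token, valid until expiry
storage_url = resp.headers["X-Storage-Url"]  # e.g. http://.../v1/AUTH_test

# The same token authorizes every subsequent request until it expires.
containers = requests.get(storage_url, headers={"X-Auth-Token": token})
print(containers.text)  # newline-separated container listing
```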
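
To make the ring's role concrete, here is a simplified sketch of how a ring maps an entity's name to a partition and then to devices. The real lookup tables are precomputed offline by swift-ring-builder; the part power and device list here are made-up example values, but the hashing idea is the same.

```python
import struct
from hashlib import md5

PART_POWER = 10               # assumption: 2**10 = 1024 partitions
PART_SHIFT = 32 - PART_POWER  # keep only the top PART_POWER bits of the hash

def partition_for(account, container=None, obj=None):
    """Hash the entity's path to one of 2**PART_POWER partitions."""
    path = "/" + "/".join(p for p in (account, container, obj) if p)
    digest = md5(path.encode()).digest()
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT

# A made-up partition -> devices table; the real one is built so that the
# replicas of each partition land in different zones.
part2devs = {0: ["z1-sdb1", "z2-sdc1", "z3-sdd1"]}

part = partition_for("AUTH_test", "photos", "cat.jpg")
print(part)  # the same name always hashes to the same partition
```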

A Storage node consists of the following components:

  • Object Servers that manage the actual objects (files) on the storage nodes. They are very simple blob storage services that save, retrieve and delete objects. Objects are stored as binary files, with metadata kept in the file's extended attributes (xattrs). Each object is stored under a path derived from the hash of its name and the timestamp of the operation (see the on-disk layout sketch at the end of this list).

  • Container Servers that manage the mappings of containers. They are responsible for listing objects: they don't know where the objects actually are, just which objects are in a specific container. The mappings are stored as sqlite database files and replicated across the cluster. Container servers also track statistics such as the total number of objects and the total bytes used in that container.

  • Account Servers that manage the mappings of accounts. They are responsible for listing the containers associated with a specific account. They also track statistics such as the total number of containers and the storage usage.

  • Other periodic processes, chief among them the Replication Process, which is designed to keep the system in a consistent state. Updates are push-based (using rsync), and hashes are compared so that only items that differ are transferred. Auditors crawl the local server checking the integrity of objects, containers and accounts; if a file is corrupted, the replication process replaces the bad copy with another replica. When an object is deleted, it is replaced with a 0-byte tombstone file with the same name, which ensures that the replication process propagates the deletion rather than restoring the deleted file (a separate Account Reaper process removes the data belonging to deleted accounts in the background). The on-disk naming scheme, including tombstones, is sketched below.
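
To tie the object-server storage scheme and tombstones together, here is a hedged sketch of the on-disk naming an object server uses. The mount point and hash suffix are made-up example values; real deployments derive the name hash using a cluster-wide secret suffix configured in swift.conf.

```python
from hashlib import md5

HASH_SUFFIX = b"changeme"  # assumption: stands in for the cluster hash suffix
MOUNT = "/srv/node/sdb1"   # assumption: one mounted storage device

def on_disk_path(partition, account, container, obj, timestamp, deleted=False):
    """Build the path an object server would use for a .data or .ts file."""
    name_hash = md5(f"/{account}/{container}/{obj}".encode()
                    + HASH_SUFFIX).hexdigest()
    ext = "ts" if deleted else "data"  # .ts is the 0-byte tombstone
    return (f"{MOUNT}/objects/{partition}/{name_hash[-3:]}/"
            f"{name_hash}/{timestamp}.{ext}")

print(on_disk_path(991, "AUTH_test", "photos", "cat.jpg", "1446800000.00000"))
print(on_disk_path(991, "AUTH_test", "photos", "cat.jpg", "1446800100.00000",
                   deleted=True))  # a later tombstone supersedes the .data file
```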


Ubuntu - General configuration