Commit
[Storage] Consolidate storage implementations + drop libcloud dependency (#640)
rzvoncek authored Sep 19, 2023
1 parent 6f9452e commit e653bc6
Showing 37 changed files with 1,387 additions and 1,764 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -165,7 +165,7 @@ jobs:
pip install ccm
case '${{ matrix.it-backend }}' in
'azure'|'azure-hierarchical')
pip install -r requirements-azure.txt
echo "No extra requirements for now."
;;
'ibm'|'minio'|'s3')
echo "No extra requirements for now."
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ Medusa is a command line tool that offers the following features:
* Cluster wide in place restore (restoring on the same cluster that was used for the backup)
* Cluster wide remote restore (restoring on a different cluster than the one used for the backup)
* Backup purge
* Support for local storage, Google Cloud Storage (GCS) and AWS S3 through [Apache Libcloud](https://libcloud.apache.org/). Can be extended to support other storage providers supported by Apache Libcloud.
* Support for local storage, Google Cloud Storage (GCS), Azure Blob Storage and AWS S3 (and its compatibles)
* Support for clusters using single tokens or vnodes
* Full or differential backups

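For context, a sample of how a backend is selected is sketched below. This is illustrative only: apart from `google_storage`, which appears verbatim in the `medusa/backup_node.py` hunk further down, the provider names, option keys and values are assumptions rather than something this diff confirms.

```python
# Minimal sketch: how a storage backend is typically selected in medusa.ini.
# All names and values here are illustrative placeholders.
import configparser

MEDUSA_INI = """
[storage]
# e.g. local, google_storage, azure_blobs, s3 (illustrative provider names)
storage_provider = s3
bucket_name = my-medusa-backups
fqdn = node1.example.com
"""

config = configparser.ConfigParser()
config.read_string(MEDUSA_INI)
print(config["storage"]["storage_provider"])  # -> s3
```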
6 changes: 3 additions & 3 deletions docs/Installation.md
@@ -27,11 +27,11 @@ Running the installation using `sudo` is necessary to have the `/usr/local/bin/m
If your Cassandra servers do not have internet access:

- on a machine with the same target os and python version, clone the cassandra-medusa repo and cd into the root directory
- run `mkdir pip_dependencies && pip download -r requirements.txt -d medusa_dependencies` to download the dependencies into a sub directory (do the same thing with requirements-azure.txt if that's your storage backend)
- run `cp requirements.txt medusa_dependencies/` (plus requirements-azure.txt)
- run `mkdir medusa_dependencies && pip download -r requirements.txt -d medusa_dependencies` to download the dependencies into a sub directory
- run `cp requirements.txt medusa_dependencies/`
- run `tar -zcf medusa_dependencies.tar.gz medusa_dependencies` to compress the dependencies
- Upload the archive to all Cassandra nodes and decompress it
- run `pip install -r medusa_dependencies/requirements.txt --no-index --find-links` to install the dependencies on the nodes (do the same thing with requirements-azure.txt depending on your storage)
- run `pip install -r medusa_dependencies/requirements.txt --no-index --find-links medusa_dependencies/` to install the dependencies on the nodes
- install Medusa using `python setup.py install` from the cassandra-medusa source directory

#### Example of Offline installation using pipenv on RHEL, centos 7
2 changes: 1 addition & 1 deletion docs/minio_setup.md
@@ -29,7 +29,7 @@ secure = False
```

Medusa should now be able to access the bucket and perform all required operations.
*Note: By default, MinIO and other self hosted S3 compatible storage systems can only be used in unsecured (non SSL) mode with Medusa due to limitations in Apache Libcloud. To enable SSL access to self hosted S3 compatible storage systems, you will need to set the environment variable `SSL_CERT_FILE` to the path of a valid certificate file containing trusted CA certificates of your S3 service. In order to get cluster wide commands working properly, you will need to set this in the `/etc/default/cassandra-medusa` file on all nodes running Medusa containing:*
*Note: Now that Medusa no longer depends on Apache Libcloud, MinIO and other self hosted S3 compatible storage systems may work over secured (SSL) connections. To enable SSL access to a self hosted S3 compatible storage system, set the environment variable `SSL_CERT_FILE` to the path of a valid certificate file containing the trusted CA certificates of your S3 service. For cluster wide commands to work properly, set it on every node running Medusa by adding the following to the `/etc/default/cassandra-medusa` file:*

```bash
export SSL_CERT_FILE=/path/to/certfile
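```

To sanity-check the CA bundle before rolling it out cluster wide, a small Python check like the sketch below can help. It is not part of Medusa: it only verifies that the file `SSL_CERT_FILE` points at loads as a CA bundle, which is the standard OpenSSL/Python convention the note above relies on.

```python
# Hypothetical pre-flight check: confirm the CA bundle referenced by
# SSL_CERT_FILE exists and parses. It does not test Medusa's own S3 client.
import os
import ssl

cafile = os.environ.get("SSL_CERT_FILE")
if not cafile:
    raise SystemExit("SSL_CERT_FILE is not set")

# create_default_context() raises if the bundle is unreadable or invalid.
ctx = ssl.create_default_context(cafile=cafile)
print("Loaded {} CA certificate(s) from {}".format(len(ctx.get_ca_certs()), cafile))
```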
4 changes: 0 additions & 4 deletions k8s/Dockerfile
@@ -37,10 +37,6 @@ RUN python3 -m pip install -U pip && pip3 install --ignore-installed --user \
-r /build/requirements.txt \
-r /build/requirements-grpc-runtime.txt

# Azure
RUN pip3 install --ignore-installed --user -r /build/requirements-azure.txt \
&& pip3 install --ignore-installed --user azure-cli

# Build medusa itself so we can add the executables in the final image
RUN pip3 install --ignore-installed --user /build

6 changes: 1 addition & 5 deletions k8s/Dockerfile-azure
@@ -1,4 +1,4 @@
FROM ubuntu:18.04 as base

RUN mkdir /install
WORKDIR /install
@@ -37,10 +37,6 @@ RUN python3 -m pip install -U pip && pip3 install --ignore-installed --user \
-r /build/requirements.txt \
-r /build/requirements-grpc-runtime.txt

# Azure
RUN pip3 install --ignore-installed --user -r /build/requirements-azure.txt \
&& pip3 install --ignore-installed --user azure-cli

# Build medusa itself so we can add the executables in the final image
RUN pip3 install --ignore-installed --user /build

113 changes: 60 additions & 53 deletions medusa/backup_cluster.py
@@ -30,77 +30,84 @@

def orchestrate(config, backup_name_arg, seed_target, stagger, enable_md5_checks, mode, temp_dir,
parallel_snapshots, parallel_uploads, orchestration_snapshots=None, orchestration_uploads=None,
cassandra_config=None, monitoring=None, storage=None, cql_session_provider=None):
cassandra_config=None, monitoring=None, existing_storage=None, cql_session_provider=None):
backup = None
backup_name = backup_name_arg or datetime.datetime.now().strftime('%Y%m%d%H%M')
monitoring = Monitoring(config=config.monitoring) if monitoring is None else monitoring
try:
backup_start_time = datetime.datetime.now()
if not config.storage.fqdn:
err_msg = "The fqdn was not provided nor calculated properly."
logging.error(err_msg)
raise Exception(err_msg)

if not temp_dir.is_dir():
err_msg = '{} is not a directory'.format(temp_dir)
logging.error(err_msg)
raise Exception(err_msg)
if existing_storage is None:
storage = Storage(config=config.storage)
else:
storage = existing_storage

with storage as storage:

try:
# Try to get a backup with backup_name. If it exists then we cannot take another backup with that name
storage = Storage(config=config.storage) if storage is None else storage
cluster_backup = storage.get_cluster_backup(backup_name)
if cluster_backup:
err_msg = 'Backup named {} already exists.'.format(backup_name)
backup_start_time = datetime.datetime.now()
if not config.storage.fqdn:
err_msg = "The fqdn was not provided nor calculated properly."
logging.error(err_msg)
raise Exception(err_msg)
except KeyError:
info_msg = 'Starting backup {}'.format(backup_name)
logging.info(info_msg)

backup = BackupJob(config, backup_name, seed_target, stagger, enable_md5_checks, mode, temp_dir,
parallel_snapshots, parallel_uploads, orchestration_snapshots, orchestration_uploads,
cassandra_config)
backup.execute(cql_session_provider)
if not temp_dir.is_dir():
err_msg = '{} is not a directory'.format(temp_dir)
logging.error(err_msg)
raise Exception(err_msg)

backup_end_time = datetime.datetime.now()
backup_duration = backup_end_time - backup_start_time
try:
# Try to get a backup with backup_name. If it exists then we cannot take another backup with that name
cluster_backup = storage.get_cluster_backup(backup_name)
if cluster_backup:
err_msg = 'Backup named {} already exists.'.format(backup_name)
logging.error(err_msg)
raise Exception(err_msg)
except KeyError:
info_msg = 'Starting backup {}'.format(backup_name)
logging.info(info_msg)

logging.debug('Emitting metrics')
backup = BackupJob(config, backup_name, seed_target, stagger, enable_md5_checks, mode, temp_dir,
parallel_snapshots, parallel_uploads, orchestration_snapshots, orchestration_uploads,
cassandra_config)
backup.execute(cql_session_provider)

logging.info('Backup duration: {}'.format(backup_duration.total_seconds()))
tags = ['medusa-cluster-backup', 'cluster-backup-duration', backup_name]
monitoring.send(tags, backup_duration.total_seconds())
backup_end_time = datetime.datetime.now()
backup_duration = backup_end_time - backup_start_time

tags = ['medusa-cluster-backup', 'cluster-backup-error', backup_name]
monitoring.send(tags, 0)
logging.debug('Emitting metrics')

logging.debug('Done emitting metrics.')
logging.info('Backup of the cluster done.')
logging.info('Backup duration: {}'.format(backup_duration.total_seconds()))
tags = ['medusa-cluster-backup', 'cluster-backup-duration', backup_name]
monitoring.send(tags, backup_duration.total_seconds())

except Exception as e:
tags = ['medusa-cluster-backup', 'cluster-backup-error', backup_name]
monitoring.send(tags, 1)
tags = ['medusa-cluster-backup', 'cluster-backup-error', backup_name]
monitoring.send(tags, 0)

logging.error('This error happened during the cluster backup: {}'.format(str(e)))
traceback.print_exc()
logging.debug('Done emitting metrics.')
logging.info('Backup of the cluster done.')

if backup is not None:
err_msg = 'Something went wrong! Attempting to clean snapshots and exit.'
logging.error(err_msg)
except Exception as e:
tags = ['medusa-cluster-backup', 'cluster-backup-error', backup_name]
monitoring.send(tags, 1)

delete_snapshot_command = ' '.join(backup.cassandra.delete_snapshot_command(backup.snapshot_tag))
pssh_run_success_cleanup = backup.orchestration_uploads\
.pssh_run(backup.hosts,
delete_snapshot_command,
hosts_variables={})
if pssh_run_success_cleanup:
info_msg = 'All nodes successfully cleared their snapshot.'
logging.info(info_msg)
else:
err_msg_cleanup = 'Some nodes failed to clear the snapshot. Cleaning snapshots manually is recommended'
logging.error(err_msg_cleanup)
sys.exit(1)
logging.error('This error happened during the cluster backup: {}'.format(str(e)))
traceback.print_exc()

if backup is not None:
err_msg = 'Something went wrong! Attempting to clean snapshots and exit.'
logging.error(err_msg)

delete_snapshot_command = ' '.join(backup.cassandra.delete_snapshot_command(backup.snapshot_tag))
pssh_run_success_cleanup = backup.orchestration_uploads\
.pssh_run(backup.hosts,
delete_snapshot_command,
hosts_variables={})
if pssh_run_success_cleanup:
info_msg = 'All nodes successfully cleared their snapshot.'
logging.info(info_msg)
else:
err_msg_cleanup = 'Some nodes failed to clear the snapshot. Please clean snapshots manually'
logging.error(err_msg_cleanup)
sys.exit(1)


class BackupJob(object):
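The reworked `orchestrate` now takes an `existing_storage` argument and runs all work inside `with storage as storage:`, so a caller can share one `Storage` connection across several operations. A hedged sketch of that calling pattern follows; `config`, `temp_dir` and the argument values are placeholders, and only the `existing_storage` parameter itself is confirmed by this diff.

```python
# Hypothetical caller: reuse a single Storage connection for several cluster
# backups via the new existing_storage parameter. 'config' and 'temp_dir' are
# assumed to be a valid Medusa config object and a pathlib.Path.
from medusa.backup_cluster import orchestrate
from medusa.storage import Storage

with Storage(config=config.storage) as storage:
    for name in ("nightly-1", "nightly-2"):
        orchestrate(config, name, seed_target=None, stagger=None,
                    enable_md5_checks=False, mode="differential", temp_dir=temp_dir,
                    parallel_snapshots=1, parallel_uploads=1,
                    existing_storage=storage)
```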
88 changes: 43 additions & 45 deletions medusa/backup_node.py
@@ -22,15 +22,15 @@
import traceback
import psutil

from libcloud.storage.providers import Provider
from retrying import retry

import medusa.utils
from medusa.backup_manager import BackupMan
from medusa.cassandra_utils import Cassandra
from medusa.index import add_backup_start_to_index, add_backup_finish_to_index, set_latest_backup_in_index
from medusa.monitoring import Monitoring
from medusa.storage import Storage, format_bytes_str, ManifestObject
from medusa.storage import Storage, format_bytes_str
from medusa.storage.abstract_storage import ManifestObject


class NodeBackupCache(object):
@@ -91,7 +91,7 @@ def replace_or_remove_if_cached(self, *, keyspace, columnfamily, srcs):
else:
fqtn = (keyspace, columnfamily)
cached_item = None
if self._storage_provider == Provider.GOOGLE_STORAGE or self._differential_mode is True:
if self._storage_provider.lower() == 'google_storage' or self._differential_mode is True:
cached_item = self._cached_objects.get(fqtn, {}).get(self._sanitize_file_path(src))

threshold = self._storage_config.multi_part_upload_threshold
@@ -172,48 +172,46 @@ def handle_backup(config, backup_name_arg, stagger_time, enable_md5_checks_flag,
backup_name = backup_name_arg or start.strftime('%Y%m%d%H%M')
monitoring = Monitoring(config=config.monitoring)

try:
logging.debug("Starting backup preparations with Mode: {}".format(mode))
storage = Storage(config=config.storage)
cassandra = Cassandra(config)

storage.storage_driver.prepare_upload()

differential_mode = False
if mode == "differential":
differential_mode = True

node_backup = storage.get_node_backup(
fqdn=config.storage.fqdn,
name=backup_name,
differential_mode=differential_mode
)
if node_backup.exists():
raise IOError('Error: Backup {} already exists'.format(backup_name))

# Starting the backup
logging.info("Starting backup using Stagger: {} Mode: {} Name: {}".format(stagger_time, mode, backup_name))
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_IN_PROGRESS)
info = start_backup(storage, node_backup, cassandra, differential_mode, stagger_time, start, mode,
enable_md5_checks_flag, backup_name, config, monitoring)
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_SUCCESS)

logging.debug("Done with backup, returning backup result information")
return (info["actual_backup_duration"], info["actual_start_time"], info["end_time"],
info["node_backup"], info["node_backup_cache"], info["num_files"],
info["start_time"], info["backup_name"])

except Exception as e:
logging.error("Issue occurred inside handle_backup Name: {} Error: {}".format(backup_name, str(e)))
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_FAILED)

tags = ['medusa-node-backup', 'backup-error', backup_name]
monitoring.send(tags, 1)
medusa.utils.handle_exception(
e,
"Error occurred during backup: {}".format(str(e)),
config
)
with Storage(config=config.storage) as storage:
try:
logging.debug("Starting backup preparations with Mode: {}".format(mode))
cassandra = Cassandra(config)

differential_mode = False
if mode == "differential":
differential_mode = True

node_backup = storage.get_node_backup(
fqdn=config.storage.fqdn,
name=backup_name,
differential_mode=differential_mode
)
if node_backup.exists():
raise IOError('Error: Backup {} already exists'.format(backup_name))

# Starting the backup
logging.info("Starting backup using Stagger: {} Mode: {} Name: {}".format(stagger_time, mode, backup_name))
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_IN_PROGRESS)
info = start_backup(storage, node_backup, cassandra, differential_mode, stagger_time, start, mode,
enable_md5_checks_flag, backup_name, config, monitoring)
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_SUCCESS)

logging.debug("Done with backup, returning backup result information")
return (info["actual_backup_duration"], info["actual_start_time"], info["end_time"],
info["node_backup"], info["node_backup_cache"], info["num_files"],
info["start_time"], info["backup_name"])

except Exception as e:
logging.error("Issue occurred inside handle_backup Name: {} Error: {}".format(backup_name, str(e)))
BackupMan.update_backup_status(backup_name, BackupMan.STATUS_FAILED)

tags = ['medusa-node-backup', 'backup-error', backup_name]
monitoring.send(tags, 1)
medusa.utils.handle_exception(
e,
"Error occurred during backup: {}".format(str(e)),
config
)


def start_backup(storage, node_backup, cassandra, differential_mode, stagger_time, start, mode,
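Both files now open storage with `with Storage(...) as storage:`, which implies the consolidated `Storage` class is a context manager, and provider checks now compare plain strings (`'google_storage'`) instead of libcloud's `Provider` enum. A hedged skeleton of the context-manager contract is below; the method bodies are assumptions, since the real `Storage` implementation is not part of this excerpt.

```python
# Hypothetical skeleton of the contract 'with Storage(config) as storage:' relies
# on. __enter__/__exit__ are the standard Python protocol; the _connect() and
# cleanup details are assumptions, not taken from the Medusa source.
class Storage:
    def __init__(self, config):
        self.config = config
        self.storage_driver = None

    def __enter__(self):
        # Lazily create the backend client (S3/GCS/Azure) on entry.
        self.storage_driver = self._connect()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the client, even if the backup raised.
        self.storage_driver = None
        return False  # do not swallow exceptions

    def _connect(self):
        raise NotImplementedError("backend-specific in the real code")
```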
(Diff truncated: the remaining 29 changed files are not shown.)
