Parcels: What and Why?

philipl edited this page Feb 19, 2014 · 1 revision

What are Parcels?

Parcels are an alternative binary distribution format, first supported in Cloudera Manager 4.5 and originally developed to improve lifecycle management for CDH. Parcels differ from traditional CDH rpm/deb packages in a few notable ways:

  • CDH is provided as a single parcel. Instead of a separate package for each part of CDH, Cloudera Manager 4.5 and later installs just one parcel.
  • Parcels can be installed side-by-side. Each parcel is self-contained and installed in a separate versioned directory. This means that multiple versions of a given parcel can be installed at the same time. You can then select one of these installed versions as the “active” one. (With traditional CDH packages, only one version of a package can be installed at a time, so there’s no distinction between what’s “installed” and what’s “active”.)
  • Parcels can run from arbitrary locations. A parcel can be installed at, and run from, any location in the filesystem.
  • Parcels are gzipped tar files with metadata. From a strict implementation point of view, a parcel is simply a tarball containing the program files, along with some additional metadata that allows Cloudera Manager to understand what it is and how to use it.
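
To make the last point concrete, here is a minimal Python sketch of a "tarball plus metadata" format. It is an illustration only: the directory name, the metadata path, and the metadata fields are assumptions made for this example and are not taken from the parcel specification. It shows that metadata can be read out of a gzipped tarball without unpacking the program files.

```python
import io
import json
import tarfile

# Hypothetical layout, assumed for illustration: a versioned top-level
# directory holding the program files plus one JSON metadata file.
PARCEL_ROOT = "EXAMPLE-1.0.0"
METADATA_PATH = f"{PARCEL_ROOT}/meta/parcel.json"


def build_parcel(files: dict, metadata: dict) -> bytes:
    """Pack program files and a metadata file into a gzipped tarball."""
    entries = dict(files)
    entries[METADATA_PATH] = json.dumps(metadata).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in sorted(entries.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_metadata(parcel_bytes: bytes) -> dict:
    """Read only the metadata entry, leaving the program files packed."""
    with tarfile.open(fileobj=io.BytesIO(parcel_bytes), mode="r:gz") as archive:
        member = archive.extractfile(METADATA_PATH)
        if member is None:
            raise ValueError("metadata entry is not a regular file")
        return json.load(member)
```

A management tool built along these lines could inspect `read_metadata(...)` to decide what a parcel is and how to use it before extracting anything.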

What are the benefits of parcels?

As a consequence of the functional characteristics noted above, parcels offer a number of benefits:

  • Simplified distribution: As a parcel is a single file, it’s much easier to move around than the dozens of packages that make up CDH. This is especially useful when managing a cluster that isn’t connected to the Internet.
  • Internal consistency: By distributing CDH as a single parcel, we can help ensure that all CDH components are properly matched and that there isn’t a danger of different parts coming from different versions of CDH.
  • Installation outside of /usr: In some IT environments, Hadoop admins do not have privileges to install system packages. In the past, these admins had to fall back to CDH tarballs, which deprived them of a lot of infrastructure that packages provide. With parcels, admins can install to /opt or anywhere else without having to step through all the additional manual steps of regular tarballs.
  • Installation of CDH without sudo: Parcel installation is handled by the CM Agent, which already runs as root, so you can install CDH without needing sudo access yourself.
  • Decoupling of distribution from activation: Thanks to side-by-side install capabilities delivered by parcels, it is now possible to stage a new version of CDH across the cluster in advance of switching over to it. This allows the longest running part of an upgrade to be done ahead of time without affecting cluster operations, consequently reducing upgrade downtime.
  • Rolling upgrades: With the new version staged side-by-side, switching to a new minor version is simply a matter of changing which version of CDH is used when restarting each process. It then becomes practical to do upgrades with rolling restarts, where service roles are restarted in the right order to switch over to the new version with minimal service interruption. Note that major version upgrades (CDH3 -> CDH4) require full service restarts due to the substantial changes between the versions.
  • Easy downgrades: With the old version still available, moving back to it can be as simple as upgrading. (Note that some CDH components may require explicit additional steps due to things like schema upgrades.)
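
The decoupling of distribution from activation can be sketched as a toy model. The Python below is not how Cloudera Manager implements it: the `ParcelStore` class, the `<product>-<version>` directory naming, and the `.active` pointer file are all hypothetical, chosen only to show why side-by-side installs make upgrades and downgrades a matter of switching a pointer.

```python
import os
import tempfile
from pathlib import Path


class ParcelStore:
    """Toy model: every version gets its own directory; one is 'active'."""

    def __init__(self, root: Path, product: str) -> None:
        self.root = root
        self.product = product
        # Hypothetical pointer file recording which version is active.
        self.pointer = root / f"{product}.active"

    def install(self, version: str) -> Path:
        """Stage a version next to any versions that are already installed."""
        target = self.root / f"{self.product}-{version}"
        target.mkdir(parents=True, exist_ok=True)
        return target

    def installed(self) -> list:
        """Return the installed versions, sorted as plain strings."""
        prefix = f"{self.product}-"
        return sorted(
            p.name[len(prefix):]
            for p in self.root.iterdir()
            if p.is_dir() and p.name.startswith(prefix)
        )

    def activate(self, version: str) -> None:
        """Switch the active version; nothing is installed or removed."""
        if version not in self.installed():
            raise ValueError(f"{version} is not installed")
        tmp = self.pointer.with_suffix(".tmp")
        tmp.write_text(version)
        os.replace(tmp, self.pointer)  # swap the pointer in a single step

    def active(self):
        """Return the active version, or None if none has been activated."""
        return self.pointer.read_text() if self.pointer.exists() else None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        store = ParcelStore(Path(tmp), "CDH")
        store.install("4.1.3")
        store.activate("4.1.3")
        store.install("4.2.0")    # staged while 4.1.3 is still active
        store.activate("4.2.0")   # upgrade: only the pointer changes
        store.activate("4.1.3")   # downgrade: the old version is still there
        return store.installed(), store.active()
```

In this model the slow step (`install`) never touches the active version, which is the property that lets the longest part of an upgrade happen ahead of time.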

What new capabilities in Cloudera Manager are premised on parcels?

Thanks to the introduction of parcels, a host of new capabilities are now delivered by Cloudera Manager:

  • End-to-end deployment life-cycle management: Starting with 4.5, Cloudera Manager can now fully manage all the steps involved in a CDH version upgrade. (In contrast, with traditional packages, Cloudera Manager can only help with initial installation.)

Parcel Lifecycle

  • Download: Parcels are published to Cloudera’s repository. Cloudera Manager will then download the parcel to the CM Server machine.

  • Distribution: Once the Server has the parcel, Cloudera Manager can distribute it to all the hosts in the cluster. You can tune how many hosts receive the parcel at the same time and the total bandwidth the process uses.

  • Activation: Once a parcel is distributed, you can activate it. Once activated, it will be used for any processes that are subsequently started or restarted.

  • Deactivation: Similarly, a parcel can be deactivated (and will automatically be deactivated if another one is activated).

  • Removal: This is the reverse of distribution. A parcel that has been deactivated and is not serving any current processes is eligible for removal from the hosts in the cluster.

  • Deletion: Finally, once removed from the cluster, the parcel can be deleted from the CM server, which completes the life-cycle of a parcel.

  • The following screenshot shows:

    • One active CDH and one active Impala parcel
    • One CDH parcel being downloaded
    • One CDH parcel being distributed
    • One CDH parcel available for download

(Screenshot: Parcel Management UI)

  • End-to-end capabilities are optional: If there are specific reasons to use other tools for download and/or distribution, you can do so, and Cloudera Manager will work alongside your other tools. For example, you can handle distribution with something like Puppet. Or, if you want to download the parcel to CM Server manually (perhaps because your cluster has no Internet connectivity) and then have Cloudera Manager distribute the parcel to the cluster, you can do that too.
  • Rolling upgrades: These are only possible with parcels, thanks to their side-by-side nature. Traditional packages would require shutting down the old process, upgrading the package, and then starting the new process. This can be hard to recover from in the event of errors and requires extensive integration with the package management system to function seamlessly.
  • Distributing additional components: Parcels are not limited to CDH. Impala is available as a parcel too and we’ve just published an LZO parcel that provides the LZO plugins for both Hadoop and Impala.
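
The steps listed under Parcel Lifecycle can be read as a small state machine. The sketch below is one way to model them in Python; the state and action names are assumptions made for this example, not identifiers from Cloudera Manager. It also simplifies removal, which in practice requires that the parcel is no longer serving any running processes.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available for download"
    DOWNLOADED = "downloaded to the CM Server"
    DISTRIBUTED = "distributed to the cluster hosts"
    ACTIVATED = "activated"


# (action, current state) -> next state, following the lifecycle steps:
# download, distribute, activate, then deactivate, remove, delete.
TRANSITIONS = {
    ("download", State.AVAILABLE): State.DOWNLOADED,
    ("distribute", State.DOWNLOADED): State.DISTRIBUTED,
    ("activate", State.DISTRIBUTED): State.ACTIVATED,
    ("deactivate", State.ACTIVATED): State.DISTRIBUTED,
    ("remove", State.DISTRIBUTED): State.DOWNLOADED,
    ("delete", State.DOWNLOADED): State.AVAILABLE,
}


def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the step is out of order."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} a parcel that is {state.value}") from None


def run(actions, state: State = State.AVAILABLE) -> State:
    """Apply a sequence of actions, starting from a parcel not yet downloaded."""
    for action in actions:
        state = apply(state, action)
    return state
```

For example, `run(["download", "distribute", "activate"])` ends in `State.ACTIVATED`, while `run(["activate"])` raises a `ValueError` because a parcel has to be downloaded and distributed before it can be activated.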