HDFS Snapshots

Documentation for Apache Hadoop 2.7.1 on HDFS Snapshots can be found here.

HDFS Snapshots are read-only point-in-time copies of the file system. They can be taken of a subtree of the file system or of the entire file system.

Cases where snapshots can be useful:

  • Backup
  • Disaster recovery

Blocks on DataNodes are not copied when a snapshot is taken; the snapshot records only the block list and the file size of each file.

The snapshot data is computed by subtracting the modifications from the current data. Modifications are recorded in reverse chronological order, so the current data can be accessed directly.
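To make the subtraction idea concrete, here is a toy model (not how the NameNode actually stores its diffs): given the current listing of a directory plus the paths created and deleted since a snapshot, the snapshot's listing is the current listing minus the created paths plus the deleted paths. The function name snapshot_view and the one-path-per-line input files are hypothetical, and renames and content changes are ignored.

```shell
#!/bin/sh
# Toy model only (hypothetical helper, not an HDFS command): rebuild a
# snapshot's directory listing from the current listing by undoing the
# changes made after the snapshot was taken.
#   $1: file with the current listing, one path per line
#   $2: file with the paths created after the snapshot
#   $3: file with the paths deleted after the snapshot
snapshot_view() {
  _sv_cur=$(mktemp)
  _sv_new=$(mktemp)
  LC_ALL=C sort -u "$1" > "$_sv_cur"
  LC_ALL=C sort -u "$2" > "$_sv_new"
  # comm -23 keeps lines found only in the current listing (current - created),
  # then the deleted paths are added back (+ deleted).
  { LC_ALL=C comm -23 "$_sv_cur" "$_sv_new"; cat "$3"; } | LC_ALL=C sort -u
  rm -f "$_sv_cur" "$_sv_new"
}
```

For instance, if /user currently holds a.txt and b.txt, a.txt was created after the snapshot, and old.txt was deleted after it, the snapshot view is b.txt and old.txt.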

To take snapshots, a directory must first be made snapshottable. A snapshottable directory that contains snapshots cannot be deleted or renamed until all of its snapshots have been deleted.

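Snapshots are accessed through the reserved path component .snapshot under the snapshottable directory.
Example: if /user is a snapshottable directory, its snapshots can be found under /user/.snapshot, and /user/.snapshot/s0/a.txt refers to the copy of /user/a.txt in snapshot s0.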
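Because snapshot contents are reachable through ordinary paths, restoring a file amounts to copying it out of the .snapshot directory. A minimal sketch, assuming /user is snapshottable and has a snapshot named s0; the helpers run and restore_from_snapshot and the DRY_RUN switch are hypothetical. Run the script as a user allowed to write the target (for example as the hdfs user).

```shell
#!/bin/sh
# Hypothetical helpers: restore one file from a snapshot by copying it out of
# the read-only .snapshot directory back into the live file system.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restore_from_snapshot DIR SNAPSHOT RELPATH
#   RELPATH is a file path relative to DIR. The copy overwrites the live
#   file (-f); without -f, hdfs dfs -cp refuses to replace an existing target.
restore_from_snapshot() {
  run hdfs dfs -cp -f "$1/.snapshot/$2/$3" "$1/$3"
}
```

With DRY_RUN=1 set, restore_from_snapshot /user s0 a.txt only prints the hdfs dfs -cp command, which is a safe way to check the paths first.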


Snapshot commands

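The -allowSnapshot and -disallowSnapshot commands require superuser privileges. Creating, deleting, and renaming snapshots requires ownership of the snapshottable directory. The examples below run every command as the hdfs superuser.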

Allow snapshot

sudo -u hdfs hdfs dfsadmin -allowSnapshot /user

Disallow snapshot

Snapshots must be deleted before this command can successfully execute.

sudo -u hdfs hdfs dfsadmin -disallowSnapshot /user
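Since -disallowSnapshot fails while snapshots remain, turning snapshots off for a directory is a two-step job: delete every snapshot, then disallow. A sketch, assuming snapshot names contain no whitespace and that hdfs dfs -ls prints each path in its last column; the function names and the DRY_RUN switch are hypothetical. DRY_RUN only affects the delete and disallow steps; the listing still runs.

```shell
#!/bin/sh
# Hypothetical helpers: delete every snapshot of a directory, then disallow
# snapshots on it. Snapshot names must not contain whitespace.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Read `hdfs dfs -ls DIR/.snapshot` output on stdin and print snapshot names:
# keep directory entries (permissions start with "d") and print the part of
# the last column after its final "/".
snapshot_names() {
  awk '$1 ~ /^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# drop_snapshots_and_disallow DIR
drop_snapshots_and_disallow() {
  for name in $(hdfs dfs -ls "$1/.snapshot" | snapshot_names); do
    run hdfs dfs -deleteSnapshot "$1" "$name"
  done
  run hdfs dfsadmin -disallowSnapshot "$1"
}
```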

Create snapshot

sudo -u hdfs hdfs dfs -createSnapshot /user s0

Result:

Created snapshot /user/.snapshot/s0

Delete snapshot

sudo -u hdfs hdfs dfs -deleteSnapshot /user s0
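For the backup use case, create and delete are often combined into a rotation: take a date-stamped snapshot on a schedule and delete all but the newest few. A minimal sketch, assuming names of the form backup-YYYYmmdd-HHMMSS (so sorting by name is chronological), snapshot paths in the last column of hdfs dfs -ls output, and a hypothetical retention count KEEP; the function names are hypothetical and scheduling is left out.

```shell
#!/bin/sh
# Hypothetical helpers for date-stamped backup snapshots with retention.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# take_backup_snapshot DIR: create a snapshot named backup-YYYYmmdd-HHMMSS
take_backup_snapshot() {
  run hdfs dfs -createSnapshot "$1" "backup-$(date +%Y%m%d-%H%M%S)"
}

# names_to_prune KEEP: read snapshot names on stdin and print the backup-*
# names older than the newest KEEP ones (name order equals date order).
names_to_prune() {
  grep '^backup-' | LC_ALL=C sort | awk -v keep="$1" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# prune_backup_snapshots DIR KEEP: delete all but the newest KEEP backups
prune_backup_snapshots() {
  for name in $(hdfs dfs -ls "$1/.snapshot" \
      | awk '$1 ~ /^d/ { n = split($NF, p, "/"); print p[n] }' \
      | names_to_prune "$2"); do
    run hdfs dfs -deleteSnapshot "$1" "$name"
  done
}
```

Snapshots that do not follow the backup- naming (such as s0) are never selected for pruning.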

Rename snapshot

Rename snapshot s0 to s1.

sudo -u hdfs hdfs dfs -renameSnapshot /user s0 s1

Get list of snapshottable directories

Returns a list of all snapshottable directories on which the current user has permission to take snapshots.

hdfs lsSnapshottableDir
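In scripts it can be useful to check whether a directory is already snapshottable before calling -allowSnapshot or -createSnapshot. A sketch, assuming the directory path is the last column of each lsSnapshottableDir line and contains no spaces; the helper name is hypothetical. A directory the current user has no snapshot permission on will not be listed, so run the check as the same user that will act on the result.

```shell
#!/bin/sh
# is_snapshottable DIR (hypothetical helper): read `hdfs lsSnapshottableDir`
# output on stdin and exit 0 if DIR appears as the path (last column) of
# some line, exit 1 otherwise.
is_snapshottable() {
  awk -v dir="$1" '$NF == dir { found = 1 } END { exit !found }'
}
```

Usage: hdfs lsSnapshottableDir | is_snapshottable /user || sudo -u hdfs hdfs dfsadmin -allowSnapshot /user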

Get difference between two snapshots

Command:

hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

path is the path of the snapshottable directory.
fromSnapshot is the name of the “from” snapshot.
toSnapshot is the name of the “to” snapshot.

Output legend:

+    The file/directory has been created.
-    The file/directory has been deleted.
M    The file/directory has been modified.
R    The file/directory has been renamed.
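For large directories the report can be long, so a small filter that counts entries per change type can help. A sketch, assuming each entry line begins with its change symbol followed by whitespace, as in the example report for /user; the function name is hypothetical.

```shell
#!/bin/sh
# summarize_diff (hypothetical helper): read `hdfs snapshotDiff` output on
# stdin and print one "TYPE COUNT" line for each change type. The header
# line is ignored because its first field is not a change symbol.
summarize_diff() {
  awk '$1 == "+" || $1 == "-" || $1 == "M" || $1 == "R" { count[$1]++ }
       END {
         split("+ - M R", order, " ")
         for (i = 1; i <= 4; i++) print order[i], count[order[i]] + 0
       }'
}
```

Usage: hdfs snapshotDiff /user s0 s1 | summarize_diff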

Example:

  1. Create snapshot s0
    sudo -u hdfs hdfs dfs -createSnapshot /user s0
  2. Create a file and put it in HDFS under the /user directory
    echo aaa > a.txt
    hadoop fs -put a.txt /user
  3. Create snapshot s1
    sudo -u hdfs hdfs dfs -createSnapshot /user s1
  4. Return the difference between the snapshots
    sudo -u hdfs hdfs snapshotDiff /user s0 s1

    Output:

    Difference between snapshot s0 and snapshot s1 under directory /user:
    M       .
    +       ./a.txt
    

    Line 2: The directory /user has been modified (a file was added to it).
    Line 3: The file a.txt has been created.