Documentation for Apache Hadoop 2.7.1 on HDFS Snapshots can be found here.
HDFS Snapshots are read-only point-in-time copies of the file system. They can be taken on a subtree of the file system or on the entire file system.
Cases where snapshots can be useful:
- Backup
- Disaster recovery
Blocks in DataNodes are not copied; a snapshot records only the block list and the file size.
Modifications are recorded in reverse chronological order, so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.
Before snapshots can be taken, the directory has to be set as snapshottable. As long as a snapshottable directory contains snapshots, it can be neither deleted nor renamed.
Snapshots are accessed through the .snapshot path under the snapshottable directory.
Example: if /user is a snapshottable directory, its snapshots can be found in /user/.snapshot.
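The path layout described above can be sketched as a small shell helper. This is only an illustration: snapshot_path is a hypothetical function name, not part of the hdfs command line, and the snapshot name s1 and file a.txt are placeholders.

```shell
#!/bin/sh
# snapshot_path is a hypothetical helper, not an hdfs command.
# It prints the path of a snapshot, or of a file inside it, for a
# snapshottable directory.
snapshot_path() {
    dir=${1%/}        # strip one trailing slash, so "/user/" becomes "/user"
    name=$2           # snapshot name, e.g. s0
    rel=${3:-}        # optional path relative to the snapshottable directory
    if [ -n "$rel" ]; then
        printf '%s/.snapshot/%s/%s\n' "$dir" "$name" "$rel"
    else
        printf '%s/.snapshot/%s\n' "$dir" "$name"
    fi
}

snapshot_path /user s0          # prints /user/.snapshot/s0
snapshot_path /user s1 a.txt    # prints /user/.snapshot/s1/a.txt

# On a real cluster, a file could then be restored with an ordinary copy:
#   hdfs dfs -cp "$(snapshot_path /user s1 a.txt)" /user/a.txt
```

Because snapshot contents are exposed as regular read-only paths, restoring a file needs no special command beyond a normal copy.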
Snapshot commands
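Allowing and disallowing snapshots requires superuser privileges. Creating, deleting and renaming snapshots requires owner privileges on the snapshottable directory. The examples below run as the hdfs superuser.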
Allow snapshot
sudo -u hdfs hdfs dfsadmin -allowSnapshot /user
Disallow snapshot
All snapshots of the directory must be deleted before this command can execute successfully.
sudo -u hdfs hdfs dfsadmin -disallowSnapshot /user
Create snapshot
sudo -u hdfs hdfs dfs -createSnapshot /user s0
Result:
Created snapshot /user/.snapshot/s0
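For the backup use case, snapshot names are often derived from the current time. The following is a minimal sketch under stated assumptions: take_snapshot, the DRY_RUN switch and the backup- name prefix are hypothetical choices made for this example. DRY_RUN defaults to 1, so the script only prints the command it would run.

```shell
#!/bin/sh
# take_snapshot and DRY_RUN are hypothetical names used for this sketch.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

take_snapshot() {
    dir=$1
    # Timestamped name such as backup-20240131-235959 (prefix is an assumption).
    name="backup-$(date +%Y%m%d-%H%M%S)"
    if [ "$DRY_RUN" = 1 ]; then
        echo "hdfs dfs -createSnapshot $dir $name"
    else
        hdfs dfs -createSnapshot "$dir" "$name"
    fi
}

take_snapshot /user
```

Setting DRY_RUN=0 would run the real command, which must be executed by a user who is allowed to create snapshots on the directory.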
Delete snapshot
sudo -u hdfs hdfs dfs -deleteSnapshot /user s0
Rename snapshot
Rename snapshot s0 to s1.
sudo -u hdfs hdfs dfs -renameSnapshot /user s0 s1
Get list of snapshottable directories
Returns all snapshottable directories for which the current user has permission to take snapshots.
hdfs lsSnapshottableDir
Get difference between two snapshots
Command:
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
path is the path of the snapshottable directory.
fromSnapshot is the name of the “from” snapshot.
toSnapshot is the name of the “to” snapshot.
Explaining output:
+ The file/directory has been created.
- The file/directory has been deleted.
M The file/directory has been modified.
R The file/directory has been renamed.
Example:
- Create snapshot s0
sudo -u hdfs hdfs dfs -createSnapshot /user s0
- Create a file and put it in HDFS under /user directory
echo aaa > a.txt
hadoop fs -put a.txt /user
- Create snapshot s1
sudo -u hdfs hdfs dfs -createSnapshot /user s1
- Return the difference between the two snapshots
sudo -u hdfs hdfs snapshotDiff /user s0 s1
Output:
Difference between snapshot s0 and snapshot s1 under directory /user:
M	.
+	./a.txt
Line 2: The directory /user has been modified.
Line 3: File a.txt has been added.
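When a diff report is long, it can help to count the entries per change type. The following is a minimal sketch, assuming the report format shown in the example above (a header line followed by one marker per entry). summarize_diff is a hypothetical helper, and the sample input is the example output rather than the result of a live command.

```shell
#!/bin/sh
# summarize_diff is a hypothetical helper: it reads snapshotDiff output on
# stdin and counts entries per change type. The header line is ignored
# because its first field is not one of the markers.
summarize_diff() {
    awk '
        $1 == "+" { created++ }
        $1 == "-" { deleted++ }
        $1 == "M" { modified++ }
        $1 == "R" { renamed++ }
        END {
            printf "created=%d deleted=%d modified=%d renamed=%d\n",
                   created, deleted, modified, renamed
        }
    '
}

# Sample input taken from the example output; on a cluster, pipe the real
# command instead:
#   hdfs snapshotDiff /user s0 s1 | summarize_diff
printf '%s\n' \
    'Difference between snapshot s0 and snapshot s1 under directory /user:' \
    'M .' \
    '+ ./a.txt' | summarize_diff
```

For the sample input this prints created=1 deleted=0 modified=1 renamed=0.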