Copying Directories of Files: rsync vs cp
This page provides a detailed comparison between using the cp and rsync commands for backing up directories containing files, particularly in scenarios such as copying from a network-mounted drive to a removable storage device like an SD flash drive. The discussion is based on usage in Linux Mint 21.1 with the Cinnamon desktop environment.
Background and Example Scenario
A common task is to back up a directory of files using the cp command. An example command might look like this:
cp -adv ./SourceDirectory /media/user/BackupDestination/Subfolder
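This command recursively copies the directory "SourceDirectory" into the destination path; if Subfolder already exists, the result is Subfolder/SourceDirectory. The options used are:
- -a: archive mode, equivalent to -dR --preserve=all in GNU cp: copies recursively and preserves symbolic links, permissions, ownership, timestamps, and other attributes
- -d: copies symbolic links as links instead of following them (equivalent to --no-dereference --preserve=links); -a already implies it, so it is redundant here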
- -v: verbose mode, which displays the files being copied
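The effect of archive mode on symbolic links can be checked on throwaway data. The following is a minimal sketch, assuming GNU cp; the scratch directory created with mktemp and the file names file.txt and link.txt are placeholders for illustration, not part of the original scenario.

```shell
#!/bin/sh
# Scratch directories stand in for the real source and backup paths.
work=$(mktemp -d)
mkdir -p "$work/SourceDirectory" "$work/Backup"
echo "hello" > "$work/SourceDirectory/file.txt"
ln -s file.txt "$work/SourceDirectory/link.txt"

# -a on its own: recursive copy, links kept as links, attributes preserved.
cp -a "$work/SourceDirectory" "$work/Backup/"

# The copied entry should still be a symlink pointing at file.txt.
copied_target=$(readlink "$work/Backup/SourceDirectory/link.txt")
echo "link target in backup: $copied_target"
```

If the link had been followed instead of preserved, readlink would print nothing, because link.txt in the backup would be a regular file.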
This method works effectively for straightforward copies, providing feedback through verbose output. However, if the copy is interrupted (for example, by a network disconnection or accidental removal of the drive), cp cannot resume: rerunning the same command copies every file again, including those that had already transferred successfully, and the file being written at the moment of interruption is left incomplete at the destination.
The objectives in improving this process include ensuring the integrity of copied files, allowing resumption after interruptions, and providing visible progress information during the transfer.
Recommended rsync Equivalent
A more robust alternative is to use rsync. The equivalent command for the above scenario is:
rsync -av --progress --partial --whole-file ./SourceDirectory /media/user/BackupDestination/Subfolder
This command includes the following options:
- -a: archive mode, equivalent to -rlptgoD: recursively copies directories while preserving symbolic links, permissions, timestamps, group, owner, and device and special files. It does not preserve hard links, ACLs, or extended attributes unless -H, -A, or -X are added
- -v: verbose mode, providing detailed output about the files being processed
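- --progress: displays progress for each file as it transfers (bytes sent, percentage, rate, and estimated time remaining); for a single progress line covering the whole transfer, rsync 3.1 and later offer --info=progress2
- --partial: keeps a partially transferred file at the destination instead of deleting it when the transfer is interrupted. A later run can reuse the kept data only when the delta-transfer algorithm is active; with --whole-file, the interrupted file is sent again from the start, while files that had already completed are skipped
- --whole-file: disables the delta-transfer algorithm and sends each file whole. The delta algorithm brings no benefit for an initial copy to an empty destination. rsync already defaults to whole-file behavior when both source and destination are local paths (which includes mounted network shares), so here the option mainly makes the intent explicit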
rsync already verifies each file it transfers against a whole-file checksum. To additionally compare files that are already present at the destination by content rather than by size and modification time, the --checksum option can be added. However, this requires reading every file in full on both the source and the destination, which carries a significant performance penalty and makes it unsuitable when transfer speed is a priority.
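One way to use --checksum without paying its cost on every backup is a separate verification pass after the copy. The following is a sketch under stated assumptions: the scratch directory and the file names a.txt and sub/b.txt are placeholders, and the real source and destination paths would take their place.

```shell
#!/bin/sh
# Scratch directories stand in for the real source and backup paths.
work=$(mktemp -d)
mkdir -p "$work/SourceDirectory/sub" "$work/Backup"
echo "alpha" > "$work/SourceDirectory/a.txt"
echo "beta"  > "$work/SourceDirectory/sub/b.txt"

# Initial copy. The source has no trailing slash, so rsync creates
# Backup/SourceDirectory, matching the layout of the cp example.
rsync -a "$work/SourceDirectory" "$work/Backup/"

# Verification pass: --checksum compares file contents, --dry-run
# transfers nothing, and --itemize-changes prints one line per item
# that differs. Empty output means nothing would be changed.
diffs=$(rsync -a --checksum --dry-run --itemize-changes \
    "$work/SourceDirectory" "$work/Backup/")
if [ -z "$diffs" ]; then
    echo "backup verified: no differences"
else
    echo "differences found:"
    echo "$diffs"
fi
```

Because the verification run is a dry run, it is safe to repeat at any time; it only reads from both sides.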
Performance and Speed Comparison
When performing an initial full copy to an empty destination directory:
- The cp -adv command is typically slightly faster. The advantage is often estimated at roughly 5–20%, though it varies with the number and size of files and with the hardware involved. The difference arises because cp copies files directly without additional preparatory steps.
- rsync incurs minor overhead from building a file list and performing quick checks based on file size and timestamps, even when all files need to be copied anew.
- In real-world scenarios involving a network-mounted source and a removable destination such as an SD card, the primary bottleneck is usually the input/output speed of the devices rather than the tool's overhead. As a result, the practical difference in transfer time is often small and may not be noticeable for most users.
The --whole-file option in the recommended command keeps rsync from attempting delta transfers. For local paths this is already rsync's default, so the option does not add speed here; it guarantees the behavior and documents the intent.
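The size of the difference on a particular setup is easiest to settle by measuring it. The following is a rough sketch, not a rigorous benchmark: the helper name elapsed, the scratch directory, and the 20 sample files of 1 MiB are all assumptions made for illustration. Timing uses date +%s, so resolution is whole seconds, which is adequate for real backups that run for minutes but too coarse for the tiny sample data used here.

```shell
#!/bin/sh
# elapsed: run a command and report its wall-clock time in whole seconds.
elapsed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$((end - start)) s: $1"
    return "$status"
}

# Scratch data standing in for the real source directory.
work=$(mktemp -d)
mkdir -p "$work/SourceDirectory" "$work/cp_dest" "$work/rsync_dest"
i=1
while [ "$i" -le 20 ]; do
    dd if=/dev/zero of="$work/SourceDirectory/file$i.bin" \
        bs=1048576 count=1 2>/dev/null
    i=$((i + 1))
done

# Each tool copies into its own empty destination.
elapsed cp -a "$work/SourceDirectory" "$work/cp_dest/"
elapsed rsync -a --whole-file "$work/SourceDirectory" "$work/rsync_dest/"
```

For a meaningful comparison, the source and destination would be the actual network mount and removable device, and the runs should be repeated in both orders, since the second tool to run may benefit from data the first one left in the page cache.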
Advantages of Each Method
Advantages of cp -adv
- Slightly higher speed for initial full copies due to minimal overhead
- Simpler syntax with fewer options required
- No preliminary scanning or file list generation
- Marginally lower CPU resource consumption during the operation
Advantages of rsync -av --progress --partial --whole-file
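- Resumption after interruptions such as network failures, power loss, or removal of the storage device: rerunning the same command continues with the files that have not yet been copied
- Detailed per-file progress display and a transfer summary at the end of the run, which is helpful for monitoring long transfers (a whole-transfer progress line is available with --info=progress2)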
- Ability to safely re-execute the exact same command on subsequent runs; rsync will skip files that are already identical in the destination
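- Fine-grained control over which file attributes are preserved, including permissions, timestamps, symbolic links, and special files; note that rsync -a does not preserve hard links, ACLs, or extended attributes unless -H, -A, and -X are added, whereas cp -a attempts to preserve these as well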
- Option to perform dry runs (with --dry-run) to verify what would be copied or to confirm that the destination matches the source without transferring data
- More informative verbose output, aiding in the identification of potential issues during the copy process
- Retention of partially transferred files upon interruption (with --partial), so that an incomplete file is kept at the destination rather than deleted
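The re-run and dry-run behavior described in this list can be observed on scratch data. The following is a minimal sketch; the scratch directory and the file names one.txt, two.txt, and three.txt are placeholders for illustration.

```shell
#!/bin/sh
# Scratch directories stand in for the real source and backup paths.
work=$(mktemp -d)
mkdir -p "$work/SourceDirectory" "$work/Backup"
echo "one" > "$work/SourceDirectory/one.txt"
echo "two" > "$work/SourceDirectory/two.txt"

# First run: everything is copied.
rsync -a "$work/SourceDirectory" "$work/Backup/"

# A new file appears in the source after the first backup.
echo "three" > "$work/SourceDirectory/three.txt"

# Preview: --dry-run transfers nothing, and --itemize-changes lists
# what a real run would update.
preview=$(rsync -a --dry-run --itemize-changes \
    "$work/SourceDirectory" "$work/Backup/")
echo "$preview"

# Second real run: files that already match in size and modification
# time are skipped, so only the new file is transferred.
rsync -a "$work/SourceDirectory" "$work/Backup/"
```

The preview should mention three.txt (and possibly the directory itself, whose timestamp changed when the file was added) but not one.txt or two.txt.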
Conclusion and Recommendation
For one-time backups where interruptions are unlikely and maximum speed is desired, the cp -adv command remains a straightforward and efficient choice.
However, for recurring backups, large directory structures, or environments prone to interruptions (such as those involving network drives or removable media), rsync with the recommended options provides greater reliability, resumability, and user feedback, making it the preferred tool despite the minor performance trade-off.
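For environments prone to interruptions, the rerun step can be scripted. The following is a sketch only, not part of the original procedure: the function name backup_with_retry, the default of 5 attempts, and the 10-second delay are all assumptions chosen for illustration. It relies on rsync returning a nonzero exit status when a transfer fails.

```shell
#!/bin/sh
# backup_with_retry SRC DEST [MAX_ATTEMPTS] [DELAY_SECONDS]
# Reruns rsync until it succeeds or the attempt limit is reached.
backup_with_retry() {
    src=$1
    dest=$2
    max_attempts=${3:-5}
    delay=${4:-10}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Files completed by earlier attempts are skipped on each rerun.
        if rsync -a --partial "$src" "$dest"; then
            echo "backup finished on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        # Pause before the next try, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

With the paths from the example scenario, a call would look like: backup_with_retry ./SourceDirectory /media/user/BackupDestination/Subfolder. The attempt limit keeps the loop from running indefinitely if the drive never comes back.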