Download sequencing data¶
After your samples have been sequenced, all the fastq files and associated analysis files (BAM, etc.) are uploaded to an SFTP server. You can download them either with a graphical client (FileZilla, for example) or from a terminal. Both methods are described below.
Using FileZilla (or other software)¶
If you’re not familiar with the command line, FileZilla is a free and easy-to-use FTP/SFTP client for downloading files from a remote server.
In order to fetch the data, you will need to fill in the following fields on the upper bar of the main window:
- Host : sftp.igf.cnrs.fr
- Username : login
- Password : password given in the QC-Analysis-Report demand in your project
- Port : 22
Once done, click on QuickConnect to connect to the server. If the connection is successful, a list of files will appear on the right side of the main window. You can then select all the files and drag and drop them into the directory of your choice (left side of the main window).
Here's an example of a successful connection:
Using a terminal¶
If you’re comfortable with the command line, the following commands can be used to download the files from the server:
# connection
$ sftp <login>@sftp.igf.cnrs.fr
# enter your password when prompted
# see the files available
sftp> ls -l
# fetch all files available
sftp> get *
# disconnection
sftp> exit
Here's an example of a successful download:
[al@dell ~/Documents/fastqs]$ sftp mgx28\@sftp.igf.cnrs.fr
*********************************************************************************
Welcome on IGF sftp server
The use of this system is restricted to authorized users,
unauthorized access is forbidden.
All information and communications on this system are subject to review,
monitoring and recording at any time, without notice or permission.
Users should have no expectation of privacy.
*********************************************************************************
mgx28@sftp.igf.cnrs.fr password:
Connected to sftp.igf.cnrs.fr.
sftp> get *
Fetching /home/mgx28/1.fastq.gz to 1.fastq.gz
Fetching /home/mgx28/2.fastq.gz to 2.fastq.gz
Fetching /home/mgx28/3.fastq.gz to 3.fastq.gz
Fetching /home/mgx28/4.fastq.gz to 4.fastq.gz
Fetching /home/mgx28/md5sum.txt to md5sum.txt
sftp> exit
[al@dell ~/Documents/fastqs]$
Check data integrity¶
The md5sum tool calculates what is called a file fingerprint. This fingerprint, also known as a message digest or checksum, is a 128-bit value computed from the contents of a file. In practice, two different files produce different fingerprints. By comparing the MD5 digest of each downloaded file to the value we supply in the md5sum.txt file, you can make sure that the files you downloaded have not been damaged or altered (for example if a network issue has occurred).
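As a small illustration of the fingerprint idea, the sketch below creates two throwaway files that differ by a single character and prints their MD5 digests. The file names (demo_a.txt, demo_b.txt) and their contents are made up for the demonstration, and it assumes the md5sum tool is available.

```shell
#!/bin/sh
# Illustration only: demo_a.txt and demo_b.txt are hypothetical files,
# created in a temporary directory so that no real data is touched.
workdir=$(mktemp -d)

printf 'ACGTACGT\n' > "$workdir/demo_a.txt"
printf 'ACGTACGA\n' > "$workdir/demo_b.txt"   # last character changed

# md5sum prints the digest followed by the file name; keep only the digest.
digest_a=$(md5sum "$workdir/demo_a.txt" | cut -d ' ' -f 1)
digest_b=$(md5sum "$workdir/demo_b.txt" | cut -d ' ' -f 1)

echo "demo_a: $digest_a"
echo "demo_b: $digest_b"

# A one-character difference is enough to change the digest.
if [ "$digest_a" != "$digest_b" ]; then
    echo "digests differ"
fi
```

Each digest is printed as 32 hexadecimal characters, which is the 128-bit value mentioned above.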
Under Linux¶
Under Linux, the md5sum tool is usually included in your distribution. Go to the directory containing the files to check and run the following command:
md5sum -c md5sum.txt
The result has to be "OK" for all files.
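For a large delivery, the list of OK lines can be long. With GNU md5sum, the --quiet option prints only the files that fail, and the exit status tells whether everything passed. Here is a minimal sketch, assuming GNU coreutils and run on throwaway files (sample_1.txt and sample_2.txt are hypothetical names):

```shell
#!/bin/sh
# Illustration only: work in a temporary directory on made-up files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'read one\n' > sample_1.txt
printf 'read two\n' > sample_2.txt
md5sum sample_1.txt sample_2.txt > md5sum.txt

# All files intact: --quiet prints nothing and the exit status is 0.
if md5sum -c --quiet md5sum.txt; then
    status_intact=0
else
    status_intact=$?
fi

# Simulate a damaged download, then check again.
printf 'read tw\n' > sample_2.txt
if report=$(md5sum -c --quiet md5sum.txt 2>/dev/null); then
    status_damaged=0
else
    status_damaged=$?
fi

echo "intact run exit status:  $status_intact"
echo "damaged run exit status: $status_damaged"
echo "$report"   # only the files that failed are listed
```

A non-zero exit status means at least one file did not match, which is convenient when the check is part of a script.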
Under Windows¶
Third-party software¶
Under Windows, you can use software such as md5-sha-checksum-utility. These tools generally check files one at a time, not all at once.
PowerShell (command-line)¶
Another option is to use PowerShell.
Here are the steps to follow:
- 1. Open PowerShell by typing its name in the search bar
- 2. Go to the directory where the files were downloaded using the following command line:
cd full/path/of/where/the/directory/is/
Note: avoid spaces in folder names, or put the whole path in quotes; otherwise the command may fail.
Once you are in the directory, you can list the files using the command ls.
- 3. Generate the MD5 keys for all the downloaded files at once (recursively if there are subdirectories), using the following commands (you can copy-paste them into the PowerShell window):
# directory from which relative paths are computed
$base = (Get-Location).Path
# hash every file except the md5* checksum files themselves
$lines = Get-ChildItem -File -Recurse | Where-Object { $_.Name -notlike "md5*" } | ForEach-Object {
    $hash = Get-FileHash -Algorithm MD5 -Path $_.FullName
    # path relative to the current directory, with forward slashes
    $rel = $_.FullName.Substring($base.Length + 1).Replace('\', '/')
    "$($hash.Hash.ToLower()) $($rel)"
}
# write one "key path" line per file
$lines | Set-Content -Encoding ASCII md5sums.txt
Once it is done, an md5sums.txt file is written in the directory. /!\ Depending on the size of the data, it may take several hours to generate all the keys. Do not close the PowerShell window while the command is still running.
- 4. Compare the generated keys to the ones we gave you in the md5sum_*.txt file. You can use an online tool such as diffchecker to compare the two files. Be careful: the files must be listed in the same order in both md5sums files, because the lines are compared one by one. Alternatively, you can ask an AI assistant to find the differences.
If the two keys for the same file differ, your file might be corrupted, so please contact us. Otherwise, everything is fine.
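Where a POSIX shell is available (Linux, macOS, WSL or Git Bash, for instance), the line-order constraint of step 4 can be sidestepped by sorting both lists before comparing them. The sketch below uses two small made-up lists: provided.txt and generated.txt are hypothetical names standing in for the file we supplied and the one produced in step 3, and the digests are placeholders. It assumes both files use exactly the same "key, space, path" layout.

```shell
#!/bin/sh
# Illustration only: placeholder digests in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The same two entries, listed in a different order.
printf '%s\n' \
    '0123456789abcdef0123456789abcdef 1.fastq.gz' \
    'fedcba9876543210fedcba9876543210 2.fastq.gz' > provided.txt
printf '%s\n' \
    'fedcba9876543210fedcba9876543210 2.fastq.gz' \
    '0123456789abcdef0123456789abcdef 1.fastq.gz' > generated.txt

# Sorting puts both lists in the same order before the comparison.
sort provided.txt > provided.sorted
sort generated.txt > generated.sorted

# diff exits with status 0 when the sorted lists are identical.
if diff provided.sorted generated.sorted > /dev/null; then
    result="all checksums match"
else
    result="differences found"
fi
echo "$result"
```

If differences are found, running diff without the redirection shows which lines do not match.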