Using Allas in CSC's HPC environment
Before the actual exercise, open a view to the Allas service in your browser using the Puhti web interface.
Go to https://www.puhti.csc.fi and log in with your account.
Configure an Allas S3 connection using the Cloud storage configuration tool.
You need to first authenticate by providing your CSC password.
If you have several projects available, choose one that you want to use in this exercise.
Once you've configured a connection, select `s3allas-project_<id>` from the Files dropdown menu in the top navigation bar. Replace `<id>` with the number of the project you chose to use (e.g. 2001234).
During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.
1. Login to Puhti
Log in to Puhti (open a login node shell if using the web interface):
In Puhti, check your environment with the command:
Move to the `/scratch` directory of your project:
Create your own subdirectory named after your username:
Move to the directory:
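The directory steps above can be combined into one small helper. This is only a sketch: the function name `make_workdir` is hypothetical, and the project path in the usage comment is a placeholder that you need to replace with your own project.

```shell
# Create a personal subdirectory under a project's scratch area and move there.
# $1 = scratch directory of the project, e.g. /scratch/project_2001234 (placeholder)
make_workdir() {
    # Fall back to `id -un` if $USER is not set in the current shell
    workdir="$1/${USER:-$(id -un)}"
    mkdir -p "$workdir" && cd "$workdir" && pwd
}

# Usage on Puhti (replace the project number with your own):
#   make_workdir /scratch/project_2001234
```

`mkdir -p` does not fail if the directory already exists, so the helper can be run again safely when you return to the exercise later.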
2. Download data with wget
Next, download a dataset and uncompress it.
The dataset contains some Pythium genomes with related BWA indexes.
3. Using Allas
Open a connection to Allas:
If you have several Allas projects available, select the same project as earlier
Upload case 1: rclone
Upload the data from Puhti to Allas with `rclone`:
How long did the data upload take?
What was the transfer rate?
How long would it take to transfer 100 GiB assuming the same speed?
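The last question is simple arithmetic: convert the size to MiB and divide by the measured rate. The sketch below assumes you read the rate in MiB/s from the progress output of `rclone` (for example when running it with the `-P` flag); the function name `transfer_seconds` and the 50 MiB/s figure are made up for illustration, not measured values.

```shell
# Estimate how long a transfer takes at a constant rate.
# $1 = data size in GiB, $2 = transfer rate in MiB/s
transfer_seconds() {
    # 1 GiB = 1024 MiB, so seconds = GiB * 1024 / (MiB/s)
    awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", gib * 1024 / rate }'
}

# 100 GiB at a hypothetical 50 MiB/s
transfer_seconds 100 50   # prints 2048 (seconds, about 34 minutes)
```

Replace the second argument with the rate you observed to answer the question for your own upload.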
Study what you have uploaded to Allas with the commands:
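One possible set of listing commands is sketched below. It assumes the `rclone` remote is called `allas:`, as it normally is after opening the Allas connection on Puhti; the helper name `inspect_bucket` is hypothetical.

```shell
# Look at the buckets of the project and the contents of one bucket.
# $1 = bucket name, e.g. "$USER-genomes-rc"
inspect_bucket() {
    rclone lsd allas:        # list the buckets of the project
    rclone ls "allas:$1"     # list all objects in the bucket with their sizes
    rclone lsf "allas:$1"    # list only the top-level names in the bucket
}

# Usage:
#   inspect_bucket "$USER-genomes-rc"
```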
Check what this looks like in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
In the Puhti web interface, go to the Files app and select `s3allas-project_<id>` to list the buckets of your project (replace `<id>` as needed).
Locate your own `$USER-genomes-rc` bucket and download one of the uploaded fasta files to your local computer.
You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone
Upload case 2: a-put
Upload the pythium directory from Puhti to Allas using a-commands
Case 1: Store everything as a single object (replace `<project number>` with your CSC project number, e.g. 2001234):
Case 2: Each subdirectory (species) as a separate object (replace `<project number>` with your CSC project number, e.g. 2001234):
Case 3: Use a custom bucket name (replace `<project number>` with your project number, e.g. 2001234):
Can you see the difference between the three `a-put` commands above?
Study the `<project number>-$USER-genomes-ap` bucket with commands:
Why do the two commands above list a different number of objects?
Try the command (replace `<project number>` with your project number, e.g. 2001234):
This command is actually the same as:
Finally, try the command:
Try opening the public link that `a-flip` produced in your browser.
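The three `a-put` upload cases in this section can be sketched as small shell helpers. This is an illustration under assumptions, not necessarily the exact commands of the exercise: the function names are hypothetical, the project number is a placeholder, and the bucket name follows the `<project number>-$USER-genomes-ap` pattern used in the text. The sketch relies on `a-put` storing each argument as its own object and on its `-b` option for choosing the bucket.

```shell
# Placeholder project number; replace with your own
PROJECT_NUMBER=2001234
BUCKET="${PROJECT_NUMBER}-${USER:-$(id -un)}-genomes-ap"

# Case 1: the whole directory becomes a single object
put_single() { a-put "$1"; }

# Case 2: each subdirectory (species) becomes a separate object
put_each() { a-put "$1"/*; }

# Case 3: as case 2, but into a custom bucket instead of the default one
put_custom() { a-put -b "$BUCKET" "$1"/*; }

# Usage:
#   put_single pythium
#   put_each pythium
#   put_custom pythium
```

The difference between cases 1 and 2 comes from the shell: `pythium/*` is expanded into one argument per subdirectory before `a-put` runs.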
Upload case 3: allas-backup
Run the commands:
What did these commands do to your data?
4. Exit
The data in the `pythium` directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:
5. Downloading data from Allas to Puhti
Log in to Puhti and move to your personal directory in your project's `/scratch`:
In Puhti, check your projects with the command:
Set up the Allas connection:
Then run the commands (we will use the same bucket that was created earlier):
Next, download the data in different ways:
1. Download with rclone
Copy everything:
Copy a set of objects:
Copy just one object:
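The three `rclone` download variants above can be sketched as follows. The helper names are hypothetical, the remote is assumed to be `allas:`, and the bucket is assumed to be the `$USER-genomes-rc` bucket created in the upload step. The file pattern and object path in the usage comments are placeholders.

```shell
# Bucket created earlier in the rclone upload step (assumed name)
BUCKET="${USER:-$(id -un)}-genomes-rc"

# Copy everything in the bucket into a local directory ($1)
get_all() { rclone copy "allas:$BUCKET" "$1"; }

# Copy only the objects whose names match a pattern ($2) into $1
get_matching() { rclone copy --include "$2" "allas:$BUCKET" "$1"; }

# Copy a single object ($2 = object path inside the bucket) into $1
get_one() { rclone copy "allas:$BUCKET/$2" "$1"; }

# Usage (placeholders):
#   get_all rc_all
#   get_matching rc_fasta "*.fasta"
#   get_one rc_one <path/to/object>
```

`rclone copy` skips files that already exist unchanged at the destination, so rerunning any of these only transfers what is missing.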
2. Download with a-get
Return to your `$USER` directory under your project's `/scratch` on Puhti (the `pwd` command should print `/scratch/<project>/$USER`):
Make a new directory:
Create a directory `all` and move there:
List your default `SCRATCH` bucket (replace `<project number>` with your project number, e.g. 2001234):
Look for the file `pythium_vexans.fasta` in your Puhti `SCRATCH` bucket:
Download the full dataset with command:
Check what you got:
Now, download just a single genome dataset:
3. Downloading data from allas-backup
Return to your main scratch directory and make a new directory:
Use the commands below to find out the ID of the most recent backup version of your pythium directory:
Use `allas-backup restore` to download the data:
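The two backup steps above can be wrapped as shown below. This is a minimal sketch with hypothetical helper names; the snapshot ID is not guessed here and must be copied from the listing output.

```shell
# Show the stored backup versions so that the most recent ID can be picked out
list_backups() { allas-backup list; }

# Restore one backup version into the current directory
# $1 = snapshot ID copied from the output of list_backups
restore_backup() { allas-backup restore "$1"; }

# Usage:
#   list_backups
#   restore_backup <snapshot id>
```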