Upload / Download

Standard upload and download

When a local project directory exists (created using mc init or mc clone), files can be transferred between the local project and the remote project.

File uploading and downloading is performed with the mc up and mc down command, respectively. To upload one or more files the basic usage is:

mc up [-r] local_file [local_file ...]

And to download one or more files the basic usage is:

mc down [-r] remote_file [remote_file ...]

The -r option is used to upload and download directory contents recursively. For mc up, any files given as “path” arguments existing in the local project directory will be uploaded to the corresponding directory in the remote project. Similarly, for mc down any files given as “path” arguments existing in the remote project directory will be downloaded to the corrsponding directory in the local project. Any intermediate directories that do not exist will be created automatically.

For example, consider a project on Materials Commons named “MyProject” that looks like this:

MyProject/:
├── file_A.txt
├── file_B.txt
└── level_1
    ├── file_A.txt
    ├── file_B.txt
    └── level_2
        ├── file_A.txt
        └── file_B.txt

Imagine the local project directory looks like this:

/Users/me/local/mc_projects/MyProject/:
├── file_A.txt
├── file_C.txt
└── level_1
    ├── file_A.txt
    └── file_C.txt

If the current working directory is /Users/me/local/mc_projects/MyProject/, then the following are all possible:

# upload one file
mc up file_C.txt

# upload two files
mc up file_C.txt level_1/file_C.txt

# upload a directory, and its subdirectories, recursively
mc up -r level_1

# upload everything in the current directory, and subdirectories, recursively
mc up -r .

# download one file
mc down file_B.txt

# download two files
mc down file_B.txt level_1/file_B.txt

# download a file (creating necessary directory level_1/level_2):
mc down level_1/level_2/file_A.txt

# download a directory and subdirectories, recursively:
mc down -r level_1

# download all files and subdirectories, recursively:
mc down -r .

By default, mc up/mc down will check MD5 checksums and not transfer files that already exist. This can be skipped with the --no-compare option.

There is a limit to the size of files that can be uploaded using the standard file upload process. that depends on the configuration of the particular instance of Materials Commons. The mc program will skip (with a warning message) uploading any file larger than the size given by the --limit option (given in MB). The default limit is 50MB.

When uploading a file results in “overwriting” an existing file at the same location, Materials Commons saves the previously existing file as a “version”. Access to previous file versions will be enabled in a subsequent release, along with details about how mc rm and mc mv effect versions.

Globus installation and configuration

The Globus transfer service can be used when transferring large files or many files. The Globus client should already be installed as part of the materials-commons-cli installation process. But, if it is not installed, install with:

pip install globus-sdk
pip install globus-cli

The following assumes you have a valid Globus account.

First, add your Globus account name in the Account Settings section on the Materials Commons website. Typically, this is the email address associated with your Globus account.

Second, configure the client side to use Globus. There are two typical cases:

  • Case 1) If you are using Globus Personal Connect, for instance on your personal computer, you must have Globus Personal Connect running and log in to Globus:

    globus login
    
  • Case 2) If you are using a Globus endpoint managed by someone else, for instance on a shared cluster, mc must be configured with the Globus endpoint ID. This can be done using the following steps:

    • Find the Globus endpoint ID for the endpoint you will use. Endpoint UUIDs can be found on the Globus endpoints web interface.

    • Configure mc to use the Globus endpoint ID:

      mc globus --set-globus-endpoint-id <endpoint_id>
      

Uploading files with Globus

Globus transfers to Materials Commons take place via a temporary intermediary directory that is created on Materials Commons. The process is as follows:

Step 1, Request a Globus upload directory

Imagine the initial state of your local and remote projects, before uploading, looks like the following:

../site-images/up.1.png

Imagine that you want to upload the files named “file_B.txt”. Request that Materials Commons creates a new Globus upload for the project. This creates a new, empty, upload directory on Materials Commons. It is possible to have multiple upload directories existing at the same time, for the same project, to help manage multiple transfers. Each upload directory requested from mc is given a name made up of three random words along with an ID and UUID. Access control is set to allow only you to privately access the upload directory via Globus.:

$ mc globus upload --create
Created Globus upload: 293
    project_name      project_id  type    name                   id  uuid                                  created               status
--  --------------  ------------  ------  -------------------  ----  ------------------------------------  --------------------  --------
*   MyProject                589  upload  rivage-popish-bonze   293  c3474f16-b357-4b6c-92ca-fd93a1e37840  2021 Jan  1 03:52:49  Ready
../site-images/up.2.png

Step 2, Transfer files

At this point, Globus transfers to the upload directory can be initiated with mc up by adding the -g/--globus option. Multi-file and recursive uploads can be initiated for Globus uploads just as with standard uploads. Since a user can have multiple existing Globus upload directories, the mc program stores, for each local project, the ID of a “current” Globus upload indicating which upload directory to transfer files to.

If no “current” upload exists when mc up is called, then mc will request a new upload directory with a random name and initiate a Globus transfer to it. The newly created upload directory becomes the “current” upload directory and is used when mc up -g is called subsequently.

The first time a transfer is initiated Globus will generate an authorization token for the mc program, specific to the computer making the transfer. A browser should be opened automatically taking you to a page with the authorization code which mc will prompt for. Copy and paste the code into the command line and the transfer should begin.

$ mc up -g file_B.txt
Using current globus upload (name=rivage-popish-bonze, id=293).
Please login. If a webpage does not open automatically, go here:

https://auth.globus.org/v2/oauth2/authorize?... (long link)


Please enter the code you get after login here: abc123abc123abc123abc123abc123
Globus task_id: c127a968-57b1-11eb-87bb-02187389bd35
Globus transfer task initiated.
Use `globus task list` to monitor task status.
Use `mc globus upload` to manage Globus uploads.
Multiple transfer tasks may be initiated.
When all tasks finish uploading, use `mc globus upload --id 293 --finish` to import all uploaded files into the Materials Commons project.

Any number of Globus transfers can be performed to transfer files into the upload directory. Files will be placed into the upload directory mirroring the project’s directory struture, but only directories containing newly uploaded files need to be created.

The command globus task list can be used to check the status of all initiated transfers.

$ globus task list
Task ID                              | Status    | Type     | Source Display Name | Dest Display Name          | Label
------------------------------------ | --------- | -------- | ------------------- | -------------------------- | --------------------------------
c127a968-57b1-11eb-87bb-02187389bd35 | SUCCEEDED | TRANSFER | my_MacbookAir       | materials-commons-2-upload | MyProject-rivage-popish-bonze

If a task will not complete (Status != SUCCEEDED), check that Globus Personal Connect is running, your internet connection is working, or check and address error messages in the Globus transfer activity page.

At any point before finishing the upload, you can as a convenience also open the Globus file manager in a web browser with the --goto command to transfer files using that interface. For example:

$ mc globus upload --id 293 --goto
    project_name      project_id  type    name                   id  uuid                                  created               status
--  --------------  ------------  ------  -------------------  ----  ------------------------------------  --------------------  --------
    MyProject                589  upload  rivage-popish-bonze   293  c3474f16-b357-4b6c-92ca-fd93a1e37840  2021 Jan 16 03:52:49  Ready

You want to goto these uploads in a web browser? ('Yes'/'No'): Yes

Step 3, Finish upload

If all of the “file_B.txt” files are uploaded, the directories will look like the following:

../site-images/up.3.png

Once all desired transfers are completed, the Materials Commons upload directory can be closed and files processed into your project with the --finish option for mc globus upload:

mc globus upload --id 293 --finish

The processing time required before files appear in your project will depend on the size of the transfer. The status can be checked with mc globus upload:

$ mc globus upload
    project_name      project_id  type    name                   id  uuid                                  created               status
--  --------------  ------------  ------  -------------------  ----  ------------------------------------  --------------------  ---------
    MyProject                589  upload  rivage-popish-bonze   293  c3474f16-b357-4b6c-92ca-fd93a1e37840  2021 Jan 16 03:52:49  Finishing

Once processing is finished, the upload directory will no longer appear in mc globus upload results, and all files should appear in the project directory.

Then, the final state should be:

../site-images/up.4.png

Additional Notes

If you wish to delete an upload directory and not process the files that have already been uploaded, use the --delete option with mc globus upload.

The current Globus upload directory can be managed with the --set and --unset options for mc globus upload.

In the example uploading all the “file_B.txt” files resulted in the remote project matching the local project. This could have also been accomplished by attempting to upload all the local files, recursively:

mc up -g -r .

In this case, mc will compare the MD5 checksum of the local and remote files and only request that Globus upload the files that are different.

Downloading files with Globus

The Globus download process is similar to the Globus upload process. The first step is requesting that Materials Commons creates a temporary download directory using mc globus download --create. This will create the temporary download directory and populate it with links to all of the files in the project. Access control is set to allow only you to privately access the download directory via Globus.

$ mc globus download --create
Created Globus download: 295
    project_name      project_id  type      name                   id  uuid                                  created               status
--  --------------  ------------  --------  -------------------  ----  ------------------------------------  --------------------  --------
*   MyProject                589  download  teerer-armed-gynics   295  db89ab27-92cf-4e12-9b4d-a4c3162f56da  2021 Jan 16 04:52:24  Waiting

The status of the download directory can be checked with mc globus download:

$ mc globus download
    project_name      project_id  type      name                   id  uuid                                  created               status
--  --------------  ------------  --------  -------------------  ----  ------------------------------------  --------------------  --------
*   MyProject                589  download  teerer-armed-gynics   295  db89ab27-92cf-4e12-9b4d-a4c3162f56da  2021 Jan 16 04:52:24  Waiting

Note that it is a known issue that creating the download directory for projects with a very large number of files may be very slow and even fail. An improved download method is under development.

Once the status is “Ready”, downloads can be initiated by adding the the -g/--globus option to mc down. Multi-file and recursive downloads can be initiated for Globus downloads just as with standard downloads.

mc down -g file_B.txt

As with uploads, multiple transfers may be initiated and --goto may be used to open the Globus file manager for the download directory in a web browser. Once all desired transfers are completed, the Materials Commons download directory can be closed with the --delete option for mc globus download:

mc globus download --id 295 --delete

The download directory may be left as long as desired, but it will not reflect any file or directory changes to the project since the time the download directory was created.

The current Globus download directory can be managed with the --set and --unset options for mc globus download.

Reference

For a complete list of options, see: