Datasets
A dataset in Materials Commons is a way to gather files, activities, entities and their attributes together into a bundle and eventually publish them. A published dataset is available publicly, and can be browsed. The files are made available for download, packaged into zip file, and also available through Globus. A dataset can be annotated with tags, authors, papers, a description and other data. A published dataset is also available in Google’s Dataset Search site.
# Get a list of all datasets in a project
datasets = c.get_all_dataset(project.id)
# Create a new dataset in a project
dataset = c.create_dataset(project.id, "dataset-name")
# Create a new dataset in a project and add additional information to the dataset
req = mcapi.CreateDatasetRequest(description="dataset description")
dataset = c.create_dataset(project.id, "ds-name", req)
# Have Materials Commons create a DOI and associate it with the dataset
dataset = c.assign_doi_to_dataset(project.id, dataset.id)
# Publish a dataset
dataset = c.publish_dataset(project.id, dataset.id)
# Unpublish a dataset
dataset = c.unpublish_dataset(project.id, dataset.id)
Published Datasets
A published dataset is publicly available. The API allows you to interact with published datasets and download their files.
# Get all published datasets
published_datasets = c.get_all_published_datasets()
# Get file objects for published dataset
files = c.get_published_dataset_files(published_datasets[0].id)
# Download a single file from a published dataset to /tmp
c.download_published_dataset_file(published_datasets[0].id, files[0].id, "/tmp/file.txt")
# Download the published dataset's zipfile to /tmp
c.download_published_dataset_zipfile(published_datasets[0].id, "/tmp/ds.zip")
Get Published Datasets That Have Tag
You can get all the published datasets associated with a tag:
mg_tagged_datasets = c.get_published_datasets_for_tag("mg")
Other Operations
Miscellaneous other operations:
# Get all tags used in published datasets
tags = c.list_tags_for_published_datasets()
# Get all authors that have published on Materials Commons site
authors = c.list_published_authors()