Datasets

A dataset in Materials Commons is a way to gather files, activities, entities and their attributes together into a bundle and eventually publish them. A published dataset is available publicly, and can be browsed. The files are made available for download, packaged into zip file, and also available through Globus. A dataset can be annotated with tags, authors, papers, a description and other data. A published dataset is also available in Google’s Dataset Search site.

# Get a list of all datasets in a project
datasets = c.get_all_dataset(project.id)

# Create a new dataset in a project
dataset = c.create_dataset(project.id, "dataset-name")

# Create a new dataset in a project and add additional information to the dataset
req = mcapi.CreateDatasetRequest(description="dataset description")
dataset = c.create_dataset(project.id, "ds-name", req)

# Have Materials Commons create a DOI and associate it with the dataset
dataset = c.assign_doi_to_dataset(project.id, dataset.id)

# Publish a dataset
dataset = c.publish_dataset(project.id, dataset.id)

# Unpublish a dataset
dataset = c.unpublish_dataset(project.id, dataset.id)

Published Datasets

A published dataset is publicly available. The API allows you to interact with published datasets and download their files.

# Get all published datasets
published_datasets = c.get_all_published_datasets()

# Get file objects for published dataset
files = c.get_published_dataset_files(published_datasets[0].id)

# Download a single file from a published dataset to /tmp
c.download_published_dataset_file(published_datasets[0].id, files[0].id, "/tmp/file.txt")

# Download the published dataset's zipfile to /tmp
c.download_published_dataset_zipfile(published_datasets[0].id, "/tmp/ds.zip")

Get Published Datasets for Author

You can ask for all the published datasets by a particular author. This will do a search using the string you supplied.

allison_datasets = c.get_published_datasets_for_author("allison")

Get Published Datasets That Have Tag

You can get all the published datasets associated with a tag:

mg_tagged_datasets = c.get_published_datasets_for_tag("mg")

Other Operations

Miscellaneous other operations:

# Get all tags used in published datasets
tags = c.list_tags_for_published_datasets()

# Get all authors that have published on Materials Commons site
authors = c.list_published_authors()