Preparing Your Data
The first step to depositing data is determining what data to deposit. Research projects often generate a lot of data throughout the life of the project. It is not always feasible to deposit all of the data from the project. When selecting data to deposit in KiltHub, you should consider:
- the importance of the data
- the reusability of the data
- the necessity of the data for validating research results
In addition, you must address whether the data includes personally identifiable information and whether you have the rights to make the dataset public.
KiltHub will accept any file format. To facilitate basic preservation services, compressed data (.zip, .gz, .tar.gz) is discouraged. File format recommendations for preservation can be found in the Library of Congress Recommended Formats Statement (https://www.loc.gov/preservation/resources/rfs/).
If you are working with proprietary or less-sustainable formats, consider converting your data to an open, widely used format when you save and share it. Many software programs can convert datasets into open formats (e.g. saving an SPSS dataset as CSV). Doing so helps ensure that your data remains accessible and usable, both by you and by others, into the future.
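As one illustration of such a conversion, the sketch below saves an SPSS dataset as CSV with Python. It assumes the pandas and pyreadstat packages are installed; the function names csv_path_for and spss_to_csv are hypothetical and are not part of KiltHub. Treat it as a starting point, not a recommended workflow.

```python
from pathlib import Path


def csv_path_for(source_path):
    """Return the path of the CSV file to write next to the source file."""
    return Path(source_path).with_suffix(".csv")


def spss_to_csv(source_path):
    """Read an SPSS .sav file and save its contents as CSV."""
    # Imported here so that the helper above works without pandas installed.
    import pandas as pd  # read_spss also needs the pyreadstat package

    frame = pd.read_spss(source_path)
    target = csv_path_for(source_path)
    frame.to_csv(target, index=False)  # index=False omits the row-number column
    return target
```

Calling spss_to_csv("survey.sav") would write survey.csv beside the original file. CSV stores only the values, so variable labels and other SPSS-specific metadata should be recorded separately, for example in the dataset description.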
There are some constraints on the size of files deposited:
- The maximum size of a file uploaded through the online interface is 5GB.
- For files exceeding 5GB, please contact your Liaison Librarian so we may work with you to ingest your data.
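Before uploading, it can help to scan a project folder for files that run into these constraints. The following is a minimal sketch, not a KiltHub tool: it assumes that "5GB" means 5 × 1000³ bytes (the exact byte count is not stated here), and the names check_file and check_directory are hypothetical.

```python
from pathlib import Path

# Assumption: "5GB" is read as decimal gigabytes; adjust if the limit is defined differently.
MAX_UPLOAD_BYTES = 5 * 1000**3
# Compressed formats that the guidance above discourages.
COMPRESSED_SUFFIXES = (".zip", ".gz", ".tar.gz")


def check_file(name, size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return a list of warnings for a single file (empty if none apply)."""
    warnings = []
    if size_bytes > limit:
        warnings.append("exceeds upload limit; contact your Liaison Librarian")
    if name.lower().endswith(COMPRESSED_SUFFIXES):
        warnings.append("compressed format is discouraged")
    return warnings


def check_directory(root):
    """Map each problematic file under root to its list of warnings."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            problems = check_file(path.name, path.stat().st_size)
            if problems:
                report[str(path)] = problems
    return report
```

For example, check_directory("my_project") returns a dictionary that lists only the files needing attention; an empty dictionary means no file exceeded the assumed limit or used a discouraged compressed format.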
When depositing data to KiltHub, you will be asked to apply one of several Creative Commons licenses. KiltHub also allows for licensing of software code. You can view and compare the complete list of available licenses before making a decision. Depositors authorize the Library to distribute the data under the terms of the license they have selected.
KiltHub requires descriptive metadata for each dataset deposited. The following are the minimum requirements for deposit into KiltHub:
- Title: Give your research a title that is more descriptive than a filename.
- Authors: Identify everyone involved in creating the data by name, full email address, or ORCID.
- Categories: Please select the subject or subjects that best represent your data. More than one can be selected.
- Item Type: Select the type that best describes your deposit. You can upload all of your research outputs to KiltHub.
- Keywords: Add keywords that will make your research more discoverable.
- Description: Add as much context as possible so that others can interpret your research and reproduce it. Include methodology and techniques used.
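If you draft metadata in a script or spreadsheet before depositing, a quick completeness check can catch gaps early. The sketch below is only an illustration: the field keys (title, authors, categories, item_type, keywords, description) are hypothetical names chosen to mirror the list above, not KiltHub's actual schema, and the example record is made up.

```python
# Hypothetical keys mirroring the minimum metadata requirements listed above.
REQUIRED_FIELDS = ("title", "authors", "categories", "item_type", "keywords", "description")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None:
            missing.append(field)
        elif isinstance(value, str) and not value.strip():
            missing.append(field)  # blank or whitespace-only text
        elif isinstance(value, (list, tuple)) and len(value) == 0:
            missing.append(field)  # empty list of authors, keywords, etc.
    return missing


# Made-up draft record for illustration only.
draft = {
    "title": "Hourly air temperature readings, 2019",
    "authors": ["A. Researcher"],
    "categories": [],
    "item_type": "dataset",
    "description": "   ",
}
print(missing_fields(draft))
```

Here the draft would be reported as missing categories, keywords, and description, since an empty list and a blank string are treated the same as an absent field.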
Research data from all fields, subjects, and disciplines at Carnegie Mellon University may be published and/or archived in KiltHub, provided the following conditions are met:
- The work must be produced, submitted, or sponsored by a current Carnegie Mellon University faculty member, researcher, staff member, or student, as previously stated in the collecting criteria.
- The author/copyright owner must grant Carnegie Mellon University the nonexclusive right to preserve and distribute the data in perpetuity.
- The data submitted is permissible according to the policies set by Carnegie Mellon University and the criteria and practices established for deposit by the University Libraries. This includes ensuring that research data involving human subjects complies with the requirements of Carnegie Mellon University’s Institutional Review Board (IRB).
- You have fulfilled any right of review, confidentiality, or other obligations required by contract or agreement if the work was sponsored or supported by an agency or organization other than Carnegie Mellon University.