CHEMeDATA Initiative

Main website of the CHEMeDATA umbrella organization

Chemistry object

Perpective

We describe here to describe the content of a zip file including chemical data found in repositories such as Zenodo, University archive, suplementary information pages of jounals, etc.

Repository service typically use FITS providing information about the files. FITS identifies image, spreadsheets, the type of .pdf, etc. But this method does not recognize the files commonly used in the specialized field of chemistry.

Listing chemistry objects

Listing chemical structures, NMR spectra, etc. would allow the visitor to:

whithoug having to download the entire archive. He could be be able to download only the part of interest.

In order to be able to list chemical structures, NMR spectra, etc. criteria need to be listed to identify them. A method aiming at identifying the Chemistry Objects found in an archive file (.zip) is the object of the “Chemistry research object identification”. In short, the procedure consists in searching, in each folder of the archive, for a specific file name, or a pattern of files and fulfil additional criteria (see “Short list” Table).

Giving chemistry objects flesh and blood

Allowing to extact part of an archived makes the loss of context possible problematic downstream, when the the researcher will need to refer to the source of the data, or find related information later on. The donwloaded “chemistry objects” could include additional information (for example, the chemical shifts for each carbon) with the 3D structure of the compound. It should also carry with it, metadata about its origin, how to cite the work, its INCHI (to faciliate search of additional information about the compound, etc.)

Short list of Chemistry object

This is just a short list for illustration purpose. More info can be found in the “Chemistry research object identification” project.

# Chemistry object Criteria Type of data visualization
2 Bruker 1D 1H NMR spectrum file_name==”1r” & exist_file:”../../fid” & exist_file:”../../acqus” & find_line=”##$NUC1= <1H>” in “../../acqus x/y plot (ppm/intensity) JCAMP-DX, simple x/y plot
7 IR spectrum file_name==”*.sp x/y plot (energy in nm non-homogeneous scale/intensity) JCAMP-DX, simple x/y plot
8 X-ray crystallography structure file_name==”*.cif 3D chemistry structure visualization) JSmol, etc.
9 2D molecular structure file_name==”*.cdx 2D chemistry structure visualization) JSmol, etc. after conversion!

Examples of the above-mentionned objects can be found in the files:

/researchdata/NMR Files per compound/3r_(R)-Me-FBn-18C6/2/pdata/1/1r
/researchdata/Other Data per compound/3n_(S)-Me-1-naphth-18C6/3n_X-Ray/3n_(S)-Me-1-naphth-18C6-NaBArF.cif
/researchdata/Other Data per compound/3n_(S)-Me-1-naphth-18C6/2nd eluted/3n_(S)-Me-1-naphth-18C6_2nd elt_FT-IR.sp 
/researchdata/Other Data per compound/3n_(S)-Me-1-naphth-18C6/3n_(S)-Me-1-naphth-18C6.cdx

from this yareta record.