HAL Id: hal-01289794
https://hal.archives-ouvertes.fr/hal-01289794v2
Preprint submitted on 21 Mar 2016
Why not a Crystallographic Mark-up Language ?
Alain Soyer
To cite this version:
Alain Soyer. Why not a Crystallographic Mark-up Language ?. 2016. ⟨hal-01289794v2⟩
Alain Soyer ([email protected])
Université Paris 6 - IMPMC CNRS UMR7590 - Case 115 Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie
Tour 23, 5ème étage, couloir 23-13, pièce 542 4 place Jussieu - 75005 Paris
Since 1991 the CIF format (http://www.iucr.org/resources/cif/documentation) has proved very useful to crystallographers for exchanging their data all over the world. In parallel with its popularization, XML and related mark-up languages (particularly HTML) have become universally adopted, along with the huge development of the internet.
In view of this evolution the following question arises: why not a Crystallographic Mark-up Language, CrysML? (I do not call it CML because the Chemical Mark-up Language already exists.)
Despite its quality, CIF has some drawbacks in comparison with XML: the most important is probably that it is not “object oriented”, and also that CIF files are not “well formed” documents.
So I was wondering whether it would be a good idea to rewrite CIF in XML format in the future.
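As a rough illustration of what “well formed” means in practice, the following minimal sketch uses Python (a language chosen only for this illustration; the note itself mentions C and FORTRAN) and its standard xml.etree.ElementTree parser. The helper name is_well_formed and the two small fragments are hypothetical: any generic XML parser rejects a document whose tags do not close properly, before any crystallographic content is examined.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, for illustration only.
good = "<cell><lengths><a>4.9134</a></lengths></cell>"
bad = "<cell><lengths><a>4.9134</a></cell>"  # the closing </lengths> is missing

print(is_well_formed(good))
print(is_well_formed(bad))
```

This check concerns only the structure of the tags. Whether the values make sense is a separate question, which is the role of a schema.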
Below is a first small file, quartz.xml (created by hand with a basic vi editor!). Of course it is not a true valid crystallographic file, just an example to show what a CrysML file would look like. You may open this file in your browser.
One may note that:
- each block of data is delimited by an opening <tag> and has a corresponding closing </tag>, so a CrysML file seems more “robust” than a CIF one. A block corresponds to an “object” (in the programming-language sense) and will probably be represented by a “structure” (in C or FORTRAN), with the same organisation, in the memory of programs that read or create CrysML files.
- tag names may be identical to pieces of CIF names and consequently familiar to crystallographers.
An exception is the CIF “loop”, which is replaced by a list of items. I have added an “attribute” named length to indicate the number of items in a list. I know that this is not mandatory, but I think it is “good practice” because a program reading the file can dynamically allocate an array of structures to store the items of the list (and may also verify that the number of items actually in the list equals the number announced in the attribute).
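The correspondence between a block and an in-memory structure, and the cross-check on the length attribute, could be sketched as follows. This is only a sketch in Python under stated assumptions: the AtomSite class and the read_atom_sites function are hypothetical names, the sample fragment is a shortened copy of the atom list in quartz.xml (without the Uij blocks), and only the standard library is used. In Python no explicit allocation is needed, so only the verification role of length appears here. A C or FORTRAN reader would use the same attribute to size its array.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class AtomSite:
    """In-memory counterpart of one <atom_site> block (hypothetical layout)."""
    label: str
    fract_x: float
    fract_y: float
    fract_z: float
    occupancy: float


def read_atom_sites(list_elem: ET.Element) -> List[AtomSite]:
    """Convert an <atom_site_list> element into a list of AtomSite records."""
    sites = [
        AtomSite(
            label=item.findtext("label").strip(),
            fract_x=float(item.findtext("fract_x")),
            fract_y=float(item.findtext("fract_y")),
            fract_z=float(item.findtext("fract_z")),
            occupancy=float(item.findtext("occupancy")),
        )
        for item in list_elem.findall("atom_site")
    ]
    # The length attribute is optional; when present, cross-check it.
    declared = list_elem.get("length")
    if declared is not None and int(declared) != len(sites):
        raise ValueError(
            f"length={declared} announced but {len(sites)} items found"
        )
    return sites


SAMPLE = """<atom_site_list length="2">
  <atom_site>
    <label> O </label>
    <fract_x> 0.4141 </fract_x>
    <fract_y> 0.2681 </fract_y>
    <fract_z> 0.785467 </fract_z>
    <occupancy> 1 </occupancy>
  </atom_site>
  <atom_site>
    <label> Si </label>
    <fract_x> 0.46987 </fract_x>
    <fract_y> 0.0 </fract_y>
    <fract_z> 0.666667 </fract_z>
    <occupancy> 1 </occupancy>
  </atom_site>
</atom_site_list>"""

sites = read_atom_sites(ET.fromstring(SAMPLE))
for site in sites:
    print(site)
```

If the attribute announced three items while only two were present, read_atom_sites would raise a ValueError instead of silently returning a short list.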
After quartz.xml comes a second file, CrisML_schema.xsd, referenced in the first file, containing a very rudimentary and incomplete “XML schema” that plays the role of the CIF dictionary.
If such an evolution is considered, the good news is that it will be quite easy to add an option to create XML files to existing software that creates CIF files.
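To give an idea of the effort involved on the writing side, here is a minimal sketch in Python, using only the standard xml.etree.ElementTree module. The function name cell_to_crysml is hypothetical, and the sketch writes only the lengths and angles of a cell block, not the complete block required by the schema given further on. A program that already holds these values in memory in order to write CIF would only need a routine of this kind.

```python
import xml.etree.ElementTree as ET


def cell_to_crysml(a, b, c, alpha, beta, gamma) -> ET.Element:
    """Build a partial <cell> element holding the lengths and angles."""
    cell = ET.Element("cell")
    lengths = ET.SubElement(cell, "lengths")
    for name, value in (("a", a), ("b", b), ("c", c)):
        ET.SubElement(lengths, name).text = f"{value}"
    angles = ET.SubElement(cell, "angles")
    for name, value in (("alpha", alpha), ("beta", beta), ("gamma", gamma)):
        ET.SubElement(angles, name).text = f"{value}"
    return cell


# Values taken from the quartz example.
cell = cell_to_crysml(4.9134, 4.9134, 5.4052, 90, 90, 120)
xml_text = ET.tostring(cell, encoding="unicode")
print(xml_text)
```

The library takes care of opening and closing every tag, so the output is well formed by construction.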
Modifying programs that read CIF files to support CrysML files will require more work (but I think it is probably less difficult to get the data you are interested in from an XML file than from a CIF one).
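As a hedged illustration of the reading side, the sketch below pulls a few values of interest out of a CrysML fragment by their path, again in Python with the standard xml.etree.ElementTree module. The function name read_cell_lengths is hypothetical, and the sample is a shortened copy of quartz.xml.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CrysML>
  <data>
    <space_group>
      <crystal_system>Trigonal</crystal_system>
      <name_HM> P 32 2 1 </name_HM>
    </space_group>
    <cell>
      <lengths>
        <a> 4.9134 </a>
        <b> 4.9134 </b>
        <c> 5.4052 </c>
      </lengths>
    </cell>
  </data>
</CrysML>"""


def read_cell_lengths(data: ET.Element):
    """Return (a, b, c) from a <data> block, addressed by element path."""
    return tuple(
        float(data.findtext(f"cell/lengths/{axis}")) for axis in ("a", "b", "c")
    )


root = ET.fromstring(SAMPLE)
data = root.find("data")
system = data.findtext("space_group/crystal_system")
lengths = read_cell_lengths(data)
print(system, lengths)
```

The program names the path of the item it wants and ignores the rest of the file. No CIF-specific tokenizer or loop handling is required.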
Of course the bad news will be the need to create an XML schema to replace the CIF dictionary, which is certainly a large and very tedious piece of work.
quartz.xml :
<?xml version="1.0" encoding="UTF-8"?>
<CrysML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="CrisML_schema.xsd">
<data_global>
<journal>
<name_full> Acta Crystallographica B </name_full>
<year> 1976 </year>
<volume> 32 </volume>
<pages>
<first> 2456 </first>
<last> 2459 </last>
</pages>
</journal>
<publ>
<section>
<title>
Refinement of the crystal structure of low-quartz </title>
</section>
<author_list length="2">
<author>
<name> Yvon Le Page </name>
<address>
Department of Geological Sciences, McGill University, Montreal, Quebec, H3C 3G1, Canada
</address>
</author>
<author>
<name> Gabrielle Donnay </name>
<address>
Department of Geological Sciences, McGill University, Montreal, Quebec, H3C 3G1, Canada
</address>
</author>
</author_list>
</publ>
</data_global>
<data>
<chemical>
<formula> Si O2 </formula>
<compound_source> synthesised at Bell Lab </compound_source>
</chemical>
<space_group>
<crystal_system>Trigonal</crystal_system>
<name_HM> P 32 2 1 </name_HM>
</space_group>
<cell>
<lengths>
<a> 4.9134 </a>
<b> 4.9134 </b>
<c> 5.4052 </c>
</lengths>
<angles>
<alpha> 90 </alpha>
<beta> 90 </beta>
<gamma> 120 </gamma>
</angles>
<volume> 113.01 </volume>
<formula_units_Z> 3 </formula_units_Z>
<measurement>
<temperature> 293 </temperature>
<reflns_used> 342 </reflns_used>
</measurement>
</cell>
<atom_site_list length="2">
<atom_site>
<label> O </label>
<fract_x> 0.4141 </fract_x>
<fract_y> 0.2681 </fract_y>
<fract_z> 0.785467 </fract_z>
<occupancy> 1 </occupancy>
<Uij>
<U_11> 0.0156 </U_11>
<U_22> 0.0115 </U_22>
<U_33> 0.0119 </U_33>
<U_12> 0.0092 </U_12>
<U_13> 0.0029 </U_13>
<U_23> 0.0046 </U_23>
</Uij>
</atom_site>
<atom_site>
<label> Si </label>
<fract_x> 0.46987 </fract_x>
<fract_y> 0.0 </fract_y>
<fract_z> 0.666667 </fract_z>
<occupancy> 1 </occupancy>
<Uij>
<U_11> 0.0066 </U_11>
<U_22> 0.0051 </U_22>
<U_33> 0.0060 </U_33>
<U_12> 0.00255 </U_12>
<U_13> 0.00015 </U_13>
<U_23> 0.0003 </U_23>
</Uij>
</atom_site>
</atom_site_list>
</data>
</CrysML>
<!-- MD5 signature without this line= f1227e979d666f6b9918625ab0e2e16b -->
CrisML_schema.xsd :
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="CrysML" type="CrysMLType"/>
<xsd:complexType name="CrysMLType">
<xsd:sequence>
<xsd:element name="data_global" type="data_globalType"/>
<xsd:element name="data" type="dataType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="data_globalType">
<xsd:sequence>
<xsd:element name="journal" type="journalType"/>
<xsd:element name="publ" type="publType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="journalType">
<xsd:sequence>
<xsd:element name="name_full" type="xsd:string"/>
<xsd:element name="year" type="xsd:positiveInteger"/>
<xsd:element name="volume" type="xsd:positiveInteger"/>
<xsd:element name="pages" type="pagesType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="pagesType">
<xsd:sequence>
<xsd:element name="first" type="xsd:positiveInteger"/>
<xsd:element name="last" type="xsd:positiveInteger"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="publType">
<xsd:sequence>
<xsd:element name="section" type="sectionType"/>
<xsd:element name="author_list" type="author_listType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="sectionType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="author_listType">
<xsd:sequence>
<xsd:element name="author" type="authorType" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="length" type="xsd:positiveInteger"/>
</xsd:complexType>
<xsd:complexType name="authorType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="address" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="dataType">
<xsd:sequence>
<xsd:element name="chemical" type="chemicalType"/>
<xsd:element name="space_group" type="space_groupType"/>
<xsd:element name="cell" type="cellType"/>
<xsd:element name="atom_site_list" type="atom_site_listType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="chemicalType">
<xsd:sequence>
<xsd:element name="formula" type="xsd:string"/>
<xsd:element name="compound_source" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="space_groupType">
<xsd:sequence>
<xsd:element name="crystal_system" type="crystal_systemType"/>
<xsd:element name="name_HM" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="crystal_systemType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Triclinic"/>
<xsd:enumeration value="Monoclinic"/>
<xsd:enumeration value="Orthorhombic"/>
<xsd:enumeration value="Tetragonal"/>
<xsd:enumeration value="Trigonal"/>
<xsd:enumeration value="Hexagonal"/>
<xsd:enumeration value="Cubic"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="Space_groupType">
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxInclusive value="230"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="cellType">
<xsd:sequence>
<xsd:element name="lengths" type="lengthsType"/>
<xsd:element name="angles" type="anglesType"/>
<xsd:element name="volume" type="xsd:float"/>
<xsd:element name="formula_units_Z" type="xsd:positiveInteger"/>
<xsd:element name="measurement" type="measurementType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="lengthsType">
<xsd:sequence>
<xsd:element name="a" type="lengthType"/>
<xsd:element name="b" type="lengthType"/>
<xsd:element name="c" type="lengthType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="lengthType">
<xsd:restriction base="xsd:decimal">
<xsd:minExclusive value="0.0"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="anglesType">
<xsd:sequence>
<xsd:element name="alpha" type="angleType"/>
<xsd:element name="beta" type="angleType"/>
<xsd:element name="gamma" type="angleType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="angleType">
<xsd:restriction base="xsd:decimal">
<xsd:minExclusive value="0.0"/>
<xsd:maxInclusive value="180.0"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="measurementType">
<xsd:sequence>
<xsd:element name="temperature" type="xsd:positiveInteger"/>
<xsd:element name="reflns_used" type="xsd:positiveInteger"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="atom_site_listType">
<xsd:sequence>
<xsd:element name="atom_site" type="atom_siteType" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="length" type="xsd:positiveInteger"/>
</xsd:complexType>
<xsd:complexType name="atom_siteType">
<xsd:sequence>
<xsd:element name="label" type="xsd:string"/>
<xsd:element name="fract_x" type="xsd:float"/>
<xsd:element name="fract_y" type="xsd:float"/>
<xsd:element name="fract_z" type="xsd:float"/>
<xsd:element name="occupancy" type="occupancyType"/>
<xsd:choice>
<xsd:element name="Biso" type="xsd:float"/>
<xsd:element name="Bij" type="BijType"/>
<xsd:element name="Uij" type="UijType"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="occupancyType">
<xsd:restriction base="xsd:decimal">
<xsd:minExclusive value="0.0"/>
<xsd:maxInclusive value="1.0"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="therm_disType">
<xsd:choice>
<xsd:element name="Biso" type="xsd:float"/>
<xsd:element name="Bij" type="BijType"/>
<xsd:element name="Uij" type="UijType"/>
</xsd:choice>
</xsd:complexType>
<xsd:complexType name="BijType">
<xsd:sequence>
<xsd:element name="B_11" type="xsd:float"/>
<xsd:element name="B_22" type="xsd:float"/>
<xsd:element name="B_33" type="xsd:float"/>
<xsd:element name="B_12" type="xsd:float"/>
<xsd:element name="B_13" type="xsd:float"/>
<xsd:element name="B_23" type="xsd:float"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="UijType">
<xsd:sequence>
<xsd:element name="U_11" type="xsd:float"/>
<xsd:element name="U_22" type="xsd:float"/>
<xsd:element name="U_33" type="xsd:float"/>
<xsd:element name="U_12" type="xsd:float"/>
<xsd:element name="U_13" type="xsd:float"/>
<xsd:element name="U_23" type="xsd:float"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Note:
These two files can be validated using the following site: http://www.freeformatter.com/xml-validator-xsd.html