Why not a Crystallographic Mark-up Language ?


HAL Id: hal-01289794

https://hal.archives-ouvertes.fr/hal-01289794v2

Preprint submitted on 21 Mar 2016

Alain Soyer

To cite this version:

Alain Soyer. Why not a Crystallographic Mark-up Language ?. 2016. ⟨hal-01289794v2⟩

Why not a Crystallographic Mark-up Language ?

Alain Soyer ([email protected])

Université Paris 6 - IMPMC CNRS UMR7590 - Case 115 Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie

Tour 23, 5ème étage, couloir 23-13, pièce 542 4 place Jussieu - 75005 Paris

Since 1991 the CIF format (http://www.iucr.org/resources/cif/documentation) has proved very useful to crystallographers for exchanging their data all over the world. In parallel with its popularization, XML and related mark-up languages (particularly HTML) have become universally adopted with the huge development of the internet.

In view of this evolution the following question arises: why not a Crystallographic Mark-up Language, CrysML? (I do not call it CML because the Chemical Mark-up Language already exists.)

Despite its quality, CIF has some drawbacks in comparison with XML: the most important is probably that it is not “object oriented”, and also that CIF files are not “well-formed” documents.

So I was wondering whether it would be a good idea to rewrite CIF in XML format in the future.

Below is a first small file, quartz.xml (created by hand with a basic vi editor!). Of course it is not a true, valid crystallographic file, just an example to show what a CrysML file would look like. You may open this file in your browser.

One may note that:

- each block of data is delimited by an opening <tag> and has a corresponding closing </tag>, so a CrysML file seems more “robust” than CIF. A block corresponds to an “object” (in the programming-language sense) and will probably be represented by a “structure” (in C or FORTRAN) with the same organisation in the memory of the programs that read or create CrysML files.

- tag names may be identical to pieces of CIF names and consequently familiar to crystallographers.

An exception is the CIF “loop”, which is replaced by a list of items. I have added an “attribute” named length to indicate the number of items in a list. I know that this is not mandatory, but I think it is “good practice” because a program reading the file can dynamically allocate an array of structures to store the items of the list (and may also verify that the number of items actually in the list equals the number announced in the attribute).
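The mapping from a block to a structure, and the use of the length attribute, can be sketched as follows. This is a minimal Python illustration and not part of the proposal itself: the AtomSite class and the read_atom_sites function are hypothetical names, and the embedded fragment reuses only a subset of the atom_site elements of quartz.xml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Fragment reusing element names from quartz.xml (abridged for illustration).
FRAGMENT = """
<atom_site_list length="2">
  <atom_site>
    <label> O </label>
    <fract_x> 0.4141 </fract_x>
    <fract_y> 0.2681 </fract_y>
    <fract_z> 0.785467 </fract_z>
  </atom_site>
  <atom_site>
    <label> Si </label>
    <fract_x> 0.46987 </fract_x>
    <fract_y> 0.0 </fract_y>
    <fract_z> 0.666667 </fract_z>
  </atom_site>
</atom_site_list>
"""


@dataclass
class AtomSite:
    """Hypothetical in-memory counterpart of one <atom_site> block."""
    label: str
    fract_x: float
    fract_y: float
    fract_z: float


def read_atom_sites(xml_text):
    """Parse an <atom_site_list> and check it against its length attribute."""
    root = ET.fromstring(xml_text)
    announced = int(root.attrib["length"])
    sites = [
        AtomSite(
            label=node.findtext("label").strip(),
            fract_x=float(node.findtext("fract_x")),
            fract_y=float(node.findtext("fract_y")),
            fract_z=float(node.findtext("fract_z")),
        )
        for node in root.findall("atom_site")
    ]
    # The attribute is redundant by design: a mismatch signals a damaged file.
    if len(sites) != announced:
        raise ValueError(f"length={announced} but {len(sites)} items found")
    return sites


sites = read_atom_sites(FRAGMENT)
```

In a C or FORTRAN reader the announced length would be used to allocate the array before the items are read; in Python the list grows by itself, so here the attribute serves only as a consistency check.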

After quartz.xml comes a second file, CrisML_schema.xsd, referenced in the first file, which contains a very rudimentary and incomplete “XML schema” to play the role of the CIF dictionary.
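To give an idea of what the schema restrictions express, the sketch below re-implements two of them by hand in Python: the positive cell lengths of lengthType and the angle range of angleType. It is an illustration only. check_cell_fragment is a hypothetical helper, and a real program would more likely pass CrisML_schema.xsd to an XSD-aware validator than re-code the rules.

```python
import xml.etree.ElementTree as ET

# Abridged <cell> block using element names from quartz.xml.
CELL = """
<cell>
  <lengths> <a> 4.9134 </a> <b> 4.9134 </b> <c> 5.4052 </c> </lengths>
  <angles> <alpha> 90 </alpha> <beta> 90 </beta> <gamma> 120 </gamma> </angles>
</cell>
"""


def check_cell_fragment(xml_text):
    """Return a list of violations of the lengthType and angleType ranges."""
    cell = ET.fromstring(xml_text)
    problems = []
    for node in cell.find("lengths"):
        # lengthType: minExclusive 0.0
        if not float(node.text) > 0.0:
            problems.append(f"{node.tag}: length must be > 0")
    for node in cell.find("angles"):
        # angleType: minExclusive 0.0, maxInclusive 180.0
        if not 0.0 < float(node.text) <= 180.0:
            problems.append(f"{node.tag}: angle must be in (0, 180]")
    return problems


problems = check_cell_fragment(CELL)
```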

If such an evolution is considered, the good news is that it will be quite easy to add an option for creating XML files to existing software that creates CIF files.
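As a rough indication of the effort involved, a program that already holds the cell parameters in memory could emit the corresponding block in a few lines. The sketch uses Python's standard xml.etree.ElementTree module; the function name cell_to_crysml and the input dictionaries are assumptions made for the example, and the block is abridged (volume, formula_units_Z and measurement are left out).

```python
import xml.etree.ElementTree as ET


def cell_to_crysml(lengths, angles):
    """Build an abridged <cell> element from two dicts of cell parameters."""
    cell = ET.Element("cell")
    lengths_node = ET.SubElement(cell, "lengths")
    for name in ("a", "b", "c"):
        ET.SubElement(lengths_node, name).text = str(lengths[name])
    angles_node = ET.SubElement(cell, "angles")
    for name in ("alpha", "beta", "gamma"):
        ET.SubElement(angles_node, name).text = str(angles[name])
    return cell


cell = cell_to_crysml(
    {"a": 4.9134, "b": 4.9134, "c": 5.4052},
    {"alpha": 90, "beta": 90, "gamma": 120},
)
# Serialise to a string; the library takes care of closing every tag.
xml_text = ET.tostring(cell, encoding="unicode")
```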

Modifying programs that read CIF files to support CrysML files will require more work (but I think it is probably less difficult to get the data you are interested in from an XML file than from a CIF one).
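For comparison, once a CrysML file is parsed, extracting a few values needs only path lookups. The sketch below uses Python's standard xml.etree.ElementTree module on an abridged copy of quartz.xml; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of quartz.xml, keeping only the elements read below.
DOC = """<CrysML>
<data>
  <chemical>
    <formula> Si O2 </formula>
  </chemical>
  <space_group>
    <crystal_system>Trigonal</crystal_system>
  </space_group>
  <cell>
    <lengths>
      <a> 4.9134 </a>
      <b> 4.9134 </b>
      <c> 5.4052 </c>
    </lengths>
  </cell>
</data>
</CrysML>"""

root = ET.fromstring(DOC)
data = root.find("data")

# Each value is addressed by the path of nested blocks that contains it.
formula = data.findtext("chemical/formula").strip()
system = data.findtext("space_group/crystal_system").strip()
a, b, c = (float(data.findtext(f"cell/lengths/{axis}")) for axis in "abc")
```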

Of course the bad news will be the need to create an XML schema to replace the CIF dictionary, which is certainly a very tedious and substantial piece of work.

quartz.xml :

<?xml version="1.0" encoding="UTF-8"?>

<CrysML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

        xsi:noNamespaceSchemaLocation="CrisML_schema.xsd">

<data_global>

  <journal>

    <name_full> Acta Crystallographica B </name_full>

    <year> 1976 </year>

    <volume> 32 </volume>

    <pages>

      <first> 2456 </first>

      <last>  2459 </last>

    </pages>

  </journal>

  <publ>

    <section>

      <title>

        Refinement of the crystal structure of low-quartz

      </title>

    </section>

    <author_list length="2"> 

      <author>

        <name> Yvon Le Page </name>

        <address>

      Department of Geological Sciences, McGill University, Montreal, Quebec, H3C 3G1, Canada

        </address>

      </author>

      <author>

        <name> Gabrielle Donnay </name>

        <address>

      Department of Geological Sciences, McGill University, Montreal, Quebec, H3C 3G1, Canada

        </address>

      </author>

    </author_list>

  </publ>

</data_global>

<data>

  <chemical>

    <formula> Si O2 </formula>

    <compound_source> synthesised at Bell Lab </compound_source>

  </chemical>

  <space_group>

    <crystal_system>Trigonal</crystal_system>

    <name_H-M> P 32 2 1 </name_H-M>

  </space_group>

  <cell>

    <lengths>

      <a> 4.9134 </a>

      <b> 4.9134 </b>

      <c> 5.4052 </c>

    </lengths>

    <angles>

      <alpha>  90 </alpha>

      <beta>   90 </beta> 

      <gamma> 120 </gamma>

    </angles>

    <volume> 113.01 </volume>

    <formula_units_Z> 3 </formula_units_Z>

    <measurement>

      <temperature> 293 </temperature>

      <reflns_used> 342 </reflns_used>

    </measurement>

  </cell>

  <atom_site_list length="2">

    <atom_site>

      <label> O </label>

      <fract_x> 0.4141   </fract_x>

      <fract_y> 0.2681   </fract_y>

      <fract_z> 0.785467 </fract_z>

      <occupancy> 1 </occupancy>

      <Uij>

        <U_11>  0.0156 </U_11>

        <U_22>  0.0115 </U_22>

        <U_33>  0.0119 </U_33>

        <U_12>  0.0092 </U_12>

        <U_13> -0.0029 </U_13>

        <U_23> -0.0046 </U_23>

      </Uij>

    </atom_site>

    <atom_site>

      <label> Si </label>

      <fract_x> 0.46987  </fract_x>

      <fract_y> 0.0      </fract_y>

      <fract_z> 0.666667 </fract_z>

      <occupancy> 1 </occupancy>

      <Uij>

        <U_11>  0.0066  </U_11>

        <U_22>  0.0051  </U_22>

        <U_33>  0.0060  </U_33>

        <U_12>  0.00255 </U_12>

        <U_13> -0.00015 </U_13>

        <U_23> -0.0003  </U_23>

      </Uij>

    </atom_site>

  </atom_site_list>

</data>

</CrysML>

<!-- MD5 signature without this line= f1227e979d666f6b9918625ab0e2e16b -->

CrisML_schema.xsd :

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:element name="CrysML" type="CrysMLType"/>

<xsd:complexType name="CrysMLType">

  <xsd:sequence>

    <xsd:element name="data_global" type="data_globalType"/>

    <xsd:element name="data" type="dataType" maxOccurs="unbounded"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="data_globalType">

  <xsd:sequence>

    <xsd:element name="journal" type="journalType"/>

    <xsd:element name="publ"    type="publType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="journalType">

  <xsd:sequence>

    <xsd:element name="name_full" type="xsd:string"/>

    <xsd:element name="year"      type="xsd:positiveInteger"/>

    <xsd:element name="volume"    type="xsd:positiveInteger"/>

    <xsd:element name="pages"     type="pagesType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="pagesType">

  <xsd:sequence>

    <xsd:element name="first" type="xsd:positiveInteger"/>

    <xsd:element name="last"  type="xsd:positiveInteger"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="publType">

  <xsd:sequence>

    <xsd:element name="section"     type="sectionType"/>

    <xsd:element name="author_list" type="author_listType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="sectionType">

  <xsd:sequence>

    <xsd:element name="title" type="xsd:string"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="author_listType">

  <xsd:sequence>

    <xsd:element name="author" type="authorType" maxOccurs="unbounded"/>

  </xsd:sequence>

  <xsd:attribute name="length" type="xsd:positiveInteger"/>

</xsd:complexType>

<xsd:complexType name="authorType">

  <xsd:sequence>

    <xsd:element name="name"    type="xsd:string"/>

    <xsd:element name="address" type="xsd:string"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="dataType">

  <xsd:sequence>

    <xsd:element name="chemical"       type="chemicalType"/>

    <xsd:element name="space_group"    type="space_groupType"/>

    <xsd:element name="cell"       type="cellType"/>

    <xsd:element name="atom_site_list" type="atom_site_listType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="chemicalType">

  <xsd:sequence>

    <xsd:element name="formula"         type="xsd:string"/>

    <xsd:element name="compound_source" type="xsd:string"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="space_groupType">

  <xsd:sequence>

    <xsd:element name="crystal_system" type="crystal_systemType"/>

    <xsd:element name="name_H-M"       type="xsd:string"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="crystal_systemType">

    <xsd:restriction base="xsd:string">

        <xsd:enumeration value="Triclinic"/>

        <xsd:enumeration value="Monoclinic"/>

        <xsd:enumeration value="Orthorhombic"/>

        <xsd:enumeration value="Tetragonal"/>

        <xsd:enumeration value="Trigonal"/>

        <xsd:enumeration value="Hexagonal"/>

        <xsd:enumeration value="Cubic"/>

    </xsd:restriction>

</xsd:simpleType>

<xsd:simpleType name="Space_groupType">

    <xsd:restriction base="xsd:positiveInteger">

        <xsd:maxInclusive value="230"/>

    </xsd:restriction>

</xsd:simpleType>

<xsd:complexType name="cellType">

  <xsd:sequence>

    <xsd:element name="lengths"         type="lengthsType"/>

    <xsd:element name="angles"      type="anglesType"/>

    <xsd:element name="volume"      type="xsd:float"/>

    <xsd:element name="formula_units_Z" type="xsd:positiveInteger"/>

    <xsd:element name="measurement"     type="measurementType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="lengthsType">

  <xsd:sequence>

    <xsd:element name="a" type="lengthType"/>

    <xsd:element name="b" type="lengthType"/>

    <xsd:element name="c" type="lengthType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="lengthType">

    <xsd:restriction base="xsd:decimal">

        <xsd:minExclusive value="0.0"/>

    </xsd:restriction>

</xsd:simpleType>

<xsd:complexType name="anglesType">

  <xsd:sequence>

    <xsd:element name="alpha" type="angleType"/>

    <xsd:element name="beta"  type="angleType"/>

    <xsd:element name="gamma" type="angleType"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="angleType">

    <xsd:restriction base="xsd:decimal">

        <xsd:minExclusive value="0.0"/>

        <xsd:maxInclusive value="180.0"/>

    </xsd:restriction>

</xsd:simpleType>

<xsd:complexType name="measurementType">

  <xsd:sequence>

    <xsd:element name="temperature" type="xsd:positiveInteger"/>

    <xsd:element name="reflns_used" type="xsd:positiveInteger"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="atom_site_listType">

  <xsd:sequence>

    <xsd:element name="atom_site" type="atom_siteType" maxOccurs="unbounded"/>

  </xsd:sequence>

  <xsd:attribute name="length" type="xsd:positiveInteger"/>

</xsd:complexType>

<xsd:complexType name="atom_siteType">

  <xsd:sequence>

    <xsd:element name="label"     type="xsd:string"/>

    <xsd:element name="fract_x"   type="xsd:float"/>

    <xsd:element name="fract_y"   type="xsd:float"/>

    <xsd:element name="fract_z"   type="xsd:float"/>

    <xsd:element name="occupancy" type="occupancyType"/>

    <xsd:choice>

        <xsd:element name="Biso" type="xsd:float"/>

        <xsd:element name="Bij"  type="BijType"/>

        <xsd:element name="Uij"  type="UijType"/>

    </xsd:choice>

  </xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="occupancyType">

    <xsd:restriction base="xsd:decimal">

        <xsd:minExclusive value="0.0"/>

        <xsd:maxInclusive value="1.0"/>

    </xsd:restriction>

</xsd:simpleType>

<xsd:complexType name="therm_disType">

    <xsd:choice>

        <xsd:element name="Biso" type="xsd:float"/>

        <xsd:element name="Bij"  type="BijType"/>

        <xsd:element name="Uij"  type="UijType"/>

    </xsd:choice>

</xsd:complexType>

<xsd:complexType name="BijType">

  <xsd:sequence>

    <xsd:element name="B_11" type="xsd:float"/>

    <xsd:element name="B_22" type="xsd:float"/>

    <xsd:element name="B_33" type="xsd:float"/>

    <xsd:element name="B_12" type="xsd:float"/>

    <xsd:element name="B_13" type="xsd:float"/>

    <xsd:element name="B_23" type="xsd:float"/>

  </xsd:sequence>

</xsd:complexType>

<xsd:complexType name="UijType">

  <xsd:sequence>

    <xsd:element name="U_11" type="xsd:float"/>

    <xsd:element name="U_22" type="xsd:float"/>

    <xsd:element name="U_33" type="xsd:float"/>

    <xsd:element name="U_12" type="xsd:float"/>

    <xsd:element name="U_13" type="xsd:float"/>

    <xsd:element name="U_23" type="xsd:float"/>

  </xsd:sequence>

</xsd:complexType>

</xsd:schema>

Note: these two files can be validated using the following site: http://www.freeformatter.com/xml-validator-xsd.html
