1. Preamble
The DFASDL specification contains the description of all currently valid elements of the Data Format and Semantics Description Language.
Version: 1.1.0-18-g5a707e4
Copyright (c) 2014 - 2017 Contributors as noted in the AUTHORS.md file
The DFASDL specification is distributed under the terms of the
Creative Commons Attribution 4.0 International license (CC BY 4.0).
2. Structure
The root element of a DFASDL document is the element dfasdl.
2.1. Permitted nestings
The table Permitted nestings of the elements lists which elements may be nested within which other elements of a DFASDL document. Each element in the top row can contain the elements marked with an x.
[Table: Permitted nestings of the elements. The row and column labels are missing, so the x markers cannot be assigned to elements.]
2.2. Element groups
To make it easier to describe the elements, they are organized in specific groups.
2.2.1. Structure-Element-Group
Structure-Elements are used to describe the structure of the data.
Representatives:
2.2.2. Data-Element-Group
Data-Elements contain no other elements and are containers for the data.
Representatives:
2.2.3. Time-Element-Group
Time-Elements contain no other elements and are containers for time and date values.
Representatives:
3. Elements
3.1. bin
An element that contains binary data.
<bin byteOrder="littleEndian" id="ID1"/>
<bin encoding="Base64" id="ID2"/>
<bin mime="text/plain" id="ID3"/>
3.2. bin64
This element contains binary data that are encoded via Base64.
3.3. binHex
An element that contains hexadecimal encoded data.
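Neither bin64 nor binHex comes with an example here. A minimal sketch, assuming only the generic id attribute (the IDs are made up for illustration):

```xml
<!-- Hypothetical IDs; only the generic id attribute is used. -->
<bin64 id="attachment"/>
<binHex id="checksum"/>
```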
3.4. celem
A choice-container-element defines the smallest possible entity within a choice element. It is recursively defined and can contain other elements.
- A simple choice-container-element does not contain a value!
- A simple choice-container-element can contain other elements.
- A choice-container-element can only occur directly below a choice element.
<choice id="card">
<celem id="row" s="semantic">
<num id="row_num"/>
<str id="row_str"/>
</celem>
</choice>
3.5. choice
An element that allows the construction of alternatives in the structure.
- Matching of a structure or elements
- Elements must be within one or multiple celem elements within a choice.
- The order of the data elements determines the matching. Therefore, specific data elements should be defined before a str element.
- The last data elements within a choice should not contain a stop-sign.
<choice id="card">
<celem id="row1">
<num id="row1_num" start-sign="\d" stop-sign=";"/>
<str id="row1_str" start-sign="NAME" stop-sign=":"/>
</celem>
<celem id="row2">
<num id="row2_num" start-sign="\d" stop-sign=";" />
<str id="row2_str" start-sign="NAME"/>
</celem>
</choice>
3.6. cid
An element that can be used as a nesting element for a data element.
- A user-defined ID represents the nesting element for a string or numerical data element.
- A user-defined ID can define a class (class).
<elem id="someElement">
<cid id="myCustomID" class="myCustomClass">
<str/>
</cid>
<str id="ID"/>
</elem>
<seq id="someList" min="2">
<elem id="structure">
<cid id="anotherCustomID" class="nestedClass">
<str id="ID"/>
</cid>
<str id="anotherID"/>
</elem>
</seq>
3.7. const
A constant is a nesting element for exactly one other element from the Data-Element-Group.
<const id="foo">
<str id="fooStr">Foo</str>
</const>
<const id="bar">
<num id="barNum">123</num>
</const>
3.8. date
An element that describes a date. The date must be in the ISO format (yyyy-MM-dd)!
<date id="dateField"/>
3.9. datetime
An element that describes a complete date with time (timestamp). The timestamp must be in the ISO format!
<datetime id="dateTime"/>
3.10. dfasdl
The root element of a DFASDL document contains attributes that describe the document.
- It exists only once in the whole document at the uppermost level.
- The used semantic space is defined in the semantic attribute.
- The attribute default-encoding can be used to set a default value for unset encoding attributes at elements.
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
...
</dfasdl>
3.11. elem
An element defines the smallest possible entity within a format. It is recursively defined and can contain other elements.
- A simple element does not contain a value!
- A simple element can contain other elements.
<elem id="foo">
<seq id="bar" max="2">
<str id="foobar"/>
</seq>
</elem>
<elem id="empty"/>
3.12. fixseq
A fixed sequence specifies a repeating child structure with a finite set of elements.
- A fixseq has the same characteristics as a seq, except that it defines a concrete number of elements.
- The number of elements is defined with the count attribute.
- The stop-sign defines a character string that stops the sequence. If this stop-sign occurs in the data, the sequence is stopped and the next element after the sequence is processed.
<fixseq id="accountList" count="2">
<elem id="account">
<str id="number"/>
</elem>
</fixseq>
3.13. formatnum
A numerical data element that must match the format given in the format attribute.
The following characters are valid within the data and the defaultnum attribute:
- minus (-)
- numbers (0-9)
- point (.)
- comma (,)
<formatnum format="(\d\d\d)" id="ID" max-digits="12" />
<formatnum decimal-separator="." format="([0-9]{1,3}\.\d{1,2})" id="ID2"
max-digits="3" max-precision="2" />
The default value of the decimal-separator is the comma (,). If no value is specified for the decimal-separator, this default value is used.
The matching part of the format attribute must be within a group (…)!
If the decimal separator is to be retained, it must be specified via the decimal-separator attribute.
3.14. formatstr
An element for a string that must match the format given in the format attribute.
<formatstr id="formatA" format="(\w\w\d)"/>
<formatstr id="formatB" format="(\w{1,10})"/>
<formatstr id="formatC" format=".*?:(.*)"/>
The matching part must be within a group (…)!
3.15. formattime
For date and time values that do not conform to the ISO format. The format attribute must contain a pattern that can be processed by the Java DateTimeFormatter!
<formattime id="my-time-is-now" format="dd.MM.yyyy HH:mm:ss X"/>
3.16. num
A data element that contains a numerical value.
- A numerical element may only contain digits and may have a minus sign as its first character.
- A numerical element can define an exact number of digits (length).
  - The minus sign is not included in the calculation of the length.
- A numerical element can specify a maximum number of digits (max-digits) that should be considered.
  - The minus sign is not included in the calculation of the length.
- A numerical element can specify the number of decimal places (precision).
- A numerical element can define a default value (defaultnum) that will be inserted for missing data values.
The following characters are valid in the data and in the defaultnum attribute:
- minus (-)
- numbers (0-9)
<num id="numberA" length="4"/>
<num id="numberB" max-digits="5"/>
<num id="Pi" length="10" precision="9" defaultnum="3141592653"/>
3.17. ref
A reference refers to a data element within the document whose data is placed at the position of the reference.
- A reference must define a source ID (sid) that corresponds to the id of the referenced data element!
- The referenced data element must be before the reference in the DFASDL.
- If a reference is specified within a sequence, the reference must be at the end.
- Only one reference is allowed within a sequence.
- If no semantic meaning is defined for the reference (s), the semantic meaning of the referenced element is used.
<elem id="someBlockElement">
  <elem id="anotherID">
    <str id="firstname"/>
    <str id="lastname"/>
    <num id="mainNumber"/>
  </elem>
</elem>
<ref id="number" sid="mainNumber"/>
<!-- Referencing from within a sequence -->
<seq id="accountList" max="999">
  <elem id="account">
    <num id="account_id"/>
    <str id="name"/>
    <str id="account"/>
    <seq id="children">
      <elem id="alter">
        <num id="anzahl"/>
        <num id="age"/>
        <ref sid="account_id" id="children_account_id"/>
      </elem>
    </seq>
  </elem>
</seq>
3.18. seq
A sequence element defines a repeating structure.
- A sequence can define the following variants:
- The IDs are not copied during the conversion within a sequence, but newly created in the class attribute. The ID foo becomes id:foo.
- If the IDs should be deleted, the attribute keepID must be specified with false.
- Data-Elements must be placed within an elem element within a sequence.
- The stop-sign defines a character string that can stop the sequence.
- The attribute filter allows filtering of the source data. Only data fulfilling the filter will be used.
<seq id="accountList" min="42" max="999">
<elem id="account">
<str id="number" class="foo"/>
<str id="name"/>
</elem>
</seq>
<seq id="accountList2" keepID="false">
<elem id="account">
<str id="number" class="bar"/>
</elem>
</seq>
<seq id="salaries" filter="salary > 20000">
<elem id="employee">
<str id="name"/>
<num id="salary"/>
</elem>
</seq>
3.19. str
A data element for character strings. It can be used as a generic container to represent nearly every kind of data (but should not be).
- A character element is only allowed to contain characters in the standard or defined encoding.
- A character element can define the encoding of the expected characters (encoding).
- A character element can define the exact number of allowed characters (length).
- A character element can define the maximum number of characters (max-length).
- A character element can define a number of characters that are used as stop signs (stop-sign).
- A character element can define a default value that is inserted for missing data values (defaultstr).
<str id="A" encoding="UTF-16"/>
<str id="B" length="3"/>
<str id="C" max-length="5"/>
<str id="possiblyEmpty" defaultstr="missingValue"/>
<str id="D" stop-sign="\n"/>
3.20. sxp
An element that represents a Scala expression.
This element will be removed!
<sxp id="expOne">
<ul><![CDATA[{List(apple, banana, orange).map(i => <li>{i}</li>)}]]></ul>
</sxp>
3.21. time
A data element for time values that must satisfy the ISO notation.
<time id="high-noon"/>
4. Attributes
4.1. Root attributes
Root attributes are only allowed at the root element dfasdl.
4.1.1. default-encoding
A default value for the encoding of read data. This has to be a valid definition like utf-8.
This attribute is useful if all or most elements use the same encoding.
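The interplay between default-encoding and a per-element encoding can be sketched as follows. The element IDs are illustrative, and "iso-8859-1" is only assumed to be an accepted encoding name:

```xml
<!-- Sketch: "iso-8859-1" is assumed to be an accepted encoding name. -->
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom" default-encoding="utf-8">
  <elem id="person">
    <!-- No encoding attribute: the document-wide default (utf-8) applies. -->
    <str id="name"/>
    <!-- An explicit encoding attribute is set, so the default is not used here. -->
    <str id="legacy_note" encoding="iso-8859-1"/>
  </elem>
</dfasdl>
```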
4.1.2. semantic
This attribute describes the semantic space of the document. Currently, the following values are allowed:
- custom
4.2. Generic attributes
Generic attributes are allowed on all elements except the root element.
4.2.1. class
Defines a class for the element.
4.2.2. correct-offset
This attribute corrects the offset of the read-in data.
The offset can be corrected in the positive or the negative direction.
4.2.3. encoding
The encoding used for the data. This has to be a valid definition like utf-8.
4.2.4. id
- An ID is a character string.
- An ID must start with an alphabetic character.
- An ID can contain characters from the ASCII alphabet, numbers, underscores and minus signs.
- An ID is only allowed to exist once within the document.
- Normally, all elements must define an ID!
If no ID is defined, the software automatically creates one. You should define your own IDs that are easier to read during the mapping process.
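The ID rules above can be illustrated with a short sketch. The IDs are invented for the example, and the rejected variants are commented out:

```xml
<!-- Valid: starts with a letter, then ASCII letters, digits, underscores, minus signs. -->
<str id="customer_name-2"/>
<!-- Not valid under the rules above: starts with a digit. -->
<!-- <str id="2nd_name"/> -->
<!-- Not valid under the rules above: the same ID is used twice in one document. -->
<!-- <num id="customer_name-2"/> -->
```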
4.2.5. s
Describes a semantic meaning of the element.
- The semantic meaning is defined as a character string.
- Only values of the defined semantic space are allowed.
4.2.6. start-sign
A regular expression that describes the beginning of the element.
A start-sign is not allowed to be empty!
4.2.7. stop-sign
A regular expression that describes the end of the element data.
The default stop-sign considers UNIX and Windows line endings and is defined as follows: \r\n?|\n
A stop-sign is not allowed to be empty!
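As an illustration of how explicit and default stop-signs combine, the following sketch describes comma-separated lines with three fields. The IDs and the comma delimiter are assumptions for the example. The last field sets no stop-sign, so the default (a line ending) applies.

```xml
<seq id="lines">
  <elem id="line">
    <!-- The first two fields end at a comma. -->
    <str id="firstname" stop-sign=","/>
    <str id="lastname" stop-sign=","/>
    <!-- No stop-sign set: the default \r\n?|\n ends the field at the line break. -->
    <num id="age"/>
  </elem>
</seq>
```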
4.3. Element specific attributes
Attributes that are only allowed at specific elements.
4.3.1. byteOrder
Defines the byte order for binary data. The following values are possible:
- bigEndian
- littleEndian
- middleEndian
4.3.2. count
Defines a quantity.
4.3.3. db-auto-inc
The column of the element in the database is an auto-increment column. This means that the value of the column is filled automatically if no value is provided.
Because databases are limited in their use of auto-increment columns, you should use this attribute only on a simple num element without the attributes precision and length!
<seq id="companies">
<elem id="companies-row">
<num id="companies-row-id" db-column-name="id" db-auto-inc="true"/>
</elem>
</seq>
4.3.4. db-column-name
The column name of the element in the database. If this is not set, the ID will be used as column name.
4.3.5. db-foreign-key
The foreign key definition of the database table described by the current element. You have to specify a comma-separated list of DFASDL element IDs that describe the referenced table columns.
<seq id="companies">
  <elem id="companies-row">
    <num id="companies-row-id" db-column-name="id"/>
    ...
  </elem>
</seq>
<seq id="contacts">
  <elem id="contacts-row">
    ...
    <num id="contacts-row-company-id" db-column-name="company_id" db-foreign-key="companies-row-id"/>
  </elem>
</seq>
4.3.6. db-insert
Allows the definition of database-specific INSERT statements. The statement must follow the syntax of prepared statements.
INSERT INTO mytable (column1, column2) VALUES(?, ?)
It is possible to use database-specific SQL syntax.
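A sketch of how such a statement might be attached to a sequence. Placing db-insert on the seq element is an assumption made by analogy with the db-update example in this specification; the table and column names are hypothetical.

```xml
<!-- Sketch: db-insert on the seq is assumed by analogy with db-update. -->
<seq id="people" db-insert="INSERT INTO people (id, name) VALUES(?, ?)">
  <elem id="people_row">
    <num db-column-name="id" id="id" max-digits="5"/>
    <str db-column-name="name" id="name" max-length="12"/>
  </elem>
</seq>
```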
4.3.7. db-primary-key
Defines a primary key for a database table. If the attribute is defined, it must contain one or more comma-separated column names.
The column name(s) must correspond to the names of the database columns.
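A primary key spanning several columns could look like the following sketch. The table layout and IDs are invented for illustration. The attribute is placed on the seq element, as in the db-update example of this specification.

```xml
<!-- Hypothetical table with a two-column primary key. -->
<seq id="order_items" db-primary-key="order_id,position">
  <elem id="order_items_row">
    <num db-column-name="order_id" id="order_items_row_order_id"/>
    <num db-column-name="position" id="order_items_row_position"/>
    <str db-column-name="article" id="order_items_row_article"/>
  </elem>
</seq>
```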
4.3.8. db-select
Allows the execution of database specific SELECT statements.
SELECT
x374 AS column1,
y478 AS column2
FROM x2 JOIN y3 ON x2.id = y3.refId
WHERE x2.x23 = 1
ORDER BY y3.y1 ASC
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
<seq id="people" db-select="SELECT t1.name, firstname, title, telephone, t2.name AS productname FROM `people` AS t1, `products` AS t2 WHERE t1.pid = t2.pid">
<elem id="people_row">
<str db-column-name="name" id="people_row_name" max-length="12"/>
<str db-column-name="firstname" id="people_row_firstname" max-length="9"/>
<str db-column-name="title" id="people_row_title" max-length="22"/>
<str db-column-name="telephone" id="people_row_telephone" max-length="14"/>
<str db-column-name="productname" id="productname"/>
</elem>
</seq>
</dfasdl>
4.3.9. db-update
Allows the execution of database-specific UPDATE statements. The statement must follow the syntax of prepared statements.
UPDATE mytable SET id = ?, column1 = ?, column2 = ? WHERE id = ?
<?xml version="1.0" encoding="UTF-8"?>
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
<seq id="people" db-primary-key="id" db-update="UPDATE people SET id = ?, name = ?, time = now() WHERE id = ?">
<elem id="people_row">
<num db-column-name="id" id="id" max-digits="5"/>
<str db-column-name="name" id="name" max-length="12"/>
</elem>
</seq>
</dfasdl>
It is possible to use database-specific SQL syntax.
4.3.10. decimal-separator
Defines a decimal separator for a numerical data element. The following values are allowed:
- point (.)
- comma (,)
- Momayyez (٫)
4.3.11. defaultnum
Defines a default value for a numerical data element that is inserted when the data is empty.
4.3.12. defaultstr
Defines a character string for a data element that is inserted when the data is empty.
4.3.13. filter
Defines a filter expression that is used to limit the available source data.
Currently filtering is supported on databases only!
Special characters that may lead to problems with XML, like < and &, must be escaped properly!
<seq id="foo" filter="my-column-data &lt; 1024">
  ...
</seq>
4.3.14. format
Contains the format definition for the content of the data element.
The matching part must be within a group (…)!
4.3.15. length
Defines the exact length of a character string.
4.3.16. keepID
Whether the values of the attribute id should be kept within sequences. The values true and false are allowed.
The default value of this attribute is true.
4.3.17. max
Defines a maximum numerical value as Integer.
4.3.18. max-digits
Defines a maximum number of digits as Integer.
4.3.19. max-length
Defines the maximum length of a character string as Integer.
4.3.20. max-precision
Defines the maximum number of decimal places for a numerical value.
4.3.21. mime
Defines the MIME type of binary data, e.g. application/postscript.
4.3.22. min
Defines the minimum numerical value as Integer.
4.3.23. precision
Defines the precision, i.e. the number of decimal places of a numerical value.
4.3.24. sep
Defines a separator for the values of a data set.
This attribute is not used.
4.3.25. sid
Defines a source ID for a reference to another element.
4.3.26. trim
Whether the read-in character string should be cleaned. Spaces, tabulators and line breaks are deleted. The following values are possible:
- left: Only at the beginning of the character string.
- right: Only at the end of the character string.
- both: At the beginning and the end of the character string.
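The three values can be sketched as follows. Showing trim on str elements is an assumption, and the IDs are illustrative:

```xml
<!-- Assumption: trim is shown on str elements; IDs are illustrative. -->
<str id="name" trim="both"/>
<str id="code" trim="left"/>
<str id="comment" trim="right"/>
```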
4.3.27. unique
The unique attribute indicates that a concrete value of the element must only occur once! In principle this is the same as the UNIQUE constraint in relational databases. The attribute may be omitted or set to false to be ignored. If set to true it takes effect.
Currently this attribute is only allowed at numeric, string and time elements.
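A minimal sketch of the attribute on a numeric and a string element, with invented IDs:

```xml
<num id="customer_number" unique="true"/>
<str id="email" unique="true"/>
<!-- Has the same effect as omitting the attribute. -->
<str id="city" unique="false"/>
```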
4.3.28. value
Defines a value for a data set.
4.3.29. xml-attribute-name
Defines the name of the attribute at the XML element (defined via xml-attribute-parent). This allows data to be read from XML attributes.
<seq id="foo">
<elem id="row">
<num id="age" xml-attribute-name="age" xml-attribute-parent="raw-data"/>
<num id="count" xml-attribute-name="count" xml-attribute-parent="raw-data"/>
</elem>
</seq>
4.3.30. xml-attribute-parent
Defines the name of an XML element that contains attributes which should be read in (see xml-attribute-name).
<seq id="foo">
<elem id="row">
<num id="age" xml-attribute-name="age" xml-attribute-parent="raw-data"/>
<num id="count" xml-attribute-name="count" xml-attribute-parent="raw-data"/>
</elem>
</seq>
4.3.31. xml-element-name
If the name of the XML element is not equal to id, it can be defined with this attribute (same as db-column-name).
<str id="some-id" xml-element-name="an-xml-id"/>