XML Import
Concept¶
Implex enables the import of XML files via a manual import configuration.
XML files use a standardized format that represents information in a text-based, hierarchical structure. This structure consists of elements, element attributes and element content, which in turn can contain further elements and attributes.
<?xml version="1.0" ?>
<Elementname Attribut1="1" Attribut2="abc">
<Element>Element-Inhalt</Element>
</Elementname>
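The three building blocks (element, attribute, element content) can be made visible with a few lines of Python. This is only an illustrative sketch using the standard-library `xml.etree.ElementTree` module; Python is not part of Implex, and the variable names are chosen freely.

```python
import xml.etree.ElementTree as ET

# The example document from above (XML declaration omitted for brevity).
SAMPLE = """
<Elementname Attribut1="1" Attribut2="abc">
  <Element>Element-Inhalt</Element>
</Elementname>
""".strip()

root = ET.fromstring(SAMPLE)

element_name = root.tag                     # name of the outer element
attributes = dict(root.attrib)              # attributes as a name -> value mapping
child_content = root.find("Element").text   # text content of the nested element

print(element_name)
print(attributes)
print(child_content)
```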
Based on the example in the section ReaderModule, an XML file with the same content can be used as the source:
<?xml version="1.0" ?>
<People>
<Person Nr="1" Geschlecht="M" GebDat="19670812">
<Name>Meier</Name>
<Vorname>Max</Vorname>
<Hobbies>
<Hobby HobbyNr="1">Fußball</Hobby>
<Hobby HobbyNr="2">Hockey</Hobby>
</Hobbies>
</Person>
<Person Nr="2" Geschlecht="W" GebDat="19781103">
<Name>Müller</Name>
<Vorname>Sandra</Vorname>
<Hobbies>
<Hobby HobbyNr="1">Tanzen</Hobby>
<Hobby HobbyNr="2">Reiten</Hobby>
</Hobbies>
</Person>
</People>
The Implex import function can use any content of an element and any value of an attribute. A specific syntax for querying these values is available for this purpose.
The configuration file follows the basic structure with the sections Global, ReaderModule and ImportSection.
Global¶
The CortexEngine connection parameters and the general working parameters for the import are defined in the global area.
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>[....]</LoginIP>
<LoginPort>[....]</LoginPort>
<LoginUser>[....]</LoginUser>
<LoginPW>[....]</LoginPW>
<ImportMode>[....]</ImportMode>
</Global>
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>127.0.0.1</LoginIP>
<LoginPort>29000</LoginPort>
<LoginUser>importuser</LoginUser>
<LoginPW>myPassWd</LoginPW>
<ImportMode>nu</ImportMode>
</Global>
ReaderModule¶
The ReaderModule area defines the reading parameters.
Specify here:
- the source file
- the main entry element for reading the records
- the element that represents a single record
<ReaderModule type="xml">
<IN_FILE>[....]</IN_FILE>
<MAIN_TAG>[....]</MAIN_TAG>
<DATASET_TAG>[....]</DATASET_TAG>
</ReaderModule>
<ReaderModule type="xml">
<IN_FILE>/Pfad/zur/Quelle/2013Mai.XML</IN_FILE>
<MAIN_TAG>People</MAIN_TAG>
<DATASET_TAG>Person</DATASET_TAG>
</ReaderModule>
ImportSection¶
The ImportSection contains the field assignments.
<ImportSection recordtype="[....]">
<FilterFunction>[....]</FilterFunction>
<Reference>[....]</Reference>
<Field>[....] = [....]</Field>
<RepGroup start="[....]">
<Field>[....] = [....]</Field>
</RepGroup>
</ImportSection>
- All information within the Person element is treated as one record.
- Make sure to specify the path to the source file as an absolute path!
Configuration File¶
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>127.0.0.1</LoginIP>
<LoginPort>29001</LoginPort>
<LoginUser>importuser</LoginUser>
<LoginPW>myPasswd</LoginPW>
<ImportMode>nu</ImportMode>
</Global>
<ReaderModule type="xml">
<IN_FILE>/Pfad/zur/Quelle/2013Mai.XML</IN_FILE>
<MAIN_TAG>People</MAIN_TAG>
<DATASET_TAG>Person</DATASET_TAG>
</ReaderModule>
<ImportSection recordtype="[....]">
<FilterFunction>[....]</FilterFunction>
<Reference>[....]</Reference>
<Field>[....] = [....]</Field>
<RepGroup start="[....]">
<Field>[....] = [....]</Field>
</RepGroup>
</ImportSection>
</CtxImport>
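Because a malformed declaration or a missing section is easy to overlook, it can help to check a configuration file before running an import. The following Python sketch is an assumption about how such a pre-check could look, not a feature of Implex: the helper `missing_sections` is hypothetical, and the section names are taken from the structure shown above.

```python
import xml.etree.ElementTree as ET

# Sections described in this document.
REQUIRED_SECTIONS = ("Global", "ReaderModule", "ImportSection")

def missing_sections(config_text):
    """Return the names of required sections absent from the configuration.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(config_text.encode("utf-8"))
    if root.tag != "CtxImport":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [name for name in REQUIRED_SECTIONS if root.find(name) is None]

# Shortened configuration without an ImportSection, for demonstration only.
CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
  <Global>
    <LoginIP>127.0.0.1</LoginIP>
  </Global>
  <ReaderModule type="xml">
    <MAIN_TAG>People</MAIN_TAG>
    <DATASET_TAG>Person</DATASET_TAG>
  </ReaderModule>
</CtxImport>
"""

print(missing_sections(CONFIG))
```

A declaration written as `< xml ...?>` or `<xml ...?>` instead of `<?xml ...?>` makes the parser raise a `ParseError`, which is the kind of mistake this check is meant to surface early.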
You can now make the field assignments in the ImportSection. To do this, you access individual elements and attribute values using a defined syntax.
Syntax¶
The defined syntax enables access to the content of an XML file.
Based on the definition in the ReaderModule, the import mechanism runs through the XML source file and identifies each record.
The configuration of the reader module shown above, with the entries MAIN_TAG and DATASET_TAG, defines the entry point into a record.
Here, People is the enclosing element for all further content, and each Person element is an individual record that is iterated over.
<?xml version="1.0" ?>
<People>
<Person Nr="1" Geschlecht="M" GebDat="19670812">
<Name>Smith</Name>
<Vorname>Max</Vorname>
<Hobbies>
<Hobby HobbyNr="1">Soccer</Hobby>
<Hobby HobbyNr="2">Hockey</Hobby>
</Hobbies>
</Person>
<Person Nr="2" Geschlecht="W" GebDat="19781103">
<Name>Müller</Name>
<Vorname>Cynthia</Vorname>
<Hobbies>
<Hobby HobbyNr="1">Dancing</Hobby>
<Hobby HobbyNr="2">Horse Riding</Hobby>
</Hobbies>
</Person>
</People>
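The way MAIN_TAG and DATASET_TAG select records can be pictured with a short Python sketch. It is an illustration only and does not show how Implex is implemented; `iter_records` is a hypothetical helper, and the source is shortened to the attributes of each Person. Treating the main element as either the document root or one of its descendants is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened source: only the attributes of each record are kept.
SOURCE = """
<People>
  <Person Nr="1" Geschlecht="M" GebDat="19670812"/>
  <Person Nr="2" Geschlecht="W" GebDat="19781103"/>
</People>
"""

def iter_records(xml_text, main_tag, dataset_tag):
    """Yield each dataset_tag element directly below the main_tag element."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: the enclosing element is the root or a descendant of it.
    main = root if root.tag == main_tag else root.find(f".//{main_tag}")
    if main is None:
        raise ValueError(f"element {main_tag!r} not found")
    yield from main.findall(dataset_tag)

records = list(iter_records(SOURCE, "People", "Person"))
for record in records:
    print(record.get("Nr"), record.get("GebDat"))
```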
Information via getChar¶
To access specific information in each record, a field is addressed via getChar, similar to CSV sources.
<ImportSection recordtype="PERS">
<Field>PerNam = getChar('Name')</Field>
</ImportSection>
</CtxImport>
Field Name¶
Simply specifying the field name permits direct use of the element content.
<ImportSection recordtype="PERS">
<Field>PerNam = getChar('Name')</Field>
<Field>PerGes = getChar('#Geschlecht')</Field>
<Field>PerNum = getChar('#Nr')</Field>
<Field>PerGeb = getChar('#GebDat')</Field>
</ImportSection>
</CtxImport>
- The # character (hash) is required to access the attributes of an element.
- In this way, the personal number (PerNum), gender (PerGes) and date of birth (PerGeb) can be transferred.
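The difference between addressing element content and addressing an attribute can be sketched in Python as follows. The function `get_char` is a hypothetical stand-in written for this illustration; it mimics the two addressing forms described here and makes no claim about how Implex's getChar works internally.

```python
import xml.etree.ElementTree as ET

# One record from the example source (hobbies omitted).
PERSON = ET.fromstring(
    '<Person Nr="2" Geschlecht="W" GebDat="19781103">'
    "<Name>Müller</Name>"
    "</Person>"
)

def get_char(record, name):
    """Hypothetical stand-in for getChar on a single record."""
    if name.startswith("#"):
        # '#Nr' -> value of the attribute 'Nr' on the record element
        return record.get(name[1:])
    # 'Name' -> text content of the child element 'Name'
    child = record.find(name)
    return None if child is None else child.text

print(get_char(PERSON, "Name"))
print(get_char(PERSON, "#Nr"))
print(get_char(PERSON, "#GebDat"))
```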
Multiple Occurrences¶
If an element occurs more than once (e.g. 'Hobby'), it must be imported within a repeating group. It is also possible to combine it with other fields in the same group.
<ImportSection recordtype="PERS">
<Field>PerNam = getChar('Name')</Field>
<Field>PerGes = getChar('#Geschlecht')</Field>
<Field>PerNum = getChar('#Nr')</Field>
<Field>PerGeb = getChar('#GebDat')</Field>
<RepGroup start="Hobbies.Hobby">
<Field>HobNr = getChar('#HobbyNr')</Field>
<Field>Hobby = getChar()</Field>
</RepGroup>
</ImportSection>
</CtxImport>
- The repeating group iterates within the Hobbies element and accesses a single Hobby element on each pass.
- The . (dot) character is used to access a child element.
- This reads and imports all child elements.
- In addition, the content of the Hobby element is read directly by calling the getChar() function without parameters.
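The effect of a repeating group can be pictured as a loop over all elements matched by the start path. The Python sketch below is an illustration under that assumption; `iter_group` is a hypothetical helper, and the dictionary keys simply reuse the field names HobNr and Hobby from the configuration above.

```python
import xml.etree.ElementTree as ET

PERSON = ET.fromstring("""
<Person Nr="2" Geschlecht="W" GebDat="19781103">
  <Name>Müller</Name>
  <Hobbies>
    <Hobby HobbyNr="1">Dancing</Hobby>
    <Hobby HobbyNr="2">Horse Riding</Hobby>
  </Hobbies>
</Person>
""".strip())

def iter_group(record, start):
    """Yield every element matched by a dotted start path such as 'Hobbies.Hobby'."""
    # ElementTree uses '/' as the path separator, so the dots are translated.
    yield from record.findall(start.replace(".", "/"))

# One row per Hobby: the attribute (like '#HobbyNr') and the element's own
# content (like getChar() without parameters).
rows = [
    {"HobNr": hobby.get("HobbyNr"), "Hobby": hobby.text}
    for hobby in iter_group(PERSON, "Hobbies.Hobby")
]
print(rows)
```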
Note
Independently of the examples shown, # and . can be combined if more deeply nested structures exist. Access via Hobbies.Hobby#HobbyNr would therefore be possible.
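How such a combined path breaks down into an element path and an attribute name can be sketched in Python. The function `resolve` is hypothetical and written only for this illustration; returning a list of all matches is an assumption of the sketch, not a statement about what Implex returns.

```python
import xml.etree.ElementTree as ET

PERSON = ET.fromstring("""
<Person Nr="2" Geschlecht="W" GebDat="19781103">
  <Name>Müller</Name>
  <Hobbies>
    <Hobby HobbyNr="1">Dancing</Hobby>
    <Hobby HobbyNr="2">Horse Riding</Hobby>
  </Hobbies>
</Person>
""".strip())

def resolve(record, path):
    """Return all values addressed by a path such as 'Hobbies.Hobby#HobbyNr'."""
    # Split 'Hobbies.Hobby#HobbyNr' into 'Hobbies.Hobby' and 'HobbyNr'.
    element_path, _, attribute = path.partition("#")
    if element_path:
        targets = record.findall(element_path.replace(".", "/"))
    else:
        # An empty element path refers to the record element itself.
        targets = [record]
    if attribute:
        return [target.get(attribute) for target in targets]
    return [target.text for target in targets]

print(resolve(PERSON, "Hobbies.Hobby#HobbyNr"))
print(resolve(PERSON, "Hobbies.Hobby"))
print(resolve(PERSON, "#Nr"))
```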