XML Import

Concept

Implex enables the import of XML files via a manual import configuration.

XML files use a standardized format that represents information in a text-based, hierarchical structure. This structure consists of elements, element attributes and element content, which in turn can contain further elements and attributes.

Basic XML File
<?xml version="1.0"?>
<Elementname Attribut1="1" Attribut2="abc">
  <Element>Element content</Element>
</Elementname>

Based on the example in the section ReaderModule, an XML file with the same content can be used as the source:

XML File
<?xml version="1.0"?>
<People>
    <Person Nr="1" Geschlecht="M" GebDat="19670812">
        <Name>Meier</Name>
        <Vorname>Max</Vorname>
        <Hobbies>
            <Hobby HobbyNr="1">Soccer</Hobby>
            <Hobby HobbyNr="2">Hockey</Hobby>
        </Hobbies>
    </Person>
    <Person Nr="2" Geschlecht="W" GebDat="19781103">
        <Name>Müller</Name>
        <Vorname>Sandra</Vorname>
        <Hobbies>
            <Hobby HobbyNr="1">Dancing</Hobby>
            <Hobby HobbyNr="2">Horse Riding</Hobby>
        </Hobbies>
    </Person>
</People>

The Implex import function can use any content of an element and any value of an attribute. A specific syntax for querying these values is available for this purpose.

The configuration file follows the basic structure with the sections Global, ReaderModule and ImportSection.

Global

The CortexEngine connection parameters and the general working parameters for the import are defined in the global area.

Global Area
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
  <Global>
    <LoginIP>[....]</LoginIP>
    <LoginPort>[....]</LoginPort>
    <LoginUser>[....]</LoginUser>
    <LoginPW>[....]</LoginPW>
    <ImportMode>[....]</ImportMode>
  </Global>
Example Global Area
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
  <Global>
    <LoginIP>127.0.0.1</LoginIP>
    <LoginPort>29000</LoginPort>
    <LoginUser>importuser</LoginUser>
    <LoginPW>myPassWd</LoginPW>
    <ImportMode>nu</ImportMode>
  </Global>

ReaderModule

The ReaderModule area defines the reading parameters.

Define the following here:

  • the source file (IN_FILE)
  • the main entry element for reading the records (MAIN_TAG)
  • the element that represents a single record (DATASET_TAG)
ReaderModule
<ReaderModule type="xml">
    <IN_FILE>[....]</IN_FILE>
    <MAIN_TAG>[....]</MAIN_TAG>
    <DATASET_TAG>[....]</DATASET_TAG>
</ReaderModule>
Example ReaderModule
<ReaderModule type="xml">
    <IN_FILE>/Pfad/zur/Quelle/2013Mai.XML</IN_FILE>
    <MAIN_TAG>People</MAIN_TAG>
    <DATASET_TAG>Person</DATASET_TAG>
</ReaderModule>

ImportSection

The ImportSection contains the field assignments.

ImportSection
<ImportSection recordtype="[....]">
    <FilterFunction>[....]</FilterFunction>
    <Reference>[....]</Reference>
    <Field>[....] = [....]</Field>
    <RepGroup start="[....]">
        <Field>[....] = [....]</Field>
    </RepGroup>
</ImportSection>
  • In the ReaderModule example, all information within a Person element is treated as one record.
  • Always specify the path to the source file as an absolute path.

Configuration File

Config
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
    <Global>
        <LoginIP>127.0.0.1</LoginIP>
        <LoginPort>29001</LoginPort>
        <LoginUser>importuser</LoginUser>
        <LoginPW>myPasswd</LoginPW>
        <ImportMode>nu</ImportMode>
    </Global>
    <ReaderModule type="xml">
        <IN_FILE>/Pfad/zur/Quelle/2013Mai.XML</IN_FILE>
        <MAIN_TAG>People</MAIN_TAG>
        <DATASET_TAG>Person</DATASET_TAG>
    </ReaderModule>
    <ImportSection recordtype="[....]">
        <FilterFunction>[....]</FilterFunction>
        <Reference>[....]</Reference>
        <Field>[....] = [....]</Field>
        <RepGroup start="[....]">
            <Field>[....] = [....]</Field>
        </RepGroup>
    </ImportSection>
</CtxImport>
  • You can now make the field assignments in the ImportSection.

    To do this, you access individual elements and attribute values using a defined syntax.

Syntax

The defined syntax enables access to the content of an XML file. Based on the definition in the ReaderModule, the import mechanism runs through the XML source file and identifies a record.

The ReaderModule configuration shown above uses the entries MAIN_TAG and DATASET_TAG to define the entry point for reading records. Here, People is the enclosing element, and each Person element is one record that the import iterates over.

Example Syntax
<?xml version="1.0"?>
<People>
    <Person Nr="1" Geschlecht="M" GebDat="19670812">
        <Name>Smith</Name>
        <Vorname>Max</Vorname>
        <Hobbies>
            <Hobby HobbyNr="1">Soccer</Hobby>
            <Hobby HobbyNr="2">Hockey</Hobby>
        </Hobbies>
    </Person>
    <Person Nr="2" Geschlecht="W" GebDat="19781103">
        <Name>Müller</Name>
        <Vorname>Cynthia</Vorname>
        <Hobbies>
            <Hobby HobbyNr="1">Dancing</Hobby>
            <Hobby HobbyNr="2">Horse Riding</Hobby>
        </Hobbies>
    </Person>
</People>

Information via getChar

To access specific information in each record, a field is addressed via getChar, similar to CSV sources.

Transferring the Field 'Name' via getChar
<ImportSection recordtype="PERS">
    <Field>PerNam = getChar('Name')</Field>
</ImportSection>
</CtxImport>

Field Name

Specifying only the name of an element gives direct access to that element's content.

Example Field Name
<ImportSection recordtype="PERS">
    <Field>PerNam = getChar('Name')</Field>
    <Field>PerGes = getChar('#Geschlecht')</Field>
    <Field>PerNum = getChar('#Nr')</Field>
    <Field>PerGeb = getChar('#GebDat')</Field>
</ImportSection>
</CtxImport>
  • The # character (hash) is required to access the attributes of an element.
  • In this way the personal number (PerNum), gender (PerGes) and date of birth (PerGeb) can be transferred.

Multiple Occurrences

If an element occurs more than once (e.g. 'Hobby'), it must be imported within a repeating group (RepGroup). Other fields can be combined with it in the same group.

Example of Multiple Occurrences
<ImportSection recordtype="PERS">
    <Field>PerNam = getChar('Name')</Field>
    <Field>PerGes = getChar('#Geschlecht')</Field>
    <Field>PerNum = getChar('#Nr')</Field>
    <Field>PerGeb = getChar('#GebDat')</Field>
    <RepGroup start="Hobbies.Hobby">
        <Field>HobNr = getChar('#HobbyNr')</Field>
        <Field>Hobby = getChar()</Field>
    </RepGroup>
</ImportSection>
</CtxImport>
  • The repeating group iterates over the Hobby elements and accesses one Hobby at a time.
  • The . (dot) character is used to access a child element, here Hobby within Hobbies.
  • In this way all matching child elements are read and imported.
  • In addition, the content of the current Hobby element is read by calling getChar() without parameters.
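
Putting the fragments from the previous sections together, a complete configuration for the People example could look like the following sketch. It only combines elements already shown on this page. The connection values are the sample values from the Global example, the source path /path/to/source/people.xml is a placeholder, and leaving out FilterFunction and Reference assumes that both are optional, which this page does not confirm.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
    <Global>
        <LoginIP>127.0.0.1</LoginIP>
        <LoginPort>29000</LoginPort>
        <LoginUser>importuser</LoginUser>
        <LoginPW>myPassWd</LoginPW>
        <ImportMode>nu</ImportMode>
    </Global>
    <ReaderModule type="xml">
        <!-- Placeholder path: replace with the absolute path of your source file -->
        <IN_FILE>/path/to/source/people.xml</IN_FILE>
        <!-- People encloses the records; each Person element is one record -->
        <MAIN_TAG>People</MAIN_TAG>
        <DATASET_TAG>Person</DATASET_TAG>
    </ReaderModule>
    <!-- Assumption: FilterFunction and Reference are left out as optional -->
    <ImportSection recordtype="PERS">
        <!-- Element content is addressed by element name -->
        <Field>PerNam = getChar('Name')</Field>
        <!-- Attributes of the record element are addressed with a leading # -->
        <Field>PerGes = getChar('#Geschlecht')</Field>
        <Field>PerNum = getChar('#Nr')</Field>
        <Field>PerGeb = getChar('#GebDat')</Field>
        <!-- Repeating group: one pass per Hobby element below Hobbies -->
        <RepGroup start="Hobbies.Hobby">
            <Field>HobNr = getChar('#HobbyNr')</Field>
            <Field>Hobby = getChar()</Field>
        </RepGroup>
    </ImportSection>
</CtxImport>
```

The comments are ordinary XML comments added only to annotate the sketch; they can be removed without changing the configuration.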

Note

Independently of the examples above, # and . can be combined if more deeply nested structures exist. Access via Hobbies.Hobby#HobbyNr is therefore possible.
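
As a sketch of the combined form mentioned in the note, the following fragment addresses the HobbyNr attribute of a nested Hobby element directly from the record level. Passing the combined path as the argument of getChar is an assumption based on the examples above. Which Hobby element is returned when several exist is not stated here, so a repeating group remains the documented way to import multiple occurrences.

```xml
<ImportSection recordtype="PERS">
    <Field>PerNam = getChar('Name')</Field>
    <!-- Assumed usage: '.' steps into the child element, '#' selects its attribute -->
    <Field>HobNr = getChar('Hobbies.Hobby#HobbyNr')</Field>
</ImportSection>
```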