5.2.10.2. HTML Extraction

Let’s assume that we have the following HTML file:

ANATEL~2_img268

When displayed inside a browser, this HTML file looks like this:

ANATEL~2_img269

There are two “iteration levels” inside this HTML file:

•The first iteration is about the Clients

•The second level of iteration is about the transactions that each client committed. The second level is “embedded” inside the first level.

There are two ways to extract the data from this HTML file.

Here are the parameters for the first solution:

ANATEL~2_img270

The parameter “Read only the subtags named:” is optional.

If left blank, Anatella simply iterates over all the subtags found at this location.

E.g. for the above example, you can leave this parameter blank: it does not change the result.

The output of the first solution is (after removing the empty row):

ANATEL~2_img272

This first solution has a drawback: the number of “Purchased items” extracted for each client is limited to 3.
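To make this limitation concrete, here is a minimal Python sketch of a single-level extraction. It is not Anatella code: the HTML sample, the names `extract_flat` and `MAX_ITEMS`, and the table layout are all assumptions made for illustration (the real file is only shown as an image above). The sketch produces one row per client with a fixed number of item columns, so any additional item is silently dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the HTML file (not the real file).
HTML = """
<html><body>
<table>
  <tr><td>Alice</td><td>
    <table>
      <tr><td>book</td></tr>
      <tr><td>pen</td></tr>
      <tr><td>lamp</td></tr>
      <tr><td>desk</td></tr>
    </table>
  </td></tr>
  <tr><td>Bob</td><td>
    <table>
      <tr><td>chair</td></tr>
    </table>
  </td></tr>
</table>
</body></html>
"""

MAX_ITEMS = 3  # assumed fixed number of "Purchased item" columns


def extract_flat(html_text, max_items=MAX_ITEMS):
    """Single iteration level: one output row per client."""
    root = ET.fromstring(html_text.strip())
    rows = []
    for client_tr in root.findall("./body/table/tr"):
        name = client_tr.find("./td[1]").text
        items = [td.text for td in client_tr.findall("./td[2]/table/tr/td")]
        # Pad or truncate to the fixed column count; extra items are lost.
        fixed = (items + [None] * max_items)[:max_items]
        rows.append([name] + fixed)
    return rows


flat_rows = extract_flat(HTML)
for row in flat_rows:
    print(row)
```

In this sketch the fourth item of the first client ("desk") never reaches the output, which is exactly the kind of restriction a fixed-width layout imposes.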

Here are the parameters for the second solution. Please note that we are now defining two levels:

ANATEL~2_img273

The output of this second solution is (after removing the empty row):

ANATEL~2_img275

This second solution is better than the first one because it does not impose any restriction on the number of transactions that each client committed. Its output also mirrors the structure of the original HTML file more closely.
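The idea of two nested iteration levels can be sketched in Python as follows. This is only an illustration under assumptions: the HTML sample and the function name `extract_nested` are hypothetical, and the code does not reproduce how Anatella works internally. The outer loop iterates over the clients, the inner loop over the transactions of the current client, and each transaction becomes its own output row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the HTML file (not the real file).
HTML = """
<html><body>
<table>
  <tr><td>Alice</td><td>
    <table>
      <tr><td>book</td></tr>
      <tr><td>pen</td></tr>
      <tr><td>lamp</td></tr>
      <tr><td>desk</td></tr>
    </table>
  </td></tr>
  <tr><td>Bob</td><td>
    <table>
      <tr><td>chair</td></tr>
    </table>
  </td></tr>
</table>
</body></html>
"""


def extract_nested(html_text):
    """Two iteration levels: one output row per (client, transaction) pair."""
    root = ET.fromstring(html_text.strip())
    rows = []
    # Level 1: iterate over the clients (rows of the outer table).
    for client_tr in root.findall("./body/table/tr"):
        name = client_tr.find("./td[1]").text
        # Level 2: iterate over the transactions embedded in this client row.
        for item_td in client_tr.findall("./td[2]/table/tr/td"):
            rows.append((name, item_td.text))
    return rows


nested_rows = extract_nested(HTML)
for row in nested_rows:
    print(row)
```

Because the second level is evaluated relative to the current client, the number of rows grows with the number of transactions and no item is dropped.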

Anatella allows you to define as many levels as you like, so that you can extract data from HTML files of any size and structure. Furthermore, it’s very easy to find the right extraction parameters for the ReadXML Action (i.e. the right XPath expressions) because you can directly use the XPath expressions generated by Chrome, Firefox or IE. In contrast, nearly all other XPath-based HTML parsers can’t use the XPath expressions generated by these browsers, because they have problems with <tbody> tags.
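The <tbody> problem can be illustrated with a short Python sketch. Browsers insert a <tbody> element into the tables of the parsed page even when the HTML source does not contain one, so an XPath copied from a browser usually includes a `tbody` step that has no counterpart in the raw file. The sample below is hypothetical (`SOURCE`, `BROWSER_XPATH` and the helper names are assumptions), and the workaround shown, dropping the `tbody` steps, is only one possible approach; it is not a description of how Anatella handles these expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML source: note that it contains no <tbody> tag.
SOURCE = (
    "<html><body><table>"
    "<tr><td>Alice</td></tr>"
    "<tr><td>Bob</td></tr>"
    "</table></body></html>"
)

# XPath as a browser might report it, including the implicit <tbody> step.
BROWSER_XPATH = "/html/body/table/tbody/tr/td"


def relative_steps(xpath):
    """Split an absolute XPath into steps, dropping the root element step."""
    return xpath.strip("/").split("/")[1:]


def find_text(root, steps):
    """Evaluate the steps relative to the root element and return the texts."""
    return [e.text for e in root.findall("./" + "/".join(steps))]


root = ET.fromstring(SOURCE)
steps = relative_steps(BROWSER_XPATH)

# Used as-is, the browser XPath matches nothing in the raw source.
with_tbody = find_text(root, steps)

# After dropping the tbody steps (including forms such as "tbody[1]"), it matches.
steps_no_tbody = [s for s in steps if s.split("[")[0] != "tbody"]
without_tbody = find_text(root, steps_no_tbody)

print(with_tbody)
print(without_tbody)
```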

5.2.10.2. HTML Extraction – A More Complex Example
