9.1.1. Selecting the Right Language to create your new Actions


 

To create new Actions, you can choose between different (scripting) languages: R, Python, JavaScript or C/C++.

 

Here are the pros and cons of each of these languages:

 

Criteria

JavaScript

R

Python

C/C++

Runtime Speed

★★★

AN76E5~1_img89

★ x 30

Scalability

(i.e. Ability to process large tables)

★★★★★★

AN76E5~1_img89

★★

★ x 30

General-Purpose code available from the Internet, directly ready to “Copy-Paste”

★★

★★★★

★★★★★★

★★★

Availability of Machine Learning Algorithms

(i.e. will you find the ML algorithm that you are searching for?)

No

★★★★

★★

★★★★

Short Development Time

Easy to rapidly get some results

★★★★★AN76E5~1_img89

★★★

★★★★

No

Directly offers some facilities to easily do Matrix/Vector computations

No

Yes

Yes

No

 

The syntax of the language and its libraries is agreeable/coherent.

★★★★

AN76E5~1_img89

★★

AN76E5~1_img89

 
 

 


What's the difference between JavaScript and Java?

Actually, the two languages have almost nothing in common except the name. Java is written in a style similar to C++ and is powerful enough for major applications. Java generated a lot of excitement because the same program can run on IBM, Mac and Unix computers. Java is not considered an easy-to-use language for non-programmers.

JavaScript is much simpler to use than Java. No compilation, no applets, just a simple sequence.

 
 

Some Comments about the above table:
 

C/C++ is the fastest, the most scalable and the most universal language. In fact, the R, Python and JavaScript languages are all implemented in C/C++, and thus most of the functionalities of these languages are also directly available in C/C++. Unfortunately, C/C++ is also the most difficult language to master. The time required to develop a new functionality in C/C++ is usually (at least) 20 times greater than with any of the other languages. So, unless you need extreme speed, I suggest that you avoid C/C++.
 

In terms of speed and scalability, JavaScript is clearly the winner (after C/C++). So, if you need to do some simple data processing on large datasets, avoid R and Python: use JavaScript.

The JavaScript engine included inside Anatella is more scalable than the R/Python engines because it processes data “row-by-row” in “streaming mode”. By default, the R/Python engines cannot process data “row-by-row” (in streaming mode) and are thus limited: they can only handle smaller data tables (the whole table must fit inside the limited RAM of the computer). Anatella has a special mode that lets you manipulate large tables (i.e. larger than the RAM) with the R and Python engines (i.e. when you partition the input table), but this mode is usually not as fast as the normal “streaming mode” that is always used in JavaScript.
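The three processing styles described above (streaming, whole table in RAM, partitioned) can be sketched in plain Python. This is only a generic illustration and not the Anatella API: the function names (`generate_rows`, `total_streaming`, `total_in_memory`, `total_partitioned`) and the two-column row layout are assumptions made for this example.

```python
from itertools import islice


def generate_rows(n):
    """Hypothetical row source: yields one (id, amount) row at a time."""
    for i in range(n):
        yield (i, i % 100)


def total_streaming(rows):
    """Streaming mode: only the current row is held in memory."""
    total = 0
    for _row_id, amount in rows:
        total += amount
    return total


def total_in_memory(rows):
    """Whole-table mode: every row is materialized in RAM first."""
    table = list(rows)  # memory use grows with the number of rows
    return sum(amount for _row_id, amount in table)


def total_partitioned(rows, partition_size):
    """Partitioned mode: at most `partition_size` rows are in RAM at once."""
    rows = iter(rows)
    total = 0
    while True:
        partition = list(islice(rows, partition_size))
        if not partition:
            break
        total += sum(amount for _row_id, amount in partition)
    return total
```

All three functions return the same result. They differ only in how many rows must be held in memory at the same time, which is the scalability difference discussed above.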

The JavaScript engine included inside Anatella contains an advanced debugger: you can place breakpoints, look at the variables inside the watch window, etc. This makes debugging JavaScript-based Actions really easy (compared to R/Python Actions, which do not have any debugger).
 

Sometimes, you won’t be able to use JavaScript to code your new Action because:
 

o You need to use a specific library that only exists in R or Python.

The required library can be of two different types:

General Purpose Library: e.g. to access a REST service, to read a specific file format, to post results into a non-SQL database, etc. Your best option is then to use Python.

Machine Learning Library: Your best option is to use R because R still has the largest Machine Learning library and the largest algorithmic library.
 

o You need to do many Matrix/Vector computations: You can use either R or Python. R is usually slightly faster at runtime but Python is easier/faster to write. If you only have vector computations to perform, you might also be interested in the simple and fast Vectorized Calculator Action (see section 5.7.10).
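The guidance given in the comments above can be summarized as a small decision helper. This is only a sketch of the reasoning in this section: the function name `suggest_language`, its parameters, and the priority order in which the conditions are checked are assumptions made for this example.

```python
def suggest_language(needs_extreme_speed=False,
                     needs_ml_library=False,
                     needs_general_library=False,
                     needs_matrix_computations=False):
    """Return a suggested language for a new Action.

    Hypothetical helper: the order of the checks is an assumption,
    not an official rule.
    """
    if needs_extreme_speed:
        # C/C++ is the fastest, but also the slowest to develop with.
        return "C/C++"
    if needs_ml_library:
        # R has the largest Machine Learning library.
        return "R"
    if needs_general_library:
        # REST services, specific file formats, etc.
        return "Python"
    if needs_matrix_computations:
        # Both languages offer matrix/vector facilities.
        return "R or Python"
    # Default choice: fast, scalable (streaming) and easy to debug.
    return "JavaScript"
```

For example, `suggest_language(needs_ml_library=True)` returns `"R"`, while calling it with no arguments returns `"JavaScript"`.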

 

 


Appendix A of this document contains an introduction to the JavaScript language. After reading this short introduction (4 pages), you should be able to code (nearly) any Action using JavaScript.

 

Although JavaScript is not very popular (yet!) in the particular field of “data science”, it is still a language worth learning, simply because of its vast popularity in every other field of “computer science”: see appendix C (section 11.3) for more details about the popularity of JavaScript.

 

 
 
Here are some more comments about each language:
 

C/C++.
 

You can recognize these Actions because they don’t have any JavaScript, R or Python logo.

 
 

Javascript.
 

You can recognize these Actions because they have a JavaScript logo.

 
When developing a new Action in Anatella, the Javascript language is usually the best solution because:
 

o The syntax of the language is cleaner.

o It’s faster than R or Python.

o It can easily run in “streaming” mode inside Anatella (i.e. with a very low RAM consumption).

o There is already a large quantity of JavaScript-based Actions inside Anatella that you can use as a “starting point” to create your own Actions.

o You won’t be limited by the Anatella JavaScript engine because it fully supports the latest version of the JavaScript/ECMAScript language (see sections 11.1, 11.2, 11.5 and 11.6 for more details about JavaScript/ECMAScript).

 

To learn how to create new Actions in JavaScript, see section 9.2.1.

 

 

R

 
You can recognize these Actions because they have a small “R” logo.

 

The integration between R and Anatella gives access to a very large library of algorithms because R is the language that offers the largest library of algorithms. These algorithms cover many different usages: machine learning, time series, clustering, etc. If you need a specific algorithm, there is a good chance that it already exists in R.

 

To learn how to create new Actions in R, see section 9.2.2.

 

 

Python
 

You can recognize these Actions because they have a small “Python” logo.

 

Most developers like Python because of its easy syntax and because it usually leads quickly to a working solution. However, this comes at a cost: the Python language is very (very!) slow, it consumes a large quantity of RAM and it offers a limited set of Machine Learning algorithms compared to the R language.

 

Many coders, unaware of Anatella, use Python to develop simple ETL scripts (e.g. using pandas DataFrames). This is a mistake because ETL scripts developed in Python are:
 

o …several orders of magnitude slower than the same ETL created purely with standard Anatella Actions.

o …using a very large quantity of RAM because, in Python, all data transformations are 100% “in RAM” (although, if you are using the Python engine included inside an Anatella data-flow, Anatella can manage to “stream” your datasets, so that your Python script only uses a reduced quantity of RAM).

o …not scalable to large data volumes (e.g. for Big Data applications). Since all data transformations in Python are “in RAM”, you are limited to the RAM of your server/laptop. In contrast, Anatella has no size limitation because it processes data in “streaming” mode.

o …slower to develop and create than a simple Anatella graph. This usually translates to larger budgets and more delays for the final end-users.

o …difficult to maintain and support compared to the “easy to understand” Anatella graphs. This usually translates to much larger “support and maintenance” costs for the final end-users (who need specific and costly manpower to maintain their operational system).
 

Because of the limitations of Python listed above, we advise you to use Python only when you have a very specific and very complex (and also very rare) data transformation to create that cannot easily be coded using simple Anatella Actions.
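To get a rough feeling for the “in RAM” limitation mentioned above, a back-of-the-envelope estimate can help. The sketch below is only an illustration: the function names, the table sizes and the 8 bytes per cell (e.g. one 64-bit number) are assumptions made for this example, and real Python objects usually need more memory than this lower bound.

```python
GIB = 1024 ** 3  # number of bytes in one gibibyte


def estimated_table_bytes(n_rows, n_columns, bytes_per_cell=8):
    """Lower-bound estimate of the RAM needed to hold a whole table."""
    return n_rows * n_columns * bytes_per_cell


def fits_in_ram(n_rows, n_columns, ram_bytes, bytes_per_cell=8):
    """True if the estimated table size does not exceed the available RAM."""
    return estimated_table_bytes(n_rows, n_columns, bytes_per_cell) <= ram_bytes


# Hypothetical example: 500 million rows x 20 columns on a 16 GiB laptop.
big_table = estimated_table_bytes(500_000_000, 20)   # 80,000,000,000 bytes
print(big_table / GIB)                                # roughly 74.5 GiB
print(fits_in_ram(500_000_000, 20, 16 * GIB))         # False
```

Under these assumptions, such a table cannot be processed entirely “in RAM” on that machine, whereas a row-by-row (streaming) approach never needs to hold the whole table at once.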

 

To learn how to create new Actions in Python, see section 9.2.3.