5.2.8. SAS (.sas7bdat), SPSS (.sav and .por) and STATA (.dta) File Reader


 
Icon: ANATEL~2_img228

 
Function: readStat
 

Property window:

 

ANATEL~2_img227

 

Short description:

 

Reads a table from a SAS (.sas7bdat), SPSS (.sav and .por) or STATA (.dta) File.

 

Long Description:

 
See section 5.1.1. for more information on how to specify the filename of the input file (i.e. you can use relative paths, wildcards, and JavaScript to specify your filename). You can also connect a table containing (many) filenames to the input pin of the ANATEL~2_img228  readStat Action.

 

 

ANATEL~2_img8

You can drag and drop a “.sas7bdat”, “.sav”, “.por” or “.dta” file from an MS File Explorer window into an Anatella graph window: this directly creates the corresponding ANATEL~2_img228 readStat Action inside the Anatella graph.

 

The ANATEL~2_img228  readStat Action supersedes the old ANATEL~2_img233   readSAS Action that was the only choice available in the older Anatella versions (older than v1.38). The ANATEL~2_img228  readStat Action has many advantages compared to the old ANATEL~2_img233   readSAS Action:
 

1. It’s about three times faster.
 

2. It does not require you to install any SAS OleDB drivers on your computer. This means that you can always read your .sas7bdat files, even if your IT department has not granted you “administrative” privileges (such privileges are required to install the SAS OleDB driver).

 

For faster processing speed, the ANATEL~2_img228  readStat Action uses an asynchronous (i.e. non-blocking) I/O algorithm (see section 5.2.6.2. about asynchronous I/O algorithms).
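The general idea behind asynchronous reading (overlapping disk reads with data processing) can be sketched as follows. This is a minimal Python illustration of the concept, not Anatella's actual implementation; the function name `read_chunks_async` and its parameters are hypothetical.

```python
import io
import queue
import threading

def read_chunks_async(stream, chunk_size=4, max_pending=2):
    """Yield chunks from `stream` while a background thread reads ahead."""
    pending = queue.Queue(maxsize=max_pending)  # bounded read-ahead buffer

    def reader():
        # Runs in the background: keeps reading while the consumer is busy.
        while True:
            chunk = stream.read(chunk_size)
            pending.put(chunk)
            if not chunk:  # an empty chunk marks the end of the stream
                break

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = pending.get()
        if not chunk:
            return
        yield chunk

data = io.BytesIO(b"abcdefghij")
chunks = list(read_chunks_async(data))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The consumer only waits when the buffer is empty, so reading the next chunk and processing the current one can proceed at the same time.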

 
 

ANATEL~2_img8

Sometimes, the readStat ANATEL~2_img228 Action does not correctly extract the dates stored in some “date fields” of a .sas7bdat file (i.e. you see a number instead of the actual date and time). When this happens, you need an extra step to convert these numbers to “normal” Anatella dates: use the “to String from Elapsed Time” option of the ChangeDataType ANATEL~2_img9  Action with these parameters:
 

* Reference Time: 19600101 00:00:00

* Elapsed Time Unit: day

Here is a screenshot:

ANATEL~2_img240
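To make the meaning of these two parameters concrete, here is a minimal Python sketch of the same arithmetic, assuming the raw number is a count of days elapsed since the reference time 1960-01-01 00:00:00. The function name `sas_days_to_date` is hypothetical and is not part of Anatella.

```python
from datetime import date, datetime, timedelta

# Reference Time: 19600101 00:00:00
SAS_EPOCH = datetime(1960, 1, 1)

def sas_days_to_date(days):
    """Convert a number of days elapsed since 1960-01-01 into a calendar date."""
    return (SAS_EPOCH + timedelta(days=days)).date()

print(sas_days_to_date(0))      # 1960-01-01
print(sas_days_to_date(21915))  # 2020-01-01
```

If the converted values look implausible (for example, dates far in the future), the field may use a different elapsed time unit than "day".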

 

 

 

ANATEL~2_img8

DISCLAIMER.

 

This Action extracts the data stored inside files originating from various commercial statistical systems (namely SAS, SPSS and Stata). The currently supported file extensions are: .sas7bdat (from SAS), .sav (from SPSS), .por (from SPSS) and .dta (from Stata). These files will be referred to in the rest of this section as the "Data Files".

 

The formats of the Data Files belong to their respective owners. These formats are proprietary and undocumented. Various students and coders all around the world have tried to decipher the internal structures of these proprietary formats. The end result of their work is an open-source library (named "ReadStat") that we use inside this Anatella Action to decode and read the Data Files. Since we had no access to any kind of documentation (official or not) related to the formats of the Data Files, we cannot offer any kind of guarantee on the proper extraction of the data stored inside the Data Files. This means that:

● The safest way to extract data from SAS is to use the old ANATEL~2_img233 readSAS Action (but it requires the SAS OleDB drivers to be installed on your computer and it is quite slow: approximately three times slower than this Action).

 

● The safest way to extract data from the SPSS or Stata environment is to export your data as simple text files (comma separated) and read them back inside Anatella using the "readCSV" Action.

 

To validate that the data were properly extracted from your “Data Files”, you should check:

● For SAS files: the character encoding. If some (accented) characters are incorrect or missing, you might want to use another character encoding (e.g. use the "ISO-8859-16" character encoding rather than the default "UTF-8" character encoding).

 

● For SAS files: the NaN and Null values. SAS handles NaN and Null floating-point numbers in an unusual way. Validate that the Nulls are indeed extracted as Nulls (and not as NaNs) and vice versa.

 

● For all files: the "Date" variables. It is always a good idea to validate them because each system stores dates in its own "exotic" way.
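The first two checks above can be partly automated once the extracted values are available outside Anatella. The following Python sketch is only an illustration under that assumption: the helper names `looks_misdecoded` and `count_missing` are hypothetical, and the encoding test is a heuristic that can miss some cases.

```python
import math
import re

# Pairs such as "Ã©" typically appear when UTF-8 bytes are decoded as ISO-8859-1.
_MOJIBAKE = re.compile("[\u00c2-\u00f4][\u0080-\u00bf]")

def looks_misdecoded(text):
    """Heuristic: True if the text shows typical signs of a wrong character encoding."""
    # U+FFFD is the replacement character produced for undecodable bytes.
    return "\ufffd" in text or _MOJIBAKE.search(text) is not None

def count_missing(values):
    """Return (number of Nulls, number of NaNs) found in a column of values."""
    nulls = sum(1 for v in values if v is None)
    nans = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    return nulls, nans

utf8_bytes = "café".encode("utf-8")
print(looks_misdecoded(utf8_bytes.decode("utf-8")))       # False
print(looks_misdecoded(utf8_bytes.decode("iso-8859-1")))  # True
print(count_missing([1.5, None, float("nan"), 2.0]))      # (1, 1)
```

Comparing the Null and NaN counts with the counts reported by the originating system gives a quick indication of whether the two kinds of missing values were swapped during extraction.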