Sunday, June 12, 2011

Connecting SQL to Stata (integrating SQL commads with statistical analysis scripts)

For few days now I had a pleasure to work with a statistical analysis tool called Stata. Among other tools such as SPSS and SAS it used extensively in business world and scientific community. SPSS is more user friendly version of statistical software that provides same functionality as Stata and SAS being in the league of its own.

Common problem both Stata and SPSS share is a lack of SQL input support. The only support is really unpractical ODBC driver that is not that convenient to use. I got a feeling that it is more of a work around the a solution to the SQL input support. As a response to a need in connecting statistical software to a SQLite database (db I use to store and reshape all the data) I had to write a software the acts like a filter between Stata "*.do" files and Stata.

In essence I just added a command to a Stata do file that is able to open SQLite database and runs SQL statement data from witch will be imported into Stata. To do that I wrote a small program that is registered to interpret "*.so" files and find additional database commands, execute SQL statement and pass the date to a Stata that will continue interpreting the rest of the file containing Stata commands.

Essentially this is what the file looks like:

db=P:\Games\League of Legends\DBgen2\lolbw.db
[[
SELECT name, elo as Elo, SUM(win) AS Wins, SUM(loss) AS Losses,
 SUM(win)+SUM(loss) as Total, 100.0*SUM(win)/(SUM(win)+SUM(loss)) AS "Win %", 
 avg(dmg) as "Dmg Given", avg(dmgtaken) as "Dmg Taken", avg(gold) as Gold,  
 avg(cs) as CS, avg(kill) as Kills, avg(asist) as Assists, avg(death) as Deaths
FROM "match" m
WHERE 1=1
 --AND elo>=1200
 --AND elo<=1400
 --AND substr(id,3,1)=='r'
 --AND substr(map,-2,1)=='5'
 --AND server='EU'
 --AND datetime(date) > datetime('2011-05-24 03:00:00')--  patch 118
 --AND datetime(date) > datetime('2011-06-01 14:00:00')-- patch 119 oriana
GROUP BY name
HAVING Total>20
ORDER BY "Win %" DESC,Wins DESC

]]

set more off
#delimit ;

twoway 
(scatter kills win, sort mcolor (green) msize(large) msymbol(circle_hollow)) 
(scatter assists win, sort mcolor(blue) msize(medium) msymbol(circle_hollow)) 
(scatter deaths win, sort mcolor(red) msize(small) msymbol(circle_hollow)) 
(lowess kills win, noweight lcolor(green)) 
(lowess assists win, noweight lcolor(blue)) 
(lowess deaths win, noweight lcolor(red))
;



As you can see first part of the script connects to SQLite database and runs SQL statement resulting in the data that will later be processed by the rest of the Stata script file resulting in a graph.


EDIT 11-02-2011:

Zip file that contains:

- dosoSQLiteStata.exe
- Source code
- Template "install so extension (template).reg"
- Sample *.SO file

To use just drag and drop created *.so file on the "dosoSQLiteStata.exe"
OR
For registering *.so extension you need to edit .reg file and specify proper path.

http://www.mediafire.com/?ysfkjsxqvb26bu0
(Let me know if I need to upload it again)

5 comments:

  1. Any chance of getting this command that you wrote to interpret *.so files? I'm thinking of installing SQLite now that I see this.
    Thanks
    Gindo

    ReplyDelete
  2. @Gindo Edited the blog to include a link and some basic instructions.

    ReplyDelete
  3. Superb! Much appreciated. Gindo

    ReplyDelete
  4. Please upload again! Maybe to Google docs and make public?

    ReplyDelete
  5. Please upload again! The file is invalid now.

    ReplyDelete