Hi. I'm jon.404, a Unix/Linux/Database/Openstack/Kubernetes Administrator, AWS/GCP/Azure Engineer, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: *nix.

Restraining the madness: Python + SSH + pymssql = happy DBA

Posted 10-28-2009 at 06:42 PM by rocket357
Updated 10-28-2009 at 07:19 PM by rocket357

I routinely get requests from management like "I need a list of the urls and the sizes of databases for our <product_name> sites!" This, I can tell you, is NOT something I want to figure out by hand (not with 300 production databases and a slew of training, demo, test, etc... databases as well). No, there is a better, more relaxed way...a way that includes plenty of time for me to check my e-mail, go get a cup of coffee, and chat with my co-workers...

A bit of architectural overview: our <product_name> sites are built on a LAM$P stack...that is, Linux/Apache/PHP on the frontend and Windows/M$SQL on the backend (Linux/PostgreSQL coming soon!). To complicate matters, some of the machines are shared (web server, database server, or both). Tracing a dedicated site is relatively easy...<agency_or_state_name>_app is the Linux/Apache/PHP box, and <agency_or_state_name>_db is the Windows/M$SQL box. Tracing a shared site is...well...the term "spaghetti" comes to mind...it could live on any one of 14 shared web servers and point at any one of 14 shared database servers (you get the idea). Couple that with the fact that management and support both tend to refer to sites by url (well, ok...customer name, but I've gotten to the point of asking for urls because it saves me from having to look everything up), and a simple "get me a list of database sizes for each url" suddenly becomes a monumental task that could easily take weeks to finish by hand.

I'm waaaaaay too lazy for all that nonsense...so I script the entire messy process with Python. (The only downside is that management now thinks I'm a black box...ask a question, get an answer...they don't seem to realize how much work this would be if I weren't so lazy.)

Ok, the typical <product_name> site has a PHP defines file that lists, among other things, the url, customer name, and so on. Using paramiko (and the ssh.py facade I talked about in this post), I can round-robin through all of the known webservers with something like this:

Code:
# "connection" is the ssh.py facade mentioned above; the command below runs
# on each web server and dumps the active DEFINE lines of every production
# defines file between START/END markers
result = connection.execute("""
for i in `find /var/www -iname <our_defines_file_name>.php -xdev | \
grep -v -E 'test|demo|training|archive|upgrade'`;
do
   echo "### START $i";
   grep "^DEFINE" "$i";
   echo "### END $i"; echo;
done
""")
Whew...what a beast. The quick and dirty explanation:

for file in (command to find defines files), echo out a start tag, then dump the active defines in the file, then echo out an end tag.

You can dump the results to a file for processing later...right now we're just gathering defines files.
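For completeness, here's a minimal sketch of that gathering loop using paramiko directly; the host list, username, and key path are placeholders, and the ssh.py facade from the earlier post wraps this kind of thing up more cleanly.

Code:
import paramiko

WEB_SERVERS = ['web01', 'web02', 'web03']   # placeholder host list

FIND_DEFINES = (
    "for i in `find /var/www -iname <our_defines_file_name>.php -xdev "
    "| grep -v -E 'test|demo|training|archive|upgrade'`; "
    'do echo "### START $i"; grep "^DEFINE" "$i"; echo "### END $i"; echo; done'
)

for host in WEB_SERVERS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username='someuser', key_filename='/path/to/id_rsa')
    stdin, stdout, stderr = client.exec_command(FIND_DEFINES)
    # one defines dump per web server, processed in the next step
    out = open('defines_%s.txt' % host, 'w')
    out.write(stdout.read())
    out.close()
    client.close()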

Once your defines file dir is full, you can start processing the files. In most of my code, I first build the url from the various defines gathered, then use urllib to download the login page for our <product_name> sites (I'll post code for adding PKI support, or just google it). I do this because in a fast-paced environment a defines file sometimes gets left behind or moved to a different location, and sometimes multiple webservers point to the same database...at any rate, downloading the login page lets me verify that this is an active site. (The same tactic works in a cronjob to verify your critical sites are up...or at least that they're serving the login page, which for our software means the database and webserver are both operational.) Once I verify the site is active, I pull the database connection info from the defines output and use pymssql to connect.
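Here's a rough sketch of what that parsing and verification pass might look like, purely for illustration: the define() syntax, the login page path, and the names below are my assumptions about a setup like this one, not code from the post.

Code:
import urllib2

def parse_defines_dump(text):
    # split the "### START <file>" ... "### END <file>" blocks gathered above
    # into {defines_file: {DEFINE_NAME: value}}
    sites, current = {}, None
    for line in text.splitlines():
        if line.startswith('### START '):
            current = line[len('### START '):]
            sites[current] = {}
        elif line.startswith('### END '):
            current = None
        elif current and line.startswith('DEFINE'):
            # crude parse of DEFINE('SOME_NAME', 'some_value');
            parts = line.split("'")
            if len(parts) >= 4:
                sites[current][parts[1]] = parts[3]
    return sites

def site_is_alive(url):
    # hit the login page; an HTTP 200 means apache and the database behind
    # it are alive enough to serve it
    try:
        return urllib2.urlopen(url + '/login.php', timeout=10).getcode() == 200
    except Exception:
        return False

With the host, database, user, and (encrypted) password pulled out of the defines, the pymssql step is short: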

Code:
import pymssql

# db_user, db_pass, db_host, db_name and url come from the parsed defines;
# decrypt_pass() is a helper for the encrypted password, and csvfile was
# opened earlier in the script
con = pymssql.connect(user=db_user, password=decrypt_pass(db_pass), host=db_host, database=db_name)
cur = con.cursor()
cur.execute("""EXEC sp_spaceused""")
try:
   # first row of the first result set: database name, size, unallocated space
   csvfile.write('"%s",%s\n' % (url, cur.fetchall()[0]))
except:
   # this is a blog post...so
   # I'm not going to list each possible exception
   # that could occur when writing a database result
   # set to a file...haha.
   pass
Two caveats about pymssql: if you use threads to create pymssql connections, you can hit FreeTDS's TDS_MAX_CONN limit relatively easily. One way around that is to use _mssql (pymssql's lower-level interface) and its selectdb(<db_name>) call to reuse connections (or recompile FreeTDS with the TDS_MAX_CONN limit raised). Also, pymssql wraps everything in a transaction...nice and safe, but it kills certain commands (like most DBCC commands). _mssql does NOT wrap statements in a transaction, so you can use it to run DBCC commands.
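For illustration, here's roughly what the _mssql route looks like; the connection arguments and method names below follow the _mssql docs (and they shifted a bit between pymssql releases), so treat this as a sketch rather than copy-paste code.

Code:
import _mssql

# same credentials as the pymssql call above; _mssql statements are not
# wrapped in an implicit transaction, so DBCC works here
con = _mssql.connect(server=db_host, user=db_user, password=decrypt_pass(db_pass))
con.execute_non_query('USE %s' % db_name)   # reuse one connection per server
con.execute_query('DBCC SQLPERF(LOGSPACE)')
for row in con:                             # rows come back as dict-like objects
    print row
con.close()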

Once that's done, I like to take the csv file and import it into OpenOffice. Then I save it as an Excel spreadsheet and attach it to an e-mail for all of the relevant people. Management thinks the data will be available in 3 days, and it's available in 30 minutes (30 minutes of me standing around chatting at the water cooler). Win for me.

And it's all thanks to Guido and the other Python developers =)