MySQL – Finding Tables With Certain Columns

I’m working on munging some data on our main application and needed to build a set of tables that I need to look at. We have 140 tables in the app, and being lazy, I figured I could just write something to build the list for me.

 columns = %w(property_id town_id area_id subdivision_id county_id property_type_id status_id service_id)
 ActiveRecord::Base.connection.execute("show tables").each do |rs|
   table = rs.first
   columns.each do |col|
     ActiveRecord::Base.connection.execute("show columns from `#{table}` like '#{col}';").each do |row|
       puts "table: #{table} => #{row.first}"
     end
   end
 end

The columns variable is an array of the column names I’m looking for. In the outer loop, I get the full list of tables. I then iterate over each table, checking whether it contains one of the columns I’m looking for. I threw the above code in a rake task to run. It only works for MySQL, but it would be easy enough to adapt for PostgreSQL. Unfortunately, the ‘show columns from …’ query doesn’t accept OR conditions, so each column needs its own query, but this is something I’ll only need to run once.
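If your MySQL user can read information_schema, the per-table loop can be collapsed into a single query. A sketch of the idea (DATABASE() assumes you’re connected to the app’s schema; the ActiveRecord call is commented out since it needs a live connection):

```ruby
# Build one query against information_schema instead of N "show columns" calls.
columns = %w(property_id town_id area_id subdivision_id county_id property_type_id status_id service_id)
placeholders = columns.map { |c| "'#{c}'" }.join(", ")
sql = <<-SQL
  SELECT table_name, column_name
  FROM information_schema.columns
  WHERE table_schema = DATABASE()
    AND column_name IN (#{placeholders})
  ORDER BY table_name, column_name
SQL
# ActiveRecord::Base.connection.execute(sql).each do |table, col|
#   puts "table: #{table} => #{col}"
# end
```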

Posted in mysql, ruby | Leave a comment

Capistrano for Server Monitoring

Capistrano is a fantastic tool for managing deploys, but it’s capable of far more than just pushing up code and restarting servers. Here are couple of tricks for harvesting stats and managing servers with Capistrano.

Suppose you need to check memory usage on a group of machines with some regularity. You could of course just log in to each machine and run:

 free -m

but this doesn’t scale all that well. Capistrano provides a simple method for executing a shell command and capturing the output: capture

  task :memory_usage, :roles => :server_group do
    puts capture("free -m")
  end

Unfortunately, this gives you memory usage for only the first machine listed in the server_group group. More often, it’s helpful to check memory usage across a cluster of machines, perhaps to find out which one is swapping. The run method can help us here: it executes the provided command across a group of servers:

  task :restart_passenger, :roles => :passenger do
    run("touch #{current_path}/tmp/restart.txt")
  end

The above task runs touch on tmp/restart.txt on all servers in the passenger group. The run command can also take a block, which allows us to capture the response:

  task :memory_usage, :roles => :server_group do
    results = {}
    run("free -m") do |channel, stream, data|
      if stream == :out
        results[channel[:server].to_s] = data
      end
    end
    results.each_pair do |server, data|
      puts server
      puts data
    end
  end

The run method block takes three arguments: channel, stream identifier, and data. Channel contains information about the remote process and may be used to send data back to it. Stream is the stream type: :err for stderr and :out for stdout. Data is the output received from the command. The above code should display memory usage for each server defined in the server_group group.

One gotcha to be aware of with run: if the command fails on one server, the entire task fails. To get around this, you can append true to the shell command:

  run("ls -l /home/foo; true") do |channel, stream, data|
    if stream == :out
      results[channel[:server].to_s] = data
    end
  end

This code will run across all servers, even if the directory /home/foo does not exist on a particular machine.

Another use case for the run command is to tail logs across multiple servers.

  task :delayed_jobs_log, :roles => :jobs do
    trap("INT") { puts 'Interrupted'; exit 0; }
    run "tail -f #{shared_path}/log/delayed_job.log" do |channel, stream, data|
      if stream == :out
        data.split(/\n/).each do |row|
          puts "#{channel[:host]}: #{row}"
        end
      end
    end
  end

This allows us to tail the Delayed Job log across all our DJ servers, printing the host before each log line. Typing control-c ends the tail, just as it would on the server itself.

Capistrano is a very powerful tool capable of far more than just deploys. It is well worth your time exploring what you can do with it for server management. Much of Capistrano’s power comes from the Net::SSH library, which is itself worth checking out for remote server management.

Posted in capistrano, dev ops, ruby, sys admin | Leave a comment

Useful Postgres Queries

The more I work with Postgres, the more I prefer it to MySQL. Two years ago we migrated a sizable database (42 GB on disk) from MySQL to gain access to geospatial searching with GIS. We’ve since replaced Sphinx with Elasticsearch, which handles spatial searches much faster. We no longer use the GIS extension, but the Postgres database remains. We currently run 9.1.

This post is intended to be a holder for the queries I find useful from time to time, but can’t seem to commit to memory.

Find all queries that have been running for at least two minutes:

 select procpid,datname,current_query,now() - pg_stat_activity.query_start 
   as duration 
 from pg_stat_activity 
 where pg_stat_activity.current_query <> ''::text and 
   now() - pg_stat_activity.query_start > interval '2 minutes';

To kill a long running process:

 kill -INT procpid

or, more safely

  select pg_cancel_backend(procpid);
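To cancel every long-running query in one shot, pg_cancel_backend can be combined with the pg_stat_activity query above. A sketch (column names match 9.1, where the pid column is procpid; the two-minute interval is illustrative):

```sql
select pg_cancel_backend(procpid)
from pg_stat_activity
where current_query <> ''::text
  and now() - query_start > interval '2 minutes';
```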

Adding a user to postgres and pg_bouncer:

  • Create a new user/role in the Postgres DB:
     CREATE USER username WITH PASSWORD 'password';
  • Add permission:
  • Generate pg_auth line:
     select '"'||rolname||'"'||' "'||rolpassword||'"' from pg_authid;

Index Usage:

 select * 
 from pg_stat_all_indexes 
 where schemaname = 'public' 
 order by relname asc, idx_scan desc;
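The same view can also flag indexes that are never used and may be candidates for removal. A sketch (note that idx_scan counts since the last stats reset, so confirm the stats window before dropping anything):

```sql
select relname, indexrelname, idx_scan
from pg_stat_all_indexes
where schemaname = 'public'
  and idx_scan = 0
order by relname;
```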

View slow queries:

  select procpid, datname, current_query,
    now() - pg_stat_activity.query_start as duration 
  from pg_stat_activity 
  where pg_stat_activity.current_query <> ''::text and 
    now() - pg_stat_activity.query_start > interval '0.5 seconds' 
  order by duration desc;

How far is the slave behind the master?

 SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;

Posted in postgres | Leave a comment

LibRETS vs Rets4r

You don’t stumble into RETS without a very specific need.  RETS is an XML standard for communicating with real estate multiple listing services. It is commonly used to perform queries against MLS vendor systems.

In the past year, I’ve been focusing on integrating a growing number of MLSs into our main product, Sequoia.  I’ve used two different libraries to accomplish the low level interaction with a specific RETS server: RETS4R and more recently LibRETS, using the Ruby bindings.  As documentation is scarce, I’d like to provide some insight for the community.  First, an overview of the two libraries:

Rets4r is an attempt at a pure Ruby implementation of RETS.  The most recent versions use Nokogiri for XML parsing of properties, but REXML for metadata.  Rets4r was the library I initially used as the basis for our Generic MLS updater.  It’s a solid implementation, and I’ve been able to configure it for all the major MLS data providers we use.  It uses Ruby’s net/http, which is the only real issue I’ve consistently struggled with. Being a pure Ruby library, installation is a snap (gem install rets4r).

LibRETS is a C++ implementation of a RETS client from the National Association of Realtors.  I’ve just completed rewriting our Generic MLS updater on top of LibRETS using the Ruby bindings.  LibRETS is quite a bit more challenging to set up (OS X 10.6 & Ubuntu 10.04).  It does seem to be a touch faster for property queries, and is much faster with metadata retrieval.  API documentation is only available via the C++ documentation.

With Homebrew, installing LibRETS on OS X is a snap: brew install librets.  (If you have multiple versions of Ruby on your system, LibRETS will bind to the default version.  Be careful if you’re using a non-default Ruby while trying to access LibRETS.) For Ubuntu, I installed as follows:

 sudo apt-get install libexpat1-dev libcurl3-dev libboost-dev libboost-filesystem-dev cantlr libantlr-dev swig libboost-program-options-dev ruby-dev

 tar -zxvf librets-1.5.2.tar.gz
 cd librets-1.5.2

 sudo ln -s `which cantlr` /usr/bin/antlr

 ./configure --disable-java --disable-perl --enable-fPIC --enable-shared_dependencies --prefix=/usr/bin
 sudo make install

Our MLS updater is designed to allow us to connect to any RETS server, map all columns to our database schema, and run continuous rolling updates, keeping our MLS data in sync with each data provider. The single largest issue with the initial version, written on top of Rets4r, was connectivity problems in Ruby’s Net::HTTP library, which resulted in lost connections and malformed XML. LibRETS uses libcurl and builds objects via streaming XML, which brings its own list of challenges. I’ve been able to mitigate most of these issues by minimizing the time a connection stays open and retrying failed requests.
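The retry logic mentioned above can be as simple as a small wrapper. A minimal sketch (the attempt count and backoff are illustrative, and the search call in the trailing comment is hypothetical, not the actual LibRETS API):

```ruby
# Retry a block up to max_attempts times, backing off between attempts.
def with_retries(max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    raise e if attempts >= max_attempts
    sleep(attempts) # simple linear backoff before retrying
    retry
  end
end

# with_retries { client.search(resource, query) } # hypothetical RETS search call
```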

Which one’s better? It’s hard to say, but I’d lean towards LibRETS. The connection is more configurable from a settings perspective. If your MLS RETS server requires a user agent password, RETS4R can handle it, but the implementation is kludgy at best; LibRETS lets you set the user agent password optionally. Retrieval of metadata is a lot cleaner with LibRETS. RETS4R does return property data as an array of hashes keyed by column, something I had to write myself for LibRETS. Although the documentation for LibRETS covers the C++ API, it’s been very helpful. RETS4R has very limited documentation, but enough to get you up and running with basic searching and displaying of property data. One concern with LibRETS is that the main developers are no longer actively working on the library. I haven’t found any bugs, but I would feel better if it were a more actively developed project. RETS4R has a small community, and is on Github.

Posted in Uncategorized | 3 Comments