In case you haven’t heard:
GitHub’s production database was accidentally dropped. Oops. Still, it is good that they admitted fault and are doing something about it.
I use a Mac and I frequently use Time Machine. I also use Git and GitHub. Together they make a double backup solution that works really well, which is good because I just accidentally deleted the source code for my latest project on the Unix command line. (For the non-Unix reader: if you delete via the Finder window the files go to the Trash, but if you delete from a terminal window they bypass the Trash and disappear immediately.)
Thankfully, I just plugged in my external drive, restored the code from a point earlier this morning, and then ran git pull to resync from GitHub. Voila, I’m back.
The importance of keeping constant backups cannot be overstated. I’ve actually started to use Git for normal files on my hard drive too. This means that when I accidentally clobber a document I might have a backup of it in my repository. (Of course a little discipline helps when using version control systems like Git.) For the rest of the world, Time Machine (or other backup software) works just fine.
I’m also reminded of a couple of LifeHacker articles that came out in 2009 and 2006, respectively.
This is why I am not a sysadmin. I just spent the better part of this evening figuring out how to get my latest app deployed to my production environment with Capistrano. Sure, deployment is in fact easyish, but you have to know the dance of how it all comes together, and there are a lot of steps. I wrote this post to pull a whole bunch of different information together in one place so you don’t have to read many separate links.
So first off, here’s my setup:
This first part is fairly easy thanks to:
capify .
If you end up creating the Capfile yourself you should be aware that capify does add a few extra goodies to the file:
# Capfile
load 'deploy' if respond_to?(:namespace) # cap2 differentiator
Dir['vendor/plugins/*/recipes/*.rb'].each { |plugin| load(plugin) }
load 'config/deploy' # remove this line to skip loading any of the default tasks
There wasn’t much I needed to do to the Capfile itself since I don’t have any custom tasks. But there was a bit of setup that needed to happen in the deploy.rb file. One thing to note is that I ended up switching my SSH port to a different one when I set up my server, so that explains the port on the roles:
# deploy.rb
set :application, "coolapp"
set :repository, "git@github.com:MYUSER/MYREPO.git"
set :deploy_to, "/var/www/APPNAME"
set :user, "app_deployer"
set :branch, "release"
set :git_enable_submodules, 1
set :use_sudo, false
set :scm, :git
role :web, "www.coolapp.com:12345"
role :app, "www.coolapp.com:12345"
role :db, "www.coolapp.com:12345", :primary => true
# Passenger stuff
namespace :deploy do
  task :start do ; end
  task :stop do ; end
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
  end
end
The :repository is my private Github repo. The :deploy_to points to the directory that will hold this Capistrano-based deployment—there is a trick I had to do on /var/www. I created a new Unix user app_deployer. I also have a release branch that is only used for Production pushes. I have submodules, so :git_enable_submodules is set. Since this is a non-privileged user we can’t use sudo. And finally, on the roles you’ll see that I’m SSHing to the production server www.coolapp.com at port 12345.
Rather than deploy straight from the master branch (which I think is just a horrible idea), I created a branch called release. To sync master to release I wrote a quick script:
#!/bin/bash
#
# script/sync_master_to_release.sh
git checkout release
git merge master
git checkout master
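One thing worth knowing about the shell version: it keeps going if a step fails, so a failed checkout could end up merging into whatever branch happens to be checked out. Here is a minimal Ruby sketch of the same three steps that stops at the first failure. The method names (sync_commands, sync_branches) and the injectable runner are my own illustration, not part of Git or Capistrano.

```ruby
# Hypothetical Ruby take on script/sync_master_to_release.sh.
# Method names here are illustrative, not part of any tool.

# Build the git commands that merge `from` into `to` and then
# switch back to `from`.
def sync_commands(from = "master", to = "release")
  [
    ["git", "checkout", to],
    ["git", "merge", from],
    ["git", "checkout", from]
  ]
end

# Run each command in order and stop at the first failure.
# `runner` is injectable so the logic can be exercised without a repo;
# by default it shells out via Kernel#system.
def sync_branches(from = "master", to = "release", runner = nil)
  runner ||= lambda { |cmd| system(*cmd) }
  sync_commands(from, to).each do |cmd|
    return false unless runner.call(cmd)
  end
  true
end

# Typical use from inside the project directory:
#   sync_branches or abort("sync failed")
```

Passing a lambda as the runner makes it easy to do a dry run that only prints the commands.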
I’ve started using Github a lot more. While I’m a big fan of having an in-house server to hold my oh-so-precious source code, I just can’t be bothered to admin it or set one up. Plus I want to be able to connect with other collaborators. So Github seems to be the best way right now to do so.
I set up a paid account so I could get some private repositories. Just a couple clicks of the mouse later and I had a repo set up for this project. The setup instructions are really clear. Since I had a local Git repo for this project on my laptop, all I had to do was add Github as a remote:
git remote add origin git@github.com:MYUSER/MYREPO.git
(I did in fact have an origin already set up, but I just had to remove the config info in .git/config before running the above line.)
Doing a git push origin master loaded up the Github repo in a flash. And to get the release branch up there it’s just a case of doing a git push origin release to make it aware there’s another branch. Subsequent git push-es pushed up all the objects for both branches.
At this point what I had was an app that was ready to be released to the world and I needed Capistrano to do its deployment magic. But I didn’t just want to grant full access to the deploy script. I don’t want the chance that someone could add tasks that could walk all around the filesystem if they got a hold of the SSH keys…
The great thing about Unix/Linux is you have all these permissions so you can lock down directories. The annoying thing about Unix/Linux is that you have to configure all of these permissions.
I SSHed into my production environment as my root user via a plain ol’ terminal and created a new user in a new group:
sudo /usr/sbin/useradd app_deployer
sudo /usr/bin/groupadd deployments
(There was also a step to update the password for app_deployer.)
Then I had to associate the new user and the web server user with the newly-created group in the /etc/group file:
deployments:x:550:app_deployer,apache
I’m deploying to /var/www/APPNAME, but when you run Capistrano for the first time it wants to create certain directories. Just for a few minutes I changed the main web directory to world-write permissions:
sudo chmod 777 /var/www
Finally, I could then run Capistrano for the first time:
cap deploy:setup
(Note: Remember that even though I SSHed in via a terminal as a root user, Capistrano’s config file has been set to log in via the “app_deployer” user.)
Once the directories were set up I could twiddle the permissions back to normal locked-down mode:
sudo chmod 755 /var/www
sudo chown -R app_deployer /var/www/APPNAME
sudo chgrp -R deployments /var/www/APPNAME
sudo chmod -R g+s /var/www/APPNAME
This sets the /var/www directory back to world-accessible but NOT world-writable, and makes app_deployer the owner of the APPNAME directory so that it can make all the changes it wants there. (The “g+s” setgid bit ensures that any new files and directories created under APPNAME inherit the deployments group.)
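To make those numbers a little less magical, here is a small Ruby sketch that decodes an octal mode into the familiar ls-style string. The helper name mode_string is made up for this post; it only does bit arithmetic and never touches the filesystem.

```ruby
# Decode a numeric Unix mode into an `ls -l` style permission string.
# mode_string is a hypothetical helper for illustration only.
SETUID = 0o4000
SETGID = 0o2000
STICKY = 0o1000

def mode_string(mode)
  # One entry per triplet: bit shift, the special bit that overlays the
  # execute slot, and the letters used when execute is set or unset.
  triplets = [[6, SETUID, "s", "S"], [3, SETGID, "s", "S"], [0, STICKY, "t", "T"]]
  triplets.map do |shift, special, with_x, without_x|
    bits = (mode >> shift) & 0o7
    can_exec = (bits & 0o1) != 0
    last =
      if (mode & special) != 0
        can_exec ? with_x : without_x
      else
        can_exec ? "x" : "-"
      end
    ((bits & 0o4) != 0 ? "r" : "-") + ((bits & 0o2) != 0 ? "w" : "-") + last
  end.join
end

puts mode_string(0o777)   # the temporary world-writable /var/www
puts mode_string(0o755)   # locked back down
puts mode_string(0o2755)  # 755 plus g+s (setgid) on a directory
```

The setgid bit shows up as an "s" in the group's execute slot, which is the thing to look for in ls -l output after running the chmod g+s step.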
Now, we’re getting close to deploying, but one thing we need to do is grant our production environment a way to clone our private repo. I logged out of the production terminal session and logged back in instead as the new user:
ssh -p 12345 app_deployer@www.coolapp.com
Then it was a matter of generating some RSA keys:
mkdir .ssh
cd .ssh
ssh-keygen -t rsa
After this I had a newly-minted id_rsa.pub. I turned to my web browser, went to Github, clicked on the Admin link on my app repo, then clicked Deploy Keys. There I created a new key and put up the contents of that public key.
Perfect. Now I can really do the deployment via Capistrano:
cap deploy
At this point messages scrolled by as the release branch was cloned from GitHub onto the production server into Capistrano-versioned directories. I did run into a couple of errors at first, but that was just because I had some issues with Git submodules. I think this situation was specific to my setup, and in the end I had to juggle some RSA public keys to get things working.
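If “Capistrano-versioned directories” sounds abstract: as far as I understand it, Capistrano 2 keeps each deploy in a timestamped directory under releases/, points a current symlink at the newest one, and the restart task touches tmp/restart.txt so Passenger reloads the app. Here is a rough Ruby sketch of that layout, run against a throwaway temp directory. simulate_deploy is a made-up helper and the timestamps are invented; this is not how Capistrano is implemented, just the shape of what ends up on disk.

```ruby
require "fileutils"
require "tmpdir"

# Mimic the on-disk result of one deploy under `deploy_to`.
# simulate_deploy is a hypothetical helper, not Capistrano code.
def simulate_deploy(deploy_to, timestamp)
  release = File.join(deploy_to, "releases", timestamp)
  current = File.join(deploy_to, "current")

  # Roughly what deploy:setup prepares: releases/ and shared/.
  FileUtils.mkdir_p(File.join(deploy_to, "shared"))
  # A real deploy would clone the release branch into this directory.
  FileUtils.mkdir_p(File.join(release, "tmp"))

  # Repoint the current symlink at the new release.
  File.unlink(current) if File.symlink?(current)
  File.symlink(release, current)

  # Same effect as the deploy:restart task: touch tmp/restart.txt.
  FileUtils.touch(File.join(current, "tmp", "restart.txt"))
  release
end

Dir.mktmpdir do |root|
  simulate_deploy(root, "20100601120000")
  newest = simulate_deploy(root, "20100602093000")
  puts Dir.children(File.join(root, "releases")).sort
  puts File.readlink(File.join(root, "current")) == newest
end
```

Because older releases stay on disk, going back to a previous version is mostly a matter of pointing current somewhere else.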
(*One thing I didn’t mention was that I had actually done a previous deployment manually where I had SFTPed up my entire codebase to a temp directory and had done the database migrations already. Because of this I didn’t need to ask Capistrano to set up my database too—creating a user for this app, creating a MySQL catalog for the data, initializing the schema. It was also the case I got Passenger working with the test deployment.)
Setting up Passenger is really easy and there are tons of guides explaining how to do it on Linux. The essence of it is:
Following the Passenger setup instructions there were a few lines that needed to be appended to the httpd.conf file. Instead I created a separate config file in the conf.d directory:
# passenger.conf
LoadModule passenger_module /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.15/ext/apache2/mod_passenger.so
PassengerRoot /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.15
PassengerRuby /usr/bin/ruby
PassengerMaxPoolSize 5
RailsEnv production
That config limits Passenger to 5 instances and ensures that the app is in Production mode. Next I added a config file for the new app:
# coolapp.conf
<VirtualHost *:80>
  ServerAdmin contact@coolapp.com
  ServerName coolapp.com
  ServerAlias www.coolapp.com
  DocumentRoot /var/www/coolapp/current/public
  <Directory /var/www/coolapp/current/public>
    AllowOverride all
    Options -MultiViews FollowSymLinks
    Allow from All
  </Directory>
</VirtualHost>
After a quick Apache restart the app came online w/out any problems, and now whenever I want to release to production all I have to do from my laptop is:
script/sync_master_to_release.sh
git push
cap deploy
So…
I thought about the situation a little bit more. I don’t fully understand what’s going on but I think what I’m experiencing is a resource constraints problem. I *know* I’m on a small VPS (512 MB at the time of this writing) and I know that basic system services do consume *some* amount of RAM. I’ve played with all sorts of things and I wrote a script to try to poll for memory usage so I can see what might be causing issues. This is what I found:
I’m on an OpenVZ VPS so I have a bursting capability of up to 1 GB; more than a few seconds above 512 MB of RAM and the virtualization will stop allowing apps to allocate any more. I’m running Passenger, Rails 2.3.8, and MongoMapper. Apache has a few httpd processes running, and there are system daemons and a lonely MySQL daemon. I don’t do email on the box or any batch processing. It’s just a web server right now.
In its most basic state, just after rebooting, the server sits at about 200 MB of used RAM. As soon as I make an HTTP request to it, memory usage goes up to about 307 MB. Fetching some Mongo records raises it to 353 MB, and doing a lot of fetches makes RAM usage top out at about 367 MB. The key is that I finally set the Passenger config PassengerMaxPoolSize to 1, and that appears to be what limits my RAM consumption.
Now, what I think was happening earlier is that I didn’t have a limit set on the pool size and any time there’s more than 1 connection the amount of RAM usage would exceed 512 MB. Bursting for a second is OK. But the problem is that some connections are still open. And when the Mongo driver tries to use a cursor that was closed (because it exceeded the available RAM) then I start to run into errors in the log like:
Query response returned CURSOR_NOT_FOUND. Either an invalid cursor was specified, or the cursor may have timed out on the server.
TypeError (no c decoder for this type yet (0)).
and the most telling error:
ActionView::TemplateError (failed to allocate memory) on line #9 of app/views/xxxxx/xxxxx.haml
SO. I’ve left the config line PassengerMaxPoolSize 1 in the config and now everything seems to be working great. Sure, it’s a little slower than I’d like it to be but at least it WORKS.
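For the curious, here is the back-of-the-envelope math as a tiny Ruby sketch, using only the numbers I measured above. The big assumption, which I have not measured, is that every extra Passenger instance costs about as much RAM as the first one did. The method names are just for illustration.

```ruby
# Rough estimate built from the measurements in this post.
# Assumption (not measured): every extra Passenger instance costs about
# as much RAM as the first one did.
BASELINE_MB = 200  # freshly rebooted server
PEAK_ONE_MB = 367  # one instance after lots of fetches
LIMIT_MB    = 512  # RAM available before bursting

def per_instance_mb(baseline, peak_with_one)
  peak_with_one - baseline
end

def estimated_usage_mb(baseline, per_instance, instances)
  baseline + per_instance * instances
end

def max_pool_size(baseline, per_instance, limit)
  (limit - baseline) / per_instance  # integer division rounds down
end

cost = per_instance_mb(BASELINE_MB, PEAK_ONE_MB)
(1..3).each do |n|
  puts "#{n} instance(s): ~#{estimated_usage_mb(BASELINE_MB, cost, n)} MB"
end
puts "largest pool under #{LIMIT_MB} MB: #{max_pool_size(BASELINE_MB, cost, LIMIT_MB)}"
```

By that estimate two instances land around 534 MB, which is already past 512 MB, so a pool size of 1 is the only setting that fits.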
I just saw this via Reddit: http://ducktypo.blogspot.com/2010/06/new-ruby-ecosystem.html
This summer I’m doing a *lot* of reading. One of my goals is to finally get a proper education on how Ruby, Rails, and gems work. When I started doing Rails in ’06 it was just random experiments. In ’08 I got serious and started using it for contracting.
For me one of the issues with working 60+ hours a week is that I rarely get time to stop and just read stuff. My general philosophy about getting stuff done is that I read about something new and shiny, play around with a small experiment, and then try to incorporate that into my current workflow. But if time constraints are tight then I often don’t get a chance to delve into any topic deeply, so I may not be using the most efficient ways of doing things. Especially with something like Rails where the whole platform evolves very quickly, I don’t get a chance to go back and refactor old code. Ultimately the end product becomes this kind of katamari of different partially-adopted practices. Sure, the whole project works and the logic is solid, but it is far from elegant in either code or design.
Well I’ve decided that has to stop somewhere so a big part of what I’m doing this summer is doing a brain reset. I’m trying to get back to being current. I have a list 10 miles long, and I am making headway. Top topics include: Rails 3, how to write gems, web accessibility, grid-based design, object-oriented JavaScript, better jQuery practices, maybe learning a new programming language (like Scala), and NoSQL.
Holy cow this decreased the runtime of a particular processing task by 50%! Awesome! http://github.com/zdennis/ar-extensions/tree/master/ar-extensions/
Long long ago I had been doing Perl CGIs and I remembered from back then that when doing intensive database inserts it’s a good idea to do an INSERT OR UPDATE but I couldn’t find an equivalent ActiveRecord version. Then I just happened to be Googling around today (since I’m in the middle of doing some code cleanup) and I found ar-extensions. Wow. Just what I needed!
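The idea behind that kind of speed-up, as I understand it, is sending many rows in one statement instead of one INSERT per row. Below is a toy Ruby sketch that builds a MySQL-style multi-row INSERT ... ON DUPLICATE KEY UPDATE string. It is not ar-extensions code and does not use its API; bulk_upsert_sql, quote, and the widgets table are made up for illustration, and the quoting is far too naive for untrusted input.

```ruby
# Build one multi-row upsert statement (MySQL syntax).
# bulk_upsert_sql and quote are hypothetical helpers, not part of
# ar-extensions or ActiveRecord.
def quote(value)
  case value
  when nil     then "NULL"
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"  # naive escaping, demo only
  end
end

def bulk_upsert_sql(table, columns, rows, update_columns)
  values  = rows.map { |row| "(" + row.map { |v| quote(v) }.join(", ") + ")" }
  updates = update_columns.map { |c| "#{c} = VALUES(#{c})" }
  "INSERT INTO #{table} (#{columns.join(', ')}) " \
    "VALUES #{values.join(', ')} " \
    "ON DUPLICATE KEY UPDATE #{updates.join(', ')}"
end

puts bulk_upsert_sql(
  "widgets",  # made-up table name
  %w[id name price],
  [[1, "sprocket", 9.5], [2, "O'Ring", nil]],
  %w[name price]
)
```

One round trip for the whole batch is where most of the saving comes from; the ON DUPLICATE KEY UPDATE part is what turns the insert into an insert-or-update.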
I find it weird that to migrate from your current Rails database to a previous migration step you use “db:rollback”. Since we already use the language “db:migrate” and “up” or “down” it would have made more sense to me that db:migrate:down without arguments does exactly what db:rollback does.
ARGH. Seriously: ARGH.
I have spent almost the entire day trying to migrate our Rails apps to use the ActiveRecordStore without errors and I finally just achieved success. This was mostly a case of incorrect documentation retrieved by Googling. The ultimate solution really came with me actually opening up the Rails gems and stepping through the actual ActionController and CGI::Session code. So, here’s my solution.
So, why move away from the cookie store? Well, many reasons. Firstly, cookies only hold so much session data before they barf: when you look at the cookies created on your computer you see very long encoded strings, and browsers commonly cap a cookie at around 4 KB, so there is only a small amount of room. Secondly, there are temp session files created on your hard disk and those will eventually consume all of your inodes. Plus I think the database is just plain faster when you have a large number of sessions.
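To get a feel for the size problem, here is a rough Ruby sketch. It assumes the cookie store serializes the session with Marshal and then Base64-encodes it, and that a cookie is capped at about 4 KB; the real store also appends a signature, which this ignores. cookie_payload_size and fits_in_cookie? are made-up helpers, and the session contents are invented.

```ruby
# Estimate how big a session gets once it is serialized for a cookie.
# Assumes Marshal + Base64 and ignores the signature the real store adds.
COOKIE_LIMIT = 4096  # bytes; the commonly cited per-cookie cap

def cookie_payload_size(session)
  [Marshal.dump(session)].pack("m0").bytesize  # "m0" = Base64 without newlines
end

def fits_in_cookie?(session)
  cookie_payload_size(session) <= COOKIE_LIMIT
end

small = { "user_id" => 42 }
big   = { "user_id" => 42, "draft" => "x" * 4000 }  # made-up session data

puts "small: #{cookie_payload_size(small)} bytes, fits: #{fits_in_cookie?(small)}"
puts "big:   #{cookie_payload_size(big)} bytes, fits: #{fits_in_cookie?(big)}"
```

Base64 inflates the data by about a third, so the usable space is noticeably less than the nominal limit.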
At first I started by tweaking the environment.rb files by uncommenting the magic line:
config.action_controller.session_store = :active_record_store
And then the instructions say to run the Rake task that creates a database migration:
rake db:sessions:create
Cool. And the final thing was to modify the CSRF protection line in application.rb:
protect_from_forgery :secret => 'xxx_YOUR_SECRET_HERE_xxx'
And that should have done it, right? Well, what really consumed today was that I *cannot* use the default session storage table, sessions. No no no: since more than one application uses this same database, I needed to create tables like app1_sessions and app2_sessions.
This is where I went crazy.
The problem is that I’m running Rails 2.2.2 and when you Google you start pulling up Rails 2.3 documentation. The configuration adjustments I tried kept turning out to be ineffective or referenced classes that simply did not exist. So what’s a person to do? Well: use the source!
So I peeked into my Mac’s /Library/Ruby/Gems/1.8/gems/rails-2.2.2 directory and poked around that gem and also the actionpack-2.2.2 gem. That’s where I finally figured out that the proper adjustment to the configuration was to modify the three accessible attributes:
CGI::Session::ActiveRecordStore::Session.table_name = 'app1_sessions'
CGI::Session::ActiveRecordStore::Session.primary_key = 'id'
CGI::Session::ActiveRecordStore::Session.data_column_name = 'data'
And if I had done things like this everything would have been fine. But the other compounding problem was that if you do a bit of Googling you end up with this one example that says to set the primary_key to something other than “id” like “session_id”. The problem with this is later when the CGI::Session::ActiveRecordStore goes to try to persist the data it starts echoing out log messages like:
WARNING: Can’t mass-assign these protected attributes: session_id
I was like: “WHAT?! Where is this coming from?!” It took a while to track it down to Line 292 of active_record_store.rb. Yes, this line was the cause of the above message:
@session = @@session_class.new(:session_id => session_id, :data => {})
So the “brilliant” thing was that part above where I had assigned primary_key to “session_id” originally. That’s plain wrong. It should just be “id”. That makes things simple. In fact, if you read the docs inside active_record_store.rb you’ll see this wonderfully friendly comment:
# Note that setting the primary key to the +session_id+
# frees you from having a separate +id+ column if you
# don't want it. However, you must set
# <tt>session.model.id = session.session_id</tt> by hand!
# A before filter on ApplicationController is a good place.
I read that and finally I realized my mistake. After a restart things now work perfectly. I just wish this wasn’t so convoluted.
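For anyone who hits the same warning, this is my mental model of what happened, written as a toy Ruby class. It is not ActiveRecord; ToySession and everything in it is invented for illustration. The assumption is simply that mass assignment skips whatever column is the primary key, which is why :session_id got dropped once I had made it the key.

```ruby
# Toy model of "the primary key is protected from mass assignment".
# ToySession is invented for illustration; it is not ActiveRecord.
class ToySession
  class << self
    attr_writer :primary_key

    def primary_key
      @primary_key || "id"
    end
  end

  attr_reader :attributes, :warnings

  def initialize(attrs = {})
    @attributes = {}
    @warnings = []
    rejected = []
    attrs.each do |key, value|
      if key.to_s == self.class.primary_key
        rejected << key.to_s          # silently skipped, only logged
      else
        @attributes[key.to_s] = value
      end
    end
    unless rejected.empty?
      @warnings << "Can't mass-assign these protected attributes: #{rejected.join(', ')}"
    end
  end
end

# Default key: session_id is an ordinary column and gets stored.
ToySession.primary_key = "id"
puts ToySession.new(:session_id => "abc", :data => {}).attributes.inspect

# Key switched to session_id: the same call now drops it with a warning.
ToySession.primary_key = "session_id"
puts ToySession.new(:session_id => "abc", :data => {}).warnings.inspect

ToySession.primary_key = "id"  # leave the toy class in its default state
```

With the key left as id, the constructor call from line 292 keeps its session_id and the warning never appears.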
So, the little side project that could. Hm. It’s coming along. Unfortunately, as always, I haven’t spent nearly as much time on it as I want. However, I’ve had a number of very cool little breakthroughs in the past couple of days. Working on average an hour-and-a-half a night I’ve gotten to the point of consuming RSS feeds. So here’s where it gets exciting: it was stupidly easy to consume them. And since the framework below it now can deal with similar item types it shouldn’t be hard to absorb the data.
I’m back again to make this thing work!
The user tagging now seems to work as planned. When you create a user it’s date-tagged and user-tagged, to hopefully make locating these records easier in the future. The user tagging allows us to tag any registry object with a user registry object. Funny, the user registry entries get tagged with their own IDs.
Next up: rudimentary login system.
(I think I’ll delay the e-mail sending for later so I don’t spam people.)