March 11, 2014
Why bother with Continuous Integration?
Long gone are the days of deploying code that isn’t fully and automatically tested. Why? Because nobody wants to get a call from a client asking why the app suddenly stopped working, then have to sheepishly explain that the latest release broke it.
Continuous Integration (CI) is the final step in Test Driven Development (TDD) before you merge a branch back into master. Of course, your tests will only guarantee so much — that’s where code coverage and code reviews come into play. But let’s set that aside for now.
We did a lot of research, and Jenkins CI was the utility best suited to what we were looking for: a low barrier to entry, simple setup, and ease of deployment.
Integrating a CI solution into your process is no easy feat. We use CI results both during development and before we deploy to production.
For our developers, a typical process is:
Create a new branch off of master.
Write a few failing tests that describe the work to be done.
Get tests to pass.
Run the branch on CI.
If everything passes, the branch can be merged into master. If not, the failing tests need to be examined: either the tests are now inadequate and need to be updated, or there are side effects that weren’t accounted for. Either way, the tests must pass; you can’t merge the branch until the build is green.
Once a branch has been merged, GitHub notifies the CI server, which triggers a master build.
If the build is green, the master branch gets a tag and we deploy from that green tag. Otherwise, the tests and build need to be examined.
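To make that last step concrete, here’s a minimal sketch of what tagging and deploying a green build can look like. The tag format and the Capistrano deploy command are illustrative assumptions, not a description of our exact tooling:

# Hypothetical sketch: tag the commit that just went green, then deploy from it.
git checkout master && git pull origin master
TAG="green-$(date +%Y%m%d-%H%M%S)"   # illustrative tag naming scheme
git tag "$TAG" && git push origin "$TAG"
cap production deploy                # assuming a Capistrano-style deploy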
How we got down to 10-minute builds
We first set up Jenkins on an Amazon Web Services Elastic Compute Cloud (AWS EC2) slice. At the time, our test suite took about 50 minutes to run on local development machines. On EC2 it took north of 4 hours. Clearly that wasn’t going to cut it.
Next, we decided to keep things local and bought the biggest and meanest machine we could find without breaking the bank: a machine powered by quad-core Intel Xeons (16 cores in total), with 16GB of RAM (this was 2011) and plenty of hard drive space. That brought our test suite down to ~45 minutes. An improvement, but not nearly enough.
Our breakthrough came from splitting the work into discrete Jenkins CI builds. The internals of our application don’t need to be shared here; suffice it to say it’s composed of a few Rails engines. So we created a Jenkins job for each engine. This got us down to 25 minutes per build, since everything ran in parallel.
Next we broke the Cucumber and RSpec tests into different builds. Then we profiled the slowest tests per engine and clustered them into ‘slow’ and ‘not quite as slow’ builds. Voilà: a sub-10-minute build. It feels a bit like cheating, but we did it!
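If you want to try the same trick, RSpec can report your slowest examples out of the box, which is a reasonable starting point for clustering; the engine path below is a made-up example:

# Print the slowest examples for one engine's spec suite.
cd engines/some_engine        # hypothetical engine path
bundle exec rspec --profile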
The current setup involves a master Jenkins server that runs on a VPS (Virtual Private Server) on the DigitalOcean cloud, our aforementioned machine (now strictly a slave), and many other slaves spun up in the cloud for the lighter jobs. The process-intensive jobs are pinned to specific slaves using Jenkins labels.
Key Jenkins plugins
Over the years, we’ve come to rely on a number of Jenkins plugins and contributed to their development.
This plugin gives us one main entry point that then triggers many other jobs. The overall result of the build is the ANDing of all the downstream jobs. (Hello, parallelization!)
This allows us to run multiple slaves that do the heavy lifting.
We want to authenticate using our GitHub credentials, without having to maintain our own user databases.
This plugin triggers builds for different branches from the command line.
It’s a lot easier to see which branches have been built by referring to them by name rather than a git SHA or build number (the default).
Informs the team when the master build is broken (or when it has been fixed)!
There are certain occasions on which we want to mark a build as failed.
I’ll explain this in another post, but suffice it to say it’s very cool!
Remembering all of our CI & build options can be a bit daunting, so we wrote a script to make things a bit easier on our developers (mainly myself; I am lazy).
The script defaults to building the current branch, but you can pass a different branch name as an option.
It goes something like this:
$ build/schedule_build build
[Output about build]
[A TerminalNotification pops up]
$
Under the hood, the script shells out to the Jenkins CLI:
$ java -jar jenkins-cli.jar \
    -i ~/.ssh/IDENTITY_FILE \
    -s https://ADDRESS_TO_YOUR_SERVER \
    build JENKINS_JOB_NAME \
    -p GIT_BRANCH=BRANCH_NAME_TO_BUILD \
    -f -w
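For completeness, here’s roughly how a wrapper like this can bootstrap itself; the /jnlpJars path is Jenkins’ standard download location for the CLI jar, while the argument handling is just an assumption about how our script behaves:

# Fetch the CLI jar from the server if we don't already have it.
[ -f jenkins-cli.jar ] || \
  curl -sO https://ADDRESS_TO_YOUR_SERVER/jnlpJars/jenkins-cli.jar
# Default to the currently checked-out branch unless one was passed in.
BRANCH_NAME_TO_BUILD=${1:-$(git rev-parse --abbrev-ref HEAD)}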
If we hear requests to open source this script, we will definitely do so.
Most jobs have similar settings but require a unique build script. That’s fine, but since we maintain north of 11 jobs per build, it quickly becomes painful.
To make our lives easier, we developed a script that knows which tests to run depending on the name of the job. Now we just need to be careful about naming our jobs, but we’ve been very good at that since day 1.
Our build script uses a YAML configuration file:
project_name-development-part_a:
  build-commands:
    - do_this
    - do_that
  gemset: some_gemset_name
project_name-development-part_b:
  build-commands:
    - do_this
    - do_that
  gemset: some_gemset_name
project_name-master-part_a:
  build-commands:
    - do_this
    - do_that
  gemset: some_gemset_name
project_name-master-part_b:
  build-commands:
    - do_this
    - do_that
  gemset: some_gemset_name
And the heart of the build script is this:
RUBY_VERSION_AND_GEMSET="$VERSION@$GEMSET"
rvm use $RUBY_VERSION_AND_GEMSET --create
bundle check || bundle install
eval $TEST_COMMAND
$TEST_COMMAND is the concatenation of the commands specified in the configuration file.
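To sketch how a job name maps to its commands, assume the YAML above lives in a file called builds.yml (a hypothetical name) and that Jenkins exposes the job name through its standard $JOB_NAME environment variable; joining the commands with && is likewise an assumption about how the concatenation works:

# Hypothetical lookup: read this job's entry from builds.yml and
# join its build-commands into a single shell command line.
TEST_COMMAND=$(ruby -ryaml -e '
  job = YAML.load_file("builds.yml").fetch(ENV["JOB_NAME"])
  puts job["build-commands"].join(" && ")
')
GEMSET=$(ruby -ryaml -e '
  puts YAML.load_file("builds.yml").fetch(ENV["JOB_NAME"])["gemset"]
')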
Originally, since we triggered many builds at the same time (parallelization), we had clashes from job to job when each tried to install the same gem. We naively opted to give each job its own bundle install. But this proved not to be very efficient, since each job then maintains its own copy of the gems: it’s time-consuming, and when we spin up a new server, every job needs to fill its cache from scratch.
We solved this by implementing a semaphore for the shell scripts running on the same system. (The jobs actually run across multiple servers, and jobs on different machines don’t compete for the same resources, so a per-machine lock is enough.)
A nice implementation of a semaphore that can be used effortlessly on shell scripts is Procmail’s lockfile program.
The updated build script can be summarized as:
RUBY_VERSION_AND_GEMSET="$VERSION@$GEMSET"
LOCKFILE="/tmp/$RUBY_VERSION_AND_GEMSET"
rvm use $RUBY_VERSION_AND_GEMSET --create
lockfile $LOCKFILE
bundle check || bundle install
rm -f $LOCKFILE
eval $TEST_COMMAND
When multiple processes reach the lockfile line, the first one creates the lock file. The others wait there until the file is removed, then each takes its turn creating it, and so on until every process has passed through that stretch.
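One caveat: as written, a job that dies between lockfile and rm -f would leave a stale lock and block everyone behind it. lockfile’s -l flag breaks a lock after a timeout, which guards against exactly that; the 300-second value below is an arbitrary choice, and the trap is a sketch rather than our production script:

# Hardened critical section: break a stale lock after 5 minutes,
# and remove our own lock even if bundle install fails.
lockfile -l 300 "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
bundle check || bundle install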
This particular change has given us quick wins on overall build time and reliability.
Earlier, I referenced the embarrassment of a broken, untested build. Continuous Integration will definitely help you avoid that, but the benefits go deeper than that.
CI helps you iterate faster, build confidence in your work, manage conflicts, and ensure consistency. You might even consider it part of your team – so make sure you treat it like one and listen to its feedback!
March 7, 2014
This is the third post in a series about how to throw the perfect hackathon.
Someone call in the lawyers… Hackathons have both grown in size and spread around the world in recent years. Companies are using hackathons to engage their communities, non-profits are using them to help solve big problems, and so on. This increased awareness of hackathons means that people are looking at different IP and ‘openness’ arrangements — some of which you might want to think through with a lawyer.
Remember: I’m not a lawyer, I’m a hacker. This isn’t legal advice; I just want to highlight some common issues you might want to keep in mind.
Who owns the output?
Over the past few years, organizers have experimented with different ways of capturing and distributing the work created at hackathons. Some of these have involved different intellectual property (IP) arrangements; I want to review these and offer some thoughts on how you might tackle ‘ownership.’
At the vast majority of hackathons, participants retain ownership of their work and this is usually the best model. My colleague Brian has written on this subject before. As he points out, a transfer of IP towards the event organizer seldom aligns with your goals and won’t build long term engagement with the developer community.
Rewired State, based in the UK, has tweaked the traditional hackathon model with their signature Hack Days. There are a few variations of hack days, but they include some form of IP transfer. Two points are important to note here: the developers are paid at market rates and the IP transfer is completely transparent & explicit when participants register. You can learn more about Rewired State and the different kinds of events they run on founder Emma Mulqueeny’s blog.
The international civic-focused hackathon series Random Hacks of Kindness (RHoK) takes a different tack. Hackathon projects must be published under an OSI-approved license and the code is made available in a publicly accessible code repository. Again, this is clearly communicated to the community in advance.
While these two examples are quite different in nature, both are upfront with potential attendees about their restrictions and conditions.
“I don’t want people to use my data after the weekend.”
Executed properly, hackathons can be quite compelling for businesses. It’s an opportunity to expose talented and motivated people to your team and technology. I’ve seen lots of great examples where companies have used hackathons as a way to get user feedback and new ideas. Of course, participants benefit too: inside access to new tech, its architects, and spirited competition.
However, providing data & services for the weekend and then revoking them afterward does not work. The only circumstance in which I’ve seen this done properly is when work at the hackathon revealed a major security bug in the API. In that case, the organization explained to everyone involved why the API keys were being revoked and restored access as soon as the bug was resolved.
From a practical standpoint, it’s going to be very difficult to ‘take back’ datasets that were released during the hackathon. If you know ahead of time that your data or services aren’t going to be public afterwards, don’t build your hackathon around them!
That’s all for this week. Next time we’ll focus on how to find and set up a great venue. Happy hacking!
February 28, 2014
The bi-annual PennApps hackathon, hosted by the University of Pennsylvania, has long been a standard bearer for large-scale college coding competitions. The Spring 2014 winner, The Homework Machine, embodies four key characteristics of a winning project:
Addresses a problem we can all relate to
At some point, every kid has wished for a way to automate tedious math homework. These students were probably more bored than lazy, but it’s a fantasy we can all wrap our heads around. Projects stand out when fellow participants, judges, and the less technically inclined general public can all appreciate their value.
This hack (1) learns your handwriting, (2) solves math problems, (3) guides a pen rigged to motors and a pendulum to mimic your handwriting, and (4) writes the correct answer in exactly the right spot! Oh yeah, and it was built in 36 hours!
The ‘internet of things’ is all the rage right now, and while hardware hacks shouldn’t be given preference over equally difficult software-only projects, they do tend to have more engaging demonstrations. The Homework Machine produced tangible results - not vaporware promises of something to come.
A bit quirky
Creators Derek and Christopher didn’t plan to help 4th graders shirk their homework responsibilities. They just wanted to test their skills while having a good laugh, which is the whole point of a hackathon. You may learn new technologies, create the seed of a business, and meet recruiters too, but it’s all about building something awesome in a ridiculous setting.
February 26, 2014
At ChallengePost, we’ve worked with some of the best organizations in the world and powered more than 400 in-person hackathons and online software challenges. But, if you’ve never participated in one, you may not fully grok the ChallengePost experience. That’s why we put together this short video:
Curious about planning a hackathon or online challenge? Our team would be thrilled to help! Give us a ring at 212.675.6164 or email firstname.lastname@example.org.
February 25, 2014
Today’s blog post comes from Ross Kaffenberger, ChallengePost’s Head of Engineering. You can find him on Twitter @rossta.
Exchanging feedback doesn’t have to be painful
These days, software developers are living in a GitHub Workflow world. They develop new code on version-controlled branches and gather feedback prior to inclusion in the primary release, or “master” branch, through pull requests.
Our development team at ChallengePost has been using this workflow for almost two years with great success, although we’ve had our share of pain points. For better or worse, feedback typically happens asynchronously and in written form. Convenient, yes, but this approach is not free of wrinkles, especially when we use poor word choice, hyperbole, sarcasm, or other forms of counterproductive commentary.
This has led to resentment and injured relationships on occasion. In response, I’m working to improve how we give and receive criticism.
Let’s assume that, when done well, code reviews are a good thing. That is to say, the practice of giving and receiving feedback in a consistent, continual manner has true benefits. These may include improving code quality over time and driving convergence of ideas and practices within your team. In my experience, for feedback to be effective, trust amongst team members is a key requirement.
This may not be an issue for teams that have been together for a long time or share common values, but for others, trust has to be earned. In the absence of trust, there’s more opportunity for personal differences to get intertwined with feedback. While there are no quick fixes, what follows are code review practices that we have adopted to foster our shared sense of trust.
1. Adopt a style guide
Spoiler alert: code syntax and formatting are trivial choices. What matters most is that your team agrees on and adheres to a set of guidelines.
Take a few hours as a team to hammer out a style guide for each of the languages you use. Better yet, use a public example like GitHub’s style guide as a starting point. Besides the obvious benefits of consistency and maintainability, style guides reduce the likelihood of flared tempers during reviews; when you’re pushing to get a new feature out the door, it’s unhealthy to argue over whitespace. This works when your team follows the guide and comments on style issues respectfully, saving concerns about the guidelines themselves for separate discussions.
2. Start with the end in mind
Imagine a developer who emerges, after hours or days off in the “zone,” with a sparkly new feature and asks for a review. All is good, right? Except that the rest of the team has issues with the implementation. Words are exchanged, the developer takes the feedback personally, and suddenly the entire team is distracted from shipping code.
Personally, I believe code review should begin well before the final commit. It can happen early on, in short discussions with teammates once the ideas start to take shape. Get buy-in on your approach before you’re ready to merge your branch. Opening a pull request and asking for feedback while work is still in progress is a great way to build trust between teammates and reduce the likelihood that criticism will be interpreted as a personal attack.
3. Use the Rubber Duck
Rubber duck debugging is a method of finding solutions simply by explaining code line-by-line to an inanimate object. We’ve found it helps to do the same with our writing, especially when our first instinct is to respond to code or another comment with sarcasm or anger. Take a moment to read your response aloud and question the wording, timing, and appropriateness. This includes taking into account the personality of the team members you’re addressing. Thoughtbot has compiled a useful set of code review guidelines to help both readers and writers respond thoughtfully. I also suggest that teammates share meta-feedback to ensure that everyone is hitting the right notes of tone and instruction.
The next time you feel pain in a code review, take a step back and consider what’s missing. It could be that your team needs to adopt some guidelines to reduce friction and ensure feedback is exchanged in as constructive and positive a manner as possible. After all, you have both code and relationships to maintain.