GREE acquires Funzio for $210M

Just two and half months after opening an office in San Francisco, GREE has acquired SF-based Funzio for $210M.  This is GREE’s second US-based acquisition and closes the gap of their goal to have 100 employees in the Bay Area by the end of the year.

GREE’s goal is to take down Zynga and compete in the US mobile gaming market.  They have been traditionally successful on the Japan iOS market with RPG titles like Doriland Expedition but haven’t gotten a foothold in the english speaking market.  Their first english title, Zombie Jombie peaked at #25 top grossing, but hasn’t lasted.  And when your revenues are in the billions, that doesn’t quite cut it.

With the Funzio acquisition, GREE get’s a team 100 strong, about $40M a year in revenue across three titles, and an english-speaking cross promo network.  And unlike other developers, Funzio has held a Top 50 grossing spot for almost a year, proving that their titles aren’t just flash in the pan.

Not bad compared to Zynga’s $200M acquisition of OMGPOP, of which Draw Something has lost over 30% of its DAU in the last 30 days.

You can follow me on Twitter here

How I Nearly Ruined Our Launch

I’m co-founder of a startup called MinoMonsters, we make an iOS game by the same name.  We went through YCombinator in the Winter 2011 batch.  On December 18th, the eve of our big launch, I nearly fucked everything up.  This is the story of what happened.

MinoMonsters is a monster battling game.  Think of it as Pokémon for the iPhone.  As is the case with many apps, our game is internet connected.  During the course of play, the app talks to a backed server. The server diligently performs tasks like finding your friends, letting you purchase in-game items, and letting you play with others.  When planning our server, we needed a solution that was quick to build, had fast performance, and was light over the wire.  We chose to build the server in Ruby with EventMachine and Synchrony speaking in protocol buffers over TCP. 

Since our soft launch on December 6th, the server has been chugging away without fault.  But then the problems began to emerge.  It all started with intermittent crashes.  I would wake up to find that the running EventMachine process had simply stop running.  The approach to understanding this problem is two-fold: recreate the crash to understand the “how” and capture a log to understand the “why”.  Since the crashes were intermittent, I had no way of predictably crashing the server.   So instead I started with logging.

First, by adding logging to my init script:

#!/usr/bin/env /home/ubuntu/.rvm/bin/ruby-1.9.3-p0
require 'daemons'

Daemons.run_proc(..., :backtrace => true) do
  ...
end

Next, by logging errors in EventMachine’s error_handler:

EM.synchrony do
  ...
  EM.error_handler do |e|
    MinoServer.log_error "Error raised during event loop: #{e.message}"
    MinoServer.log_error(e.backtrace.join "\n")
  end
  ...
end

Between these two, I figured I was covered.  The next step was to wait and watch it crash again.  As I monitored the server, I started to see a variety of errors across my application:

#Errno::EMFILE: Too many open files - content/627/inventory.plist
...
Error in .../new_relic/data_serialization.rb:27:in `read_and_write_to_file'
...
#LoadError: no such file to load -- strscan
...
HTTP request to https://graph.facebook.com/me returned HTTP 0, ""

I hadn’t found the source of the crashes, but I had found some issues.  Upon seeing these issues, my first gut reaction was that I must be running out of file descriptors.

$ ulimit -n
1024

Bingo.  Let’s fix that before doing anything else.

$ sudo sh -c '(echo "ubuntu soft nofile 40000" && echo "ubuntu hard nofile 40000") > /etc/security/limits.conf'
$ ulimit -n
40000

Then I methodically went through each error and devised a solution.  It wouldn’t solve my problem, but it would narrow down the options.  Over two weeks, I casually “fixed” more crashes as they cropped up and eventually they stopped.

Then the day came.  We had planned to set the price of our game to free.  As soon as we did, users started flooding in.  And then the problems started anew.

As before, the server was crashing.  I couldn’t recreate the crash, but I didn’t need to.  At first, it was crashing every hour.  Then it was every 30 minutes.  Soon enough, the server crashed within seconds of starting up.  Logs became increasingly useless, as the places in code that threw exceptions were random at best.  Something was falling apart and I couldn’t figure out what it was.  Without a backend, we couldn’t verify virtual goods purchases.  And without virtual good purchases, we would make no money off the crazy initial traction we received.  This was bad news.

I spent four hours trying everything I knew to solve the problem.  I scoured the web for obscure bugs.  I completely ripped out Synchrony.  I triple checked OS resources like file descriptors, disk space, and others.  I was out of ideas.  That’s when I sent out a distress beacon on Hacker News.

Within minutes I had dozens of people helping me get connected with the right Rubyists.  In the end, two people helped me out:  Ryan Stout and Aman Gupta.  With Ryan, we immediately started with generic diagnosis steps.  What does the server do?  What systems does it touch?  We systematically went through each touchpoint and observed it’s potential contribution to the problem.  This helped us eliminate any external factors (OS, Ruby, Library) as the culprit.  Then we got Aman involved.  As one of the maintainers of EventMachine, he was able to quickly start looking at my configuration.  Within minutes, he noticed something odd in my setup code:

EM.synchrony do
  EM.epoll
  EM.start_server "0.0.0.0", 8090, MinoServer
  EM.error_handler do |e|
    MinoServer.log_error "Error raised during event loop: #{e.message}"
    MinoServer.log_error(e.backtrace.join "\n")
  end
  
  MinoServer.initialize_server
end

This code was taken near-verbatim from an Ilya Grigorik post on EventMachine from 2008.  The code from that post originally looked like this:

EventMachine::run {
  EventMachine.epoll
  EventMachine::start_server("0.0.0.0", 8080, Handler)
  puts "Listening..."
}

Aman saw the issue right away.  EventMachine.epoll is a configuration method that must be called before the run loop.  From the documentation:

Call the method EventMachine#epoll anytime before you call EventMachine#run, and your program will automatically use epoll, if available

It turns out that this basic setup code that I copied from a 3-year-old blog post was incorrectly implemented.  If I had been more careful about reading the documentation, it’s possible I might’ve caught this issue myself.  The fix is easy:

EM.epoll
EM.synchrony do
  EM.start_server "0.0.0.0", 8090, MinoServer
  EM.error_handler do |e|
    MinoServer.log_error "Error raised during event loop: #{e.message}"
    MinoServer.log_error(e.backtrace.join "\n")
  end
  
  MinoServer.initialize_server
end

I made that change, deployed, and crossed my fingers.  Within minutes we knew that it worked.  Users were connecting en masse.  Thousands of monster battles started to ensue.  Virtual goods transactions began flowing.  All was right with the universe once more.

Let’s look at what happened:

  • EventMachine uses Ruby’s select by default for network I/O.  This is a sane default
  • Ruby’s select performance degrades as the number of persistant connections goes up.  It has an upper limit of 1024 connections.
  • I was aware of both EventMachine’s default behavior and the limitations of select.  
  • Before the game launched I looked at the code to make sure I was making the configuration call to switch to epoll.  It was there.
  • I did not confirm that the configuration call was working.  I never inspected the running state of the app and double-checked that it was using epoll instead of select.

Ultimately, I failed to ensure that a critical scaling component was activated before going live.  Since being fixed, the rest of the system has run smoothly, resulting in thousands of concurrents and millions of monster battles.

Huge thanks to Aman Gupta, Ryan Stout, and the supportive Hacker News community for helping me come to a solution quickly.  With you all, I was able to take a multi-day problem and get it fixed in a matter of hours.  Thanks.

You can follow me on Twitter here

The Problem with Malaria

Let’s talk about malaria.

Malaria is the one of the leading causes of death in the world.  It’s an important problem to solve.  So important that the Bill and Melinda Gates foundation has dedicated billions of dollars to the cause.

But we havent fixed it yet.  Why not?

As with every disease, It starts with what malaria has evolved to do. The lifecycle of malaria requires both a mosquito and a human. The disease starts inside a female mosquito, jumps to a human host for a while mucking about, and then jumps back to a different mosquito.

Like the common cold, malaria doesn’t instantly kill its host.  If malaria killed its host, it wouldn’t last long enough to transmit to the next host.  Since it require both a mosquito and a human host, the disease stops working if it isn’t constantly jumping around.  This means malaria must be good at least two things: going undetected and keeping its host moving. The longer it can stay in the current host, the more likely it’ll get passed along to the next one.  It turns out malaria is really, really good at this.

Malaria has lots of tricks to stay under the radar.  Think of it like a stealthy assassin that’s infiltrated your city.  You know someone is killing your civilians, but you can’t figure out who.  When you have malaria, your body is trying to tackle this problem.  We often use vaccinations to aid our bodies in solving this problem.  But why don’t we have a malaria vaccine?

It’s important to understand how vaccinations work. There are four top strategies used today:

  1. Put a dead bugger in the body.
  2. Put a neutered bugger in the body.
  3. Put the bugger’s coat (a virus-like particle) in the body.
  4. Put the bugger’s perfume (a protein marker) in the body.

All four of these strategies work the same way. Your body recognizes an intruder and teaches itself how to eliminate the threat.  With a vaccine, you teach your body what signs to look for in a disease.  It’s like equipping the police of your city with the physical description of the assassin.  But for malaria it’s a different story.

We know that malaria relies on going undetected.  It needs to transmit from host to host to operate correctly.  Malaria uses stealth to render the first two strategies ineffective. Introducing dead or neutered buggers into the body doesn’t give the immune system enough information to respond when the real deal comes by.

We can’t use the third strategy because Malaria isn’t a virus.  Viruses are organic robots that use some mechanical parts to inject their DNA into other cells.  You can identify them by their distinctive exterior.  Our third vaccination strategy doesn’t apply when we’re dealing with a non-virus disease.

That leaves us with only one option: a subunit vaccine.

A subunit vaccine works by taking the “smell” of a disease and giving it to the body.  The immune system will then respond to anything with the “smell” with a heightened response.  It’s like giving the police of your city a bottle of perfume and telling them to interrogate anyone smelling of it. These are difficult vaccines to put together. Proteins, like perfume, are a carefully constructed work of art and are hard to replicate perfectly. The body can get confused since it’s only given a small, complex piece of information to work with.  This can render the immune system response ineffective.

It only gets worse.  The malaria bugger goes through three different life stages while in the body. If  targeted at the late stage, malaria has already done enough damage to kill the human host. The ideal is to target the early stage, preventing transmission and protecting the human from symptoms. Unfortunately, the first stage only lasts 5 minutes.  If the body fails to identify the bugger and kick its ass in that time frame it effectively vanishes from sight.

So we have an assassin entering our city. We are already doing everything we can to destroy his transportation before he enters our borders (spraying DEET to kill mosquitos). We have a short time window to catch him before he masquerades as a citizen.  If we fail in that time we will never find him. So our only option is to look for signs of a spy and ruthlessly eliminate anything that fits the bill.

It turns out there’s a vaccine called RTS,S/AS01 and that’s precisely what it does. In the 80s we were able to produce a protein from the malaria sporozoite (the first stage inside the human body) to get a small level of immunity in humans. The problem since then has been ramping up the immune response. The body needs to act fast (within 5 minutes of being infected) and with extreme prejudice (wiping out the sporozoites) using very little training (a single sporozoite protein). That’s hard.  Then we have to vet this vaccination which requires human field testing in Africa.  That’s even harder.  Since that breakthrough in the 80s, researchers have been working hard to get the vaccination into a safe, sustainable condition.  In 2012 they are really close, but still have a ways to go.

The problem with malaria is that it’s tricky.  It’s taken time and some of the smartest people in the world to solve it.  With RTS,S/AS01 we are nearly there.  This vaccine is the culmination of 50 years of dedication, research, and hard work. It’s also a miracle of modern science. Most importantly, it will save millions of human lives.

You can follow me on Twitter here