open-source, performance

Load testing Apache Thrift

TL;DR: Use massive-attack

We are currently experimenting with integrating Apache Thrift into one of our Finatra-based APIs.

There was a bit of a learning curve involved, as most of our APIs use Akka HTTP and we had not used any RPC frameworks before. As part of the prototype, I created a simple Finatra API which had two endpoints that returned static responses: one over HTTP, and the other over Thrift. This is quite simple to do once you figure out which plugins to use to generate the code based on the Thrift Interface Definition Language (IDL).
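In the Finagle/Finatra ecosystem that code generation is typically handled by Twitter's Scrooge SBT plugin; here is a minimal sketch of the plugin definition (the version string is a placeholder, not necessarily the one we used):

// project/plugins.sbt: Scrooge generates Scala classes from the .thrift
// files it finds (by default under src/main/thrift)
addSbtPlugin("com.twitter" % "scrooge-sbt-plugin" % "<scrooge-version>")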

It took probably a day to set this up and deploy it to AWS – but then came the realisation that Thrift might simplify a lot of things, but load testing your endpoints is not one of them.

Because of how Thrift works, you need to create a client to fetch your data; this is basically just a method call. For example, if you have created a MyService Thrift service in API #1, you would simply create a client in API #2 (which needs the data provided by MyService) like this:

lazy val thriftHost = "localhost:9911"

lazy val thriftClient: MyService.MethodPerEndpoint = 
  Thrift.client.build[MyService.MethodPerEndpoint](thriftHost)

API #2 can then surface the Thrift data from API #1 as JSON (or any other format) through an HTTP endpoint.
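For illustration, here is a minimal sketch of what such a controller in API #2 might look like (the controller name and route are assumptions, not the actual prototype code):

import com.twitter.finagle.http.Request
import com.twitter.finatra.http.Controller

// Hypothetical Finatra controller in API #2: it calls API #1's Thrift
// endpoint and lets Finatra serialise the resulting value as JSON
class ProgrammesController(thriftClient: MyService.MethodPerEndpoint)
  extends Controller {

  get("/programmes") { _: Request =>
    thriftClient.programmes()
  }
}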

I then created Gatling load test scenarios for API #2 which load tested two endpoints: the first one powered by API #1's HTTP endpoint, and the second one powered by API #1's Thrift endpoint.

The load tests ran fine, and the Thrift-powered endpoint was faster, as expected. But the problem with load testing this way is that you are really load testing the HTTP endpoints of API #2, not the Thrift endpoints of API #1, and it didn't give me a clear idea of how many requests the Thrift endpoint could really handle when accessed from multiple APIs.

The next logical step was to look for load/performance testing tools that were capable of testing Thrift endpoints directly. This proved much more difficult than I expected; strictly speaking there are tools that can do this, but there were three big problems with them:

  1. They were quite complicated to use
  2. In some cases they had not been updated in years
  3. And most importantly, they did not give me the information I wanted from a load testing tool, such as how many requests per second (RPS) the Thrift endpoint could handle, and how fast it responded.

In this process I experimented with Pinterest’s Bender, Twitter’s iago, and even tried writing my own JMeter Thrift plugin by following an obscure tweet down the rabbit hole.

Eventually all of these (failed) attempts made me think that load testing a Thrift endpoint cannot, and definitely should not, be this difficult. So I started writing my own simple load testing tool and called it simple-load-test. I eventually changed the name to massive-attack, which was brilliantly suggested by my colleague Michael. A bit of background: in my team we name our APIs after bands, which is incredibly confusing, but fun.

The concept behind massive-attack is quite simple: you can load test any method that returns a Scala (or Twitter) Future, and it will tell you the response times for that method after calling it the specified number of times or for the specified duration. You can do this as part of your normal unit/integration tests – I might later change this to implement SBT's test interface, but it works perfectly fine when added to Specs2 or ScalaTest scenarios.

For example, to load test a Thrift endpoint, you add the following to your test specs:

"Thrift endpoint" should {
  "provide average response times of less than 40ms" in {

    lazy val thriftHost = "localhost:9911"

    lazy val thriftClient: MyService.MethodPerEndpoint = 
      Thrift.client.build[MyService.MethodPerEndpoint](thriftHost)

    val testProperties = MethodPerformanceProps(
      invocations = 10000,
      duration = 300
    )

    val methodPerformance = new MethodPerformance(testProperties)

    val testResultF: Future[MethodPerformanceResult] =
      methodPerformance.measure(() => thriftClient.programmes())

    val testResult = Await.result(testResultF, futureSupportTimeout)

    testResult.averageResponseTime must beLessThanOrEqualTo(40)
  }
}

This will call your Thrift endpoint called “programmes” 10,000 times (or for 5 minutes, whichever comes first) and assert that the average response time is 40ms or less.

You can make assertions based on any of the properties returned as part of the test result. At the moment, the following are supported:

  • Minimum response time (ms)
  • Maximum response time (ms)
  • 95th percentile response time (ms)
  • 99th percentile response time (ms)
  • Average response time (ms)
  • Number of invocations
  • Average requests per second (RPS)
  • Minimum requests per second (RPS)
  • Maximum requests per second (RPS)
  • Number of spikes
  • Percentage of spikes
  • Boundary above which a response is considered a spike

As you can tell, you can test any method this way – even HTTP endpoints:

...
  val httpClient: HttpClient = new HttpClient()
  val httpRequest: httpClient.RequestBuilder =
    httpClient.get("http://0.0.0.0:8080/programmes")

  val testResultF: Future[MethodPerformanceResult] =
    methodPerformance.measure(() => httpRequest.execute())
...

As part of setting the test properties, you can also specify how many threads you want to use to call your function – this is useful for HTTP and ordinary methods, but not so much for Thrift endpoints, as the Thrift client runs on a single thread and calling it from multiple threads causes problems.

This library still needs a lot of work and fine-tuning, but the first version is now available through Maven – and more improvements will follow soon.

 

open-source, sbt-plugin

Tracing usage of your Scala library

Finding where your code is used across multiple projects in a big code base or organisation can be quite difficult, especially if you have made a change that needs to be propagated to every application that uses the updated client or library.

I have created an open-source SBT plugin that can simplify this process a bit; more details can be found on the sbt-trace page.

vs.

Maps or For Comprehension?

Maps are a powerful tool in Scala, allowing you to apply a function to each item in a collection; however, they can sometimes be a bit difficult to understand, especially for programmers who are new to Scala or functional programming in general.

As mentioned previously, one of the strengths of Scala lies in the fact that it allows different solutions to the same problem. For example, given we have a list of Strings and want to transform each member to uppercase:

val listOfItems: List[String] = List("first", "second", "third")

We can simply do this using maps:

val upperCaseListOfItems: List[String] = listOfItems map (_.toUpperCase)

Which would result in:

upperCaseListOfItems: List[String] = List(FIRST, SECOND, THIRD)

However, we could have as easily used for comprehension:

val upperCaseListOfItems: List[String] = for {
  item <- listOfItems
} yield item.toUpperCase
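It is worth noting that the compiler desugars a for comprehension with a single generator into exactly the same map call, so the two versions are equivalent:

// The for comprehension is syntactic sugar: the compiler rewrites it into a
// map call, so both produce List("FIRST", "SECOND", "THIRD")
val viaFor: List[String] = for { item <- listOfItems } yield item.toUpperCase
val viaMap: List[String] = listOfItems.map(item => item.toUpperCase)

assert(viaFor == viaMap)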

Maps might be more functional, but I have always found for comprehensions easier to use and understand. It basically depends on how you learnt about Scala and your background, and which of these two methods you learnt about first!

I have always believed that what matters a lot is that the code is readable to other engineers, and for comps certainly achieve that.

vs.

Maps vs. Pattern Matching

One of the things I like about Scala is that it allows you to do the same thing in many different ways. Because of this, there is always the challenge of making the code more efficient; hopefully not at the expense of readability.

Take for example the concept of pattern matching in Scala. If we have:

case class Availability(startDate: Option[String])

val availability = Availability(Some("2016-05-06T09:00:00"))

And we want to write a method which retrieves the startDate from an optional Availability, then (short of using if/else conditions) pattern matching would be the easiest way:

def getStartDate(available: Option[Availability]): Option[String] = available match {
  case Some(x) => x.startDate
  case None => None
}

Then if we call:

getStartDate(Some(availability))

We would get:

res0: Option[String] = Some(2016-05-06T09:00:00)

But this is not the most concise approach; there is a much simpler way of getting the same result using map:

def getStartDate(available: Option[Availability]): Option[String] = 
  available map (_.startDate) getOrElse None

Since map here produces an Option[Option[String]] which getOrElse then unwraps, this can be simplified even further with flatMap:

def getStartDate(available: Option[Availability]): Option[String] = 
  available flatMap (_.startDate)

Personally I prefer the solution with flatMap, but because of the flexibility that Scala provides, you can always choose any of these options to get the same result.

 

json4s

json4s Custom Serializers

I recently had to work with custom serializers, which were interesting to say the least. I had two case classes and a trait in the following formats:

trait Parent
case class ChildClassOne(kind: String = "first_type", id: String) extends Parent
case class ChildClassTwo(kind: String = "second_type", id: String) extends Parent

And another case class which contained a list of Parents:

case class ParentResponse(total: Int, results: List[Parent])

Basically, the JSON response might contain a list of objects which can be either of type ChildClassOne or ChildClassTwo.

Because of this (I thought) I needed to create a custom serializer:

class ParentSerializer extends CustomSerializer[Parent](format => ( {
    case JObject(List(JField("kind", JString(kind)), JField("id", JString(id)))) 
        if kind == "first_type" => ChildClassOne(kind, id) 
    case JObject(List(JField("kind", JString(kind)), JField("id", JString(id)))) 
        if kind == "second_type" => ChildClassTwo(kind, id) 
  }, {
    case _ => null
  }))

This worked fine. The problem was that these objects might get quite big, and I didn't want to specify every single field in the custom serializer. I also was not modifying the properties in any way; I was using the custom serializer just to return the right type of case class based on the kind field.

After trying different approaches, I found that the best solution was to use a Serializer instead:

import org.json4s._

trait Parent
case class ChildClassOne(kind: String = "first_type", id: String) extends Parent
case class ChildClassTwo(kind: String = "second_type", id: String) extends Parent

case class ParentResponse(total: Int, results: List[Parent])

class ParentSerializer extends Serializer[Parent] {
  private val ParentClass = classOf[Parent]
  implicit val formats = DefaultFormats

  def deserialize(implicit format: Formats): PartialFunction[(TypeInfo, JValue), Parent] = {
    case (TypeInfo(ParentClass, _), json) => json match {
      // Pick the concrete case class based on the "kind" field and let
      // json4s extract the remaining fields as usual
      case JObject(JField("kind", JString(kind)) :: _) => kind match {
        case "first_type" => json.extract[ChildClassOne]
        case "second_type" => json.extract[ChildClassTwo]
      }

      case _ => throw new MappingException("Invalid kind")
    }
  }

  def serialize(implicit format: Formats): PartialFunction[Any, JValue] = Map()
}

implicit val formats = DefaultFormats + new ParentSerializer
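With that implicit format in scope, extraction picks the right case class based on the kind field. A small sketch of the usage (the sample JSON and the choice of the native parser are just for illustration):

import org.json4s._
import org.json4s.native.JsonMethods._

// Illustrative payload containing one object of each kind
val json =
  """{
    |  "total": 2,
    |  "results": [
    |    { "kind": "first_type",  "id": "a1" },
    |    { "kind": "second_type", "id": "b2" }
    |  ]
    |}""".stripMargin

// Uses the implicit formats defined above; results(0) is extracted as a
// ChildClassOne and results(1) as a ChildClassTwo
val response: ParentResponse = parse(json).extract[ParentResponse]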

Akka

Akka

To create actors (a minimal sketch follows this list):
  1. Create an ActorSystem
  2. Create an actor class which extends the classic (untyped) Actor trait
  3. Create an ActorRef by passing the actor from step 2 to system.actorOf
  4. Send the ActorRef messages with tell (fire & forget) or ask
  • The order of messages is retained per sender/receiver pair
  • A supervisor detects and responds to the failures of its actors. This means that if an actor crashes, a notification is sent to its supervisor, which can decide what to do about it. This provides a separation of processing and error handling
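A minimal classic-Akka sketch of those four steps (the actor, message and system names are made up for illustration):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Step 2: an actor class extending the classic (untyped) Actor trait
class Greeter extends Actor {
  def receive: Receive = {
    case name: String => println(s"Hello, $name")
  }
}

object Main extends App {
  // Step 1: create the ActorSystem
  val system: ActorSystem = ActorSystem("example-system")

  // Step 3: create an ActorRef for the actor via system.actorOf
  val greeter: ActorRef = system.actorOf(Props[Greeter], "greeter")

  // Step 4: tell (fire & forget) the ActorRef a message
  greeter ! "world"

  system.terminate()
}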

Uncategorized

Abstract Members

A member of a class or trait is abstract if the member does not have a complete definition in the class. Abstract members are intended to be implemented in subclasses of the class in which they are declared. Unlike Java, besides methods, you can also declare abstract fields and even abstract types as members of classes and traits.
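For example, a small sketch of a trait declaring abstract members of each kind, together with a subclass that implements them (the names are illustrative):

// Illustrative trait with abstract members of each kind
trait AbstractMembers {
  type T                 // abstract type member
  val initial: T         // abstract val
  var current: T         // abstract var (implicitly declares a getter and setter)
  def transform(x: T): T // abstract method
}

// A concrete subclass has to define all of them
class StringMembers extends AbstractMembers {
  type T = String
  val initial: String = ""
  var current: String = initial
  def transform(x: String): String = x.toUpperCase
}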
  • Traits by definition are abstract
  • Types: one reason to use a type member is to define a short, descriptive alias for a type whose real name is more verbose, or less obvious in meaning, than the alias. Such type members can help clarify the code of a class or trait. The other main use of type members is to declare abstract types that must be defined in subclasses
  • It is OK to override a ‘def’ with a ‘val’ in the child class, however you cannot override a ‘val’ with a ‘def’
  • vars declared as members of classes come equipped with getter and setter methods. This holds for abstract vars as well. If you declare an abstract var named hour, for example, you implicitly declare an abstract getter method, hour, and an abstract setter method, hour_=
  • Pre-initialized fields let you initialize a field of a subclass before the superclass is called:
    • Pre-initialized fields in an anonymous class expression:
new {
  val numerArg = 1 * x
  val denomArg = 2 * x
} with RationalTrait
  • Pre-initialized fields in an object definition:
object twoThirds extends {
    val numerArg = 2
    val denomArg = 3
} with RationalTrait
  • Lazy vals: sometimes you might prefer to let the system itself sort out how things should be initialized. This can be achieved by making your val definitions lazy. If you prefix a val definition with a lazy modifier, the initialising expression on the right-hand side will only be evaluated the first time the val is used. This is similar to the situation where the value is defined as a parameterless method, using a def. However, unlike a def, a lazy val is never evaluated more than once. In fact, after the first evaluation of a lazy val the result of the evaluation is stored, to be reused when the same val is used subsequently.
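A tiny sketch of that behaviour (the names are illustrative):

object LazyDemo extends App {
  // The initialiser runs only on first access and the result is cached
  lazy val expensive: Int = {
    println("computing...") // printed exactly once
    42
  }

  println(expensive) // triggers evaluation: prints "computing..." then 42
  println(expensive) // reuses the stored value: prints only 42
}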