Field Notes from April 2015

note

Entry dated 2015-04-26

I’ve been working through various false starts and rescopes of the TraceLogs project. TraceLogs is an umbrella term for a small product around error tracking (a stripped down Honeybadger/Airbrake, etc.) that serves as a test bed for a bunch of my ideas on software engineering and system design. Here is a quick list of some high level points:

  • Implementation agnostic test suite
  • Simulation testing
  • Benchmark suite
  • As many languages & architectures as possible
  • Continuous delivery with Mesos
  • Deployment pipelines with Mesos

Mostly, I want this project to level up my skills as a developer in many different areas. Secondly, the product itself should be useful. I have written about some of these ideas in previous field notes as well. So without any more blathering, time to run through the failed experiments, why they failed, and the constantly shifting scope, to finally arrive at something actionable.

Failed Experiment 1: Verification Program

The first thing on the plate was to explore my ideas on complete blackbox testing using an implementation agnostic test program. The program would take the API server & frontend GUI hosts/ports as input and run through tests to say whether the specific implementation is correct or not. This proved to be quite challenging. There are two bits that require testing: the HTTP JSON API & the server side rendered HTML GUI. I considered the following implementations. They were considered because of previous experience; there was no point in trying to introduce new tools into this part of the project because it’s just not important. E.g. there may be a very good testing tool in Python, but I’m not familiar enough with Python to execute effectively.

  • Bats test suite with curl & some sort of XML thing for GUI testing
  • Ruby with faraday for the API & capybara (w/Poltergeist) for GUI
  • Node with supertest, request, and mocha
  • Node with nightwatch, request, and mocha

Personally I was more interested in the bats option. This option seemed the most appealing because it only required shell tools, meaning it was about as implementation agnostic as it gets; it doesn’t even have a language. Previous experience using curl inside bats for HTTP testing proved that bit was sufficient. It required custom helpers (e.g. curl for status, curl for headers) using the various output format flags. That bit was all doable (as I’d done it before). There was the matter of more complex JSON generation in the shell. This sucks as there’s no way to do it besides string creation. Shitty, but manageable. So maybe not the most developer friendly solution, but functional. The other question was what to do with the HTML bit.

This pretty much sunk the idea. There is no good way in the shell to manage the DOM and user interactions. This is because this sort of thing is inherently stateful and requires some daemon to issue commands to. Also, I don’t think this area is well explored. There were some XML parsers, but that only covers asserting on results, not interacting with the DOM. So it ends here. This approach would be viable if the process(es) under test were stateless and/or had a better interface than HTML.

So onto the next option. No code had been written at this point, just investigation. I wanted to actually write something. So I decided to investigate the various node options. I figured that the node ecosystem probably has the best tools for working with the bullshit that is web development. Since it’s javascript there are plenty of good libraries (e.g. request) for making HTTP requests. Mocha is a decent enough test runner to make it all happen. I started off by looking into supertest (as I’d used it on other projects for testing express applications). Unfortunately supertest seems to require an http server object. This immediately nixed the setup because the tests require network traffic. It’s better off anyway because supertest is just OK. I prefer to make the requests and do my own assertions.

I did not want to give up on node things just yet, so I decided to investigate Nightwatch. The frontend lead at work told me they used this to test the web client. Figuring this was good enough for their uses (which are entirely more sophisticated than this experiment), I decided to spike on it. This approach was certainly fraught with tradeoffs. Nightwatch is one of those “take over the entire project” projects. It has its own test runner, its own code loader, its own configuration file, and a bunch of other things that assert total ownership over the repository. I figured this could be worth it since it would manage all the browser things and provide a decent enough interface for filling in text boxes, clicking things, and all that other jazz. I could require request, and use some promises to coordinate tests against the API then the GUI. It seemed like it would work. It’s also useful to note that the API is very small (just two requests) and the GUI is more complex, so it seemed OK to accept these tradeoffs.

Naturally I immediately ran into problems. The first step was to simply get nightwatch running. The nightwatch docs have a demo test. I could not get this to work. I could not get the PhantomJS integration working as it didn’t support all the assertions. IIRC the assert title function failed or something. I really wanted to use PhantomJS because it’s headless, but I wasn’t entirely ready to give up. I tried using the actual browsers with selenium. I couldn’t get that working either. Some of the assertions still failed. I was able to create one passing “test” that simply opened google.com. However it did not include any assertions. That alone was the deal breaker, but it also conflicted in other areas. I wanted this test program to not require a ton of extra dependencies. Even if I could get one example thing passing in Firefox, that would require a selenium server, X server, and the browser itself. Now I know you could use Sauce Labs or something similar, but that’s just way too many external dependencies for a simple verification program. I did not even attempt to use request or any other things with the API, specifically because I could not even get nightwatch working. So that was the end for this approach. I briefly looked into some of the other things like Zombie.js, but decided that this was as far as I was willing to go with node for this bit. It’s too bad really, because the async nature works well for these things.

So that left one option on the list: Ruby and my most familiar tools. I knew this option would work, so I wanted to evaluate it last. Well, “work” is subjective, because sure, the networking works, but how well does the final product work?

Failed Experiment 2: Test program with Capybara & Friends

Armed with the comforting knowledge that my chosen tools would work with networked processes, I set out to write a ruby program that took two arguments as input: the API server & the GUI server. The test program would be executed like this: $ trace-logs-spec api.example.com gui.example.com. Using a combination of capybara & faraday the appropriate tests would happen. I set to work and some interesting things started to happen.

First off, I wanted each part of the spec to have a unique number. For example: 1.4.2.1 API rejects type parameter when all whitespace. The test suite would run, then just spit out a bunch of check marks or X’s for each line item and exit 0 or 1. This seemed like a good idea because each test would have a single assertion (the rest being preconditions). I created “spec” classes. Each class had an execute method that took the faraday connection & capybara session as arguments. Each class also implemented doc.spec and doc.description to make the report nice. This all went well and good for a few specs. I started out by writing specs for the main API call. The specs tested things like parameter formats and all the normal input validation bullshit (a rough sketch of one such spec class follows this paragraph). Then two other concerns entered the experiment.
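
For illustration, here is roughly what one of those spec classes looked like. The class name, endpoint, parameters, and expected status are reconstructions for this note, not the actual code:

    require 'ostruct'

    # Hypothetical spec class: one assertion, everything else is precondition.
    # execute receives the faraday connection (API) and capybara session (GUI).
    class RejectsWhitespaceTypeParameter
      def doc
        OpenStruct.new(
          spec: '1.4.2.1',
          description: 'API rejects type parameter when all whitespace'
        )
      end

      def execute(connection, _session)
        response = connection.post('/logs', type: '   ', message: 'boom')
        response.status == 422
      end
    end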

I had a dummy implementation in ruby that existed before the start of the experiment. I used that to test the specs. But then I asked the question: how can I test the test program? That answer is quite easy actually. You provide a reference implementation. This in and of itself is not a problem per se, but it does require extra work. I set out to create the simplest possible rack applications to pass all the tests. I mean, the absolute shittiest things. This was actually fun in some way because I did not even use separate HTML templates, just big ass heredoc strings in the sinatra route handlers. This bit actually worked out OK. So I soldiered on, using my shitty reference implementation to verify the verification program.
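
For flavor, the reference implementation was roughly in this spirit; the routes and markup here are illustrative, not the real thing:

    require 'sinatra'
    require 'json'

    # In-memory "data store"; sync on purpose, the shittiest thing that passes.
    ERRORS = []

    post '/logs' do
      ERRORS << JSON.parse(request.body.read)
      status 202
    end

    get '/errors' do
      # No templates, just a big ass heredoc straight from the route handler.
      <<-HTML
        <html><body>
          <ul>#{ERRORS.map { |e| "<li>#{e['message']}</li>" }.join}</ul>
        </body></html>
      HTML
    end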

Then the next thing happened. I had written a simple test runner. The test runner went something like this: $ trace-logs-spec API_URL GUI_URL [FILE] [FILE], so you could do trace-logs-spec localhost:9091 localhost:9092 spec/api/*. Loading a ruby file would register the defined classes in a registry, then the test runner iterated over each defined class, instantiated it, called the execute method (described earlier), and reported on the results (a sketch of the runner follows this paragraph). This worked fine. The problem was writing the spec classes themselves. I wanted to keep them isolated and small, so there was no super class or the like. Each spec contained everything required to run it. However this led to a lot of copy and paste. The problem was that every API test required an account (since it’s an authenticated API). So before making the request, the spec needed to go through the sign up flow every time. I ended up copying and pasting this a bunch of times to keep the experiment moving, figuring something would have to be done about it eventually. The answer became obvious: eliminate the copy and paste by having some sort of superclass or some setup & teardown. At that point I realized that I would just be rewriting my own test runner (e.g. MiniTest). I was also defining the spec numbers myself. This made it very difficult to keep them sensible because they had to be grouped with others (so you could see all the similar spec numbers). At that point it just reduced to generic test classes where each “spec” was a method on a test class. Sure, there wouldn’t be nice spec numbers, but that’s just the cost of doing business.
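
The runner itself was not much more than the following. This is reconstructed from memory; the registry module and the Poltergeist driver wiring are assumptions, and registration of spec classes is elided:

    #!/usr/bin/env ruby
    # Usage: trace-logs-spec API_URL GUI_URL [FILE]...
    require 'faraday'
    require 'capybara'
    require 'capybara/poltergeist'

    module Registry
      def self.specs
        @specs ||= []
      end
    end

    api_url, gui_url, *files = ARGV

    # Loading each file is expected to register its spec classes in Registry.specs.
    files.each { |file| load File.expand_path(file) }

    connection = Faraday.new(url: "http://#{api_url}")
    Capybara.run_server = false
    Capybara.app_host = "http://#{gui_url}"
    session = Capybara::Session.new(:poltergeist)

    failures = Registry.specs.count do |spec_class|
      spec = spec_class.new
      passed = spec.execute(connection, session)
      puts "#{passed ? '✓' : '✗'} #{spec.doc.spec} #{spec.doc.description}"
      !passed
    end

    exit(failures.zero? ? 0 : 1)

Writing this much is fine; the trouble was everything the spec classes needed around it.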

I consider this experiment a failure because no useful code was produced and no verification program was produced. However, it did show many ways how not to accomplish this task. Eventually, if you fail enough, you may get an idea of how to succeed.

Moving Forward after 2 False Starts

An interesting spot, that. All my experiments had failed. However, the general setup for such a verification program was now obvious (a rough sketch follows the list):

  • MiniTest
  • Group related specs into files
  • Provide the list of tests via ARGV
  • Use Faraday
  • Use Capybara
  • Test against reference implementation
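
Sketched out with MiniTest, a grouped spec file might look something like this. The endpoints, field names, and the choice to pass the host URLs via environment variables are placeholders (the list of files to run would still come in via ARGV through a small wrapper):

    # spec/api_input_validation_spec.rb
    # Run via something like:
    #   API_URL=http://localhost:9091 GUI_URL=http://localhost:9092 ruby spec/api_input_validation_spec.rb
    require 'minitest/autorun'
    require 'faraday'
    require 'capybara'
    require 'capybara/poltergeist'

    Capybara.run_server = false
    Capybara.app_host = ENV.fetch('GUI_URL')

    class ApiInputValidationTest < Minitest::Test
      def setup
        @api = Faraday.new(url: ENV.fetch('API_URL'))
        @gui = Capybara::Session.new(:poltergeist)
        sign_up # shared setup instead of copy & paste in every spec
      end

      def sign_up
        @gui.visit('/sign_up')
        @gui.fill_in('Email', with: "test-#{rand(100_000)}@example.com")
        @gui.click_button('Sign Up')
      end

      def test_rejects_all_whitespace_type_parameter
        response = @api.post('/logs', type: '   ')
        assert_equal 422, response.status
      end
    end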

All of those together could create a verification program. However, all the previous efforts simply glossed over a key fact in the program: the final implementations will be asynchronous. The API returns 202 Queued for a reason. However, the reference implementation is sync (data stored directly in an in-memory array). So tests that do things like: make N API requests, do some GUI things, assert that N different things are displayed, will not work in general. Sure, you can add a wait, but that’s just a hack. It may work sometimes, but you always end up in the scenario where sometimes it takes longer, so the wait time increases, and the cycle repeats; or maybe some random thing prevented it from working that one time. I do not want to go down this road. In short, it is impossible to create a verification program that uses the API & GUI in coordination to acceptance test the final product.

I have not touched on it yet, but some things pointed to needing a third “controller” component. This component is like a backdoor, to be honest. This is because the two components may not provide access to all the functionality required for the tests. For example, there are things the system admin may do that are not part of the product. How do you access these in the product acceptance test context? This sort of thing kept coming up. There wasn’t a need for such a thing in these experiments, but I could see it being a requirement in more complex systems.

Testing the machine & human interfaces at the same time also makes things difficult. The machine interface is stateless and the GUI is stateful. This creates cross cutting concerns and requires very different tools. I also realized that I had no interest in ever rewriting the GUI. I don’t like writing GUIs. I like working with machine to machine interfaces. For example, I want to use Erlang to implement the product. I don’t want to waste time figuring out how to produce HTML in that language. Many languages have much better support for creating thin clients, so there is no need to involve that in the general experiments.

At this point I figured it was time to reassess and determine how to move forward.

Principles of System Design

I’ve been refining my principles of system design more and more. One of them is:

optimize the public interface for integration; use statically defined protocols for internal interfaces.

This is because you have no control over the outside world, but you do have complete control over the system internals. This principle also means the internal architecture should tend towards SOA, with each component having a statically defined interface to optimize cross language access. In short this boils down to: use HTTP & JSON for internet facing things & use Thrift to define internal protocols. There is much more to say about this principle, so back to this project.

The best way to architect this product is to have three interfaces:

  1. Internet facing HTTP & JSON API
  2. Internal facing Thrift server
  3. Internet facing GUI talking Thrift to the backend

The Thrift protocol defines everything the GUI needs and some extra things. E.g. there is a createAccount RPC. This means the test program can use a machine/machine interface instead of going through a GUI layer. Also the frontend can be isolated and easily tested in all cases: simply provide a mock/stub for the required Thrift RPCs. The verification program now shifts to verifying the two machine/machine interfaces: HTTP & Thrift. This also means that once an alternate backend implementation passes the tests, the GUI can simply be plugged in on top, allowing the two to evolve in complete isolation. It also allows more complicated tests of the frontend (multiple browsers etc), but I don’t care about such things. Also, working with human facing things is much more complicated than working with machines and requires totally different workflows. I’m interested in things you can test, and testing look & feel is impossible. Thrift itself is also great because it supports the large majority of languages (and thus covers all the languages and architectures I’d like to evaluate).
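
As a sketch of the Thrift half of the verification, here is roughly what driving the internal interface from Ruby could look like with the official thrift gem. The TraceLogs service name, the generated file path, and the createAccount signature are assumptions based on the description above:

    require 'thrift'
    # Generated by `thrift --gen rb trace_logs.thrift` (hypothetical IDL).
    require_relative 'gen-rb/trace_logs'

    socket    = Thrift::Socket.new('localhost', 9090)
    transport = Thrift::BufferedTransport.new(socket)
    protocol  = Thrift::BinaryProtocol.new(transport)
    client    = TraceLogs::Client.new(protocol)

    transport.open
    begin
      # The kind of backdoor-ish RPC the GUI and the verification program share,
      # instead of driving sign up through HTML forms.
      account = client.createAccount('test@example.com')
      puts account.inspect
    ensure
      transport.close
    end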

So now that some of the key problems and scoping issues are addressed, it should be possible to move on with experiment three: an implementation agnostic HTTP API & Thrift server verification program.

Experiment 3: Verification Program Revisited

Armed with all the lessons from previous failures and the knowledge that the concerns are separated, it was time to get going. The general hypothesis is:

Given a language with Thrift support & HTTP libraries and a reference implementation, it is possible to create an implementation agnostic & maintainable verification program.

This should be easy enough. I decided to use D for this experiment. I like D: it supports Thrift and has built in unit testing (so testing the reference implementation is easy enough). Also, mainly, I really like D and want to do more things with statically typed and compiled languages. It’s also nice to create an executable binary, eliminating some dependencies. Upon quick initial research, D does not appear to have a more complex test runner. This is because DMD has built in unittests, so separately defined test cases with xUnit like behavior haven’t been needed. This is not a problem because a class could be created for each piece of functionality or the like.

Unfortunately the D implementation seems to have failed before it even got started. I have not been able to compile a hello world program with the thrift libraries. Also, a new version of DMD (the D compiler) was recently released. It’s uncertain whether the thrift libraries will work with the new compiler, and given the generally small D community, whether any attention would be paid to such things. Compilation fails because of a linking error for x86_64 symbols, but it’s uncertain for which library. So things have stalled there. I’m currently compiling on OSX. I will try a linux VM (development should be in a VM anyway) and see if things work. If that doesn’t work without significant time investment, the D implementation should be abandoned. This would sadden me greatly because I will probably not write any D if I cannot get it to work with these things.

There are of course alternatives to D. I would select either JavaScript or Ruby. They are both interpreted languages, so the dependencies are pretty much the same. I actually lean towards JavaScript for this because working with thrift in Node is nicer than in Ruby, and request is nicer than faraday. Mocha is decent enough as well. Most importantly, I know either a Ruby or Node implementation will work and move the experiment along. Once the verification program is working, the real fun can begin.

podcast

Entry dated 2015-04-05

Another episode of the MostlyErlang podcast. This one covered things going on in the release project. It was interesting, but most of it was completely over my head. It’s nice to know that there are smart people working on Erlang. If anything it encourages me to spend more time working on Erlang things.

Entry dated 2015-04-04

Entertaining episode of the Ship Show podcast. Not many takeaways though, since I’m not actively working in this area. However, something did stick with me. They started to talk about the upgrade path for installed packages. The comments were “who in the fuck upgrades packages?” Note that they do not usually swear on the show, so it was obvious they were all emotionally engaged in this topic. And it’s true. I’ve noticed that change in my own work over the years. Who in the fuck upgrades packages these days? Isn’t everyone using a VM or some sort of configuration management? Why in their right mind are people running sudo apt-get -y upgrade on production systems, and moreover, why is there shell access on these boxes? Nice way to pass an hour.

Entry dated 2015-04-04

Listened to two very good episodes from the Mostly Erlang podcast. They seem to be a bunch of smart guys discussing, well, mostly Erlang. First I listened to an episode on the “7 More Languages in 7 Weeks” book. The episode itself was interesting, mostly to hear well educated people speaking on many different languages. The episode featured discussion on Idris. Idris is a language with dependent types. This is crudely summarized as: the type system contains a programming language inside it. The author of the book mentioned that Idris changed the way he fundamentally thought about programming. Whenever someone says that, it grabs my attention. So luckily, a few episodes later they did one on Idris with the creator of the language.

This episode was much more interesting. It definitely wet my whistle. This topic requires more research. The Idris project itself seems to be portable and powerful. Notably it has an intermediate form. The intermediate form is something like “first functional form” (cannot remember the exact name). It made me think of a bunch of composed functions that could be executed in any language that can invoke functions (read: all of them). Idris has a compiler for JavaScript and for C (which then generates native code). All in all very interesting. Key takeaway from the language creator:

Idris is a project for building DSLs

This quote is paraphrased, but the creator kept hammering the point that dependent types should make it easier to express the problem domain in a statically verifiable way.

thought

Entry dated 2015-04-05

Discussed current ideas on testing with Peter. We discussed the “no environment” mantra as well. It will be interesting to see if anything comes from that. The large part of the conversation focused on “out of process” testing. This is a term I coined (although I’m sure someone has already come up with one) for creating an implementation agnostic test suite for networked programs. E.g. if the program is a server, then it’s possible to write a client to verify the server’s behavior. This mode is entirely black box. The client has no idea what the server is, just how to talk to it. Contrast this with the various forms of testing in process. Everything that happens inside the process is some form of whitebox testing. That is, you are aware of the internals and can directly manipulate memory or use the program’s internal interfaces. The real world example is starting an HTTP server and running curl vs executing a test against a Rack/Plug (or similar) interface. The ruby community exclusively does the latter. I think the former should be investigated and applied to appropriate projects.
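
To make the distinction concrete, here is a minimal sketch of the out-of-process mode: the test only knows a host, not what is running behind it. The URL and endpoint are placeholders:

    require 'minitest/autorun'
    require 'net/http'

    # Black box: the server could be Ruby, Erlang, D, whatever. The in-process
    # equivalent would be driving Rack::Test against the app object directly.
    SERVER = URI(ENV.fetch('SERVER_URL', 'http://localhost:9090'))

    class OutOfProcessTest < Minitest::Test
      def test_health_endpoint_responds
        response = Net::HTTP.get_response(SERVER + '/health')
        assert_equal '200', response.code
      end
    end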

However there are a few concerns. Here are a few:

  • test suite coordination, bats + make would probably do well enough
  • CLI tool availability. Testing HTTP responses with curl and jq would work fine enough, but there doesn’t seem to be a nice XML/xpath program. Seems the story for testing HTML is unclear. Note that it may be possible to consider something like nightwatch.
  • State management. The point of this approach is to only verify networked programs through the networked interface. However all programs need certain state/data. So as part of the test suite, you’ll need to set up state. There may not be a way to set it up through the public interface. So how can you bridge that gap?
  • Bang for the buck. This approach does not eliminate white-box in process unit tests. Instead it shifts the burden to another program to verify high level flows through the network interface. It does not make sense to test everything in this way. Whatever test cases are constructed should hit as much of the system as possible.
  • Bats itself may not fit the use case. The key problem is that bats cannot provide assertion messages. It may make sense to write each test as an isolated bash script. There you can echo, fail, and source whatever other functions you need.

I’m considering this approach for tracelogs:

  • Create implementation agnostic test client for the HTTP & JSON backend
  • Some sort of javascript thing for testing the web frontend (this bit assumes the backend & frontend are two distinct components)

This allows me to experiment with different backend implementations (which is one of tracelogs’ primary goals).

Entry dated 2015-04-04

Came across another article talking about configuring ruby programs through dynamically generated YAML. Like, what. I’m starting to really rail against this sort of shit. It made me think of a “no environments” rule. The idea is that the program should have no environments, but enforce operation purely through functional means.

The best I can think of is command line args. We operate in a world of docker containers and various other traffic-direct-to-process scenarios. So the developer has a high level of control over how the program is started. This makes it possible to expose every configuration option the program requires through a CLI. This idea should be paired with no environments. That is, there is no “production” configuration file. There is only a set of options which are used exclusively to change the behavior of the program. This would eliminate the need for “boot tests” as I call them.

Consider a program that uses redis. It requires a connection via a URI parameter. Start the program with: bin/server --redis-uri=redis://82.382.3891.91:6397. There is no need for environment variables. The program should abort if all required options are not given. Environment variables are still useful, but not to the program itself. The --redis-uri value could default to $REDIS_URL. This way the flag may not have to be provided, but the program itself will still have explicit configuration. Such a setup also makes it easier to see exactly what a program needs to start. Simply run bin/server --help. This will tell you everything you need to know to operate the program.
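
A minimal sketch of the idea with Ruby’s OptionParser; the flag and default mirror the example above:

    #!/usr/bin/env ruby
    # bin/server
    require 'optparse'

    # The environment variable only supplies a default; the program itself
    # is configured entirely through explicit flags.
    options = { redis_uri: ENV['REDIS_URL'] }

    OptionParser.new do |parser|
      parser.banner = 'Usage: bin/server [options]'
      parser.on('--redis-uri=URI', 'Redis connection URI') do |uri|
        options[:redis_uri] = uri
      end
    end.parse!

    # No implicit "production" environment; abort when required options are missing.
    abort('--redis-uri is required') unless options[:redis_uri]

    puts "Connecting to redis at #{options[:redis_uri]}"

OptionParser also gives you --help for free, which covers the bin/server --help part.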

I think I will prepare some more ideas on this. Maybe blog post it.

lang

Entry dated 2015-04-04

Finished Chapter 6 of the Programming in D book. Figured it’s a good time to record some thoughts on D itself (also since I completed my first useful D program earlier this week). So the tl;dr is that I’m very interested in D. The language seems to have everything a developer needs to write well designed software except a thriving ecosystem around it. That bit can change; it just takes exposure and marketing. Here are some things that I really like about D.

  • Statically typed with an auto declaration so you don’t have to type int x = 0; when the type can be automatically inferred
  • Contracts! Automatically enforced for either input or output of a function.
  • Parameterized types with type checking via an if clause in the function definition
  • First class functions & closures
  • Decent set of enumerable methods in the standard library (e.g. map is a built in function)
  • No need to write for loops. Use foreach.
  • Class invariants! Oh hell yes!
  • Classical inheritance implementation with interfaces & multiple interface inheritance
  • Metaprogramming!
  • scope is awesome
  • built in unit testing
  • Actual concurrency support (import std.parallelism)
  • UFCS makes it easy to use foo(bar) or bar.foo() when a function accepts bar’s type as its first argument.

I will continue my adventures in D. Very promising language with a small community. That means it’s easy to make a big impact. Thinking of creating my triple threat: Ruby, D, & Erlang.

article

Entry dated 2015-04-03

Came across this good paper on software architecture patterns. The paper is a short read; it’s ~40 pages. The paper isn’t groundbreaking but it does cover a few patterns at a high level. One quote did stick with me though:

If you find you need to orchestrate your service components from within the user interface or API layer of the application, then chances are your service components are too fine-grained. Similarly, if you find you need to perform inter-service communication between service components to process a single request, chances are your service components are either too fine-grained or they are not partitioned correctly from a business functionality standpoint.

This made me consider the architecture at saltside. I’ve never considered it a microservice architecture. It’s always been “lagom” SOA. This did make me consider moving the search service into the core service. Originally it was separated for development purposes and not for true architectural reasons. I figured it was better to have an isolated codebase people could work on without getting wrapped up in the development concerns of another.

A good rule for defining the boundaries between services is how much cross talk there could be. In the core service’s case, it has to push notifications to search for indexing and removing data. Since core is the primary data owner, it makes sense to move that functionality inside core. Not sure if that makes sense, but the paragraph did make me consider it.

quote

Entry dated 2015-04-03

I came across the best quote. It was in the HN comments from the software architecture paper posting. Here it is:

That’s a horrible description. The best people with “software architect” in their title that I have worked with have focused on total system quality through striving towards simplicity mainly by focusing on development and tooling and using their position as the arbiter of technical decisions to guide the system under development to coherent whole that fullfills the business requirements in a technically sound way.

I think the quote describes exactly how I try to do my job. Focus on the complete solution and strive to do it in a technically correct & coherent way. Naturally the quote is also paired with examples of bad architects. That will never be me.