go-challenge/src/numbers/thoughts.md

4.4 KiB

Workflow approach taken

  1. get a minimal version of a handler working that responds with unsorted and undeduplicated JSON lists, no go-routines
  2. add asynchronous fetching of the URLs for performance
  3. ensure the 500ms timeout is met
  4. implement the sorting algorithm
  5. review manually for undefined behavior, memory leaks, possible security vulnerabilities and consistent error handling
  6. write tests

Retrieving of URLs and timing guarantee

URLs are retrieved asynchronously for performance reasons. Then the handler enters an aggregation/sort loop in order to gather as much data as is already available and process it immediately for sorting. This is repeated as long as there is still potential data to be discovered (as in: not all URLs fetched). If the timeout is reached at any point (while aggregating data or sorting) the handler returns immediately with the sorted data it has at that point.

That means that sorting is potentially done multiple times. That decision assumes that the highest time fluctuation comes from the HTTP requests and not from the sorting algorithm. It also helps with stopping early and further hardens the 500ms timeout, because we already have at least parts of the results sorted instead of none.

This rules out the case that we aggregate all data close before the timeout, but fail to sort the huge list within the rest of the timeframe and exceed it.

This might be tweaked, changed or optimized in case there is more information on the data that the backend regularly processes or has to expect.

Sorting/Merging of the numbers

Initially I wanted to write a lot of custom code that would switch the algorithm depending on the endpoint used, as in: only do expensive sorting when /random was retrieved and otherwise assume sorted lists that can easily be merged and deduplicated.

However, the actual endpoints are not in the task description and the example testserver is not a specification of the expected/valid endpoints and the properties of their responses.

Instead I use an iterative version of bottom-up mergesort. The reasons being:

  • relatively easy to implement
  • because it's iterative, it doesn't potentially blow up the stack
  • faster on partially sorted lists than quicksort
  • better worst case time complexity than quicksort

Parallelizing the merge sort algorithm can be considered, but isn't implemented, because there is no hard evidence that it is particularly useful for the expected input, nor is it expressed in the task.

The sorting procedure also stops early when the input timeout channel is triggered. Apart from a maybe small performance benefit, this is mainly done to avoid potential DoS attacks with overly huge inputs that get fetched within the 500ms timeframe. This is necessary, because the handler cannot kill the "sort go-routine", which would continue to run even after the handler has responded and consume CPU and memory. This is very basic and might be extended or removed in the future. Up for discussion.

Error handling

If URL parameters are missing, then HTTP code 400 is issued, instead of an empty 200 response. This kinda hijacks the HTTP protocol, but allows to signal wrong use of backend API. Some Google APIs also seem to use this method.

No package level error variables are used since the specific errors in getNumbers are irrelevant for the handler to proceed. Instead they will just be printed and the response integer list will always be set to nil on any such error.

The task does not exactly say how to handle the case of a single URL responding with garbage. It is assumed that it will be ignored just like URLs that take too long to respond.

JSON parsing/response

There are a few inconsistencies in the task and the testserver given about the exact form of the JSON input and output.

I decided that the input has to have an uppercase 'Numbers' key and the response of the handler as well.

The JSON parser isn't overly strict. In case I was allowed to use 3rd party packages, I'd choose a parser-combinator library to stricten the allowed input (e.g. only one Numbers key allowed, no garbage at the end of the input etc.).

TODO

  • make 'handler' function more slim (could be deduplicated)
  • could use contex.Context for more abstraction
  • more general types in some cases
  • proper mocking for the handler (no 3rd party libs allowed)
    • maybe use a Docker environment for that

Remarks

  • time used for task: ~2.5 days
  • no prior Go experience