go-challenge/src/numbers/thoughts.md

104 lines
4.4 KiB
Markdown

## Workflow approach taken
1. get a minimal version of a handler working that responds with unsorted and undeduplicated JSON lists, no go-routines
2. add asynchronous fetching of the URLs for performance
3. ensure the 500ms timeout is met
4. implement the sorting algorithm
5. review manually for undefined behavior, memory leaks, possible security vulnerabilities and consistent error handling
6. write tests
## Retrieving of URLs and timing guarantee
URLs are retrieved asynchronously for performance reasons. Then
the handler enters an aggregation/sort loop in order to gather as much
data as is already available and process it immediately for sorting.
This is repeated as long as there is still potential data to be discovered
(as in: not all URLs fetched).
If the timeout is reached at any point (while aggregating data or sorting)
the handler returns immediately with the sorted data it has at that point.
That means that sorting is potentially done multiple times. That decision
assumes that the highest time fluctuation comes from the HTTP requests
and not from the sorting algorithm. It also helps with stopping early
and further hardens the 500ms timeout, because we already
have at least parts of the results sorted instead of none.
This rules out the case that we aggregate all data close before the timeout,
but fail to sort the huge list within the rest of the timeframe and exceed it.
This might be tweaked, changed or optimized in case there is more
information on the data that the backend regularly processes or has
to expect.
## Sorting/Merging of the numbers
Initially I wanted to write a lot of custom code that would
switch the algorithm depending on the endpoint used, as in: only do expensive
sorting when `/random` was retrieved and otherwise assume sorted lists
that can easily be merged and deduplicated.
However, the actual endpoints are not in the task description and the
example testserver is not a specification of the expected/valid
endpoints and the properties of their responses.
Instead I use an iterative version of bottom-up mergesort. The reasons being:
* relatively easy to implement
* because it's iterative, it doesn't potentially blow up the stack
* faster on partially sorted lists than quicksort
* better worst case time complexity than quicksort
Parallelizing the merge sort algorithm can be considered, but isn't
implemented, because there is no hard evidence that it is particularly useful
for the expected input, nor is it expressed in the task.
The sorting procedure also stops early when the input timeout channel is
triggered. Apart from a maybe small performance benefit, this is mainly done to
avoid potential DoS attacks with overly huge inputs
that get fetched within the 500ms timeframe. This is necessary, because
the handler cannot kill the "sort go-routine", which would
continue to run even after the handler has responded and consume CPU and
memory. This is very basic and might be extended or removed in the future.
Up for discussion.
## Error handling
If URL parameters are missing, then HTTP code 400 is issued,
instead of an empty 200 response. This kinda hijacks the HTTP protocol,
but allows to signal wrong use of backend API. Some Google APIs also seem
to use this method.
No package level error variables are used since the specific
errors in `getNumbers` are irrelevant for the handler to proceed.
Instead they will just be printed and the response
integer list will always be set to `nil` on any such error.
The task does not exactly say how to handle the case of a single
URL responding with garbage. It is assumed that it will be ignored
just like URLs that take too long to respond.
## JSON parsing/response
There are a few inconsistencies in the task and the testserver given about
the exact form of the JSON input and output.
I decided that the input has to have an uppercase 'Numbers' key and
the response of the handler as well.
The JSON parser isn't overly strict. In case I was allowed to use
3rd party packages, I'd choose a parser-combinator library to stricten
the allowed input (e.g. only one Numbers key allowed, no garbage at the end of the input etc.).
## TODO
* make 'handler' function more slim (could be deduplicated)
* could use contex.Context for more abstraction
* more general types in some cases
* proper mocking for the handler (no 3rd party libs allowed)
- maybe use a Docker environment for that
## Remarks
* time used for task: ~2.5 days
* no prior Go experience