104 lines
4.4 KiB
Markdown
104 lines
4.4 KiB
Markdown
## Workflow approach taken
|
|
|
|
1. get a minimal version of a handler working that responds with unsorted and undeduplicated JSON lists, no go-routines
|
|
2. add asynchronous fetching of the URLs for performance
|
|
3. ensure the 500ms timeout is met
|
|
4. implement the sorting algorithm
|
|
5. review manually for undefined behavior, memory leaks, possible security vulnerabilities and consistent error handling
|
|
6. write tests
|
|
|
|
## Retrieving of URLs and timing guarantee
|
|
|
|
URLs are retrieved asynchronously for performance reasons. Then
|
|
the handler enters an aggregation/sort loop in order to gather as much
|
|
data as is already available and process it immediately for sorting.
|
|
This is repeated as long as there is still potential data to be discovered
|
|
(as in: not all URLs fetched).
|
|
If the timeout is reached at any point (while aggregating data or sorting)
|
|
the handler returns immediately with the sorted data it has at that point.
|
|
|
|
That means that sorting is potentially done multiple times. That decision
|
|
assumes that the highest time fluctuation comes from the HTTP requests
|
|
and not from the sorting algorithm. It also helps with stopping early
|
|
and further hardens the 500ms timeout, because we already
|
|
have at least parts of the results sorted instead of none.
|
|
|
|
This rules out the case that we aggregate all data close before the timeout,
|
|
but fail to sort the huge list within the rest of the timeframe and exceed it.
|
|
|
|
This might be tweaked, changed or optimized in case there is more
|
|
information on the data that the backend regularly processes or has
|
|
to expect.
|
|
|
|
## Sorting/Merging of the numbers
|
|
|
|
Initially I wanted to write a lot of custom code that would
|
|
switch the algorithm depending on the endpoint used, as in: only do expensive
|
|
sorting when `/random` was retrieved and otherwise assume sorted lists
|
|
that can easily be merged and deduplicated.
|
|
|
|
However, the actual endpoints are not in the task description and the
|
|
example testserver is not a specification of the expected/valid
|
|
endpoints and the properties of their responses.
|
|
|
|
Instead I use an iterative version of bottom-up mergesort. The reasons being:
|
|
|
|
* relatively easy to implement
|
|
* because it's iterative, it doesn't potentially blow up the stack
|
|
* faster on partially sorted lists than quicksort
|
|
* better worst case time complexity than quicksort
|
|
|
|
Parallelizing the merge sort algorithm can be considered, but isn't
|
|
implemented, because there is no hard evidence that it is particularly useful
|
|
for the expected input, nor is it expressed in the task.
|
|
|
|
The sorting procedure also stops early when the input timeout channel is
|
|
triggered. Apart from a maybe small performance benefit, this is mainly done to
|
|
avoid potential DoS attacks with overly huge inputs
|
|
that get fetched within the 500ms timeframe. This is necessary, because
|
|
the handler cannot kill the "sort go-routine", which would
|
|
continue to run even after the handler has responded and consume CPU and
|
|
memory. This is very basic and might be extended or removed in the future.
|
|
Up for discussion.
|
|
|
|
## Error handling
|
|
|
|
If URL parameters are missing, then HTTP code 400 is issued,
|
|
instead of an empty 200 response. This kinda hijacks the HTTP protocol,
|
|
but allows to signal wrong use of backend API. Some Google APIs also seem
|
|
to use this method.
|
|
|
|
No package level error variables are used since the specific
|
|
errors in `getNumbers` are irrelevant for the handler to proceed.
|
|
Instead they will just be printed and the response
|
|
integer list will always be set to `nil` on any such error.
|
|
|
|
The task does not exactly say how to handle the case of a single
|
|
URL responding with garbage. It is assumed that it will be ignored
|
|
just like URLs that take too long to respond.
|
|
|
|
## JSON parsing/response
|
|
|
|
There are a few inconsistencies in the task and the testserver given about
|
|
the exact form of the JSON input and output.
|
|
|
|
I decided that the input has to have an uppercase 'Numbers' key and
|
|
the response of the handler as well.
|
|
|
|
The JSON parser isn't overly strict. In case I was allowed to use
|
|
3rd party packages, I'd choose a parser-combinator library to stricten
|
|
the allowed input (e.g. only one Numbers key allowed, no garbage at the end of the input etc.).
|
|
|
|
## TODO
|
|
|
|
* make 'handler' function more slim (could be deduplicated)
|
|
* could use contex.Context for more abstraction
|
|
* more general types in some cases
|
|
* proper mocking for the handler (no 3rd party libs allowed)
|
|
- maybe use a Docker environment for that
|
|
|
|
## Remarks
|
|
|
|
* time used for task: ~2.5 days
|
|
* no prior Go experience
|