Initial commit

Julian Ospald 2017-08-30 00:27:42 +02:00
commit c76b5266e2
8 changed files with 769 additions and 0 deletions

.gitignore (vendored, new file, 1 line)


go-challenge.txt (new file, 88 lines)

travel audience Go challenge
============================
Task
----
Write an HTTP service that exposes an endpoint "/numbers". This endpoint receives a list of URLs
through GET query parameters. The parameter in use is called "u" and can appear
more than once.
http://yourserver:8080/numbers?u=http://example.com/primes&u=http://foobar.com/fibo
When /numbers is called, your service shall retrieve each of these URLs if
they turn out to be syntactically valid URLs. Each URL will return a JSON data
structure that looks like this:
{ "numbers": [ 1, 2, 3, 5, 8, 13 ] }
The JSON data structure will contain an object with a key named "numbers", and
a value that is a list of integers. After retrieving each of these URLs, the
service shall merge the integers coming from all URLs, sort them in ascending
order, and make sure that each integer only appears once in the result. The
endpoint shall then return a JSON data structure like in the example above with
the result as the list of integers.
The endpoint needs to return the result as quickly as possible, but always
within 500 milliseconds. It needs to be able to deal with error conditions when
retrieving the URLs. If a URL takes too long to respond, it must be ignored. It
is valid to return an empty list as result only if all URLs returned errors or
took too long to respond.
Example
-------
The service receives an HTTP request:
>>> GET /numbers?u=http://example.com/primes&u=http://foobar.com/fibo HTTP/1.0
It then retrieves the URLs specified as parameters.
The first URL returns this response:
>>> GET /primes HTTP/1.0
>>> Host: example.com
>>>
<<< HTTP/1.0 200 OK
<<< Content-Type: application/json
<<< Content-Length: 34
<<<
<<< { "number": [ 2, 3, 5, 7, 11, 13 ] }
The second URL returns this response:
>>> GET /fibo HTTP/1.0
>>> Host: foobar.com
>>>
<<< HTTP/1.0 200 OK
<<< Content-Type: application/json
<<< Content-Length: 40
<<<
<<< { "number": [ 1, 1, 2, 3, 5, 8, 13, 21 ] }
The service then calculates the result and returns it.
<<< HTTP/1.0 200 OK
<<< Content-Type: application/json
<<< Content-Length: 44
<<<
<<< { "number": [ 1, 2, 3, 5, 7, 8, 11, 13, 21 ] }
Completion Conditions
---------------------
Solve the task described above using Go. Only use what's provided in the Go
standard library. The resulting program must run stand-alone with no other
dependencies than the Go compiler.
Document your source code, both using comments and in a separate text file that
describes the intentions and rationale behind your solution. Also write down
any ambiguities that you see in the task description, and describe how you
interpreted them and why. If applicable, write automated tests for your code.
For testing purposes, you will be provided with an example server that, when
run, listens on port 8090 and provides the endpoints /primes, /fibo, /odd and
/rand.
Please return your working solution within 7 days of receiving the challenge.

src/numbers/numbers.go (new file, 231 lines)

package main

import (
	"bytes"
	"encoding/json"
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/url"
	"numbers/sort"
	"sync"
	"time"
)

// Helper struct for JSON decoding.
type Numbers struct {
	Numbers []int
}

// The maximum response time of the handlers.
var MaxResponseTime time.Duration = 500 * time.Millisecond

// The main entry point of the backend.
func main() {
	listenAddr := flag.String("http.addr", ":8090", "http listen address")
	flag.Parse()
	http.HandleFunc("/numbers", func(w http.ResponseWriter, r *http.Request) {
		numbersHandler(w, r)
	})
	log.Fatal(http.ListenAndServe(*listenAddr, nil))
}
// The main handler. The expected request is of the form:
// GET /numbers?u=http://example.com/primes&u=http://foobar.com/fibo HTTP/1.0
// The parameter 'u' will be parsed and all URLs will be fetched, each of
// which must return valid JSON that looks like e.g.:
// { "Numbers": [ 1, 2, 3, 5, 8, 13 ] }
//
// These lists are then merged, deduplicated and sorted, and the response
// is JSON of the same form.
// The handler is guaranteed to respond within a timeframe of 500ms. URLs
// that take too long to load or return garbage are skipped.
// If all URLs take too long to load or return garbage, an empty JSON list
// is returned.
func numbersHandler(w http.ResponseWriter, r *http.Request) {
	// timeout channel for the handler as a whole
	timeout := make(chan bool, 1)
	go func() {
		time.Sleep(MaxResponseTime)
		timeout <- true
	}()
	var rurl []string = r.URL.Query()["u"]
	// if no parameters, return 400
	if len(rurl) == 0 {
		w.Header().Set("Content-Type", "text/html")
		w.WriteHeader(http.StatusBadRequest)
		w.Write([]byte("Bad request: Missing 'u' parameters"))
		return
	}
	// Non-blocking input channel for URL results.
	// We will read as much as we can at once.
	inputChan := make(chan []int, len(rurl))
	var wg sync.WaitGroup
	// fetch all URLs asynchronously
	for i := range rurl {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			n, e := getNumbers(url)
			if e == nil {
				if len(n) > 0 {
					inputChan <- n
				} else {
					log.Printf("Received empty list of numbers from endpoint")
				}
			} else {
				log.Printf("Got an error: %s", e)
			}
		}(rurl[i])
	}
	// master routine closing the inputChan
	go func() {
		wg.Wait()
		close(inputChan)
	}()
	// channel for the sorting process, so we can short-circuit in
	// case sorting takes too long
	sortChan := make(chan []int, 1)
	// aggregated numbers from the URLs
	numberBuffer := []int{}
	// these are actually sorted
	sortedNumbers := []int{}
	// aggregate and sort loop; breaks if all URLs have been processed
	// or the timeout has been reached
	done := false
	for !done {
		select {
		case <-timeout:
			log.Printf("Waiting for URL took too long, finishing response anyway")
			finishResponse(w, sortedNumbers)
			return
		case res, more := <-inputChan:
			if more { // still URLs to fetch
				numberBuffer = append(numberBuffer, res...)
				// continue to aggregate numbers from the buffer
				continue
			} else { // all URLs fetched, sort and be done
				log.Printf("Nothing else to fetch")
				done = true
			}
		// non-blocking branch that sorts what we already have;
		// we are not done here yet
		default:
			// only sort if we have new results
			if len(numberBuffer) == 0 {
				continue
			}
		}
		// sort fallthrough: either the inputChan is currently "empty"
		// or we have fetched all URLs already
		go func(n []int) {
			res, err := sort.SortedAndDedup(timeout, n)
			if err != nil {
				return
			}
			sortChan <- res
		}(append(sortedNumbers, numberBuffer...))
		numberBuffer = []int{}
		select {
		case merged := <-sortChan:
			sortedNumbers = merged
		case <-timeout:
			log.Printf("Sorting took too long, finishing response anyway")
			finishResponse(w, sortedNumbers)
			return
		}
	}
	log.Printf("Result is complete, finishing response")
	finishResponse(w, sortedNumbers)
}
// Finalizes the JSON response with the given numbers. This always
// sends a 200 HTTP status code.
func finishResponse(w http.ResponseWriter, numbers []int) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]interface{}{"Numbers": numbers})
}

// Gets the numbers from the given URL.
// 'resp' is always nil if there was an error. Errors can be URL parse
// errors, HTTP response errors, io errors from reading the body or
// JSON decoding errors.
func getNumbers(rawurl string) (resp []int, err error) {
	// validate url
	u_err := validateURL(rawurl)
	if u_err != nil {
		return nil, u_err
	}
	// retrieve response
	r, r_err := http.Get(rawurl)
	if r_err != nil {
		return nil, r_err
	}
	if r.StatusCode != 200 {
		return nil, fmt.Errorf("HTTP: Status code is not 200, but %d",
			r.StatusCode)
	}
	// read body
	defer r.Body.Close()
	body, b_err := ioutil.ReadAll(r.Body)
	if b_err != nil {
		return nil, b_err
	}
	// parse json
	return parseJson(body)
}

// Parse the given raw JSON bytes into a list of numbers. The JSON
// is expected to be of the form:
// {"Numbers": [1,2,5]}
//
// Not particularly strict.
func parseJson(body []byte) (res []int, err error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	var n Numbers
	if j_err := dec.Decode(&n); j_err != nil {
		return nil, j_err
	}
	if n.Numbers == nil {
		return nil, fmt.Errorf("JSON: missing key 'Numbers'")
	}
	return n.Numbers, nil
}

// Validate the URL. The URL has to be syntactically valid and must have
// a scheme of either 'https' or 'http' as well as a hostname.
// An error is returned for invalid URLs.
func validateURL(rawurl string) error {
	u, u_err := url.Parse(rawurl)
	if u_err != nil {
		return u_err
	}
	if u.Scheme != "https" && u.Scheme != "http" {
		return fmt.Errorf("URL: not a valid HTTP/HTTPS scheme in %s", rawurl)
	}
	if u.Host == "" {
		return fmt.Errorf("URL: not a valid host in %s", rawurl)
	}
	return nil
}

src/numbers/numbers_test.go (new file, 112 lines)

package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// Tests valid and invalid URLs via 'validateURL'.
func TestValidateURL(t *testing.T) {
	urls_valid := []string{
		"http://www.bar.com",
		"https://86.31.3.9.de",
		"http://localhost:8080",
		"https://baz.org",
	}
	urls_invalid := []string{
		"http:/www.bar.com",
		"ftp://86.31.3.9.de",
		"localhost:8080",
		"ssh://foo.bar",
	}
	for _, url := range urls_valid {
		if err := validateURL(url); err != nil {
			t.Errorf("URL %s invalid\nError was: %s", url, err)
		}
	}
	for _, url := range urls_invalid {
		if err := validateURL(url); err == nil {
			t.Errorf("URL %s valid", url)
		}
	}
}

// Tests specific JSON that is accepted by 'parseJson', e.g.
// {"Numbers": [1,2,5]}
func TestParseJson(t *testing.T) {
	validJSON := [][]byte{
		[]byte("{\"Numbers\": []}"),
		[]byte("{\"Numbers\": [7]}"),
		[]byte("{\"Numbers\": [1,2,5]}"),
		[]byte("{\"Numbers\" : [1 , 2 ,5]}"),
	}
	invalidJSON := [][]byte{
		[]byte("{\"Numbers\": [}"),
		[]byte("\"Numbers\": [7]}"),
		[]byte("{\"umbers\": [1,2,5]}"),
		[]byte("{\"umbers\": [1,2,5]"),
		[]byte("{\"Numbers\" [1,2,5]}"),
	}
	for _, json := range validJSON {
		if _, err := parseJson(json); err != nil {
			t.Errorf("JSON \"%s\" invalid\nError was: %s", json, err)
		}
	}
	for _, json := range invalidJSON {
		if res, err := parseJson(json); err == nil {
			t.Errorf("JSON \"%s\" valid\nResult was: %v", json, res)
		}
	}
}

// Test the actual backend handler. Because we have no mocking framework,
// we just do a few very basic tests.
func TestHandler(t *testing.T) {
	// no url parameters => status code 400
	{
		req, err := http.NewRequest("GET", "/numbers", nil)
		if err != nil {
			t.Fatal(err)
		}
		rr := httptest.NewRecorder()
		handler := http.HandlerFunc(numbersHandler)
		handler.ServeHTTP(rr, req)
		if status := rr.Code; status != http.StatusBadRequest {
			t.Errorf("Handler returned status code %v, expected %v", status, http.StatusBadRequest)
		}
	}
	// invalid url => empty result, status code 200
	{
		req, err := http.NewRequest("GET", "/numbers?u=ftp://a.b.c.d.e.f", nil)
		if err != nil {
			t.Fatal(err)
		}
		rr := httptest.NewRecorder()
		handler := http.HandlerFunc(numbersHandler)
		handler.ServeHTTP(rr, req)
		if status := rr.Code; status != http.StatusOK {
			t.Errorf("Handler returned status code %v, expected %v", status, http.StatusOK)
		}
		body := rr.Body.String()
		expected := "{\"Numbers\":[]}\n"
		if body != expected {
			t.Errorf("Body not as expected, got %s, expected %s", body, expected)
		}
	}
}

src/numbers/sort/sort.go (new file, 100 lines)

// Sorting algorithms. May also contain deduplication operations.
package sort

import (
	"fmt"
)

// Mergesorts and deduplicates the list.
func SortedAndDedup(timeout <-chan bool, list []int) (res []int, err error) {
	sorted, err := Mergesort(timeout, list)
	if err != nil {
		return nil, err
	}
	return dedupSortedList(sorted), nil
}

// Deduplicate the sorted list and return a new one with a potentially
// different size.
func dedupSortedList(list []int) []int {
	if len(list) <= 1 {
		return list
	}
	newList := []int{list[0]}
	prev := list[0]
	for i := 1; i < len(list); i++ {
		if prev != list[i] {
			newList = append(newList, list[i])
		}
		prev = list[i]
	}
	return newList
}

// Mergesorts the given list and returns it as a result. The input list
// is not modified.
// The algorithm is a bottom-up iterative version and not explained
// in detail here.
func Mergesort(timeout <-chan bool, list []int) (res []int, err error) {
	newList := append([]int{}, list...)
	temp := append([]int{}, list...)
	n := len(newList)
	for m := 1; m < n; m = 2 * m {
		for i := 0; i < (n - 1); i += 2 * m {
			select {
			case <-timeout:
				return nil, fmt.Errorf("Sorting timed out")
			default:
			}
			from := i
			mid := i + m - 1
			to := min(i+2*m-1, n-1)
			merge(timeout, newList, temp, from, mid, to)
		}
	}
	return newList, nil
}

// The merge part of the mergesort.
func merge(timeout <-chan bool, list []int, temp []int, from int, mid int, to int) {
	k := from
	i := from
	j := mid + 1
	for i <= mid && j <= to {
		if list[i] < list[j] {
			temp[k] = list[i]
			i++
		} else {
			temp[k] = list[j]
			j++
		}
		k++
	}
	for i <= mid && i < len(temp) {
		temp[k] = list[i]
		i++
		k++
	}
	for i := from; i <= to; i++ {
		list[i] = temp[i]
	}
}

// Get the minimum of two integers.
func min(l int, r int) int {
	if l < r {
		return l
	}
	return r
}

src/numbers/sort test file (new, 92 lines)

package sort

import (
	"testing"
)

// Test the mergesort and deduplication with a predefined set of slices.
func TestSortAndDedup(t *testing.T) {
	to_sort := [][]int{
		{},
		{7},
		{1, 4, 5, 6, 3, 2},
		{1, 2, 3, 4, 5, 6, 7},
		{1, 1, 1, 3, 3, 2, 1},
		{84, 32, 32, 7, 1, 2, 1},
		{1, 3, 5, 5, 7, 8, 10, 17, 19, 24, 27, 34, 76, 1, 1, 2, 3, 5, 8, 13, 21},
		{1, 3, 5, 5, 7, 8, 10, 17, 19, 24, 27, 34, 76, 1, 1, 2, 3, 5, 8, 13, 21, 1, 3, 5, 5, 7, 8, 10, 17, 19, 24, 27, 34, 76, 1, 1, 2, 3, 5, 8, 13, 21, 1, 3, 5, 5, 7, 8, 10, 17, 19, 24, 27, 34, 76, 1, 1, 2},
	}
	result := [][]int{
		{},
		{7},
		{1, 2, 3, 4, 5, 6},
		{1, 2, 3, 4, 5, 6, 7},
		{1, 2, 3},
		{1, 2, 7, 32, 84},
		{1, 2, 3, 5, 7, 8, 10, 13, 17, 19, 21, 24, 27, 34, 76},
		{1, 2, 3, 5, 7, 8, 10, 13, 17, 19, 21, 24, 27, 34, 76},
	}
	for i := range to_sort {
		sorted, _ := SortedAndDedup(make(chan bool, 1), to_sort[i])
		if !slice_equal(sorted, result[i]) {
			t.Errorf("Failure in sorting + dedup, expected %v, got %v", result[i], sorted)
		}
	}
}

// Test the mergesort with a predefined set of slices.
func TestSort(t *testing.T) {
	to_sort := [][]int{
		{},
		{7},
		{1, 4, 5, 6, 3, 2},
		{1, 2, 3, 4, 5, 6, 7},
		{1, 1, 1, 3, 3, 2, 1},
		{84, 32, 32, 7, 1, 2, 1},
		{1, 3, 5, 5, 7, 8, 10, 17, 19, 24, 27, 34, 76, 1, 1, 2, 3, 5, 8, 13, 21},
	}
	result := [][]int{
		{},
		{7},
		{1, 2, 3, 4, 5, 6},
		{1, 2, 3, 4, 5, 6, 7},
		{1, 1, 1, 1, 2, 3, 3},
		{1, 1, 2, 7, 32, 32, 84},
		{1, 1, 1, 2, 3, 3, 5, 5, 5, 7, 8, 8, 10, 13, 17, 19, 21, 24, 27, 34, 76},
	}
	for i := range to_sort {
		sorted, _ := Mergesort(make(chan bool, 1), to_sort[i])
		if !slice_equal(sorted, result[i]) {
			t.Errorf("Failure in sorting, expected %v, got %v", result[i], sorted)
		}
	}
}

// Helper function to compare int slices for equality.
func slice_equal(s1 []int, s2 []int) bool {
	if s1 == nil && s2 == nil {
		return true
	}
	if s1 == nil || s2 == nil {
		return false
	}
	if len(s1) != len(s2) {
		return false
	}
	for i := range s1 {
		if s1[i] != s2[i] {
			return false
		}
	}
	return true
}

src/numbers/thoughts.md (new file, 103 lines)
@ -0,0 +1,103 @@
## Workflow approach taken
1. get a minimal version of a handler working that responds with unsorted and undeduplicated JSON lists, no go-routines
2. add asynchronous fetching of the URLs for performance
3. ensure the 500ms timeout is met
4. implement the sorting algorithm
5. review manually for undefined behavior, memory leaks, possible security vulnerabilities and consistent error handling
6. write tests
## Retrieving the URLs and the timing guarantee
URLs are retrieved asynchronously for performance reasons. Then
the handler enters an aggregation/sort loop in order to gather as much
data as is already available and process it immediately for sorting.
This is repeated as long as there is still potential data to be discovered
(as in: not all URLs fetched).
If the timeout is reached at any point (while aggregating data or sorting)
the handler returns immediately with the sorted data it has at that point.
That means that sorting is potentially done multiple times. That decision
assumes that the largest time fluctuation comes from the HTTP requests
and not from the sorting algorithm. It also helps with stopping early
and further hardens the 500ms timeout, because we already have at least
part of the results sorted instead of none. This rules out the case
where we aggregate all the data just before the timeout, but fail to sort
the resulting huge list within the remaining timeframe and exceed it.
This might be tweaked, changed or optimized in case there is more
information on the data that the backend regularly processes or has
to expect.
## Sorting/Merging of the numbers
Initially I wanted to write a lot of custom code that would
switch the algorithm depending on the endpoint used, as in: only do expensive
sorting when `/rand` was retrieved and otherwise assume sorted lists
that can easily be merged and deduplicated.
However, the actual endpoints are not in the task description and the
example testserver is not a specification of the expected/valid
endpoints and the properties of their responses.
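The merge-and-dedup idea for already-sorted inputs can be sketched with a small stdlib-only helper (`mergeDedup` is a hypothetical name for illustration, not part of this commit):

```go
package main

import "fmt"

// mergeDedup merges two ascending-sorted slices into one ascending,
// duplicate-free slice in a single O(len(a)+len(b)) pass.
func mergeDedup(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	// push appends v only if it differs from the last emitted value,
	// which deduplicates on the fly since the output is ascending.
	push := func(v int) {
		if len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v)
		}
	}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			push(a[i])
			i++
		} else {
			push(b[j])
			j++
		}
	}
	for ; i < len(a); i++ {
		push(a[i])
	}
	for ; j < len(b); j++ {
		push(b[j])
	}
	return out
}

func main() {
	// The two lists from the challenge example.
	fmt.Println(mergeDedup([]int{2, 3, 5, 7, 11, 13}, []int{1, 1, 2, 3, 5, 8, 13, 21}))
	// [1 2 3 5 7 8 11 13 21]
}
```

For pre-sorted inputs this is linear per pair of lists, which is why the approach would have been attractive if the endpoints had guaranteed sorted output.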
Instead I use an iterative version of bottom-up mergesort. The reasons being:
* relatively easy to implement
* because it's iterative, it doesn't potentially blow up the stack
* faster on partially sorted lists than quicksort
* better worst case time complexity than quicksort
Parallelizing the merge sort algorithm can be considered, but isn't
implemented, because there is no hard evidence that it is particularly useful
for the expected input, nor is it expressed in the task.
The sorting procedure also stops early when the input timeout channel is
triggered. Apart from a possibly small performance benefit, this is mainly
done to avoid potential DoS attacks with overly large inputs that are
fetched within the 500ms timeframe. This is necessary because the handler
cannot kill the sorting goroutine, which would otherwise continue to run
after the handler has responded, consuming CPU and memory. This is very
basic and might be extended or removed in the future. Up for discussion.
## Error handling
If URL parameters are missing, an HTTP 400 is issued instead of an empty
200 response. This bends the HTTP protocol a little, but allows signalling
wrong use of the backend API. Some Google APIs also seem to use this
method.
No package level error variables are used since the specific
errors in `getNumbers` are irrelevant for the handler to proceed.
Instead they will just be printed and the response
integer list will always be set to `nil` on any such error.
The task does not exactly say how to handle the case of a single
URL responding with garbage. It is assumed that it will be ignored
just like URLs that take too long to respond.
## JSON parsing/response
There are a few inconsistencies between the task description and the given
testserver about the exact form of the JSON input and output.
I decided that the input has to have an uppercase 'Numbers' key, and that
the response of the handler uses the same key.
The JSON parser isn't overly strict. If 3rd-party packages were allowed,
I'd choose a parser-combinator library to tighten the allowed input
(e.g. only one 'Numbers' key allowed, no garbage at the end of the input, etc.).
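Even within the standard library the parser could be made somewhat stricter. The sketch below (`parseJsonStrict` is hypothetical, not part of this commit) rejects unknown keys and trailing garbage; catching a duplicated 'Numbers' key would still require token-level parsing, which is where a parser-combinator library would help:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// parseJsonStrict is a stricter variant of parseJson: it rejects keys
// other than "Numbers" and any trailing data after the JSON object.
func parseJsonStrict(body []byte) ([]int, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields() // error out on keys other than "Numbers"
	var n struct {
		Numbers []int
	}
	if err := dec.Decode(&n); err != nil {
		return nil, err
	}
	if n.Numbers == nil {
		return nil, fmt.Errorf("JSON: missing key 'Numbers'")
	}
	// After a full decode, the next token must be EOF; anything else
	// means there is garbage after the object.
	if _, err := dec.Token(); err != io.EOF {
		return nil, fmt.Errorf("JSON: trailing data after object")
	}
	return n.Numbers, nil
}

func main() {
	fmt.Println(parseJsonStrict([]byte(`{"Numbers":[1,2,5]}`))) // [1 2 5] <nil>
	_, err := parseJsonStrict([]byte(`{"Numbers":[1]} garbage`))
	fmt.Println(err != nil) // true
}
```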
## TODO
* make the 'handler' function slimmer (logic could be deduplicated)
* could use context.Context for more abstraction
* more general types in some cases
* proper mocking for the handler (no 3rd party libs allowed)
- maybe use a Docker environment for that
## Remarks
* time used for task: ~2.5 days
* no prior Go experience

Example test server (filename not shown; new file, 42 lines)

package main

import (
	"encoding/json"
	"flag"
	"log"
	"math/rand"
	"net/http"
	"time"
)

func main() {
	listenAddr := flag.String("http.addr", ":8080", "http listen address")
	flag.Parse()
	http.HandleFunc("/primes", handler([]int{2, 3, 5, 7, 11, 13}))
	http.HandleFunc("/fibo", handler([]int{1, 1, 2, 3, 5, 8, 13, 21}))
	http.HandleFunc("/odd", handler([]int{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23}))
	http.HandleFunc("/rand", handler([]int{5, 17, 3, 19, 76, 24, 1, 5, 10, 34, 8, 27, 7}))
	log.Fatal(http.ListenAndServe(*listenAddr, nil))
}

func handler(numbers []int) func(http.ResponseWriter, *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		waitPeriod := rand.Intn(550)
		log.Printf("%s: waiting %dms.", r.URL.Path, waitPeriod)
		time.Sleep(time.Duration(waitPeriod) * time.Millisecond)
		x := rand.Intn(100)
		if x < 10 {
			http.Error(w, "service unavailable", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		json.NewEncoder(w).Encode(map[string]interface{}{"Numbers": numbers})
	}
}