
Unstable API #13

Closed
jrtberlin opened this issue Jan 30, 2020 · 10 comments

Comments

@jrtberlin
Contributor

Sometimes departures cannot query the departures and responds with:

Could not query departures
Error: json: cannot unmarshal object into Go value of type []main.result

I looked up with curl what the actual error is:

{"error":true,"msg":"HAFAS error: Service Unavailable"}

In my opinion, departures should be clearer about what the issue is, e.g. report HAFAS error: Service Unavailable, and/or use a more stable API (I'm unsure how feasible switching the API is).

@noxer
Owner

noxer commented Jan 30, 2020

What would be your way of dealing with it? Should we wait a little longer and retry, or give up?

@jrtberlin
Contributor Author

I would probably ask the API maintainer whether that is a rate limit, a server capacity issue, or HAFAS being HAFAS.
If the issue is the software quality of HAFAS, we should increase the default retries. In my opinion, waiting two seconds longer is a better default config than getting an error.
If it's a rate limit or similar the project could look into hosting a dedicated API server or accessing the source directly.

@noxer
Owner

noxer commented Jan 30, 2020

Okay, so this issue is twofold: detect the HAFAS errors, and find a better strategy for dealing with them.

@noxer
Owner

noxer commented Jan 30, 2020

Does tagging @derhuerst work?

@derhuerst

Does tagging @derhuerst work?

It does! 🎉

There are three different aspects here:

  • The *.transport.rest APIs return any error as {"error": true, "msg": "…"}. You would have to handle this in your client.
  • In the case described above ("HAFAS error: Service Unavailable"), the upstream API that 2.bvg.transport.rest wraps returns errors. In most cases, this is not caused by an invalid request to it, but simply because it's down; we can't really do anything about it.
  • That aside, it would be nice to let *.transport.rest expose more properties of the error via JSON, e.g. isHafasError. The current error-handling code is very bare-bones.
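The first bullet can be handled client-side by probing for the error envelope before decoding the result array. A minimal Go sketch (the struct names are illustrative; `result` stands in for departures' real departure type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the {"error": true, "msg": "…"} envelope that the
// *.transport.rest APIs use for all failures.
type apiError struct {
	Error bool   `json:"error"`
	Msg   string `json:"msg"`
}

// result stands in for the struct departures decodes departures into;
// the real type has more fields.
type result struct {
	When string `json:"when"`
}

// parseDepartures probes for the error envelope first, then decodes the
// expected array. This replaces the opaque "cannot unmarshal object into
// Go value of type []main.result" with the actual upstream message.
func parseDepartures(body []byte) ([]result, error) {
	var apiErr apiError
	if err := json.Unmarshal(body, &apiErr); err == nil && apiErr.Error {
		return nil, fmt.Errorf("API error: %s", apiErr.Msg)
	}
	var results []result
	if err := json.Unmarshal(body, &results); err != nil {
		return nil, fmt.Errorf("unexpected response: %w", err)
	}
	return results, nil
}

func main() {
	_, err := parseDepartures([]byte(`{"error":true,"msg":"HAFAS error: Service Unavailable"}`))
	fmt.Println(err) // API error: HAFAS error: Service Unavailable
}
```

The probe works because unmarshaling a JSON array into the envelope struct fails, so normal responses fall through to the array decode.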

If the issue is the software quality of HAFAS we should increase the default retries.

You could do that. On the other hand, I've wanted to add this to hafas-rest-api itself for a while now. Retrying on the server side yields more appropriate retrying schemes (e.g. when many people send requests and all of HAFAS is down) and better latencies.

If it's a rate limit or similar the project could look into hosting a dedicated API server [...].

That is yet another improvement on my todo list: Access HAFAS via a range of IP addresses, to prevent rate-limiting. Let me know if you can help with this.

If it's a rate limit or similar the project could look into [...] accessing the source directly.

Unfortunately that's almost impossible for now:

  • HAFAS is a huge piece of software that does many things; it's being sold as a tightly integrated all-in-one solution. AFAIK, both legally & technically, it's not meant to be extended or partially replaced by other systems.
  • The underlying public transportation data is often copyright-protected and exclusively handed over to HaCon (they write & run HAFAS) for the purpose of running the API.

@noxer
Owner

noxer commented Feb 4, 2020

It does! 🎉

🎉

The *.transport.rest APIs return any error as {"error": true, "msg": "…"}. You would have to handle this in your client.

We should implement this either way. Even if transport.rest handles retries, it may still encounter a situation where HAFAS just doesn't want to respond.

In the case described above ("HAFAS error: Service Unavailable"), the upstream API that 2.bvg.transport.rest wraps returns errors. In most cases, this is not caused by an invalid request to it, but simply because it's down; we can't really do anything about it.

Do you log those errors? Maybe we can find a pattern which allows us to better time the retries.

That aside, it would be nice to let *.transport.rest expose more properties of the error via JSON, e.g. isHafasError. The current error-handling code is very bare-bones.

I'd like that :)

You could do that. On the other hand, I've wanted to add this to hafas-rest-api itself for a while now. Retrying on the server side yields more appropriate retrying schemes (e.g. when many people send requests and all of HAFAS is down) and better latencies.

This makes a lot of sense. It could also prevent departures from DDoSing your API when the requests fail. To prevent a DDoS, you could also add a field "retry" to the error response which indicates how many seconds to wait before a retry should be attempted (or a negative number/zero to block retries entirely).

That is yet another improvement on my todo list: Access HAFAS via a range of IP addresses, to prevent rate-limiting. Let me know if you can help with this.

Sure, we could set up a number of relays for the API, we just need to create a protocol for that.

Unfortunately that's almost impossible for now:

I think what he meant is to circumvent transport.rest and access HAFAS directly. That would be a lot of work but may prevent rate limiting issues. It doesn't solve the underlying issue, though. I would push that faaaaar back on the project plan.

Thank you for your answers.

  • Tim

@derhuerst

In the case described above ("HAFAS error: Service Unavailable"), the upstream API that 2.bvg.transport.rest wraps returns errors. In most cases, this is not caused by an invalid request to it, but simply because it's down; we can't really do anything about it.

Do you log those errors? Maybe we can find a pattern which allows us to better time the retries.

I do keep those logs. I've been planning to publish them somewhere for a while now, but haven't managed to so far.

Not sure how well you can time them. The downtimes of HAFAS APIs seem pretty much random.

That aside, it would be nice to let *.transport.rest expose more properties of the error via JSON, e.g. isHafasError.

I'd like that :)

PR welcome!

To prevent DDOS you could also add a field "retry" to the error response which indicates how many seconds to wait before a retry should be attempted (or a negative number/zero to block retries entirely).

I could, but this just has an advisory function, and most people won't obey it. I like the idea of the API being able to handle this much more. Let's see.

Sure, we could set up a number of relays for the API, we just need to create a protocol for that.

I already did (read from derhuerst/vbb-rest#29 (comment) onwards), but the setup was overly complex and annoying.

Now it's using Caddy, with a.2.bvg.transport.rest as the main upstream and b.2.bvg.transport.rest as the fallback. But people often forget to renew their SSL certs & update bvg-rest from Git.

There are two more options I want to evaluate though: making use of the IPv6 address pool of the VPS it's currently running on, and SOCKS proxies. Both have the advantage that they don't increase the maintenance effort much, in contrast to the HTTP-based load balancing setup.

Let me know if you want to help, via doing PRs or by running the VPS.

I think what he meant is to circumvent transport.rest and access HAFAS directly. That would be a lot of work but may prevent rate limiting issues. It doesn't solve the underlying issue, though. I would push that faaaaar back on the project plan.

You can always spawn bvg-hafas/hafas-client as a child process and call it via JSON-RPC. But the idea of *.transport.rest is that it should be convenient to access public transport data.

@derhuerst

I have opened a thread in the transport.rest meta-repo to discuss hosting further: public-transport/transport.rest#2

@derhuerst

An update:

  • 2.vbb.transport.rest has been shut off for ~6 months now.
  • I have just deprecated 3.vbb.transport.rest.
  • Please use v5.vbb.transport.rest. It has built-in caching, which might help with the underlying HAFAS endpoint being unreliable.

@noxer
Owner

noxer commented Oct 28, 2020

This should be solved now. The application uses v5 and I've seen no problems since.

@noxer noxer closed this as completed Oct 28, 2020