Using machine translation #68

Open
Bloke opened this issue Mar 5, 2025 · 11 comments

Comments

@Bloke
Member

Bloke commented Mar 5, 2025

As much as I despise the splurge of appalling so-called "AI" tools, is there any mileage in using the machine translation engines available in Crowdin for doing pre-translations that can be checked and tweaked by human translators? Or is there too much scope for the machine tools getting it wrong and providing bad or potentially embarrassing out-of-context translations?

Just wondering if this would give us a leg-up to completing some of the pophelp packs, since it may seem a daunting task to create from scratch or from a sparsely-populated pack, but might seem less daunting if it just needs a sweep to check for accuracy and then tweak. Assuming of course, the tools are freely available. If there's cost involved, it's probably not worth it.

To be clear, I would only (potentially) advocate this usage on strings where no translation has been provided yet. No way would I want a machine tinkering with what we've already translated.

The setup wizard allows the tool(s) to "learn" context from other strings in the project at varying levels of granularity, but I'm not sure how accurate it would be.

I'm also not sure about the efficacy of using the tool(s) for individual textpack strings. The context there might be more difficult for a machine to ascertain.

Thoughts?

@philwareham
Member

The Crowdin AI tools are a bit hit-or-miss. For some languages they work great, for others not so much (it seems to depend on how many users have input data in that language across the whole system). I guess you could use something like ChatGPT for better translations, maybe?

@Bloke
Member Author

Bloke commented Mar 5, 2025

Okay, good to know, thanks.

I'm not fussed about using something as invasive as ChatGPT. If it handled more languages, I'd nearly always fall back on DeepL because it seems to make a better job of translations (from rudimentary testing) than Google Translate, etc.

Just wondered if the built-in modules offered inside Crowdin were worth pursuing. If they're not much cop, happy to leave it.

@philwareham
Member

philwareham commented Mar 5, 2025

Oh, I see there is an AI section now (in addition to the ML they had before, which I based my comment on). Let me investigate, will get back to you shortly.

@philwareham
Member

OK, so there are a few providers, any preference? This also depends on what the cost is for each service.

Image

@Bloke
Member Author

Bloke commented Mar 5, 2025

> OK, so there are a few providers, any preference?

The cheapest/free-est ;)

I don't have any preference whatsoever. I have practically zero experience with any of them.

@philwareham
Member

Hmmm OK, I will add $20 to OpenAI and see what it spits out (not saying I'll use it past that initial test). I can get the money back from our Open Collective funds.

@Bloke
Member Author

Bloke commented Mar 5, 2025

Nice one. Gotta be worth a punt.

@philwareham
Member

OK, I used the OpenAI module through Crowdin to complete the Spanish pophelp, which was about 60% completed previously (see this commit for reference: 5178f44). That cost around £2.50 in total.

So it's up to you. The translations look good, based on my limited knowledge of Spanish and on checking that the HTML and similar markup are properly formed. We can apply this to some other languages ad hoc, or splurge on getting all the languages to a level of completeness.
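For anyone who wants to repeat the "is the HTML properly formed" check without reading every string by eye, here is a minimal sketch of one way to do it. It is not what was actually run here: `tag_sequence` and `same_markup` are hypothetical helper names, the sample strings are made up, and it only uses Python's standard `html.parser`. The idea is simply that a translated string should contain the same opening and closing tags, in the same order, as its English source.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the sequence of opening and closing tags in a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("open", tag))

    def handle_endtag(self, tag):
        self.tags.append(("close", tag))


def tag_sequence(fragment):
    """Return the ordered list of (kind, tag) pairs found in an HTML fragment."""
    collector = TagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.tags


def same_markup(source, translation):
    """True when both strings contain the same tags in the same order."""
    return tag_sequence(source) == tag_sequence(translation)


# Hypothetical sample strings, not taken from the real pophelp packs.
source = "<p>Save the <strong>article</strong>.</p>"
good = "<p>Guarda el <strong>artículo</strong>.</p>"
bad = "<p>Guarda el <strong>artículo.</p>"

print(same_markup(source, good))
print(same_markup(source, bad))
```

A mismatch is only a flag for human review, not proof of an error: a language with different word order can legitimately reorder inline tags.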

@Bloke
Member Author

Bloke commented Mar 5, 2025

I know almost no Spanish but it seems reasonable as a starting point. Nice one.

Ad-hoc is fine by me. Maybe if we can enlist someone who speaks Spanish to check if the translations are accurate enough, it might give us confidence to proceed with others.

@philwareham
Member

philwareham commented Mar 5, 2025

I've done Spanish, German and French. That has used up the budget I put in now.

They have been put into the dev repo of Textpattern, so the feedback you get from beta 2 will determine whether we invest more funds in translations for other languages.

@Bloke
Member Author

Bloke commented Mar 5, 2025

Sweet, thank you. That's a good call about using beta 2 as a casting call for people to verify the machine output... which was trained by people(!)

Top work, thank you for jumping on this.
