Opinion

Last year, I wrote a piece here on El Reg about being murdered by ChatGPT, as an illustration of the potential harms arising from the misuse of large language models and other forms of AI.

Since then, I have spoken at events across the globe on the ethical development and use of artificial intelligence – while still waiting for OpenAI to respond to my legal demands in relation to what I've alleged is the unlawful processing of my personal data in the training of its GPT models.

In my earlier article, and in my cease-and-desist letter to OpenAI, I stated that such models should be deleted.

Essentially, global technology corporations have decided, rightly or wrongly, that the law can be ignored in their pursuit of wealth and power.

Household names and startups have been, and still are, scraping the internet and media to train their models, typically without paying for it and while arguing they are doing nothing wrong. Unsurprisingly, a number of them have been fined or are settling out of court after being accused of breaking rules covering not just copyright but also online safety, privacy, and data protection. Big Tech has brought private litigation and watchdog scrutiny upon itself, and has potentially prompted new laws to fill any regulatory gaps.

But for them, it's just a cost of doing business.

Another way forward

There's a principle in the legal world, in America at least, known as the "fruit of the poisonous tree." Simply put, evidence that was illegally obtained is inadmissible, and cannot be used to the advantage of those who obtained it. A similar line of thinking could apply to AI systems: illegally built LLMs perhaps ought to be deleted.

Machine-learning companies are harvesting fruit from their poisonous trees, gorging themselves on those fruits, getting fat from them, and using their seeds to plant yet more poisonous trees.

However, after careful consideration in the time since my previous piece here on El Reg, I have come to a different opinion with regard to the deletion of these fruits. Not because I believe I was wrong, but because of the moral and ethical considerations raised by the potential environmental impact.

Research by RISE, a Swedish state-owned research institute, states that OpenAI's GPT-4 has 1.7 trillion parameters and was trained on 13 trillion tokens, using 25,000 Nvidia A100 GPUs at a cost of $100 million. Training took 100 days and consumed a whopping 50 GWh of energy. That is a lot of energy; it's roughly the equivalent power use of 4,500 homes over the same period.

From a carbon emissions perspective, RISE says that such training, if carried out in northern Sweden's more environmentally friendly datacenters, is the equivalent of driving an average combustion-engine car around the Earth 300 times; if carried out elsewhere, such as in Germany, that impact increases 30-fold. And that's just one LLM version.

In light of this information, I am forced to weigh the environmental impact of deleting such models under the "fruit of the poisonous tree" doctrine, and, in my view, the two cannot be reconciled: the environmental cost is too significant.

So what can we do to ensure that those who scrape the Web for commercial gain (in this case, to train AI models) do not profit, or gain an economic advantage, from such controversial activities? And furthermore, if disgorgement through deletion is not viable for the reasons given above, how can we incentivize companies to treat people's privacy and creative work with respect, and to comply with the law, when developing products and services?

After all, if there is no meaningful consequence, we will continue to see this behavior repeated ad infinitum, which simply maintains the status quo and makes a mockery of the rule of law. As noted, today's monetary penalties are merely line items for these companies, some of which have more wealth than entire nations, and as such they are ineffectual as a deterrent.

Get their attention

It seems to me that the only obvious solution here is to remove these models from the control of executives and put them into the public domain. Given they were trained on our data, it makes sense for them to be a public commons. That way, we all benefit from the processing of our data, and the companies, particularly those found to have broken the law, see no benefit. The balance is restored, and we have a meaningful deterrent against those who seek to ignore their obligations to society.

Under this solution, OpenAI, if found to have broken the law, would be forced to put its GPT models in the public domain and could even be banned from selling any services related to those models. This would impose a significant cost on OpenAI and its backers, which have spent billions of dollars developing these models and associated services. They would face a much higher risk of being unable to recover these costs through revenue, which in turn would force them to do more due diligence with regard to their legal obligations.

If we then extend this model to online platforms that sell their users’ data to companies such as OpenAI – where they are banned from providing such access with the threat of disgorgement – they would a
