Guesswork flags mistakes


“The real genius of AI isn’t in thinking like us, but in showing us new ways to think.”

– Yann LeCun, chief scientist of Meta AI

IN primary school, I learnt that the Malaysian flag was not open to interpretation.

“The crescent isn’t a banana,” my Art teacher said, gently peeling my drawing apart with her eyes.

“And you’re two stripes short of a federation.”

I remember laughing, then quietly wondering what kind of country needed so many stripes anyway.

So how hard is it to draw the Malaysian flag?

Easy enough to ask a computer to do it for you, but hard enough that it’ll probably get the stripes, the star and the moon on the design wrong.

I’m referring, of course, to not one but two recent débâcles: first, a national newspaper ran a front-page image of the Malaysian flag that was missing the crescent moon.

Then the Education Ministry distributed an SPM examination analysis report with a flag that had too many stars and too few stripes.

Now before I get into it much further, let’s admit that all of this could have been avoided if the humans in charge had paid a little more attention.

But perhaps we are beginning to trust artificial intelligence (AI) just a little too much.

It is computing (maths), but not as we know it.

Earlier this month, I addressed a roomful of eager young quants about the future role of AI in science and mathematics. I reminded them that AI is fundamentally a “guessing machine”.

We’re used to computers giving the right answer, every single time.

But AI doesn’t do precision.

It doesn’t always get it right.

It doesn’t even always give you the same answer, just something that vaguely resembles what it’s seen before.

For the AI machine, the Malaysian flag isn’t a precise star and crescent adorned with 14 red and white stripes.

It’s a yellow blob-ish star thing on a blue background, with some colourful lines thrown in somewhere.

This “best guess” strategy makes AI wonderfully flexible for tasks we used to think computers couldn’t handle, like generating a photo of something vaguely described, but also dumb at some things humans find easy.

But here’s my suggestion: Instead of getting more humans to double-check AI’s clever outputs, maybe we should use more computers – specifically, old-school computers that do what we ask them to and don’t guess at anything.

That’s what I tried three years ago, running a quirky mathematical model on a handful of battered machines salvaged from Wisma Saberkas.

When the delivery guys showed up with the hardware, Cheng, the retiree next door, wandered over and squinted.

“Crypto mining?” he guessed.

“No,” I said. “Something dumber.”

And yet, what began as a throwaway experiment opened up a novel and unexpectedly rewarding way of thinking about algorithmic trading and high-frequency trading (HFT), pursued with near-total freedom from regulatory constraints.

Expensive too, if you go by what SESCO’s been charging me.

I know what some of you are thinking: using computers got us into this mess, so why would using more get us out of it?

To try to explain this, let me step away from art to mathematics.

Back in 1976, two mathematicians proved something called the ‘Four Colour Theorem’.

It says that any map drawn on a flat page can be coloured with just four colours such that no two neighbouring countries share the same one.

While it’s easy to understand and demonstrate with a box of crayons, it’s very hard to prove.

(This, by the way, is the difference between solving maths problems and proving theorems. Solving problems means getting answers to sums. Proving theorems means constructing airtight arguments that work for any map, anywhere, ever. It’s also why a maths degree often involves very few numbers and a lot more phrases like “But it’s obvious, isn’t it?”)
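If you want to feel the difference, here's a little sketch in Python (my own hypothetical toy, nothing to do with the 1976 work) that "solves" the colouring problem for one imaginary map of four regions by trying colours until nothing clashes.

from itertools import product
COLOURS = ["red", "blue", "yellow", "white"]
regions = ["A", "B", "C", "D"]
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # who touches whom
def four_colour(regions, borders):
    # brute force: try every assignment until no two neighbours clash
    for combo in product(COLOURS, repeat=len(regions)):
        colouring = dict(zip(regions, combo))
        if all(colouring[a] != colouring[b] for a, b in borders):
            return colouring
print(four_colour(regions, borders))  # prints one clash-free colouring

That demonstrates the theorem for one map. A proof has to work for every map anyone could ever draw, which is where the real sweat lies.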

What made the 1976 proof of the ‘Four Colour Theorem’ so contentious was that it relied heavily on thousands of hours of computer work that no human could realistically verify.

Was the proof valid if no human in the world could check it?

Conceivably, they could have asked thousands of other mathematicians to go over various parts of the work done by the computer.

But maths traditionally resists large groups of people, if only for the reason that mathematicians don’t trust others to do the work properly (or as they say, “They’re not mathematically rigorous enough”).

This same obsession with methodical discipline is what makes being a quant in investment banking so unforgiving – most leave within three years, worn down by the constant pressure.

Then, in 2005, another pair of mathematicians used a programme called Coq (yes, unfortunate name, please don’t ask me why) to verify that the original 1976 work was correct.

Coq is a proof assistant, which is a computer programme that checks the logic of a proof step by step.

This may seem counterintuitive.

They used a computer to confirm that a computer-assisted proof from three decades ago was valid.

But mathematicians have slowly embraced computer-proof assistants over the years.

They are built around a small, trustworthy “kernel”, a tiny piece of code that performs the actual logic checking.

If the kernel is verified, then we can trust the results it produces.

It’s like having an employee who is so reliable that if they say the blueprint is flawless, you believe them.

Most of these kernels are just a few hundred to a few thousand lines of code, which is small enough for human experts to inspect thoroughly in a variety of ways.

To put it in perspective, that’s the size of a short story compared to an entire library.

A simple mobile app like a calculator might have 10,000 to 20,000 lines of code.

A full operating system (OS) like Windows runs to tens of millions of lines of code.

In contrast, modern AI systems are built on machine learning models that behave like mysterious black boxes even their creators don’t fully understand.

Who knows how an AI decides what a flag is supposed to look like?

Now, the hardest part of using a proof assistant is “formalising” the original proof.

This is the laborious process of translating a human-readable proof into a precise format the computer can understand.

Mathematicians love to say “It’s obvious that…”, which computers hate.

Computers need everything spelt out in excruciating detail, and formalising a proof can take anything from a few weeks to several years. If you input it wrong, it just doesn’t work.

The maths don’t maths.
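To give a feel for what formalising looks like, here's a toy illustration written for the Lean proof assistant (my own hypothetical example; not Coq, and certainly not the four-colour proof): a fact any schoolchild would call obvious, spelt out the way the machine insists on.

-- "Obviously, a + b is the same as b + a."
-- The computer won't accept "obviously"; you must name the exact
-- rule (Nat.add_comm) that justifies the step.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
-- Hand it a sloppier justification and the kernel simply rejects it.

Multiply that fussiness by the tens of thousands of steps in a serious proof and you can see where the weeks and years go.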

So I suggest that we may soon be able to employ “beginner” mathematicians who aren’t particularly strong at maths – because the proof assistant will vet their input and reject it if it’s not correct.

And my point is that we can combine this with AI.

Let the AI guess how to formalise a proof, and let the proof assistant tell it if it got it wrong.

You get the power of creativity with the safety net of rigour.
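Here's a rough, hypothetical sketch of that division of labour, back in Python and reusing my imaginary four-region map: a "guessing machine" proposes colourings at random, and a strict old-school checker gets the final say.

import random
COLOURS = ["red", "blue", "yellow", "white"]
regions = ["A", "B", "C", "D"]
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
def guessing_machine():
    # stand-in for the AI: fast, creative, sometimes confidently wrong
    return {r: random.choice(COLOURS) for r in regions}
def rigorous_checker(colouring):
    # stand-in for the proof assistant: no imagination, just yes or no
    return all(colouring[a] != colouring[b] for a, b in borders)
attempt = guessing_machine()
while not rigorous_checker(attempt):  # reject wrong guesses until one survives
    attempt = guessing_machine()
print(attempt)  # whatever comes out has been checked, not merely guessed

The guessing half can be as unreliable as it likes, because nothing reaches the output without getting past the boring half.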

That kind of rigour is exactly what’s missing as we stumble clumsily towards embracing AI tools in the workplace.

We already accept spell-checkers, and those weren’t built with AI.

So let’s build systems to flag potential problems in AI-generated output.

For instance, imagine an editor sees a giant blinking red box around a photo marked “AI-generated”, warning that it might not be accurate.

Or a block of text that’s flagged because it closely matches something else online, highlighting the risk of plagiarism.

As usual, it’s not the tools that are dangerous or bad; it’s how you use them.

It’s okay to wave the flag and rally users to the wonderful new future that AI brings.

But remember that computers sometimes work better with humans, rather than instead of them.

The views expressed here are those of the columnist and do not necessarily represent the views of Sarawak Tribune. The writer can be reached at med.akilis@gmail.com
