Word2Color

Word2Color is a Kaggle notebook experiment (built on the back of this amazing course) that takes a text and extracts colours from it. The extracted colours can be neatly visualised, and the user can learn more about what these colours mean in a particular book, in different cultures, or from a colour-theory point of view.

🛠️ Hey Kaggle x Google folks - I know the capstone asks for “code snippets in the blog.” But I didn’t want to paste function blocks out of context - the notebook has all of it, with clarifications. This blog is about the personal why and some of the what if.

I built this experiment because I like to read. Then, sometimes, I like to watch YouTube videos about hidden messages in the book I’ve just read. And then I use Google Scholar to find papers that go extremely deep on particular topics in, or related to, the book.

Try it - it’s great fun! Maybe you already know that Dorothy’s shoes in the book ‘The Wonderful Wizard of Oz‘ are actually silver, not red as in the movie, but did you know that ‘The Wizard of Oz‘ is a legitimate experimentation technique in many scientific fields? Now you do.

Eternal texts for now & the future

What if in 2030 I could have all these explorations in one place, effortlessly? See what ‘viridian‘ or ‘tyrian‘ actually look like? Learn why there is so much blue in ‘To the Lighthouse‘? Explore why ‘The Wonderful Wizard of Oz‘ is inherently political?

As it’s 2025, and I only have so much time, for my first experiment I’ve selected colours and their symbolism as the playground for me & friendly GenAI. I envisioned a solution that ‘magically‘ extracts colours from a text, visualises them, and can then hold a bit of a conversation outlining what these colours mean.

The experiment is built as a multi-step GenAI pipeline, powered by the Google Gemini family of models, and includes several stages, visualised in the illustrations below:

  • The extraction agent starts with text chunking (the text is fed to the model in chunks of a few dozen sentences, to fit context and token limits) and uses few-shot prompting (meaning I provide the model with examples) to extract colours as structured JSON (like a set of revision cards). Very basic re-prompting is used when the model doesn’t provide valid output. A minimal sketch follows this list.

  • An evaluation agent is then employed to grade literal vs. figurative use

  • An embeddings model enables semantic colour matching when an exact match is not available

  • A function-calling agent is employed for analysis, visualisation and grounded queries over documents (retrieval-augmented generation, or RAG)
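To make the first stage concrete, here is a minimal sketch of chunked, few-shot extraction with basic re-prompting, assuming the google-generativeai Python SDK. The model name, prompt and helper names are my illustration, not the notebook’s exact code:

```python
# A minimal sketch of the extraction stage; names and prompt are illustrative.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # a "-lite"-class model is the goal

FEW_SHOT = (
    'Extract colour words as a JSON list of {"word": ..., "sentence": ...}.\n'
    'Example: "The sky turned violet." -> '
    '[{"word": "violet", "sentence": "The sky turned violet."}]\n'
)

def chunks(sentences, size=30):
    # Feed the text in chunks of a few dozen sentences (context/token limits).
    for i in range(0, len(sentences), size):
        yield " ".join(sentences[i:i + size])

def extract_colours(text_chunk, retries=2):
    prompt = f"{FEW_SHOT}\nText:\n{text_chunk}\nJSON:"
    for _ in range(retries + 1):  # very basic re-prompting on invalid output
        raw = model.generate_content(prompt).text.strip()
        if raw.startswith("```"):  # strip a markdown fence if the model adds one
            raw = raw.strip("`").removeprefix("json").strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue
    return []
```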

Divide & Conquer

Have you ever heard that multitasking is bad for you? Apparently, it’s bad for AI too. I initially started with a single prompt asking the model to give me the list of colours, with “DO’s“ and “DON’Ts“, yet even on smaller texts the model either became hyper-vigilant, giving me all the adjectives (high, left, dirty, etc.), or meticulously documented the lack thereof (“none“, as if to say “I looked here and there is no colour attached to it“).

The newest models did a much better job, but using them required $$$ and also didn’t sit right with my understanding of the task’s complexity, so I set out to find a way to make the “-lite“ family of Google models do their best for my case.

The solution was to employ two models: one dead set on identifying anything that looked like a colour, and a second grading the results of the first, filtering out idioms (“out of the blue“), product names (“Red Panda restaurants“) and personal names (“Mr. Green“). The evaluation model is given not only the extracted word but also the exact sentence, so it can make its decision in context.
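A sketch of what that grading call could look like, reusing the model from the extraction sketch above; the verdict labels and prompt wording are my own illustration, not the notebook’s:

```python
# A sketch of the grading step; verdict labels and prompt are illustrative.
def is_literal_colour(word, sentence):
    prompt = (
        "Does this sentence use the word as a literal colour, rather than an "
        "idiom, a product name or a person's name?\n"
        f'Word: "{word}"\nSentence: "{sentence}"\n'
        "Answer with exactly one token: LITERAL or NOT_LITERAL."
    )
    verdict = model.generate_content(prompt).text.strip().upper()
    return verdict.startswith("LITERAL")

# is_literal_colour("blue", "The idea came out of the blue.") -> expected False
```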

”Where do I know you from?”, or semantic search

After a long list of colours is extracted, the system attempts to match each word to a hex value. Exact matching happens when the author’s colour name is identical to a name in the database: the author’s “red“ = the database’s “red“ = #FF0000 in hex notation.

Semantic matching happens when there is no exact match and we need to involve a specialised AI (an embeddings model). The results here are influenced by the training data that model had (the most basic consideration: does the model support the languages we want to search in?). Semantic search works because this specialised model has seen a lot of text and “remembers” (don’t come at me for oversimplifying all things transformers; it’s an artist blog, I’m a data professional in other parts of the Internet) that some things are actually “the same“ or “connected“, given enough content.

So, in the case of Google’s text-embedding-004, for “tyrian“ the model suggests that semantically (meaning the model saw it in similar contexts) this word is a lot like “purpley“, which is #66023b, sourced from the XKCD Color Survey. These words being “a lot like“ each other is an important distinction - we can only be sure when the match is exact, yet semantic search is a good alternative.
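Put together, exact-then-semantic matching can be sketched like this; the tiny in-memory database and cosine-similarity ranking are assumptions standing in for the real colour database:

```python
# A sketch of exact-then-semantic matching; reuses `genai` from earlier.
import numpy as np

COLOUR_DB = {"red": "#FF0000", "purpley": "#66023b"}  # stand-in for the real database

def embed(text):
    return np.array(genai.embed_content(
        model="models/text-embedding-004", content=text)["embedding"])

def match_colour(word):
    if word in COLOUR_DB:          # exact match: name equals a database name
        return word, COLOUR_DB[word]
    query = embed(word)            # semantic match: nearest neighbour by cosine
    names = list(COLOUR_DB)
    vectors = np.stack([embed(n) for n in names])
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    best = names[int(sims.argmax())]
    return best, COLOUR_DB[best]

# match_colour("tyrian") -> ("purpley", "#66023b"), per the behaviour described above
```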

Determined on non-determinism

But wait—how “exact” is the answer, really?

Let’s say I ask: “What color is tyrian?”
Go to Google Images. Look at the results.
What do you see?

For me, it’s a perfect example of a non-deterministic outcome. Depending on who I ask—or even which link I click—the resulting hex value varies. Sometimes it's rich wine, other times a deep magenta. Even when I ask the same question, the answer isn't fixed.

There are multiple opinions on what “tyrian” is.
And that’s the point.

Out of the roughly 16.7 million possible colours in the RGB spectrum (256 values per channel, 256³ combinations), we - humanity - have only managed to collectively agree on maybe 140-ish standard names (the W3C CSS named colours, basically designed for web design). The rest? They live in the margins - uncertain, debatable, interpretive.

Just like human language.
Just like literature.
Just like GenAI.

This is what drives my fascination with generative models—as an artist, and as someone who builds systems. Colour matching becomes a metaphor for working with LLMs. When I ask, “What colour is tyrian?”, the model doesn’t give me a universal truth. It gives me a spectrum of meaning—deep purple, red-violet, something between ink and wine.

And evaluating that response? That’s not the hard part. In the case of colour, different embedding models, different datasets, different cultural references—each one can offer a valid, if incomplete, slice of understanding.

The hard part is agreeing—with fellow humans—on which parts of that nuance we want to share, standardise, or elevate. Like the 140 colours we’ve named, out of 16 million possibilities.

It’s not much. But it’s enough to build with. Enough to design websites, write code, and share creative work without debating what “salmon” or “aqua” means every time.

Agreement, even if limited, creates shared language. And shared language lets us move forward.

And that? That’s worth building for.

Shortcuts & Limitations

OK, now back to reality! As I targeted this “enhanced reading” vision - taking a text, extracting colours, visualising them, analysing them and having a bit of a meta-conversation (“What is the symbolic meaning of the most popular colour in ‘The Wonderful Wizard of Oz‘?“) - some of the interesting tech was de-scoped:

  • Multiple extractions = more coverage. Running the model several times could capture ~5–10% more unique colours just by concatenating and deduplicating results (see the sketch after this list).

  • Evaluation context is short. Currently, one sentence is used to grade extractions; more context would mean better nuance when evaluating.

  • No multilingual support (yet). Everything is English-based, which limits the cultural and historical nuance I can extract and also limits the texts I can analyse. While some early checks suggest the evaluation and extraction models could easily stay the same, I need to explore embeddings models that support multiple languages.

  • Wikipedia/Google search. It would've helped ground more conversations, but didn’t make it into this version (mainly because I got distracted by a related toy - scroll to the bottom of the post).

  • Context caching. I expect the full text loaded into cache to increase model accuracy, with or without additional runs. It would also be interesting to explore how evaluation could differ with the full context available to the model, cached.
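For the first item on that list, a hedged sketch of what merging several passes could look like, building on the hypothetical extract_colours helper from the extraction sketch:

```python
# A de-scoped idea, sketched: merge several extraction passes and deduplicate.
def multi_pass_extract(text_chunk, passes=3):
    seen, merged = set(), []
    for _ in range(passes):
        for item in extract_colours(text_chunk):  # from the extraction sketch
            key = (item["word"].lower(), item["sentence"])
            if key not in seen:                   # keep only unique (word, sentence) pairs
                seen.add(key)
                merged.append(item)
    return merged
```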

Why It Matters

The essence of books may remain black and white, but our understanding of them doesn’t have to. Word2Color explores how a single layer of deeper book understanding (colours) can be extracted and then used not only to see the story in new colours, but to enable deeper understanding of it through historical, social, symbolic and cultural perspectives.

In 2030, I imagine readers tracing the emotional colour arcs of characters, seeing theme shifts on a visual timeline, toggling between literary symbolism and historical meaning, and having AI co-readers explain metaphor, context, and contradiction.

This notebook is just one tiny step in that direction.

Bonus Toy: Textint.online

The core colour-matching logic has also been released as a toy: type any word, get a colour. While my Kaggle notebook goes to the trouble of searching for colours and only then matching them with a hue (and then having educated conversations about those colours), I had an urge to see what would happen if any word could be matched to a colour. The answer: it’s a bit weird, but beautiful too!

Want to see what melancholy, triumph, or bureaucracy look like? Go play.
