January 19, 2021, by Jason Cronquist

Text Adventure Map From Fantasy Map Using AWS Rekognition

I recently had a renewed interest in working on my text adventure engine. This is by no means a completed project, but it does run well enough to build worlds on it. The idea behind this engine is to experiment with machine learning in a text-adventure environment. My end vision is an adventure game with the flexibility of AI Dungeon but the structure and open worlds of Gemstone.

I would also like the generated worlds to be based on the DnD world, as a way to help train new Dungeon Masters on the details of the Faerun universe. Recreating Faerun in a text adventure, however, would be time consuming, and I'm already spread too thin over various projects.

After a hackathon at work where we experimented with the various AWS tools we learned about at 2020's AWS re:Invent, I had the idea to use AWS Rekognition to build traversable text adventure maps from a labeled fantasy map. In theory, the world map could be populated from a labeled fantasy map with its various towns, villages, and regions. Flavor text could then be added to each area in the map from the DnD wiki.

I started with a small subsection of the Faerun Map zoomed in on the Ten Towns region.

AWS Rekognition

Getting text labels from an image was by far the easiest part. I just uploaded the image to S3, then passed the S3 key to Rekognition, and it returned a list of annotations for the labels it could detect, along with their bounding boxes.

Rekognition returns two types of text detections, WORD and LINE. In this case I was interested in the lines, as many locations are a combination of several words. Examples from the map image I used are 'Sea of Moving Ice' and 'Icewind Dale'.
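As a rough illustration of this step, the sketch below calls Rekognition's DetectText operation through boto3 and keeps only the LINE detections. The function names and the sample response are hypothetical and written for this example; the bucket and key are whatever you uploaded the map image to, and AWS credentials are assumed to be configured.

```python
from typing import Any, Dict, List


def detect_text_in_s3_image(bucket: str, key: str) -> Dict[str, Any]:
    """Run Rekognition DetectText on an image that is already in S3."""
    import boto3  # imported here so the filter below also works offline

    client = boto3.client("rekognition")
    return client.detect_text(Image={"S3Object": {"Bucket": bucket, "Name": key}})


def line_detections(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only LINE detections, dropping the per-word entries."""
    return [d for d in response.get("TextDetections", []) if d.get("Type") == "LINE"]


# Hand-written sample shaped like a DetectText response (illustrative only).
sample_response = {
    "TextDetections": [
        {"DetectedText": "Sea of Moving Ice", "Type": "LINE", "Id": 0},
        {"DetectedText": "Sea", "Type": "WORD", "Id": 1, "ParentId": 0},
        {"DetectedText": "Icewind Dale", "Type": "LINE", "Id": 2},
        {"DetectedText": "Icewind", "Type": "WORD", "Id": 3, "ParentId": 2},
    ]
}

for detection in line_detections(sample_response):
    print(detection["DetectedText"])
```

In a real response each detection also carries a Geometry entry with a BoundingBox and a Polygon, which is what the graph-building step works from.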

Issues With Rekognition

Immediately we can see several shortcomings with the results that will need to be accounted for. The easiest issue to take care of is inconsistency in capitalization. As seen in the Luskan label, it did a very poor job of distinguishing upper-case from lower-case letters. This issue is fixed by calling .capitalize() on each label.
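A small sketch of that normalization in Python follows. The helper name and the mis-cased inputs are made up for illustration, since the exact casing Rekognition returned is not reproduced here.

```python
def normalize_label(text: str) -> str:
    """Upper-case the first character and lower-case the rest."""
    return text.strip().capitalize()


# Hypothetical mis-cased labels, for illustration only.
for raw in ["lUSKAN", "ICEWIND DALE"]:
    print(normalize_label(raw))
```

Note that str.capitalize() only upper-cases the first character of the whole string, so a multi-word name comes back as "Icewind dale". str.title() is one alternative, though it would also capitalize the "of" in 'Sea of Moving Ice'.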

Another issue that showed itself is that the LINE objects will sometimes join words that are distinct from one another. Ironmaster and Ten-Towns are not the same place, but AWS Rekognition identified them as such. One idea I had to address this issue is to zoom in on the identified text and run Rekognition again to see if it can split the words up better. I haven't tested this at the time of writing.

Issue connecting disconnected points: 'Ironmaster Ten Towns'

A related issue is that multiline names are not considered connected. Here Icewind Dale is shown as two separate labels. Both of these issues will require some sort of location-based processing to determine whether stacked words might belong to the same location on the map.
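One possible shape for that location-based processing, covering only the stacked multiline case, is sketched below. It is an untested idea rather than something used in this project: the Box class, the is_stacked helper, the max_gap_ratio threshold, and the coordinates are all assumptions made for the example. Boxes use the same left/top/width/height ratios as a Rekognition BoundingBox.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    left: float
    top: float
    width: float
    height: float


def is_stacked(upper: Box, lower: Box, max_gap_ratio: float = 0.5) -> bool:
    """True if `lower` sits just beneath `upper` and overlaps it horizontally."""
    overlap = min(upper.left + upper.width, lower.left + lower.width) - max(
        upper.left, lower.left
    )
    gap = lower.top - (upper.top + upper.height)
    return (
        overlap > 0
        and lower.top > upper.top
        and gap <= max_gap_ratio * upper.height
    )


# Invented coordinates, for illustration only.
icewind = Box("Icewind", left=0.40, top=0.30, width=0.20, height=0.05)
dale = Box("Dale", left=0.45, top=0.36, width=0.10, height=0.05)
luskan = Box("Luskan", left=0.10, top=0.80, width=0.10, height=0.05)

if is_stacked(icewind, dale):
    print(f"{icewind.text} {dale.text}")
```

The threshold would likely need tuning per map, since label spacing depends on the font size the cartographer used.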

Spelling mistakes like the extra trailing 's' in Fireshear were expected. This could be remedied with a list of expected words/names and a string-distance algorithm (such as edit distance) to automatically correct spelling mistakes.

Related to the above, some words were not spaced correctly. The proposed fix for spelling errors would also handle spacing issues.
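A minimal sketch of that correction idea is below. It uses the similarity ratio from Python's difflib as a stand-in for a word-distance algorithm, and compares names with case and spaces stripped so that spacing errors match too. The correct_label helper, the 0.8 cutoff, and the contents of KNOWN_NAMES are assumptions for illustration.

```python
import difflib
from typing import List

# Expected names; in practice this list might come from the DnD wiki.
KNOWN_NAMES = [
    "Fireshear",
    "Luskan",
    "Icewind Dale",
    "Sea of Moving Ice",
    "Ironmaster",
    "Ten-Towns",
]


def _key(name: str) -> str:
    """Lower-case and strip whitespace so spacing and case do not matter."""
    return "".join(name.lower().split())


def correct_label(label: str, known_names: List[str] = KNOWN_NAMES,
                  cutoff: float = 0.8) -> str:
    """Return the closest known name, or the label unchanged if none is close."""
    by_key = {_key(name): name for name in known_names}
    matches = difflib.get_close_matches(_key(label), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else label


print(correct_label("Fireshears"))     # Fireshear
print(correct_label("Ice wind Dale"))  # Icewind Dale
```

Labels with no close match are returned unchanged, so an unexpected but correct name is not silently rewritten.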

I've only briefly looked into building my own image text extraction model, but could tell it would be time consuming and difficult. Considering the scope of my project, I decided to accept the issues above and move on to building the traversable graph from the Rekognition annotations.

Building the Graph

After running Rekognition and filtering for LINE type annotations, we are left with a series of labels and the four points that bound the text within the image. The first thing to do is translate the four points of the bounding box into a single point. I accomplished this by taking an average of all four points.
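The averaging step can be sketched as follows, assuming the four points arrive in the shape of Rekognition's Geometry.Polygon field (a list of X/Y ratio pairs). The centroid function name and the sample polygon are made up for the example.

```python
from typing import Dict, List, Tuple


def centroid(polygon: List[Dict[str, float]]) -> Tuple[float, float]:
    """Average the polygon's corner points into a single (x, y) point."""
    xs = [p["X"] for p in polygon]
    ys = [p["Y"] for p in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Invented bounding polygon, for illustration only.
polygon = [
    {"X": 0.25, "Y": 0.5},
    {"X": 0.75, "Y": 0.5},
    {"X": 0.75, "Y": 0.75},
    {"X": 0.25, "Y": 0.75},
]
print(centroid(polygon))  # (0.5, 0.625)
```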

To connect the graph, I took the simple solution of creating an edge from every point to every other point. I was able to reduce the number of edges by only creating an edge in one direction, i.e. an edge would exist from A to B, but not from B to A. I also checked whether any edge leaving a node was within five degrees of another edge leaving the same node, and removed all but the shortest of them.

The idea here is that if you were to travel from A to C along a route that passes through B, you build a path between A and B, then B and C, rather than create a new path from A to C. This isn't always true of course. In particular, things get strange when dealing with port cities and river crossings, but for my use case this was a decent assumption to make.
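One way to read the edge-building and five-degree pruning described above is sketched below. It is an interpretation, not the original implementation: it assumes each stored edge can be walked from either endpoint, so the pruning is applied at every node. The build_edges name, the min_angle_deg parameter, and the sample points are hypothetical.

```python
import math
from itertools import combinations


def angle_between(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def build_edges(points, min_angle_deg=5.0):
    """points maps name -> (x, y). Returns undirected edges as sorted name pairs."""
    # One edge per unordered pair: A-B is stored once, never also as B-A.
    edges = {tuple(sorted(pair)) for pair in combinations(points, 2)}

    def length(a, b):
        return math.hypot(points[b][0] - points[a][0], points[b][1] - points[a][1])

    def bearing(a, b):
        return math.degrees(
            math.atan2(points[b][1] - points[a][1], points[b][0] - points[a][0])
        )

    for node in points:
        # Remaining neighbours of this node, nearest first.
        neighbours = sorted(
            (b if a == node else a for a, b in edges if node in (a, b)),
            key=lambda n: length(node, n),
        )
        kept_bearings = []
        for other in neighbours:
            angle = bearing(node, other)
            if any(angle_between(angle, k) < min_angle_deg for k in kept_bearings):
                # A shorter edge already leaves in (almost) this direction.
                edges.discard(tuple(sorted((node, other))))
            else:
                kept_bearings.append(angle)
    return edges


# A, B and C lie on one line, so the long A-C edge should be dropped.
points = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 2)}
print(sorted(build_edges(points)))
```

Visiting neighbours nearest first means the shortest edge in any five-degree cone is always the one that survives.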

After connecting all the points, I ended up with a fully connected graph.

This would make for a very poor experience in a text-based adventure, however, as the number of potential travel destinations would become unmanageable on a large map. I reduced the number of edges to a minimally connected graph by checking every pair of intersecting lines and keeping only the shorter line of each pair.
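A sketch of that intersection pruning is below, read literally: for every pair of edges that cross, the longer one is marked for removal. The crossing test is the standard cross-product orientation check. Edges that merely share an endpoint are not treated as crossing, and ties in length are broken arbitrarily. The function names and the sample points are assumptions for illustration.

```python
import math
from itertools import combinations


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 cross at a point interior to both."""
    d1 = cross(p3, p4, p1)
    d2 = cross(p3, p4, p2)
    d3 = cross(p1, p2, p3)
    d4 = cross(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def remove_crossings(points, edges):
    """Drop the longer edge of every crossing pair."""

    def length(edge):
        (ax, ay), (bx, by) = points[edge[0]], points[edge[1]]
        return math.hypot(bx - ax, by - ay)

    to_drop = set()
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):
            continue  # edges that share a node meet there but do not cross
        if segments_cross(points[e1[0]], points[e1[1]], points[e2[0]], points[e2[1]]):
            to_drop.add(max(e1, e2, key=length))
    return set(edges) - to_drop


# Four corners of a quadrilateral; the two diagonals cross and A-C is longer.
points = {"A": (0, 0), "B": (4, 0), "C": (5, 3), "D": (0, 3)}
edges = {tuple(sorted(pair)) for pair in combinations(points, 2)}
print(sorted(remove_crossings(points, edges)))
```

Comparing every pair of edges is quadratic in the number of edges, which should be fine for a single map but may need a spatial index for very large ones.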

After removing the intersecting lines, I am left with the connected graph below. This looks perfect for my use case.

Next Steps

From here I'll need to translate the connected graph into a traversable text adventure map. This should be straightforward enough. I'll be using Inform 7 syntax, or a similar interactive fiction language, which means I can use an off-the-shelf engine to run the generated game. After that, I'll have to address the issues mentioned above before I can build out the auto-generated flavor text.
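As a very rough idea of what that translation could look like, the sketch below turns a graph into Inform 7 style room declarations, snapping each edge to the nearest of eight compass directions. It is speculative: the helper names and coordinates are made up, the output has not been run through the Inform 7 compiler, and it does not handle two exits from one room landing on the same compass direction.

```python
import math

COMPASS = ["east", "northeast", "north", "northwest",
           "west", "southwest", "south", "southeast"]


def direction(src, dst):
    """Compass direction from src to dst; image y grows downward, so flip it."""
    dx = dst[0] - src[0]
    dy = src[1] - dst[1]
    angle = math.degrees(math.atan2(dy, dx))
    return COMPASS[round(angle / 45) % 8]


def to_inform7(points, edges):
    """Emit one room sentence per node and one map sentence per edge."""
    lines = [f"{name} is a room." for name in sorted(points)]
    for a, b in sorted(edges):
        lines.append(f"{b} is {direction(points[a], points[b])} of {a}.")
    return "\n".join(lines)


# Invented image coordinates, for illustration only.
points = {"Luskan": (0.5, 0.8), "Fireshear": (0.5, 0.2)}
edges = {("Fireshear", "Luskan")}
print(to_inform7(points, edges))
```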

There's still much to explore and learn here, and I'll definitely be revisiting this topic in the future!