Timeline of Sikh Affairs & Experiments with AI
- I wrote an AI-assisted Python program to take my Web-based timeline of Sikh affairs and convert it into a 169-slide PDF for easy viewing.
- I started this project to give people a more user-friendly way to read the timeline, which I built years ago from many hours of research and analysis spread over a number of years.
- My goal was to improve the formatting and add images for a more informative viewing experience.
- In other words, the project is about taking the content you already have and breathing new life into it.
- The programming effort took a couple of half days and some 500-odd lines of Python code.
- I used AI (Claude, ChatGPT, Perplexity) to suggest approaches and code, but I didn't integrate it directly into my editor. I use Sublime for editing and selectively cut and paste recommended code as needed. The alternative is too intrusive: the AI has free rein to change your code, and you eventually lose track of and control over your program and no longer fully understand how it works.
- The high-level program steps are:
  - Take the HTML file and use regex to convert the file to a CSV for ingestion into the Python program
    - Remove everything except the date (in a specific and consistent format, e.g. YYYY-MM-DD) and the event text
    - Escape any quotes inside the event text
    - Remove any hyperlinks inside the event text
    - Replace any other HTML tags inside the event text (e.g. bold, italics)
  - Ingest the file into Python and read it row by row
  - Use some sort of text summarization and keyword extraction to manufacture a title for each event
  - Use the keywords extracted above to fetch an appropriate image from the Wikimedia Commons API based on image formats supported by PowerPoint, preferred size, licensing requirements, image quality, etc.
  - Create a PowerPoint and place the date, title, event text, and image on the slide
  - Publish the final PowerPoint slide deck
  - Convert the slide deck to PDF
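As a rough illustration of the first step (HTML to CSV), here is a minimal sketch using only the Python standard library. The markup shape (one `<li>` per event with the date in a `<b>` tag), the sample events, and the function names `html_to_rows` and `rows_to_csv` are all assumptions made for the example; the real timeline page and the final code may look quite different.

```python
import csv
import html
import io
import re

# Hypothetical input shape: one <li> per event, date first in YYYY-MM-DD form.
SAMPLE_HTML = """
<ul>
<li><b>1900-01-01</b> Visit to <a href="https://example.org/x">Example Town</a>, known as the "old" city.</li>
<li><b>1900-02-01</b> <i>Second</i> sample &amp; placeholder event.</li>
</ul>
"""

EVENT_RE = re.compile(r"<li>\s*<b>(\d{4}-\d{2}-\d{2})</b>\s*(.*?)</li>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def html_to_rows(page):
    """Return (date, plain_text) tuples, dropping links and inline tags."""
    rows = []
    for date, body in EVENT_RE.findall(page):
        text = TAG_RE.sub("", body)    # removes <a>, <b>, <i>, ... but keeps their inner text
        text = html.unescape(text)     # turns entities such as &amp; back into characters
        text = " ".join(text.split())  # collapses runs of whitespace and newlines
        rows.append((date, text))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; the csv module escapes embedded quotes by doubling them."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(("date", "event"))
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(html_to_rows(SAMPLE_HTML)))
```

Letting `csv.writer` handle the quoting avoids hand-written escape rules, and `csv.reader` can read the result back row by row in the next step.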
- I had to iterate over many of the above steps to get each one right. In particular, the AI methods I tried for text summarization and keyword extraction did not work well. For example, PyTorch was not yet compatible with Python 3.13, which is what my Mac runs, and I didn't feel it was worth going down that rabbit hole.
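One dependency-free fallback for the title step is plain word-frequency scoring. The sketch below is not the method used in the final code; the stopword list, the double weight for capitalized words, and the names `extract_keywords` and `make_title` are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "and", "was", "were", "for", "with", "that", "this", "from",
    "his", "her", "their", "into", "has", "had", "are", "but", "not",
}


def extract_keywords(text, limit=3):
    """Rank words by frequency, weighting capitalized words (likely names) double."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]+", text)
    counts = Counter()
    first_seen = {}
    for position, word in enumerate(words):
        key = word.lower()
        if key in STOPWORDS or len(key) < 3:
            continue
        counts[key] += 2 if word[0].isupper() else 1
        first_seen.setdefault(key, position)
    # Highest score first; ties go to the word that appears earlier in the text.
    ranked = sorted(counts, key=lambda k: (-counts[k], first_seen[k]))
    return ranked[:limit]


def make_title(text, limit=3):
    """Join the top keywords into a short slide title."""
    return " ".join(word.capitalize() for word in extract_keywords(text, limit))


if __name__ == "__main__":
    sample = "The treaty was signed in Lahore. The treaty ended the war."
    print(extract_keywords(sample))
    print(make_title(sample))
```

The titles this produces are crude, but the same keyword list can double as the search query for the image lookup.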
- Also, the image search on Wikimedia Commons works well for well-known people and things, but less so for obscure ones. I had to manually download and replace about 20-30 images to make the deck accurate and meaningful. Still, the automation made things a lot easier.
- As I learned during the execution of this project, AI refuses to provide certain pictures, e.g. in the case of controversial people or issues. I've experienced the same issue when doing GenAI-based image generation.
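For the image lookup, a sketch along these lines shows how a Commons search query might be built and how the results could be filtered by format, size, and license. It uses the MediaWiki `generator=search` and `prop=imageinfo` query parameters; the MIME whitelist, the 800-pixel minimum width, the license prefixes, and the helper names are assumptions, and the network call itself is left out so the example stays self-contained.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

# Assumed filters for illustration; adjust to taste.
PPT_FRIENDLY_MIME = {"image/jpeg", "image/png", "image/gif"}
OK_LICENSE_PREFIXES = ("CC", "Public domain", "PD")


def build_search_url(keywords, limit=10):
    """Build a Commons query that searches the File: namespace and returns image info."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": " ".join(keywords),
        "gsrnamespace": 6,  # namespace 6 is File:
        "gsrlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|size|mime|extmetadata",
    }
    return COMMONS_API + "?" + urlencode(params)


def is_reusable(info):
    """Heuristic license check based on the LicenseShortName metadata field."""
    name = info.get("extmetadata", {}).get("LicenseShortName", {}).get("value", "")
    return name.startswith(OK_LICENSE_PREFIXES)


def pick_image(response, min_width=800):
    """Return the URL of the best-ranked acceptable image, or None if nothing qualifies."""
    pages = response.get("query", {}).get("pages", {})
    candidates = []
    for page in pages.values():
        info = (page.get("imageinfo") or [{}])[0]
        if info.get("mime") not in PPT_FRIENDLY_MIME:
            continue
        if info.get("width", 0) < min_width:
            continue
        if not is_reusable(info):
            continue
        candidates.append((page.get("index", 0), info.get("url")))
    if not candidates:
        return None
    # "index" is the search rank; a lower value means a better match.
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    print(build_search_url(["golden", "temple"]))
```

The JSON returned by the URL from `build_search_url` would be passed to `pick_image`; when it returns `None`, the slide falls back to a manually chosen picture.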
- Here's the final code, in case you're curious. I did not remove all of the commented-out failed experiments, so you will see some of those.