©2020 by Justin Wilson

Forecasting the Weather From our Past — Part 1

The need has arisen, the desire is there, and I finally have an idea I am excited about. The need is to complete my Master’s Thesis, the desire is a lifelong passion for meteorology, and the idea is to attempt to forecast the weather based on our past.




DISCLAIMER: Let me state this early and often: I do not think that technology can replace a skilled meteorologist. The hope is that through data-mining, machine learning, and artificial intelligence, I can help meteorologists do their jobs more effectively.


With all of that out of the way, let me set the stage. I know the end goal, but I have no idea how to accomplish it. What I do know is how to data-mine, teach machines how to learn, and even create basic artificial intelligence. The exact details are, at the time of this writing, still unknown.


What I do know is that I want to start small and scale from there. To begin with, I want to pick a single city, along with its surrounding cities, that has a wide range of weather. That means four distinct seasons, dramatic day and night temperature swings throughout the year, and a good mix of sun, rain, snow, sleet, and wind, along with variable barometric pressure, cloud coverage, humidity, and more. The reason for this diverse weather is to ensure the algorithms I design are not simply guessing their way to accurate forecasts. There needs to be statistical evidence that the algorithm is accurate and not just lucky.
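
To make that “not just lucky” idea concrete, below is a minimal sketch of the kind of sanity check I have in mind. The counts are invented and the coin-flip baseline is deliberately naive; a real evaluation would compare against climatology or a persistence forecast instead.

# Illustrative only: is a forecaster's hit rate better than chance?
# The counts are made up, and a 50% baseline stands in for a proper
# climatology or persistence benchmark.
from scipy.stats import binomtest

n_forecasts = 365      # one year of daily "rain / no rain" calls
n_correct = 230        # how many of those calls were right
baseline_rate = 0.5    # chance level for a two-outcome forecast

result = binomtest(n_correct, n_forecasts, baseline_rate, alternative="greater")
print(f"hit rate: {n_correct / n_forecasts:.1%}, p-value vs. chance: {result.pvalue:.4f}")
# A small p-value is evidence that the accuracy is not just luck.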


After a few calls to local weathermen, NOAA, and Penn State, I came to the conclusion that Minneapolis, Minnesota, was the perfect fit. According to FiveThirtyEight.com and various other sources, Minneapolis easily ranks among the top three cities in the United States when it comes to forecasting the weather. Let’s take a deeper dive into Minneapolis weather.

Minneapolis has a record high temperature of 108 degrees Fahrenheit and a record low of -41 degrees Fahrenheit. Rainfall can vary from nearly nothing to over four inches in a given month. Additionally, barometric pressure varies considerably, humidity fluctuates day to day, and cloud coverage is constantly changing.


There are no natural barriers to block cold air from pouring south from Canada during the winter. This also means that, on rare occasions, Minneapolis can see severe weather like tornadoes and floods.


Lastly, Minneapolis has a wealth of accurate weather data that is readily accessible for research projects like mine. The metro also offers room to grow: like most metropolitan areas, there are plenty of smaller cities in the surrounding region, such as Saint Paul and Plymouth, that I can expand into as the need arises.


With the city picked, I began searching for the best source of data at a reasonable cost. Ultimately, I found that I could obtain the last six years’ worth of data from https://openweathermap.org/ for a measly ten US dollars. That historical data should be more than enough to get me started on my journey.


OpenWeatherMap offers wind data, cloud coverage, humidity, temperature, and more in an easy-to-consume JSON format. After downloading the last six years’ worth of data for Minneapolis, it was time to pick a technology to store it.
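
As an example of how approachable the format is, here is a rough sketch of reading the bulk export with Python. The file name and the field names (dt, main.temp, main.humidity, wind.speed, clouds.all) reflect my assumptions about the export and may not match every OpenWeatherMap product exactly.

import json
from datetime import datetime, timezone

# Load the bulk history export; it is expected to be a list of hourly
# observation objects (this structure is an assumption about the export).
with open("minneapolis_history.json", encoding="utf-8") as f:
    records = json.load(f)

for rec in records[:3]:
    observed_at = datetime.fromtimestamp(rec["dt"], tz=timezone.utc)
    print(
        observed_at.isoformat(),
        rec["main"]["temp"],        # temperature
        rec["main"]["humidity"],    # relative humidity (%)
        rec["wind"]["speed"],       # wind speed
        rec["clouds"]["all"],       # cloud coverage (%)
    )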


My first thought was to stand up a SQL Server instance and just dump the data into some organized tables that are easy to query and work with. I then realized that while this would be the quickest way, it might not be the best option long term.


I spent several weeks contemplating which technology to use. Should it be a document store like MongoDB or a traditional relational database like the one previously discussed? If this works, the scale of the data could easily outgrow what I would want to store effectively in SQL Server. I wanted the ability to shard data across multiple nodes so I could distribute the increasingly complicated queries as the project grows.


While attending the InterSystems Global Summit 2018 in San Antonio, I came across my answer. InterSystems was in the process of releasing the InterSystems IRIS Data Platform, which solved all of my problems. InterSystems IRIS allows for sharding, can easily be scaled in the cloud using containers or virtual machines, and has all of the features (and more) that I need to save and analyze my data.


Knowing the technology stack I wanted to deploy, I reached out to InterSystems to obtain licensing for their data platform. After explaining my project, they were more than happy to assist and were kind enough to offer a free 90-day license to get me started. (Yes, they offer this to everyone, but I am hopeful for the future.)
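
With a license in hand, a natural first step is pushing the parsed JSON into an IRIS table so it can be queried with SQL. The sketch below uses a generic ODBC connection via pyodbc; the DSN name, the Weather.Observation table, and its columns are placeholders I made up, and the table is assumed to already exist. InterSystems also ships native Python connectivity, which may be the better long-term choice.

import json
import pyodbc

# Re-read the bulk export and flatten each observation into a row.
with open("minneapolis_history.json", encoding="utf-8") as f:
    records = json.load(f)

rows = [
    (rec["dt"], rec["main"]["temp"], rec["main"]["humidity"],
     rec["wind"]["speed"], rec["clouds"]["all"])
    for rec in records
]

# "IRIS_Weather" is a placeholder ODBC DSN pointing at the IRIS instance.
conn = pyodbc.connect("DSN=IRIS_Weather")
with conn:  # commits on a clean exit from the block
    cursor = conn.cursor()
    cursor.executemany(
        "INSERT INTO Weather.Observation "
        "(ObservedUnix, TempK, Humidity, WindSpeed, CloudCover) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )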

The next step will be to actually begin analyzing this data and making progress toward the goal. Stay tuned for future updates on my progress.