Objectives

Reading and processing files and using arrays.

Weather Data

Many real programs work with data. Sometimes lots of data! This data is often stored in text files and must be read, parsed, and converted into values usable by your program. In this assignment you are going to write a program to process weather data from the National Climatic Data Center (NCDC). The NCDC maintains a collection of daily weather observations for more than 19,000 locations in the United States. Some of these have been recorded since the late 19th century. The data used in this assignment was obtained from their web site www.ncdc.noaa.gov/cdo-web.

Download the file PortlandWeather1941to2020.txt into your Program 10 project directory. This file contains the daily weather data recorded at the Portland International Jetport since 1941. You are going to analyze this data to look for evidence of climate change. The start of the file looks like this:

29220 data records
DATE       TMAX TMIN
---------- ---- ----
01/01/1941   38   25
01/02/1941   32   20
01/03/1941   31   22
01/04/1941   34   25
01/05/1941   32   20
01/06/1941   29    5
01/07/1941   29  -10    

Data files come in many different formats and sometimes need different techniques to parse them. This file is very simple. It is organized by line and in fixed-width columns. The first three lines are a header describing the size of the file and the data columns. Each subsequent line represents a single day's weather record. The columns are:

  • DATE - The date of the observation in format MM/DD/YYYY
  • TMAX - The maximum temperature in Fahrenheit
  • TMIN - The minimum temperature in Fahrenheit

Important grading note: You should not use specific numerical values in your code (like 29220, 1941, or 2020). This is called "hard coding" and is considered bad practice because it would require changing the program every time you want to analyze a new data file. Read these values from the file. I am only providing one file, but there are more than 19,000 locations to choose from. One point will be deducted each time a hard-coded value is used that should have been read from the file.

Part 1 (5 points) Allocate Arrays to Hold the Data

The first line in the file says how many data records there are. You will be using a Scanner to process the file. Read that number from the file and then create 5 separate integer arrays for month, day, year, tmax, and tmin, each sized to hold that many records. These will hold all the data from the file. This is a lot of data, but your computer can easily hold it in memory.
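
As a rough illustration of the allocation step, here is a minimal sketch. The class name AllocateSketch and the helper readRecordCount() are made up for this example, and a String stands in for the data file so the code runs on its own. With the real file, the Scanner would be constructed from a java.io.File instead.

```java
import java.util.Scanner;

class AllocateSketch {
    // Reads the record count: the first token on the first line of the file.
    // (Hypothetical helper, not part of the assignment.)
    static int readRecordCount(Scanner fileScnr) {
        return fileScnr.nextInt();
    }

    public static void main(String[] args) {
        // A String stands in for the data file so the sketch runs on its own.
        String sample = "7 data records\n"
                      + "DATE       TMAX TMIN\n"
                      + "---------- ---- ----\n";
        Scanner fileScnr = new Scanner(sample);

        int n = readRecordCount(fileScnr);   // read from the data, not hard coded

        // One array per column, each sized from the count just read.
        int[] month = new int[n];
        int[] day   = new int[n];
        int[] year  = new int[n];
        int[] tmax  = new int[n];
        int[] tmin  = new int[n];

        System.out.println("Allocated " + tmin.length + " slots per array");
    }
}
```

Because n comes from the file, the same code would size its arrays correctly for any other station's data.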

Part 2 (5 points) Read and Store the Data in your Arrays

First call your Scanner's nextLine() method 3 times to discard the remainder of the 3 header lines.

Next you are going to read and store all the data into your arrays.

The Scanner's default behavior is to break up data at whitespace (spaces, tabs, and newlines), but we also want to break apart the date string which uses forward slashes to separate the month, day and year. Use code like this to change the delimiter to also include forward slashes:

      fileScnr.useDelimiter("[/ \t\n\r]+");

Now you can just use .nextInt() to read all five fields: month, day, year, tmax, and tmin. The Scanner will then automatically wrap to the next line and you can continue reading the entire file, and storing it in your arrays.

Verify that you have read in the data correctly. Either examine your arrays using the debugger or print them. Do they match the contents of the file?
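
The reading steps in this part might fit together roughly as follows. This is only a sketch: the class ReadSketch and the method readAll() are invented for the example, a String replaces the data file, and the five arrays are packed into one int[][] purely so the method can return them all at once. Your program can keep them as five separate variables.

```java
import java.util.Scanner;

class ReadSketch {
    // Returns the five columns packed as {month, day, year, tmax, tmin}.
    // Packing them into one int[][] is only a convenience for this sketch.
    static int[][] readAll(Scanner fileScnr) {
        int n = fileScnr.nextInt();            // record count from the first line
        for (int i = 0; i < 3; i++) {
            fileScnr.nextLine();               // discard the rest of each header line
        }
        fileScnr.useDelimiter("[/ \t\n\r]+");  // also split on forward slashes

        int[] month = new int[n];
        int[] day   = new int[n];
        int[] year  = new int[n];
        int[] tmax  = new int[n];
        int[] tmin  = new int[n];
        for (int i = 0; i < n; i++) {
            month[i] = fileScnr.nextInt();
            day[i]   = fileScnr.nextInt();
            year[i]  = fileScnr.nextInt();
            tmax[i]  = fileScnr.nextInt();
            tmin[i]  = fileScnr.nextInt();
        }
        return new int[][] {month, day, year, tmax, tmin};
    }

    public static void main(String[] args) {
        // Two records copied from the start of the data file.
        String sample = "2 data records\n"
                      + "DATE       TMAX TMIN\n"
                      + "---------- ---- ----\n"
                      + "01/06/1941   29    5\n"
                      + "01/07/1941   29  -10\n";
        int[][] cols = readAll(new Scanner(sample));

        // Print every record back out to compare against the input.
        for (int i = 0; i < cols[0].length; i++) {
            System.out.println(cols[0][i] + "/" + cols[1][i] + "/" + cols[2][i]
                    + " tmax=" + cols[3][i] + " tmin=" + cols[4][i]);
        }
    }
}
```

Printing the records back out, as main() does here, is one simple way to do the verification step.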

Part 3 (5 points) Long Term Averages and Records

In Program 9 you wrote methods to work on arrays. Copy those methods into this program and use them to calculate and print:

  • The highest temperature in tmax and the date it occurred on (use arrayMax(), arrayFirstIndexOf(), and then use that index for your month[], day[], and year[] arrays)
  • The lowest temperature in tmin and the date it occurred on
  • The average tmax (Check your value. It should be 55.6608...)
  • The average tmin

Part 4 (5 points) Finding the starting index for each year

One aspect of climate change that has been noted in Maine is that the first fall frost has been occurring later in the year. We are going to analyze the minimum daily temperature data to find the date each fall when the temperature first drops to freezing.

First we are going to find the starting index of each year in the 'year' array. Your task for Part 4 is to create a loop that prints:

year starting index
1941    0
1942    365
1943    730
1944    1095
1945    1461
...

There are many ways to write this. The easiest is to just use your arrayFirstIndexOf() method from Program 9.

Remember it is poor programming to hard code specific values such as 1941 and 2020 in a program because then the program would have to be modified to process a different file. Instead use year[0] to get the starting year and year[n-1] to get the last year. (I used the variable n for the number of data records.)

I suggest that you store these values in an array similar to how Program 9 used an array for the starting indices of the months. This will make Part 6 easier.
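
One possible shape for this, sketched under a few assumptions: the arrayFirstIndexOf() shown is a stand-in for your Program 9 version, the yearStarts() helper is invented for this example, and every year between the first and last is assumed to appear in the data. The extra final entry holding n is a design choice of this sketch, not a requirement. It gives the last year an end index too.

```java
class YearStartSketch {
    // Stand-in for the Program 9 method: first index holding value, or -1.
    static int arrayFirstIndexOf(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == value) return i;
        }
        return -1;
    }

    // Returns the starting index of each year, followed by one extra entry
    // equal to year.length so that starts[k + 1] is always year k's end index.
    static int[] yearStarts(int[] year) {
        int n = year.length;
        int firstYear = year[0];        // taken from the data, not hard coded
        int lastYear = year[n - 1];
        int[] starts = new int[lastYear - firstYear + 2];
        for (int y = firstYear; y <= lastYear; y++) {
            starts[y - firstYear] = arrayFirstIndexOf(year, y);
        }
        starts[starts.length - 1] = n;  // end marker for the final year
        return starts;
    }

    public static void main(String[] args) {
        // A tiny made-up year column: three "years" of 3, 2, and 1 days.
        int[] year = {1941, 1941, 1941, 1942, 1942, 1943};
        int[] starts = yearStarts(year);

        System.out.println("year starting index");
        for (int k = 0; k < starts.length - 1; k++) {
            System.out.println((year[0] + k) + "    " + starts[k]);
        }
    }
}
```

Calling arrayFirstIndexOf() once per year rescans the array from the beginning each time. That is simple and fast enough for this data size.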

Part 5 (5 points) Finding the first fall frost

Make a method int firstFallFrost(int[] tmin, int yearStartIndex, int yearEndIndex) that looks at a one year section of the array for the date of the first fall frost and returns the index of that day. Hint: start searching forward from the middle of the year.

For testing, you should find that for 1941 (tmin, 0, 365) it occurs at index 262, and for 1942 (tmin, 365, 730) it occurs at index 636. In the next part you will use the month and day arrays to print the human understandable dates 9/20 and 9/29.
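
A minimal sketch of such a method follows. Two things in it are assumptions rather than requirements: frost is taken to mean a tmin of 32 degrees Fahrenheit or lower, and -1 is returned if no frost is found in the searched half of the year. Check the frost threshold against the test indices given above using the real data.

```java
class FrostSketch {
    // Assumption: "frost" means the minimum temperature is at or below 32 F.
    static final int FREEZING_F = 32;

    // Searches forward from the middle of [yearStartIndex, yearEndIndex)
    // and returns the index of the first freezing day, or -1 if none is found.
    static int firstFallFrost(int[] tmin, int yearStartIndex, int yearEndIndex) {
        int mid = (yearStartIndex + yearEndIndex) / 2;
        for (int i = mid; i < yearEndIndex; i++) {
            if (tmin[i] <= FREEZING_F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A made-up 10-day "year": cold, warm in the middle, cold again.
        int[] tmin = {20, 25, 30, 40, 50, 60, 55, 40, 32, 20};
        System.out.println("First fall frost at index "
                + firstFallFrost(tmin, 0, tmin.length));
    }
}
```

Starting at the midpoint skips the winter and spring frosts at the beginning of the year, which would otherwise be found first.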

Part 6 (5 points) Printing the first fall frost dates

Use your starting year indices from Part 4, your firstFallFrost() method from Part 5, and the month and day arrays to print a table of the first fall frost dates for each year like this:

year  first fall frost
1941     9/20
1942     9/29
1943    10/10
1944     9/25
1945     9/30
...

Make sure your results match.
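
To line the columns up the way the sample table does, a format string is one option. The sketch below assumes a hypothetical helper frostRow() and uses made-up stand-in arrays. In your program the indices would come from firstFallFrost().

```java
class FrostTableSketch {
    // Formats one table row: the year, then the month right-aligned in a
    // 6-character field, then a slash and the day.
    static String frostRow(int year, int month, int day) {
        return String.format("%d%6d/%d", year, month, day);
    }

    public static void main(String[] args) {
        // Tiny stand-in arrays; in the assignment these are read from the file.
        int[] month = {9, 9, 10};
        int[] day   = {19, 20, 10};
        int[] year  = {1941, 1941, 1943};
        int[] frostIndex = {1, 2};   // hypothetical results of firstFallFrost()

        System.out.println("year  first fall frost");
        for (int idx : frostIndex) {
            // The same index selects the matching entry in every array.
            System.out.println(frostRow(year[idx], month[idx], day[idx]));
        }
    }
}
```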

Finally, do a little analysis of the results. You can do this by hand or with code. The first fall frost dates vary a lot from year to year. Compare the average of the first 10 years of the historical record to the average of the most recent 10 years. How much has it changed?

Extra Credit (5 points) Averaging by decade

A nicer way to present the data is averaged by decade, i.e. the average fall frost day in the 1940's, 1950's, ... 2020's. Write code to do the analysis this way. Note that for some decades we may not have data for all 10 years (in our data the 1940's and the 2020's). Your code should still give correct averages for these decades. Print your results in a neat table.
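
One way to group by decade is sketched below, under some assumptions. Each frost date is represented as a day offset within its year (the frost index minus that year's starting index), which is a choice made for this example. The method name decadeAverages() and the sample numbers are made up. Note that in a leap year the same calendar date falls one offset later, which this sketch ignores.

```java
class DecadeSketch {
    // years[i] is a year and dayOfYear[i] is that year's first-frost offset
    // (frost index minus the year's starting index). Returns one average per
    // decade, from the decade of the first year to the decade of the last.
    static double[] decadeAverages(int[] years, int[] dayOfYear) {
        int firstDecade = years[0] / 10 * 10;              // e.g. 1941 -> 1940
        int lastDecade = years[years.length - 1] / 10 * 10;
        int numDecades = (lastDecade - firstDecade) / 10 + 1;

        double[] sum = new double[numDecades];
        int[] count = new int[numDecades];
        for (int i = 0; i < years.length; i++) {
            int d = (years[i] / 10 * 10 - firstDecade) / 10;
            sum[d] += dayOfYear[i];
            count[d]++;
        }

        double[] avg = new double[numDecades];
        for (int d = 0; d < numDecades; d++) {
            // Divide by the years actually present, not by 10, so partial
            // decades are averaged correctly. An empty decade gives NaN.
            avg[d] = sum[d] / count[d];
        }
        return avg;
    }

    public static void main(String[] args) {
        // Made-up offsets for illustration only.
        int[] years     = {1941, 1942, 1950, 1951, 1952};
        int[] dayOfYear = {262, 271, 270, 280, 290};
        double[] avg = decadeAverages(years, dayOfYear);

        int firstDecade = years[0] / 10 * 10;
        System.out.println("decade  average frost day");
        for (int d = 0; d < avg.length; d++) {
            System.out.printf("%d's %10.1f%n", firstDecade + 10 * d, avg[d]);
        }
    }
}
```

Keeping a count per decade is what makes the partial decades come out right.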

What to turn in

Turn in the final version of your code, your output, and your final analysis of your results.