Reading and processing files and using arrays.
Many real programs work with data. Sometimes lots of data!
This data is often stored in text files and must be read,
parsed and converted into values usable by your program. In
this assignment you are going to write a program to process
weather data from the National Climate Data Center. The NCDC
maintains a data collection of daily weather observations
for more than 19,000 locations in the United States. Some of
these have been recorded since the late 19th century. The
data used in this assignment was obtained from their web
site www.ncdc.noaa.gov/cdo-web.
Download the file PortlandWeather1941to2020.txt into your Program 10 project directory. This file contains the daily weather data recorded at the Portland International Jetport since 1941. You are going to be analyzing this data to look for evidence of climate change. The start of this file looks like this:
    29220 data records
    DATE       TMAX TMIN
    ---------- ---- ----
    01/01/1941   38   25
    01/02/1941   32   20
    01/03/1941   31   22
    01/04/1941   34   25
    01/05/1941   32   20
    01/06/1941   29    5
    01/07/1941   29  -10
Data files come in many different formats and sometimes need different techniques to parse them. This file is very simple. It is organized by line and in fixed width columns. The first three lines are a header describing the size of the file and the data columns. Each subsequent line represents a single day's weather record. The columns are:
- DATE - The date of the observation in format MM/DD/YYYY
- TMAX - The maximum temperature in Fahrenheit
- TMIN - The minimum temperature in Fahrenheit
Important grading note: You should not use specific
numerical values in your code (like 29220, 1941, or 2020).
This is called "hard coding" and is considered bad practice
because it would require changing the program every time you
want to analyze a new data file. Read these values from the
file instead. I am only providing one file, but there are 19,000
to choose from. One point will be deducted each time a
hard-coded value is used that should have been read from the
file.
The first line in the file says how many data records there
are. You will be using a Scanner to process the file. Read
that number first, then create 5 separate integer arrays of
that length for month, day, year, tmax, and tmin. These will
hold all the data from the file. This is a lot of data, but
your computer can easily hold it in memory.
First call your Scanner's nextLine() method 3 times to
discard the remainder of the 3 header lines.
Next you are going to read and store all the data into your
arrays.
The Scanner's default behavior is to break up data at whitespace (spaces, tabs, and newlines), but we also want to break apart the date string which uses forward slashes to separate the month, day and year. Use code like this to change the delimiter to also include forward slashes:
    fileScnr.useDelimiter("[/ \t\n\r]+");
Now you can just use .nextInt() to read all five fields: month, day, year, tmax, and tmin. The Scanner will then automatically wrap to the next line and you can continue reading the entire file, and storing it in your arrays.
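The reading steps described so far can be sketched roughly as follows. This is only one possible arrangement, not the required solution: the class name ReadSketch, the method readAll(), and the choice to return the five arrays bundled in an int[][] are assumptions made for illustration, and the sketch reads from a short in-memory sample rather than the real file so it can be run on its own.

```java
import java.util.Scanner;

public class ReadSketch {
    // Hypothetical helper: reads the header and every record.
    // Returns {month, day, year, tmax, tmin} (bundling is an illustration choice).
    public static int[][] readAll(Scanner fileScnr) {
        int n = fileScnr.nextInt();            // record count from the first line
        for (int i = 0; i < 3; i++) {
            fileScnr.nextLine();               // rest of line 1, then header lines 2 and 3
        }

        int[] month = new int[n];
        int[] day = new int[n];
        int[] year = new int[n];
        int[] tmax = new int[n];
        int[] tmin = new int[n];

        // Split on slashes as well as whitespace so dates break into three ints.
        fileScnr.useDelimiter("[/ \t\n\r]+");
        for (int i = 0; i < n; i++) {
            month[i] = fileScnr.nextInt();
            day[i] = fileScnr.nextInt();
            year[i] = fileScnr.nextInt();
            tmax[i] = fileScnr.nextInt();
            tmin[i] = fileScnr.nextInt();
        }
        return new int[][] {month, day, year, tmax, tmin};
    }

    public static void main(String[] args) {
        // A three-record sample in the same layout as the data file.
        String sample = "3 data records\n"
                + "DATE       TMAX TMIN\n"
                + "---------- ---- ----\n"
                + "01/05/1941   32   20\n"
                + "01/06/1941   29    5\n"
                + "01/07/1941   29  -10\n";
        int[][] data = readAll(new Scanner(sample));
        for (int i = 0; i < data[0].length; i++) {
            System.out.println(data[0][i] + "/" + data[1][i] + "/" + data[2][i]
                    + " tmax=" + data[3][i] + " tmin=" + data[4][i]);
        }
    }
}
```

In your own program you would construct the Scanner from the data file instead of a String, and you may prefer to keep the five arrays as separate variables.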
Verify that you have read in the data correctly. Either
examine your arrays using the debugger or print them. Do
they match the contents of the file?
In Program 9 you wrote methods to work on arrays. Copy those methods into this program and use them to calculate and print:
- The highest temperature in tmax and the date it occurred
on (use arrayMax(), arrayFirstIndexOf(), and then use that
index for your month[], day[], and year[] arrays)
- The lowest temperature in tmin and the date it occurred on
- The average tmax (Check your value. It should be
55.6608...)
- The average tmin
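As a rough illustration of the first and third items, here is a sketch on a four-record sample. Your Program 9 methods are not shown in this handout, so the versions of arrayMax() and arrayFirstIndexOf() below are stand-ins with assumed signatures, and arrayAverage() is a hypothetical name; use your own methods in the real program.

```java
public class StatsSketch {
    // Assumed stand-in for the Program 9 method: largest value in the array.
    public static int arrayMax(int[] a) {
        int max = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > max) {
                max = a[i];
            }
        }
        return max;
    }

    // Assumed stand-in: index of the first occurrence of value, or -1 if absent.
    public static int arrayFirstIndexOf(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: average as a double (a long sum avoids overflow).
    public static double arrayAverage(int[] a) {
        long sum = 0;
        for (int v : a) {
            sum += v;
        }
        return (double) sum / a.length;
    }

    public static void main(String[] args) {
        // First four records of the sample shown earlier.
        int[] month = {1, 1, 1, 1};
        int[] day = {1, 2, 3, 4};
        int[] year = {1941, 1941, 1941, 1941};
        int[] tmax = {38, 32, 31, 34};

        int hi = arrayMax(tmax);
        int i = arrayFirstIndexOf(tmax, hi);   // same index works in month, day, year
        System.out.println("Highest: " + hi + " on "
                + month[i] + "/" + day[i] + "/" + year[i]);
        System.out.println("Average tmax: " + arrayAverage(tmax));
    }
}
```

The key idea is that all five arrays share one index per record, so an index found in tmax identifies the date in the other three arrays.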
One aspect of climate change that has been noted in Maine
is that the first fall frost has been occurring later in the
year. We are going to analyze the minimum daily temperature
data to calculate the date each fall when the temperature
first reaches freezing temperatures.
First we are going to find the starting index of each year
in the 'year' array. Your task for Part 4 is to create a
loop that prints:
    year  starting index
    1941  0
    1942  365
    1943  730
    1944  1095
    1945  1461
    ...
There are many ways to write this. The easiest is to just use your arrayFirstIndexOf() method from Program 9.
Remember it is poor programming to hard code specific
values such as 1941 and 2020 in a program because then the
program would have to be modified to process a different
file. Instead use year[0] to get the starting year and
year[n-1] to get the last year. (I used the variable n for
the number of data records.)
I suggest that you store these values in an array similar
to how Program 9 used an array for the starting indices of
the months. This will make Part 6 easier.
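One way to build and store the starting indices is sketched below. It assumes an arrayFirstIndexOf(int[], int) method that returns the first index of a value (the real Program 9 signature may differ), and it assumes every year between the first and last appears in the file. The name yearStartIndices() and the extra final entry holding n are illustration choices, not requirements.

```java
public class YearIndexSketch {
    // Assumed stand-in for the Program 9 method.
    public static int arrayFirstIndexOf(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: start[k] is the index of the first record of
    // year (year[0] + k). One extra entry at the end holds n, so that
    // start[k + 1] can always serve as the end index of year k.
    public static int[] yearStartIndices(int[] year) {
        int n = year.length;
        int firstYear = year[0];           // read from the data, not hard coded
        int lastYear = year[n - 1];
        int numYears = lastYear - firstYear + 1;

        int[] start = new int[numYears + 1];
        for (int y = firstYear; y <= lastYear; y++) {
            start[y - firstYear] = arrayFirstIndexOf(year, y);
        }
        start[numYears] = n;
        return start;
    }

    public static void main(String[] args) {
        // Tiny made-up year column: three records, then two, then one.
        int[] year = {1941, 1941, 1941, 1942, 1942, 1943};
        int[] start = yearStartIndices(year);
        System.out.println("year  starting index");
        for (int k = 0; k < start.length - 1; k++) {
            System.out.println((year[0] + k) + "  " + start[k]);
        }
    }
}
```

Calling arrayFirstIndexOf() once per year rescans the array from the beginning each time. That is simple and fast enough for a file of this size; a single pass that watches for the year value changing would also work.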
Make a method int firstFallFrost(int[] tmin, int
yearStartIndex, int yearEndIndex) that looks at a one
year section of the array for the date of the first fall
frost and returns the index of that day. Hint: start
searching forward from the middle of the year.
For testing, you should find that for 1941 (tmin, 0, 365)
it occurs at index 262, and for 1942 (tmin, 365, 730) it
occurs at index 636. In the next part you will use the month
and day arrays to print the human understandable dates 9/20
and 9/29.
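A minimal sketch of the search follows, under some stated assumptions: "freezing" is taken to mean a minimum temperature of 32 degrees Fahrenheit or lower, yearEndIndex is treated as exclusive (matching the test calls (tmin, 0, 365) and (tmin, 365, 730)), and -1 is returned if no frost is found. None of these choices is confirmed by the assignment text, so check your results against the test indices given above.

```java
import java.util.Arrays;

public class FrostSketch {
    // Assumption: freezing means tmin at or below 32 degrees Fahrenheit.
    static final int FREEZING_F = 32;

    // Scans forward from the middle of the year so spring frosts are skipped.
    // yearEndIndex is treated as exclusive. Returns -1 if no frost is found.
    public static int firstFallFrost(int[] tmin, int yearStartIndex, int yearEndIndex) {
        int middle = (yearStartIndex + yearEndIndex) / 2;
        for (int i = middle; i < yearEndIndex; i++) {
            if (tmin[i] <= FREEZING_F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up year of data: mild all year except three cold days.
        int[] tmin = new int[365];
        Arrays.fill(tmin, 50);
        tmin[10] = 20;     // winter cold in January, before the midpoint
        tmin[262] = 32;    // first frost after the midpoint
        tmin[300] = 10;    // later, colder day
        System.out.println(firstFallFrost(tmin, 0, 365));
    }
}
```

The made-up data places the first frost after midyear at index 262, so the call in main() returns 262 and ignores the January cold at index 10.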
Use your starting year indices from Part 4, your
firstFallFrost() method from Part 5, and the month and day
arrays, to print a table of the first fall frost dates for
each year like this:
    year  first fall frost
    1941  9/20
    1942  9/29
    1943  10/10
    1944  9/25
    1945  9/30
    ...
Make sure your results match.
Finally, do a little analysis of the results. You can do this
by hand or with code. The first fall frost dates vary a lot
from year to year. Compare the average of the first 10 years
of the historical record to the average of the most recent 10
years. How much has it changed?
A nicer way to present the data is averaged by decade, i.e. the average fall frost day in the 1940's, 1950's, ... 2020's. Write code to do the analysis this way. Note that for some decades we may not have data for all 10 years (in our data the 1940's and the 2020's). Your code should still give correct averages for these decades. Print your results in a neat table.
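One possible shape for the decade averaging is sketched here. It assumes you have already reduced each year to a single number, the frost day counted from the start of that year (for example, the frost index minus that year's starting index), stored in an array with one entry per consecutive year. The names decadeAverages() and frostDay are hypothetical, and the values in main() are made up purely to show the mechanics.

```java
public class DecadeSketch {
    // frostDay[k] is the frost day for year (firstYear + k); years are consecutive.
    // Returns one average per decade, starting with the decade containing firstYear.
    public static double[] decadeAverages(int firstYear, int[] frostDay) {
        int lastYear = firstYear + frostDay.length - 1;
        int numDecades = lastYear / 10 - firstYear / 10 + 1;

        double[] sum = new double[numDecades];
        int[] count = new int[numDecades];
        for (int k = 0; k < frostDay.length; k++) {
            int d = (firstYear + k) / 10 - firstYear / 10;   // decade slot for this year
            sum[d] += frostDay[k];
            count[d]++;
        }
        // Divide by the number of years actually present, so partial
        // decades at either end of the record still average correctly.
        for (int d = 0; d < numDecades; d++) {
            sum[d] /= count[d];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up frost days for 1948 through 1951 (not real results).
        int firstYear = 1948;
        int[] frostDay = {260, 270, 265, 275};
        double[] avg = decadeAverages(firstYear, frostDay);
        int firstDecade = firstYear / 10 * 10;
        for (int d = 0; d < avg.length; d++) {
            System.out.printf("%ds  %.1f%n", firstDecade + 10 * d, avg[d]);
        }
    }
}
```

Because the years are consecutive, every decade between the first and last has at least one entry, so the division never uses a zero count. Note that counting days from the start of the year shifts calendar dates by one day in leap years; this sketch ignores that.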
Turn in the final version of your code, your output, and your final analysis of your results.