The purpose of this assignment is to practice with AVL trees and using binary files. Note that B-Trees are much better for this in terms of applications, but AVL trees are a great learning experience.
Write a program that reads 32-bit positive integers from a text file and inserts them into a AVL tree stored on a binary file. You may assume that all values on the input file are greater than 0 and fit in a 32-bit signed integer. You may also assume that all values on the input file are different, but it would be a good idea to check for this in case you (or I) create an invalid input file by mistake.
For file names, use command line parameters. The first parameter will be the name of the input file (text), the second file name will be the name of the output file (binary).
For a check of the efficiency of the program, print to standard output the height of the tree. As a quick check of the result, print the value in the root, the smallest value in a leaf, and the biggest value in a leaf to standard output. See sample below for format.
In order to be able to check this program, we will all need to use the same binary file format. That is, we will all need to use the same node structure. For now, use the following struct declaration. If you think something should be changed or added, we will need to discuss this in class since everyone needs to use the same declaration.
int left_child; // child file address
// [record index, not byte address]
int height; // height from here down (0 if leaf)
int parent; // record index of parent (C_NULL if root)
// may need filler to force zeros in any padding of the struct
int file_loc; // location on file (record index)
const int C_NULL = -1;
const int C_NODE_SIZE = sizeof(avl_tree_node);
Notice the file_loc field. It is redundant but should make coding operations on the structure easier. Also, use C_NULL (set to -1) for null pointers and C_NODE_SIZE for the size of a node. Do not scatter the use of the values -1 and 24 (or whatever node size is) throughout your program, it is a maintenance nightmare.
One main point to the problem is to get more practice with binary files. Be sure to build the tree on the file, not in memory. You should never have more than about six or seven nodes in memory at any one time. [That is an approximate value. If you manage to only have four nodes in memory or need to have eight, that is fine. But, don't have dozens of nodes in memory at one time.]
In order to make sure everyone has the same result, when a new node is allocated, it should go at the end of the file right after the previous last node. However, to make sure we can find the root, keep the root at the start of the file. When a node needs to move to the root, swap the contents of that node with the old root.
You must work as a team of two. Let me know who your teammate will be. If you can't find a teammate, let me know. If you feel that having a teammate will be a problem for you, let me know that as well.
As a second aspect of this assignment, write a test plan for your program. Include inputs and expected outputs. Be sure to find cases that test each of the following cases. Note that the test plan is due before the program. This is typical in industry.
Test cases − directions (left/right) are given from the lowest out-of-balance node moving down the tree towards the newly inserted node.
It is possible that a test file could test more than one case. However, try to make your test file test just one case so that you can debug one case at a time.
Submit your testing document and sample input files using the departmental submit facility by midnight on Tuesday, March 31st.
Follow the coding standard. Submit your program using the departmental submit facility by midnight on Tuesday, April 7th.
Submit your team evaluation using the departmental submit facility by midnight on Thursday, April 9th.
Timing note: My program takes around 5 seconds for 100,000 values. Your program should be similar. I did not try to be clever, so don't be surprised if yours is faster. Do be worried if yours is much slower.
Binary files should be read and written a record at a time. Reads and writes of items within records will lead to a 20 point penalty.
Again, you should build the tree on the file, not in memory. Building the tree in memory will lead to a 30 point penalty.
File names should not contain characters that need to be escaped if used in a shell command. For example, file names should not contain spaces. There will be a 10 point penalty for invalid file names.
Programs should be in Unix text format. There will be a 10 point penalty for programs in DOS text format.
Here's one I didn't think I'd have to say in Data Structures. Any program using the colon question mark syntax will get a zero. To be clear, that is not a zero for coding style but a zero on the assignment.