BASIC HUMAN DECISION MAKING: An Analysis of Route Choice Decisions by Long-Haul Truckers
John Holland Knorring
Advisor: Professor Alain L. Kornhauser
even if one had the socioeconomic data for any given driver. Because such data is not available for this study, socioeconomic factors are ignored entirely and absorbed into the error term of the analysis.
3.3.3 Characteristics of Specific Trucks
Lastly, the data does not include any special characteristics of the trucks. For example, it is not possible to determine whether a given truck is carrying hazardous waste and is therefore barred from certain routes. Route choice determination is affected because the data does not reveal when a driver is unable to use a particular route.
One would not know whether a truck driver chose a suboptimal route because he was hauling hazardous waste or because he perceived the given route to be superior to the available alternatives. This study assumes that trucks may travel freely on any route that allows truck traffic.
4 Methodology
As with any scholarly data analysis, it is important to clearly define the thought process, as well as the analysis process, used to reach any conclusion. This thesis focuses on the route choice decisions that truck drivers make, using an enormous revealed-preference data set. The data, however, does not arrive in a convenient format with all of the routes and decisions laid out, and much of it is useless to this study. It was therefore necessary to dig into the data and extract the "good" observations for the analysis. Throughout this reduction process, the goal was to cut the data down to a workable size while preserving the information content of the data set.
4.1 Stop Determination
The first opportunity to remove extraneous data arose with trucks that were stopped. The data set contained a large amount of position data showing trucks essentially stationary for extended periods of time. This accords with reality: truck drivers are only allowed to drive a set number of hours each day and must stop to rest. This thesis, however, is not concerned with stopped trucks; it treats a stop as the end of a trip. Determining stops, however, is not as easy as one might think.
The format of the data made it possible to determine stops via a heuristic without degrading the information content of the data set. Because the data collection methods carried some measurement error, it was unlikely that a truck would report itself to be in exactly the same position twice; it was quite common, however, for a truck to report itself quite close to its previous observation. This study therefore deemed a truck stopped if it reported itself within 4 miles of its previously reported position 30 minutes or longer after the previous observation. This method allowed significant data reduction and simplified the subsequent analysis. Additionally, a number of trucks were stopped in the same location for eight or more hours; the algorithm searched for these instances and removed all of the stop data except the first and last stopped observations. One possible drawback of this heuristic is that a truck could stop at a terminal, make a location observation, depart to make a delivery, and return to the terminal before making another observation. While this chain of events is quite possible and could alter the conclusions of this thesis, this study is focused on the basic decision-making behavior of long-haul truckers rather than single-load delivery vehicles.
4.2 Determination of Regions to be Examined
The regions to be examined came from three sources. First and foremost, Todd Burner ’99 set the framework for determining appropriate cities in his senior thesis. Additionally, Prof. Alain Kornhauser contributed to the list of potential analysis regions. Lastly, I selected the Chicago Skyway case study as well as the 90/94 case study.
The cases to be examined are as follows:
- Chicago, IL (Chicago Skyway-80/94)
- Cincinnati, OH (I-75/I-275)
- Columbus, OH (I-70/I-270)
- Indianapolis, IN (I-70/I-74)
- Nashville, TN (I-40/I-440)
- Memphis, TN (I-40/I-440)
- Houston, TX (I-10/I-610)
- Oklahoma City, OK (I-40/I-240)
- Richmond, VA (I-95/I-295)
- St. Louis, MO (I-55/I-255)
- San Antonio, TX (I-35/I-410)
- Wilmington, DE (I-95/I-495)
- Interstates 90 and 94 between Tomah, WI, and Hirsch, MT

As previously stated, the analysis regions were chosen based on the layout of the highway network surrounding major cities. Each selected region had a major highway that led up to and through the downtown area of a major city, in addition to a bypass route that circumvented the downtown area. These areas were selected because they proved to be fertile ground for decision-making analysis.
Additionally, the 90/94 case study was chosen in the hope that it would also prove fertile for analyzing perceived speeds on alternate routes greater than 100 miles in length.
4.3 Trip Determination
Determining trips from the data set proved to be quite a large task. A few software packages can take raw GPS data as input and map-match it to the U.S. highway network; this study chose ALK Associates’ CoPilot guidance package.
The first attempt at analyzing the data loaded 10,000 trucks’ worth of GPS data into the computer, examined one of the previously determined case studies, and manually counted the number of trucks on the downtown and bypass routes. This method had significant drawbacks. First, with so much data on any given screen, it was quite difficult to single out a specific truck. It was also difficult to determine exactly what route a truck had taken; for example, a truck might enter the downtown route, make an observation, and then immediately exit the downtown route. Finally, the infrequency of the data collection made it difficult to count the trucks on each route accurately.
Lastly, sorting “good” truck data out of 250,000 trucks for 13 analysis zones with two possible routes each is an enormous task to do by hand, so a better method was needed.
4.3.1 Route Determination Heuristic
To streamline the data analysis, another heuristic was devised to determine which routes drivers took. Travel demand modeling requires origin-destination (O-D) pairs, which are used to generate the trips and routes for the analysis. One piece of information in the data that proved quite useful is the latitude and longitude contained in the GPS records: combined with the exact locations of the roads, it pinpoints exactly where a truck is on the map. Additionally, this thesis examines only trucks that pass through the selected analysis region without stopping, because the drivers of those trucks are entirely free to make their own route choice decisions; they do not need to reach a specific point in the city to drop off a load, which would force a particular route. The first idea conceived was to pick all of the links in the CoPilot database corresponding to the downtown and bypass routes and determine which trucks “snapped” to those routes. The computational intensity of this method, however, was beyond the scope of this project, and a simpler method was devised.
4.3.2 The Box Algorithm
The method devised, henceforth referred to as the Box Algorithm, proved to be a viable heuristic. Its premise is that the data is already in a numeric format that is easy for a computer to work with. This study leveraged the computing power of three supercomputers to sort the data and extract the useful portion.
Essentially, the algorithm proceeds as follows. First, the user inputs a series of pairs of GPS coordinates; each pair sets the northwest and southeast corners of a rectangular box. (For the list of input coordinates, please refer to the Appendix.) The program takes four pairs of coordinates to generate boxes 1, 2, 3, and 4 respectively. Boxes 1 and 2 capture the trucks on the highway leading into and out of the analysis zone; these boxes essentially serve as the O-D pairs for the trips. Box 3 captures the downtown route in most cases and Interstate 90 in the 90/94 case. Box 4 captures the bypass route in most cases and Interstate 94 in the 90/94 case. Boxes 3 and 4 were used to infer the route decisions that the drivers made. Please refer to Figure 4-1 for a graphical representation.
Figure 4-1: Map of Houston road system with boxes covering Highway 10 leading into and away from Downtown Houston in addition to Box 3, which captures the downtown portion of Highway 10 and Box 4, which captures the bypass portion of Highway 610.
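The box test itself reduces to two pairs of coordinate comparisons. The following is an illustrative sketch, not the study's actual code; the struct layout and field names are assumptions.

```c
/* A rectangular analysis box defined, as in the text, by its
 * northwest and southeast corner coordinates in decimal degrees. */
typedef struct {
    double nw_lat, nw_lon;  /* northwest corner */
    double se_lat, se_lon;  /* southeast corner */
} Box;

/* Returns 1 if the observation (lat, lon) falls inside the box.
 * Assumes the box does not straddle the 180th meridian, which holds
 * for all of the U.S. analysis regions in this study. */
static int in_box(const Box *b, double lat, double lon)
{
    return lat <= b->nw_lat && lat >= b->se_lat
        && lon >= b->nw_lon && lon <= b->se_lon;
}
```

Because the test is a handful of comparisons per observation, scanning all 250,000 truck records against four boxes per region is computationally cheap, which is what made the Box Algorithm tractable where link snapping was not.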
The computer program then sifted through the 250,000 truck records to find every instance where a truck was inside one of the boxes. The output file consisted of a stream of records containing the Truck ID, Time of Observation, Latitude, Longitude, and the box number in which the truck was found, and was sorted by Truck ID and Time of Observation. That file was then fed into another program that output all trucks whose observations matched one of the following box sequences: [1,3,2], [1,4,2], [2,3,1], [2,4,1]. The program accomplished this task using a Finite State Machine (FSM); FSMs are well suited to pattern matching, which made them the natural candidate for sorting through the data. The simplified method the FSM used is as follows. First, the data is read in and each observation is tagged with the box it falls in. Next, the sequence of tags is scanned for any of the patterns above. The four patterns matter because, for example, position data in Box 1, then Box 3, and finally Box 2 meant that the truck came in from the east, chose the downtown route, and exited the city to the west. Because the data was not already in a clean format, many error checks were required. For example, in many cases a truck would pass through Box 2, then appear in Box 1, and later appear in Box 3; such a trip does not help the analysis because it is not the type of trip this study is looking for. Refer to Figure 4-2 for a graphical representation of the FSM operator. Additionally, C code for the FSM can be found in the appendix.
Figure 4-2: FSM operator used to extract usable routes for analysis

The output of the FSM program was a file that broke the data down into usable trips. The program produced two output files, *data3.txt and *data4.txt, corresponding to the downtown route data and the bypass route data respectively. These files contained a stream of trips in the format: Truck ID, Start Time, Start Date, Start Basis Seconds, End Time, End Date, End Basis Seconds, Travel Time, and Average Speed.
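The pattern-matching core of the FSM can be sketched as follows. This is a deliberately simplified reconstruction, not the C code in the appendix: it tracks only an entry box and a route box and omits the additional error checks described above. Feeding box numbers in one at a time, it reports 3 or 4 (the route box) when a complete [entry, route, opposite-entry] pattern is seen, and 0 otherwise.

```c
/* Minimal trip-detection state: which entry box (1 or 2) the truck
 * was last seen in, and which route box (3 or 4) it has visited
 * since then. Zero means "none yet". */
typedef struct {
    int entry;
    int route;
} TripFSM;

/* Advance the FSM by one observation. Returns the route box number
 * (3 = downtown, 4 = bypass) when a full [1,3,2], [1,4,2], [2,3,1],
 * or [2,4,1] pattern completes; returns 0 otherwise. */
static int fsm_step(TripFSM *s, int box)
{
    if (box == 1 || box == 2) {
        if (s->entry != 0 && s->route != 0 && box != s->entry) {
            int r = s->route;  /* trip completed through box r */
            s->entry = box;    /* the exit box may begin a new trip */
            s->route = 0;
            return r;
        }
        s->entry = box;        /* (re)start a potential trip */
        s->route = 0;
    } else if (box == 3 || box == 4) {
        if (s->entry != 0)
            s->route = box;    /* route choice observed */
    }
    return 0;
}
```

Note that a sequence such as [1, 3, 1] produces no trip here: returning through the same entry box resets the state, mirroring the study's rejection of observations that do not fit the four through-trip patterns.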
The average speed was one of the derived statistics the program output. Taking the location information from Box 1 and Box 2 and feeding it into a great-circle calculation yields a good approximation of the total distance traveled; dividing that distance by the travel time gives the average speed.
An analysis was then done for all of the routes; the summary statistics can be found in the appendix. The output statistics were as follows: count of trucks on the downtown and bypass routes, average travel time, minimum travel time, variance of travel times, standard deviation of travel time, mean plus two standard deviations of travel time, median travel time, and maximum travel time.
4.3.3 Drawbacks Associated with the Box Algorithm
Any heuristic carries certain drawbacks, and this thesis made a concerted effort to minimize the biases introduced by the Box Algorithm. A few problems nonetheless appeared as the analysis was performed. First, some analysis regions lack enough data for an accurate analysis. The Wilmington region, for example, has only seven trucks that take the bypass route and zero trucks that take the downtown route. Two possibilities can explain this. First, the analysis region is no more than 900 square miles, and it contains no more than 70 miles of road covered by the analysis boxes. The trucks in this study report their locations at random times, with an expected gap between observations of 45 minutes. It is virtually certain that more than seven trucks in the data set passed through the Wilmington area, but it is rather unlikely that they would have made position observations in boxes 1, 2, and 3 or in boxes 1, 2, and 4. The other possibility is that trucks did pass through Wilmington and made observations on the highway into the city, the highway out of the city, and either the bypass or the downtown route, but the observations fell outside the analysis boxes. As Figure 4-1 shows, neither the downtown route nor the bypass route is covered by a box along its entire length; this was done to minimize the number of trucks mistakenly included in the data set while traveling on highways other than the analysis highways.