National Transit Database Sampling Requirements and Guidance

Uploaded by RutgersNTI on 12.07.2012

Good afternoon, and thank you for participating in this webinar.
Today we will discuss the National Transit Database sampling requirements and guidance.
My name is Judy Kolva, your NTI representative.
The National Transit Institute develops, promotes, and delivers training and education programs for the public transit industry in the United States.
We are pleased to have Sergio Maia from FTA as our presenter.
Mr. Sergio Maia is an electrical engineer.
He has over ten years of experience working for the National Transit Database,
and over twenty years of experience in the transportation sector.
Prior to the National Transit Database, Mr. Maia worked on the
preparation of the St. Louis MetroLink light rail system operation plan and worked as a consultant to the World Bank.
In his free time he is an avid chess player and classical music lover.
Our topic today is the sampling requirements and guidance with a special focus on the National Transit Database template sampling plan.
Why do we sample for the NTD? Because it's a requirement of the law:
the funding for transit agencies is based in part
on the reported passenger miles traveled,
and that is information that transit agencies do not usually collect as part of their normal operations.
It's also an important measure of transit service.
Passenger miles traveled is nothing more or less than the cumulative distance traveled by all passengers in the system.
To determine passenger miles traveled, we need the boarding and alighting point of each passenger, because we need to know
the distance traveled by each passenger. In most cases agencies do count boardings; I noticed
from the polling questions that most of you have a 100% count of trips,
that is, a full count of boardings. Alightings, however, are not usually recorded, and therefore a sampling process is needed
to determine ons and offs and the distance traveled.
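To illustrate why alightings matter, here is a minimal Python sketch (not part of the FTA sampling package) that computes passenger miles for one sampled trip from stop-by-stop ons and offs: on each segment between stops, the number of passengers on board times the segment length. The function name and the four-stop trip data are hypothetical.

```python
def trip_pmt(stop_miles, ons, offs):
    """Passenger miles and unlinked trips (boardings) for one sampled trip.

    stop_miles: cumulative distance in miles at each stop, in stop order.
    ons, offs:  boardings and alightings counted at each stop.
    """
    if not (len(stop_miles) == len(ons) == len(offs)):
        raise ValueError("stop_miles, ons and offs must have the same length")
    load = 0      # passengers on board when the vehicle leaves the current stop
    pmt = 0.0
    for i in range(len(stop_miles) - 1):
        load += ons[i] - offs[i]
        # every passenger on board travels the whole segment to the next stop
        pmt += load * (stop_miles[i + 1] - stop_miles[i])
    # at the last stop everyone alights; no further distance is traveled
    return pmt, sum(ons)

# Hypothetical four-stop trip
pmt, boardings = trip_pmt([0.0, 1.0, 2.5, 4.0], [10, 5, 2, 0], [0, 3, 6, 8])
print(pmt, boardings)  # 40.0 17
```

Without the offs, the load on each segment, and therefore the distance each passenger traveled, cannot be determined.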
FTA is issuing this guidance because of the limitations of the existing FTA plans. I noticed from the polling questions that a good number of you
are using the existing Circular 2710.1a for fixed route, which is a very old sampling plan.
It's old and outdated,
and in most cases it requires sample sizes that are much larger than what you really need
to meet FTA's requirement of 95% confidence and a plus or minus 10% margin of error.
With this sampling guidance, we add the ability to customize sampling plans based on your specific data.
That means you can have smaller sample sizes and save money.
So, the purpose of the package: It's for estimation of passenger miles traveled.
And it can also be used to estimate unlinked trips when they are not available or they are not reliable.
I noticed that most of you do have a 100% count of trips, which is very good news because many options open up for you too,
in terms of sampling, and FTA requires that you report 100% count of trips when they are available.
In this package you are going to find the following:
There is the sampling manual and there is an Excel workbook, with several tabs that I will show in a little bit,
that has a way for you to enter your data and to see what options are available for you.
And now, for this second year of the sampling package, we added a new plan that's based on sampling every X days.
And if you recall, for those of you who are familiar with Circular 2710.1a,
that plan is based on exactly this concept of sampling every X days.
And so we are adding this plan now, because we know that many agencies like sampling on a systematic basis.
Every day, every second day, every third day, etc.
And we are going to call this type of plan, from now on, INTERVAL BASED.
So every time that I say "interval based plan" I am referring to a plan where you sample
systematically every day, every second day, every third day, etc.
The package covers all modes.
And it does not change the requirements of a 95% confidence level and a plus or minus 10% margin of error.
You don't need a certification from a statistician to use this plan; it's approved by FTA.
It's voluntary at this point, so if you want to still use the old plans, you can.
Or if you are using an alternative sampling plan, you can also do that.
One key difference from the old FTA plans is that there is more than one sampling plan,
and the most important feature is that, unlike in the old plans, the sample size is not fixed.
It depends on your data,
that is, the data that you feed into the system.
The sample can be based on sampling every X days, which, as I said, is the interval-based sampling plan,
or on random selections within a weekly, monthly, or quarterly time period, which is another way of sampling.
And I'm going to call this PERIOD-BASED SAMPLING.
In period-based sampling, you sample, say, weekly, but the day you sample is random.
And now we are going to see examples illustrating how that works.
Continuing the discussion of the differences from the old plans:
a very nice feature is that it gives you the option, which we are going to discuss in detail, of grouping your data
based on certain common characteristics of routes.
Unfortunately that is available only for the period-based plans, but it's a nice feature.
As for plan revision, it's every five or every nine years, depending on whether you have to sample every year or every three years.
That means that when you determine a sample size using this template plan and you sample every year,
you need to revise your sample size only in the sixth year.
So you can set the template aside for five years and keep doing exactly what you have been doing
since the first time you input your data into the template plan and got a sample size for one of the available plans.
So, to summarize: there are two types of plans,
period-based and interval-based plans, which we've now introduced.
The template offers you ready-to-use plans and template sampling plans; the latter is the concept of inputting your data into the workbook.
I'm going to talk in a minute about the ready-to-use sampling plans and why we have them.
But let me first introduce the concept of options.
The first is the base option, which is typically used when you don't have a 100% count of unlinked trips
and you need to estimate unlinked trips. In general, it produces large sample sizes.
Then there is the APTL option (APTL stands for average passenger trip length), which is a very attractive option.
It's available to all of you who have a 100% count of unlinked trips, which is required for using this option.
We are going to see some examples showing how the APTL option is, in general, much more efficient than the base option.
Next, there is PPMT, which stands for potential passenger miles traveled.
We are not covering it in this presentation. It's an interesting option, but we don't want to overwhelm you with too many options.
It's covered in the reporting manual, which you can consult if you're interested, and you can also contact me if you want to discuss it in more detail.
And finally there is the grouping of data, which is available for both plans and allows you to make your samples even more efficient, even smaller.
Let's talk, then, about ready-to-use sampling plans. Why do we have them?
Let's assume that you don't have sample data for a given mode because, for instance, you're starting a new mode.
Let's say a new light rail system is starting operations. It never existed before.
So you don't have any data to input into a template plan.
For that case, we are offering ready-to-use sampling plans, which you are going to find in the reporting manual,
with tables showing what the sample size should be for each type of plan.
Another situation is that when you're changing or you start a new type of service.
Let's say, for instance, that for the fixed route system, you always had direct operated service,
but you are contracting out for some express routes now.
And NTD requires a whole separate set of statistics for your new public transportation service.
Again, in that case, even though the mode is the same, the type of service is not, so you can use a ready-to-use sampling plan.
Another situation is when you have sampling data from your previous sampling year, but, for whatever reason, they are not reliable
or they are incomplete. In that case you can also use a ready-to-use sampling plan.
Or let's say that your system's route structure changed very significantly.
When I say a major change, I'm talking about, let's say, a 25% change in revenue miles.
So it's not just that you added a new route or some small feature to your system;
it's something that really affects the structure of your service, in such a way that
the data from a previous sampling year are not going to represent your new system.
In that case, you would also use a ready-to-use sampling plan.
So there are ready-to-use sampling plans for most modes:
Bus - and when I say bus, I'm referring to normal bus service, or commuter bus, or BRT, or trolley bus.
And for rail, we have plans for commuter rail and any other rail mode.
And in addition the non-fixed route: plans for demand response and vanpools.
An important consideration to make is the sampling unit.
In the case of fixed route, for instance, the most common sampling unit is a one-way trip.
That is, the bus goes from A, the beginning of the route, to the end of the route, point B.
That's a one-way trip, and that's a sampling unit.
But you are not limited to that. You can use a two-way trip, or round trip,
in which case your sampling unit goes from A to B and then from B back to A.
So that's a round trip.
It's very important that you stick to your sampling unit. That is, if you are using one-way trips, you keep using one-way trips.
But more on that a little later.
For non-fixed route service, the sampling unit is the vehicle day.
That means each vehicle that is available and is going to be in service on a given day is one unit for that day.
The ready-to-use sampling plans have a base option and an APTL option, with and without grouping.
So you can even do route grouping in the APTL option, if you want.
Now let's talk about the other type, the template sampling plan,
which requires data from your last mandatory sampling year.
In other words, let's say you sampled last year, or the last time you sampled was three years ago.
That's your most recent data.
You are going to feed that data into the template, as we are going to see a little later,
and it offers several sampling options, which we expanded this year with the introduction of the interval-based plan, sampling every X days, as in the old circular.
In general, the template sampling plan should produce smaller sample sizes than the ready-to-use sampling plans.
With that, we are now going to show how you go into the template to choose the option that you need.
At this point I'm going to open the template plan to show how we go through it,
because the first thing that you have to decide is whether you want a period-based or an interval-based plan.
By the way, this choice applies only to fixed route:
the interval-based plan is only for bus, commuter bus, BRT, and trolley bus.
It cannot be used for any other mode, at least in this release.
As you can see, this is what the template plan looks like.
And the one I'm going to show you has data already entered into it to illustrate how it works.
And as you can see here, at the bottom there are several tabs.
There is a cover and some introductory remarks here that I'm not going to cover right now.
But it's very useful for you to read this.
And here, on row 66, as you can see there is a check box.
The first thing, then, that you need to do is decide whether you want an interval-based or a period-based.
So if the box is checked, you would be doing the interval-based, which is similar to the circular 2710.1a.
But if you uncheck it, then all the other tabs are going to change,
and they are going to be in the proper format for the period-based plan, which is sampling weekly, monthly, or quarterly.
So that would be then the first thing to decide.
But more on that later. I am going back to the presentation because we have some other important things to cover.
So, what I had here is a screenshot that shows that option, where you decide on what plan to use, what type of plan to use.
And I would like now to talk a little bit about the base option.
It is an option that is very similar to the old FTA circulars;
Circular 2710.1a uses a base-option type of sampling.
Let me give a definition of what the base option means.
The base option is related to the quantity that we are sampling.
In this case, the quantities we are interested in are passenger miles traveled and unlinked passenger trips per sample trip.
In other words, what we want to determine at the end of the sampling cycle is the passenger miles per trip
and the unlinked passenger trips per trip.
So that is the unit that we are interested in.
This option applies mostly to agencies that do not have a 100% count of trips and need to sample
unlinked trips as well.
I noticed that a few of you are in that situation, so the base option is an alternative
for anyone who does not have a 100% count of trips.
I wouldn't advise this option for anyone who has a 100% count of trips,
but if you do want to use it even though you have a 100% count of trips,
keep in mind that you have to report the 100% count to the NTD,
not the estimate of trips that this option gives you.
And here is an example of what I just said.
Let's say that we have ten one-way bus trips, and the total passenger miles in the sample is 10,000.
So my average is 1,000 passenger miles per one-way bus trip. It's the ratio of the two.
So, now what I do is, I know, let's say from your system, that in the year you have 2,000 one-way bus trips.
Then your passenger miles that you report to NTD is just the average - 1,000 passenger miles traveled - times the number of one way bus trips.
And in that case, that would give you 2,000,000 passenger miles traveled.
So that is the passenger miles - this is assuming, then, that the PMT per sample trip is the same PMT per trip of the universe of trips.
We are extrapolating from our sample to the universe.
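The base-option arithmetic above can be sketched in Python as follows. This is an illustration, not the template's implementation; the function name and the ten per-trip values (which average 1,000 passenger miles per trip) are hypothetical.

```python
def base_option_estimate(sample_values, universe_trips):
    """Base option: mean value per sampled trip times the number of trips in the universe.

    sample_values:  passenger miles (or unlinked trips) observed on each sampled trip.
    universe_trips: total number of trips (sampling units) operated in the year.
    """
    mean_per_trip = sum(sample_values) / len(sample_values)
    return mean_per_trip * universe_trips

# Hypothetical sample: ten one-way trips averaging 1,000 passenger miles each
sample_pmt = [900, 1100, 950, 1050, 1000, 980, 1020, 990, 1010, 1000]
annual_pmt = base_option_estimate(sample_pmt, 2000)  # 2,000 one-way trips in the year
print(annual_pmt)  # 2000000.0
```

The same function applies to unlinked trips per sampled trip when an agency without a 100% count needs to estimate unlinked trips.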
The second option, the APTL option (Average Passenger Trip Length), is available for both interval- and period-based plans.
As I said before, the APTL is the average passenger trip length: by definition, the ratio of passenger miles to unlinked trips.
So, for the APTL option, keep in mind that you need a 100% count of trips to use it.
If you don't have one, it's not available to you.
Here is a second example, for the APTL option,
and you are going to see how different it is from the base option.
Here, from the ten one-way bus trips, my total passenger miles was, let's say, 225,000,
and my total unlinked trips is a different number.
What I'm looking for is the ratio of the two, because that's what the APTL is.
Here it's 1.5 miles per trip.
That's what the sample gives us. The extrapolation then needs the 100% count of trips, which you know
because you count all trips, so you have a way of determining this statistic.
You just multiply one by the other to get the total passenger miles traveled.
So we assume that the APTL from the sample and the APTL of the universe are the same,
or rather not exactly the same, but within the 10% tolerance, or margin of error, that FTA requires.
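A minimal Python sketch of the APTL extrapolation, using hypothetical per-trip sample values and a hypothetical annual 100% count of 1,200,000 unlinked trips. Following the definition above, the APTL is computed as the ratio of the sample totals (passenger miles divided by unlinked trips); the function name is illustrative only.

```python
def aptl_option_estimate(sample_pmt, sample_unlinked, universe_unlinked_trips):
    """APTL option: sample APTL times the 100% count of unlinked trips."""
    aptl = sum(sample_pmt) / sum(sample_unlinked)  # ratio of totals
    return aptl, aptl * universe_unlinked_trips

# Hypothetical ten sampled trips: passenger miles and boardings on each
sample_pmt = [24, 12, 20, 16, 28, 8, 32, 16, 12, 12]
sample_unlinked = [16, 8, 12, 12, 16, 8, 20, 12, 8, 8]
aptl, annual_pmt = aptl_option_estimate(sample_pmt, sample_unlinked, 1_200_000)
print(aptl, annual_pmt)  # 1.5 1800000.0
```

Taking the ratio of totals, rather than averaging each trip's own ratio, weights each sampled trip by its ridership.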
And grouping. We're going to see practical examples of what grouping does for us.
It's available for the period-based plan. Unfortunately it is not yet available for the interval-based plan, but it may become available in the future.
The idea is that routes can be grouped based on common characteristics.
For example, express routes and downtown circulators are quite different.
So if you can create groups of routes that share these characteristics,
and treat each group as a separate universe for estimating passenger miles, you can reduce the sample size.
That's because you reduce the variance of what you're trying to determine,
and reducing the variance reduces the sample size, which means greater efficiency and less cost.
So when you're doing grouping, each group is a separate sample.
In fact, if you have two groups, you are doing two sampling processes concurrently.
They never intersect in any way.
At the end of your fiscal year, you are going to make an estimate of passenger miles traveled
for each group, for instance for each of the two groups if you are using two.
You add the groups together only at the end, after you have estimated the passenger miles for each of them.
So the total passenger miles is the sum over all the groups.
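A sketch of the grouping arithmetic, assuming the APTL option is used in each group and two hypothetical groups with invented sample totals and annual counts: each group is estimated as its own universe, and the estimates are added only at the end. Names and numbers are illustrative, not taken from the template.

```python
def grouped_estimate(groups):
    """Estimate each group separately (APTL option), then add the group estimates."""
    per_group = {}
    for name, g in groups.items():
        aptl = g["sample_pmt"] / g["sample_unlinked"]      # this group's own APTL
        per_group[name] = aptl * g["annual_unlinked"]      # this group's own universe
    return per_group, sum(per_group.values())

# Hypothetical groups: short circulator-type routes and long express-type routes
groups = {
    "short": {"sample_pmt": 300.0, "sample_unlinked": 200, "annual_unlinked": 500_000},
    "long": {"sample_pmt": 1_600.0, "sample_unlinked": 200, "annual_unlinked": 100_000},
}
per_group, total_pmt = grouped_estimate(groups)
print(per_group, total_pmt)  # {'short': 750000.0, 'long': 800000.0} 1550000.0
```

Because each group's APTL is multiplied by that group's own count of unlinked trips, this version needs a trip count for each group.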
In general, though, you should not use more than three groups,
and, based on what I have seen from other agencies, I would even say that two groups is ideal.
It's easier to manage, it gives you most of the benefit of grouping, and it's easier to calculate.
I have seen cases where going to three groups added complexity and extra work to compile the data
without a really great increase in benefit in terms of reducing the sample size.
The template sampling plan was actually designed to allow up to 10 groups,
but I would never recommend using that many. If you can stay within two, or a maximum of three, that would be great.
Applied to different options, grouping implies classifying the data in different ways.
For instance, when grouping for the APTL option,
it makes a lot of sense to create one group of routes that have a very short average passenger trip length,
like downtown circulators, or routes where you know the average trip length is short,
and another group of longer routes: more express, commuter-oriented routes.
That would be one way of creating two groups in the APTL option.
And, again, as this slide says, when you're doing grouping, each group is a separate sample,
and each sample is going to give you an estimate of passenger miles for that group.
Now let's go back to the template sampling plan and see how we use it.
We want the template to give us sample sizes for the different options.
For that we need the data from the last mandatory sampling year, and we can create groups.
At this point, I'll go back to our spreadsheet.
Okay, let's assume that you are doing the period-based plan. In that case there is a tab called "Period Input," okay?
On this tab, I first want to show you where the data would be entered.
Starting here, on row 71, I have columns for the date, which is the date of the sample,
and the ID, which could be a time or another number. It's up to you to decide how you identify your samples.
What is really mandatory are these two columns, columns D and E:
you need the unlinked trips and passenger miles traveled for each sample from your last mandatory sampling year.
This example also includes PPMT, but I'm not covering that.
And you are going to see that there is another column here that is optional, the group.
In this example, the routes were grouped into short routes, medium routes, and long routes,
and every trip in the sample belongs to one of the three.
This sample has approximately 540 sample trips that were fed into the template.
Because we grouped the data into three types of routes, I entered here, for the sampling year
that corresponds to the data we just saw,
the number of service units for each group. That means that in that report year we had 109,685 trips on short routes,
and the numbers for medium and long routes are entered in these cells.
For the upcoming year, you enter an estimate
of the expected number of trips by group (short, medium, and long routes),
or, as in this example, the same numbers if you think they are going to stay the same.
With that you are almost done.
The last thing that you need to do is indicate whether you have a reliable count of unlinked trips at the route level.
Sometimes an agency has a count of unlinked trips at the system-wide level but cannot say for sure how many trips it had on each route,
for example because the counter was not reset when a bus changed from one route to another.
So if you do know your count of trips at the route level, you enter 1 here; otherwise you enter 0.
That's it. Now, if you go to the next tab, "Period Plans," which is what you're looking for,
it's going to show you this table with your sampling options.
As you can see, for each frequency (annual, quarterly, monthly, and weekly)
you have the base option, the APTL option, and the PPMT option.
The first set of rows here is the annual frequency.
That means that if you are using the base option, the sample size is 348,
but if you use the APTL option, the sample size is only 70.
That means you need to sample just 70 trips for the year,
as opposed to 348 for the base option.
You can see how different the sample sizes can be from one option to another.
And most of you would benefit from the APTL option, given that most of you have a 100% count of trips.
But even more than that, let's look at this part here, with grouping.
It shows only if you grouped your data,
and it shows the sample size for each of the groups you defined:
short routes, medium routes, and long routes.
Take the APTL option.
For the short routes, you need to sample just 15 trips for the entire year.
The medium routes need more, 46, and the long routes 10.
But when you look at the total here, 71, it's not really very different from the required sample size without grouping.
Right? If you look at these two columns, this one is without grouping and these are with grouping.
When you compare the totals, you see that for the APTL option, grouping made little difference.
So you may well prefer not to group, because grouping would be extra work.
But look at the base option: if you are stuck with the base option because you don't have a 100% count of trips,
then there is a difference, because with grouping the annual sample size is 295, less than the 348 without grouping.
These things are important to help you decide
what's the most convenient and most efficient plan for you.
Another helpful feature: as I mentioned, in the period-based method you sample quarterly, monthly, or weekly.
If you are sampling weekly, the table shows you how many samples you need to collect each week for each option.
For the base option it would be seven, which results in an annualized sample size of 364,
and for the APTL option, as you can see, it would be 104.
If you are sampling monthly or quarterly, it shows how many samples you need to collect per period
and the annualized sample size.
If you compare these, there seems to be a contradiction, because the APTL option at the annual frequency requires only 70 trips
(leaving grouping aside),
but at the weekly frequency the annualized size is 104.
So it's 34 trips more than the annual frequency indicates.
The reason is rounding.
The two trips per week that you see here were rounded: the result of the formula was less than two, since 70 trips spread over 52 weeks is only about 1.35 trips per week.
The template rounds up,
so it rounds up every period, and when you add the periods up, you end up with a number greater than the annual requirement.
It's just to be on the safe side:
the rule is to always round up.
So if you want to minimize this effect, instead of sampling very frequently, say weekly,
you can consider sampling monthly or even quarterly.
Look: on a quarterly basis the total is almost the same,
70 versus 72.
At a monthly frequency there is no additional change either; it is also 72.
So perhaps the monthly frequency would be the best alternative to follow.
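The rounding described above can be reproduced with a short Python sketch, assuming that the template divides the annual sample size by the number of periods and rounds each period up. That formula is our assumption, not taken from the template's documentation, but it matches the figures quoted here: 70 becomes 72 quarterly, 72 monthly and 104 weekly, and 348 becomes 364 weekly.

```python
import math

PERIODS_PER_YEAR = {"annual": 1, "quarterly": 4, "monthly": 12, "weekly": 52}

def period_sample_sizes(annual_size):
    """Per-period sample (rounded up) and the resulting annualized total, by frequency."""
    sizes = {}
    for frequency, periods in PERIODS_PER_YEAR.items():
        per_period = math.ceil(annual_size / periods)  # always round up, never down
        sizes[frequency] = (per_period, per_period * periods)
    return sizes

aptl = period_sample_sizes(70)   # annual APTL sample size from the example
base = period_sample_sizes(348)  # annual base-option sample size from the example
for frequency, (per_period, annualized) in aptl.items():
    print(f"APTL {frequency}: {per_period} per period, {annualized} per year")
```

The fewer periods there are, the less the rounding adds, which is why the quarterly and monthly totals stay close to 70.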
And so, what you see here is what all of this is about: to give you a sample size for each of these options that you have available.
Now let's talk about the interval-based plan.
And here there is, unfortunately, a limitation of the system at the present time,
which we are working to change in future releases of the template sampling plan.
When you want to explore your alternatives, you may want to try not just the period-based plans
but also the interval-based plans.
Or you may be mainly interested in interval-based plans but also want to look at period-based plans
as a way of finding the best alternative for you.
The limitation is that the data we just saw entered for the period-based plan have to be re-entered.
The same data have to be entered again because the methodology is different.
Ideally, of course, you would enter the data only once, and the system would fetch them and do the proper calculations.
But here the input is different, because we need to enter the sampling days and the trips you had on each given day.
At this point, let's go back to the spreadsheet.
The difference is that, on the cover tab of the template, I would check this box.
By doing that, everything turns into interval-based.
So here there is a short interval-based introduction, which I recommend you read,
and then comes the place where you enter information.
First, you enter the number of days per week that you operate, usually 5, 6, or 7.
Then the number of daily service units. Let's say, for instance,
that on weekdays you have something like 500 and on weekends fewer. You don't have to compute an average or anything like that;
just enter the maximum number of service units on any day of the week.
So in this case, let's suppose that on Tuesday you have 500, which is more or less what you have on the other weekdays as well.
And here you are going to see that the data are laid out differently.
Let's say this plan was sampled every second day.
Look, it starts January 2nd, 2006. The next sampling day was January 4th, then January 6th, 8th, 10th, etc.,
to December 30th, 2006.
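For reference, a small Python sketch that lists the candidate dates of an every-second-day plan starting January 2, 2006. The function name is hypothetical, and the sketch steps through calendar days without skipping days on which there is no service.

```python
from datetime import date, timedelta

def interval_sampling_days(start, end, every_x_days):
    """Calendar dates from start through end (inclusive), stepping every_x_days at a time."""
    days = []
    day = start
    while day <= end:
        days.append(day)
        day += timedelta(days=every_x_days)
    return days

days_2006 = interval_sampling_days(date(2006, 1, 2), date(2006, 12, 31), 2)
print(days_2006[0], days_2006[1], days_2006[-1])  # 2006-01-02 2006-01-04 2006-12-30
```

As in the example, the last sampling date of the year is December 30, 2006.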
Let's see how many sampling days there were:
here, look, 160 sampling days.
For each of these days, these numbers represent the sample trips of that day,
and as you can see, the tab can accommodate up to ten trips per day, which is more than sufficient in most cases.
So on January 2nd, 2006, three trips were sampled.
So it seems, from this example, that the agency was required to sample every second day,
three random trips.
And here you can see the passenger miles traveled for trips 1, 2, and 3,
and the corresponding unlinked trips for trips 1, 2, and 3.
As you keep looking, on one day they sampled only two;
it's not essential that every sampling day has exactly the same number of trips.
But what's important is that there are at least two sample trips on every sampling day, because that's how the system works:
you cannot determine the sample size from just one trip per day.
So with this data from your last mandatory sampling year entered,
along with the number of days operated per week and the daily service units,
you click on the interval plan tab, and it shows you what options are available to you,
for both the APTL and the base option.
Take the APTL option.
If you want to sample every day, you need just one random sample trip each day,
for a total sample size of 365.
Or you can sample every second, third, fourth, fifth, or sixth day.
The every-fifth-day and every-sixth-day options appear here because this is a seven-day operation.
The table also shows the annual total sample, which is the number of trips you are going to need to sample.
The base option works the same way, except that, as you can see, the sample sizes are significantly bigger than for the APTL option,
especially when you sample less frequently.
Sometimes you don't want to sample every day or every second day,
because, given the way you operate, it's more convenient to sample about once a week,
let's say every sixth day.
First of all, that wouldn't be available for the base option;
when an option is not available, the table shows "n/a."
To sample every sixth day,
you would need to use the APTL option (which requires a 100% count of trips), and that requires two trips every sixth day, for a total of 122 samples.
Or, if it's more convenient to collect fewer trips per sampling day, you can sample every day or every second day
with fewer trips each time.
That's just an example; your data can show quite different numbers.
Here, one trip every second day would be 183 samples. So you can pick whichever plan is available here.
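The annual totals in the interval-based table can be approximated with a Python sketch, assuming a 365-day, seven-day operation and a plan that starts on the first day of the year. The function is hypothetical and not the template's formula, but under those assumptions it reproduces the 365, 183 and 122 samples quoted above.

```python
import math

def interval_total(every_x_days, trips_per_sampling_day, operating_days=365):
    """Annual sample size when sampling starts on the first operating day of the year."""
    sampling_days = math.ceil(operating_days / every_x_days)  # days 1, 1+X, 1+2X, ...
    return sampling_days * trips_per_sampling_day

for x, trips in [(1, 1), (2, 1), (6, 2)]:
    print(f"every {x} day(s), {trips} trip(s) per day: {interval_total(x, trips)} samples")
```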
With this, we have covered the most important features of the template, and I'm going back to cover some additional issues.
And I would like to talk a little bit about a very important feature - or not a feature, but a requirement when you do sampling -
which is the randomness of the sample.
We have noticed that sometimes very careful and accurate data collection work was completely wasted
because the trips were not selected properly, that is, not randomly, as they should have been.
Randomness means that when you're selecting trips in a day, every trip that is available or scheduled to operate that day
should have an equal chance of being selected.
It also means that you must not discard a sample because of problems or difficulties you face in collecting it.
For instance, I have seen cases of people who discarded any sample that randomly fell after 8 pm,
because the drivers wouldn't collect the information and nobody was available after 8 pm.
That is wrong. By discarding any sample that falls after a certain time, or
any sample on a route that is difficult to get data for, you are invalidating the entire process, an entire year's effort.
That's why this is so important.
It violates NTD procedures, because the sampling method is no longer valid.
It skews the results, generates validation questions from an analyst, and, as I said, may cost you the work of an entire year.
So random selection is essential.
First, it's important to have a complete list of all trips that are going to operate on a sampling day.
That sometimes requires very good coordination between your scheduling department and whoever is responsible for sampling,
if that is not the scheduling department,
so that the scheduling department is always communicating with you and updating you on the schedule
for all days, including, very importantly, the days you are sampling,
because you want to make sure that you have the complete universe to select from.
There are many different ways to make the random selection.
The sampling reporting manual covers a way of generating random numbers using random number tables,
which we are not covering here; I personally don't like them very much, but it's a valid process.
I just think that there are better alternatives.
I am going to give you an example of how you can generate a random sample.
It's perfectly acceptable for NTD purposes, but I want to make sure that everyone understands there are many other ways of doing this.
But I want to show you the example of the RANDBETWEEN function in Excel.
Which gives you a random number between two parameters, A and B.
A is the first number, let's say 1,
and B is the number that represents the last trip of the day; let's say you have 1,000 trips a day.
So RANDBETWEEN(1, 1000) generates a random number between 1 and 1,000, inclusive. Let's say it returns 222.
If you can associate this number with a route and trip number, you know exactly what you need to sample.
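For anyone working outside Excel, here is a minimal sketch of the same idea in Python, assuming only the standard `random` module: `random.randint(a, b)` returns an integer between `a` and `b` with both ends included, which matches how RANDBETWEEN(A, B) behaves. The helper name `randbetween` and the seed value are our own illustrative choices, not part of any NTD procedure.

```python
import random

def randbetween(a: int, b: int, rng: random.Random) -> int:
    """Return a random integer N with a <= N <= b, like Excel's RANDBETWEEN(a, b)."""
    if a > b:
        raise ValueError("a must not be greater than b")
    return rng.randint(a, b)  # randint includes both endpoints

# A fixed seed (an illustrative choice) makes the draw reproducible for your records.
rng = random.Random(2010)
trip_number = randbetween(1, 1000, rng)  # one of 1,000 trips in the day
print(trip_number)
```

Seeding is optional; without it, each run produces a fresh draw, just as RANDBETWEEN does when the sheet recalculates.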
I have some examples here that show how to apply that.
Take, for instance, the period-based plans. Remember what I said: in these plans, the day that you sample is also random,
selected on a weekly, monthly, or quarterly basis.
In the interval-based plans, the day is not random, but the route and trip selection is.
Here is an example of a plan where the agency does period-based sampling weekly.
It has a weekday schedule and a Saturday schedule.
Let's say that it is required to sample four trips per week,
and the week runs from Monday, November 8 through Saturday, November 13, 2010.
This fictitious example is very simple, because I just want to show the technique.
I lay out the data in a format that you absolutely don't have to follow exactly,
but it works for illustration purposes.
As you can see, the columns are routes and trips.
The first 7 columns are route 1, which has 7 trips; they are followed by route 2, which has 4 trips; and route 3 has 3 trips.
The rows represent the days of the week.
From Monday to Wednesday, all the routes and trips listed in the header are going to operate,
so I numbered them sequentially, as you can see, up to 42.
On Thursday, however, the third trip of route 3 does not operate, and the same is true for Friday.
So I skip that cell, because it is not a trip that can possibly happen.
I do the same for Saturday, which has fewer trips.
The following week repeats the same pattern.
So this is my universe of trips for that week.
There is no trip that's going to operate that's not here,
because I coordinated with the scheduling department, and I know that if anything changes, I will be notified
and have a chance to update this.
Then I need to select four random trips out of a total of 74 trips. So how do I do that?
You can use the RANDBETWEEN function four times, with A = 1 and B = 74.
Let's say that you get the numbers 16, 28, 29, and 53.
I have highlighted those numbers in red.
Now you have your sampling schedule for that week.
You know that the first sample is on Tuesday, November 9, and it is route 1, trip 2, which has a scheduled time associated with it.
The second sample is on the same Tuesday: route 3, the third trip.
Finally, I have one on Wednesday and one on Thursday, and that's it!
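The numbering scheme in the spreadsheet can be sketched in Python as below: every trip that operates during the week gets the next sequential number, and a drawn number is looked up to find its day, route, and trip. The route and trip counts and the Thursday/Friday gap come from the example. The talk does not say which six trips run on Saturday, so the Saturday list below is a hypothetical placeholder chosen only so the week totals 74; no Sunday service is also our assumption. The function names are ours.

```python
from datetime import date, timedelta

# From the example: route 1 has 7 trips, route 2 has 4, route 3 has 3 (14 per day).
FULL_DAY = [(route, trip) for route, n in [(1, 7), (2, 4), (3, 3)] for trip in range(1, n + 1)]

def service_for(day: date) -> list[tuple[int, int]]:
    """Return the (route, trip) pairs that operate on a given day."""
    wd = day.weekday()  # Monday == 0 ... Sunday == 6
    if wd <= 2:  # Monday-Wednesday: all 14 trips operate
        return FULL_DAY
    if wd <= 4:  # Thursday-Friday: route 3, trip 3 does not operate
        return [t for t in FULL_DAY if t != (3, 3)]
    if wd == 5:  # Saturday: HYPOTHETICAL 6-trip schedule, not given in the talk
        return [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
    return []  # assumption: no Sunday service

def build_universe(start: date, days: int) -> list[tuple[date, int, int]]:
    """Number every scheduled trip sequentially; trip number k sits at index k - 1."""
    universe = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        universe.extend((day, route, trip) for route, trip in service_for(day))
    return universe

week = build_universe(date(2010, 11, 8), 6)  # Monday Nov 8 - Saturday Nov 13, 2010
for n in (16, 28, 29, 53):  # the numbers drawn in the example
    day, route, trip = week[n - 1]
    print(n, day.strftime("%a %b %d"), f"route {route}, trip {trip}")
```

To draw a real week, replace the fixed numbers with draws from `random.randint(1, len(week))`; the lookup step stays the same.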
I just need someone (or the driver, if the driver is the person responsible for collecting the data) to collect exactly these trips. Don't skip any!
If you miss or lose one for whatever reason, some extraneous circumstance that prevented you from collecting it,
I'm going to talk about that during my closing remarks.
But it's very important that you stay as close as possible to what your random number generation gave you.
In the same example, for the second week, I got a totally different set of trips to sample,
on Tuesday, Wednesday, and Thursday.
It could be any combination of days, routes, and trip numbers.
So be aware that during that week you need to have someone available all the time to collect sample data.
The driver is the most natural person to think of, but I have some reservations about using drivers too,
which we can talk about in a minute.
This can become really overwhelming.
Let's say that instead of sampling on a weekly basis, you want to sample on a monthly basis,
and the plan requires 20 random trips.
Even in this very simple case, there are about 424 trips you need to account for, from which you select 20 random trips.
Here's an example of 20 selected trips.
In this monthly plan, you can easily identify the days of the week that you have to sample,
along with the route and trip numbers.
Again, in a real operation this can become overwhelming, and the matrix becomes so large that it is unmanageable.
If that happens, you should think about either a shorter sampling period, like weekly,
or the interval-based plan, which requires you to sample systematically every X days.
Just a quick thought about demand-responsive paratransit, which also applies to vanpool:
it's very similar in concept to fixed-route, except that the sampling unit, as we discussed before, is the vehicle day, not the one-way or round trip.
But the process is the same. Let's say that you have 14 vehicles on Monday and Tuesday, 11 on the remaining weekdays,
and only two on Saturday.
You number them sequentially; here's an example where number 36, which is vehicle 8 on Wednesday, was selected, and the following week
a different one was selected.
So in this case it was one random vehicle day per week.
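The same numbering works with vehicle days as the unit. This Python sketch uses the vehicle counts from the example (14, 14, 11, 11, 11, 2); the talk does not give the dates, so placing it in the week of November 8, 2010 and assuming no Sunday service are our illustrative choices, and `vehicle_days` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Vehicles in service by weekday (Monday == 0), from the example;
# zero on Sunday is an assumption.
VEHICLES_BY_WEEKDAY = {0: 14, 1: 14, 2: 11, 3: 11, 4: 11, 5: 2, 6: 0}

def vehicle_days(start: date, days: int) -> list[tuple[date, int]]:
    """Number each (day, vehicle) pair sequentially; the sampling unit is the vehicle day."""
    universe = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        count = VEHICLES_BY_WEEKDAY[day.weekday()]
        universe.extend((day, vehicle) for vehicle in range(1, count + 1))
    return universe

week = vehicle_days(date(2010, 11, 8), 6)  # illustrative week, Monday-Saturday
day, vehicle = week[36 - 1]  # number 36 from the example
print(len(week), day.strftime("%A"), vehicle)
```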
In contrast with these two cases, I think the interval-based example (which many of you should be familiar with because you use the 2710.1A plan)
is less overwhelming. This is also a sampling schedule for an entire month,
but as you can see, there are fewer numbers to handle, because the numbers never get higher than 14,
since here I'm sampling every second day and picking trips within each sampling day.
The required sample days are highlighted in green.
I have exactly the same set of trips as in the previous example,
except that here, let's say, the requirement is two samples every second day.
So on the first day I selected these two trips, on Tuesday it would be these two, and so on.
Following that pattern, you build your plan for the entire month of which trips to sample.
And here, you know beforehand which days you are going to sample.
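Here is a rough Python sketch of an interval-based schedule under stated assumptions: the sampling days are fixed in advance (every second service day, counted from the first service day of the month), and only the trips within each sampling day are drawn at random, so each draw is between 1 and that day's trip count (at most 14). Counting service days rather than calendar days, the hypothetical Saturday schedule, and no Sunday service are our assumptions; check your own plan's definition of the interval. The function names are ours.

```python
import random
from datetime import date, timedelta

FULL_DAY = [(r, t) for r, n in [(1, 7), (2, 4), (3, 3)] for t in range(1, n + 1)]

def service_for(day: date) -> list[tuple[int, int]]:
    """Trips operating on a day (same pattern as the weekly example)."""
    wd = day.weekday()
    if wd <= 2:
        return FULL_DAY
    if wd <= 4:
        return [x for x in FULL_DAY if x != (3, 3)]
    if wd == 5:
        return [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]  # hypothetical Saturday
    return []  # assumption: no Sunday service

def interval_plan(year: int, month: int, every: int, per_day: int, rng: random.Random):
    """Sample on every `every`-th service day; only the trips within a day are random."""
    day = date(year, month, 1)
    service_days = []
    while day.month == month:
        if service_for(day):
            service_days.append(day)
        day += timedelta(days=1)
    plan = {}
    for d in service_days[::every]:  # sampling days are known beforehand
        trips = service_for(d)
        picks = rng.sample(range(1, len(trips) + 1), per_day)  # distinct, never above 14
        plan[d] = [(n, trips[n - 1]) for n in sorted(picks)]
    return plan

plan = interval_plan(2010, 11, every=2, per_day=2, rng=random.Random(7))
for d, picks in plan.items():
    print(d.isoformat(), picks)
```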
As I was saying, the RANDBETWEEN function is very useful, but there are many other ways;
if you search the web, you are going to find other functions that generate random numbers and are acceptable too.
This is just one example, because it's available in Excel.
You give it the first and the last value, and it picks a random number between them.
If you need four numbers, you need to apply it four times,
and each time it gives you a new, independent random number.
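Because each RANDBETWEEN call is independent, two of the four draws can occasionally land on the same trip. The Python sketch below mimics applying the function repeatedly and simply redraws any repeat, which amounts to sampling without replacement; `random.sample(range(a, b + 1), k)` does the same in one call. Treating a repeat by redrawing is our assumption, so confirm it is consistent with your sampling plan. `draw_distinct` is a hypothetical name.

```python
import random

def draw_distinct(a: int, b: int, k: int, rng: random.Random) -> list[int]:
    """Apply a RANDBETWEEN-style draw repeatedly until k different numbers are chosen."""
    if k > b - a + 1:
        raise ValueError("cannot draw more distinct numbers than the range holds")
    chosen: list[int] = []
    while len(chosen) < k:
        n = rng.randint(a, b)  # inclusive on both ends, like RANDBETWEEN
        if n not in chosen:    # independent draws can repeat; skip duplicates
            chosen.append(n)
    return sorted(chosen)

picks = draw_distinct(1, 74, 4, random.Random(2010))  # four trips out of 74
print(picks)
```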
To enable the RANDBETWEEN function (if you are interested in this method, I ask you to make a note of this),
go to the Add-Ins (this is Excel 2007; I don't know if 2010 is the same)
and make sure that the box for the Analysis ToolPak is checked.
Click OK, and that will enable the RANDBETWEEN function.
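If RANDBETWEEN is not available, enabling the Analysis ToolPak fixes it; this step is required in Excel 2003 and earlier, while Excel 2007 and later include RANDBETWEEN as a built-in function.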
We discussed this already: whenever the schedule changes,
you should keep your master file of trips current, because that's what you use to generate your random trips.
That requires coordination between the departments of the agency, if more than one department is involved.
I haven't talked much about APC-equipped buses; that is a topic of its own.
The NTD imposes very specific requirements for reporting APC-based data.
But in principle, APCs can be used in the data collection process to collect not only unlinked trips,
boardings, and alightings, but passenger miles as well. APC technology allows you to do that, but there are requirements
for a maintenance plan and a calibration plan, in which you compare APC data with manual counts to make sure they are consistent,
and some other considerations too.
But here is what is important to keep in mind. I noticed a few cases of agencies
where the basic method for collecting data was APCs, but not the entire fleet was APC-equipped,
and some routes would never have an APC-equipped bus assigned to them, because of operating details that I'm not aware of.
During the random sampling, that route would sometimes be selected,
and the agency would simply discard that sample because it could not assign an APC-equipped bus to collect the data.
By doing that, the agency was destroying its effort for the entire year, because it biased its sample:
only trips served by APC-equipped buses were part of its final sample.
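One way to guard against this in code is to make it impossible to drop a selected trip: the sketch below keeps every randomly selected trip and only varies the collection method. It assumes the agency has some non-APC fallback, labeled here as a manual ride check; the set of APC-capable routes, the sample rows, and the function name `assign_collection` are all hypothetical.

```python
def assign_collection(selected, apc_routes):
    """Return every selected trip with a collection method; nothing is discarded.

    selected: (day, route, trip) tuples from the random draw.
    apc_routes: routes that can get an APC-equipped bus (hypothetical input).
    """
    return [
        (day, route, trip, "APC" if route in apc_routes else "manual ride check")
        for day, route, trip in selected
    ]

# Hypothetical draw; assume route 3 never gets an APC-equipped bus.
selected = [("Tue", 1, 2), ("Tue", 3, 3), ("Wed", 1, 1), ("Thu", 2, 4)]
for row in assign_collection(selected, apc_routes={1, 2}):
    print(row)
```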
I'm sure that most of you who have done sampling before have seen a table like this,
which shows the data that you collected in one sample, the ons and offs.
In this case you have the sequential stop numbers, the distance between stops, the number of boardings highlighted in red,
and the passenger miles traveled.
It's a very straightforward process: you multiply the load between two stops by the distance between them and add up the results over all pairs of stops,
and if you look, that's exactly what you see in this graph in blue.
The passenger miles traveled is the area under this curve, which is this number here.
I'm not going into more detail because it's self-explanatory; you can look here and see how the calculations are made.
If you have questions, I will be more than happy to help you.
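The calculation in the table can be written as PMT = sum over consecutive stop pairs of (load leaving stop i) x (distance from stop i to stop i + 1), where the load is the running total of ons minus offs. This Python sketch implements that for one sampled trip; the input layout and the function name `ride_check_summary` are our assumptions about how the sheet might be organized.

```python
def ride_check_summary(ons, offs, dist_to_next):
    """Compute loads, passenger miles traveled, and boardings for one sampled trip.

    ons[i], offs[i]: passengers boarding / alighting at stop i, in stop order.
    dist_to_next[i]: miles from stop i to stop i + 1 (one fewer entry than stops).
    """
    if not (len(ons) == len(offs) == len(dist_to_next) + 1):
        raise ValueError("need one distance per pair of consecutive stops")
    load = 0
    loads = []  # passengers on board when leaving each stop
    pmt = 0.0
    for i, dist in enumerate(dist_to_next):
        load += ons[i] - offs[i]
        loads.append(load)
        pmt += load * dist  # load between two stops times the distance between them
    return {"boardings": sum(ons), "loads": loads, "pmt": pmt}

summary = ride_check_summary(ons=[5, 3, 0], offs=[0, 2, 6], dist_to_next=[1.0, 2.0])
print(summary)  # {'boardings': 8, 'loads': [5, 6], 'pmt': 17.0}
```

Dividing `pmt` by `boardings` gives the average passenger trip length for the sample, 2.125 miles here.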
Finally, I want to talk about data validation.
It's very important that once you collect the sample data, it goes into a system of records,
like a spreadsheet or some other database, such as an Access database,
where the data is entered and subjected to some minimum level of validation.
For example, there should be some correlation between passenger miles and unlinked trips:
in general, passenger miles are greater than unlinked trips. Not always, but in general.
There are also black-and-white rules that you can apply to your samples to verify whether there is a typo or a problem,
rules that, when violated, mean the data cannot possibly be correct.
For instance, the average passenger trip length cannot be greater than the route mileage,
because a passenger cannot travel more than the total length of the route.
A rule like that is black and white, and violations can be spotted in the data that you get from the drivers or whoever is collecting the information.
Likewise, the load cannot be greater than the maximum load capacity.
So there are a number of rules that need to be followed to guarantee that the data is accurate.
Validating your data early in the process makes your life much easier.
Once you submit your data to the NTD, the analyst who reviews it is going to compare the estimates for the annual total,
and may send you questions that you cannot answer immediately,
which require you to dig into detailed records to find where the problem is.
But if you do basic data validation checks on your samples, then for samples that cannot be correct,
you can in general discard up to 2% of your sample size, because there was a mistake that cannot be recovered.
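The rules above translate naturally into automated checks. This Python sketch applies the three mentioned in the talk (average trip length versus route mileage, load versus capacity, and passenger miles versus unlinked trips as a soft warning) to one sampled trip. The negative-load and ons-equal-offs checks are extra illustrative rules we added, and the function name and input layout are hypothetical.

```python
def validate_sample(ons, offs, dist_to_next, route_miles, capacity):
    """Return a list of problems found in one sampled trip (empty list means it passed)."""
    if not (len(ons) == len(offs) == len(dist_to_next) + 1):
        raise ValueError("need one distance per pair of consecutive stops")
    problems = []
    load, max_load, pmt = 0, 0, 0.0
    for i, dist in enumerate(dist_to_next):
        load += ons[i] - offs[i]
        if load < 0:  # added check: more people got off than were on board
            problems.append(f"negative load after stop {i + 1}")
        max_load = max(max_load, load)
        pmt += load * dist
    boardings = sum(ons)  # unlinked trips for this sample
    if max_load > capacity:
        problems.append(f"load {max_load} exceeds capacity {capacity}")
    if boardings and pmt / boardings > route_miles:
        problems.append("average trip length exceeds route mileage")
    if sum(ons) != sum(offs):  # added check: every boarding should have an alighting
        problems.append("total ons and offs differ")
    if boardings and pmt < boardings:  # soft rule from the talk: unusual, not impossible
        problems.append("warning: passenger miles below unlinked trips")
    return problems

print(validate_sample([5, 3, 0], [0, 2, 6], [1.0, 2.0], route_miles=3.0, capacity=40))
print(validate_sample([5, 3, 0], [0, 9, 0], [1.0, 2.0], route_miles=3.0, capacity=40))
```

The first call passes; the second reflects a likely typo (9 offs instead of 2) and is flagged before the data ever reaches the NTD analyst.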
Here are some more tips for you to read afterwards, because I would like to hear your questions.
You can, for instance, prepopulate the distance between stops.
In other words, in a fixed-route system without route deviation, you don't need to measure distances,
because the distances are preset by the mileage of the stops.
So you just need to collect ons and offs.
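Prepopulating distances can be as simple as differencing the cumulative mileage of consecutive stops, as in this Python sketch. The mileposts are hypothetical, and rounding to three decimals is our choice to remove floating-point noise.

```python
def distances_from_mileposts(mileposts):
    """Distance between consecutive stops, from each stop's cumulative route mileage."""
    if any(b < a for a, b in zip(mileposts, mileposts[1:])):
        raise ValueError("mileposts must not decrease along the route")
    return [round(b - a, 3) for a, b in zip(mileposts, mileposts[1:])]

# Hypothetical route with four stops at these cumulative mileages.
print(distances_from_mileposts([0.0, 0.4, 1.1, 2.0]))  # [0.4, 0.7, 0.9]
```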
That's what we wanted to cover.
This was developed by Dr. Xuehao Chu from the University of South Florida,
and we are very thankful to him for his work.
If you have any questions, don't hesitate to contact us: myself, our project manager John Giorgis, or the NTD help desk.
We are here to help.
I enjoy this type of thing, so if you want to discuss your case with me, I would be more than glad to do that.
At this point, I will open the floor for your questions. Thank you.