Data Analysis Methods and Software Applications Final Project

Paper Title: Analyzing the Effect of Household Travel Survey Characteristics
on Number of Trips Made By the Household

Paper Type: Final Course Project
University: Beijing Jiaotong University, China
Major: Traffic and Transport Engineering
Writer: Yalew, Solomon Tesfaye (@22129039)
Degree: Masters
Course: Data Analysis Methods and Software Applications
Research Paper Submitted To: Professor Ma
Date: November 30, 2022
Acknowledgement
My special thanks go to Professor Ma for his devoted effort in teaching us the fundamentals needed
for our future careers and for guiding us through every stage of this work. This work was only
possible because of his constant effort throughout the course, so I extend my sincere appreciation
for his valuable advice, constant support, commitment, dedication, encouragement, precious
guidance, creative suggestions, and critical comments on Modelling Methods and Applications in
Traffic and Transportation. To conclude, I am grateful for his patience and strong commitment to
the advancement of this course, and I wish him all the best. I accept full responsibility
for any errors that may occur.
Analyzing the Effect of Household Travel Survey Characteristics on Number of Trips Made By the Household
Abstract
With increasing diversification of lifestyles in recent decades, it is becoming more necessary to
consider factors other than gender and age, such as household composition, for accurate travel
behavior analysis. Understanding the amount and type of travel by the residents is important for
planners and policymakers. This study examines how changes in household characteristics influence
the number of trips made by households in the study area.
Household travel data is a critical component of the travel-demand forecasting process. In addition to
providing information on regional travel characteristics, these data are used to estimate and update
travel-demand models designed to analyze proposed transportation policy decisions. The foundation
of this report is a household travel survey sample of 2000 households drawn from
the 1990 US Nationwide Personal Transportation Survey (NPTS). Household and personal
characteristics influence the average number of trips made by a household. The objective of the paper
was to obtain household travel survey data, to determine the relevant household characteristics, and
to identify possible interrelationships between the number of trips made by households
and household characteristic variables. The study applies simple statistical methods
to determine relationships between household characteristic variables and travel attributes.
Descriptive analysis provides a clearer understanding of household
characteristics across the travel groups, such as the number of persons, income,
workers, age, driving population, and number of automobiles in the household. Furthermore,
regression and correlation analyses were used to explore relationships between
household characteristics and the number of trips made by households. Household
composition is an important indicator for trip pattern analysis. An aggregate-level understanding
of how household characteristics relate to travel is a critical component of
developing policies, plans, and programs that optimize system performance, provide for the mobility
needs of travelers, and maintain economic vitality. This information is also important for assessing
system performance for economic vitality, mobility, equity, and a host of other system objectives.
Key words: Household Characteristics, Number of Trips Made by the Household, Descriptive
Analysis, Correlational Analyses, Normality/Skewness, Regression Analyses, SPSS

1. Introduction
1.1 Background of the Paper
As the country has continued to evolve and change over time, travel behavior has also changed. Many
factors influence transportation demand and in different ways. Some trends increase demand; some
decrease demand; and others shift demand across modes, time of day, and geographies. Technological
advances, coupled with demographic and economic trends, have made it incredibly difficult to predict
the transportation future (Jacob, 2019). While much attention is focused on the commute trip, people
travel the most for social and recreational purposes. A primary way to track and understand this
change has been through the Nationwide Personal Transportation Survey (NPTS), since renamed the
National Household Travel Survey (NHTS). The NHTS is
conducted by the U.S. Department of Transportation’s Federal Highway Administration every five to
eight years. Information on these surveys can be found online at the NHTS website. Collectively, data
from recent surveys indicate that potential causes for these fluctuating trends are saturation in
suburbanization and auto availability, aging of baby boomers, and perhaps some fundamental changes
in personal priorities and economic capacity for away-from-home activities and travel.
Several factors appear to be impacting travel behavior in ways not observed over the past several
decades: advances in communication technology as a potential substitution for travel, increases in
fuel costs, effects of environmental concerns regarding the impact of roadway travel, and fundamental
changes in the income and wealth of the public. If a "new normal" for person travel is being shaped,
it is important for planners and policymakers to understand it (Nancy, 2016). The uncertainty in future
per capita travel demand as well as in demographic and economic growth post-recession suggests a
significant degree of uncertainty regarding estimates of future overall travel demand. Thus, there is
greater uncertainty regarding future transportation infrastructure and service needs. This suggests that
transportation planning will need to embrace scenario planning to explore the robustness of plans in
light of various forecasts for future travel demand. Collectively, these factors create a challenging but
exciting time for transportation planning (Tomasz, 2016).
Several studies have applied simple statistical methods to determine
relationships between household characteristic variables and travel attributes. Overall, they
found that the trips households make for work, school, shopping, and recreation depend on
household size, household income, number of workers in the household, number of persons aged
in the household, automobiles in the household, and number of drivers in the household (Nancy, 2016).

The cost of collecting and analyzing data on travel behavior is a small fraction of the annual
transportation capital program in any state or metropolitan area in the country (Chandra, 2002).
Household travel surveys are the primary means for collecting travel-behavior data in a region. They
provide information to measure and assess transportation system performance, and data used in the
prediction of future demands on the regional transportation system. Other reasons for their conduct
may include the replacement of a prior survey and the development of inputs for re-calibrating an
existing travel forecasting model. Finally, household travel surveys may serve a number of secondary
purposes such as measuring public reaction to a proposed transportation policy decision such as the
imposition of tolls or addition of a light-rail service (Jacob, 2019). Household travel surveys (HTS)
are an important data source for transport planning and research. Household size has consistently
proven the most significant variable in predicting trip rates. Vehicle ownership or availability is also
highly significant in predicting trip rates (McDonald and Stopher, 2014) and is critical for predicting
mode shares (Ou and Yu, 2008). Household income is also significant in predicting trip rates and is
highly significant in the prediction of work trip lengths (Elmi et al., 2017). According to 1995 NPTS
data, while household size is the most critical variable, vehicles and income have significant
impacts across the size categories.
Household surveys are stratified sample surveys. A sampling plan is developed that specifies the
number of households and number of trips made by the household to be surveyed cross-classified by
household size, household income, number of workers in the household, and number of persons aged
in the household. In some areas, the sampling stratification also includes the number of automobiles
in the household and number of drivers in the household. The number of households to be
analyzed is based on the estimated number and distribution of households in the population and the
expected amount of travel that will be generated by those households. Statistically, the sampling plan
is designed to evaluate the overall relationship between household survey characteristics and the level
of total person trips in the households (Nancy, 2016).
The household survey was designed to measure the amount of household travel. A target population
is identified, from which a sampling frame is defined. For travel surveys the target population is
further defined as residents in households. Most travel surveys limit households to non-institutional
and non-group homes (such as penitentiaries, dormitories, hospitals, and nursing homes). Specific
information was gathered from 2000 household surveys covering poor, middle-income, and rich
households. The discussion in this section combines information from the review of
these 2000 surveys with survey methodology studies found in academic publications.
Household travel data is a critical component of the travel-demand forecasting process. In addition to
providing information on regional travel characteristics, these data are used to estimate and update
travel-demand models designed to analyze proposed transportation policy decisions. The data are
typically generated through a household based survey in which a sample of the population records
their travel patterns over a given time period. This information is combined with socio demographic
information about the sample to develop relationships between individual/household characteristics
and their observed travel patterns (Stephen, 2000). While we may be at a tipping point in terms of
fundamental travel behavior trends, transportation resources are increasingly scarce. We cannot
afford to make decisions that are not based on the best possible data on travel behavior (Tomasz,
2016).
1.2 Objective of the Paper

The objective of the study was to determine the relevant household characteristics and to identify
possible interrelationships between the number of trips made by households and household
characteristic variables, such as the number of persons, income, workers, age, driving
population, and number of automobiles in the household. The foundation of this report is a
household travel survey sample of 2000 households drawn from the 1990 US Nationwide
Personal Transportation Survey (NPTS). To achieve this objective, two research questions were
investigated:
1. Is there a relationship between the number of trips made by an individual household and the
other household characteristic variables?
2. Do changes in household characteristic variables influence the number of trips made by an
individual household?
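
The two research questions correspond to a correlation check and a regression fit. The study itself used SPSS; the following is only a minimal Python sketch of the same two calculations. The variable names `household_size` and `trips` and all values are made-up illustrations, not NPTS data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative values only (hypothetical, not survey data).
household_size = [1, 2, 3, 4, 5]
trips = [2, 4, 5, 7, 9]

r = pearson_r(household_size, trips)                      # question 1: is there a relationship?
intercept, slope = simple_ols(household_size, trips)      # question 2: how do trips change with size?
print(f"r = {r:.3f}; trips = {intercept:.2f} + {slope:.2f} * household_size")
```

A correlation near zero would suggest no linear relationship, while the slope estimates how many additional trips accompany one more household member in this toy data.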
1.3 The Paper Limitations

Because household travel survey analysis is a broad topic, this paper does not cover it exhaustively.
It may also have limitations arising from a lack of time and from differences in
policy between countries.

2. LITERATURE REVIEW
2.1 Introduction
The United States relies on its highway system to connect people to their jobs, their communities, and
essential services. Transportation is also especially critical for personal and business mobility by
connecting people and their goods. In 2019 alone, total VMT approached 3.3 trillion miles.
Historically, growth in travel beyond population growth has been influenced by many factors,
such as the age distribution of the population, vehicle ownership, licensure rates, household size, labor
force participation, and income. All of these factors influence travel demand; travel demand
characteristics such as mode, distance, and purpose; and travel demand distribution across population
groups and geographic areas. Over the past five decades, household travel demand has consistently
outpaced population growth. It is measured in person miles traveled (PMT), which accounts for travel
on all modes of transportation, and VMT, which accounts for travel by personal vehicle. Figure 46
compares trends in PMT and VMT from 2001 to 2017. The growth in PMT has outpaced the growth
in VMT. This means that travel via other modes has grown faster than travel by personal vehicle.
This section reports on a variety of household characteristics obtained from household
travel surveys and relates these household and person characteristics to household
travel characteristics. Household size, household income, household life cycle, household vehicle
availability, household licensed drivers, and household employment all affect the amount of
household travel. A household consists of persons living at the same residential address who share
meals and have some type of relationship. Household characteristics provide an overview of the type
of structures that renters live in, as well as their household living arrangements, number of members
per household, and number of vehicles per household. The household sector includes individuals or
groups of individuals (Jacob, 2019).
2.2 Household characteristics

A household may consist of one or several people who live in the same dwelling and share food.
It may include one family or another group of people. Household characteristics include average
household income, number of household vehicles, number of household members, number of workers
in the household, whether the house is owned or rented, whether the household includes one or more
children, and a combination of one- versus multiple-member households and households with one or
more older persons (Nancy, 2016).

The first step in setting up the simulation was to categorize the households into relatively
homogeneous groupings with respect to each dependent variable. The initial idea was to develop one
household schema to predict all four trip attributes. This would entail treating each attribute as a
multivariate entity (i.e., simultaneously) and is one approach used to model household activity-travel
patterns for the TRANSIMS project (Vaughn et al., 1999). In recent years, the incorporation of
powerful classification algorithms into standard statistical software has increased the options
available for this type of market segmentation problem. In this paper, an SPSS add-on module was
used to assist with the initial delineation of categories that differ with respect to a dependent
variable.
Both households and families are the basic units in the society to define demography. The main
difference between family and household is that a family refers to a group of members who maintain
kinship with each other by living in the same dwelling or different dwellings, whereas a household
refers to a group of people who may or may not maintain kinship with each other while living in the
same dwelling (Tomasz, 2016). The living arrangements of individuals may have different stages.
Thus, an individual may live in a family household with parents and may leave the family household
when he or she grows up. After that, they may live in a non-family household with their friends.
Likewise, one can form a new household with a spouse and have children. Consequently, it becomes
a separate family household. Due to many reasons like migrations, many people around the world
live in non-family households. Therefore, all households cannot be considered as families. A
household refers to a small social unit composed of members living in the same house, apartment, or
annex. Here, a group of people shares the same dwelling irrespective of their kinship (Westat, 2019).
Most of the time, members of a household are family members, but in some instances students or
workers share the same house. These are non-family households: dwellings shared by unrelated
people. In contrast, family households consist of two or more individuals related by blood,
kinship, or adoption.

2.3 Household Members
Household members include all people, whether present or temporarily absent, whose usual place of
residence is in the sample unit. They also include people staying in the sample unit
who have no other usual place of residence. They do not include anyone who usually lives
somewhere else or is just visiting, such as a college student away at school (NHTS, 2014).
2.4 Household Size

Household size is a variable used to stratify the household trip rates calculated from household
travel surveys. Such variables are essential for forecasting future travel. Trip rates and trip rate
variability from previous household surveys are used for this calculation. The desired number of
households to be surveyed in each cell is calculated from the number of households within
each cell and the expected amount of travel (Nancy, 2016). Household characteristics affect the
social and economic well-being of the members of the household. Large household size may be
associated with crowding. A typical household makes a certain number of trips on most days to meet
household needs such as purchasing food and other necessities, earning an income, attending school,
visiting friends and family, receiving medical care, and attending events. For this reason, the number
of households is a better predictor of future travel than the number of persons (NHTS, 2017).
Household size may also affect whether an individual chooses to ride transit. One- or
two-person households constitute more than half of the transit households in the state. Smaller
households could be more likely to use transit because they usually do not include children. The
presence of children in a household typically adds a layer of complexity in trip planning
that often discourages transit usage (Socialdata, 2009).
2.5 Household Income

Household income is the money earned by all family members in a household, including those
temporarily absent. Annual income is the income earned in the 12 months preceding the interview.
Household income is one of the primary variables used to estimate household trip rates (NHTS, 2017).
As household income increases, the amount of household travel tends to increase. Additionally, as
income increases, vehicle ownership tends to increase and additional financial resources are available
to the household to support increased travel. For sampling purposes, these income ranges were
selected to produce income distributions roughly equal to quintiles of households in the study area
(Gargi, 2020).

Typically, travel is positively correlated with income. Therefore, it can generally be expected that
individuals in households with higher income will tend to travel more than individuals residing in
lower-income households. The analysis reveals that travelers of all income levels tend to take fewer
daily trips than the average traveler in the U.S. Income also affects the number of trips
individuals take as well as the distance traveled in each trip. Higher-income households made
substantially more trips and traveled more miles on average than lower-income households.
Households in the highest income quintile made more trips than those in the lowest income quintile,
and, on average, those trips were longer (Elmi et al., 2017).
2.6 Education Level

Education level is the number of years of regular schooling completed in graded public, private, or
parochial schools, or in colleges, universities, or professional schools, whether day school or night
school. Regular schooling advances a person toward an elementary or high school diploma, or a
college, university, or professional school degree (NHTS, 2017). Education is an important variable
because of its association with demographic behavior. Higher education is usually associated with
greater knowledge and use of well-being practices. Education is also a key determinant of the
lifestyle and status an individual enjoys in a society. It affects many aspects of life, including
demographic behavior and transport demand. Overall, educational attainment is higher in urban areas
than in rural areas. The proportion with no education in urban areas is about one-third that in rural
areas (Westat, 2019).
2.7 Number of Workers in the Household

The number of full-time workers per household that can be expected to provide support needs to be
country-specific, should usually differ between rural and urban areas in a country, and should be
region-specific in large countries. The labor force is the sum of the number of persons employed and the
number of persons unemployed. The labor force participation rate is the ratio of the labor force to the
working age population, expressed as a percentage (ILO, 1982). Thus, labor force participation rate
provides an estimate of the probability that someone is either working/economically active or
unemployed/looking for work (Westat, 2019).
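
The labor force participation rate defined above can be written out as a short calculation. This is a minimal Python sketch; the function name and the input figures are hypothetical and chosen only to illustrate the ratio.

```python
def labor_force_participation_rate(employed, unemployed, working_age_population):
    """Return the labor force participation rate as a percentage.

    Labor force = employed + unemployed; the rate is the labor force
    divided by the working-age population, times 100.
    """
    if working_age_population <= 0:
        raise ValueError("working_age_population must be positive")
    labor_force = employed + unemployed
    return 100.0 * labor_force / working_age_population

# Illustrative figures only: 60 employed and 5 unemployed out of 100 working-age persons.
rate = labor_force_participation_rate(employed=60, unemployed=5, working_age_population=100)
print(rate)
```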
2.8 Household Vehicle Availability and Licensed Drivers

A household vehicle is a motorized vehicle that is owned, leased, rented, or company-owned and
available to be used regularly by household members. Household vehicles include vehicles used
solely for business purposes and business-owned vehicles, so long as they are driven home and can be
used for the home-to-work trip (e.g., taxicabs, police cars). Household vehicles include all
vehicles that were owned or available for use by members of the household during the travel day,
even though a vehicle may have been sold before the interview. Vehicles excluded from household
vehicles are those that were not working and were not expected to be working, and vehicles that were
purchased or received after the designated travel day (NHTS, 2020).
Vehicle availability is the number of vehicles available to members of a household for making trips,
whereas vehicle occupancy is the number of occupants in a vehicle during a vehicle trip, including
the driver. In general, as the
number of vehicles available to the household increases, daily household travel increases. This
household characteristic also affects forecasting and the demand for public transportation. As
household vehicle availability increases, the household demand for public transportation tends to
decrease. Different studies report household trip rates as a function of the number of vehicles
available to household members for travel. As expected, households with no vehicles available made
fewer trips per household than those households that have vehicles available; however,
households with no vehicles still make a meaningful number of trips (CAMPO, 2017).
Generally, vehicle availability is one of the most important factors influencing transportation demand.
Growth in vehicle ownership has far outpaced growth in U.S. population, more
than tripling in the last four decades. Vehicle ownership varies across the Nation.
Overall, 8.6 percent of U.S. households do not have access to a vehicle (either by choice or by
circumstance) according to the 2019 American Community Survey. Not surprisingly, income is one
of the major determinants of the number of vehicles in a household (i.e., lower-income households
tend to own fewer or no vehicles compared to higher-income households). However, additional factors
influence vehicle ownership besides income. Households with no vehicles were more likely to live in
urban areas, be renters, and have incomes under $25,000 as compared to households with at least one
vehicle. This is likely due to changes in household size, labor force participation, and access to
alternative transportation modes (such as on-demand transportation and shared modes). For example,
as household size decreases, the number of vehicles per household also declines, as there are fewer
drivers (McDonald and Stopher, 2014).
2.9 Number of persons aged in the household

Trip rates measure how often a particular individual travels on a typical day. Daily trip rates were
computed by age and then again by household income, so that trip rates can be compared across
traveler age groups. Historically, Florida has been characterized by a highly mobile older adult
population. Trips made by older travelers make up a larger percentage of trips than those made by
younger travelers
(Socialdata, 2009). NHTS data show that travel rates typically peak for individuals ages 40 to 64 and
start to decline slowly for older travelers both in Florida and the U.S. Not surprisingly, individuals
age 85 and older travel less than the rest of the population. The analysis reveals that Florida travelers
in all age groups younger than 69 years travel less than the average traveler in the U.S. Overall,
Florida travelers take slightly fewer trips per day than the average U.S. traveler (3.6 trips per day for
Florida travelers vs. 3.8 trips per day for U.S. travelers). At the same time, Florida travelers age 70
and over take more trips than the average U.S. traveler in the same age groups. The last observation
is consistent with prior NHTS analyses that showed higher-than-national-average mobility of the older
adult population in Florida. Factors hypothesized to explain this include the Florida land use pattern
and the perception that many of the retirees that choose to live in Florida are, by their very nature,
active (as indicated by their desire and ability to relocate to Florida in retirement) and thus more likely
to be out and about taking advantage of Florida's weather and attractions.
2.10 Number of Trips Made by the Household

Trip: A trip is defined as a one-way course of travel having a single main purpose, e.g. a walk to
school or a trip to work without any break in travel. However, sometimes people go out for a number
of reasons, or go out for one main reason but carry out a number of different activities, perhaps at
different places. Complex travel like this is broken into separate trips so that the data can be analyzed.
Where a stop is entirely secondary to the main purpose (such as a stop to buy a newspaper on the way
to work), the stop is disregarded. Trip rates measure how often a particular individual travels on a
typical day. Daily trip rates can be computed by different household characteristics (Westat, 2019).
A person trip is a trip by one or more persons in any mode of transportation. Each person is considered
as making one person trip. For example, four persons traveling together in one auto are counted as
four person trips. A trip can also be defined as travel between two addresses for the purpose of
carrying out one or more activities (e.g., a trip from home to work or a family trip from home to the
beach), or as an individual movement by motorized means of transport in one direction (NHTS, 2017).
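
The person-trip counting rule above (each traveler on a trip counts as one person trip) can be illustrated with a short Python sketch. The record layout, a list of `(mode, occupants)` pairs, is a hypothetical simplification for illustration and not the survey's actual file format.

```python
def person_trips(trip_records):
    """Total person trips for trips given as (mode, occupants) pairs.

    Each person traveling on a trip contributes one person trip,
    regardless of the mode used.
    """
    return sum(occupants for _mode, occupants in trip_records)

# Illustrative travel day: four people in one auto, one person walking, two on a bus.
travel_day = [("auto", 4), ("walk", 1), ("bus", 2)]
total = person_trips(travel_day)
print(total)
```

The single auto trip with four occupants contributes four person trips, matching the example in the text.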
Each trip has two ends: the first at the start of the trip (origin) and the second at the trip
end (destination). Trips are usually divided into home-based and non-home-based. The trip generation
process aims at estimating the total number of trips generated from and attracted to each traffic
analysis zone of the study area for each trip purpose. It predicts the number of trips originating in or
destined for a particular traffic analysis zone (TAZ). As the household size increases, the household
trip rates increase. For travel forecasting applications, households with five or more household
members are grouped and an average trip rate is used to represent that group (CAMPO, 2017).
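
The grouping of households with five or more members into one trip-rate category can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `(household_size, daily_trips)` record layout, the function name, and all values are hypothetical, and the trip rate is taken as the simple mean of trips per household within each group.

```python
from collections import defaultdict

def trip_rates_by_size(households, top_group=5):
    """Average trips per household by household size.

    Households with `top_group` or more members share one category,
    labeled e.g. "5+", and are represented by a single average rate.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of trips, number of households]
    for size, trips in households:
        label = f"{top_group}+" if size >= top_group else str(size)
        totals[label][0] += trips
        totals[label][1] += 1
    return {label: trip_sum / count for label, (trip_sum, count) in totals.items()}

# Illustrative (household_size, daily_trips) records; not survey data.
sample = [(1, 3), (1, 5), (2, 6), (5, 12), (6, 14), (7, 16)]
rates = trip_rates_by_size(sample)
print(rates)
```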
Trip generation analysis focuses on residences and residential trip generation is thought of as a
function of the social and economic attributes of households. At the level of the traffic analysis zone,
residential land uses "produce" or generate trips. Traffic analysis zones are also destinations of trips,
trip attractors. The analysis of attractors focuses on non-residential land uses (Westat, 2019).
Although different urban settings will have different trip compositions, most of the trips undertaken
in urban areas across the world are work-based:
• Work. Commutes performed towards the workplace, which represent approximately of daily
commutes.
• Business (work). Trips from the workplace to a business destination.
• Personal. Trips related to personal activities such as restaurants, the library, or the post office.
• Shopping. Commutes towards any store regardless of its size, merchandise, or whether or not
any purchases are made.
• Social and recreational. Social trips are related to activities such as visiting family and friends.
Recreational trips are performed with the intention of recreation such as cultural or sports
events. These trips represent about 27% of daily commutes.
• Education. Commutes towards a learning establishment by those seeking any type of training,
regardless of the level of learning. These commutes represent 10% of the daily travel total.
The main factors affecting personal trip production include:

• Income.
• Vehicle ownership.
• Household size and structure.
• Type of dwelling unit.
• Land use.
• Distance of the zone from city center (CBD).
• Accessibility to public transport system.
• Employment opportunities and number of workers.
• Age of head of household.
• Driving population.
• Household tenure (own, rent/other).

2.11 Techniques of Trip Generation Analysis
There are several alternative structures for specifying trip generation models, and several
statistical methods for determining relationships between household characteristic
variables and travel attributes. Some of them are described in the sections below.
2.12 Simple Statistical Analytical Methods

Statistical analysis, or statistics, involves collecting, organizing and analyzing data based on
established principles to identify patterns and trends. It is a broad discipline with applications in
academia, business, the social sciences, genetics, population studies, engineering and several other
fields. Statistical analysis has several functions: it can be used to make predictions, perform
simulations, create models, reduce risk, and identify trends (IET, 2022). In general, statistical
analysis is the science of collecting, organizing, and exploring data to find patterns and trends.
Each type of statistical analysis uses methods such as regression, the mean, the
standard deviation, sample size determination, and hypothesis testing. Its results are
used by organizations to reduce risk and predict upcoming trends in order to strengthen their position
in a competitive market.
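
Three of the methods named above (mean, standard deviation, and sample size determination) can be shown in a few lines of Python. This is only a sketch: the trip counts are made-up values, and the sample size formula n = (z * s / e)^2 for estimating a mean is a standard textbook choice assumed here, not a method stated in this paper.

```python
import math
import statistics

# Illustrative daily trip counts for eight households (made-up values, not survey data).
trips = [2, 4, 4, 4, 5, 5, 7, 9]

mean_trips = statistics.mean(trips)   # arithmetic mean
sd_trips = statistics.stdev(trips)    # sample standard deviation (n - 1 denominator)

def sample_size_for_mean(sd, margin, z=1.96):
    """Households needed to estimate a mean within +/- margin.

    Uses n = (z * sd / margin)^2, rounded up; z = 1.96 corresponds
    to roughly 95% confidence under a normal approximation.
    """
    return math.ceil((z * sd / margin) ** 2)

n_required = sample_size_for_mean(sd_trips, margin=0.5)
print(mean_trips, round(sd_trips, 3), n_required)
```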
2.12.1 Descriptive Research Analysis

Descriptive research analysis is a type of study used to understand a specific
subject matter. Anyone can carry it out, provided they understand its purpose and its limits.
Descriptive research is about
understanding the “what” rather than the “why” of a particular phenomenon. The focus is on
what something is, based on unbiased information (Kiesha, 2019).
The method has a number of advantages over difference testing in that it is quantitative and can be
used to describe differences between products and the main sensory drivers, whether positive or
negative, identified within products, especially when combined with objective consumer testing and
objective multivariate data analysis. However, the method can be expensive and time consuming
because of the need to train and profile individual panelists over extended periods of time (days
or even weeks). It is also not a method that can readily be used for routine analysis. Later we will
discuss ‘flash profiling’ (FP) as a compromise method of analysis. Descriptive analysis is a method
in which defined sensory terms are quantified by sensory panelists. A list of descriptive terms is
determined initially; this list is referred to as a lexicon or descriptive vocabulary, describes the
specific sensory attributes in a meat sample, and can be used to evaluate changes in these attributes
(Byrne and Bredie, 2002). There are two methods of descriptive analysis, the Spectrum method and the
QDA (quantitative descriptive analysis) method. The principal characteristic of the Spectrum method
is that the panelist scores the perceived sensory intensities with reference to pre-learned
‘absolute’ intensity scales.
Data aggregation and data mining are two techniques used in descriptive analysis to process
historical data. In data aggregation, data is first collected and then sorted in order to make the
datasets more manageable. Descriptive techniques often include constructing tables of quantiles and
means, measures of dispersion such as the variance or standard deviation, and cross-tabulations or
"crosstabs" that can be used to examine many disparate hypotheses. These hypotheses often highlight
differences among subgroups. Measures like segregation, discrimination, and inequality are studied
using specialized descriptive techniques. Discrimination is measured with the help of audit studies
or decomposition methods. More segregation on the basis of type or inequality of outcomes need not be
wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate
measurement of the different steps across space and time is a prerequisite to understanding these
processes (Ayush, 2021).
A table of means by subgroup is used to show important differences across subgroups, which often
leads to inferences and conclusions being drawn. When we notice a gap in earnings, for example, we
naturally tend to extrapolate reasons for those patterns. But this enters the province of measuring
impacts, which requires the use of different techniques. Often, random variation causes differences
in means, and statistical inference is required to determine whether the observed differences could
have happened merely by chance. A crosstab, or two-way tabulation, shows the proportion of
observations with each combination of values of two variables, that is, the cell proportions. For
example, we might tabulate the proportion of the population that has a high school degree and also
receives food or cash assistance, which means making a crosstab of education versus receipt of
assistance. We might then also want to examine row proportions, or the fraction in each education
group who receive food or cash assistance, perhaps seeing assistance levels dip sharply at higher
education levels (Ayush, 2021).
Column proportions can also be examined, giving the fraction of recipients at each level of
education, but this runs in the opposite direction from any causal effect. We might come across a
surprisingly high number or proportion of recipients with a college education, but this might simply
be the result of there being more college graduates than people who have less than a high school
degree (Ayush, 2021). A clear advantage of the survey method over other descriptive methods is that
it enables researchers to study larger groups of individuals with ease. If surveys are properly
administered, they give a broader and neater description of the unit under research.
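The difference between cell, row and column proportions can be made concrete with a small sketch. The counts below are hypothetical and chosen only for illustration; they are not taken from Ayush (2021) or from the survey data used in this paper.

```python
# Hypothetical counts (illustration only, not survey data):
# rows = education level, columns = receives assistance (yes / no).
counts = {
    "less than high school": {"yes": 30, "no": 70},
    "high school": {"yes": 40, "no": 160},
    "college": {"yes": 30, "no": 270},
}

total = sum(sum(row.values()) for row in counts.values())

# Cell proportions: share of the whole population falling in each cell.
cell_prop = {
    edu: {k: v / total for k, v in row.items()} for edu, row in counts.items()
}

# Row proportions: within each education group, the share receiving assistance.
row_prop = {
    edu: {k: v / sum(row.values()) for k, v in row.items()}
    for edu, row in counts.items()
}

# Column proportions: among recipients (or non-recipients),
# the share found at each education level.
col_totals = {k: sum(row[k] for row in counts.values()) for k in ("yes", "no")}
col_prop = {
    edu: {k: row[k] / col_totals[k] for k in row} for edu, row in counts.items()
}

for edu in counts:
    print(
        f"{edu:22s} share of group assisted = {row_prop[edu]['yes']:.2f}, "
        f"share of all recipients = {col_prop[edu]['yes']:.2f}"
    )
```

In this made-up table only 10% of college graduates receive assistance (row proportion), yet they make up 30% of all recipients (column proportion) because the college group is the largest. This is the pattern the paragraph above warns about.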
2.12.2 Regression Analysis

Regression analysis is a set of statistical methods used for the estimation of relationships between a
dependent variable and one or more independent variables. It can be utilized to assess the strength of
the relationship between variables and for modeling the future relationship between them. Regression
analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common
models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for
more complicated data sets in which the dependent and independent variables show a nonlinear
relationship. Regression analysis describes the relationship between dependent and independent
variables: it depicts how the dependent variable changes when one or more independent variables
change. For simple linear regression the formula is Y = a + bX + E, where Y is the dependent
variable, X is the independent variable, a is the intercept, b is the slope, and E is the residual
(Ashish, 2022).
Regression analysis helps to separate the effects of individual variables in complicated research
questions. Used effectively, it supports informed decisions and guides resource allocation.
Regression analysis shows statistically whether two variables are related, but it is also important
to weigh the results against human judgment. Skilled analysts compare regression results against
their experience and understanding of the situation. If the results of a regression analysis do not
seem right, or if the error terms are unexpectedly large, asking a more experienced colleague for an
opinion helps to identify the human aspects that can affect an outcome (Carrier Guide, 2022).
 Simple regression analysis estimates the relationship between a dependent variable and a
single independent variable. For example, one could assess the connection between how much
money a person makes and their education level, or between crop yield and seasonal rainfall.
This method is sometimes also referred to as single regression analysis.
 In comparison, multiple regression analysis estimates the relationship between a dependent
variable and two or more independent variables. For example, one might evaluate the
relationship between how much money a person makes and their experience, education and
geographic location. Running a multiple regression analysis is more complex, but it usually
offers more realistic and specific results than simple regression analysis.
 Regression analysis evaluates the strength of the relationship between independent and
dependent variables.
 While a simple regression analysis evaluates the relationship between two variables, multiple
regression analysis assesses the relationship between a dependent variable and more than one
independent variable.
 Businesses can use regression analysis to predict future sales, evaluate growth opportunities,
explain past occurrences and make strategic decisions.
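As an illustration of the formula Y = a + bX + E, the following minimal Python sketch estimates the intercept a and slope b by ordinary least squares. The function name fit_simple_regression and the five data points are hypothetical and serve only to show the calculation; they are not taken from the survey sample.

```python
def fit_simple_regression(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical illustration data (NOT from the NPTS sample):
household_size = [1, 2, 3, 4, 5]
trips = [3, 5, 8, 9, 12]

a, b = fit_simple_regression(household_size, trips)

# E in the formula: the part of each observation the line does not explain.
residuals = [yi - (a + b * xi) for xi, yi in zip(household_size, trips)]

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

For these made-up numbers the fitted line is Y = 0.8 + 2.2X, and the residuals sum to zero, as they always do when an intercept is included.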
2.12.3 Correlation Analysis

Correlation analysis is a statistical method used to discover whether there is a relationship
between two variables or datasets, and how strong that relationship may be. In research it is used to
measure the strength of the linear relationship between two variables and to compute their
association. Simply put, correlation analysis quantifies how closely a change in one variable is
accompanied by a change in the other. In correlation analysis, we estimate a sample correlation
coefficient, more specifically the Pearson Product Moment correlation coefficient. The sample
correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the direction and strength
of the linear association between the two variables (FlexMR, 2019). The correlation between two
variables can be positive (i.e., higher levels of one variable are associated with higher levels of the
other) or negative (i.e., higher levels of one variable are associated with lower levels of the other).
The sign of the correlation coefficient indicates the direction of the association. The magnitude of the
correlation coefficient indicates the strength of the association. For example, a correlation of r = 0.9
suggests a strong, positive association between two variables, whereas a correlation of r = -0.2
suggests a weak, negative association. A correlation close to zero suggests no linear association
between two continuous variables. It is important to note that there may be a non-linear association
between two continuous variables, but computation of a correlation coefficient does not detect this.
Therefore, it is always important to evaluate the data carefully before computing a correlation
coefficient. Graphical displays are particularly useful to explore associations between variables
(Boston University, 2013).
 Positive: a positive correlation means that the fitted line has a positive slope and the two
variables increase or decrease in the same direction.
 Negative: a negative correlation is the exact inverse, wherein the fitted line has a negative
slope and the variables move opposite to one another, i.e., one variable decreases while the
other increases.
 No correlation: no correlation means that the variables vary without any consistent linear
pattern and therefore have no linear relationship.
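The three cases above can be sketched with a direct implementation of the Pearson product moment coefficient. The function name pearson_r and the small data sets are hypothetical illustrations, not survey data. The last example shows the caveat mentioned earlier: a perfectly regular but non-linear relationship can still produce r = 0.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (not from the survey):
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])         # points on a rising line
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])         # points on a falling line
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x squared

print(r_positive, r_negative, r_curved)
```

Because r only measures linear association, the third case is a reminder to inspect a scatterplot before interpreting a coefficient near zero as "no relationship".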

3. Paper Methodology
To determine the relationships between household characteristics and the number of trips made by the
household, data from the 1990 US National Personal Transportation Survey (NPTS) were used. The main
data were a sample of 2000 household travel survey records for the study area, drawn from the 1990
NPTS. For each household the data provide the total number of persons in the household, the number
of persons aged 0-4, the number of persons aged 5-21, the number of workers, the number of drivers,
the household income, the number of automobiles, and the number of trips made by the household. In
total, 2000 households, 5347 people, and 14,164 trips are described.
No specific study addressing the relationships between household characteristics and the number of
trips made by the household was identified for this data set. However, recommendations from
different researchers were drawn on for this study. The study therefore employs simple statistical
methods to determine the relationships between household characteristics and travel attributes.
Descriptive analysis is used to provide a clearer understanding of the household characteristics,
such as the number of persons, income, workers, age, driving population, and number of automobiles
in the household. Furthermore, regression and correlation analyses are used to explore the
relationships between the household characteristics and the number of trips made by the households.
In summary, seven household characteristics are used in this document to describe the number of
trips made by the household: number of persons in the household, number of persons aged 0-4 in the
household, number of persons aged 5-21 in the household, number of workers in the household,
number of drivers in the household, household income, and household vehicle ownership. The
variables are grouped as follows:
 Dependent variable: the number of trips made by the household. This covers, for example, work
trips, school trips, shopping trips and recreation trips, that is, trips generated by individual
households for work, school, shopping and recreation purposes respectively.
 Independent variables: the household characteristics, namely number of persons in the household,
number of persons aged 0-4 in the household, number of persons aged 5-21 in the household, number
of workers in the household, number of drivers in the household, household income, and household
vehicle ownership.

4. Results and Discussions
4.1 Descriptive Analyses
Descriptive analysis basically comprises measures of central tendency and variability. Both concepts
are easy to understand from a statistical perspective. As the name suggests, measures of central
tendency emphasize the “central” or “middle/average” values in the data. The first output from the
analysis is a table of descriptive statistics for all the variables under investigation. Typically,
the mean, the standard deviation, and the number of respondents (N) who participated in the survey
are given. The mean describes the average response in the dataset, whereas the most common response
is given by the mode.
The descriptive analysis in this paper reports the mean, median, mode, standard deviation, variance
and skewness. The mean, or average value, is a measure of central tendency that is popularly used to
indicate the center of a distribution. The standard deviation is used to see how far the data deviate
from the mean. Kurtosis and skewness are generally used to describe the shape of the distribution.
The mode is the value that occurs most frequently. The appropriate measure of central tendency
depends on the level of measurement: for nominal data, the mode; for ordinal data, the median and
mode; and for scale data, the mean, median and mode.
Descriptive statistics are numerical and graphical methods used to summarize data and bring forth the
underlying information. The numerical methods include measures of central tendency and measures
of variability. Descriptive statistics in SPSS can be accessed by clicking Analyze Menu → Descriptive
Statistics. Detailed information can be obtained using Frequencies, Descriptives, Explore or
Crosstabs. There are, however, different procedures depending on whether the variable is categorical
or continuous. Some of the statistics (e.g. mean, standard deviation) are not appropriate for a
categorical variable.
Key Takeaways
 Descriptive statistics summarize or describe the characteristics of a data set.
 Descriptive statistics consists of three basic categories of measures: measures of central
tendency, measures of variability (or spread), and frequency distribution.
 Measures of central tendency describe the center of the data set (mean, median, mode).
 Measures of variability describe the dispersion of the data set (variance, standard deviation).
 Measures of frequency distribution describe the occurrence of data within the data set (count).

4.1.1 Total Number of Persons in the Household
Frequencies
Statistics
Total number of persons in household
N Valid 2000
Missing 0
Mean 2.67
Median 2.00
Mode 2
Std. Deviation 1.416
Variance 2.004
Skewness .943
Std. Error of Skewness .055
Minimum 1
Maximum 10
Sum 5347

Frequency Table
Total number of persons in household
Value Frequency Percent Valid Percent Cumulative Percent
1 429 21.5 21.5 21.5
2 657 32.9 32.9 54.3
3 374 18.7 18.7 73.0
4 325 16.3 16.3 89.3
5 145 7.2 7.2 96.5
6 49 2.5 2.5 99.0
7 13 .7 .7 99.6
8 2 .1 .1 99.7
9 4 .2 .2 99.9
10 2 .1 .1 100.0
Total 2000 100.0 100.0
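
The summary statistics reported above can be reproduced directly from the frequency table. The following Python sketch is not part of the SPSS workflow used in this paper; it is a cross-check that assumes the variance is the sample variance (divisor N - 1). The helper median_from_freq is a hypothetical name introduced for the illustration.

```python
import math

# Frequency table for total number of persons in the household (value: frequency).
freq = {1: 429, 2: 657, 3: 374, 4: 325, 5: 145, 6: 49, 7: 13, 8: 2, 9: 4, 10: 2}

n = sum(freq.values())                           # number of households
total = sum(v * f for v, f in freq.items())      # total number of persons
mean = total / n
mode = max(freq, key=freq.get)                   # value with the highest frequency

# Sample variance (divisor n - 1) and standard deviation.
variance = sum(f * (v - mean) ** 2 for v, f in freq.items()) / (n - 1)
std_dev = math.sqrt(variance)


def median_from_freq(table):
    """Median of the data described by a value -> frequency table."""
    count = sum(table.values())
    # 1-indexed positions of the two middle observations (equal when count is odd).
    lo_pos, hi_pos = (count + 1) // 2, count // 2 + 1
    cumulative = 0
    lo_val = None
    for value in sorted(table):
        cumulative += table[value]
        if lo_val is None and cumulative >= lo_pos:
            lo_val = value
        if cumulative >= hi_pos:
            return (lo_val + value) / 2
    raise ValueError("empty frequency table")


median = median_from_freq(freq)

print(f"N = {n}, Sum = {total}, Mean = {mean:.2f}, Median = {median:.2f}, Mode = {mode}")
print(f"Variance = {variance:.3f}, Std. Deviation = {std_dev:.3f}")
```

Under that assumption the sketch returns N = 2000, Sum = 5347, Mean = 2.67, Median = 2.00, Mode = 2, Variance = 2.004 and Std. Deviation = 1.416, in agreement with the SPSS output.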

4.1.2 Number of Persons Aged 0-4 in the Household

Frequencies
Statistics
                          Number of persons   Number of persons
                          aged 0-4            aged 5-21
N Valid                   2000                2000
N Missing                 0                   0
Mean                      .21                 .56
Median                    .00                 .00
Mode                      0                   0
Std. Deviation            .518                .936
Variance                  .268                .876
Skewness                  2.628               1.771
Std. Error of Skewness    .055                .055
Minimum                   0                   0
Maximum                   3                   7
Sum                       414                 1119

Frequency Table
Number of persons aged 0-4
Value Frequency Percent Valid Percent Cumulative Percent
0 1683 84.2 84.2 84.2
1 227 11.4 11.4 95.5
2 83 4.2 4.2 99.7
3 7 .4 .4 100.0
Total 2000 100.0 100.0
Bar Chart

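4.1.3 Number of Persons Aged 5-21 in the Household
Frequency Table
Number of persons aged 5-21
Value Frequency Percent Valid Percent Cumulative Percent
0 1345 67.3 67.3 67.3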
1 318 15.9 15.9 83.2
2 239 12.0 12.0 95.1
3 75 3.8 3.8 98.9
4 19 1.0 1.0 99.8
5 3 .2 .2 100.0
7 1 .1 .1 100.0
Total 2000 100.0 100.0

Bar Chart
4.1.4 Number of Workers in the Household
Frequencies
Statistics
Number of workers in the HH
N Valid 2000
Missing 0
Mean 1.18
Median 1.00
Mode 1
Std. Deviation .933
Variance .870
Skewness .679
Minimum 0
Maximum 7
Sum 2368
Number of workers in the HH
Value Frequency Percent Valid Percent Cumulative Percent
0 503 25.2 25.2 25.2
1 791 39.6 39.6 64.7
2 579 29.0 29.0 93.7
3 99 5.0 5.0 98.6
4 20 1.0 1.0 99.6
5 7 .4 .4 100.0
7 1 .1 .1 100.0
Total 2000 100.0 100.0

4.1.5 Household Income

Frequencies
Statistics
Household income
N Valid 2000
Missing 0
Mean 37091.19
Median 35789.00
Mode 17500
Variance 476254161.748
Skewness 1.063
Minimum 2500
Maximum 100000
Sum 74182384

Household income
Value Frequency Percent Valid Percent Cumulative Percent
2500 51 2.6 2.6 2.6
7500 139 7.0 7.0 9.5
12500 109 5.5 5.5 15.0
17500 151 7.6 7.6 22.5
22500 110 5.5 5.5 28.0
27500 128 6.4 6.4 34.4
31005 25 1.3 1.3 35.7
32100 46 2.3 2.3 38.0
32500 121 6.1 6.1 44.0
32665 25 1.3 1.3 45.3
35571 86 4.3 4.3 49.6
35789 18 .9 .9 50.4
36153 73 3.7 3.7 54.1
37500 122 6.1 6.1 60.2
39593 58 2.9 2.9 63.1
40565 97 4.9 4.9 68.0
42500 91 4.6 4.6 72.5
43371 98 4.9 4.9 77.4
47500 82 4.1 4.1 81.5
52500 59 2.9 2.9 84.5
57500 71 3.6 3.6 88.0
62500 37 1.8 1.8 89.9
67500 52 2.6 2.6 92.5
72500 30 1.5 1.5 94.0
77500 19 1.0 1.0 94.9
100000 102 5.1 5.1 100.0
Total 2000 100.0 100.0

4.1.6 Number of Drivers in the Household
Frequencies
Statistics
Number of drivers in the HH
N Valid 2000
Missing 0
Mean 1.67
Median 2.00
Mode 2
Std. Deviation .837
Variance .701
Skewness .607
Minimum 0
Maximum 5
Sum 3347
Number of drivers in the HH

Value Frequency Percent Valid Percent Cumulative Percent
0 105 5.3 5.3 5.3
1 739 37.0 37.0 42.2
2 928 46.4 46.4 88.6
3 172 8.6 8.6 97.2
4 44 2.2 2.2 99.4
5 12 .6 .6 100.0
Total 2000 100.0 100.0

4.1.7 Number of Automobiles in the Household
Frequencies
Statistics
Number of automobiles
N Valid 2000
Missing 0
Mean 1.79
Median 2.00
Mode 2
Variance 1.040
Skewness .769
Minimum 0
Maximum 8
Sum 3574
Number of automobiles
Value Frequency Percent Valid Percent Cumulative Percent
0 149 7.4 7.4 7.4
1 652 32.6 32.6 40.1
2 823 41.2 41.2 81.2
3 267 13.4 13.4 94.6
4 78 3.9 3.9 98.5
5 26 1.3 1.3 99.8
6 3 .2 .2 99.9
7 1 .1 .1 100.0
8 1 .1 .1 100.0
Total 2000 100.0 100.0

4.1.8 Number of Trips Made By the Household
Frequencies
Statistics
Number of trips made by a household
N Valid 2000
Missing 0
Mean 7.08
Median 6.00
Mode 4
Variance 39.212
Skewness 1.650
Minimum 0
Maximum 53
Sum 14164

Frequency Table
Number of trips made by a household
Value Frequency Percent Valid Percent Cumulative Percent
0 233 11.7 11.7 11.7
1 13 .7 .7 12.3
2 250 12.5 12.5 24.8
3 102 5.1 5.1 29.9
4 273 13.7 13.7 43.6
5 110 5.5 5.5 49.1
6 166 8.3 8.3 57.4
7 109 5.5 5.5 62.8
8 138 6.9 6.9 69.7
9 76 3.8 3.8 73.5
10 93 4.7 4.7 78.1
11 56 2.8 2.8 81.0
12 65 3.3 3.3 84.2
13 50 2.5 2.5 86.7
14 44 2.2 2.2 88.9
15 30 1.5 1.5 90.4
16 34 1.7 1.7 92.1
17 18 .9 .9 93.0
18 19 1.0 1.0 94.0
19 13 .7 .7 94.6
20 21 1.1 1.1 95.7
21 13 .7 .7 96.3
22 13 .7 .7 97.0
23 5 .3 .3 97.2
24 12 .6 .6 97.8
25 5 .3 .3 98.1
26 10 .5 .5 98.6
27 4 .2 .2 98.8
28 8 .4 .4 99.2
29 3 .2 .2 99.3
30 1 .1 .1 99.4
31 3 .2 .2 99.5
32 4 .2 .2 99.7
33 2 .1 .1 99.8
34 1 .1 .1 99.9
35 1 .1 .1 99.9
50 1 .1 .1 100.0
53 1 .1 .1 100.0
Total 2000 100.0 100.0

4.1.9 Summary of Descriptive Analyses

For continuous variables it is easier to use descriptive summary statistics such as the mean, median
and standard deviation, since counting every single value is impractical when there are hundreds of
distinct values. Continuous variables can also be grouped into categories, after which it becomes
meaningful to take a frequency distribution of the variable.
 Use the maximum to identify a possible outlier or a data-entry error. One of the simplest ways to
assess the spread of your data is to compare the minimum and maximum. If the maximum value
is very high, even when you consider the center, the spread, and the shape of the data, investigate
the cause of the extreme value.
 Use the minimum to identify a possible outlier or a data-entry error. One of the simplest ways to
assess the spread of your data is to compare the minimum and maximum. If the minimum value
is very low, even when you consider the center, the spread, and the shape of the data, investigate
the cause of the extreme value.
 The median and the mean both measure central tendency. But unusual values, called outliers, can
affect the median less than they affect the mean. If your data are symmetric, the mean and median
are similar.
 Use the mean to describe the sample with a single value that represents the center of the data.
Many statistical analyses use the mean as a standard measure of the center of the distribution of
the data.
 The mode is the value that occurs most frequently in a set of observations. Minitab also displays
how many data points equal the mode.
 Skewness is the extent to which the data are not symmetrical. Use skewness to help you establish
an initial understanding of your data.
 Use the standard deviation to determine how spread out the data are from the mean. A higher
standard deviation value indicates greater spread in the data.
 The variance measures how spread out the data are about their mean. The variance is equal to the
standard deviation squared. The greater the variance, the greater the spread in the data.

4.2 Correlations Analyses
Correlation analysis is a statistical method used to discover whether there is a relationship
between two variables or datasets, and how strong that relationship may be. Essentially, correlation
analysis is used for spotting patterns within datasets. A positive correlation means that both
variables increase together, while a negative correlation means that as one variable increases, the
other decreases. There are three common coefficients for measuring statistical correlation, named
after Spearman, Kendall, and Pearson. Pearson's coefficient is denoted r, while the rank coefficients
of Spearman and Kendall are usually denoted ρ (rho) and τ (tau). Spearman's rank coefficient and
Pearson's coefficient are the two most widely used, and the choice between them depends on the type
of data at hand. In this paper the Pearson correlation method is used to discover whether a
relationship exists, and how strong it is, between the number of trips made by the household and
each of the seven household characteristics (number of persons in the household, number of persons
aged 0-4 in the household, number of persons aged 5-21 in the household, number of workers in the
household, number of drivers in the household, household income, and household vehicle ownership).

Correlations
Pearson correlation between the number of trips made by a household and each household characteristic

Variable                               Pearson Correlation   Sig. (2-tailed)   N
Total number of persons in household   .554**                .000              2000
Number of persons aged 0-4             .125**                .000              2000
Number of persons aged 5-21            .533**                .000              2000
Number of workers in the HH            .438**                (not shown)       2000
Household income                       .247**                (not shown)       2000
Number of drivers in the HH            .433**                (not shown)       2000
Number of automobiles                  .356**                (not shown)       2000
**. Correlation is significant at the 0.01 level (2-tailed).

4.2.8 Summary of Correlations Analyses
The Pearson correlation is used in this paper. It is the most widely used correlation formula and
measures the strength of the linear relationship between the raw data of both variables, rather than
their ranks. It is a dimensionless coefficient, meaning that it does not depend on the units in which
the variables are measured, which is one reason why it is usually the first coefficient researchers
try. The correlation coefficient can range from -1 to +1, with -1 indicating a perfect negative
correlation, +1 indicating a perfect positive correlation, and 0 indicating no linear correlation at
all. By definition, a variable correlated with itself always has a correlation coefficient of 1. The
correlation coefficient can also be thought of as the extent to which the value of one variable can
be guessed from the value of the other. In a scatterplot of two positively correlated variables, the
points tend to lie along a line running from the bottom left to the upper right. The higher the
correlation, the closer the points lie to the line; the lower the correlation, the further they
scatter from it.
Interpreting results:
 Positive correlation: any score from +0.5 to +1 indicates a strong positive correlation, which
means that both variables increase at the same time. The line of best fit, or trend line, is placed
to best represent the data on the graph; in this case it follows the data points upwards.
 Negative correlation: any score from -0.5 to -1 indicates a strong negative correlation, which
means that as one variable increases, the other decreases. In this case the line of best fit slopes
downwards.
 No correlation: a score of 0 indicates that there is no linear correlation between the two
variables.
As the correlation tables above show, all household characteristics are positively correlated with
the number of trips made by the household. The number of persons in the household (r = .554), the
number of persons aged 5-21 (r = .533), the number of workers (r = .438), and the number of drivers
(r = .433) have a relatively high degree of correlation with the number of trips made by the
household. The number of automobiles (r = .356) and household income (r = .247) have a moderate
degree of correlation, and the number of persons aged 0-4 (r = .125) has the lowest degree of
correlation with the number of trips made by the household.
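
To make the ranking easier to compare, the short Python sketch below sorts the Pearson coefficients reported in this section and squares each one. The squared coefficient is the share of the variance in household trips that a one-predictor linear regression on that characteristic would explain. The variable names in the code are labels chosen for the illustration; only the r values come from the tables above.

```python
# Pearson correlations with number of trips made by the household,
# as reported in the correlation tables of this section.
reported_r = {
    "Total number of persons in household": 0.554,
    "Number of persons aged 0-4": 0.125,
    "Number of persons aged 5-21": 0.533,
    "Number of workers in the HH": 0.438,
    "Household income": 0.247,
    "Number of drivers in the HH": 0.433,
    "Number of automobiles": 0.356,
}

# r squared: share of variance in trips explained by a one-predictor linear model.
r_squared = {name: r * r for name, r in reported_r.items()}

# Rank the characteristics from strongest to weakest correlation.
ranked = sorted(reported_r.items(), key=lambda item: item[1], reverse=True)
strongest = ranked[0][0]
weakest = ranked[-1][0]

for name, r in ranked:
    print(f"{name:38s} r = {r:.3f}   r^2 = {r_squared[name]:.3f}")
```

Read this way, household size alone would account for roughly 31% of the variation in trips, while the number of persons aged 0-4 would account for under 2%.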

4.3 Normality or Skewness Analyses
Checking the normality assumption is necessary to decide whether a parametric or a non-parametric
test should be used. Descriptive statistics in SPSS can also describe the distribution of scores on
continuous variables through skewness and kurtosis. These statistics are important when using
parametric statistical techniques (t-tests, ANOVA, correlation or regression). Skewness indicates
whether the distribution is symmetric, while kurtosis provides information about the ‘peakedness’
of the distribution. In statistics, normality tests are used to determine whether a data set is well
modeled by a normal distribution. Many statistical procedures require that a distribution be normal
or nearly normal. There are both graphical and statistical methods for evaluating normality:
graphical methods include the histogram and the normality plot, while statistically, two numerical
measures of shape, skewness and excess kurtosis, can be used to test for normality. If the
distribution is perfectly normal, the skewness and excess kurtosis values are both 0 (a rather
uncommon occurrence in the social sciences). Positive skewness values indicate positive skew
(scores clustered to the left, at the low values). Negative skewness values indicate a clustering of
scores at the high end (the right-hand side of a graph). Most researchers consider data to be
approximately normal in shape if the skewness and kurtosis values lie between -1.0 and +1.0. To
display skewness and kurtosis in the output, select the Options button after entering the variables
in the Variable(s) list box, and tick Skewness and Kurtosis in the dialog box that appears.
Statistics
Variable                               N Valid   N Missing   Skewness   Std. Error of Skewness
Number of trips made by a household    2000      0           1.650      (not shown)
Total number of persons in household   2000      0           .943       (not shown)
Number of persons aged 0-4             2000      0           2.628      .055
Number of persons aged 5-21            2000      0           1.771      .055
Number of workers in the HH            2000      0           .679       .055
Number of drivers in the HH            2000      0           .607       .055
Household income                       2000      0           1.063      .055
Number of automobiles                  2000      0           .769       .055

1. Total Number of Persons in the Household
2. Number of Persons Aged 0-4 in the Household

3. Number of Persons Aged 5-21 in the Household
4. Number of Workers in the Household

5. Household Income
6. Number of Drivers in the Household

7. Number of Automobiles in the Household

Interpretation
You can use a histogram of the data overlaid with a normal curve to examine the normality of your
data. A normal distribution is symmetric and bell-shaped, as indicated by the curve. It is often
difficult to evaluate normality with small samples. A probability plot is best for determining the
distribution fit. Skewness is the extent to which the data are not symmetrical. Use skewness to help
you establish an initial understanding of your data.
 Symmetrical or non-skewed distributions: As data become more symmetrical, the skewness value
approaches zero. Normally distributed data by definition exhibit relatively little skewness: a line
drawn down the middle of a histogram of normal data splits it into two sides that mirror one
another. But lack of skewness alone does not imply normality, because a distribution can have two
sides that mirror one another and still be far from normally distributed.
 Positive or right skewed distributions: Positive skewed or right skewed data is so named because
the "tail" of the distribution points to the right, and because its skewness value will be greater than
0 (or positive). Salary data is often skewed in this manner: many employees in a company make
relatively little, while increasingly few people make very high salaries.
 Negative or left skewed distributions: Left skewed or negative skewed data is so named because
the "tail" of the distribution points to the left, and because it produces a negative skewness value.
Failure rate data is often left skewed. Consider light bulbs: very few will burn out right away, the
vast majority lasting for quite a long time.
 In statistics, skewness is a measure of the asymmetry of the probability distribution of a random
variable about its mean. In other words, skewness tells you the amount and direction of skew
(departure from horizontal symmetry). The skewness value can be positive or negative, or even
undefined. If skewness is 0, the data are perfectly symmetrical, although it is quite unlikely for
real-world data. As a general rule of thumb:
o If skewness is less than -1 or greater than 1, the distribution is highly skewed.
o If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed.
o If skewness is between -0.5 and 0.5, the distribution is approximately symmetric.
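
The rule of thumb can be applied mechanically to the skewness values reported in this section. The Python sketch below does so and also evaluates the usual large-sample formula for the standard error of skewness, sqrt(6N(N-1) / ((N-2)(N+1)(N+3))), which for N = 2000 reproduces the reported value of .055. The function names and the treatment of values exactly equal to 0.5 or 1 are assumptions made for the illustration.

```python
import math


def skewness_se(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def classify_skewness(skew):
    """Apply the rule of thumb; boundary values are assigned to the lower class."""
    size = abs(skew)
    if size > 1:
        return "highly skewed"
    if size > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Skewness values reported in the statistics tables of this paper.
reported_skewness = {
    "Number of trips made by a household": 1.650,
    "Total number of persons in household": 0.943,
    "Number of persons aged 0-4": 2.628,
    "Number of persons aged 5-21": 1.771,
    "Number of workers in the HH": 0.679,
    "Number of drivers in the HH": 0.607,
    "Household income": 1.063,
    "Number of automobiles": 0.769,
}

se = skewness_se(2000)
labels = {name: classify_skewness(s) for name, s in reported_skewness.items()}

print(f"Std. Error of Skewness for N = 2000: {se:.3f}")
for name, s in reported_skewness.items():
    print(f"{name:38s} skewness = {s:6.3f}  ->  {labels[name]}")
```

By this rule, the number of trips, the number of persons aged 0-4, the number of persons aged 5-21 and household income are highly skewed, while household size, workers, drivers and automobiles are moderately skewed. All eight variables are positively (right) skewed.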

4.4 Linear Regression Analyses
Simple linear regression is used to estimate the relationship between two quantitative variables. You
can use simple linear regression when you want to know how strong the relationship is between two
variables and the value of the dependent variable at a certain value of the independent variable. Linear
regression is the next step up after correlation. It is used when we want to predict the value of a
variable based on the value of another variable. The variable we want to predict is called the dependent
variable (or sometimes, the outcome variable). The variable we are using to predict the other variable's
value is called the independent variable (or sometimes, the predictor variable).
Regression models describe the relationship between variables by fitting a line to the observed data.
Linear regression models use a straight line, while logistic and nonlinear regression models use a
curved line. Regression allows you to estimate how a dependent variable changes as the independent
variable(s) change.
Assumptions of simple linear regression: Simple linear regression is a parametric test, meaning that
it makes certain assumptions about the data. These assumptions are:
 Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t
change significantly across the values of the independent variable.
 Independence of observations: the observations in the dataset were collected using statistically
valid sampling methods, and there are no hidden relationships among observations.
 Normality: the residuals (errors) of the model follow a normal distribution.
 Linearity: the relationship between the independent and dependent variable is linear, so the line
of best fit through the data points is a straight line (rather than a curve or some sort of
grouping factor). This can be checked by creating a scatterplot in SPSS Statistics of the
dependent variable against the independent variable.
 Level of measurement: the dependent variable and the independent variable are measured at the
continuous level.
 No significant outliers: an outlier is an observed data point whose dependent variable value is
very different from the value predicted by the regression equation.

Regression

Variables Entered/Removedᵃ
Model   Variables Entered                         Variables Removed   Method
1       Total number of persons in householdᵇ     .                   Enter
a. Dependent Variable: Number of trips made by a household
b. All requested variables entered.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .554ᵃ   .307       .307                5.214
a. Predictors: (Constant), Total number of persons in household

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   24061.191        1      24061.191     884.965   .000ᵇ
  Residual     54323.361        1998   27.189
  Total        78384.552        1999
b. Predictors: (Constant), Total number of persons in household

Coefficientsᵃ
                                         Unstandardized Coefficients   Standardized Coefficients
Model                                    B        Std. Error           Beta                        t        Sig.
1 (Constant)                             .530     .249                                             2.125    .034
  Total number of persons in household   2.451    .082                 .554                        29.748   .000
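The entries of the Model Summary and ANOVA tables are linked by simple arithmetic, which can be
checked with a short Python sketch. The input numbers are copied from the ANOVA table above for the
model with total number of persons in the household as predictor; the variable names are
illustrative only.

```python
# Values copied from the ANOVA table (predictor: total number of persons in household).
ss_regression = 24061.191
ss_residual = 54323.361
df_regression = 1
df_residual = 1998

ss_total = ss_regression + ss_residual          # total sum of squares
r_squared = ss_regression / ss_total            # share of variation explained
ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_statistic = ms_regression / ms_residual       # F ratio in the ANOVA table
std_error_estimate = ms_residual ** 0.5         # "Std. Error of the Estimate"

print(round(r_squared, 3), round(f_statistic, 3), round(std_error_estimate, 3))
```

The same relationships hold for each of the regression outputs in this section, so any of them can
be checked by substituting its own sums of squares and degrees of freedom.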


Regression

Variables Entered/Removedᵃ
Model   Variables Entered               Variables Removed   Method
1       Number of persons aged 0-4ᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .125ᵃ   .016       .015                6.214
a. Predictors: (Constant), Number of persons aged 0-4

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F        Sig.
1 Regression   1232.614         1      1232.614      31.921   .000ᵇ
  Residual     77151.938        1998   38.615
  Total        78384.552        1999
b. Predictors: (Constant), Number of persons aged 0-4

Coefficientsᵃ
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B        Std. Error           Beta                        t        Sig.
1 (Constant)                   6.768    .150                                             45.229   .000
  Number of persons aged 0-4   1.516    .268                 .125                        5.650    .000

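Regression

Variables Entered/Removed
Model   Variables Entered                Variables Removed   Method
1       Number of persons aged 5-21ᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .533ᵃ   .284       .283                5.301
a. Predictors: (Constant), Number of persons aged 5-21

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   22240.097        1      22240.097     791.453   .000ᵇ
  Residual     56144.455        1998   28.100
  Total        78384.552        1999
b. Predictors: (Constant), Number of persons aged 5-21

Coefficientsᵃ
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B        Std. Error           Beta                        t        Sig.
1 (Constant)                    5.088    .138                                             36.840   .000
  Number of persons aged 5-21   3.564    .127                 .533                        28.133   .000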

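Regression

Variables Entered/Removed
Model   Variables Entered                Variables Removed   Method
1       Number of workers in the HHᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .438ᵃ   .192       .192                5.630
a. Predictors: (Constant), Number of workers in the HH

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   15050.109        1      15050.109     474.783   .000ᵇ
  Residual     63334.443        1998   31.699
  Total        78384.552        1999
b. Predictors: (Constant), Number of workers in the HH

Coefficientsᵃ
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B        Std. Error           Beta                        t        Sig.
1 (Constant)                    3.598    .204                                             17.681   .000
  Number of workers in the HH   2.942    .135                 .438                        21.790   .000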

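Regression

Variables Entered/Removed
Model   Variables Entered     Variables Removed   Method
1       Household incomeᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .247ᵃ   .061       .060                6.070
a. Predictors: (Constant), Household income

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   4770.041         1      4770.041      129.466   .000ᵇ
  Residual     73614.511        1998   36.844
  Total        78384.552        1999
b. Predictors: (Constant), Household income

Coefficientsᵃ
                     Unstandardized Coefficients   Standardized Coefficients
Model                B          Std. Error         Beta                        t        Sig.
1 (Constant)         4.457      .268                                           16.647   .000
  Household income   7.078E-5   .000               .247                        11.378   .000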

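Regression

Variables Entered/Removed
Model   Variables Entered                Variables Removed   Method
1       Number of drivers in the HHᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .433ᵃ   .188       .187                5.646
a. Predictors: (Constant), Number of drivers in the HH

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   14700.773        1      14700.773     461.219   .000ᵇ
  Residual     63683.779        1998   31.874
  Total        78384.552        1999
b. Predictors: (Constant), Number of drivers in the HH

Coefficientsᵃ
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B        Std. Error           Beta                        t        Sig.
1 (Constant)                    1.663    .282                                             5.892    .000
  Number of drivers in the HH   3.238    .151                 .433                        21.476   .000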

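Regression

Variables Entered/Removed
Model   Variables Entered          Variables Removed   Method
1       Number of automobilesᵇ     .                   Enter

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .356ᵃ   .127       .126                5.853
a. Predictors: (Constant), Number of automobiles

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   9934.490         1      9934.490      289.979   .000ᵇ
  Residual     68450.062        1998   34.259
  Total        78384.552        1999
b. Predictors: (Constant), Number of automobiles

Coefficientsᵃ
                          Unstandardized Coefficients   Standardized Coefficients
Model                     B        Std. Error           Beta                        t        Sig.
1 (Constant)              3.176    .264                                             12.026   .000
  Number of automobiles   2.186    .128                 .356                        17.029   .000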

4.4.8 Summary of Linear Regression Analyses
The regression models above describe the relationship between variables by fitting a straight line
to the observed data, which allows us to estimate how the dependent variable changes as the
independent variable changes. An independent variable, sometimes called an experimental or predictor
variable, is a variable that is manipulated in an experiment (or, in survey data such as ours, simply
observed) in order to examine its effect on a dependent variable, sometimes called an outcome
variable. The formula for a simple linear regression is:
y = B0 + B1x + e
 y is the predicted value of the dependent variable for any given value of the independent
variable (x).
 B0 is the intercept, the predicted value of y when x is 0.
 B1 is the regression coefficient: how much we expect y to change when x increases by one unit.
 x is the independent variable (the variable we expect is influencing y).
 e is the error term (residual): the part of y that is not explained by the fitted line.
Linear regression finds the line of best fit through the data by searching for the intercept (B0)
and regression coefficient (B1) that minimize the total squared error of the model.
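As an illustration of how the line of best fit is found, the following Python sketch computes the
least-squares intercept and slope in closed form. It is a minimal sketch, not the SPSS procedure
itself: the function name is hypothetical and the five data points are invented toy values, not
records from the survey.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must vary for the slope to be defined")
    b1 = sxy / sxx               # regression coefficient (slope)
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


# Hypothetical toy data: household size and number of trips.
household_size = [1, 2, 3, 4, 5]
trips = [3, 5, 8, 10, 13]
b0, b1 = fit_simple_linear_regression(household_size, trips)
print(f"trips = {b0:.2f} + {b1:.2f} * household_size")
```

With real data, the values of b0 and b1 obtained this way correspond to the "B" column of the SPSS
Coefficients table.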
SPSS Statistics generates quite a few tables of output for a linear regression. In this section, we
discuss only the three main tables required to understand the results of the linear regression
procedure, assuming that no assumptions have been violated.
 The first table of interest is the Model Summary table, which provides the R and R2 values. The
R value (the "R" column) represents the simple correlation. According to the tables above,
number of persons in the household (R = .554) and number of persons aged 5-21 in the household
(R = .533) have a relatively high degree of correlation with number of trips made by the
household; number of workers in the household (R = .438), number of drivers in the household
(R = .433), and number of automobiles in the household (R = .356) have a moderate degree of
correlation; and household income (R = .247) and number of persons aged 0-4 in the household
(R = .125) have the lowest degree of correlation with number of trips made by the household.

 The R2 value (the "R Square" column) indicates how much of the total variation in the
dependent variable can be explained by the independent variable. In our case, number of
persons aged 0-4 in the household (1.6%), household income (6.1%), number of automobiles in
the household (12.7%), number of drivers in the household (18.8%), and number of workers in
the household (19.2%) each explain only a low share of the variation in number of trips;
number of persons in the household (30.7%) and number of persons aged 5-21 in the household
(28.4%) each explain a moderate share.
 The next table is the ANOVA table, which reports how well the regression equation fits the
data (i.e., predicts the dependent variable). The "Sig." column of the "Regression" row gives
the statistical significance of the regression model that was run. In all seven models,
p < 0.0005, which is less than 0.05 and indicates that, overall, the regression model
statistically significantly predicts the outcome variable. Statistical significance alone does
not imply a high R2, so this result should be read together with the Model Summary table.
 The third table, the Coefficients table, provides the information needed to predict the
dependent variable from the independent variable (the values in the "B" column under
"Unstandardized Coefficients"), as well as to determine whether the independent variable
contributes statistically significantly to the model (the "Sig." column). According to the
tables above, every independent variable contributes statistically significantly to its model
(p < 0.0005 in each case).
 The most important statistic about each model as a whole is its p value. Here it is significant
in every case (p < 0.001), which means that each predictor explains more of the variation in
the observed data than a model with no predictor.
 It can also be helpful to include a graph with the results. For a simple linear regression, one
can plot the observations on the x and y axes and then add the regression line and the
regression function.

4.5 Multiple Regression Analyses
Multiple regression analysis allows for investigating the relationship between one dependent
variable and several independent variables. An independent variable is an input, driver or factor
that has an impact on the dependent variable (which can also be called an outcome). A variable is
not only something that we measure, but also something that we can manipulate and something we can
control for. Variables can further be characterized as either categorical or continuous, and research
can be either experimental or non-experimental; the present study is non-experimental, because the
household characteristics are observed through a survey rather than manipulated.
Assumptions: Multiple linear regression follows the same logic as simple linear regression, except
that there is more than one independent variable and there should be no collinearity among the
independent variables. Multiple regression analyses are also affected by sample size, missing data
and the nature of the sample.
 A small sample may only reveal connections among variables with a strong relationship.
Therefore, the sample size must be chosen based on the number of independent variables and
the expected strength of the relationship.
 Many missing values in the data set reduce the effective sample size. Therefore, all missing
values should be adequately dealt with before conducting regression analyses.
 Subsamples within the larger sample may mask the actual relationship between the independent
and dependent variables. Therefore, if subsamples are predefined, a regression within each
subsample could be used to detect true relationships. Otherwise, the analysis should be
undertaken on the whole sample.
Regression

Model   Variables Entered                                    Variables Removed   Method
1       Number of automobiles, Number of persons aged 0-4,   .                   Enter
        Number of persons aged 5-21, Household income,
        Number of workers in the HH, Number of drivers in
        the HH, Total number of persons in householdᵇ

Model Summaryᵇ
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .632ᵃ   .399       .397                4.862

Change Statistics
Model   R Square Change   F Change   df1   df2    Sig. F Change
1       .399              189.116    7     1992   .000
a. Predictors: (Constant), Number of automobiles, Number of persons aged 0-4, Number of persons aged 5-21,
Household income, Number of workers in the HH, Number of drivers in the HH, Total number of persons in
household
b. Dependent Variable: Number of trips made by a household

ANOVAᵃ
Model          Sum of Squares   df     Mean Square   F         Sig.
1 Regression   31294.464        7      4470.638      189.116   .000ᵇ
  Residual     47090.088        1992   23.640
  Total        78384.552        1999
b. Predictors: (Constant), Number of automobiles, Number of persons aged 0-4, Number of persons aged 5-21,
Household income, Number of workers in the HH, Number of drivers in the HH, Total number of persons in
household

Coefficientsᵃ
                                         Unstandardized          Standardized
                                         Coefficients            Coefficients                      Collinearity Statistics
Model                                    B          Std. Error   Beta           t        Sig.     Tolerance   VIF
1 (Constant)                             .575       .324                        1.777    .076
  Total number of persons in household   .454       .184         .103           2.460    .014     .173        5.764
  Number of persons aged 0-4             .355       .289         .029           1.228    .220     .528        1.894
  Number of persons aged 5-21            2.325      .209         .348           11.145   .000     .310        3.224
  Number of workers in the HH            1.036      .156         .154           6.657    .000     .561        1.783
  Number of drivers in the HH            1.044      .206         .140           5.066    .000     .397        2.519
  Household income                       1.495E-5   .000         .052           2.713    .007     .818        1.222
  Number of automobiles                  .219       .151         .036           1.450    .147     .499        2.003
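The two collinearity columns of the Coefficients table carry the same information, because the
variance inflation factor is the reciprocal of the tolerance (VIF = 1 / Tolerance). The Python
sketch below recomputes the VIF from the tolerances reported above; small differences from the SPSS
values arise because the tolerances are rounded to three decimals. The cut-off of 5 used to flag a
variable is a common rule of thumb and an assumption here, not a threshold stated by SPSS.

```python
# Tolerance values copied from the Coefficients table above.
tolerance = {
    "Total number of persons in household": 0.173,
    "Number of persons aged 0-4": 0.528,
    "Number of persons aged 5-21": 0.310,
    "Number of workers in the HH": 0.561,
    "Number of drivers in the HH": 0.397,
    "Household income": 0.818,
    "Number of automobiles": 0.499,
}

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

VIF_THRESHOLD = 5  # assumed rule-of-thumb cut-off
flagged = [name for name, value in vif.items() if value > VIF_THRESHOLD]

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}")
print("Possible collinearity concern:", flagged)
```

Under this assumed cut-off only the total number of persons in the household is flagged, which is
plausible because household size is partly made up of the age-group, worker and driver counts.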

Collinearity Diagnosticsᵃ
                                                     Variance Proportions
                                                     Total number    Number of   Number of   Number of   Number of
                   Eigen-   Condition                of persons in   persons     persons     workers     drivers     Household   Number of
Model  Dimension   value    Index       (Constant)   household       aged 0-4    aged 5-21   in the HH   in the HH   income      automobiles
1      1           5.896    1.000       .00          .00             .00         .00         .00         .00         .00         .00
       2           .841     2.647       .00          .00             .50         .00         .00         .00         .00         .00
       3           .641     3.033       .01          .00             .00         .29         .00         .00         .02         .01
       4           .242     4.938       .06          .00             .00         .01         .54         .01         .19         .00
       5           .165     5.975       .07          .01             .00         .00         .18         .04         .69         .07
       6           .124     6.889       .36          .01             .02         .00         .10         .00         .04         .48
       7           .062     9.748       .15          .00             .00         .00         .17         .82         .06         .41
       8           .029     14.290      .34          .98             .47         .69         .01         .11         .00         .02
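The Condition Index column of the Collinearity Diagnostics table can be derived from the eigenvalues:
each index is the square root of the largest eigenvalue divided by the eigenvalue of that dimension.
The Python sketch below recomputes it from the eigenvalues reported above; because those are rounded
to three decimals, the results differ slightly from the SPSS column. The interpretation bands in the
comments (roughly 10 to 30 for moderate and above 30 for serious collinearity) are a commonly quoted
rule of thumb and should be read as an assumption.

```python
import math

# Eigenvalues copied from the Collinearity Diagnostics table (dimensions 1 to 8).
eigenvalues = [5.896, 0.841, 0.641, 0.242, 0.165, 0.124, 0.062, 0.029]

# Condition index of each dimension: sqrt(largest eigenvalue / eigenvalue).
largest = max(eigenvalues)
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

for dimension, index in enumerate(condition_indices, start=1):
    print(f"Dimension {dimension}: condition index = {index:.3f}")

# Assumed rule of thumb: about 10-30 suggests moderate, above 30 serious collinearity.
worst = max(condition_indices)
print("Largest condition index:", round(worst, 2))
```

The largest index, about 14, belongs to dimension 8, where total number of persons in the household
carries a variance proportion of .98 together with sizeable proportions for the two age-group counts.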
Residuals Statisticsa
Minimum Maximum Mean Std. Deviation N
Predicted Value 1.07 29.87 7.08 3.957 2000
Residual -19.872 39.333 .000 4.854 2000
Std. Predicted Value -1.520 5.760 .000 1.000 2000
Std. Residual -4.087 8.090 .000 .998 2000

Charts

Summary and Interpretations
Regression analysis is a well-known statistical technique for fitting mathematical relationships
between dependent and independent variables. In the case of a trip generation equation, the dependent
variable is the number of trips and the independent variables are the various factors that influence
trip generation. These independent variables are the land use and socioeconomic characteristics
discussed earlier.
Furthermore, regression analysis has four primary purposes: description, estimation, prediction and
control. By description, regression can explain the relationship between dependent and independent
variables. Estimation means that by using the observed values of independent variables, the value of
the dependent variable can be estimated. Regression analysis can be useful for predicting the outcomes
and changes in dependent variables based on the relationships of dependent and independent
variables. Finally, regression enables controlling for the effect of one or more independent variables
while investigating the relationship of one independent variable with the dependent variable.
Multiple regression assumes that the distribution of the dependent data is normal, that there is a
linear relationship between dependent and independent variables, and that the independent variables
are not highly correlated, because high correlation among the independent variables may distort the
estimated relationship between each independent variable and the dependent variable.
 The first table of interest is the Model Summary table. This table provides the R, R2, adjusted
R2, and the standard error of the estimate, which can be used to determine how well a
regression model fits the data. The "R" column represents the value of R, the multiple
correlation coefficient. R can be considered to be one measure of the quality of the prediction
of the dependent variable; in this case, number of trips made by the household. A value of
0.632 indicates a moderately good level of prediction.
 The "R Square" column represents the R2 value (also called the coefficient of determination),
which is the proportion of variance in the dependent variable that can be explained by the
independent variables. Our value of 0.399 shows that the independent variables explain 39.9%
of the variability of the dependent variable, number of trips made by the household. The
"Adjusted R Square" (adj. R2), which corrects R2 for the number of predictors in the model,
is 0.397.
 Statistical significance: the F-ratio in the ANOVA table (see above) tests whether the overall
regression model is a good fit for the data. The table shows that the independent variables
statistically significantly predict the dependent variable, F (7, 1992) = 189.116, p < .0005,
i.e., the regression model is a good fit of the data.
 Estimated model coefficients: The general form of the equation to predict number of trips
made by the household from seven household characteristics (number of persons in the
household, number of persons aged 0-4 in the household, number of persons aged 5-21 in the
household, number of workers in the household, number of drivers in the household,
household income, and household vehicle ownership) is:
Predicted ntrip = 0.575 + (0.454 x hhsize) + (0.355 x num0to4) + (2.325 x num5to21) +
(1.036 x numwork) + (1.044 x numdrive) + (1.495E-5 x income) + (0.219 x numcars)

 Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Correspondingly,
we can test for the statistical significance of each of the independent variables. This tests
whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the
population. If p < .05, we can conclude that the coefficient is statistically significantly
different from 0 (zero). The t-value and corresponding p-value are located in the "t" and
"Sig." columns. According to the Coefficients table, number of persons aged 5-21 in the
household (B = 2.325), number of drivers in the household (B = 1.044), and number of workers
in the household (B = 1.036) have highly significant positive effects on number of trips made
by the household (p < .0005). Household income (p = .007) and total number of persons in the
household (p = .014) are also statistically significant at the .05 level, although their
standardized effects are smaller. Number of persons aged 0-4 in the household (p = .220) and
number of automobiles in the household (p = .147) are not statistically significant once the
other variables are held constant.
 Generally, a multiple regression was run to predict number of trips made by the household
from number of persons in the household, number of persons aged 0-4 in the household, number
of persons aged 5-21 in the household, number of workers in the household, number of drivers
in the household, household income, and household vehicle ownership. These variables
statistically significantly predicted ntrip, F (7, 1992) = 189.116, p < .0005, i.e., the
regression model is a good fit of the data. Five variables added statistically significantly
to the prediction (p < .05), and two variables, number of persons aged 0-4 and number of
automobiles, did not (p > .05).
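The fitted equation and the adjusted R2 discussed in this section can be reproduced with a short
Python sketch. The intercept and coefficients are taken from the "B" column of the Coefficients
table, and the variable names (hhsize, num0to4 and so on) follow the equation above. The function
names and the example household are hypothetical, and household income is assumed to be expressed in
the same units as in the survey data.

```python
# Unstandardized coefficients (B) from the multiple regression Coefficients table.
INTERCEPT = 0.575
COEFFICIENTS = {
    "hhsize": 0.454,      # total number of persons in household
    "num0to4": 0.355,     # number of persons aged 0-4
    "num5to21": 2.325,    # number of persons aged 5-21
    "numwork": 1.036,     # number of workers
    "numdrive": 1.044,    # number of drivers
    "income": 1.495e-5,   # household income
    "numcars": 0.219,     # number of automobiles
}


def predict_trips(household):
    """Predicted number of trips for a dict of household characteristics."""
    return INTERCEPT + sum(b * household[name] for name, b in COEFFICIENTS.items())


def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical household: four persons, two of them aged 5-21, two workers,
# two drivers, an income of 40000 and two automobiles.
example = {"hhsize": 4, "num0to4": 0, "num5to21": 2, "numwork": 2,
           "numdrive": 2, "income": 40000, "numcars": 2}

print(round(predict_trips(example), 3))
print(round(adjusted_r_squared(0.399, 2000, 7), 3))
```

Because every coefficient in the table is positive, increasing any household characteristic raises
the predicted number of trips, although the increase is statistically reliable only for the
variables with p < .05.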
Therefore, regression analysis is a powerful and useful statistical procedure with many implications
for travel behavior research. It enables researchers to describe, predict and estimate the
relationships and draw plausible conclusions about the interrelated variables in relation to any
studied phenomenon. Regression also allows for controlling one or more variables when researchers are
interested in examining the relationship among specific variables. While planning and conducting
regression analysis, researchers should consider the type and number of dependent and independent
variables as well as the nature and size of the sample. Choosing the wrong type of regression
analysis, or using too small a sample, may result in erroneous conclusions about the studied
phenomenon.

5. Conclusion
This study was conducted to examine how household characteristics influence the number of trips made
by households in the study area. To determine the relationships between household characteristics and
number of trips made by the household, data from the 1990 US Nationwide Personal Transportation
Survey (NPTS) were used. The main data were a sample of 2000 household travel survey records drawn
from the 1990 NPTS. The records provide information on total number of persons in the household,
number of persons aged 0-4 in the household, number of persons aged 5-21 in the household, number of
workers in the household, number of drivers in the household, household income, number of automobiles
in the household, and number of trips made by the household. A total of 2000 households, 5347 people,
and 14,164 trips were ultimately described.
This paper found that household characteristics are significantly important for predicting the number
of trips made by the household, which suggests that household composition is an important indicator
for trip pattern analysis. Additionally, in order to capture this household behavior, the study argues
that it is necessary to apply simple statistical analytical methods to determine the relationships
between household characteristics variables and travel attributes. Therefore, descriptive,
correlation, and regression analyses were used to explore the character, relationships, and
correlation between household travel characteristics and number of trips made by the households.
The correlation analysis indicates that all household characteristics variables are positively
correlated with number of trips made by the household, but number of persons aged 0-4 in the
household has the lowest degree of correlation relative to the other characteristics. According to
the multiple regression results, number of persons aged 0-4 in the household and number of
automobiles in the household are not statistically significant once the other characteristics are
held constant, whereas number of persons aged 5-21 in the household, number of workers in the
household, number of drivers in the household, household income, and total number of persons in the
household all have a statistically significant positive effect on number of trips made by the
household, with number of persons aged 5-21 having the largest effect. Therefore, the output of this
paper will be helpful to academic knowledge and enable understanding of the subject matter, as it
paves the way for further investigation of the issues and indicates constraints, low standards, and
challenges to the related sectors and concerned offices for efficient budget allocation and resource
management.



Beijing Jiaotong University, China 31

Beijing Jiaotong University, China 32

1. Total Number of Persons in the Household