
Farmers' Exchange

Experiment Design

Team:

Matt Gedigian, Jessica Santana, Joyce Tsai

Project:

Farmers' Exchange

Sessions:

4/12 7:00-9:45
• The Berkeley team further discussed implementation and brainstormed experiment
design.
4/13 3:00-3:30
• Matt, Joyce, and Neil discussed the experiment design and the interactive prototype
over the phone.

Contributions:

All: Brainstormed the experiment design, including qualitative and quantitative methods, which users to recruit, where to test, and the format of the test.

Matt: Control conditions, experiment variants


Jessica: Quantitative and qualitative evaluation, tasks
Joyce: Control conditions, experiment variants, evaluation

Tasks
The primary goal of our experiment with this prototype is to determine how farmers react to
the system components, namely the phone tree structure and the search function. We have
simplified our prototype to test these factors, removing extraneous options. Testing this
stripped-down version will allow us to get feedback on the basic design and to evaluate two
different designs. Although we have designed a more comprehensive system capable of
handling some edge cases, we chose not to test this because it requires more complex
scenarios for the test subjects. We have also added a second search structure that allows
the user to browse questions and answers based on topic hierarchy. By allowing the user to
test both the hierarchical browsing and the keyword search, we can identify user
preferences and areas for improvement. After validating the basic design and determining
preferences, we can pursue refinements which enhance features and usability in future
work.
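
For reference, a minimal sketch of the simplified phone tree we plan to test is shown below. The menu labels, key assignments, and topic hierarchy are placeholders for illustration, not the final prompts or wording:

    # Hypothetical sketch of the stripped-down Farmers' Exchange phone tree.
    # Menu labels, key assignments, and topics are placeholders, not final prompts.
    MAIN_MENU = {
        "1": "ask",        # record a new question
        "2": "retrieve",   # listen to answers to your questions
        "3": "browse",     # browse existing questions and answers
    }

    BROWSE_MENU = {
        "1": "keyword_search",   # say a keyword, hear matching questions and answers
        "2": "topic_hierarchy",  # step through topics, e.g. Pests -> Pesticides -> Organic
    }

    # Placeholder topic hierarchy used by the hierarchical browse option.
    TOPIC_HIERARCHY = {
        "Pests": {"Pesticides": ["Organic", "Conventional"]},
        "Irrigation": {},
        "Soil": {},
    }
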
Our other goal in designing these tasks is to learn more about the farmer's mental model in
approaching the system. We have incorporated open-ended tasks as well as pre-determined
scenarios to learn what the farmer would expect to do in each case without external
guidance. In this manner, we can observe the questions that farmers ask, whether they provide enough information when posing a question or will more often need to follow up with additional details, and whether the answer provided will suffice or the farmer will need more information.

We will have the participants complete an informed consent form at the start of the session.
Before the first task, we will briefly describe the system to the farmer to appropriately set
their expectations. We will then ask the farmer a series of short questions about
demographics and how they currently share and receive agricultural advice (which crops
they produce, how long they've farmed, where they are from, where they get most of their
agricultural advice, how they share advice and in what cases). We will explain that they are
to complete the tasks on their own and that we will not intervene or provide assistance.
Each participant will perform all four tasks. We will ask questions about each task
immediately after it is completed, as well as a set of overall questions at the end.

TASK 1. Leave a question.


Proposed scenario: Your crops [we can personalize this based on the farmer] are wilting
and you aren't sure why. You need to find out what could be the cause. Call the Farmers'
Exchange phone number and leave a question about this problem.
TASK 2. Retrieve an answer.
Farmers' Exchange will call to notify you that your question has been answered. Follow
the system's instructions to retrieve your answer.
TASK 3. Browse for a specific topic using keyword search.
Proposed scenario: You are searching for information on which pesticides you can use on
your organic crops. Call the Farmers' Exchange phone number and search for related
questions and answers.
TASK 4. Browse for a specific topic using hierarchical search.
Proposed scenario: You are searching for information on which pesticides you can use on
your organic crops. Call the Farmers' Exchange phone number and browse for this topic.

Control Conditions
Since there is no existing technology serving the needs of our target users, there is no
obvious control condition. One possibility would be to compare our service against a farmer
directly asking an advisor, interpreter, or fellow farmer for information. This is problematic
for a few different reasons. There is a large degree of variability in the existing methods of
getting information, so we would have to essentially choose how well we want the
competing method to perform. In our test setting, we can't simulate the process taking
several days (leaving messages, playing phone tag, asking follow-up questions). For these
reasons, we chose not to include a control condition, and instead we will use A/B testing (to
compare different designs) and ask subjects to compare these methods to their existing
alternatives.

Our experiment variants are: 1) whether users try asking first and then browsing, or vice versa,
and 2) comparing browsing-via-search with browsing-via-navigation. We are switching the
order for who asks first and who browses first so that we can get a set of users who are
effectively new to the system for both ask and browse. For browsing-via-search and
browsing-via-navigation, we will test both versions of browse on all users. This is based on
research that demonstrates that users who see more than one version of a prototype are
more likely to critique the prototype and offer negative opinions. Since the browse
functionality of the phone system is the least familiar, we want to be particularly sure of
getting the interaction design for it right.
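
As a concrete illustration, the counterbalanced assignment could look like the sketch below. The participant IDs and the alternation rule are placeholders; every participant still performs all four tasks and tries both browse versions:

    # Hypothetical counterbalancing sketch: alternate which variant each
    # participant encounters first, so each condition is seen "fresh" by
    # roughly half of the participants.
    participants = ["P1", "P2", "P3", "P4", "P5", "P6"]  # placeholder IDs

    for i, p in enumerate(participants):
        ask_or_browse_first = "ask" if i % 2 == 0 else "browse"
        browse_version_first = "keyword_search" if (i // 2) % 2 == 0 else "topic_hierarchy"
        print(p, ask_or_browse_first, browse_version_first)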

Users
We will recruit at least 6 farmers to test our phone interface. Although the Farmers'
Exchange system is designed for low-English-proficiency farmers, we will do our initial
testing on English-speaking farmers. However, like our target audience, the test users will
also be small farmers who do not use the internet frequently. Testing with LEP farmers
would necessitate hiring or otherwise finding an interpreter, and because this is our first
round of testing with actual users, we want to have a more polished version of the system
before spending resources on bringing in interpreters. Also, one of our partner groups, CAFF (Community Alliance with Family Farmers), recently lost their Hmong interpreter.

We plan on recruiting test participants via the Small Farm Program and CAFF. We will not be
offering monetary compensation, although we will bring cookies. Since each of our farmer
testers will be completing four tasks, we anticipate some learning fatigue. However,
because we anticipate each task taking less than a minute to complete, the learning fatigue
should not be very severe. Furthermore, as our farmers will be using the system when they
are working or at the end of the day, when they are exhausted, we would like to know how
their usage differs when they have just started testing versus when learning fatigue has set
in.

If our advisor and interpreter phone interface is ready, we plan on demoing it at a FarmLink
meeting on 4/20. However, for the scope of this assignment, we will be focusing on testing
the farmer interface.

Evaluation
The success of our system depends on how quickly a user can complete their task and the
amount of value this task adds to the user's work. If a user has to repeat a prompt or has
difficulty finding their destination, the user will take more time to complete the task. Here
we equate non-completion with infinite time.

We will measure time-based user behavior with a stop-watch and will also have the system
log each conversation as a back-up. We will measure satisfaction using survey techniques
(binary and gauged response) as well as open-ended questions.

Quantitative Metrics

Completion Rate:
• How many participants complete each task.
• Measured by "Proportion of Tasks Completed per User" and "Average User Completion Rate"
Time Spent on Each Task:
• How long each participant takes to complete each task.
• Accounts for consistency of a participant in taking a longer or shorter amount of
time on all tasks.
Errors:
• Number of invalid keys pressed or invalid commands.
• Number of failures to respond where a response is required, resulting in repeated prompts.
• Accounting for noise, including side comments.
• Number of unrecognized voice commands.
Assists:
• Number of requested assists.

The quantitative metrics we will analyze include completion rate, time spent on each task,
error rate, and assist rate. Completion rate is calculated based on how many participants
complete each task. Completion rate consists of both the proportion of tasks completed per
user and the average user completion rate. We will also ask participants questions in the
qualitative survey about tasks that took longer to complete -- why it took longer and how
we might improve the system.
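
As a sketch of how these two completion measures might be computed from the session records (the data layout below is an assumption for illustration only):

    # Hypothetical completion records: completed[participant][task] = True/False.
    completed = {
        "P1": {"ask": True, "retrieve": True, "search": True, "topics": False},
        "P2": {"ask": True, "retrieve": False, "search": True, "topics": True},
    }

    # Proportion of tasks completed per user.
    per_user = {p: sum(tasks.values()) / len(tasks) for p, tasks in completed.items()}

    # Average user completion rate across all participants.
    average_user_completion_rate = sum(per_user.values()) / len(per_user)

    # Per-task completion rate: how many participants completed each task.
    task_names = ["ask", "retrieve", "search", "topics"]
    per_task = {t: sum(completed[p][t] for p in completed) / len(completed)
                for t in task_names}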

Time spent on each task is calculated based on how long each participant takes to complete
each task. Some participants may consistently spend more time on every task than others. We will weight the results to account for this consistent difference in participant pace across tasks.
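
One way this weighting might be implemented, sketched below, is to express each task time relative to the participant's own average pace, so that consistently slow or fast participants do not dominate per-task comparisons; the normalization rule and the sample times are assumptions, not a committed analysis plan:

    # Hypothetical task times in seconds; None marks non-completion
    # (the "infinite time" case, reported separately rather than averaged).
    times = {
        "P1": {"ask": 40, "retrieve": 25, "search": 55, "topics": None},
        "P2": {"ask": 70, "retrieve": 45, "search": 80, "topics": 90},
    }

    normalized = {}
    for p, task_times in times.items():
        finished = [t for t in task_times.values() if t is not None]
        participant_mean = sum(finished) / len(finished)
        # Each time expressed relative to the participant's own average pace.
        normalized[p] = {task: (t / participant_mean if t is not None else None)
                         for task, t in task_times.items()}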

Error rate is calculated based on the number of invalid keys pressed or invalid voice
commands, the number of failures to respond where a response is required (resulting in
repeated prompts), and the number of unrecognized voice commands. These errors will be
divided into participant-based error and system-based error. We will account for noise, such
as side comments, in our analysis.
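
A small sketch of how the error tally might be kept during analysis; the category labels below are our own working labels, not system output:

    from collections import Counter

    # Hypothetical per-session error log: (source, kind) pairs noted by the observer.
    error_log = [
        ("participant", "invalid_key"),
        ("participant", "no_response"),     # silence where a response was required
        ("system", "unrecognized_voice"),   # valid speech the system failed to recognize
    ]

    errors_by_source = Counter(source for source, _ in error_log)
    errors_by_kind = Counter(kind for _, kind in error_log)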

Finally, assist rate is calculated based on the number of requested assists. We will explain to
the participant before the tasks that we are unable to assist them, but we still anticipate
participants requesting assistance if they become confused. We will signal to the participant
that we cannot assist them, and will stop the experiment if they are obviously unable to
complete the task. Any signal from the user requesting assistance will be counted as an
assist.

Qualitative Metrics

Degree of Experiment Reality


• Was there anything you would change about the question you asked?
• On a scale of 1 to 5, with 1 being least important and 5 most important, how
important would you say the question you asked is to your work? Why?
Work Flow Disruption Level
• Would you call the Farmers’ Exchange phone number while you are working
outside, when you are not working, both, or neither? Why?
• What activities can you imagine yourself doing when you decide to ask a question
like this?
Satisfaction/Engagement/Interest
• What kinds of questions would you have after you received the answer to your
question? How would you look for answers to such questions?
• How can we make Farmers’ Exchange easier to use?
• What did you like most about the system?
Frustration
• What did you like least about the system?
Likelihood to Use Again
• Would you call this number again?
• Would you use the browse feature again? In what instance can you imagine using
it?
• Which browse feature (keyword search or topic browse) did you prefer? Why?
Likelihood to Tell Others
• Would you tell your friends and colleagues about Farmers’ Exchange? Why or why
not?

The qualitative metrics we will analyze are packaged into a qualitative survey that we will administer to each participant after each experiment. These questions can be categorized as
the degree of experiment reality, the work flow disruption level, satisfaction, frustration,
likelihood to use again, and likelihood to tell others.

The degree of experiment reality will alert us to any misrepresentation of reality in the
experiment. Our aim is to mimic a real scenario as much as possible. If we fail to
incorporate a significant facet of reality, the results of our experiment may fail to indicate
likely results in actual use of the system. We calculate the degree of experiment reality
based on the realistic nature of the questions posed to the system.

The level of work flow disruption indicates how well the system fits into the participant's
lifestyle. Our goal is to design a system that fits seamlessly with the participant's routine.
The participant may reject the system if the participant's perception of the value added by
the system is less than the level of work flow disruption. We calculate the level of work flow
disruption based on the environments in which the participant can imagine using the system (asked after the participant has tested the system).

Engagement or interest in the experiment indicates the participant's level of satisfaction with
the prototype. We measure satisfaction based on the amount of detail the participant
provides in how they would continue to use the system. Frustration is measured by the
amount of detail they provide in how they would improve the system.

Likelihood to use again indicates, more specifically than overall satisfaction, the participant's willingness to use particular features again. Likelihood to tell others about the system
indicates the participant's identification with the system. If the farmer is unwilling to be
associated with the system, this alerts us that the system has significant failures that must
be remedied. We would follow up on this question to determine why the farmer is not
satisfied with the system.

In addition to the survey questions, we may also ask qualitative questions after each task to
clarify the participant's responses.

Conditions of Success
We are using a composite measure of success which combines values from the different
metrics.

Although we are interested in the results from all our metrics, the three most important
ones we will be looking at are high satisfaction, high likelihood to use again, and high
likelihood to tell others, as these three seem most correlated with how widely Farmers'
Exchange will be adopted. We currently have no numerical benchmark, as we do not know
what the average is and therefore cannot compare. Our first experiments will most likely
provide numbers for benchmarking, and we will compare later experiments with the first to
see if there is any improvement.
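
The sketch below illustrates the kind of composite score we have in mind; the metric names, weights, and 0-to-1 scaling are placeholders, and the first round of testing will supply the actual benchmark values:

    # Hypothetical composite success score: a weighted average of normalized metrics.
    # Weights emphasize satisfaction, likelihood to use again, and likelihood to
    # tell others; all inputs are assumed to be rescaled to the 0-1 range.
    weights = {
        "completion_rate": 0.1,
        "satisfaction": 0.3,
        "use_again": 0.3,
        "tell_others": 0.3,
    }

    def composite_score(metrics):
        """metrics maps each metric name to a value already scaled to 0-1."""
        return sum(weights[m] * metrics[m] for m in weights)

    # Example: record the first-round benchmark and compare later runs against it.
    benchmark = composite_score({"completion_rate": 0.75, "satisfaction": 0.6,
                                 "use_again": 0.5, "tell_others": 0.4})
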
Supporting Materials
• Qualitative survey
• Quantitative checklist: One team member will be timing the user according to the
metrics on the checklist
• Timer for quantitative data
• Video/audio recording equipment: We plan on recording the user's face and
recording his or her entire interaction with Farmers' Exchange
• Script with the scenario, whether the user is ask-first or browse-first, and scenarios for helping confused users
• Informed consent/permission forms
• Phone with speaker phone: We will put the user on speaker phone so the entire
team can hear the same prompts the user does