
Scaling Learning Analytics across Institutions of Higher Education

SEI Case Study

May 2013


Marist College (lead), a private, highly selective comprehensive liberal arts college with approximately 6,300 students.

Partners: Savannah State University, the oldest public HBCU in Georgia, with 4,550 students; Cerritos College, a public comprehensive community college in California with 21,340 students; and College of the Redwoods, a public two-year community college in California with 7,600 students.

Timetable: May 2011 to January 2013

Contacts: Josh Baron, Senior Academic Technology Officer, Marist College; Maliece Whatley, OAAI Project Coordinator/Accounting Instructor, Savannah State University; JoAnna Schilling, Vice President for Academic Affairs, Cerritos College; MaryGrace McGovern, OAAI Project Coordinator, College of the Redwoods


The Open Academic Analytics Initiative (OAAI), led by Marist College and funded under a Wave I grant from Next Generation Learning Challenges (NGLC), seeks to increase college student retention by performing early detection of academic risk using predictive analytics. The emerging field of learning analytics raises many questions regarding the degree to which predictive models built on data from one institutional type and student population can be effectively deployed at different kinds of institutions. In initial research, OAAI examined the degree to which a model built using data from Marist College would compare to the original model built at Purdue University, which led to the development of Course Signals. In general, we found the models to be statistically similar in that they had the same predictive elements (e.g., cumulative GPA) and similar correlation strengths between these elements and student success in courses.1 For the research project reported in this paper, the OAAI sought to improve understanding of how learning analytics can best be scaled across institutions of higher education. (By learning analytics we mean the use of analytic techniques to help target instructional, curricular, and support resources to support the achievement of specific learning goals.)2 During the spring 2012 semester, the OAAI successfully deployed an open-source learning analytics solution at two community colleges (Cerritos College and College of the Redwoods) and one historically black university (Savannah State University) as a means to further research in this emerging field.

© 2013 EDUCAUSE. CC BY-NC-ND.

1. Project Overview
1.1. Project Goals, Context, and Design
Building on work pioneered at Purdue University, the OAAI developed an open-source predictive model for academic success that draws on learning analytics. Marist's open academic early-alert system uses a predictive model based on student demographics, aptitude (student SAT scores, for example), and data from learning management systems (such as data about student log-ins to course sites and the number of assignment submissions). The system uses learning analytics technologies and resources that include the Sakai Collaboration and Learning Environment, which was the LMS used in the study (College of the Redwoods and Cerritos College already used Sakai, and Marist hosted Sakai for Savannah State University). The system also uses Pentaho, an open-source business intelligence suite that includes tools for data mining, analysis, and reporting. In addition to these technologies, we have released a number of resources under an open license, including the OAAI-developed predictive model and intervention strategies that leverage open educational resources. We sought to explore the portability of the OAAI predictive model between institutions. Because predictive models alone will not affect student course completion and retention rates unless they are combined with effective intervention strategies aimed at helping at-risk students succeed, we also investigated the effectiveness of two different intervention strategies, awareness messaging and the Online Academic Support Environment (OASE), in increasing student content mastery, course completion, and semester-to-semester persistence. In our awareness messaging intervention, students who are identified by our predictive model as being at risk of not completing their course receive a standardized message from their instructor, making them aware of the concern and suggesting how they might improve (e.g., meet with a tutor, take more practice exams).
With the OASE intervention, students receive a similar message but are invited to join an online support community that has been designed to aid academic success. The OASE leverages Sakai Project Sites to provide students with resources (including open educational resources) for remediation and study skill development, facilitation by a professional academic support specialist, and student mentors who serve as peer coaches. To research the effectiveness of these two intervention strategies, course pilots, which included control groups, were conducted at our partner institutions using the OAAI early-alert system.

1.2. Data-Collection Methods

Conducted during the spring 2012 semester, this research focused on data from 1,379 students at the four participating institutions who were enrolled in introductory-level courses in several different disciplines in which, generally, the same instructor taught three sections of the course. Of these students, 67% were considered low-income, as measured by Pell Grant status (a population of particular interest to our research team). The three course sections were assigned to be a control group or one of two treatment groups. For the control group, no interventions were deployed, although data on student performance and predictions from the OAAI early-alert system were collected. The other two sections were treatment groups and received either the awareness messaging or the OASE intervention based on predictions from the OAAI system. This research design allowed us to compare student performance between our control and treatment groups as well as between the two different intervention approaches.

Our work involved three primary data sets: student aptitude and demographic data from the institutions' student information systems (SIS); event log data (e.g., course site visits, discussion postings) and grade book data from the Sakai LMS; and a student survey that was administered for every course at the conclusion of the semester. Two semesters' worth of student aptitude and demographic data, as well as event log and grade book data, were initially extracted from Marist College's SIS and LMS as a means to develop our OAAI predictive model.3 We also administered a student survey and conducted a series of focus groups with project participants at Savannah State University; this was not part of our original proposed research plan, and the data are still being analyzed. Additional focus groups were beyond the scope of our funded work, and thus we did not conduct them at the other institutions. The student survey consisted of approximately 90 questions that covered topics ranging from student attitudes toward the use of learning management systems and their satisfaction with their course to the level of engagement they had with their institution and faculty. These questions were used, with permission, from previously validated survey instruments, including the National Survey of Student Engagement (NSSE).
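Combining these sources reduces, conceptually, to joining records on a common student key. The sketch below is illustrative only (the project's actual pipeline used Pentaho, not code like this), and all field and record names are hypothetical:

```python
def join_on_student(sis_rows, lms_rows):
    """Inner-join SIS and LMS records on a shared (de-identified) student key."""
    lms_by_id = {row["student_id"]: row for row in lms_rows}
    return [
        {**sis_row, **lms_by_id[sis_row["student_id"]]}
        for sis_row in sis_rows
        if sis_row["student_id"] in lms_by_id
    ]

# Toy data: only students present in both sources survive the join.
sis = [{"student_id": "s1", "cum_gpa": 3.2}, {"student_id": "s2", "cum_gpa": 2.1}]
lms = [{"student_id": "s1", "site_visits": 40}]
merged = join_on_student(sis, lms)
```

An inner join keeps the analysis honest: a student with SIS data but no LMS activity record cannot contribute LMS-derived features to the model.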

1.3. Data-Analysis Methods

Our initial OAAI predictive model was developed through a data-mining process that was applied to a large data set that included all of the Marist College Sakai LMS event log and the student demographic and aptitude data from two full semesters. A range of data-mining and machine-learning techniques were used to optimize the predictive power of the model. Once an initial model was developed, it was further enhanced by retraining it using grade book data from the LMS, a data set that was not originally included in the research conducted at Purdue University but that showed promise for improving the predictive power of the model. The OAAI enhanced predictive model was evaluated using a Marist test data set that had been excluded during the development of the model. Using data that were not part of the development process is important in predictive model validation because it ensures testing is done with completely unbiased data. This testing identified a number of predictive elements that were most strongly correlated with student success in a course (e.g., earning a grade of C or better). The top five predictive elements were semester GPA (from the previous semester), current grades in the LMS grade book, cumulative GPA, number of times content was viewed, and verbal SAT scores. It is important to note that when processing LMS event log data, we used a metric that compared specific student data to the class average. For example, rather than just looking at the number of times a student viewed content, we looked at how many times that person took this action as compared to the average in the class. This approach helped us address differences in student expectations regarding use of the LMS, which can vary a great deal from one instructor to another.
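The class-average normalization described above can be sketched in a few lines. This is a minimal illustration, not the project's actual Pentaho implementation, and the function and variable names are hypothetical:

```python
def relative_activity(student_events, class_events):
    """Ratio of one student's LMS event count to the class average.

    Values below 1.0 mean the student is less active than the class as a
    whole, which is more informative than the raw count because instructors
    set very different expectations for LMS use.
    """
    class_avg = sum(class_events) / len(class_events)
    return student_events / class_avg if class_avg else 0.0

# 12 content views in a section averaging 24 views -> 0.5, whether that
# section is a high-LMS-use or a low-LMS-use course.
ratio = relative_activity(12, [30, 20, 22, 24])
```

Feeding the model this ratio rather than the raw count is what makes the feature comparable across courses taught with very different LMS styles.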
This model was then deployed in spring 2012 at our three partner institutions, two community colleges and one historically black university, to compare its efficacy in academic contexts that differed significantly from that of Marist (a private four-year liberal arts institution). The model's performance was evaluated at three points during the semester (one quarter, halfway, and three quarters of the way through) to determine how its accuracy improved as more LMS and grade book data became available over the academic term. To do this, we used data from our control groups (who did not receive any interventions) and compared our predictions to the final outcomes in these courses (that is, did the model accurately predict which students would not complete the course successfully?).
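Checking predictions against final outcomes in the control sections amounts to a simple accuracy calculation. In this hedged sketch (hypothetical names, invented toy data), a prediction counts as correct when the at-risk flag matches whether the student ultimately failed to complete the course successfully:

```python
def checkpoint_accuracy(predicted_at_risk, unsuccessful, all_students):
    """Fraction of students whose at-risk prediction matched the final outcome."""
    correct = sum(
        (s in predicted_at_risk) == (s in unsuccessful) for s in all_students
    )
    return correct / len(all_students)

# s1: flagged and unsuccessful (correct); s2: flagged but successful (wrong);
# s3: missed (wrong); s4: unflagged and successful (correct) -> 0.5
students = ["s1", "s2", "s3", "s4"]
acc = checkpoint_accuracy({"s1", "s2"}, {"s1", "s3"}, students)
```

Running the same calculation at the quarter, half, and three-quarter checkpoints yields the accuracy-over-time curves reported in Figure 1.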

Based on this analysis, we determined that the accuracy of the model remained largely intact when we deployed it at our partner institutions. For example, at the three-quarter point of the semester, we found the accuracy to be 75–79%. This was only 6–10 percentage points lower than the accuracy of the model when tested with Marist data (86–87%), which was historical and thus represented 100% of the semester completed. Given that we expected a much larger difference between the model's performance with Marist data and its performance at community colleges and HBCUs, this was an encouraging finding, suggesting that predictive models developed at one institution can be shared with others. Our current theory about why models are more portable than expected relates to the specific predictive elements used by the model. For example, predictive elements such as cumulative GPA and grade book scores are generally predictive of student success regardless of the type of student population or institution. In other words, whether you are at a four-year research institution or a local two-year institution, how well you do on the first couple of quizzes in a particular course will generally be predictive of your overall success.

1.4. Findings
In assessing our two primary research areas, the portability of predictive models from one academic context to another and the effectiveness of different intervention strategies, we have so far found the following:

The predictive model developed using student data from Marist College shared many of its predictive elements (e.g., cumulative GPA) and correlation strengths with the predictive model developed at Purdue University.

The predictive model built using Marist student data performed considerably better than random selection (a standard benchmark for assessing predictive models) when deployed at both community colleges and an HBCU, which exceeded our expectations. Figure 1 shows the accuracy of the predictions at each partner institution (CC = Cerritos College, SSU = Savannah State University, RC = College of the Redwoods) as a function of the number of weeks into the semester when the predictions were made.

Figure 1. Predictive Model Accuracy

[Line graph omitted: accuracy of predictions (y-axis, 55% to 85%) versus number of weeks into the semester (x-axis: 3, 6, and 9 weeks), with one line each for RC, SSU, and CC.]

We found a statistically significant difference in mean course grades when comparing all students in our two treatment groups (awareness and OASE) to our control groups; we did not find significant differences between the two treatment groups. We plan additional research to investigate several hypotheses about why neither treatment group outperformed the other. Figure 2 shows the average course grade (out of 100%) for students in our control group and in the treatment groups. The graph also contains the results (F[2,448] = 8.484, p = .000) of our one-way ANOVA test for statistical significance.
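The F statistic behind this kind of comparison can be reproduced from first principles. The implementation below is a generic textbook one-way ANOVA, not the project's statistics code, and the course-grade samples are invented for illustration:

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group variability: each group mean vs. the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: each score vs. its own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    # F = mean square between / mean square within.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented grade samples for a control section and two treatment sections.
f_stat = one_way_anova_f([[61, 63, 65], [70, 72, 74], [71, 73, 75]])
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) indicates that the group means differ more than within-group noise alone would explain, which is what the reported F(2,448) = 8.484 captures.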

Figure 2. Refined Course Grade Analysis

[Bar chart omitted: mean course grade (y-axis, roughly 60% to 75%) for the Awareness, OASE, and Control groups; F(2,448) = 8.484, p = .000, a statistically significant difference.]

We found statistically significant differences in content mastery (a C grade or above) and withdrawal rates between our combined treatment groups and our controls. When analyzing performance among only low-income students (as measured by Pell Grant status), we found similar trends in overall course grades, content mastery, and withdrawals, although these trends were not statistically significant because of the smaller sample size. We expect these trends to reach statistical significance when we include data from our fall 2012 pilots, which will increase the overall sample size.

These findings are noteworthy because they indicate that predictive models used in learning analytics are more portable than initially anticipated and that our intervention strategies were effective in improving student outcomes.4

1.5. Communication of Results

A range of stakeholders were associated with our project, and we worked to align our communication approaches with their needs. Our key project stakeholders included four groups:

Institutional decision makers/executives: With support and guidance from NGLC staff, we developed a four-minute presentation that provided an elevator-pitch overview of our project and its benefits. We also developed a two-page handout that provides a high-level overview of our work and identifies its strategic implications (e.g., improving course completion rates). In addition, we have presented our work at executive summit-type events, which tend to draw these stakeholders.

Faculty: We developed short (1–2 page) project overview documents designed to help recruit faculty for the project and more extensive findings reports (7–10 pages) that provide faculty with updates on our research efforts.

Sakai community: We have given presentations on our work at Sakai conferences. We have also maintained a comprehensive set of project documentation and resources on the Sakai Wiki, a central resource used by all Sakai community members. Finally, we have held several community webinars to update those interested in adopting our work at their institutions.

Researchers in learning analytics: We have become engaged with the Society for Learning Analytics and Research (SoLAR) and have presented our work at their Learning Analytics and Knowledge conferences as well as published research papers through SoLAR.

1.6. Influence on Campus Practices

The impact on campus practices and strategy has varied from institution to institution and is still evolving. Here are a few examples of the project's influence to date:

Because of the success of the project and the potential that Marist College sees in this technology, we have created a new learning analytics specialist position in the Office of Academic Technology and eLearning. This role, which has been filled by one of the lead graduate students who worked on the project until graduating recently, will help us expand our learning analytics work with other institutions and apply what we have learned internally. For example, we are now working to create a highly customized version of our predictive model for use in one of our online graduate programs.

Several of our partner institutions have expressed interest in continuing to use the OAAI early-alert system, and we are currently in discussions about how best to support expansion of our work. We are also considering expanding our work to other institutions, particularly community colleges, which have learned of our efforts through our communication strategy.

An instructor from Cerritos College put the value of the OAAI into practical context: "Not only did this project directly assist my students by guiding them to resources to help them succeed, but it also changed my pedagogy. I became more vigilant about reaching out to individual students and providing them with outlets to master necessary skills. This semester, I received the highest volume of unsolicited positive feedback from students, who reported that they felt I provided them exceptional individual attention."

2. Reflection on Design, Methodology, and Effectiveness

2.1. Project Design, Data Collection, and Analysis
Overall, the outcomes of the OAAI far surpassed our expectations, given that we were able to obtain statistically significant results in a project of relatively short duration involving multiple partners. At the same time, we encountered a number of challenges along the way and learned our fair share of lessons. Some of the factors that we believe led to our success, as well as some important lessons learned, include the following.

Robust research design: Although a perfect research design is almost never achievable when working with human subjects, the design we used was instrumental in our overall ability to produce valid results. Specifically, our approach of using three sections of the same course taught by the same instructor helped us remove potentially confounding variables due to different teaching styles and course content. It also helped ensure some level of consistency in how interventions were delivered. Finally, the number of pilots we conducted enabled us to achieve a sample size large enough to allow for statistically significant outcomes.

IRB procedures and data security: Because of the sensitive nature of the data we were working with, early in our project we dedicated significant time to working with our Institutional Review Board (IRB), registrar's office, and Institutional Research and Planning office to ensure

we had the proper protocols in place. As we expanded the project to our partner sites, those institutions worked with their IRBs or related offices to ensure that all local requirements were met. Investing this time up front helped us avoid potentially significant delays and challenges later in the project when we were implementing our pilots.

Coordination in a distributed research project: With four institutional partners in different parts of the United States, coordination and communication were vital to our success. To address this challenge, we asked each partner to identify a local coordinator who acted as a liaison to Marist College. Local coordinators participated in monthly phone conferences, received regular e-mail updates, and attended a face-to-face meeting associated with an EDUCAUSE conference. One lesson learned about coordination concerned the timing of the face-to-face meeting, which took place six months into the project; holding it earlier would have benefited the group dynamics and further improved communication.

Automation in data collection: Instructors were provided with Academic Alert Reports three times per semester, which listed students who were at risk of not completing their course successfully, along with ratings indicating the degree of confidence the model had in its own predictions. Based on this information, instructors decided whether an intervention should be deployed, thus acting as a human filter to ensure interventions were not sent if circumstances (such as a death in the family) did not warrant them. To track which students had actually received interventions, instructors were required to keep research logs using Excel spreadsheets. As we analyzed these logs, we found we had to code them manually because of the range of ways in which instructors recorded their information. This manual coding process was labor- and time-intensive and led us to subsequently create dropdown menus in the spreadsheet that required instructors to select from standardized responses. This standardization allowed us to quickly convert their logs to numerical data for analysis, saving significant time.

Clarity in survey questions: As part of our student survey, we asked several questions that used project-specific terminology. In reviewing student responses, we realized that students were not always aware of the meaning of this terminology, which invalidated their responses. In later versions of the survey we included definitions that helped avoid this confusion.
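Once responses were standardized, converting a log to numeric data becomes a mechanical lookup. The response strings and codes below are hypothetical, since the project's actual dropdown wording is not reproduced here:

```python
# Hypothetical mapping from standardized dropdown responses to numeric codes.
INTERVENTION_CODES = {
    "No intervention sent": 0,
    "Awareness message sent": 1,
    "OASE invitation sent": 2,
}

def code_log(entries):
    """Convert an instructor's standardized log entries to numeric codes."""
    return [INTERVENTION_CODES[entry] for entry in entries]

codes = code_log(["Awareness message sent", "No intervention sent"])  # [1, 0]
```

Because every instructor chooses from the same fixed vocabulary, the lookup never has to guess at free-text variants, which is exactly what made the manual coding step unnecessary.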

2.2. Effectiveness and Influence on Campus Practices

Given some initial skepticism on the part of faculty members regarding the value of learning analytics, the results from our research have helped instructors understand its power and how this emerging technology can benefit their students. For example, many have commented that, given the constraints on time and other resources, the ability to know which students are in greatest need of academic assistance has been invaluable. Although more successful at some institutions than others, the Online Academic Support Environment has also proven to be a valuable resource for students, with some instructors opting to expand and continue this work on their own beyond the duration of our project. Without our research findings, it might have been much more difficult to influence instructors and convince them of the benefits of this technology. We have also worked since early in the project to influence campus-level practices and priorities through engagement with senior institutional leaders. In most cases, each institutional partner identified a senior executive sponsor when we started the project; these individuals ranged from vice presidents of academic affairs to chancellors. Although they were not involved in the day-to-day operational activities of the project, these sponsors received updates on our work and initial research findings. As we move out of the grant-funded phase of the project, these senior leaders are now working to continue our initiatives on their campuses and are reaching out to their colleagues at other institutions to suggest they consider adopting learning analytics. Our research findings have provided these institutional leaders with vital evidence of the success of learning analytics, without which it would be difficult for them to convince others of its value.

3. Supporting Materials
Open Academic Analytics Initiative (OAAI) Sakai Wiki: OAAI work is documented on this page, which is also the location of related materials released under open licenses.

Lauría, Eitel J. M., Joshua D. Baron, Mallika Devireddy, Venniraiselvi Sundararaju, and Sandeep M. Jayaprakash. "Mining Academic Data to Improve College Student Retention: An Open Source Perspective." In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, 139–142. New York: ACM, 2012.

Lauría, Eitel J. M., Erik W. Moody, Sandeep M. Jayaprakash, Nagamani Jonnalagadda, and Joshua D. Baron. "Open Academic Analytics Initiative: Initial Research Findings." Presented at LAK 2013, April 8–12, 2013, Leuven, Belgium.

Baron, Josh, and Kimberlee Thanos. "Harnessing the Power of Technology, Openness, and Collaboration." EDUCAUSE Quarterly, December 15, 2011.

1. Eitel J. M. Lauría, Joshua D. Baron, Mallika Devireddy, Venniraiselvi Sundararaju, and Sandeep M. Jayaprakash, "Mining Academic Data to Improve College Student Retention: An Open Source Perspective," in Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (New York: ACM, 2012), 139–142.

2. Angela van Barneveld, Kimberly E. Arnold, and John P. Campbell, "Analytics in Higher Education: Establishing a Common Language," ELI Paper 1: 2012, January 2012.

3. The SIS and LMS data used to build our predictive model, as well as the data from our partner institutions used during the course pilots, were highly confidential, FERPA-protected information, which required us to follow a data-security protocol in all aspects of our work. The first step in this protocol was the removal of all student identifiers (e-mail addresses, campus ID numbers, etc.) from our data sets before the data were provided to us. This was accomplished by first creating a Master Student Identification Key, which correlated each student identifier with a randomly generated 32-character string of letters and numbers (which we referred to as the "secret code"). This Master Key was then used to "scrub," or clean, the data sets by removing the student identifier and replacing it with the secret code. The work of scrubbing the data sets was done by each institution's Office of Institutional Research and Planning, which also secured the Master Key by encrypting the file and placing it on a single secure computer in its office in case it was needed for reference purposes in the future. Finally, the scrubbed data sets were transferred to Marist College through a secure 256-bit encrypted HTTPS protocol as an extra measure of precaution. Taking this approach of removing student identifiers also qualified our research as secondary data analysis, which meant that it did not require review by an Institutional Review Board. That said, we did review our security protocol with the Marist IRB and received confirmation that we did not need an IRB review for this aspect of our work. Moreover, we filed a full IRB application related to our intervention and survey work and worked with our partner institutions to ensure their internal IRB requirements were also met.

4. Findings drawn from Eitel J. M. Lauría, Erik W. Moody, Sandeep M. Jayaprakash, Nagamani Jonnalagadda, and Joshua D. Baron, "Open Academic Analytics Initiative: Initial Research Findings," presented at LAK 2013, April 8–12, 2013, Leuven, Belgium.
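The scrubbing step in the data-security protocol described above can be sketched as follows. This is an illustrative reconstruction only; the actual work was performed by each institution's Office of Institutional Research and Planning, and all names here are hypothetical:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def make_secret_code(length=32):
    """Random 32-character string of letters and numbers (the 'secret code')."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def scrub(records, master_key):
    """Replace each student identifier with its secret code.

    master_key maps real identifiers to secret codes; in the protocol it
    would be encrypted and retained only by the institution, never shared
    with the researchers.
    """
    scrubbed = []
    for record in records:
        real_id = record["student_id"]
        if real_id not in master_key:
            master_key[real_id] = make_secret_code()
        clean = dict(record)
        clean["student_id"] = master_key[real_id]
        scrubbed.append(clean)
    return scrubbed

# Toy data: the same student gets the same code across records, so joins
# across data sets still work after de-identification.
key = {}
out = scrub([{"student_id": "jdoe", "gpa": 3.0},
             {"student_id": "jdoe", "gpa": 3.1}], key)
```

Reusing one code per student is the crucial property: the researchers can still link a student's SIS, event log, and grade book records without ever seeing a real identifier.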
