1. Outline
This note describes hazards and errors in the evaluation of web- and computer-based educational materials. Brief summaries follow:
Section 2: Preconceptions
The only certainty about the future of education is that it will be different from the past. Preconceptions are more likely to mislead than help, and in particular traditional classroom procedures should not be taken uncritically as models for web materials.
This is illustrated with a discussion of homework and class attendance.
Section 3: Data and statistics
Sophisticated statistical methods lull us into believing that what we want to know can be extracted from the data we collect, or at least that we can test the extent to which this is true. However, this belief seriously underestimates the complexity of real students and the range of motivations and behaviors invisible to our data.
This is illustrated by a discussion of sub-populations with different characteristics, and assessment of practice work.
Section 4: Resource constraints
Most educational systems are highly resource-constrained. Methods that require more resources cannot be widely adopted no matter what their virtues, and consequently assessments that ignore resource requirements have little relation to real-life potential.
This is illustrated with computer-based precalculus courses that are no better in educational outcomes but are more efficient and release resources that profoundly benefit the rest of the program.
2. Preconceptions
It is widely believed that homework is beneficial. The study "Webwork effectiveness in Rutgers calculus" by Chuck Weibel and Lew Hirsch (2002) is cited as showing this is true for web-based assignments. In this study a number of sections were involved, some using the system and others not, with a common final exam. Careful statistics showed a correlation between doing well on the homework and doing well on the final, and this correlation was strongest for freshmen.
The WebWork homework system provides a static set of exercises that are generally not challenging, and students can keep working and checking scores until they get a perfect score. These are nearly free points; why would any student not take advantage of this, particularly if there is no mechanism to prevent them from getting help? Actually it is only free if you count student time as worthless. Most students value their time and the fact that most won't do the work suggests they see the benefit and credit as not worth the time. Freshmen may be more tolerant because boring homework and point accumulation are facts of life in high school.
Does good homework cause good exam performance? Or do they correlate (roughly) because students with high grade goals both learn the material and are willing to do the busywork? The study design and analysis do not distinguish these possibilities. The study's conclusion that WebWork is effective therefore rests on an assumption of causality rather than on evidence for it.
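The second possibility can be made concrete with a small simulation. This is a minimal sketch under invented assumptions, not a model of the Rutgers data: a hidden "grade goal" variable drives both homework and exam scores, and homework has no effect at all on the exam. The names `simulate` and `pearson`, the sample size, and the noise levels are all hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_students=5000, seed=0):
    """Hypothetical model: grade goals drive both scores; homework does NOT affect the exam."""
    rng = random.Random(seed)
    homework, exam = [], []
    for _ in range(n_students):
        goal = rng.gauss(0, 1)                   # hidden common cause, absent from the data
        homework.append(goal + rng.gauss(0, 1))  # willingness to do the busywork
        exam.append(goal + rng.gauss(0, 1))      # depends on the goal only, never on homework
    return homework, exam

homework, exam = simulate()
r = pearson(homework, exam)
print(f"homework/exam correlation with no causal link: {r:.2f}")
```

In this invented model the correlation comes out clearly positive even though doing the homework changes nothing, so a correlation alone cannot separate the two explanations.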
Another possible conclusion: most students do not seem to feel WebWork exercises - even with guaranteed credit - are worth their time. Is there some other approach to practice work that they would see as more beneficial? Or in other words, could we make better use of their time?
Data from a course with opportunities for non-credit practice but no homework at all in the standard sense leads me to believe this second interpretation is more valid. In this case more emphasis on WebWork would probably be counterproductive. In any case the study depends on an unexamined preconception.
Another preconception is that class attendance is beneficial. Non-attendance penalties are often used to generate a correlation with grades but actual benefit is largely untested. I had several hundred students go through a computer-tested course with supporting web materials. They were told they should come to class if they found it useful but attendance was optional. For analysis they were put in three groups by attendance, with the first group attending fewer than ten percent of class meetings. Grade distributions for the groups were virtually identical. Note that comparing distributions is much more sensitive than comparing averages, which may hide balancing increases in high and low grades.
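The remark about distributions versus averages can be illustrated with a small sketch. The two grade lists below are invented for illustration and are not the course data; `total_variation` is one simple way to compare distributions, chosen here as an assumption rather than the method actually used.

```python
from collections import Counter

GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def mean_points(grades):
    """Average grade points for a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

def distribution(grades):
    """Fraction of students receiving each letter grade."""
    counts = Counter(grades)
    return {g: counts[g] / len(grades) for g in GRADE_POINTS}

def total_variation(grades_1, grades_2):
    """Half the summed absolute difference between two grade distributions (0 = identical)."""
    d1, d2 = distribution(grades_1), distribution(grades_2)
    return 0.5 * sum(abs(d1[g] - d2[g]) for g in GRADE_POINTS)

# Hypothetical groups of 20 students each: group_b has more high AND more low grades.
group_a = ["A"] * 2 + ["B"] * 3 + ["C"] * 10 + ["D"] * 3 + ["F"] * 2
group_b = ["A"] * 6 + ["B"] * 2 + ["C"] * 4 + ["D"] * 2 + ["F"] * 6

print("means:", mean_points(group_a), mean_points(group_b))
print("distribution distance:", round(total_variation(group_a, group_b), 2))
```

The two invented groups have the same average, yet their distributions differ substantially. A comparison of averages would report no difference between them.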
Cautious conclusions: in this college calculus course some students found classroom contact beneficial, some didn't need it and didn't need to be forced to come. In this course, at least, the preconception that everyone should come to class was not justified.
A final preconception: students entering courses should be filtered (via placement tests) to ensure they have the necessary background and skills. In fact student skills are somewhat plastic, and an alternative approach is a "gateway" test. Students who do not pass immediately have a week or so in which to get up to speed and retry the test. Gateways are tougher than placement tests and the objectives are positive (ensure good skills) rather than negative (weed out likely failures).
3. Data and Statistics
Statistical analysis destroys most of the information in a data set. This is a good thing if all important parameters are measured and the information destroyed is irrelevant "noise". Unfortunately this is rarely the case: parameters not measured are presumed to be unimportant because they are invisible, and information destroyed is defined to be noise. It is hard to show that something important is missing but there are some illustrations.
The authors of the homework study referenced above report:
"It quickly became apparent that three sub-populations of students had different profiles. First-year students made up the largest sub-population ..... 'non-repeaters,' consisted of upper-class students who were taking calculus for the first time. The third sub-population consisted of students who were repeating the course.. "
Many investigators, particularly with smaller sample sizes, would not have made these distinctions and to the extent that different sub-populations need different treatment they would have reached faulty conclusions. This was a non-science calculus course. In a science and engineering calculus course I found an additional distinct sub-population: freshmen out of sequence because they got advanced placement. In this example sub-populations that would usually be overlooked could be identified using the data at hand and we can compare the results of using or discarding the information.
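A minimal sketch of how pooling can hide differences between sub-populations follows. The records are invented for illustration and do not come from either course; the sub-population labels echo the study's, but the scores and the helper `effect` are hypothetical.

```python
from collections import defaultdict

# (sub_population, used_homework_system, exam_score): all values invented.
records = [
    ("first-year", True, 82), ("first-year", True, 78),
    ("first-year", False, 71), ("first-year", False, 69),
    ("repeater", True, 62), ("repeater", True, 58),
    ("repeater", False, 72), ("repeater", False, 68),
]

def mean(values):
    return sum(values) / len(values)

def effect(rows):
    """Mean exam score of system users minus mean exam score of non-users."""
    users = [score for _, used, score in rows if used]
    others = [score for _, used, score in rows if not used]
    return mean(users) - mean(others)

# Analysis that ignores sub-populations.
pooled = effect(records)

# Same analysis, one sub-population at a time.
by_group = defaultdict(list)
for row in records:
    by_group[row[0]].append(row)
per_group = {name: effect(rows) for name, rows in by_group.items()}

print("pooled difference:", pooled)
for name, diff in per_group.items():
    print(name, "difference:", diff)
```

In this invented data the pooled difference is zero while the two sub-populations show equal and opposite differences, so an investigator who did not separate them would conclude that nothing was happening.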
In other cases there is little or no data. For example in a course with tests that could be taken multiple times for credit the strongest correlate with performance seemed to be procrastination! There seemed to be a sub-population (about ten percent) who could wait until the last minute and do fine. Either they knew the material very well or they worked best under pressure. Outside this population there seemed to be an additive negative consequence for putting off a test to the last minute. This makes sense but was unexpected because it is far outside the things usually measured and it was found by chance. At any rate the conclusion was that developing psychological incentives to take tests early had a very high priority.
Student time is another example of an invisible but vitally important factor. Students instinctively do cost/benefit analysis and are reluctant to do tasks that do not use their time effectively. Sub-populations with time constraints (jobs or important dating) are particularly intolerant. This seems to be impossible to measure even by proxy. I believe that non-credit practice tests with solution hints and links to web reference materials are more efficient and effective than for-credit homework such as WebWork. I believe this because I have watched students use the materials in our computer lab, and talked with them. On the other hand I have records of when and how many practice tests were used, and the scores, and see no signal at all in this data. Scores for instance are typically zero: practice tests are used mainly as study guides and since there is no credit or penalty they don't bother to fill them out. Students who get fifty or more tests are probably looking for patterns or systematic omissions that would let them skip something. I believe that traditional homework is relatively ineffective but - because there is credit attached - there is at least a signal in the data.
The conclusion is that developing new ways of learning is going to be delicate and complicated, and current use of data and statistics is not likely to be a good guide.
4. Resource constraints
At one time it was the policy in my department to have some computer involvement in every class. The involvement ranged from demonstrations to labs. Computer consoles and projectors were put in nearly all classrooms and a few had computers for each student. It quickly became clear that this took a lot of faculty time, and not just learning curves and development of materials, but additional time every time the facilities were used. There was no possibility of compensating by reducing teaching loads or class size. We did not get a sense of whether computer work was effective but this didn't matter: the additional workload was unsustainable.
I have yet to see an educational study that even acknowledged resource constraints. The words "... this study was supported by a grant ..." in most cases mean "... these results were obtained using unrealistic and unsustainable resources. Even if correct they are probably not relevant to real life."
The urgent questions are:
- Can we get better outcomes with the same resources?
- Can we get similar outcomes with fewer resources?
It is essential that "resources" include faculty time. Ideally it would also include student time. Add-ons for instance are never cost-effective unless something is omitted to balance time requirements.
In the current system the focus is on outcomes and resources are disregarded. A positive answer to the second question (similar outcomes with fewer resources) would be evaluated negatively because outcomes did not improve. Ironically this is probably where the greatest short-term potential is.
In my department precalculus courses are now computer-based and more than 10,000 students a year take these courses. There are no teachers in the traditional sense but there is a large and vitally important help staff so students can get more - and more timely - personal help than is possible in a traditional course. I don't believe educational outcomes are better but the system is much more efficient. The cost per student is significantly less than traditional classes with adjuncts or graduate students, and well under half the cost of professorial-rank faculty.
During the same period math enrollments have risen and budget cuts have forced a significant reduction in math faculty size. Simple arithmetic shows that if we were still teaching traditional classes then class sizes would be higher, low-enrollment graduate courses would have to be curtailed, and most of the faculty would have higher teaching loads. Instead - thanks to the computer-based course efficiency - advanced undergraduate classes and the graduate program have not been affected and teaching loads for some research faculty have actually been reduced.
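The kind of arithmetic involved can be sketched as follows. Only the figure of 10,000 precalculus students echoes the text above; the other enrollment, faculty count, and teaching load are hypothetical placeholders, and the sketch ignores the cost of the help staff.

```python
def average_class_size(seats, faculty, sections_per_faculty):
    """Students per section when `seats` enrollments are spread over all faculty sections."""
    return seats / (faculty * sections_per_faculty)

# Hypothetical department; only precalculus_seats is taken from the text.
precalculus_seats = 10_000   # students per year in precalculus
other_seats = 20_000         # assumed enrollments in all other courses
faculty = 100                # assumed number of teaching faculty
sections_per_faculty = 4     # assumed yearly teaching load

# Traditional: faculty teach precalculus along with everything else.
traditional = average_class_size(
    precalculus_seats + other_seats, faculty, sections_per_faculty
)

# Computer-based: precalculus is handled by the help staff, not faculty sections.
computer_based = average_class_size(other_seats, faculty, sections_per_faculty)

print("traditional class size:", traditional)
print("with computer-based precalculus:", computer_based)
```

With these placeholder numbers, moving precalculus out of faculty sections lowers the average class size for everything else at an unchanged teaching load. The same freed capacity could instead be spent on lower loads or on protecting low-enrollment courses.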
The conclusion is that the computer-education effort has been enormously beneficial, but this can only be seen in the context of the entire departmental mission.
Please see the essays and reports at Frank Quinn's web page for details and additional information.