(really bright) College Students

I’ve been teaching college students since 2003. I’ve taught at six different schools and I have met some pretty bright students. This (brief) post is intended to give some general thoughts on college students and some more specific thoughts on a few students with whom I have had the opportunity to work closely.

First, the bad news. From my perspective, too many students come to college without any real desire to actually be “in” college. Oh, they enjoy the social aspects of college, but the idea of learning holds no real appeal for these students. I regularly teach fairly large sections of introductory economics courses and I would guess that as many as 40 percent of my young students should not be enrolled in college. It’s not necessarily that they are not smart enough to be in college; they are just not mature enough or mentally ready to be there. These are the students who regularly miss class, fail to complete assignments, or submit assignments late. These students do little studying, rarely read the text (that’s another story, by the way) and generally perform poorly on exams. Maybe as many as 20 percent don’t care at all. They do about 20 percent of the work, but they don’t complain (although a few do when they receive their course grade). Another perhaps 20 percent care more, but not enough to keep track of their coursework.

I could go on and on about the students who really need to spend a year or two, after high school, living in mom’s and dad’s basement and working at Taco Bell (after all, I’m the guy who missed nearly 30 days a semester in high school, graduated 410th out of 509 and had a 2.0 GPA, but spent four years in the army), but I want to focus on a few other students.

There is ample research that suggests a close, mentoring relationship between students and college professors offers a profoundly positive experience for the student. One such example is Bernier, A., Larose, S., & Soucy, N. (2005). Academic mentoring in college: The interactive role of student’s and mentor’s interpersonal dispositions. Research in Higher Education, 46(1), 29-51. However, rather than focus on the literature, I’d rather look at an article like The Blown Opportunity. That article reported on a survey of college students that examined mentors in college.

That article discussed results of a poll released by Gallup, which essentially asked if students ended up with “great jobs and great lives.” The article summarized by noting that “[f]eeling supported and having deep learning experiences during college means everything when it comes to long-term outcomes after college.” The three most important indicators of future success all involved having a significant link to at least one college professor. Admittedly, the article implicitly recognizes that given student-to-faculty ratios, it is not feasible for every student to have a close, supportive relationship with a faculty member, but the importance is not diminished.

This post discusses my experiences.

I have largely held one-year positions since I finished graduate school and it is pretty much impossible to establish close relationships over two semesters. I have been in my current position nearly three years and during this time I have been able to work closely with several students. I teach mostly larger sections of principles of microeconomics, which is largely populated by freshmen. Freshmen, in general, aren’t ready to work closely with faculty. By and large they lack clear goals, they are still exploring their new personal freedoms and school life is pretty much a social experience for them.

However, I have been assigned to teach a few upper division, elective courses (law and economics is the primary one) and this is where I met most of the students with whom I have worked closely. So far there have been six such students. These students have all been extremely bright and motivated (or at least until the semester they graduate, then they tend to run out of steam). I have taught independent study courses with five of the six. One student I met in Virginia. We kept in touch and he told me he wanted to research and was wondering how to go about it. I suggested that he could contact faculty at the school he was attending (although I did warn him that they would likely ignore him). I mentioned that I had some ideas and, if he was interested, we could work on one together. We presented an early version of our paper, “An Examination of Crime and the Macroeconomy—A New Framework” at the Western Economic Association International 2016 Conference in Portland. He just accepted his first, post-graduation job.

Every student with whom I have worked closely has been very curious and open about knowledge and learning. All have worked on a research project and five of the six have worked on serious, co-authored research projects with me. Although they lack experience and specific skills, undergraduates are able to learn what research is and how to do it properly. Working with them is a lot of work for me because when they start out they do not really understand the quality the process demands, but it turns out that they learn a lot and are very willing to put in a significant amount of time and effort. So far, three projects have resulted in conference presentations (in addition to the WEAI, I have taken students to the Southern Economic Association 2015 Conference in New Orleans, and the Eastern Economic Association 2016 Conference in Washington D.C.) and the fourth will happen fairly soon. Publications should be coming soon as well.

This work requires a significant amount of time and effort and, combined with the six classes and nearly 1000 students I teach each year, it slows my personal research agenda to a crawl. Overall, however, it has become an intense source of pride and accomplishment to have been able to work with such a group of smart, creative, talented and energetic college students.

The last observation I would like to share is the difficulty students face after they graduate. Of the six outstanding students I have discussed, three have already graduated and graduation is approaching this semester for the other three. Only two have “reasonable” jobs that match their potential. Two have definite plans to continue on to graduate school, but the others have struggled and are still struggling. I think college faculty could help with that. We have the ability to make personal contacts among regional businesses (most students want to “stay local”) that can be used to develop internships and entry-level jobs for these exceptional students. Personally, I would like to see more of that. Many college professors develop consulting contacts they use for personal gain. How about sharing that wealth?


Course Evaluations (SETs)

It has been a few years since I have very carefully read student evaluations of the courses I teach. Why? That’s the easy question. By and large college students have no experience with anything, so why should I pay any attention to how they view the courses I teach or how I teach them? The difficult question is “Why do colleges pay so much attention to student course evaluations?” That’s a more interesting question.

It has been argued that course evaluations address four areas:

1) diagnostic FEEDBACK to faculty about the effectiveness of their teaching; 2) a measure of teaching effectiveness to be used in PERSONNEL DECISIONS; 3) information for students to use in INSTRUCTOR/COURSE SELECTION; 4) an outcome or a process description for RESEARCH ON TEACHING. See Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11(3), 253-388.

Interestingly, there have been hundreds of research papers published on students’ evaluations of teaching effectiveness (SETs). The literature generally recognizes two purposes: (1) formative and (2) summative. Formative assessments deal with improving individual teaching methods while summative assessments deal with evaluating the teacher. Before I delve more deeply into the literature, I have to confess that something bothers me about this. I have been teaching college classes for about 14 years and I would not be comfortable with giving college students control over much of anything. Why? First, in general, they are immature. There is a physiological question whether their brains are fully developed and, frankly, their behaviors are bizarrely predictable. For example, last week included two of the warmest days of the Spring, with highs in the 80s. Fully 40% of my students didn’t attend class because “sun.” I regularly supervise independent study courses with graduating seniors and getting them to work is like pulling teeth without pain-numbing medicine.

It is difficult to explain what bothers me about this. When I question SETs, supporters just mumble things like “students are fair” or “people have looked at this,” but they ALL refuse to engage the issue(s). How about this example? Let’s survey every attendee of a live play in the United States. We can ask them questions about the set, the actors, the story, the director, the venue, etc. What do we learn? If I were to take such a survey, I confess that I really have no expertise in anything related to theater. I might be able to identify really bad acting, but beyond that, I’m not sure. I’ve been to many plays and theater performances, but I am not qualified to tell you much more than that I liked this play or didn’t like that one. The question then becomes: is there a relevant explanation behind why I like one play, but not another? College administrators would emphatically argue yes.

For example, “If student ratings influence personnel decisions, it is recommended that only crude judgments (for example, exceptional, adequate, and unacceptable) of instructional effectiveness be used (d’Apollonia and Abrami 1997). Because there is no single definition of what makes an effective teacher, committees and administration should avoid making fine discriminating decisions; for example, committees should not compare ratings across classes because students across classes are different and courses have different goals, teaching methods, and content, among other characteristics (McKeachie 1997).” Algozzine, B., Gretes, J., Flowers, C., Howley, L., Beattie, J., Spooner, F., … & Bray, M. (2004). Student evaluation of college teaching: A practice in search of principles. College Teaching, 52(4), 134-141. These authors continue by noting that “[s]till other authors are critical of using any aggregate measures of teaching performance (Damron 1995; Haskell 1997a; Mason, Steagall, and Fabritius 1995; Sproule 2000; Widlak, McDaniel, and Feldhusen 1973). They argue that an effective teaching metric does not exist and that students’ opinions are not necessarily based on fact or valid (Sproule 2000). Haskell (1997a) suggested that SET infringes on the instructional responsibilities of faculty by providing a control mechanism over curriculum, content, grading, and teaching methodology, which is a serious, unrecognized infringement on academic freedom” (Algozzine et al., 2004).

Like jurors, it appears that students take their responses seriously. Spencer, K. J., & Schmelkin, L. P. (2002). Student perspectives on teaching and its evaluation. Assessment & Evaluation in Higher Education, 27(5), 397-409. However, like juries, students can be wrong or mistaken. These authors further note in the conclusions that students “wish to have an impact but their lack of (a) confidence in the use of the results; and (b) knowledge of just how to influence teaching, is reflected in the observation that they do not even consult the public results of student ratings.”

Jackson, M. J., & Jackson, W. T. (2015). The Misuse of Student Evaluations of Teaching: Implications, Suggestions and Alternatives. Academy of Educational Leadership Journal, 19(3), 165, examined “how to best use [teaching evaluations] as an indicator of teaching effectiveness” (at p. 167). The authors address this question resignedly, recognizing that administrators will continue to use SETs as summative measures of teaching quality. Those authors further recommend that SETs used in a summative capacity measure only three broad categories: below average, average, and above average (at pp. 167-68). The sample examined by these authors was found not to be normally distributed, which makes it problematic to use the mean class score as a measure of effectiveness/quality (at pp. 168-70). The authors conclude that SETs should likely be used as a formative assessment for improvement (as originally intended), but if used as a summative measure of performance, that “[i]t is strongly suggested that the current practice of comparing an instructor’s average score to the average of the department or college be avoided. Instead either a global score or an average of individual dimensions of the SET should be normalized. From this distribution identify the outliers, the faculty members scoring above or below one standard deviation from the normalized mean. It is suggested that those faculty members scoring within the mid or average category be viewed as scoring the same. Statistically, the scores of these individuals are not significantly different. The outliers should be considered and either recognized as exceptionally strong and/or weak.” (at p. 171).
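To make the three-category idea concrete, here is a minimal sketch in Python. It is my own reading of the recommendation, not the authors’ code: I am assuming that “normalized” can be approximated by converting each instructor’s score to a z-score, and the instructor names and scores are made up for illustration.

```python
from statistics import mean, pstdev

def classify_set_scores(scores):
    """Sort instructors into three coarse SET categories.

    `scores` maps a (hypothetical) instructor name to a global SET score.
    Assumption: "normalizing" is approximated by a simple z-score.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the group
    labels = {}
    for name, score in scores.items():
        z = 0.0 if sigma == 0 else (score - mu) / sigma
        if z > 1:
            labels[name] = "above average"
        elif z < -1:
            labels[name] = "below average"
        else:
            # Everyone within one standard deviation is treated as the same.
            labels[name] = "average"
    return labels

# Made-up scores on a 5-point scale.
example = {"A": 4.1, "B": 3.9, "C": 4.0, "D": 4.2, "E": 2.6, "F": 4.9, "G": 3.8}
labels = classify_set_scores(example)
for name in sorted(labels):
    print(name, labels[name])
```

The point of the coarse labels is that small differences in the middle of the distribution are deliberately ignored; only the instructors flagged at either end would get a closer look.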

As a previous blog post (“Learning Outcomes”) suggests, I have a problem with the last suggestion. My fear is that faculty have had time to adjust to being measured by student surveys and have been able to devise strategies, like Kip’s, to artificially increase SET scores at the expense of rigor, with diminished learning outcomes. Now, I am obviously making the assumption that a desired goal of attending college is learning and that faculty evaluations should promote that goal. To that end, McCallum, L. W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Research in Higher Education, 21(2), 150-158, noted that “all of these measures attempt to assess degree of student learning as the primary criterion. The techniques range from the most-frequently used common final examinations across course sections to be evaluated (Centra, 1977; Orpen, 1980) to nationally prepared normative examinations (Gessner, 1973).” (at p. 151). One obvious problem with these approaches is whether they measure learning outcomes effectively.

For example, when I teach upper division courses, my goals have less to do with information and fact delivery. When I teach business law, my primary goal is issue identification rather than rote memorization of legal facts. This is premised on the ideas that (1) anyone can go online and figure out how the law generally treats a specific issue, (2) it is impossible to meaningfully teach a large proportion of the law related to specific legal issues, and (3) students forget much specific information rather quickly. Thus, if you gave a common, standardized exam to my class, they would likely perform worse than a class taught by an instructor who emphasized memorizing legal requirements. However, I would hope that when my students take their places in the work world, they will be able to observe a set of facts, deduce the potential legal issues inherent in those facts, and perform basic research to refine and resolve those potential issues.

With this in mind, Clayson, D. E. (2009). Student evaluations of teaching: Are they related to what students learn? A meta-analysis and review of the literature. Journal of Marketing Education, 31(1), 16-30, conducted a meta-analysis of research that examined the link between learning and SETs. The author started off by noting that “[e]ssentially, no one has given a widely accepted definition of what “good” teaching is, nor has a universally agreeable criterion of teaching effectiveness been established (J. V. Adams, 1997; Kulik, 2001).” (at p. 16).  Some researchers have even argued that good teaching and “the most” learning are not related (at p. 17).  The author further notes that, as with most aspects of SET research, the conclusions of researchers are mixed.  One interesting aspect is the apparent division between “pure” education researchers (those researchers largely employed in education colleges) and researchers who primarily work in business colleges and publish in more business-related journals. Educational researchers generally support use of SETs, while researchers outside education are generally skeptical and tend to believe that any relationship between SETs and learning is accidental (at p. 17).

I share the author’s view that “[i]f both learning and SET are related to good teaching, then SET should be found to be related to learning. A test of this assertion has been hindered by several methodological difficulties, the most fundamental of which is how learning can be measured”  (at p. 18). The author then continues by noting three prominent means used to measure student learning: the connection between SETs and grades, the student perception of learning, and the relationship between grades and learning (at p. 18).

I once surveyed three large sections of principles of micro and asked them what their primary motivation in class was. It was multiple choice and included learning, parental demands, and getting a high grade. 85% of students chose getting a high grade as their primary motivator. So, when the author notes that grades are important to students, I agree (at p. 18). The author’s main point in this discussion is the possibility of a quid pro quo with grades and SET scores. Another issue is the extent to which students reliably perceive their own learning. The author noted that prior research found that “[s]tudents’ perceived grades need not be strongly related to their actual grades (Baird, 1987; Clayson, 2005a; Flowers, Osterlind, Pascarella, & Pierson, 2001; Sheehan & DuPrey, 1999; Williams & Ceci, 1997)” (at p. 18). Some of this confusion may be related to what is now termed the Dunning-Kruger Effect. The third issue relates to whether or not actual grades reflect actual learning. The author finds the literature to suggest that students’ grades likely do not reflect students’ actual learning (at p. 19).

The question then becomes, “how do we measure student learning?” Clayson (2009) notes five suggested methods, (at p. 19). These five suggestions are (1) using mean class grades rather than individual grades, (2) common tests across multiple sections controlling for instructor variance, (3) difference in pre- and post-test scores, (4) performance in future classes controlling for student characteristics, and (5) using standardized, subject-specific tests.

Clayson spent a significant amount of the paper discussing the relationship between SETs and course rigor. First, he notes that a number of researchers have found that courses perceived as more rigorous received lower SET scores (at pp. 19-20). Five arguments were posited to explain these results. First, the negative relationship may be the result of methodological artifice (the author’s own research showed that the date of the survey mattered) (at p. 20). Second, the relative level of rigor may be a more appropriate measure than the absolute level of rigor, as students do not object if they believe the rigor is appropriate for the course (at p. 20). Third, students may choose to avoid courses known to be more rigorous. As the author notes, “Wilhelm (2004) compared course evaluations, course worth, grading leniency, and course workload as factors of business students choosing classes. Her findings indicated that ‘students are 10 times more likely to choose a course with a lenient grader, all else being equal’” (at p. 21). Fourth, some researchers have found a chain where rigor is “positively related to the students’ perceptions of learning, but negatively linked to instructional fairness, which made its total effect on the [SET] negative” (at p. 20). Finally, it is possible that students view education differently than researchers assume. “A survey of 750 freshmen in business classes revealed that almost 86% did not equate educational excellence with learning. More than 96% of the students did not cite ‘knowledgeable’ as a desirable quality of a good instructor (Chonko, Tanner, & Davis, 2002). Students do not generally believe that a demand for rigor is an important characteristic of a good teacher (Boex, 2000; Chonko et al., 2002; Clayson, 2005b). Furthermore, students seem to have decoupled their perception of grades from study habits” (at p. 20).

Given these issues, Clayson (2009) conducted a meta-analysis of prior studies that used common examinations across classes. In general, results show a small, but insignificant, positive relationship between learning (as measured by common exam scores) and SET scores. The studies showing a strong positive relationship tend to be older and to come from education/psychology classes located in education or liberal arts colleges (at pp. 24-25). These results were between-class results and did not hold within-class (at p. 25).

The author provides a single summary explanation, “[o]bjective measures of learning are unrelated to the SET. However, the students’ satisfaction with, or perception of, learning is related to the evaluations they give” (at p. 26). Finally, the author notes that “[t]o a certain extent, the explanation can be summed up by a rather dark statement about human behavior by the American journalist and author Donald R. P. Marquis, who once wrote, ‘If you make people think they’re thinking, they’ll love you. If you really make them think, they’ll hate you’ (as cited in Morley & Evertt, 1965, p. 237)” (at p. 27). This conclusion is supported by my anecdotal discussion of “Kip” in my earlier referenced post “Learning Outcomes.”

Clayson (2009) did note two papers that used common examination results to try to measure the connection between SETs and learning outcomes. The first, Soper, J. C. (1973). Soft research on a hard subject: Student evaluations reconsidered. The Journal of Economic Education, 5(1), 22-26, looked at economics classes that took the TUCE test (the Test of Understanding in College Economics) both pre- and post-course. The author found that SET scores did not significantly explain changes in the pre- and post-TUCE scores and in many cases coefficients were negative.

In the second study, Marlin Jr, J. W., & Niss, J. F. (1980). End-of-course evaluations as indicators of student learning and instructor effectiveness. The Journal of Economic Education, 11(2), 16-27, the authors proposed an educational production function. The authors examined outputs that included “measures of cognitive performance: grade in the course, improvement in knowledge as indicated by test performance, ability to reason as indicated by performance on test questions requiring application of theory, and retention of knowledge over time. Other measures of output are indicated by changes in student attitudes and time spent on the course” (at p. 17). Inputs included “three general categories; institutional (I), student (A and E), and teacher (T and V)” (at p. 17). For cognitive ability the authors examined course grades, examination scores, a “gap-closing” measure of pre- and post-TUCE scores, and TUCE scores obtained after a lapse of one semester.
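For readers unfamiliar with the term, a “gap-closing” measure is usually some version of the sketch below: the share of the distance between the pre-test score and a perfect score that the student made up by the post-test. I am assuming this common definition here; the paper’s exact formula is not reproduced in this post, and the maximum score of 30 is only a placeholder, not the actual length of the TUCE.

```python
def gap_closing(pre, post, max_score):
    """Fraction of the pre-test 'gap' to a perfect score closed by the post-test.

    Assumed definition (not taken from the paper):
        (post - pre) / (max_score - pre)
    """
    if pre >= max_score:
        # No gap left to close, so the measure is undefined.
        raise ValueError("pre-test score is already at the maximum")
    return (post - pre) / (max_score - pre)

# Two hypothetical students who both gain 5 points on a 30-point test.
weak_start = gap_closing(pre=10, post=15, max_score=30)
strong_start = gap_closing(pre=20, post=25, max_score=30)
print(weak_start, strong_start)
```

The appeal of a measure like this is that the same raw gain counts for more when the student started closer to the ceiling, where there was less room to improve.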

Variable inputs measured relating to the instructors included “teacher personal attributes, and we include such matters as empathy for the student, ability to respond to student needs, effort at teaching the course in an understandable manner, and basic preparation in the subject matter. The course attributes include text selection, method of presentation, examination procedures and policies, and general difficulty of the course,” (at p. 19). The authors examined 289 students across 8 sections of economics classes in the Fall 1978 semester, (at p. 20). The authors found that student-specific variables explained most of student learning, (at p. 23). The authors further concluded that “if there is a correlation between educational output and student ratings of the variable teacher inputs, we can conclude that student evaluations can be used as surrogates for direct evaluation and do indeed measure the level of teacher input. Since the canonical correlations of student ratings and outputs are significant and since the canonical correlation coefficients are reasonably high, we conclude that student evaluations can be used to measure teacher effectiveness,” (at p. 24).

I have a couple of problems with these results. First, the authors suggest a model of a production function involving 5 outputs and multiple inputs that they characterize as fixed, variable, etc. However, they completely abandon this theoretical model as an estimation framework and use it, instead, as an argument for an ad hoc examination of certain variables that “should” influence certain other variables. Second, they use canonical correlations for estimating these relationships. Canonical correlations are rarely used in economics because correlation does not equal causation and the technique feels more like a “throw everything at the wall and see what sticks” procedure. Additionally, canonical correlations suffer from at least three limitations, “(1) the deficiency of the canonical Rc statistic as an indicator of the variance shared by the sets, (2) weight instability and correlation maximization, and (3) problems associated with attempts to partition the sets into correlated constructs,” Lambert, Z. V., & Durand, R. M. (1975). Some precautions in using canonical analysis. Journal of Marketing Research, 12(4), 468-475, (at p. 469). Finally, the authors state that multiple output variables (five) make it problematic to use multivariable regression techniques, but then freely combine output variables in their canonical analysis.

What are the conclusions from all this? First, there is a lot of existing research into SETs and I doubt any college administrator has taken the time to work through it. I have spent several hours over more than a week writing this short blog post and barely made more than a dent in the research record. Second, the research record is clearly split between “SETs are highly valuable” and “SETs are completely worthless.” Third, the literature spans nearly 100 years and some more recent papers have suggested that students have changed over time. Just yesterday an article appeared in Gothamist touting perceived and recorded changes in youth labeled “millennials.” If, as some research has suggested, student attitudes toward learning are changing and have changed, then research on SETs from, say, 20-plus years ago is likely no longer useful. As Clayson (2009) pointed out, it was this body of research from education colleges that most strongly supported the use of SETs to evaluate learning. Finally, my own experience and observation from teaching college for the last 14 years is that, as teachers, we are largely the same. On a 5 point scale, most of us will regularly fall between 3.5 and 4.5 and the instructors that regularly rate below or above that range need to be investigated more carefully. Kip regularly scores close to 5. My beliefs/perceptions align with the general feel from the literature that suggests SETs, when used as a summative tool, should be used only to identify possible poor and excellent performers.

What about the formative role of SETs? In the past I have taken great steps to elicit student feedback to use to improve teaching and course delivery. I have encountered a few issues. Anecdotally, I have had students spend several minutes completing their SET in my class and offer very detailed recommendations. Once, when I was teaching at Colorado College, I adopted several of the recommendations given by one of my principles students. None of them worked. The reason they didn’t work was that he was pretty atypical. For example, most students don’t read the assigned text, or if they do, they do so without much effort or enthusiasm (I surveyed my students about their primary learning/study tools and 10% chose the assigned text). I have, however, received advice from students who ALWAYS diligently read the text. How valuable is such advice in general? So, yes, I am naturally skeptical when I read SET questions like “Objectives for course were clearly presented.”

I usually teach 400 to 500 students across 3 classes and I have found many (most?) students will not read the syllabus, and many (most?) struggle with paying attention. I receive around 2500 email messages from students during a typical semester and most of the questions are answered in the syllabus, in mass emails I have sent, in announcements posted on the course website, and by statements I have made during lectures. I wrote a research paper that showed students were much more productive when doing out-of-class activities as opposed to in-class activities, so I decided to “bite the bullet” and organize a variety of in-class, extra-credit activities. During one such activity, as soon as it started 40 out of 180 students got up and walked out the door. One student came to my office hours to complain that the exercise seemed pointless (admittedly, it’s hard to coordinate 180 students by yourself), but pointless? I followed up these exercises by surveying the classes whether they preferred doing in-class exercises for extra credit, or listening to me lecture and 65% chose lecture.

I am convinced that students love chalk-and-talk precisely because it does not require them to think. They can sit, look at their smartphone, text their friends (#fomo), watch YouTube videos, etc. In-class exercises require thought, exertion, activity. When I was in Virginia I taught a Law and Economics class, which was largely lecture. Honestly, even I felt bored so I read my SETs with trepidation. The students really liked the class! I continue to consistently try new things to (hopefully) increase learning outcomes, but reading and acting on SETs is not high on my list.

Luck in Economic Analysis

I recently read a couple of articles about the role of luck in economic analysis. The first was by Moshe Levy. Levy’s article noted research he had done that showed the effect of luck on executive compensation (Levy, M. (2016). 90 Cents of Every ‘Pay-For-Performance’ Dollar Are Paid for Luck.) and earlier research examining the link between luck and wealth inequality (Levy, M., & Levy, H. (2003). Investment talent and the Pareto wealth distribution: Theoretical and experimental analysis. Review of Economics and Statistics, 85(3), 709-725.). The other article, by Bob Henderson, was an interview with Robert Frank, also about the role of luck in being successful. Frank recently published a book (Frank, R. H. (2016). Success and luck: Good fortune and the myth of meritocracy. Princeton University Press) in which he discusses the connection between luck and success and how they relate to the notion of a meritocracy.

In a sense, both articles were attacking the notion of self-driven success. Levy looked at so-called “pay for performance” in executive compensation. The context is a response to the observation that the average CEO earns 480 times the salary of the average worker. One argument for this disparity is that the business must compensate “rare managerial talent” (Levy, 2016, p. 1). Building on an earlier work, Bertrand, M. & Mullainathan, S. Are CEOs rewarded for luck? The ones without principals are. Quarterly Journal of Economics 116, 901-932 (2001), Levy’s paper seeks to quantify the proportion of “pay-for-performance” that is really attributable to luck and calculates a figure of 90%.

Levy’s earlier paper, Levy and Levy (2003), employed simulations in an attempt to explain why the distribution of wealth fits a power function distribution. Their results rely upon two details. First, that the distribution of wealth fits a power function distribution, and second, that their simulation shows that an approximate power function distribution of wealth only emerges from an investment market if there is no distribution of talent, i.e., luck dominates investment outcomes. One additional interesting finding is that it does not take very long for the distribution of wealth to fit a power function distribution (years, as opposed to generations).
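For anyone who wants to play with the luck-only idea, here is a toy sketch in Python. It is not Levy and Levy's model: the function names, the 1,000 investors, the 50 periods and the 20% volatility are all made up for illustration, and a bare multiplicative process like this one gives a lognormal distribution rather than a true power-law tail. All it shows is that investors with identical talent, facing identical odds, can end up with very unequal wealth.

```python
import random


def simulate_luck_only(num_investors=1000, periods=50, volatility=0.2, seed=1):
    """Every investor starts with 1.0 and has the same talent (none).

    Each period, wealth is multiplied by an independent random return.
    The lognormal return and all parameter values are assumptions.
    """
    rng = random.Random(seed)
    wealth = [1.0] * num_investors
    for _ in range(periods):
        wealth = [w * rng.lognormvariate(0.0, volatility) for w in wealth]
    return wealth


def top_share(wealth, fraction=0.1):
    """Share of total wealth held by the richest `fraction` of investors."""
    ranked = sorted(wealth, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


if __name__ == "__main__":
    start = [1.0] * 1000
    end = simulate_luck_only()
    print("top 10% share at the start:", round(top_share(start), 3))
    print("top 10% share at the end:  ", round(top_share(end), 3))
```

At the start the top 10% hold exactly 10% of the wealth, because everyone is identical. Whatever share they hold at the end comes from the random draws alone.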

While Levy shows that pronounced impacts from luck are readily observable, Henderson’s interview with Frank focuses on how people perceive the impact of luck within their own lives. Frank notes that experimental evidence has shown that people do not tend to appreciate the role luck plays in their success. In his book, Frank goes through the likelihood of good luck occurring, how good luck impacts individuals, and provides some simulation results that explain why very successful people are always very lucky people (or, rather, that their immense success is largely attributable to their good luck). Frank also notes in the interview that it is possible to get people to recognize the effect of luck on their successes.

A few years ago I was teaching two classes of introduction to economics (micro and macro combined) and at the end of the term I decided to talk about the distribution of wealth and wealth inequality. My students were nearly unanimous in their belief that people are largely rewarded for their marginal product and that talent and effort largely drive income and wealth. As an exercise I had both classes play 5-card draw poker. I brought in fake money and decks of playing cards. I was initially surprised by the number of students who had no idea how to play poker. For the exercise, the class was split into groups of 5 students, each student was given an equal sum of fake money and each group was given a deck of playing cards. Groups played three hands of 5-card draw poker and then the students were re-grouped. Students played in three groups. Students were awarded extra credit points depending on the final amount of fake money they possessed, so they did have some incentive.

At the end of class, students were instructed to count and report the amount of fake money they ended up with. The resulting income distribution for one class is shown here:



In the other class, the rules were changed so that from each group, in subsequent rounds, the most successful student was given 2 additional cards, the second most was given 1 additional card, the least successful student had 2 cards taken away and the next least successful student had 1 card taken away. This was repeated at the end of round 2 as well. The resulting wealth distribution from this class is shown here:


In those class exercises we could expect that some students would be more talented poker players and that some students would be luckier than others, which did result in a less equal wealth distribution. What we (not so clearly from the graphs) observe is that if past luck can be used to influence future luck, the distribution of wealth becomes even less equal.
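The two versions of the exercise can be roughly mimicked in Python. This is a sketch under heavy assumptions, not a record of what happened in class: no actual poker is played, each hand is simply won at random, and the extra or missing cards are stood in for by a hypothetical "edge" weight on the chance of winning. The class size of 30, the stake, the ante and the edge multipliers are all invented. The Gini coefficient is used as one common summary of how unequal the final holdings are.

```python
import random


def gini(values):
    """Gini coefficient: 0 means everyone holds the same amount."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Rank-weighted formula for sorted data.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def play(num_students=30, rounds=3, hands=3, stake=100, ante=10,
         feedback=False, seed=0):
    """Simulate the classroom game with purely random hand winners.

    With feedback=True, each group's results reset every player's edge
    for later rounds (a stand-in for gaining or losing cards).
    """
    rng = random.Random(seed)
    money = [stake] * num_students
    edge = [1.0] * num_students  # relative chance of winning a hand
    for _ in range(rounds):
        order = list(range(num_students))
        rng.shuffle(order)  # re-group the students each round
        groups = [order[i:i + 5] for i in range(0, num_students, 5)]
        for group in groups:
            before = {p: money[p] for p in group}
            for _ in range(hands):
                for p in group:
                    money[p] -= ante
                weights = [edge[p] for p in group]
                winner = rng.choices(group, weights=weights)[0]
                money[winner] += ante * len(group)
            if feedback:
                # Rank by winnings this round: best first, worst last.
                ranked = sorted(group, key=lambda p: money[p] - before[p],
                                reverse=True)
                # Hypothetical multipliers standing in for +2, +1, 0, -1, -2 cards.
                for p, new_edge in zip(ranked, (1.5, 1.25, 1.0, 0.8, 0.67)):
                    edge[p] = new_edge
    return money


if __name__ == "__main__":
    print("Gini, plain game:        ", round(gini(play(feedback=False)), 3))
    print("Gini, luck feeds on luck:", round(gini(play(feedback=True)), 3))
```

Running it with different seeds gives a feel for how much the gap between the two versions varies from one class to the next.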

If luck significantly influences economic outcomes, what are the implications for economic analysis? Denrell, J., Fang, C., & Liu, C. (2014). Perspective—Chance explanations in the management sciences. Organization Science, 26(3), 923-940 argue that random chance can be the basis for explaining many “empirical regularities” (Denrell, Fang and Liu (2014) abstract).  So, what is the point of this post? Only that luck, in the form of random events, may very well motivate much of what we observe from markets, but that getting economics to admit this is not likely to happen very quickly.


Quantitative Decision-Making

I recently read a piece by Robert X. Cringely, Welcome to the Post-Decision Age, where he reviewed Michael Lewis’ latest book The Undoing Project. Lewis’ book is about the research efforts of Daniel Kahneman and Amos Tversky, the two psychologists who developed the field of behavioral economics. I am a big fan of both Michael Lewis and Kahneman and Tversky. Kahneman won the 2002 Nobel Prize in economics (Tversky, supposedly the smarter of the two, died a few years before, in 1996, and one cannot receive the prize posthumously). In any event, behavioral economics is broadly a field of economics that seeks to devise a coherent theory to explain why standard economic theories appear to be violated in practice. In essence, Kahneman and Tversky over the years observed many experimental instances where individual behavior violated the economic theory of rationality.

Lewis’ latest book (that I confess I have been unable to read yet), however, is concerned with the role of quantitative analysis in decision-making. Cringely does an excellent job of describing this phenomenon using analogy. According to Cringely, an old time archery instructor, there are two types of archers that he labels “sight-shooters” and “instinct-shooters.” Sight-shooters are like decision-makers who rely upon data. They slowly and carefully line up each and every shot to only shoot at the optimal time. “Instinct-shooters,” however, are like individuals who decide from the gut. They shoot when it feels right.

Not surprisingly, I have observed this dichotomy in academia, but worse. Most people in leadership positions at colleges and universities are Cringely’s instinct-shooters. They are deans, provosts and chancellors who shoot from the hip. They believe in their guts that they know the truth, so they look for numbers that support that. Instead of using data to guide them, they use data to support instinct and opinion. How have we ended up in a situation where higher education is dominated by instinct-shooters? Well, part of the reason is that most disciplines don’t necessarily involve statistical or mathematical analysis. Another part is that most of these leaders are 60 years or older, an age at which analysis becomes physically more challenging and experience and instinct begin to dominate. I’ve been a chess player for 45 years and in chess it is pretty commonly known that older players begin to study less and begin to rely on their experience more. Compounding this are the extensive advances in statistical methods that have occurred over the last 20 or so years. These leaders of academia were well-established by then; they were tenured and had stopped advancing with the state of the art.

Such a situation is not normally fatal – look at politics and governing. We do not typically elect young people, but the people we do elect tend to appoint young, energetic, intelligent, and capable subordinates to help frame policy. So, what’s the problem?

Remember that, by and large, the leaders in higher education were all college professors. Yes, college professors with PhD’s who have spent decades at least pretending to be oracles of wisdom and knowledge. In my experience college professors are some of the most arrogant, discriminatory people around. They are so confident of their abilities (they do have PhD’s and ARE college professors) that they believe no matter how little time they think about something, the decisions they make will be right. If they discriminate against you because you are old, or female, or black, that’s okay because it’s what they decided and they see everything. Free speech is only available when you agree with them. They are SO smart, they can do anything and everything.

So, at this point we have an instinct-driven leadership who almost uniformly suffer from some sort of “god complex” and who lack the ability to collect and use meaningful data. No, the crisis in higher education will likely be getting worse and will likely last another 10 or 15 years until the next wave of leadership retirements. But hey, cool dining halls, dorms and athletic facilities!

Learning Outcomes

Snowflakes attend college. Why? One might assume to learn. Hmmm …

I have been teaching college classes since 2003. Since my goal is not to be a complete ass, I try to improve my teaching. Aside from my perceptions, how do I know if I am doing a good job teaching college?

Well, it turns out that’s the $64,000 question that nobody seriously asks. What? Yes, no college administrator in America asks how well their faculty are doing at teaching students. Why? I have some ideas I will share, but you really need to ask them – and there are a lot of them to ask (see “New Analysis Shows Problematic Boom In Higher Ed Administrators”).

Now, I am not suggesting that college administrators do not care about learning outcomes, but what I am asserting is that they do not necessarily care to try and get at the answer. Why? Too bad you asked, because I have no real idea. I work as a lecturer and I know the administration spends some effort spying on faculty by slinking around and querying students about their opinions of various faculty and courses, but I have observed no systematic efforts to measure learning outcomes. They do administer the now nearly universal and obligatory course evaluations. Course evaluations are a series of questions given the students in a class to find out what the students thought of the course and the instructor. They vary somewhat from school-to-school, but not by much. They usually ask “on a scale of 1 to 5, 5 being the highest, 1 being the lowest” rating questions. The responses to these surveys are then used to evaluate the instructor’s effectiveness.

Now, one might be tempted to ask a few questions about this process. Where to start? First, who designed these surveys? Were they put together scientifically by experts in survey design methods, question design, behavioral psychology? Given that I have taught at six different colleges and universities and the questions are nearly always the same, I’m guessing the answer to that question is “no.” In fact, the situation reminds me of the old joke “How many college professors does it take to change a light bulb? Change? What’s that?” Second, aside from the questions themselves, do twenty-year-old college students really have the experience to evaluate college courses? I would say “no,” these kids don’t know any more about teaching because they take classes than I do about car repair because I drive a car. Also, anecdotally, I have received low scores from students with comments like “how can you take him seriously, the way he dresses,” and once a student gave me the lowest possible score on each question because he missed an exam due to illness and my policy was (clearly stated in the syllabus and pointed out in class) that I would drop the lowest exam score, so if you missed an exam, you got zero, which would be dropped. He was convinced that he would have done well on that exam (even though he never scored above a 65% on any other exam and the class average on that exam was 10% lower than on any of the other exams). So, it is clear that often students don’t like something about the class, but they cannot articulate just what that is and the format of these surveys is not designed to elicit that information. What value is my opinion of you and your efforts, if you don’t know why I have that opinion?

In addition, it is clearly possible to manipulate the results of those surveys (more on that in a minute). Once I had a student who spent a considerable amount of time writing very detailed ways that I could try to improve the course (I used to spend some time trying to get students to provide meaningful feedback, rather than “this guy sucks” or “great instructor!”). When I read his suggestions I decided to implement many of them in the next course. How did that turn out? It didn’t work at all. None of that student’s suggestions worked at all. Why? In retrospect, he was a lot smarter than average, he was motivated and he put substantial effort into the course. Looking back, I realized that all of his suggestions would probably have worked quite well for a class of students like him, but those suggestions were completely ineffective in a class with a distribution of students. I used to put a question on my exams that asked students to provide suggestions on how to improve the course. I made sure to offer substantial extra points for particularly good suggestions. I did get a couple of “speak more slowly” types of suggestions, but the suggestion I got most often was “give more homework.” I honestly have trouble getting them to do the homework I actually do assign, so I’m guessing they were trying to tell me what they thought I wanted to hear. Why not? College students, even the very brightest, have no experience, so they really are clueless when it comes to providing a meaningful evaluation.

Now, I do feel that course evaluations can potentially offer some helpful information. Like the canary in the mine, consistently low scores/ratings can be an indication that there is some problem with a particular instructor. Such low scores (I’m not even sure how low is too low) demonstrate that an instructor is particularly unpopular without providing much of an indication as to why. Being unpopular does not necessarily correlate with poor learning outcomes, but it might, so it should be subject to further evaluation. Scores that are consistently very high might also suggest there is a problem. I worked with another instructor, I’ll call him Kip, who was widely popular. Kip’s course evaluation rating scores were almost always near 5/5, students fought to enroll in Kip’s courses, his comments on instructor rating websites were uniformly glowing and I once ran into one of Kip’s former students at a local restaurant and when he found out I taught economics, he couldn’t say enough good things about Kip. “Best teacher I ever had!” “What course?” I asked. Well, he couldn’t remember that, but Kip was awesome.

I have taken a lot of college courses over the years and I have had some instructors I liked better than others, but I never thought any one of them was a significantly better instructor than the rest. I had a math professor as an undergrad that I really enjoyed. I took nearly every course he taught whether I was interested in the particular subject or not, but I didn’t really think he was the “best teacher ever.” In fact, looking back, they all seem to have been about the same. So, why is Kip so widely popular? Is he that much better at teaching? Being curious, I decided to start asking students to describe how Kip conducted his classes. From my inquiries I have discovered why Kip was so popular. First, Kip taught more difficult classes – more “mathy,” problem-based courses that give most students trouble. Second, Kip’s classes are all “chalk and talk.”

I wrote a paper (we’ll be presenting it at the Eastern Economic Association conference in NYC in February) which examined the marginal rates of substitution between in-class and out-of-class learning activities. The analysis found that students are around three times more productive doing out-of-class learning activities. There are a number of reasons for this, but the end result was my determination to try to make in-class time more productive for student learning. To do this, I started organizing in-class extra credit activities that students would do in small groups during class. One day of the week I would lecture, the other we would do one of these activities. After a few activities I surveyed the class and asked them which they preferred, lecture or extra-credit activities. Now, I have about 500 students across three sections, so these activities were a lot of extra work for me, but, hey, if they helped performance … . Students preferred lectures by two to one. Why not? As I lecture I look out and see students napping, chatting, watching videos, texting their friends, looking at social media. Lecture is easy, it is the REM cycle equivalent of teaching and learning, but students think they are accomplishing something positive. So, Kip had the right idea, chalk and talk.

Third, Kip’s classes are “easy.” At the end of the day, the bright student has to study about 20% of the course content to get an A in the course. Kip tells students exactly what will be covered on the exam (often he gives them the exact problems on the exam and works those problems in class, but other questions require that the students understand what’s going on in the problems so Kip does get a grade distribution). Finally, Kip’s students genuinely believe they are learning. Are they really? Does it matter? I used to teach math econ and students struggled greatly. Part of the problem was that when I did the problems on the board, I made it look easy (of course it was easy for me), but when they went to do the exam problems, they struggled, therefore they weren’t learning and I was obviously a poor instructor (okay, at least not a great instructor). Kip, on the other hand, would tell them exactly which problems were going to be on the exam and then work those problems out before the exam. Students would then either learn those problems or at least be able to memorize the steps, parrot them back on the exam and receive their grade based on the other questions Kip asked. Brilliant!

Two more Kip anecdotes and I’ll let you decide what your opinion of Kip’s methods is. I asked one of Kip’s former students about her class with Kip. “Oh, he’s great!” Why? Did you work really hard? Doesn’t Kip tell you exactly what is going to be on the exam? “Yeah and there was one guy who never came to class – I don’t think he even owned the text – but he did the homework problems, always showed up for the exam review classes and got an A.” Once, Kip and I had a student in common. She struggled greatly with Kip’s class because (a) she was really a slacker, and (b) she was convinced she couldn’t do math. She came to me before Kip’s final exam and showed me one of the problems that was going to be on it. Of course she hadn’t bothered to go to the exam review (she was really a slacker that semester), so she wanted me to show her the steps to solving that problem. I know she didn’t learn a thing, but she ended up with a B+ in the class.

So, you decide. Increased learning outcomes or popular?

I will leave with a couple of observations. I surveyed my 500 students and asked them about their primary motivation for the course. It was multiple choice, so I gave them responses like “learn the material,” “parental expectations,” etc. It turns out that 85% chose “get a high grade.” It suggests, to me at least, that we have convinced college students that high grades are perfectly correlated with learning. If I get an A, I learned the material (I have to share a further observation; curves or scaling grades does not count. Students want a high grade, but giving a difficult exam that results in an average score of 40%, which is subsequently curved or scaled does not count. The high grade has to come without any scaling or the students are quite unhappy because they then do not believe they have actually learned anything). The last observation is this. There are very few university administrators that actually understand statistics. They do not understand sample size, they do not understand means relative to variances or distribution theory. When I worked in industry it became apparent over time that people in leadership could not distinguish a realistic number from any other number, so if you could show them a number that met their expectations, that number was viewed as correct. The same thing is true for higher education. I end by recounting an anecdote I heard, but did not witness. The bureaucrats were having a “come to Jesus” meeting where they were discussing course evaluations. The person in charge of deciding what those scores mean made the following comment, “We want all instructors to score above the average.” Now, ignoring the absurdity of that remark, it is clear that college administrators want instructors to teach to be popular, but haven’t the slightest ability to explain how their measure is relevant.

Well played, Kip! Well played!


Why do people blog?

I could, of course, do some research and suggest some responses to that question, but I won’t do that. Why?  Because I only care why I blog. As you will note, I have not published an entry here for quite some time. The reason is not that I do not feel I have much to say, but rather I believed (and still do) that I may as well share my thoughts in the peer reviewed academic arena.

So, why blog now? Have I given up on publishing the results of my creative capacity? No.

I am refreshing my blog out of bizarre necessity. I am currently employed as a Lecturer of Economics. Now, that title sounds more august than it is. In fact, although I am nominally a member of the Department of Economics, within that department I am labeled an “adjunct” and largely discouraged from attending department functions. Why might that be?

Well, aside from the fact that professors of economics tend to, by and large, be arrogant prats, I am an insidious infection. What?

Yes, I am a scab.

I am a scab, but not in the traditional sense. Nobody where I work is on strike, but here I am, brought in to replace a potential tenure track position. How did that happen? Don’t ask me, I have no idea. Well, maybe it is a by-product of a weak or ineffective faculty union, but, in any event, here I am.

At this point you might be thinking “So, scab, what does that have to do with blogging?” Ah, I’m getting there. I found out a couple of days ago – actually that’s not entirely accurate – colleges must be re-accredited and as a part of that, the faculty must do “things.” “Things,” you might ask, what are “things” and why must you do them? Well, the “why” I am not sure, but the “what,” from my perspective, is simple.

Okay, not so simple. I have a PhD in Economics (okay, that and $3 will get you on the subway, but still), so I have received a certain amount of training to do research. “Sweet,” you might say, you need to do “things” and you are trained to do research …

Not so simple.

What? Well, recently the lecturers unionized and negotiated a collective bargaining agreement with the college. Now, I am very pro-union, but it turns out that part of the collective bargaining agreement specifies that the University cannot recognize research activities of lecturers. Ouch! WTF! So, here I am, PhD in hand and everything I’ve been trained to do is worthless as far as my employer is concerned. Well WTF!

Initially I assumed that because most lecturers only have a master’s degree (who doesn’t) and wouldn’t know research if it bit them in the butt, I was being made to suffer for being different. However, a meeting I sat through earlier in the week made the picture more clear. It turns out that the union for tenured faculty specifically bargains to ensure that the college not recognize research activities of lecturers! OMG! Now it is clear. Despite the fact that the faculty allowed the college to begin hiring a significant number of lecturers, they are stepping forward to hobble them by way of union contract negotiations.

So, despite the fact that I have about 12 papers nearly completed, I get zero credit for them in my current position. Thus, the blog!

Yeah, I’ll get these papers published. Yeah, I’ll move to a position where my research efforts are appreciated, but, in the meantime, I’ll spend a little time earning points by writing this blog.

Inflection points

I’ve been aware of various economic forecasts that pop up from time to time in the popular press. There is no consensus, of course, but one would at least expect some coherency. This expectation, sadly, is unlikely to be met. Why?

Inflection points. For my purpose here, an inflection point is a point in a data series where the data stops going in one direction (increasing or decreasing) and starts going in the other direction (decreasing or increasing).  In the case of current economic forecasts, it is the point when the “economic data” stops decreasing and starts increasing.
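To make that definition concrete, here is a small Python sketch that flags the points where a series switches direction. The function name and the sample numbers are invented for illustration, and the rule is deliberately naive: it reacts to every single-period wiggle and ignores flat stretches, so real data would normally be smoothed first.

```python
def turning_points(series):
    """Return (index, kind) for each point where the series changes direction.

    A "trough" is a point where the series stops falling and starts rising;
    a "peak" is the reverse. Flat stretches are not flagged.
    """
    points = []
    for i in range(1, len(series) - 1):
        change_in = series[i] - series[i - 1]    # move into the point
        change_out = series[i + 1] - series[i]   # move out of the point
        if change_in < 0 < change_out:
            points.append((i, "trough"))
        elif change_in > 0 > change_out:
            points.append((i, "peak"))
    return points


if __name__ == "__main__":
    made_up_data = [5, 4, 3, 4, 5, 6, 5]
    print(turning_points(made_up_data))  # [(2, 'trough'), (5, 'peak')]
```

Spotting these points after the fact is trivial, as the sketch shows. The hard part for a forecaster is calling them before the later observations exist.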

Most economic forecasters are terrible when it comes to forecasting these inflection points. Why is that? Most economic forecasts are made using trends in the data and it is pretty easy to get a good forecast if the data is moving in only one direction. Also, most economic data series are highly correlated (they tend to move in the same direction at about the same rate), so it’s tough to find evidence that a data series is about to change direction even by looking at evidence from other data series.

To illustrate this point I went to the census.gov website (governments are a great source of data) and downloaded the most recent history of Retail Sales data. This series includes total retail sales from January 1992 through October 2010 (October is listed as provisional). This data series illustrates another couple of points. One is that it’s almost January 2011, but the latest provisional month of data is for October. Yes, the data lags. The second point is this notion of provisional. Yes, the data history changes as we move forward. It can change a lot. Nearly all of these data series are “estimated” in the sense that the totals from a given series are derived from survey samples.

I took the monthly retail sales data history and used it to forecast retail sales for the next 36 months (November 2010 through October 2013). There are many ways to forecast this series, but all involve the interaction between the forecaster, the data and some form of statistical software. What I did with this forecast was to intervene as little as possible in this process. The process took several steps and the end result is this graph.

In the graph, the data history are to the left of the vertical line, the forecast data are to the right. As I noted, I had very little to do with this forecast. I chose the basic method, but just kept the results from that model. If you’re interested, the model was (1) remove the seasonality (the various spikes – they line up by month) and trend (basically up and then down), (2) fit an ARIMA model  to what’s left (the residuals). There are various ways to remove the seasonality and trend, but what I did was use regression on a time trend, a squared time trend and a cubed time trend, and I included a “dummy variable” for each month. The best fitting ARIMA model turned out to be a 2, 1, 2.
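Step (1) can be sketched in plain Python. This is an illustration under assumptions, not the code behind the forecast above: time is rescaled to run from 0 to 1 to keep the cubic term numerically tame, January is treated as the baseline month so that only 11 dummies sit alongside the intercept, the series is assumed to start in January, and all function names are made up. The ARIMA(2, 1, 2) fit to the residuals is not reproduced here; that part is normally left to a statistics package.

```python
def design_row(t, n):
    """Intercept, scaled time trend (t, t^2, t^3) and 11 month dummies."""
    x = t / n        # rescaled time, an assumption for numerical stability
    month = t % 12   # 0 = January, assuming the series starts in January
    dummies = [1.0 if month == m else 0.0 for m in range(1, 12)]
    return [1.0, x, x * x, x ** 3] + dummies


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_trend_and_season(y):
    """Ordinary least squares of y on the trend and month-dummy design."""
    n = len(y)
    rows = [design_row(t, n) for t in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [v - f for v, f in zip(y, fitted)]
    return beta, fitted, residuals


if __name__ == "__main__":
    # Made-up series: a linear trend plus a December bump, no noise.
    n = 48
    y = [100 + 50 * (t / n) + (20 if t % 12 == 11 else 0) for t in range(n)]
    beta, fitted, residuals = fit_trend_and_season(y)
    print("largest residual:", max(abs(r) for r in residuals))
```

On real retail sales the residuals would not vanish like this, and whatever structure is left in them is what the ARIMA step is meant to pick up.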

This forecast illustrates the problem in that the prediction is for some modest improvement in retail sales through much of 2011 and then things slowly get worse.  Is this a reasonable forecast? One way to gauge is to see how well the model fits the data.

The model looks pretty good (red is the model, blue is the actual data). So, as a forecaster I could say that things aren’t looking so bright for the economy. This is only retail sales, but retail sales tell a fair amount about economic activity. Because consumer spending accounts for roughly two-thirds of economic activity and forms the basis for much of state and local tax revenue (with their usual balanced budget requirements), retail sales are an important indicator of the health of the economy. But can we trust this model?

To help answer this question, we could look at monthly year-over-year growth rates. Looking at these, we see that one possible inflection point was around February and March of 2008 (that was when retail sales levels started shrinking), and another was around November and December of 2009 (retail sales levels started growing again). Looking at 2010 retail sales data, the monthly year-over-year growth rates look pretty robust, but the problem is that 2008 and 2009 were so bad that although the growth rates look good, the changes in dollar values are relatively small, which could be one reason why the forecast model is not picking up a true inflection point.
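The year-over-year calculation itself is simple, and a short Python sketch also shows why a healthy-looking growth rate can sit on top of a weak level. The function name and the numbers are invented: the sample is a flat year at 100, a bad year at 80 and a "recovery" year at 88, not the Census retail sales data.

```python
def yoy_growth(series, period=12):
    """Growth of each observation relative to the one `period` steps earlier."""
    return [
        (series[i] - series[i - period]) / series[i - period]
        for i in range(period, len(series))
    ]


if __name__ == "__main__":
    # Made-up monthly sales: flat year, bad year, partial recovery.
    sales = [100.0] * 12 + [80.0] * 12 + [88.0] * 12
    growth = yoy_growth(sales)
    print("growth in the bad year:     ", growth[0])
    print("growth in the recovery year:", growth[-1])
    print("recovery level vs. original:", sales[-1] - sales[0])
```

In this made-up case the last year shows solid positive growth, yet the level is still below where the series began, because the growth is measured against a depressed base.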

At this point there are three basic choices. One, simply put faith in the forecast model and use the forecast generated by that model.  Two, throw out this model and try a different model. Three, use judgment to simply adjust the forecast. What would I do? I would most likely go back and adjust the model. But the overall problem remains – how to accurately forecast an inflection point (and the related problem of determining the new trend after that point).