Michael Nettles - Clarity and Cloudiness in the Uses of Big Data for Education




Dec 13, 2015
by Michael Nettles

Senior Vice President at ETS and session chair addresses the possible innovations and pitfalls of "better" testing and data use in education at his sixth Salzburg Global Seminar program

Can better testing and data accelerate creativity in learning and societies? My answer to that question, by the way, is a resounding, if not surprising, "Yes it can!" I hope it is your answer, too, or that it will become your answer over the next few days. I am delighted that ETS is co-sponsoring this Salzburg Global Seminar, "Untapped Talent: Can Better Testing and Data Accelerate Creativity in Learning and Societies?" with the National Science Foundation and the Inter-American Development Bank. I am excited about the few days that we have ahead of us to get acquainted and think together about how we might have an impact on the co-existing and intersecting worlds of assessment and big data.

The Data Ocean

It is certainly a timely question. It has been estimated that Google processes 3.5 billion requests per day, and that Facebook's data grows by 500 terabytes per day, including 2.7 billion "likes."[1] The Large Synoptic Survey Telescope being built in Chile will process and store more than 30 terabytes of data each night to help address fundamental questions about the structure and evolution of the universe.[2]

In all, IBM says that every day we create 2.5 QUINTILLION bytes of data from sources as diverse as sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transactions, cell phone GPS signals, and so on.[3]

That is Big Data. The possibilities of putting even fractional bits of it to use are breathtaking — at least conceptually. The fact is, we do not yet have the capabilities, or even the know-how, required to achieve our vision for all these data in the workplace, commerce, medicine, education or elsewhere.

We have barely begun training the people to do the work. In the United States, there are more than half a million unfilled jobs in the IT sector. The tech shortage is seen as the biggest problem facing the U.S. technology economy.

Beyond being a competitiveness problem, it is also a shame given that tech jobs pay well — 50 percent more than the average private-sector American job.[4] According to Gartner, by 2020 there will be 1.4 million computer specialist job openings, but U.S. universities are unlikely to produce enough qualified graduates to fill even 30 percent of them.[5]

As is so often the case, people from disadvantaged communities are the furthest away from these opportunities. More than 80 percent of the technical employees at most American tech companies are men, and fewer than 5 percent are Black or Latino.[6] Hispanics made up just 4 percent of Yahoo's workforce — and that is twice the proportion of African American employees.

Last year, Facebook employed 81 African Americans among its 5,500 U.S. workers, or 1.4 percent of its workforce.[7]

Big Data in Private Industry 

Big Data itself holds the promise of improving diversity within the IT industry. A growing number of consultancies are developing methods to corral, organize, and deploy data from various sources, including social media, web navigation paths, online communities and forums, gaming sites, purchase transactions, the troves of publicly available government data, and a new generation of employer-administered assessments of skills and capabilities.

In doing so, these firms say their algorithms can replace race-, class- and culture-based criteria with demographically blind data-based criteria that remove subjective human evaluators. An online assessment used by Catalyst IT Services, an IT outsourcing firm based in Baltimore, Maryland, generates information about candidates not just based on their answers to questions, but also on how the candidates answer the questions. It uses data analytics to predict workplace performance based on whether a candidate answering a question in an unfamiliar discipline labors over the response, answers quickly and moves on, or skips it altogether.[8]
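The kind of behavioral signal Catalyst describes can be sketched, purely for illustration, as feature extraction over a candidate's response log. Everything below — the function names, the timing thresholds, the category labels — is a hypothetical construction of mine, not Catalyst's actual model.

```python
# Hypothetical sketch: labeling how a candidate handled each question
# from timing data alone. Thresholds and categories are invented for
# illustration; they are not Catalyst IT Services' actual system.

def response_style(seconds_spent, answered, median_time=60.0):
    """Label one response as 'skipped', 'quick', 'labored', or 'typical'."""
    if not answered:
        return "skipped"
    if seconds_spent < 0.5 * median_time:
        return "quick"
    if seconds_spent > 2.0 * median_time:
        return "labored"
    return "typical"

def style_profile(log):
    """Summarize a candidate's log of (seconds_spent, answered) pairs."""
    counts = {"skipped": 0, "quick": 0, "labored": 0, "typical": 0}
    for seconds, answered in log:
        counts[response_style(seconds, answered)] += 1
    return counts
```

A downstream model would then relate such profiles to later workplace performance; the point of the sketch is only that the predictive signal comes from *how* questions were handled, not from the answers themselves.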

A San Francisco company called Knack uses mobile games designed by neuroscientists, psychologists and data scientists to identify players with valuable STEM (science, technology, engineering and math) skills, particularly those from disadvantaged and marginalized groups. Another San Francisco company, Gild, develops recruitment data by mining work that people do in open-source communities.[9]  

The explicit promise is that you do not need a computer science degree from Stanford to get a good job in computer science, or a background in math or economics to work in data science; that you do not need to know the CEO's son's girlfriend's father to get a job interview; and that you do not have to be a young White male to work in IT.

Big Data in Education

Big Data and predictive analytics have been slower to come to education and assessment. But the possibilities are even more dramatic. For one thing, the immediate availability of so much data, combined with breakthroughs in data and predictive analytics, stands to render the traditional, highly structured summative assessment, in which questions have right or wrong answers, a relic.

A substantial amount of learning and educational activity already occurs in digital spaces like the cloud: in learning management systems; on web forums and discussions; on social media sites; in online portfolios; and so forth. 

Everything that is done on or with a computer — document edits, gaming collaborations, responses on intelligent tutoring systems, even eye and body movements recorded by body sensors — can be captured, sorted and analyzed for individualized patterns and progress. And it can be kept forever.

Bill Cope and Mary Kalantzis of the University of Illinois at Urbana-Champaign assert that by generating such "fine-grained data" that were not previously accessible or even visible to teachers, and by making the data immediately available for review and feedback, educational data mining and analytics may be ideally suited to individualized and learner-centered teaching.[10]
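What "fine-grained data" analytics might look like in the simplest case can be sketched as rolling raw learner events — every edit, attempt, or forum post — into a per-student summary a teacher could review immediately. The event shapes and field names here are invented for this example, not taken from any real learning management system.

```python
# Illustrative sketch: aggregating fine-grained learner events into a
# per-student summary. Event structure is hypothetical.

from collections import defaultdict

def summarize_events(events):
    """events: list of dicts like {"student": ..., "kind": ..., "correct": bool or None}.
    Returns per-student counts of activity and accuracy on scored items."""
    summary = defaultdict(lambda: {"events": 0, "scored": 0, "correct": 0})
    for e in events:
        s = summary[e["student"]]
        s["events"] += 1
        if e.get("correct") is not None:  # only quiz-like events carry a score
            s["scored"] += 1
            s["correct"] += int(e["correct"])
    return {
        student: {**s, "accuracy": (s["correct"] / s["scored"]) if s["scored"] else None}
        for student, s in summary.items()
    }
```

Even a toy rollup like this makes the pedagogical point: the evidence of learning accumulates continuously from activity, rather than arriving once, at test time.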

In such an environment, the learning process itself can become the best source of evidence of learning, replacing the test. 

The implications for test designers are dramatic. What new approaches to test design will be required in order to generate and cull usable data? What analytical tools and techniques will teachers need in order to discern patterns in individual students’ performance and the forces behind those patterns? What new methods of data science and learning analytics will need to be developed, learned and deployed? Who’s going to do all this cross-domain work? And how will behaviorists, biologists and budget analysts communicate with computer scientists, educators, graphic designers, linguists and psychometricians to make it all happen?

The NIH BD2K Initiative

Those are among the questions that our colleagues at the National Institutes of Health are working through in the Big Data to Knowledge, or BD2K, initiative.

The NIH launched the BD2K project in 2012 to advance our understanding of human health and disease by "harvesting" the abundance of biomedical research and information from "the diverse, complex, disorganized, massive, and multimodal data being generated by researchers, hospitals, and mobile devices around the world."

BD2K’s major aims are to:

  • facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable;
  • conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data; 
  • enhance training in the development and use of methods and tools necessary for biomedical Big Data science;
  • support a data ecosystem that accelerates discovery as part of a digital enterprise.

With only slight modification, the same can be said of almost any industry or discipline that envisions using Big Data. Those industries can also learn from BD2K's work in assembling and training the cross-disciplinary groups it has assigned to develop the quantitative and computational approaches, technologies, methods and tools needed to put biomedical Big Data to use.


“Big Data,” “Digital Exhaust,” the “Digital Ocean” — whatever the term, the superabundance of information we are throwing off creates truly astonishing possibilities. Every keystroke on a computer can be captured, cataloged, analyzed, and used.

Whether it should be captured, cataloged, analyzed and used is a different, but equally urgent, question. After all, exhaust is sometimes just exhausting.

Cope and Kalantzis note that computer-mediated learning has been in use since 1959, when the PLATO learning system was developed at the University of Illinois. Half a century later, they assert, the "overall beneficial effects of computer-mediated learning remain essentially unproven."

If we are going to hitch education to the Big Data wagon, then we will need the evidentiary tools to measure the return on our investment — tools that may not yet exist.

We will also have to proceed with our eyes wide open to the risks to individual privacy and to institutional and even national security. As computer hackers love to demonstrate, one of the things about having access to lots of data on the cloud is that everyone ELSE has access to lots of data on the cloud. And they are not always as well-intentioned as we are.

Entrusting so much of our education, careers and personal lives to algorithms and analytics also runs the risk of replacing our humanness with a blind faith in data processes.

Consider those game-based employment assessments used by Knack and others. They may be powerful tools for eliciting skills and talents, but research has shown that males are more adept at online gaming than females, at least in part because they play games more than females do, and maybe because game designers tend to be men.[11]

If that is the case, will using game-based assessments only further entrench gender gaps in the workplace?

The era of Big Data is, by definition, built on technology. But in education, public resources, including IT resources, continue to be distributed unequally among socioeconomic communities. And the digital divide grows deeper and wider for the less advantaged.

It may be tempting to expect that the tide of computer-mediated learning and assessments will lift all boats in the Digital Ocean. But will it?

Maybe. Or maybe not. We do not know. But we should pay attention to the risks and consider progressive ways to narrow the gaps because the potential setbacks are just as dramatic and breathtaking as the potential rewards.

Taking on this challenge is much the reason that I am looking forward to our work together over the next few days. I think that given the magnificent blend of geographic, demographic and experiential diversity and talent in this session, we will both bring clarity, and discover more cloudiness in the atmospheres of assessment and big data.

The Salzburg Global program Untapped Talent: Can Better Testing and Data Accelerate Creativity in Learning and Societies? is being hosted in collaboration with the Educational Testing Service (ETS), the National Science Foundation, and the Inter-American Development Bank, and in association with the Royal Society of Arts (RSA). More information on the session can be found here: www.salzburgglobal.org/go/558.