Scott Allen Cambo


PhD Researcher and Data Scientist researching Human-Centered Machine Learning* and developing interactive machine learning products


@SkotBotCambo on Twitter

Cambo AT u DOT Northwestern DOT edu

Download CV

Doctoral Thesis


As a PhD data scientist, I strive to understand how we can build AI and machine learning systems that understand the data the way that the user does, even if that isn't aligned with how the data scientist understands it. My PhD thesis investigates how we consider the role of subjectivity in data science and how we might approach things like automated content moderation of social media differently when we recognize that the classification task is highly subjective. As an industry data scientist working at Avalanche Insights, I focus on the ways that we can build interactive machine learning systems that are:

  • easy to use
  • easy to understand
  • easy to teach
  • easy to trust

If you'd like to learn more about my research, please check out my doctoral thesis, Model Positionality: A Novel Framework for Data Science with Subjective Target Concepts.

* What is Human-Centered Machine Learning?

Machine learning algorithms have become increasingly prevalent in many aspects of our lives. At home, we are increasingly interacting with intelligent systems that use machine learning to infer what we want from what we say, and at work many of us (Medical Doctors, Researchers, Marketing and Business Analysts) depend on machine learning to power software for analysis. Two fields of research help with the design and evaluation of intelligent technologies: Human-Computer Interaction (HCI), which helps us understand how people experience the technology, and Machine Learning (ML), which helps us understand how learning algorithms can derive the knowledge necessary for powering these systems. Human-Centered Machine Learning (HCML) is a new research topic which aims to combine methods of design and evaluation from ML and HCI to understand how intelligent systems can best improve our lives and empower us to be more knowledgeable, creative, and productive.

For more information on the topic, please see the course curriculum I have put together on GitHub. If you have suggestions for other topics, papers, etc. that should be included, feel free to email me or submit a pull request.

Recent Peer-Reviewed Publications

Leadership Positions

Recent News

  • HC 👏👏 IC 👏👏👏!: I was very honored to be nominated along with my colleague, Ethan Manilow, to represent Northwestern University in its first year as a member of HCIC, the Human-Computer Interaction Consortium. This year's theme was AI + HCI. I took the opportunity to participate in the poster session by presenting my dissertation topic, tentatively titled "Interactively Tailoring Machine Learning for Situated Use", as a "boaster" poster. If you want to check it out, you can find it here. This was my first HCIC and I will say, this was what I always hoped academia would be like: candid conversation and the sharing of ideas around the topics we love and feel are important. Thank you so much to Vidya Setlur, Ranjitha Kumar, and Jed Brubaker for organizing this great conference.
  • Podcast! I was recently interviewed for the PhDrinking podcast about my work in Human-Centered Machine Learning.
  • Blog post! I decided that I would occasionally take parts of what I am working on and make them into technical blog posts. My first attempt covers how we can extend scikit-learn to create custom ensemble models. Any feedback is greatly appreciated. Just tweet at me.

Current Projects

User-Centered Activity Recognition

A very common way in which people interact with machine learned models is through devices and mobile apps designed to track physical activities like sitting, standing, running, walking, and bicycling. However, the models embedded in consumer activity tracking systems do not typically adapt to an individual user's needs, and only recently have they begun to allow users to directly correct the model itself. This research project aims to understand how activity trackers can be designed to help users cultivate more accurate models (or more precisely, models that err in ways they are most comfortable with) with the goal of creating a more personalized health tracking experience. We aim to do this in two ways:

  • Comparing active learning, semi-supervised learning, and community similarity approaches for seamful design in activity tracking
  • Allowing users to interactively tailor their activity tracking system by choosing which sensors the app can access and which activities are the highest priority. Users are shown how their choices may affect the expected accuracy of the tracker and what they might do to increase that accuracy.

Cooperative Machine Teaching

Machine Teaching (MT) has been proposed as a new field of ML research and a likely future job position. MT focuses on the task and the art of providing sources of data, labels, features, and human evaluation of machine learned models by the user or engineer. However, each “machine teacher” brings their own biases and value systems in judging what information is appropriate and helpful for the learner and what kinds of errors are acceptable (or unacceptable) for the learner. Cooperative Machine Teaching (name subject to change) aims to understand how we design systems which resolve input from multiple teachers to create models which best represent the values and biases of the community which will be most affected by their decisions.

Human-Centered Machine Learning

This project is a course that I am developing to help students put the research and application of Human-Computer Interaction and Machine Learning in context with each other. The synopsis: By definition, Machine Learning provides software with a way to learn something that it was not explicitly programmed for. In practice, Machine Learning is often used to allow people to learn something that they might not have been able to learn otherwise (e.g., infer relationships from large-scale, high-dimensional data at potentially high speed) and to allow computers to learn something that humans learn naturally (e.g., speaking, hearing, seeing). These two broad application areas of Machine Learning not only have a large impact on human behavior; their success also often depends on a nuanced understanding of human behavior and how people interact with technology (i.e., sociotechnical behavior). Students of the Human-Centered Machine Learning course will help form a new way of understanding and practicing the application of machine learning through a series of readings, discussions, and a final project.

An Introduction To Programming Through Open Source and The Web

This is a course that I have taught as an experimental workshop through the Computing Everywhere series at Northwestern University. The goal of this workshop is to show students how the open source community can provide great learning opportunities and a voice to those who are just beginning to learn programming. While learning some basic first steps to programming in JavaScript, students also learn:

  • How to leverage Stack Overflow for help
  • How mildly confident answers to a Stack Overflow question can be more helpful than no answer at all (and help you learn in the process)
  • How to use GitHub to learn from other projects by contributing design suggestions and bug reports
  • How to use GitHub to extend another person's code
  • For more info, visit the GitHub page

Interactive Encoding of Subjectivity With A KNN Music Recommender App

This app is being developed as an educational tool for the Computing Everywhere series at Northwestern University, which aims to help students in communication studies understand how subjectivity can deliberately or accidentally be encoded in algorithms that serve cultural content, such as the music recommenders used by Spotify or Pandora.

  • GitHub page coming soon
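
As a rough illustration of the idea (not the app's actual code, which is not yet published), the sketch below shows one way a subjective judgment can end up inside a k-nearest-neighbors recommender. The song names, the three features, their values, and the `weights` parameter are all hypothetical and invented for this example. The point is only that deciding which features count toward "similarity" is itself a subjective choice.

```python
import math

# Hypothetical catalog: each song is described by (tempo, energy, acousticness),
# all scaled to the range 0-1. These features and values are invented for illustration.
SONGS = {
    "song_a": (0.9, 0.8, 0.1),
    "song_b": (0.85, 0.3, 0.2),
    "song_c": (0.2, 0.75, 0.9),
    "song_d": (0.3, 0.2, 0.8),
}

def weighted_distance(x, y, weights):
    """Euclidean distance where each squared feature difference is scaled by a weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def recommend(seed, k, weights):
    """Return the k songs nearest to the seed song under the given feature weights."""
    others = [name for name in SONGS if name != seed]
    others.sort(key=lambda name: weighted_distance(SONGS[seed], SONGS[name], weights))
    return others[:k]

# Same data, same algorithm, different opinion about what "similar" means.
tempo_first = recommend("song_a", k=1, weights=(1.0, 0.0, 0.0))
energy_first = recommend("song_a", k=1, weights=(0.0, 1.0, 0.0))
print(tempo_first, energy_first)
```

With these made-up values, weighting only tempo recommends `song_b`, while weighting only energy recommends `song_c`, even though neither the data nor the algorithm changed.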

Inactive Projects

Understanding Correlates of Mental Health In The Context Of Mobile Phone Sensor Data

In my first lab rotation as a new student in the Technology and Social Behavior program, I worked with Stephen Schueller and David Mohr at the Center for Behavioral Intervention Technology on a project aiming to understand how mobile phone sensor data may be used with ecological momentary assessment data to build models of various mental health states using machine learning. Unfortunately, this project did not yield any feasible models, but it was a great learning project and a joy to work with the excellent folks at CBITs. I successfully defended my computer science qualifying exam to Darren Gergle, Stephen Schueller, and Doug Downey with a write-up and presentation of the findings from this work.


In any crowdsourcing application, it is important to consider what about the design of the app motivates the user to participate. In my second lab rotation, I worked with Haoqi Zhang in the Delta Lab on a project aiming to understand how citizen science might “crowd sense” the natural world by automatically leveraging opportunities presented in habit-building applications such as Zombies, Run! and Headspace.


In the summer of 2016, Daniel Avrahami and Matt Lee were kind enough to invite me to intern at FXPal, where we worked to understand how to design mobile phone-based activities which appropriately balance well-being with productivity. What we came up with was BreakSense, a mobile phone application that uses iBeacon technology to detect when users have left their workstation and prompts them to play a quick and fun game that seamlessly integrates with most breaks, adding a few steps while leaving the user refreshed and mentally prepared to return to work.