Tutorial 9 Activities

Activities

Consider a large tech company, say Apple, that runs a prestigious tech internship program for Year 12 graduates.

In the first round of applications, we have six candidates, but only two are successful.

The outcome is determined mainly by an average score (HR Panel Average Score, from 0.0 to 10.0) awarded by a panel of judges and expert HR recruiters at Apple. To clarify, this overall score is determined by humans (e.g., it is the average of the scores given by the individual judges and recruiters).

This table contains some data about the candidates, including their proposed undergraduate major and their achievements to date.

| Outcome | HR Panel Average Score | First Name | Surname | School | Proposed University Major | Achievements |
| Hired | 9.56 | Jack | Brigs | Caulfield Boys School | Software Engineering | Cycling Champion |
| Hired | 9.22 | Jackson | Singh | Geelong Boys College | Science | Marathon Runner Up |
| Not Hired | 8.62 | Marie | Currie | Newtown Public School | Mathematics | Junior MasterChef Winner |
| Not Hired | 8.50 | Gabi | Miller | Sydney North College | Philosophy | Rotary Club Award Recipient |
| Not Hired | 7.83 | Sandra | Lee | Bondi Girls College | Mass Communications | X Factor Runner Up / Australian Idol Finalist |
| Not Hired | 4.55 | Trinity | Reeves | Parramatta Girls High School | Fine Art | Heidelberg Art Show Winner |

As humans, we know that the HR Panel Average Score is the only thing that determined each candidate’s outcome, as stated above. Everything else is just coincidental, especially in a dataset this small.
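As a quick check of this claim, the following minimal Python sketch reproduces every outcome in the table from the score alone. It assumes the rule is simply "the two highest scores are hired"; the tutorial does not state the exact rule, and the names `candidates` and `NUM_PLACES` are made up for illustration.

```python
# Rows copied from the table: (first name, surname, HR panel average score, outcome).
candidates = [
    ("Jack", "Brigs", 9.56, "Hired"),
    ("Jackson", "Singh", 9.22, "Hired"),
    ("Marie", "Currie", 8.62, "Not Hired"),
    ("Gabi", "Miller", 8.50, "Not Hired"),
    ("Sandra", "Lee", 7.83, "Not Hired"),
    ("Trinity", "Reeves", 4.55, "Not Hired"),
]

# Assumption (not stated in the tutorial): the two highest scorers are hired.
NUM_PLACES = 2

# Rank candidates from highest to lowest score.
ranked = sorted(candidates, key=lambda row: row[2], reverse=True)

# Predict each outcome using the rank only; no other column is consulted.
predicted = {
    f"{first} {last}": "Hired" if rank < NUM_PLACES else "Not Hired"
    for rank, (first, last, _score, _outcome) in enumerate(ranked)
}

# The outcomes recorded in the table, keyed the same way.
actual = {f"{first} {last}": outcome for first, last, _score, outcome in candidates}

# True when the score-only rule explains every row of the table.
matches = predicted == actual
print(matches)
```

Because the score alone accounts for all six outcomes, any pattern found in the remaining columns adds no explanatory power on this data.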

Problems can arise if this dataset of human judgements is used to build ML models that replace the human HR panel in predicting future outcomes. (Hint: read the Amazon Case Study in this week’s Readings/Videos. Amazon did a similar thing!)

Spurious correlations do exist in this dataset. If a naive classifier (e.g. one using word frequencies or word embedding techniques) is trained on it, it may learn a spurious rule such as:

“Those whose first names contain ‘Jack’ will be hired 100% of the time”

Of course, this is false! Two matching names among six candidates is a coincidence, not a reason for hiring.
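To see how easily such a rule falls out of the data, here is a minimal Python sketch that computes the hire rate for candidates whose first name contains "Jack". It is not a real classifier, only a hand-written check of one feature; the helper names `hire_rate` and `name_contains_jack` are hypothetical.

```python
# Rows copied from the table: (outcome, HR panel average score, first name).
candidates = [
    ("Hired", 9.56, "Jack"),
    ("Hired", 9.22, "Jackson"),
    ("Not Hired", 8.62, "Marie"),
    ("Not Hired", 8.50, "Gabi"),
    ("Not Hired", 7.83, "Sandra"),
    ("Not Hired", 4.55, "Trinity"),
]


def hire_rate(rows):
    """Return the fraction of rows with outcome 'Hired', or None if empty."""
    if not rows:
        return None
    hired = sum(1 for outcome, _score, _name in rows if outcome == "Hired")
    return hired / len(rows)


def name_contains_jack(row):
    """Hypothetical feature: does the first name contain 'jack'?"""
    return "jack" in row[2].lower()


with_jack = [row for row in candidates if name_contains_jack(row)]
without_jack = [row for row in candidates if not name_contains_jack(row)]

rate_with_jack = hire_rate(with_jack)
rate_without_jack = hire_rate(without_jack)

print(rate_with_jack)     # 1.0
print(rate_without_jack)  # 0.0
```

On these six rows the feature separates the outcomes perfectly, even though it played no part in the panel's decision. A model trained on this table has no way to tell this coincidence apart from the real cause.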

Exercise

In your group, based on this observation, as well as hypothetical uses of this dataset for building ML models (as above), discuss the following:

  1. What is the meaning of the maxim “correlation does not imply causation”? Hence, what are spurious relationships/spurious correlations? Explain with an example taken from, e.g. here or here (or any other example you can think of).

  2. If this classifier were used over time to shortlist actual job candidates at Apple, indicate what other spurious correlations it could find. More importantly:

    • how do these entrench biases (both machine bias, and societal bias) that disadvantage people in particular groups or of particular characteristics?

      • Consider discriminatory bias (with job fit, gender) and also spurious correlations (e.g. does it look like certain schools in certain regions are advantaged?)

    • are there any features which are acceptable for use?

Note

As discussed in this week’s Modules, accessibility is more than assisting people living with disabilities; consider also issues such as situational impairments. Sometimes, issues with accessibility are interrelated with issues concerning the equity of AI systems.

Since the pandemic, ‘contactless’ technologies have been promoted in Australia and other countries to reduce the need to handle cash and to avoid contact with shared surfaces such as credit card readers. This supports physical distancing and helps reduce the spread of Covid-19.

Many public parking systems in various suburbs are thus equipped with a ‘contactless’ way to pay for parking, say by using a smartphone app, or a web browser-based app. These often require the storage of credit card details with the app provider. Assume some of these systems completely replace traditional systems, including credit card terminals, pay-to-park (coins and bank notes) terminals, or a human-staffed parking booth.

Exercise

In your group, discuss the case above with reference to accessibility and equity. To get started, here are some potential stakeholders which may be negatively impacted:

  • Equity: people without smartphones or technical expertise;

  • Equity: people who cannot obtain credit cards;

  • Accessibility: people who can’t use their phone screen due to risk of migraines;

  • Accessibility: busy parents who are juggling family/carer responsibilities and work.

Note

We look forward to seeing your amazing contributions on the forum discussion.