Evaluating & Building Trust in AI: insights from red-teaming talks

Last week, I attended an event on campus titled “Supporting NIST’s Development of Guidelines on Red-teaming for Generative AI.” NIST stands for the National Institute of Standards and Technology, a part of the U.S. Department of Commerce. Initially, my thoughts about red-teaming centered on cybersecurity and potential vulnerabilities: viewing artificial intelligence as a piece of critical digital infrastructure susceptible to manipulation or compromise.

However, the discussions took a broader perspective centered on evaluation. A major part of that is how the government perceives the evolution of AI: emphasizing the development of the science, practice, and policy surrounding AI safety, with a significant focus on building trust.

The event outlined a strategic federal framework for AI evaluation, encompassing:

  1. Building and expanding the science of AI evaluation.

  2. Developing pertinent guidelines and standards.

  3. Creating robust test environments.

  4. Conducting thorough evaluations.

The effort rests on three foundational pillars:

  1. Research: improving the science behind AI safety.

  2. Implementation: translating insights into practical tests, guidelines, and risk management frameworks.

  3. Engagement and Operations: facilitating and managing collaborative research with external partners.

Thinking broadly about establishing trust in a new technology brought to mind the introduction of electricity, automobiles, and online banking. We’re in another one of those societal transition points right now.

Two presentations really resonated with me:

  • Jason Hong on "User-driven Auditing and WeAudit"
    The goal is to help everyday people collectively audit AI systems.
    Make it a hub for teaching the general public about AI bias and offer ways to combat it.
    Offer a set of tools and discussion boards for sense-making.
    The initial focus is on text and images produced by generative AI.
    Taiga: provides side-by-side comparisons of Gen AI images to look for bias in representation.
    Humans are biased, but generative AI is worse!
    Ouroboros: using AI to help audit AI.

  • Peter Zhang on "LLMs for Supply Chain"
    Objective: making students responsible users of AI.
    Observations: students are using Gen AI tools heavily, from writing to coding, whether we allow it or not, and whether their future employers allow it or not. These tools magnify both a person’s strengths and weaknesses.
    Approach: recognize the classroom as a safe environment for students to explore this technology in an applied manner. He required them to use these tools in a disciplined way and to document GPT’s successes and mistakes. In doing so, the students become more critical and responsible users after having to rely on both the good and bad aspects of GPT to get good grades.
