Discussion about this post

Pedro Avila

Is the mitigation for instrumental reasoning level 1 effectively just "scratchpads"? If so, do we have a means of knowing whether what's being output to the scratchpad is a direct line to the AI's "subconscious", or whether it can be telling us what we want to hear there as well?

Andy X Andersen

The safety plans of AI companies are very preliminary, because we are very early in the process.

All we have so far are LLMs, which predict the most likely action based on training data. They can be quirky but are not too competent, and they lack a solid model of what they are dealing with that goes beyond text and maybe images.

Next, there will be agents. Those will likely have more interaction with the real world: they can use tools, iterate, do some reasoning, and maybe even gain feedback. But even these are likely not going to be too bright.

As such, while we should be mindful that the industry is moving fast, there is likely still a long way to go.

For now, the focus is best placed on thorough reliability testing, as with regular software.
