That’s unrealistic. No, you can’t come up with your own scientific test to determine a language model’s capacity for understanding. You don’t even have access to the “thinking” side of the LLM.
You can devise a task it couldn’t have seen in the training data, I mean. Building a comprehensive argument out of such tasks requires a lot more work and time.
> You don’t even have access to the “thinking” side of the LLM.
Obviously, that goes for natural intelligences too, so it’s not really a fair thing to require.