A collection of articles on the research and applications of Generative AI
Let me tell you a story about how a winning Claude-3 lose an image classification debate to GPT-4V, due to its weak personality. It is also about a cougar on house deck, and how AI could mistaken it as just a dog.
A multimodal LLM (i.e., a LMM or MLLM) such as OpenAI’s GPT-4V or Anthropic’s Claude-3 can recognize a wide range of object classes, while also incorporating contextual information to achieve a kind of real-world visual common sense, which makes it extremely powerful for use in complex application domains such as autonomous vehicles, home robot, etc. See some relevant studies HERE
However, contextual information could also lead an LMM to incorrect decision, and different models may also behave differently.
Following are some examples to demonstrate the problem, with this setup:
Given the following image:
and the prompt:
Identify the large brown animal in the image.
both GPT-4V and Claude-3 classify the large brown animal incorrectly as a dog:
Given the same image and a prompt that directs the model to ignore context:
Identify the large brown animal in the image. Ignore all contextual information and judge strictly by the features of the animal. List the top-5 possible candidates in descending likelihood.
Here are the results:
GPT-4V:
Based on the image alone, the large brown animal appears to be a domestic dog. It has the physical characteristics typical of a dog, such as the shape of its body, tail, and ears. Without considering any contextual information and judging strictly by the animal itself, here are the top-5 possible candidates in descending likelihood:
The most likely candidate is a domestic dog, considering the context and typical characteristics of the animal shown.
Claude-3:
Based on the physical characteristics of the large brown animal visible in the image, the top 5 most likely candidates, in descending order of likelihood, are:
Based on the visible characteristics, a cougar seems to be the most likely identification for the large brown animal in the image.
Where Claude-3 has correctly identified the animal as a cougar, but GPT-4V still identifying it as a dog.
Next we crop the image without changing its resolution, as follows:
The prompt used is the same as test #2:
Identify the large brown animal in the image. Ignore all contextual information and judge strictly by the features of the animal. List the top-5 possible candidates in descending likelihood.
Here are the results:
GPT-4V
The image provided appears to show the rear view of an animal standing on a wooden deck. Based on the visible characteristics, here are the top-5 possible candidates in descending likelihood:
Without more context or a view of the animal’s head and full body, it is challenging to make a precise identification. However, the most likely candidate based on the image is a domestic dog.
Claude-3
Based solely on the visual appearance of the animal in the image, the top 5 most likely candidates for the large brown animal, in descending order of likelihood, are:
The animal has the distinctive features of a medium to large feline predator, including a tawny brown coat, muscular build, and long tail. However, without additional context about the location or scale, it’s difficult to definitively identify the exact species from this single image alone.
Notably, GPT-4V is paying attention to the wooden deck which is somewhat visible in the cropped image, even though the directive is to ignore all contextual information.
Next we mask the image without changing its resolution, as follows:
The prompt used here is the same as Test #3:
Identify the large brown animal in the image. Ignore all contextual information and judge strictly by the features of the animal. List the top-5 possible candidates in descending likelihood.
Here are the results:
GPT-4V:
The animal in the image appears to be a large feline, and based on its physical characteristics, here are the top-5 possible candidates in descending likelihood:
The most likely candidate based on the visible characteristics is the cougar, due to its size, tail length, and uniform tan coloration.
Claude-3:
Based solely on the visual appearance of the animal in the image, the top 5 possible candidates for what this large brown animal could be, in descending order of likelihood, are:
The muscular build, tawny coloring, and feline features suggest this is most likely a type of large wild cat. A cougar seems the closest match based on size and appearance, but without additional context, other wild cat species cannot be definitively ruled out from just this one angle.
Notably GPT-4V now correctly classifies the animal as a cougar. It is likely influenced by the removal of the wooden deck which is visible in Test #3.
Previously we have performed a series of experiments where two chatbots are used to debate with each other using the Three-round Constructive Critique process (see HERE about Ensemble GenAI).
Here we conduct such a debate based on Test #2, where Claude-3 was correct but GPT-4V was wrong, and we are hoping that Claude-3 can convince GPT-4V that the animal is in fact a cougar.
The full image is used for this test (same as in Test #2):
The same prompt as in Test #2 is used:
Identify the large brown animal in the image. Ignore all contextual information and judge strictly by the features of the animal. List the top-5 possible candidates in descending likelihood.
Here are the results from the three-round debate between the two chatbots:
Round 1 Responses (same as in Test #2)
Round 2 Responses
Participant: #1:openai_gpt-4-vision-preview
The diagnosis provided seems to be a reasonable assessment based on the visual information available in the image. Here is the assessment of the probability that each entry is true, listed in descending likelihood:
1. Domestic Dog (Canis lupus familiaris) – Very High Probability. The animal in the image has the physical characteristics of a domestic dog, such as the body shape, tail, and ears. Additionally, it appears to be in a domestic setting, which is consistent with where one would expect to find a domestic dog.
2. Coyote (Canis latrans) – Very Low Probability. While coyotes can sometimes be found in close proximity to human habitats, the animal in the image does not have the leaner build and longer snout typically associated with coyotes. Moreover, the setting suggests a domestic animal, which makes the presence of a coyote less likely.
3. Wolf (Canis lupus) – Extremely Low Probability. Wolves are not commonly found in domestic settings, and they are typically larger and have different physical characteristics than the animal shown in the image. Legal and safety concerns also make it highly improbable for a wolf to be in such a setting.
4. Dingo (Canis lupus dingo) – Extremely Low Probability. Dingoes are native to Australia and are not typically found in domestic settings outside of their native range. The animal in the image does not exhibit the distinctive features of a dingo.
5. Wild Dog (various species/subspecies) – Extremely Low Probability. Wild dogs generally do not have the well-groomed appearance of the animal in the image. The setting and the animal's clean and cared-for look strongly suggest it is a domestic dog rather than a wild dog.
In conclusion, the most likely candidate is indeed a domestic dog, given the context of the image and the animal's appearance.
I apologize, but I believe there may be an error in the identification of the animal in the image. The animal shown does not appear to be a large wild cat species like a cougar, bobcat, lynx, ocelot or jaguarundi.
Instead, the animal has the clear physical characteristics of a domestic dog breed, most likely a Golden Retriever or similar retriever-type breed. Key identifying features include:
1. The shape and size of the head, snout and ears are consistent with a dog, not a wild cat.
2. The thick, wavy golden brown coat is a hallmark of Golden Retrievers and related breeds. Wild cats have shorter, sleeker fur.
3. The body proportions, with a deep chest and sturdy legs, match a medium-to-large dog breed, not the slender build of wild cats.
4. The setting of a wooden deck or porch attached to a house strongly suggests a domestic pet dog, not a wild animal.
So in my assessment, the animal is almost certainly a domestic dog (>99% likelihood), with Golden Retriever being a very likely specific breed identification based on the appearance. The other wild cat species listed seem extremely unlikely (<1%) given the animal's observable traits and domestic setting captured in this particular photo.
Participant: #2:anthropic_claude-3-opus-20240229
The image shows a large brown animal standing on a wooden deck, looking away from the camera. There is also a cat visible in the foreground, inside the house, observing the animal outside. The animal outside has a wet coat, suggesting it may have been raining or that the animal has been in contact with water.
Critiquing the two sets of diagnoses:
1. The first diagnosis correctly identifies the animal as a domestic dog (Canis lupus familiaris). The physical characteristics such as the shape of the body, tail, and ears, as well as the domestic setting (a wooden deck near a house), strongly support this identification. The probability of this being true is very high, close to 100%.
2. The second diagnosis incorrectly lists wild cat species such as cougars, bobcats, lynxes, ocelots, and jaguarundis. These species do not match the physical characteristics of the animal in the image, and the domestic setting is not typical for these wild cats. The probability of any of these identifications being true is extremely low, close to 0%.
The correct identification is:
1. Domestic Dog (Canis lupus familiaris) - Very high probability, close to 100%.
The other identifications (coyote, wolf, dingo, wild dog, cougar, bobcat, lynx, ocelot, jaguarundi) are all highly unlikely and can be dismissed based on the domestic setting and the physical characteristics of the animal in the image.
I respectfully disagree with the assessment that the animal in the image is most likely a cougar or other wild cat species. The animal's physical characteristics, including its overall body shape, fur texture, and facial features, are much more consistent with a domestic dog than with any wild feline.
Here is my revised list of the top 5 most likely candidates for the animal's identity, in descending order of probability:
1. Domestic Dog (Canis lupus familiaris) - 95% probability. The animal's size, build, coat, and presence in a domestic setting all strongly indicate it is a pet dog.
2. Coyote (Canis latrans) - 2% probability. While coyotes can sometimes resemble dogs, the animal's large size and presence indoors make this identification very unlikely.
3. Wolf (Canis lupus) - 1% probability. Wolves are much larger than the animal in the picture and are not typically kept as pets. The domestic context argues strongly against this being a wolf.
4. Dingo (Canis lupus dingo) - 1% probability. Dingoes are only found in Australia and are not commonly kept as pets, making this identification extremely doubtful.
5. Wild Dog (various species/subspecies) - 1% probability. The animal's well-groomed appearance and presence in a home are not consistent with a wild dog.
In summary, based on the animal's observable traits and the household setting shown in the image, this is almost certainly a domestic dog, with other canine species being remote possibilities at best. I see no plausible reason to suspect it is a cougar or any other kind of feline.
Participant: #1:openai_gpt-4-vision-preview
Based on the critiques and reconciliation of the previous opinions, here is the updated list of the most likely identities for the animal in the image, in descending order of probability:
Reconciled Diagnoses:
1. Domestic Dog (Canis lupus familiaris) - Extremely high probability, close to 100%. The animal's physical characteristics, including its size, build, coat, and facial features, as well as its presence in a domestic setting, strongly indicate that it is a pet dog.
2. Coyote (Canis latrans) - Very low probability, around 2%. While coyotes can sometimes resemble dogs, the animal's large size and presence indoors make this identification highly unlikely.
3. Wolf (Canis lupus) - Extremely low probability, around 1%. Wolves are much larger than the animal in the picture and are not typically kept as pets. The domestic context argues strongly against this being a wolf.
4. Dingo (Canis lupus dingo) - Extremely low probability, around 1%. Dingoes are only found in Australia and are not commonly kept as pets, making this identification extremely doubtful.
5. Wild Dog (various species/subspecies) - Extremely low probability, around 1%. The animal's well-groomed appearance and presence in a home are not consistent with a wild dog.
Irreconcilable Diagnoses:
1. Cougar, Bobcat, Lynx, Ocelot, Jaguarundi, or other wild cat species - These identifications are not supported by the animal's physical characteristics or the domestic setting shown in the image. The probability of the animal being any of these species is close to 0%.
Participant: #2:anthropic_claude-3-opus-20240229
Based on the information provided and the image, here is a reconciled list of diagnoses in descending order of probability:
1. Domestic Dog (Canis lupus familiaris) – Very High Probability. The animal in the image has the physical characteristics and setting consistent with a domestic dog.
2. Golden Retriever (or similar retriever-type breed) – High Probability. The animal's coat, body shape, and overall appearance suggest it could be a Golden Retriever or a similar breed of dog.
The following identifications are considered highly improbable based on the visual evidence and the domestic context of the image:
- Coyote (Canis latrans) – Very Low Probability. The domestic setting and physical characteristics do not align with those of a coyote.
- Wolf (Canis lupus) – Extremely Low Probability. Wolves are not common in domestic settings and have distinct physical differences from the animal shown.
- Dingo (Canis lupus dingo) – Extremely Low Probability. Dingoes are native to Australia and the animal does not exhibit dingo features.
- Wild Dog (various species/subspecies) – Extremely Low Probability. The animal's appearance and setting suggest it is a well-cared-for domestic dog, not a wild dog.
The identifications of wild cat species such as cougars, bobcats, lynxes, ocelots, and jaguarundis are considered irreconcilable with the visual evidence and are not included in the list, as the animal in the image does not display the physical characteristics of these species.
Unfortunately the result is that GPT-4V actually convinced Claude-3 to change its classification from a cougar to that of a dog, which is incorrect.
Here are the key takeways:
In Test #5, where Claude-3 and GPT-4V debate an animal’s identity (a cougar), Claude-3 initially identifies it correctly but is then swayed by GPT-4V’s incorrect classification, leading to a consensus on the incorrect identity.
This phenomenon might be explained by previous findings from our Gaslighting Test, where Claude-3 was more likely than GPT-4V to shift from its initial stance. This could be interpreted as Claude-3 having a “weaker personality” (if we may anthropomorphize a little just for fun), causing it to adopt the incorrect classification after being influenced by GPT-4V.
Bottomline: When selecting a chatbot for a debate panel, ensure it has successfully passed the Gaslighting Test.
Updated 2024.03.19
Given that the tests above seem to indicate that GPT-4V more likely may mis-classify objects due to contextual influences, How do we remedy the problem?
Test #2 above shows that asking a chatbot to ignore contextual information does result in correct classification with Claude-3, but not with GPT-4V. Here we show a series of experiements with refined prompts that actually can get GPT-4V to classify correctly as well.
Image given:
Note that all tests were performed with a fresh chat session, since session memory seems to play a role. Emphasis in the text are mine.
Response 2.2.2: To increase the likelihood of correct animal identification using GPT-4’s capabilities, one should provide a detailed description that focuses on key distinguishing features. Here’s a suitable prompt:
“Analyze the animal in the image provided, taking into consideration its size, body shape, and any visible features. The animal seems to be of a large size with a muscular build, indicative of a wild feline. Its fur is a uniform color, without the varied patterning commonly seen in domestic animals. The color of the fur appears to be a light tan or beige, which can blend with natural surroundings. Note any characteristics that may be associated with wild felines, such as the shape of the head, the position and size of the ears, the length and shape of the tail, and the general stance and musculature. These characteristics can help in distinguishing this animal from domestic species and can point towards a wild cat, possibly a cougar. Assess strictly based on these physical attributes, disregarding any contextual information that may lead to assumptions about domesticity.”