
wingkongex

Member
Aug 25, 2019
2,200
openai.com

GPT-4


We've created GPT-4, the latest milestone in OpenAI's effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.

Over the past two years, we rebuilt our entire deep learning stack and, together with Azure, co-designed a supercomputer from the ground up for our workload. A year ago, we trained GPT-3.5 as a first "test run" of the system. We found and fixed some bugs and improved our theoretical foundations. As a result, our GPT-4 training run was (for us at least!) unprecedentedly stable, becoming our first large model whose training performance we were able to accurately predict ahead of time. As we continue to focus on reliable scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance—something we view as critical for safety.

We are releasing GPT-4's text input capability via ChatGPT and the API (with a waitlist). To prepare the image input capability for wider availability, we're collaborating closely with a single partner to start. We're also open-sourcing OpenAI Evals, our framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in our models to help guide further improvements.

[Screenshots: GPT-4 exam and benchmark results]

Welp.
 

mugurumakensei

Elizabeth, I’m coming to join you!
Member
Oct 25, 2017
11,382
wingkongex said:
Welp.
Looking at the breakdown, it works best in known problem spaces (simple standardized tests) and worst on things requiring awareness and problem solving (see writing being only 54th percentile, and the USNCO local section and AP Calculus results).
 

DaciaJC

Banned
Oct 29, 2017
6,685
Genuinely surprised it only got a 4 on the AP Calc exam, would have expected it to smash that.
 

Senator Toadstool

Attempted to circumvent ban with alt account
Banned
Oct 25, 2017
16,651
wingkongex said:
Welp.

This is absolute garbage and I hate how they constantly trot this out.
This isn't how tests are taken, and it isn't what they're testing. Anybody with access to outside information can master these tests at these scores, which is exactly why test-takers are barred from using it.

I've scanned a calc/law/biology book and can open it to any page at any time! I'm so smart!

So AI is better than humans at everything now? That happened fast.
No, not at all. This is a misleading ad for them to get funding.
 

Midramble

Force of Habit
The Fallen
Oct 25, 2017
10,484
San Francisco
How many parameters are we up to in GPT-4? Last I heard it was going to be orders of magnitude more than GPT-3.

Edit: If old articles are right, this is a jump from 175 billion parameters with GPT-3 to 100 trillion with GPT-4.

Though not an apples-to-apples comparison at all, the human brain is at roughly 15 trillion parameters.
 
Last edited:

Senator Toadstool

Attempted to circumvent ban with alt account
Banned
Oct 25, 2017
16,651
Looking at the breakdown, it works best in known problem spaces (simple standardized tests) and worst on things requiring awareness and problem solving (see writing being only 54th percentile, and the USNCO local section and AP Calculus results).
Because it's not thinking. It's looking at statistical patterns in previous works to infer solutions, so it makes sense that for problems with known solutions, ones that have been written about and that follow similar linguistic or logical patterns, it just spits out old answers, because those things don't change.
 

gutshot

Member
Oct 25, 2017
4,457
Toscana, Italy
GPT has been a real time-saver when having to write and debug code, although it occasionally makes errors. A faster and less error-prone version will be great.
 

sedael

Member
Oct 16, 2020
908
A faster and less error-prone version will be great.

Yeah, we're gonna go through a software development renaissance in the years between these models and them replacing us entirely. Copilot and CodeWhisperer and whatever other models are just so useful for automating the annoying parts.
 

collige

Member
Oct 31, 2017
12,772
Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it "hallucinates" facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case.
GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021), and does not learn from its experience. It can sometimes make simple reasoning errors which do not seem to comport with competence across so many domains, or be overly gullible in accepting obvious false statements from a user. And sometimes it can fail at hard problems the same way humans do, such as introducing security vulnerabilities into code it produces.

GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it's likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.
I would be a lot more cool with all these caveats if they weren't also in the process of doing a wide rollout of the shittier last-gen version at the same time. OpenAI being a sorta-kinda-not-for-profit now makes all this weird, tbh.

Looking at the breakdown, it works best in known problem spaces (simple standardized tests) and worst on things requiring awareness and problem solving (see writing being only 54th percentile, and the USNCO local section and AP Calculus results).
Sounds about right. It's still GPT at the end of the day. I have questions about how these tests are administered too, but the human equivalents aren't open-book exams anyway so it's an apples to oranges comparison.
 

Cymbal Head

Member
Oct 25, 2017
2,384
Describing the joke in that image is genuinely impressive, but it didn't get any better at the writing part of the GRE?
 

eso76

Prophet of Truth
Member
Dec 8, 2017
8,167
I haven't used it much, but I tested it on After Effects expressions and it's amazing.
It won't just suggest an expression; it will also explain what every line and variable does.
 

Stencil

Member
Oct 30, 2017
10,459
USA
Is Microsoft involved with this? They mention Azure in the abstract, but I'm not sure what MS's involvement is, if anything.
 

T the Talking Clock

The Fallen
Jul 12, 2018
140
Because it's not thinking. It's looking at statistical patterns in previous works to infer solutions, so it makes sense that for problems with known solutions, ones that have been written about and that follow similar linguistic or logical patterns, it just spits out old answers, because those things don't change.

If it spits out correct and coherent answers, what's the difference? It's pretty much the Chinese Room thought experiment.
 

Jordan117

Member
Oct 27, 2017
2,023
Alabammy
Is Microsoft involved with this? They mention Azure in the abstract, but not sure what MS involvement is, if anything.
Huge. MS invested $10 billion in OAI (49% stake iirc), custom-built them a supercomputer for training, and have exclusive access to the codebase to build products like Bing Chat and various Windows integrations that need more than just the API.
 

Jordan117

Member
Oct 27, 2017
2,023
Alabammy
For context, GPT-3.5 scored in the bottom 10% on a simulated bar exam, while GPT-4 scores in the top 10%.

Its facility with language (outperforming state-of-the-art English models when given tests translated into other languages) is impressive.

I'm most interested in what the larger context window unlocks. 32k tokens is about 50 pages; imagine something this powerful maintaining coherence for that long. You could auto-summarize long essays, synthesize novellas, and generate more than just toy programs. And it's surprisingly affordable.
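For what it's worth, the "about 50 pages" figure checks out with the usual rules of thumb (roughly 0.75 English words per token, ~500 words per page; both constants are loose assumptions, not exact figures):

```python
# Back-of-envelope estimate of how many pages fit in GPT-4's 32k-token
# context window. Both constants are rough rules of thumb, not exact:
WORDS_PER_TOKEN = 0.75   # typical for English prose
WORDS_PER_PAGE = 500     # typical manuscript page

def tokens_to_pages(tokens: int) -> float:
    """Estimate how many pages a given token budget covers."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(tokens_to_pages(32_000)))  # 48, i.e. "about 50 pages"
```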
 

BigSkinny0310

Attempted to circumvent ban with alt account.
Banned
Dec 7, 2017
2,940
I wonder if I should spend $20 trying this out. I want to test its legal writing capabilities.
 
Oct 27, 2017
42,918
So AI is better than humans at everything now? That happened fast.
How is that your takeaway from
while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.


I'm surprised it didn't get perfect scores on math-related tasks, although a lot of those require showing work, so maybe that's where it lost points.
 

Zeliard

Member
Jun 21, 2019
10,971
Yeah as others have noted, it's on Bing right now if you have access to it, for free.

blogs.bing.com

Confirmed: the new Bing runs on OpenAI’s GPT-4

Congratulations to our partners at Open AI for their release of GPT-4 today. We are happy to confirm that the new Bing is running on GPT-4, which we’ve customized for search. If you’ve used the new Bing preview at any time in the last five weeks, you’ve already experienced an early version of...

Also, the chat limits are now 15 turns per session / 150 per day.
 

Yoga Flame

Alt-Account
Banned
Sep 8, 2022
1,674
I use ChatGPT all day; it's actually an invaluable resource for coding and for providing insight into other people's code. It's part of my workflow.

It's like having an extremely knowledgeable companion; I can't go back.

IDE on one screen, ChatGPT on another. I get it to write my regex, for instance. Unbelievable stuff.