ChatGPT and Identity

All things are difficult before they are easy

Thomas Fuller

Creative destruction is the essential fact about capitalism

Joseph A. Schumpeter

ChatGPT is upon us. I have been approaching it with anxiety. In fact, I started writing this article with a very opinionated point of view. (Disclosure: I am a product manager, so being opinionated is normal.)

But the more I understand it, the more I struggle to hold on to my original premise.

As I watched this video unfold on my laptop screen, my emotions shifted from anxiety to excitement. It reminded me of two hot research areas in computer science in the ’80s and ’90s: 4GL and 5GL (fourth- and fifth-generation languages). The promise was programming by intent instead of writing algorithms in procedural or functional code. That promise never really materialized. The best we got were drag-and-drop no-code pipelines and template-based workflow tools.

It is now real!

ChatGPT, or more precisely Codex (not GPT-3), has delivered on the promises of 4GL and 5GL from decades back. It does so using the magic of LLMs (large language models). I am not qualified to give you a primer on LLMs. Rest assured, every third person on the internet wants to teach you about them :) What we care most about is that LLMs are trained on enormous amounts of data. Take Codex, for instance:

OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories. OpenAI Codex is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript, and even Shell.

Now we enter the realm of identity. For one, the people behind the data used for training the LLMs are not universally thrilled at this development.

On the other side, if you are using code generated by GitHub Copilot, you really don’t know much about the libraries in the generated code. Remember, the goal here is convenience and getting past the commodity code. Going back to the video I mentioned above, right around the 19- to 20-minute mark, you see the presenter ask ChatGPT to replace the “fetch API with an Axios library”. That request presumes a deep understanding of the generated code.

Yet this technology is spreading like wildfire for the opposite reason: it gives people a way to write code that solves their problems without needing to understand the code under the covers. This should bring back memories of the SolarWinds attack, where the culprit was a compromised .dll file. How do we know a library included in code generated by ChatGPT is safe? We don’t.

That was the premise I started with. But as I dig deeper into the technology, I realize we are in the 5GL world, and there is no going back. The benefits far outweigh the security risks. We have always prized convenience over security, and security has typically been on the reactive side of this eternal tug-of-war. This time is no different. We survived Wikipedia and the era of citing unknown sources. This is similar, though the stakes are higher: we now enter the era of code with unknown authors. The question is, are we ready?

What do you think? Do the benefits easily justify the risks? There are identity issues on both the supply and demand sides of this technology. I wonder if those voices will even get a fighting chance to be heard. We love the next big thing, and apparently, it is here. Rejoice, but with caution, my friends…
