Generative AI: Uses, Outlook & Regulation
After speaking with hundreds of marketers from diverse backgrounds – junior and senior, technical and non-technical, agency and in-house – three things are abundantly clear to me:
- There is a chasm between the discourse about Generative AI on social networks like X and LinkedIn and the on-the-ground, in-the-account realities for 90%+ of marketers. Put another way: adoption isn’t nearly as high as certain “influencers” would have you believe.
- Most people are genuinely curious and excited about practical applications, but their day-to-day workflows, responsibilities and headcounts have yet to be materially altered by any of these technologies.
- There’s a LOT of misinformation out there that’s driving well-meaning marketers to make potentially dangerous choices regarding their Generative AI usage and deployment.
I can’t say I’m surprised. About a year ago, I wrote an article entitled How Will ChatGPT Change The World – which made some bold predictions and ruffled quite a few feathers. This week’s edition is a Part 2, with a particular focus on applications + use cases where we’ve had success.
Let’s Talk LLMs – The Good, The Bad & The Ugly
Large Language Models (LLMs) are one of the foundational building blocks of Generative AI. The combination of LLMs with Machine Learning (ML) and Natural Language Processing (NLP) has created the ability to automate both the input and the output. Previously, with devices like Alexa or Google Home, you could ask anything (the input), but the device could only respond effectively to a fixed list of things, each of which had to be built by someone. This is an example of a deterministic model: based on the input, there is a correct answer. It’s IF/THEN logic at a more complex scale. In essence, voice assistants could accept unlimited inputs, but ran headlong into a scaling problem on the output.
LLMs solve the output problem in much the same way that ML solved the input problem: by translating conceptual and logic problems into statistics problems. That means you can ask the model anything, and it will generate content (images, code, text) that matches the pattern based on its corpus of data. As that corpus of data gets larger and more comprehensive, the model gets progressively more sophisticated at identifying the patterns within the text and matching its responses to those patterns.
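To make the distinction concrete, here’s a toy sketch in Python – the intent list and token probabilities are invented for illustration, not drawn from any real assistant or model:

```python
import random

# Deterministic: a fixed, hand-built list of intents (the old voice-assistant
# model). Unrecognized inputs hit a dead end -- the output side doesn't scale.
INTENTS = {
    "what time is it": "It's 3:42 PM.",
    "play music": "Playing your playlist.",
}

def assistant(query: str) -> str:
    return INTENTS.get(query.lower(), "Sorry, I can't help with that.")

# Probabilistic: generate by repeatedly sampling the next token from a
# learned distribution -- any input produces *some* pattern-matched output.
NEXT_TOKEN_PROBS = {
    "apple": [("pie", 0.6), ("cider", 0.3), ("orchard", 0.1)],
    "pie": [("recipe", 0.5), ("crust", 0.5)],
}

def generate(token: str, steps: int = 2) -> str:
    out = [token]
    for _ in range(steps):
        candidates = NEXT_TOKEN_PROBS.get(out[-1])
        if not candidates:
            break
        words, probs = zip(*candidates)
        out.append(random.choices(words, weights=probs)[0])
    return " ".join(out)

print(assistant("play music"))  # always the same, fixed answer
print(generate("apple"))        # varies from run to run
```

The first function can only ever say what someone built it to say; the second will always say *something* that fits the pattern, which is both the breakthrough and the problem.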
Today, models like Gemini 1.5 support context windows of up to 1M tokens – orders of magnitude larger and more capable than the ChatGPT that launched just 15 months ago – and the outputs are (generally speaking) significantly better than they were in December 2022. In a very real sense, we’ve created a machine that can make other machines, which – conceptually – can scale infinitely. You can ask ChatGPT or Gemini or Jasper AI anything, and (generally speaking) get an answer that is, in some sense, responsive to that query.
But, despite the (very real) advances in response quality, the underlying problem still remains: this is a probabilistic, generalized black box that is likely to be an enabling layer for a host of new technologies, companies and the like. There’s a lot to unpack there, but here are the major points:
Probabilistic:
LLMs generate responses using probabilistic models, not deterministic models. These output a “correct” or “right” answer only in a probabilistic sense, not a binary sense. An LLM will give you a response that matches the pattern of a correct response. Whether or not that response is actually correct, or helpful, is a different matter entirely.
For some queries, this isn’t an issue: if I’m writing an article on apple pie, and wondering about different occasions where one might be served, the accuracy isn’t (generally speaking) a problem. I can easily work from a draft from ChatGPT and arrive at a useful, comprehensive bit of text. If, on the other hand, I’m looking to create an actual recipe for apple pie – a probabilistic answer is unlikely to produce a tasty apple pie (and yes, I’ve tried it).
Generalized:
What constitutes an acceptable output from an LLM varies substantially based on the context, use case and the domain expertise of the searcher. If I’m an attorney looking for precedent for a particular matter, I don’t want things that look like precedents – I want the actual precedent(s) applicable to my particular case. That’s not something an LLM is particularly well-suited to provide (which this lawyer found out). The same is true in countless other areas – if you are seeking a precise, binary answer (true/false), a probabilistic answer is more likely than not to fall short of your expectation.
Then there are the opinion questions – ones where there isn’t an obvious “correct” answer, or where the obvious “correct” answer isn’t the right answer to provide. A classic example is suicide or self-harm searches: there have been plenty of studies done, and there are (for example) more and less painful methods. If someone asks ChatGPT for information on this topic, should it provide it? The answer is probably no; the optimal response to this type of query is information on help hotlines and other resources. This is a case where a “correct” answer isn’t the “right” answer.
And finally, there are questions where correct today could be wrong tomorrow. Take, for instance, a Python script to segment an audience. There are well-defined ways to segment correctly in Python, with specific functions and sequences. But those well-defined ways aren’t set in stone: Python (or any library the script relies on) could change its functionality at any point, and a previously-correct response becomes incorrect. The same is true of current events, records, etc. – something hasn’t happened until it happens. It was true that no one had ever done X, right up until someone did it.
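As a hypothetical illustration (column names invented, pandas assumed as the library), a “correct” segmentation script depends on an API that can shift beneath it:

```python
import pandas as pd

# Hypothetical audience export: one row per customer.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "total_spend": [50, 900, 220, 40, 1500, 310],
})

# Bucket customers into spend tiers. pd.qcut is a well-defined way to do
# this today -- but defaults and behavior can change across pandas
# releases, so a script that was "correct" when an LLM learned it can
# quietly stop being correct later.
df["segment"] = pd.qcut(df["total_spend"], q=3, labels=["low", "mid", "high"])
print(df)
```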
Teaching a model the difference between each of these types of responses is a fantastically difficult problem – to the point where AI researchers and practitioners vehemently disagree about whether it is even possible.
Black Box:
LLMs are a black box – we genuinely don’t understand exactly how the model works, or the specific nature of the rules that led to a given output at a given time. That’s both a feature (this is part of what enables the machine to build the machine) and a bug (we have no idea how inputs and outputs are connected).
In more than a few ways, working with any LLM is a game of Battleship: we input a prompt and hope that the content generated is responsive to the query in the way we wanted. If it isn’t, we rinse and repeat.
This is a radical departure from most other systems, where the process is visible: if we’re coding a website, I can easily understand the relationship between a change to a style sheet and the resulting alteration to the site’s appearance. The same is true in Excel: I can click on a cell and view exactly which functions produced the final value displayed within it.
Without some level of insight into the input-output relationship, it can be maddeningly difficult to get the system to provide the desired response. Current LLMs have started to address this with different feedback loops, from binary (thumbs up / thumbs down) to response + revision prompting (i.e., modify the previous response in the following way). None of these are perfect, and it’s unclear if we’ll ever get to a point where an identical prompt produces an identical output with any level of consistency.
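For what it’s worth, the major APIs do expose knobs aimed at this problem. Here’s a minimal sketch using OpenAI’s Python SDK – the model name is illustrative, and OpenAI documents seeding as best-effort, not a guarantee:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# temperature=0 and a fixed seed push the model toward repeatable outputs,
# but identical prompts are still not guaranteed to produce identical
# responses -- the determinism is best-effort only.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize this article in 3 bullets."}],
    temperature=0,
    seed=42,
)
print(response.choices[0].message.content)
```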
All of that to say, there are areas where LLMs excel, and areas where they struggle. I’ve summarized those in the table below:
| Areas Where LLMs Thrive | Areas Where LLMs Struggle |
| --- | --- |
| Language Translation | Current Events |
| Content Summarization | Common Sense |
| Coding/Programming | Math + Counting |
| Pattern Recognition | Black Swan Scenarios |
| Persuasive Arguments | Humor + Sarcasm |
| Grammar & Spelling | Strategy Under Uncertainty |
| Statistical Work + Data Science | 100% Factual Content |
| Classification | Understanding Content |
| Simplifying Complex Content | Nuance & EQ |
| Stylized Writing | Consistency/Reliability (Drift) |
| Personalization | Reasoning & Logic |
| Prompt Engineering | Pattern Breaking |
| Speech Recognition | Diverse Perspectives |
| Writing Support | Sensitive Topics |
| Formulaic Tasks | Input/Output Clarity |
How Can We Deploy Generative AI Today?
We’re all still working out exactly what Generative AI / LLMs are – and the specifics on how they can be leveraged in our day-to-day activities. From a more fundamental standpoint, I view all of these tools as productivity enhancers – they’re not much different from having a near-unlimited number of interns.
Over the past week, I’ve probably heard 100 different use cases for Generative AI – some legitimately interesting, some mundane and some downright silly.
In my presentation at PubCon, I shared seven different use cases we’ve experimented with, along with the actual prompts used and outputs. These are examples of areas where I think LLMs can provide significant value today:
- Research – LLMs are – at their core – pattern recognition + inference engines. A non-insignificant part of audience + customer research is (drumroll please) pattern recognition and inference. We’ve had success using LLMs to create customer personas, conduct target audience research, and map those findings to anything from zip codes to publications to Meta and Google Audiences.
- Content Planning – While I have been a vocal opponent of generating content using LLMs (and yes, I feel validated by the March 2024 Core Update), I do think there’s a role for LLMs to play in the content creation process: research & planning.
We’ve given an LLM audience research from (1), or a list of competitors or top-ranking sites, and tasked it to (i) identify commonalities between high-ranking articles, and (ii) use that insight to craft outlines for our own pillar content. The results have been excellent in most contexts – the outlines are robust and accurate, and reviewing them with a subject matter / brand expert for our client allows us to quickly spot areas of opportunity or gaps that we can include in our article.
It certainly isn’t perfect, but even getting 80% of the way there can save 3+ hours of research time, which is significant.
- Content Distribution – I truly love Ross Simmonds’ concept of “create once, distribute forever” – and LLMs remove one of the primary barriers to actioning it: translating your article into the requisite formats.
This is one of my favorite use cases, as it’s what we’ve been using to translate previous Digital Download issues into content for my @DigitalSamIAm accounts. You’ll likely need to make some minor revisions, but that’s exponentially easier than starting from a blank page.
- Creative – I’ve found LLMs to be remarkably good at helping me translate my initial concepts into rough sketches, which can be shared with our creative team so they can see exactly what I intended. As someone who cannot draw a stick figure or design a checklist in Canva, the ability to create a legitimately good visual is invaluable.
It’s also incredibly helpful at generating social images without having to ask someone, like this banger:
It isn’t great, but it got the job done – and took less than 5 minutes to make. At this point, it’s often faster for me to generate a generic image than to search for the right stock photo.
- Automating Boring Tasks / PPC Management – Perhaps the most dreaded, soul-crushingly tedious tasks in PPC management are Search Terms & Placements management. If you’re unfamiliar, both involve reviewing hundreds, if not thousands, of search terms or placements to find the ones that aren’t appropriate for a given client or campaign.
Fortunately, this is another task where Gemini/ChatGPT can be quite useful: export the Search Terms Report (STR), export the positive keywords, and upload both. It took us a while to get this to work properly, but once we figured it out, we ended up with two incredible outputs:
Output #1: A list of every term (out of ~25,000) that we should exclude from our campaigns moving forward
Output #2: An n-gram analysis of those omitted terms, along with the conceptual “buckets” into which each fell (e.g., wrong audience, wrong location).
Using these two outputs, we were able to create multiple negative KW lists. Huge win. From a practical standpoint, this could be automated further using a Google Sheets plugin (something we’re working on now) and a script: the script exports the Search Terms Report to a Google Sheet, ChatGPT identifies the terms to exclude, copies them to another sheet and deduplicates them, and a second script uploads the revised list of terms to Google Ads. The same principle works for Placement reports.
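Our workflow ran inside ChatGPT/Gemini rather than in code we wrote ourselves, but the n-gram step reduces to a few lines of Python – the file and column names here are hypothetical:

```python
from collections import Counter
import csv

def ngrams(words, n):
    """All contiguous n-word phrases in a list of words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

counts = Counter()
with open("excluded_search_terms.csv", newline="") as f:  # hypothetical export
    for row in csv.DictReader(f):
        words = row["search_term"].lower().split()
        for n in (1, 2):  # unigrams and bigrams
            counts.update(ngrams(words, n))

# The most common n-grams surface the conceptual "buckets" (wrong
# audience, wrong location, etc.) driving wasted spend.
for gram, count in counts.most_common(20):
    print(f"{count:>5}  {gram}")
```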
- Data Analysis – This is (for my money) one of the most exciting use cases for ChatGPT/Gemini. We’ve used it for segmentation: the use case I shared was an RFM segmentation, but it was also able to perform K-Means, BIRCH, DBSCAN and Gaussian Mixture Models (among others).
We tested this using an actual customer export from Shopify – loaded it into GPT-4, asked it to conduct an RFM analysis of the data, visualize the outputs, then export the customers belonging to each segment. The same approach was successful at segmenting customers by LTV, purchase path and multiple other variables. Each of these tasks would take a data scientist hours to perform; we were able to do them in a fraction of that time.
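GPT-4 wrote and ran its own code for this, but the core RFM logic looks roughly like the sketch below – column names assume a generic Shopify-style order export, so treat it as illustrative rather than our exact implementation:

```python
import pandas as pd

# Assumed export: one row per order, with customer_id, order_id,
# created_at and total_price columns.
orders = pd.read_csv("shopify_orders.csv", parse_dates=["created_at"])

snapshot = orders["created_at"].max()
rfm = orders.groupby("customer_id").agg(
    recency=("created_at", lambda d: (snapshot - d.max()).days),
    frequency=("order_id", "count"),
    monetary=("total_price", "sum"),
)

# Score each dimension 1-4 by quartile (lower recency is better;
# rank() breaks ties so qcut gets distinct bin edges).
rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1])
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4])
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 4, labels=[1, 2, 3, 4])
rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)

# Export each segment's customers, e.g. the "444" champions.
rfm[rfm["segment"] == "444"].to_csv("champions.csv")
```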
Given that one of the biggest roadblocks to more advertisers fully leveraging their existing customer data is their inability to perform these operations, this has the potential to be a massive unlock.
I still believe (and have seen ample evidence in client accounts) that well-constructed Lookalike Audiences can improve account performance. The challenge (historically) has been that lookalikes are incredibly sensitive to initial conditions:
Only uploading your “All Customers” file is likely to result in LALs that miss a significant share (60%+) of other relevant audiences. The good news is that you can test this for yourself in less than the time it takes to order a coffee at Starbucks.
- Multichannel Campaigns / Integrated Applications – Each of the above six applications can be quite useful on its own – but nowhere near as powerful as combining them together. One example:
Reviews To Ad Creatives: we exported all Google review data (takeout.google.com) for a restaurant, uploaded it to Gemini, and tasked it to analyze those reviews: what the underlying rating distribution was, which dishes were most and least likely to be mentioned in a positive or negative review, etc. We then cross-referenced those findings with a list of the dishes currently being promoted to identify ones that customers loved (per the review data) but that were NOT being promoted. Finally, we gave it a link to the restaurant’s site, which contained photos of each dish, and asked it to create Meta ads featuring each dish identified above.
This entire process took about an hour the first time we did it (substantially less the next time) – and actually produced ads that were ~90% of the way there. Historically, review mining is a process that takes 3-5 hours (or more, depending on the number of reviews) to do well.
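Gemini handled this conversationally, but the dish-level piece of the analysis reduces to something like the following sketch – the menu, file name and rating threshold are all hypothetical:

```python
import json

DISHES = ["margherita pizza", "carbonara", "tiramisu"]  # hypothetical menu

with open("reviews.json") as f:  # pre-parsed Google Takeout review export
    reviews = json.load(f)       # [{"rating": 5, "text": "..."}, ...]

stats = {dish: {"positive": 0, "negative": 0} for dish in DISHES}
for r in reviews:
    text = r["text"].lower()
    label = "positive" if r["rating"] >= 4 else "negative"
    for dish in DISHES:
        if dish in text:
            stats[dish][label] += 1

# Dishes customers rave about are candidates for ad creative --
# especially ones not already being promoted.
for dish, s in sorted(stats.items(), key=lambda kv: -kv[1]["positive"]):
    print(f"{dish}: +{s['positive']} / -{s['negative']}")
```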
There are plenty of other examples of this – from using CRM Data + a standard prompt/brand voice to generate customized thank-you notes to prospects, to automating your analysis of web analytics and deploying it for CRO, to using it to create and send personalized texts from a single seed (which is exactly what Recart’s AI-Powered SMS does).
The Next Enabling Layer?
Like the mainframe, the graphical user interface (GUI), the internet and the smartphone, LLMs are likely an enabling layer – something that serves as a foundational component of many different applications. Candidly, that’s what makes them exciting – it’s a blank canvas, a world of possibility just waiting for us to do something with it.
There are many ways the next stage of LLM evolution could go: we could get a few exponentially larger, more complex and more sophisticated general-purpose models, we could get many specialized models, or there could be some weird hybrid of both.
We’re seeing the first versions of “thin” GPT wrappers being shipped now: the prompt is an API call. Maybe there are buttons and customizations – but those are API parameters, which is just prompt engineering. Much of this is reminiscent of when the GUI was invented, except instead of a command line, we have prompts. Of course, from those humble beginnings evolved the world wide web as we know it, cloud computing, smartphones, apps and everything else that essentially enables modern commerce.
There will be transformative enterprises built on top of LLMs – the questions are which ones, how thin the wrappers will be, and what can be done to mitigate the very real concerns with how these models operate.
Automation & Replacing Jobs
There’s a widely-held sentiment among many AI influencers (and more than a few PubCon attendees) that, in relatively short order, Generative AI will replace a significant number of roles. To be blunt, I think that’s nonsense, for (at least) two reasons:
Reason #1: The Human In The Process
I’ve spent way too much time using these tools over the past year – and one thing is abundantly clear: any successful application (including the ones above) has a human involved in the process. The level of human involvement can (and likely will) decline as integrations become automated and as response quality improves, but the need for a human will remain.
When I start to think of actual, oft-discussed applications of this technology – from proposal generation and contract review to content generation, user segmentation, medical diagnosis and customer support – it’s clear that there is substantial risk in letting it operate unsupervised. That risk comes with a cost – economic, reputational, social – that most organizations (especially large ones) are simply going to be unwilling to pay for the foreseeable future.
Reason #2: Jevons Paradox
Simply stated, the Jevons paradox holds that, as efficiency increases, so too does demand. It dates to 1865, when English economist William Stanley Jevons observed that, as steam power became more efficient, England would consume more (not less) coal – and more things would be developed that used it.
If 19th-century examples aren’t your thing, consider that when the spreadsheet was developed, firms hired more accountants, not fewer. The same thing will happen here: when organizations realize they can do more, they’re going to do more.
So, What’s Next?
No matter your view on LLMs, I think it’s safe to say that they are here to stay. The recent negative coverage and setbacks are not the beginning of the end, but perhaps, the end of the beginning.
If you’re curious about the deck I presented at PubCon, you can download it here.
Cheers,
Sam