Artificial Intelligence: The New Frontier
It seems like only yesterday that AI surfaced as the new promise for technological advancement. Google has used DeepMind’s machine learning to cut the energy used for cooling its data centres by up to 40%. Everyone has heard about AlphaGo, AlphaZero, and AlphaStar taking the world of strategy games by storm, and ChatGPT has caused its own quake in the AI scene. Yet few people realise what goes into these deep learning achievements, and many assume it’s a mystery that only Big Tech can solve. So why does AI feel so out of reach for the average person or business? I think it’s because there just isn’t enough accessible information out there. Like everything else worthwhile, learning a new skill takes time and effort. I’m here to help you understand the basics, and to show you how to use this technology today to improve your Google rankings with Dataiku.
The A.I. Basics
Although the terms are often used interchangeably, AI, machine learning, and deep learning each have their own meaning. Let me explain. Artificial intelligence, otherwise known as AI, refers to the broad idea that machines can perform tasks that normally require human intelligence. In other words, it’s the science of creating machines capable of mimicking human tasks or abilities. Machine learning is a subset of AI in which systems learn patterns and correlations from data rather than following hand-written rules, loosely mirroring the way humans learn from experience. Deep learning is, in turn, a subset of machine learning built on neural networks. Neural networks are made up of units connected in a way loosely inspired by the neurons of the human brain. Stacked in many layers, they let a computer learn more abstract skills, such as playing games with incomplete information like StarCraft or Dota. Natural language processing (NLP) is the field we use to analyse and understand the complexity of human language. If you’ve ever spoken with Siri or Alexa, you have spoken with one of these NLP systems. Understanding these terms is an important first step in using artificial intelligence.
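To make the idea of “units connected like neurons” a little more concrete, here is a minimal Python sketch of a single artificial neuron and a tiny two-layer network. The weights are arbitrary numbers chosen purely for illustration; a real network learns its weights from data, and real frameworks do this at a far larger scale.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial 'neuron': a weighted sum of its inputs plus a bias,
    passed through a non-linear activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron.
    The weights are illustrative placeholders; training would learn them."""
    h1 = neuron(inputs, [0.5, -0.2], 0.1)
    h2 = neuron(inputs, [-0.3, 0.8], 0.0)
    return neuron([h1, h2], [1.2, -0.7], 0.05)


print(tiny_network([1.0, 2.0]))
```

Each layer’s outputs become the next layer’s inputs; deep learning simply stacks many more of these layers.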
Using AI To Rank On Google
Google’s search engine results pages (SERPs) are shaped by hundreds of ranking signals, and part of that process, such as RankBrain, is itself a machine learning system. By applying machine learning of our own, we can approximately reverse-engineer which measurable factors matter most for a set of keywords, by collecting key SERP data and running it through predictive modelling.
STEP 1 | Collect The Data
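There are hundreds of ranking factors, and there’s no way we could collect every single one, even if we tried. However, there is a lot of on-page and off-page data we can collect in bulk using a few helpful tools. In this guide, we’ll be using:
- Ahrefs, for keyword research and SERP exports
- Screaming Frog, for crawling on-page data
- BatchSpeed, for Core Web Vitals and page-speed metrics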
Scrape SERP URLs Using the Ahrefs Keyword Research Tool
The first thing you’ll want to do is gather a list of URLs that rank for your target keyword. In this case, I used the Ahrefs keyword research tool to export the top 100 results for a given keyword. You can export the SERP for just one keyword, but more data generally makes the predictive model more reliable, so I suggest exporting the SERPs for at least five related keywords (~500 results) to work with.
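If you export several keyword SERPs, you’ll want to stack them into one table and pull out a de-duplicated URL list for the crawl tools. The sketch below does that with Python’s standard library. The column names (“Keyword”, “URL”, “Position”) are assumptions, so rename them to match the headers in your own Ahrefs export; the inline sample data is made up.

```python
import csv
import io

# Hypothetical exports, one CSV per keyword. Column names are assumptions;
# adjust them to match the headers in your real export files.
EXPORT_A = """Keyword,URL,Position
best running shoes,https://example.com/a,1
best running shoes,https://example.com/b,2
"""
EXPORT_B = """Keyword,URL,Position
running shoes for beginners,https://example.com/b,1
running shoes for beginners,https://example.com/c,3
"""


def load_serp_rows(csv_texts):
    """Read every export into one list of dicts, converting Position to int."""
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            row["Position"] = int(row["Position"])
            rows.append(row)
    return rows


def unique_urls(rows):
    """Return each URL once, in first-seen order, ready for the crawl tools."""
    seen = {}
    for row in rows:
        seen.setdefault(row["URL"], None)
    return list(seen)


rows = load_serp_rows([EXPORT_A, EXPORT_B])
urls = unique_urls(rows)
print(len(rows), "SERP rows,", len(urls), "unique URLs")
```

The sketch keeps every keyword/URL row for modelling, because the same page can rank in different positions for different keywords; only the list sent to the crawlers is de-duplicated.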
Run List of URLs Through Screaming Frog
Screaming Frog is an SEO favourite among crawling tools, as it gives you loads of useful data about any list of URLs or domains. First, switch Screaming Frog to “list” mode and upload your list of URLs. Once uploaded, press start and let the crawl run its course. When the crawl is complete, click export and save the CSV to your computer.
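Once the crawl CSV is on your computer, you may want to keep only the numeric metrics and key them by URL so they are easy to merge later. This is a minimal sketch, assuming the URL sits in a column called “Address” and using a tiny made-up stand-in for the export; check the headers of your own file, since column names can vary between versions and export types.

```python
import csv
import io

# Made-up stand-in for a Screaming Frog export. "Address", "Word Count",
# and the other headers are assumptions; check your own CSV.
CRAWL_CSV = """Address,Status Code,Word Count,Title 1
https://example.com/a,200,1850,Best Running Shoes
https://example.com/b,200,920,Shoes for Beginners
"""


def to_number(value):
    """Return value as a float, or None if it isn't numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_crawl(text, url_column="Address"):
    """Map each crawled URL to its numeric metrics only.
    Text columns such as titles are dropped, since most models need numbers."""
    crawl = {}
    for row in csv.DictReader(io.StringIO(text)):
        url = row.pop(url_column)
        numeric = {name: to_number(value) for name, value in row.items()}
        crawl[url] = {name: v for name, v in numeric.items() if v is not None}
    return crawl


crawl = load_crawl(CRAWL_CSV)
print(crawl["https://example.com/a"])
```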
Run the URL List Through BatchSpeed
Since Google announced Core Web Vitals as a ranking signal, it’s a good idea to test our dataset URLs for those same metrics. BatchSpeed (batchspeed.com) lets you test your URL list in bulk and returns key page-performance metrics. The Core Web Vitals, as Google calls them, are:
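- Largest Contentful Paint, or LCP, is the time it takes for the largest image or text block visible in the viewport to render.
- First Input Delay, or FID, is the time between a user’s first interaction with the page (such as a click or tap) and the moment the browser can begin responding to it. In March 2024, Google replaced FID with Interaction to Next Paint (INP) as a Core Web Vital.
- Cumulative Layout Shift, or CLS, measures how much visible content unexpectedly shifts around while the page is being viewed.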
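Raw Core Web Vitals numbers can also be bucketed into Google’s “good”, “needs improvement”, and “poor” ratings, which can make a handy extra feature alongside the raw values. A minimal sketch, assuming the thresholds published on web.dev (LCP 2.5 s / 4 s, FID 100 ms / 300 ms, INP 200 ms / 500 ms, CLS 0.1 / 0.25); double-check them against Google’s current documentation before relying on them.

```python
# (good_upper_bound, poor_lower_bound) per metric, as published on web.dev
# at the time of writing. LCP is in seconds, FID and INP in milliseconds,
# CLS is unitless. Treat these values as assumptions and re-check them.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, value: float) -> str:
    """Bucket a raw measurement into one of Google's three ratings."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


page = {"LCP": 3.1, "FID": 40, "CLS": 0.31}
print({metric: rate(metric, value) for metric, value in page.items()})
```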
Using BatchSpeed, you can export mobile and desktop performance data that can help reveal correlations between Core Web Vitals and a URL’s position on Google.
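Before any modelling, you can get a first feel for these correlations with Spearman’s rank correlation, which only looks at ordering and so suits a variable like SERP position. The standard-library sketch below is one way to compute it; the LCP values are invented for illustration. Because position 1 is the top result, a positive value means slower LCP tends to go with worse positions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a constant column has no rank correlation")
    return cov / (sx * sy)


# Invented sample: mobile LCP (seconds) and Google position for six URLs.
lcp = [1.8, 2.2, 2.9, 2.4, 3.8, 4.5]
position = [1, 2, 3, 4, 5, 6]
print(round(spearman(lcp, position), 3))
```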
Combine the Data
Once all of the data is collected, combine it into a single sheet so the machine learning platform can access all of the relevant information in one place. An easy way to pull data across spreadsheet tabs is the VLOOKUP formula (with FALSE as its last argument, for exact matching), which matches each row of data to the correct URL.
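If you’d rather script this step than use VLOOKUP, the same exact-match lookup is a left join on the URL. A minimal standard-library sketch with made-up sample data; it assumes the URLs are written identically in every export, so watch for differences such as trailing slashes before joining.

```python
import csv
import io


def left_join(base_rows, lookup, key="URL"):
    """VLOOKUP-style exact-match join: copy lookup[url]'s columns into each row.
    Rows whose URL is missing from lookup keep only their original columns,
    so gaps stay visible instead of being silently filled."""
    combined = []
    for row in base_rows:
        merged = dict(row)
        merged.update(lookup.get(row[key], {}))
        combined.append(merged)
    return combined


def to_csv(rows):
    """Write rows as CSV text, using every column name seen in any row.
    csv.DictWriter fills missing values with an empty string."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for the SERP, crawl, and speed exports.
serp_rows = [
    {"URL": "https://example.com/a", "Rank": 1},
    {"URL": "https://example.com/b", "Rank": 2},
]
crawl = {"https://example.com/a": {"Word Count": 1850.0}}
speed = {
    "https://example.com/a": {"LCP": 2.1},
    "https://example.com/b": {"LCP": 3.4},
}

table = left_join(left_join(serp_rows, crawl), speed)
print(to_csv(table))
```

The resulting CSV is the single file you’ll upload to Dataiku in the next steps.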
STEP 2 | Download Dataiku
Dataiku offers a free edition of its beginner-friendly machine learning platform, which comes with ready-made algorithms. It’s a great platform to start with if you want to learn the ropes of artificial intelligence and its applications. First, visit the Dataiku website and click the “get started” button. When you land on the pricing page, look for the “free” pricing package and click “install now”. Then follow the instructions for your operating system. Note that on Windows, the installation process is slightly more complicated. Once you have Dataiku running, you’re ready to start your machine learning project!
STEP 3 | Create a New Project
To create a new project, simply click the “New Project” button on the top right of the dashboard. When prompted, give your project a name. I chose “Top Ranking Factors”.
STEP 4 | Import Your Dataset
Once you reach your new project dashboard, click the “+ IMPORT YOUR FIRST DATASET” button. This takes you to an upload page, where you’ll want to click “upload your files” under the “Files” category. From there, you can import your dataset of ranking factors. Once your file is uploaded, click “create” on the top right of the screen.
STEP 5 | Create a Visual Analysis
Continue by clicking the circular visual analysis icon on the top left of the toolbar. Now click “+ New Analysis”, select your dataset from the dropdown, and click “Create Analysis”. You’ll now see the data you’ve imported. If needed, you can make changes here, such as deleting irrelevant columns.
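If you prefer to tidy the columns before uploading, a simple rule of thumb is to keep the target plus any column that is numeric and actually varies. This is a hypothetical pre-cleaning sketch, not a Dataiku feature; it is deliberately strict about gaps, so you may prefer to let Dataiku handle missing values instead. The rows are made up.

```python
def useful_columns(rows, target="Rank", drop=("URL",)):
    """Return the columns worth keeping for modelling.

    Keeps the target plus any column that is numeric in every row and
    takes more than one distinct value. Identifier columns listed in
    `drop`, text columns, columns with gaps, and constant columns are
    left out."""
    keep = []
    for name in rows[0]:
        if name in drop:
            continue
        if name == target:
            keep.append(name)
            continue
        values = [row.get(name) for row in rows]
        # Skip text columns and columns with missing values.
        if not all(isinstance(v, (int, float)) for v in values):
            continue
        # Skip constant columns: they carry no signal for the model.
        if len(set(values)) > 1:
            keep.append(name)
    return keep


# Made-up rows for illustration only.
rows = [
    {"URL": "https://example.com/a", "Rank": 1, "Dofollow Domains": 310,
     "Status Code": 200, "Title 1": "Best Running Shoes"},
    {"URL": "https://example.com/b", "Rank": 2, "Dofollow Domains": 95,
     "Status Code": 200, "Title 1": "Shoes for Beginners"},
]
columns = useful_columns(rows)
print(columns)
```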
STEP 6 | Create a Machine Learning Model
This is where things get fun. Head over to the “Models” tab on the top right of the page and click “create first model”. Choose a “Prediction” task and select your target variable, which in this case is “Rank”. Then click “Automated Machine Learning” and select “High-Performance Models” with the “In-memory” engine. We could run this model as-is, but we want to make a couple of changes before we pull the trigger. Head to the “Design” tab. Under “Modeling”, click “Algorithms” and make sure only the “Random Forest” and “XGBoost” algorithms are active. You should also change the “Number of trees” to 1000. That’s it, we’re ready to go. Click “Train” at the top right, name your training session, and confirm. Then let the machine learning take its course; training may take some time.
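Dataiku handles all of this through its interface, but if you’re curious what a comparable model looks like in code, here is a rough local equivalent using scikit-learn’s RandomForestRegressor with 1,000 trees. It is a sketch, not what Dataiku runs internally, and it assumes numpy and scikit-learn are installed. The data is synthetic: the rank is generated so that dofollow domains drive it, which lets you see the model recover that signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, made-up data purely for illustration: 300 "pages" with three
# features, where rank is driven mostly by dofollow referring domains.
rng = np.random.default_rng(0)
n = 300
dofollow_domains = rng.integers(0, 500, size=n)
word_count = rng.integers(300, 3000, size=n)
lcp_seconds = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([dofollow_domains, word_count, lcp_seconds])

# Lower rank number = better position; noise keeps the task from being trivial.
rank = 100 - 0.18 * dofollow_domains + 2.0 * lcp_seconds + rng.normal(0, 5, size=n)

# 1,000 trees, mirroring the setting chosen in Dataiku above.
model = RandomForestRegressor(n_estimators=1000, random_state=0, n_jobs=-1)
model.fit(X, rank)

names = ["Dofollow Domains", "Word Count", "LCP"]
ranked = sorted(zip(names, model.feature_importances_), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name:18s} {score:.3f}")
```

XGBoost’s scikit-learn wrapper (`xgboost.XGBRegressor`) exposes the same `fit` and `feature_importances_` interface if you want to compare the two. Keep in mind that impurity-based importances like these can favour features with many distinct values, so treat them as a guide rather than a verdict.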
STEP 7 | Analyze Your Results
When the algorithms have finished training, each model reports which variables were most important to its predictions. Click the title of an algorithm’s report to view more detailed information. In the case of the Random Forest session, “Dofollow Domains” came out as the most important variable for predicting top-ten rankings, and XGBoost agreed with that assessment. Feel free to explore the additional charts and information within each model by clicking on it once its report has finished.
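Random Forest and XGBoost agreeing on the top variable is reassuring. To check how far that agreement goes beyond first place, you could compare the two models’ top-k lists. A minimal sketch; the importance numbers below are invented, so paste in the values from your own Dataiku reports.

```python
def top_k(importances, k):
    """Names of the k variables with the highest importance."""
    ordered = sorted(importances.items(), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ordered[:k]]


def agreement(a, b, k=3):
    """Share of the top-k variables both models have in common (0 to 1)."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k


# Invented importance values for illustration only.
rf = {"Dofollow Domains": 0.41, "Referring Domains": 0.22,
      "Word Count": 0.12, "LCP (mobile)": 0.08}
xgb = {"Dofollow Domains": 0.37, "LCP (mobile)": 0.19,
       "Referring Domains": 0.15, "Word Count": 0.05}

print(top_k(rf, 1), top_k(xgb, 1), agreement(rf, xgb, k=3))
```

A high overlap suggests the finding is not an artefact of one algorithm; a low one is a hint to collect more data before acting on it.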
STEP 8 | Put Your Data Into Action
Now that you have a clear picture of which ranking factors have the most impact on your chosen keywords, you can put them into action in your SEO campaigns. By prioritising your budget, time, and resources toward the metrics that matter most, you can drive organic growth faster and at a lower cost. If your results show a high “variable importance” for dofollow backlinks and a low variable importance for on-page factors, consider allocating more of your campaign budget to link building, and vice versa.
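One simple way to turn “allocate more budget toward what matters” into numbers is to group variables into campaign areas and split the budget in proportion to their summed importance. This is a hypothetical heuristic, not a Dataiku feature, and the grouping and importance values below are made up; treat the output as a starting point for judgement rather than a plan.

```python
# Hypothetical mapping of model variables to campaign areas.
GROUPS = {
    "Dofollow Domains": "link building",
    "Referring Domains": "link building",
    "Word Count": "on-page",
    "LCP (mobile)": "technical",
}


def split_budget(importances, groups, budget):
    """Divide a budget across campaign areas in proportion to the summed
    variable importance of each area. A rough heuristic, not an optimiser.
    Variables without a group are pooled under "other"."""
    totals = {}
    for name, score in importances.items():
        area = groups.get(name, "other")
        totals[area] = totals.get(area, 0.0) + score
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("importances must sum to a positive number")
    return {area: budget * score / grand_total for area, score in totals.items()}


# Made-up importance values for illustration only.
importances = {"Dofollow Domains": 0.45, "Referring Domains": 0.15,
               "Word Count": 0.25, "LCP (mobile)": 0.15}
plan = split_budget(importances, GROUPS, 10_000)
print({area: round(amount) for area, amount in plan.items()})
```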
Conclusion
I hope you found this guide helpful in getting started with machine learning and in seeing how it can benefit your business or agency. To make the most of AI, it’s important to keep learning and experimenting with more models, algorithms, and datasets to find new ways it can bring value to you or your clients. If you think of an idea for what you can do with this technology, give it a try. You never know, you might just discover something extraordinary.