The end goal of a search engine, when it’s crawling a page, is to try to determine what a web page looks like to regular people. But the search engine can’t see a page like we do. Instead, it sees the code that web servers send back to our browsers, and to help illustrate this, let’s take a look at the Explore California home page.
To us humans, it’s a rich and colorful web page with lots of content on it: pictures, text, menus, videos, and all kinds of colors and styles. It’s visually appealing, and we know how to absorb all of this information and navigate through it. But to a search engine crawler, the same page looks like this instead.
This may not look like the same page to the untrained eye, but it is. All of this markup and code is really just a bunch of instructions web browsers can follow in order to render a great-looking web page onto our screens. And the important part is that this is what search engines look at when trying to understand what your page is about, and how it should be ranked.
Web pages are ultimately created with HTML code: scripts and other markup, which helps browsers figure out where to find and download all the files needed to produce this pretty page, where everything is visibly placed on the page, how things are laid out, which fonts, colors, and sizes to use, what side menus will look like, where links will point to, and where content elements are going to be placed.
If you take a close look, you can see that there’s lots of stuff in the HTML that may not end up on the screen, and these items provide us with extra opportunities to help search engines understand our content better. HTML, or hypertext markup language, is also responsible for referencing and loading style sheets, which are extra instructions that help define the visible attributes of a page: font coloring, content sizing, line spacing, background images, and all kinds of other rules for the visual presentation of a page can be found here. And HTML is not the only language that browsers can understand.
These days, web pages are made more interactive through the use of things like traditional JavaScript, jQuery, Angular, AJAX, HTML5 elements, and more. These advanced frameworks and languages make possible things like animation, slideshows, dynamic menus, and lots more. You can also find code that produces different types of non-text content.
For example, this is the block of code that’s responsible for rendering the video we see on the home page. While we as humans can watch that video and hear its message, this block of code is all that a search engine can see. We’re not going to cover web design or programming concepts here, but it’s important to understand the perspective of a search engine as we examine what it sees.
As you can probably guess, making sure that your website’s code is clean, efficient, and free of any coding errors will not only help ensure that your pages display properly to your users, but will also save the search engines some confusion. The cleaner your code, the easier it will be for you to make adjustments to improve your on-page optimization, and the more search engines will trust that your pages will be a good experience for your users.
It’s important to understand how search engines discover new content on the web, as well as how they interpret the locations of these pages. One way that search engines identify new content is by following links. Much like you and I click through links to go from one page to the next, search engines do the exact same thing to find and index content, only they click on every link they can find. If you want to make sure that search engines pick up on your new content, one of the easiest and most important things that you can do is make sure that you have links pointing to it.
One great way to do this is to create an HTML sitemap, linked to from the footer of every page of your website, that mirrors the structure of your site with links to all of your important content. Another way for search engines to discover new content is from an XML sitemap.
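To make that concrete, here’s a rough, hypothetical sketch of what an HTML sitemap looks like; the page name and URLs are just placeholders:

    <!-- Footer link included on every page (path is a placeholder) -->
    <a href="/sitemap.html">Site Map</a>

    <!-- sitemap.html: a plain list of links to your important content -->
    <ul>
      <li><a href="/tours/">Tours</a></li>
      <li><a href="/tours/backpacking/">Backpacking Tours</a></li>
      <li><a href="/about/">About Us</a></li>
    </ul>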
XML stands for extensible markup language, and it’s a different type of markup language that, like HTML, is used to share data on the web. Unlike the HTML sitemap, which is a list of links on a webpage, an XML sitemap is a listing of your site’s content in a special format that search engines can easily read through. You or your webmaster can learn more about the specific syntax and how to create XML sitemaps by visiting sitemaps.org.
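As a small illustration of the format documented at sitemaps.org, a minimal XML sitemap with a single entry might look like this; the URL and date are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/tours/</loc>
        <lastmod>2018-06-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>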
Once you’ve generated your HTML and XML sitemaps, you can submit them directly to the search engines and this gives you one more way to let them know when you add or change things on your site. Another important thing to recognize is that while search engines will always try to crawl your links for as much additional content as they can find, you may not always want this. There can be times when you might have pages on your site that you don’t want search engines to find. Think of test pages or members only areas of your site that you don’t want showing up on search engine results pages.
To control how search engines crawl through your website, you can set rules in what’s called a robots.txt file. This is a file that you or your webmaster can create in the main root folder of your site, and when search engines see it, they’ll read it and follow the rules that you’ve set. Robots.txt blocks can help control bandwidth strain and make your site more crawlable, helping to more readily surface important pages. But there’s a downside as well: a robots.txt block will not stop a page from being indexed or ranked.
To stop pages from showing up in search engine results entirely, a noindex meta tag is preferred to a robots.txt block. Which method you use really comes down to why you need it. To control how easily a site is crawled, use the robots.txt file. To ensure that pages are never returned in search results, use the noindex meta tag. And if you use the noindex meta tag method, be sure you don’t also block the page in the robots.txt or the tag will never be found.
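For reference, the noindex meta tag is a single line placed in the head of the page you want kept out of the results; a minimal example looks like this:

    <head>
      <!-- Tells compliant crawlers not to include this page in their index -->
      <meta name="robots" content="noindex">
    </head>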
With a robots.txt file, you can set rules that are specific to different browsers and search engine crawlers, and you can specify which areas of your website they can and can’t see. This can get a bit technical, and you can learn more about creating robots.txt rules by visiting robotstxt.org.
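To give you a feel for the syntax, here’s a small, hypothetical robots.txt sketch; the directory names are placeholders, and the full rules are documented at robotstxt.org:

    # Rules that apply to all crawlers
    User-agent: *
    Disallow: /test-pages/
    Disallow: /members-only/

    # A rule aimed at one specific crawler
    User-agent: Bingbot
    Disallow: /staging/

    # You can also point crawlers at your XML sitemap
    Sitemap: https://www.example.com/sitemap.xml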
Again, once search engines discover your content, they’ll index it by URL, which stands for uniform resource locator. As the name implies, URLs are the locations of webpages on the internet. It’s important that each page on your site has a single, unique URL, so that search engines can differentiate that page from all the others. And the structure of this URL can also help them understand the structure of your entire website.
There are lots of ways that search engines can find your pages. And while you can’t control how the crawlers actually do their job, by creating links for them to follow, unique and structured URLs, sitemaps for them to read, meta tags to inform them, and robots.txt files to guide them, you’ll be doing everything you can to get your pages into the index as fast as possible.
As search engines try to find and index all the pages on the internet, they rely on unique URLs as pointers to each piece of content. While there should be a single unique URL for every page on the internet, often our webpages can introduce slightly varied URLs for the same piece of content, resulting in duplicate URLs in the search engine’s index.
A common reason for this is the use of URL parameters. These are extra bits of data that are appended to the end of URLs, and they can be used to do a lot of different things.
Sometimes they can actually control what content shows up on the page. And in those cases, the different URLs actually are different pages. Other times though, they have nothing to do with the content. They could be used to store session IDs or tracking parameters, and while the URL may be different, the content is unaffected. The problem is search engines can only make guesses about which URL parameters are important for content and which aren’t. And sometimes those guesses are wrong. If search engines see two URLs with the same content on them, and index and then rank them both, then you’ll end up competing with yourself. And this can lead to reduced visibility in search as the search engines start to prefer pages from competing sites that are less confusing.
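As a hypothetical illustration, these two URLs might return exactly the same page, differing only by a tracking parameter that has nothing to do with the content:

    https://www.example.com/tours/backpacking/
    https://www.example.com/tours/backpacking/?sessionid=12345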
One way to resolve this issue on your site is to use the rel=canonical tag. This tag is something that you add to your page that acts as an instruction for search engines, telling them that no matter what URL might be showing up in the address bar, this is the primary URL they should index for this content.
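The tag itself is a single line in the head of the page. Continuing the hypothetical example above, both versions of the URL would point search engines back to the clean one:

    <link rel="canonical" href="https://www.example.com/tours/backpacking/">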
Another way to clear up any confusion about how your site uses URL parameters for content is to tell the search engines directly through Google Search Console and Bing Webmaster Tools. Here’s an example in Search Console where we’ve gone to the crawl URL parameters area to define certain URL parameters that should be ignored.
Another reason that duplicate content may exist is because content may have been moved from one location to another on your website. The old location and the new location could potentially be in the search engine’s index at the same time. And to avoid this situation, whenever you move content it’s important to implement redirect rules. There are a few redirect types that you or your webmaster can use. But let’s take a look at two in particular.
The first is known as a 302, or temporary redirect. This should only be used for short-term content moves, like when you want to show an alternate page while your site is down for maintenance. It tells a search engine that the page it’s looking for isn’t there now, but it will be back very shortly. So please don’t do anything to your index.
For long-term or permanent content moves, which search engines are really concerned with, you’ll want to use a 301, or a permanent redirect. These tell search engines that although they may have indexed a previous URL for that content, that old URL is no good anymore. The search engine should take everything it knew about that old URL, and apply it to the new one where that content now lives.
One thing to be mindful of is how you use these two redirect types. If you leave temporary 302 redirects up for too long, search engines may eventually start to treat them as permanent redirects. They don’t disclose exactly when this happens, and it will vary from site to site and situation to situation.
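As a rough sketch, assuming an Apache web server with mod_alias enabled, these two redirect types might be declared like this; the paths are placeholders:

    # 301: permanent move, so search engines transfer everything to the new URL
    Redirect 301 /old-tour-page/ https://www.example.com/new-tour-page/

    # 302: temporary move, for example during site maintenance
    Redirect 302 /tours/ https://www.example.com/maintenance/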
You should also note that some technologies that are commonly used to redirect URLs, like JavaScript or meta refresh tags, are not properly picked up or processed by some search engines, which can result in the wrong page staying indexed indefinitely and rankings slipping over time. Ensuring that the search engines know which URLs your pages live on, and that you have unique URLs for each of your pages, will help them index your pages properly. And this is a building block on the path to the top of the search results.
Search engines do a good job of identifying what the overall content of a web page is about. But you may have parts of a web page that contain very specific types of content, like product reviews, embedded video, or even a food recipe. Search engines can stand to benefit from a little help in understanding the semantic focus of these bits of content, and fortunately, we can give them some assistance.
One universal code format that will help us do this is the microdata defined at schema.org. Microdata gives us a special syntax to use to help search engines identify very specific types of content on your pages. It’s important to note that along with schema microdata there are other markup formats in use, including Open Graph and RDFa, but increasingly, schema microdata is becoming the dominant markup of choice. Schema not only helps search engines identify specific pieces of content, it also allows them to identify very specific attributes of that content.
Here’s an example of some recipe text. We can look at this quickly and identify it as a food recipe, but for a search engine, the short sentences and many line breaks are a bit awkward, and it can’t possibly understand what each line means. By augmenting the code behind this recipe text using the schema.org microdata for recipes, you have the opportunity to explicitly tell search engines exactly what this content is. You can see there are properties for ingredients, prep and cook times, and just about anything else you could think of for a recipe. From a search engine’s perspective, this is great. It not only confirms that this is definitely a food recipe, but it also includes all of this metadata around the recipe that will help to return this content to users who are looking for it. If someone is searching for a particular chef’s recipes, or has an abundance of bananas and needs something to do with them, the search engines will have a much deeper semantic understanding of what this content truly is and can return it in the search results for an array of relevant search queries.
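To give a sense of what inline microdata looks like, here’s a hypothetical fragment of a recipe marked up with schema.org Recipe properties; the recipe details are made up:

    <div itemscope itemtype="https://schema.org/Recipe">
      <h2 itemprop="name">Banana Bread</h2>
      <p>By <span itemprop="author">Jane Doe</span></p>
      <!-- Durations use the ISO 8601 format: 15 minutes prep, 1 hour cooking -->
      <meta itemprop="prepTime" content="PT15M">
      <meta itemprop="cookTime" content="PT1H">
      <ul>
        <li itemprop="recipeIngredient">3 ripe bananas</li>
        <li itemprop="recipeIngredient">2 cups of flour</li>
      </ul>
    </div>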
Head over to schema.org and browse the various types of content that have supported microdata; recipes are just one of many. You can also learn about the various implementation methods, like JSON-LD (JavaScript Object Notation for Linked Data), embedded or inline microdata, or RDFa (Resource Description Framework in Attributes).
As of 2018, both Google and Bing recommend implementation via JSON-LD, but they do support the other methods noted. Whichever you choose, among lots of categories, you could use schema to describe a book with things like the title, author, publishing date, and number of pages. Or you could use it to identify an upcoming event by name, location, dates, or even pricing. If you have a brick-and-mortar business or you’re doing e-commerce sales, make sure you’re using schema for your local business content or your product content.
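Since both engines recommend JSON-LD, here’s a minimal, hypothetical sketch of local business markup in that format, placed in a script tag on the page; every value below is a placeholder:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "LocalBusiness",
      "name": "Explore California",
      "url": "https://www.example.com/",
      "telephone": "+1-555-555-0100",
      "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main Street",
        "addressLocality": "Sacramento",
        "addressRegion": "CA",
        "postalCode": "95814"
      }
    }
    </script>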
As a general rule, anytime you can specifically identify content for search engines, you probably should. Google and Bing are always expanding their support for various schema types, so it never hurts to be ahead of the curve in your implementation. So explore the different types of schemas to see what may be relevant for the different types of content on your site and get started sharing all that great information with the search engines and your visitors alike.
While content and links can affect your website’s search engine visibility, your web server can also play a big role in how search engines view your website. The key here is to make sure that you’re serving up pages fast, and you’re serving them up reliably.
Remember, a search engine is trying to give its users the best experience possible, and sending them to a page on a server that’s down half the time, or one that takes an eternity to load, is not going to be a quality experience. First and foremost, a web server is just a computer, and the performance of any computer relies in part on the hardware and resources it has available. Things like the number and type of processors, the amount of memory, the quality of the network, and the connection to the internet can all be important. You want to talk to the people responsible for hosting and managing your web server to make sure the resources are appropriate to serve pages quickly and minimize any downtime.
The physical location of your web server can also affect your search engine visibility. As visitors interact with your website, search engines will often collect data around how fast all the elements of your pages are loading for them.
If a visitor is in one country and your web server is located on the other side of the world from them, the page may load very slowly, which is a concern for search engines. Generally, you want to make sure your web server is geographically located where most of your potential website visitors will come from.
If you expect a significant amount of traffic from across the entire world, you may want to consider a web hosting solution that can help distribute requests for your pages across a global network of computers. And even if you’re serving up pages locally, you may also want to consider speeding things up by using a content delivery network (CDN) to help serve big files like images or video from servers that are located all over the world.
As a side note, there are some other considerations you want to be aware of when it comes to international SEO, and you can learn more in the International SEO Fundamentals course here at LinkedIn Learning. Another thing that will help your pages load quickly is caching. Your website may be configured to pull content and other information from a database on your web server every time a user requests one of your pages.
Content management systems like WordPress, Drupal, Joomla, and more work this way, and virtually every product page you’ve ever seen on an e-commerce website is being constructed from calls to a database. One way to minimize the time-consuming database workload in these situations is to enable server-side caching.
This is where your web server interacts with your database only once in order to generate a given page, and then it saves a copy of that content on the server for a period of time. Once that copy has been made, each subsequent view of that page will load the content that’s been saved on the server, bypassing any redundant database work. Many content management and e-commerce systems have plug-ins or settings built in to help you accomplish this.
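What that configuration looks like depends entirely on your platform. Purely as an illustration, assuming an Nginx server passing requests to PHP-FPM, a page-caching sketch might look roughly like this; the paths, zone name, and timings are all hypothetical:

    # Define a cache location and a 10 MB key zone, expiring entries after 60 minutes
    fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=pagecache:10m inactive=60m;
    fastcgi_cache_key "$scheme$request_method$host$request_uri";

    server {
        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass unix:/run/php/php-fpm.sock;

            # Serve cached copies of successful responses for 60 minutes
            fastcgi_cache pagecache;
            fastcgi_cache_valid 200 60m;
        }
    }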
Another thing to consider is that search engines like Google have expressed a preference for secure sites using the HTTPS protocol, and Google has even indicated that this can be a ranking signal. In fact, as of July 2018, the Chrome browser, a Google product, began warning users when they access a non-secure HTTP site, making it even more valuable to choose HTTPS.
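If your site is already available over HTTPS, a common approach is to permanently redirect all plain HTTP requests to the secure version. This rough sketch assumes an Apache server with mod_rewrite enabled:

    RewriteEngine On
    # Send any request that arrives over plain HTTP to the HTTPS version of the same URL
    RewriteCond %{HTTPS} off
    RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]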
And last but not least, you want to make sure that your web server is consistently running and never experiencing any downtime. If your server is constantly down, search engines will consider your site unreliable, and they won’t want to suggest it to their users. There are several online services that can help monitor your server uptime and downtime if this is a particular area of concern.
Making sure you’re doing everything you can from the web hosting perspective to load content fast, securely and reliably will not only keep your users happy with your website, but it will also make your site more attractive to the search engines who send them there.
Let’s take a look at how to use the very basics of Google Search Console, formerly known as Google Webmaster Tools, to learn what information Google has about your website, how it views your website’s overall performance in search, and how we can use it to provide Google with instructions around how to best crawl and index your pages.
The first step is to go to search.google.com/search-console and sign in to your account. This requires a Google account and if you don’t already have one, you can head over to google.com/accounts to create one. Once you’re logged in, you’ll need to submit the exact version and protocol of the live site for the domain you want to manage.
And in this case, we’ll continue using our example site, Explore California. Remember there’s a difference between the https and http protocols. And to make sure we’re working with the right one, we’ve matched it to the http version. When you first sign up, in order to protect your account and your website, Google will need to verify that you actually own this domain and that you’re authorized to see some critical details around this website. So there are a few verification methods that you can choose from. The options that you or your webmaster have include uploading a specific HTML file to your site, or adding a specific meta tag to your source code, or making a small change to your site’s DNS record.
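For example, the meta tag method is just one line added to the head of your home page; the content value shown here is a placeholder for the token Google generates for you:

    <meta name="google-site-verification" content="YOUR-VERIFICATION-TOKEN" />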
Another helpful option is the Google Analytics or Google Tag Manager access method, which you can use if you’re using these services and have administrative access to the account.
Once you’ve verified a website, you’ll see a listing for it in the top left corner of the Search Console dashboard. Clicking into this website will bring up a menu of all the different areas of Search Console.
As you can see, there are many features in Google Search Console that you should explore, and there are always new features coming out. All in all, Google has done a pretty good job of letting you know how it views your pages and allowing you to provide input into what it knows about you. Staying on top of Google Search Console month after month is certainly an endeavor that will pay dividends.
Google isn’t the only search engine out there with free tools and another one you’ll want to get familiar with is Bing Webmaster Tools, which you can find at bing.com/toolbox/webmaster.
Much like Google Search Console, Bing Webmaster Tools will allow you to learn what information Bing has about the pages of your site and gives you a chance to provide Bing with a few instructions about how to index them.
When you first log in, you’ll need to submit your domain to gain access. Your next step is to verify that you own and control this website. In order to prove that, much like with Google’s process, you can choose between uploading a specific file to your web server, copying and pasting a meta tag into your default page, or making a small change to your site’s DNS record.
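Bing’s meta tag option looks very similar to Google’s; the content value here is a placeholder for the token Bing gives you:

    <meta name="msvalidate.01" content="YOUR-BING-VERIFICATION-TOKEN" />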
Once you’ve verified a website, you can click into it, and you’ll see a dashboard containing statistics that can give you an idea of your search visibility for clicks and impressions over time, as well as any recent crawling and indexing trends.
Scrolling down, you can look through overviews of the Search Keywords and Inbound Links reports, and clicking the See All links will take you into the full reports. Clicking into the Search Keywords report shows clicks and impressions for each keyword, as well as average rankings and click-through rates for a given period of time.
The Inbound Links report displays a graph showing the count of inbound links to your pages that Bing knows about over time, with a table of inbound links, page by page, below it. You can click into any of your pages to see who’s linking to them.
Another important report is the Crawl Information report, which you can find in the left-hand menu under Reports and Data. Here you can identify any crawl errors that Bing has found, and if you see any, it’s important to implement fixes for those errors. You’ll also notice information around those 301 and 302 redirects that you may be using, so that you can ensure that your content moves are being handled appropriately.
While you should certainly take a look at the other reports in the Reports and Data section, it’s also important to make sure that you’re providing Bing with whatever information you can about your site, and this is done in the Configure My Site section. Here you can manage your XML sitemaps, configure any URL parameter rules that will help Bing understand your URLs, control how Bing crawls your site, tell it what pages it’s allowed to see, and more.
Last, don’t forget to explore the Diagnostics and Tools section, where you’ll find a host of tools to help you further optimize your site. The keyword research tool works in much the same way as others we’ve looked at and can be a great way to find even more keyword ideas. Just make sure to remember that the numbers you’re seeing are from Bing’s search engine, and these can often be quite a bit lower than estimates for Google.
You should also take some time to play with the SEO Analyzer tool. This works a lot like the Moz On-Page Grader that we looked at earlier, providing you with information about any errors or issues you can fix for the page you’ve entered. The major difference to note here is that this tool doesn’t take into account a specific keyword like the Moz tool does.
Although Bing’s share of the search market is certainly much smaller than Google’s, it’s still a very sizeable group of people that you can’t afford to ignore. Not only does Bing Webmaster Tools give you the ability to optimize your Bing presence as best you can, it can also provide a richer data set and an alternate point of view for your overall SEO strategy.