While analyzing some of our ecommerce clients’ Google Search Console (GSC) accounts recently, we noticed odd URLs the clients hadn’t created: some filled with non-English characters, others with excessively long strings of English characters. These showed up as large numbers of non-indexed pages in GSC, specifically as server errors, soft 404s, and/or 404 warnings. After some research, we found that the URLs varied by the client’s platform and typically stemmed from website exploits targeting unused or often-forgotten pages.
I’ll break down the list of common website exploits and vulnerabilities per platform and what to do about them. If you’re in a hurry, jump straight to your platform here: Shopify | Magento | WordPress | BigCommerce
If Your Website Platform Is Shopify
Issue: Almost all Shopify sites have the page /collections/vendors – but not all sites use this page. Hackers know this and find ways to inject junk code (and sometimes junk content) into these pages.
Example URLs:
- site.com/collections/vendors?q=国外代购买东西划算吗【www·biqubiqu·com】国外代购网站app靠谱vMB%2C南特彩票yRZyqf
- site.com/collections/vendors?q=%E5%9B%BD%E5%A4%96%E4%BB%A3%E8%B4%AD%E4%B9%B0%E4%B8%9C%E8%A5%BF%E5%88%92%E7%AE%97%E5%90%97%E3%80%90www%C2%B7biqubiqu%C2%B7com%E3%80%91%E5%9B%BD%E5%A4%96%E4%BB%A3%E8%B4%AD%E7%BD%91%E7%AB%99app%E9%9D%A0%E8%B0%B1vMB%2C%E5%8D%97%E7%89%B9%E5%BD%A9%E7%A5%A8yRZyqf
Solution: If you are not using this page, keep it out of SERPs by ensuring all /collections/vendors?q= URLs return a 404 status code and by adding a meta robots “noindex” tag to the page’s <head> section. Doing this will prevent the pages from being indexed and from wasting your site’s crawl budget.
While crawl budget isn’t often an issue anymore, it can be if Googlebot has to crawl through thousands or hundreds of thousands of unnecessary URLs that you didn’t create and don’t consider important.
How to do it:
- Go to Online Store (in Shopify Admin) > Navigation > View URL Redirects link at the top of the page.
- Redirect /collections/vendors to /404
- Note: if you are using this page path, check for the issue at /collections/vendors?q= and redirect that to /404, if necessary.
- Edit your theme.liquid file by adding the following in the <head> section:
{%- if request.path == '/collections/vendors' -%}
  <meta name="robots" content="noindex">
{%- endif -%}
For more information about how to fix this security loophole on your Shopify site, visit this Shopify Community thread.
Shopify sites should also check their internal site search results pages. We recently found these to be a source of indexed, non-English-character URLs on a client’s site. They look something like: site.com/search?q=홍콩클라우드서버⌒텐… To block these pages from being indexed (or to force them out of SERPs), add a meta robots tag to the <head> section of these pages’ template on your site. See the Magento recommendations below for more information.
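On Shopify specifically, one way to add that tag is another conditional in theme.liquid’s <head>, right alongside the /collections/vendors snippet above. This is a minimal sketch, assuming your storefront’s search results live at the default /search path:

{%- comment -%} Noindex internal search results pages (site.com/search?q=...) {%- endcomment -%}
{%- if request.path == '/search' -%}
  <meta name="robots" content="noindex">
{%- endif -%}

With this in place, any URL under /search carries the noindex tag while the rest of the site is unaffected.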
If Your Website Platform Is Magento
Issue: Search results pages on Magento sites are indexable by default. If you’re a current or past ROI Revolution SEO client, you know we always recommend you “noindex” your search results pages (because anything that’s visible in your site search results should also be reachable elsewhere through your site’s navigation). In this exploit, hackers inject junk code into indexable search results pages to make your site appear to be full of spammy URLs.
Example URLs:
- site.com/catalogsearch/result/?q=南京代孕公司哪个医院成功率最高-%28微信38332747%29-加拿大代孕生子最好的-香港代孕机构收费价格-杭州代孕哪里做比较好-%28微信38332747%29-长沙代孕公司哪个医院成功率最高YH
- site.com/catalogsearch/result/?q=天津%20代孕-%28微信38332747%29-香港双胞胎代孕-上海代孕生子多少钱-郑州代孕哪里找-%28微信38332747%29-广州代孕多少钱
Solution: Googlebot doesn’t like crawling infinite spaces that lead to low-quality or empty pages/soft 404s. Keep these search results pages out of SERPs by adding a snippet of code to the page template.
How to do it: Add the following to the <head> section of your /catalogsearch/result/ pages:
<html>
<head>
<meta name="robots" content="noindex">
(...)
</head>
<body>(...)</body>
</html>
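If you’re on Magento 2 and would rather not edit the template markup directly, the same tag can usually be added through a layout XML update in your theme. This is a minimal sketch under that assumption; the file would typically live at app/design/frontend/<Vendor>/<theme>/Magento_CatalogSearch/layout/catalogsearch_result_index.xml, and you should confirm the handle name and path against your own setup:

<?xml version="1.0"?>
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <!-- Adds a robots noindex tag to the quick search results page only -->
    <head>
        <meta name="robots" content="NOINDEX,FOLLOW"/>
    </head>
</page>

Either route accomplishes the same thing; the layout approach simply keeps the change versioned with your theme rather than hand-edited into a template file.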
If Your Website Platform Is WordPress
Issue: WordPress sites have a search results page (/search/) that may be indexable by default. This can let hackers inject junk code and create hundreds of useless URLs for Googlebot to waste time spidering through.
Example URLs:
- /search/%25F0%259F%2593%25BF%25F0%259F%25A7%25BFwww.datesol.xyz%25F0%259F%2593%25BF%25F0%259F%25A7%25BFdating%2Bgood%2Bthai%2Bgirl%2Bsong%2B%25F0%259F%2593%25BF%25F0%259F%25A7%25BF%2BDATING%2BSITE%25F0%259F%2593%25BF%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%2Bzbycmrupwn%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%2Bqdwibtugal%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%2Bvajynxdozk%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%2Bwemchxpalb%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%2Bkhjxvfbgap%2Bdating%2Bgood%2Bthai%2Bgirl%2Bsong%25F0%259F%2593%25BF%25F0%259F%25A7%25BFwww.datesol.xyz%25F0%259F%2
- /search/%25F0%259F%25AA%2580%25E2%259D%25A4%25EF%25B8%258F%25EF%25B8%258Fthe%2Blove%2Bmachine%2Btv%2Bseries%2Bdating%2Bshows%2Buk%25F0%259F%25AA%2580%25E2%259D%25A4%25EF%25B8%258F%25EF%25B8%258Fwww.weke.xyz%25F0%259F%25AA%2580%25E2%259D%25A4%25EF%25B8%258F%25EF%25B8%258F/feed/rss2/paged-12/4/
Solution: Make sure these pages yield a 404 status code and apply a “noindex” meta robots tag to keep them out of SERPs (or to remove them if they’re already indexed).
How to do it: If you’re using Yoast, this setting has likely already been applied for you. If you’re not using Yoast, consider adding it for an easy (read: hands-off!) way to manage the indexing of your internal search results pages.
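For sites that aren’t running an SEO plugin at all, a small filter dropped into your (child) theme’s functions.php can add the tag instead. This is a minimal sketch, assuming WordPress 5.7 or later (which introduced the wp_robots filter):

// Add to your (child) theme's functions.php: noindex internal search results pages.
add_filter( 'wp_robots', function ( $robots ) {
	if ( is_search() ) {
		// Keep /search/ (and ?s=) results out of the index, but let crawlers follow links.
		$robots['noindex'] = true;
		$robots['follow']  = true;
	}
	return $robots;
} );

Note that this covers only the noindex half of the solution; whether junk search queries should also return a 404 depends on how your theme handles empty results.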
If Your Website Platform Is BigCommerce
BigCommerce does not allow you to edit individual pages’ meta robots tags, but its robots.txt includes a disallow statement for /search.php by default. Unfortunately, I have seen evidence of Google indexing the /search.php page for some clients, though I have not seen any instances of the excessive-character URLs described above. This may be a non-issue for BigCommerce users, but you’ll want to keep an eye on Google Search Console to make sure it stays that way.
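For reference, that stock rule looks roughly like this (your store’s robots.txt may contain additional lines):

User-agent: *
Disallow: /search.php

Keep in mind that a robots.txt disallow only blocks crawling, not indexing; a disallowed URL can still land in the index if something links to it, which is why the occasional /search.php URL shows up for some stores despite the rule.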
Tying It All Together: Website Exploits & Securing Your Internal Site Search Results Pages
Taking proactive steps now to secure any potential loopholes in your internal site search results pages can save major headaches down the line. Protect your site from hackers looking for easy website exploit opportunities using the guidelines above.
Noticing other concerns in Google Search Console and not sure what to do? Check out our post about finding and fixing GSC errors.