What exactly IS an API? They’re those things that you copy and paste long strange codes into Screaming Frog for links data on a Site Crawl, right?
I’m here to tell you there’s so much more to them than that, if you’re willing to take just a few little steps. But first, some basics.
What’s an API?
API stands for “application programming interface”, and it’s just the way of… using a thing. Everything has an API. The web is a giant API that takes URLs as input and returns pages.
But special data services like the Moz Links API have their own set of rules. These rules vary from service to service and can be a major stumbling block for people taking the next step.
When Screaming Frog gives you the extra links columns in a crawl, it’s using the Moz Links API, but you can have this capability anywhere. For example, all that tedious manual stuff you do in spreadsheet environments can be automated, from data-pull to formatting and emailing a report.
If you take this next step, you can be more efficient than your competitors, designing and delivering your own SEO services instead of relying upon, paying for, and being limited by the next proprietary product integration.
GET vs. POST
Most APIs you’ll encounter use the same data transport mechanism as the web. That means there’s a URL involved, just like a website. Don’t get scared! It’s easier than you think. In many ways, using an API is just like using a website.
As with loading web pages, the request may be in one of two places: the URL itself, or in the body of the request. The URL is called the “endpoint” and the often invisibly submitted extra part of the request is called the “payload” or “data”. When the data is in the URL, it’s called a “query string” and indicates the “GET” method is used. You see this all the time when you search:
https://www.google.com/search?q=moz+links+api <-- GET method
When the data of the request is hidden, it’s called a “POST” request. You see this when you submit a form on the web and the submitted data doesn’t show in the URL. When you hit the back button after such a POST, browsers usually warn you against double-submits. The reason the POST method is often used is that you can fit a lot more in the request using the POST method than the GET method. URLs would get very long otherwise. The Moz Links API uses the POST method.
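To make the difference concrete, here is a minimal sketch that builds one request of each kind using only Python’s standard library, without sending anything over the network. The example.com form URL and the variable names are placeholders of my own, not part of any real API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Encode the data the same way a browser encodes a search form.
query_string = urlencode({"q": "moz links api"})

# GET: the data rides along in the URL as a query string.
get_request = Request("https://www.google.com/search?" + query_string)

# POST: the data goes in the body of the request instead of the URL.
# (https://example.com/form is a placeholder endpoint.)
post_request = Request(
    "https://example.com/form",
    data=query_string.encode("utf-8"),
)

# Nothing has been sent; we are only inspecting the request objects.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Notice that the POST URL stays short no matter how much data you put in the body, which is exactly why APIs with larger requests tend to prefer it.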
Making requests
A web browser is what traditionally makes requests of websites for web pages. The browser is a type of software called a client. Clients are what make requests of services. More than just browsers can make requests. The ability to make client web requests is often built into programming languages like Python, or can be broken out as a standalone tool. The most popular tools for making requests outside a browser are curl and wget.
We’re discussing Python here. Python has a built-in library called urllib, but it’s designed to handle so many different types of requests that it’s a bit of a pain to use. There are other libraries that are more specialized for making requests of APIs. The most popular for Python is called requests. It’s so popular that it’s used for almost every Python API tutorial you’ll find on the web. So I’ll use it too. This is what “hitting” the Moz Links API looks like:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
Given that everything was set up correctly (more on that soon), this will produce the following output:
{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==', 'results': [{'anchor_text': 'moz', 'external_pages': 7162, 'external_root_domains': 2026}]}
This is JSON data. It’s contained within the response object that was returned from the API. It’s not on the drive or in a file. It’s in memory. So long as it’s in memory, you can do stuff with it (often just saving it to a file).
If you wanted to grab a piece of data within such a response, you could refer to it like this:
response.json()['results'][0]['external_pages']
This says: “Parse the response, give me the first item in the results list, and then give me the external_pages value from that item.” The result would be 7162.
NOTE: If you’re actually following along executing code, the above line won’t work alone. There’s a certain amount of setup we’ll do shortly, including installing the requests library and setting up a few variables. But this is the basic idea.
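Here is a small sketch of that lookup that does run on its own. It assumes the parsed response is already in hand as a Python dict, so the sample output above is pasted in as a literal instead of being fetched from the API. The variable names are my own.

```python
# Stand-in for the parsed API response: the sample output shown above.
parsed = {
    "next_token": "JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==",
    "results": [
        {
            "anchor_text": "moz",
            "external_pages": 7162,
            "external_root_domains": 2026,
        }
    ],
}

# First item in the results list, then one value from that item.
first_result = parsed["results"][0]
external_pages = first_result["external_pages"]
print(external_pages)  # 7162

# Walk every key/value pair in that first result.
for key, value in first_result.items():
    print(key, "=", value)
```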
JSON
JSON stands for JavaScript Object Notation. It’s a way of representing data that’s easy for humans to read and write. It’s also easy for computers to read and write. It’s a very common data format for APIs that has somewhat taken over the world, since the older ways were too difficult for most people to use. Some people might call this part of the “RESTful” API movement, but the much more difficult XML format is also considered “RESTful” and everyone seems to have their own interpretation. Consequently, I find it best to just focus on JSON and how it gets in and out of Python.
Python dictionaries
I lied to you. I said that the data structure you were looking at above was JSON. Technically it’s really a Python dictionary or dict datatype object. It’s a special kind of object in Python that’s designed to hold key/value pairs. The keys are usually strings (in JSON they always are) and the values can be any type of object. The keys are like the column names in a spreadsheet. The values are like the cells in the spreadsheet. In this way, you can think of a Python dict as a JSON object. For example, here’s creating a dict in Python:
my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}
And here is the equivalent in JavaScript:
var my_json = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}
Pretty much the same thing, right? Look closely. Key-names and string values get double-quotes. Numbers don’t. These rules apply consistently between JSON and Python dicts. So as you might imagine, it’s easy for JSON data to flow in and out of Python. This is a great gift that has made modern API-work highly accessible to the beginner through a tool that has revolutionized the field of data science and is making inroads into marketing: Jupyter Notebooks.
Flattening data
But beware! As data flows between systems, it’s not uncommon for the data to subtly change. For example, the JSON data above might be converted to a string. Strings might look exactly like JSON, but they’re not. They’re just a bunch of characters. Sometimes you’ll hear it called “serializing”, or “flattening”. It’s a subtle point, but worth understanding as it will help with one of the largest stumbling blocks with the Moz Links (and most JSON) APIs.
Objects have APIs
Actual JSON or dict objects have their own little APIs for accessing the data inside them. The ability to use these JSON and dict APIs goes away when the data is flattened into a string, but it will travel between systems more easily, and when it arrives at the other end, it will be “deserialized” and the API will come back on the other system.
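A quick sketch of the API going away and coming back, using Python’s built-in json module (covered in more detail further down). The record and variable names are just for illustration.

```python
import json

record = {"name": "Mike", "age": 52, "city": "New York"}

# As a dict, the key-lookup "API" works.
print(record["age"])  # 52

# Flatten (serialize) it into a plain string.
flattened = json.dumps(record)
print(type(flattened).__name__)  # str

# The same key lookup now fails, because a string is just characters.
try:
    flattened["age"]
    lookup_failed = False
except TypeError:
    lookup_failed = True
print(lookup_failed)  # True

# Deserialize it, and the dict API comes back.
restored = json.loads(flattened)
print(restored["age"])  # 52
```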
Data flowing between systems
This is the concept of portable, interoperable data. Back when it was called Electronic Data Interchange (or EDI), it was a very big deal. Then along came the web and then XML and then JSON and now it’s just a normal part of doing business.
If you’re in Python and you want to convert a dict to a flattened JSON string, you do the following:
import json

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

json_string = json.dumps(my_dict)
…which would produce the following output:
'{"name": "Mike", "age": 52, "city": "New York"}'
This looks almost the same as the original dict, but if you look closely you can see that single-quotes are used around the entire thing. Another obvious difference is that you can line-wrap real structured data for readability without any ill effect. You can’t do it so easily with strings. That’s why it’s presented all on one line in the above snippet.
Such stringifying processes are done when passing data between different systems because they are not always compatible. Normal text strings, on the other hand, are compatible with almost everything and can be passed on web-requests with ease. Such flattened strings of JSON data are frequently referred to as the request.
Anatomy of a request
Again, here’s the example request we made above:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
Now that you understand what the variable name json_string is telling you about its contents, you shouldn’t be surprised to see this is how we populate that variable:
data_dict = {
    "target": "moz.com/blog",
    "scope": "page",
    "limit": 1
}

json_string = json.dumps(data_dict)
…and the contents of json_string look like this:
'{"target": "moz.com/blog", "scope": "page", "limit": 1}'
This is one of my key discoveries in learning the Moz Links API. It is something it has in common with countless other APIs out there, but it trips me up every time because it’s so much more convenient to work with structured dicts than flattened strings. However, most APIs expect the data to be a string for portability between systems, so we have to convert it at the last moment before the actual API-call occurs.
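As a rough sketch of that “convert at the last moment” habit: keep the data as a dict for as long as you’re still changing it, and only flatten it right before the call. The change to limit here is just an illustration of an edit that is trivial on a dict and awkward on a string.

```python
import json

# Work with the structured dict while you still need to change things.
data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}
data_dict["limit"] = 5  # easy on a dict, fiddly on a flattened string

# Flatten only at the last moment, right before the API call would happen.
json_string = json.dumps(data_dict)
print(json_string)
```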
Pythonic loads and dumps
Now you may be wondering, in that above example, what a dump is doing in the middle of the code. The json.dumps() function is called a “dumper” because it takes a Python object and dumps it into a string. The json.loads() function is called a “loader” because it takes a string and loads it into a Python object.
What appear to be singular and plural options are actually file and string options. If your data lives in a file (or any file-like object), you use json.load() and json.dump(). If your data is a string, you use json.loads() and json.dumps(). The s stands for string. Leaving the s off means you’re reading from or writing to a file.
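Here’s a minimal side-by-side sketch of the two pairs. In the standard library, the versions without the s read from and write to file-like objects, so io.StringIO stands in below for a real file you would get from open(); that substitution is my own, made so the example doesn’t touch your drive.

```python
import io
import json

data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}

# With the s: work with strings.
json_string = json.dumps(data_dict)
from_string = json.loads(json_string)

# Without the s: work with file-like objects.
fake_file = io.StringIO()        # in-memory stand-in for open("data.json", "w")
json.dump(data_dict, fake_file)  # write JSON text into the "file"
fake_file.seek(0)                # rewind to the start before reading
from_file = json.load(fake_file)

# Both round trips give back an equal dict.
print(from_string == data_dict, from_file == data_dict)  # True True
```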
Don’t let anybody tell you Python is perfect. It’s just that its rough edges are not excessively objectionable.
Assignment vs. equality
For those of you completely new to Python or programming in general, what we’re doing when we hit the API is called an assignment. The result of requests.post() is being assigned to the variable named response.
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
We’re using the = sign to assign the value of the right side of the equation to the variable on the left side of the equation. The variable response is now a reference to the object that was returned from the API. Assignment is different from equality. The == sign is used for equality.
# This is assignment:
a = 1  # a is now equal to 1

# This is equality:
a == 1  # True, but relies on the above line having been executed
The POST method
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
The requests library has a function called post() that we’re calling here with 3 arguments. The first argument is the URL of the endpoint. The second argument is the data to send to the endpoint. The third argument is the authentication information to send to the endpoint.
Keyword parameters and their arguments
You may notice that some of the arguments to the post() function have names. Names are set equal to values using the = sign. Here’s how Python functions get defined. The first argument is positional both because it comes first and also because there’s no keyword. Keyworded arguments come after position-dependent arguments. Trust me, it all makes sense after a while. We all start to think like Guido van Rossum.
def arbitrary_function(argument1, name=None):
    # do stuff
    pass
The name in the above example is called a “keyword” and the values that come in on those locations are called “arguments”. The None after the = sign is a default value, used when the caller leaves that keyword out. Arguments are assigned to variable names right in the function definition, so you can refer to either argument1 or name anywhere inside this function. If you’d like to learn more about the rules of Python functions, you can read about them here.
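To see positional and keyword arguments in action without hitting any API, here’s a small sketch. The describe_request function is a hypothetical stand-in of my own that only mimics the shape of a post() call; it is not part of the requests library, and the example.com URL is a placeholder.

```python
def describe_request(endpoint, data=None, auth=None):
    """Hypothetical stand-in that mimics the shape of a post() call."""
    return {"endpoint": endpoint, "data": data, "auth": auth}


# The positional argument comes first; keyword arguments follow by name.
by_keyword = describe_request(
    "https://example.com/api", data="{}", auth=("id", "key")
)

# Keyword arguments can be given in any order, because the names do the work.
reordered = describe_request(
    "https://example.com/api", auth=("id", "key"), data="{}"
)
print(by_keyword == reordered)  # True

# Leaving a keyword out falls back to its default value.
minimal = describe_request("https://example.com/api")
print(minimal["auth"])  # None
```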
Setting up the request
Okay, so let’s let you do everything necessary for that success-assured moment. We’ve been showing the basic request:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
…but we haven’t shown everything that goes into it. Let’s do that now. If you’re following along and don’t have the requests library installed, you can do so with the following command from the same terminal environment from which you run Python:
pip install requests
Oftentimes Jupyter will have the requests library installed already, but in case it doesn’t, you can install it with the following command from inside a Notebook cell:
!pip install requests
And now we can put it all together. There are only a few things here that are new. The most important is how we’re taking 2 different variables and combining them into a single variable called AUTH_TUPLE. You will have to get your own ACCESSID and SECRETKEY from the Moz.com website.
The API expects these two values to be passed as a Python data structure called a tuple. A tuple is a sequence of values that doesn’t change once it’s created. I find it interesting that requests.post() expects flattened strings for the data parameter, but expects a tuple for the auth parameter. I suppose it makes sense, but these are the subtle things to understand when working with APIs.
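Here’s a small sketch of what that tuple is and roughly what becomes of it. The requests library treats a two-item auth tuple as HTTP Basic credentials, and the lines at the bottom show by hand how Basic credentials are encoded into a header. The ID and key are the same made-up placeholders used in this article, and the hand-built header is only for illustration; requests does this for you.

```python
import base64

ACCESSID = "mozscape-1234567890"                # placeholder, not a real ID
SECRETKEY = "1234567890abcdef1234567890abcdef"  # placeholder, not a real key
AUTH_TUPLE = (ACCESSID, SECRETKEY)

# A tuple unpacks back into its parts just like a list would.
access_id, secret_key = AUTH_TUPLE

# Unlike a list, a tuple can't be changed after it's created.
try:
    AUTH_TUPLE[0] = "something-else"
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)  # False

# HTTP Basic auth joins the pair with a colon and base64-encodes it.
credentials = f"{access_id}:{secret_key}".encode("utf-8")
basic_header = "Basic " + base64.b64encode(credentials).decode("ascii")
print(basic_header)
```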
Here’s the full code:
import json
from pprint import pprint

import requests

# Set Constants
ACCESSID = "mozscape-1234567890"  # Replace with your access ID
SECRETKEY = "1234567890abcdef1234567890abcdef"  # Replace with your secret key
AUTH_TUPLE = (ACCESSID, SECRETKEY)

# Set Variables
endpoint = "https://lsapi.seomoz.com/v2/anchor_text"
data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}
json_string = json.dumps(data_dict)

# Make the Request
response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE)

# Print the Response
pprint(response.json())
…which outputs:
{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==', 'results': [{'anchor_text': 'moz', 'external_pages': 7162, 'external_root_domains': 2026}]}
Using all uppercase for the AUTH_TUPLE variable is a convention many use in Python to indicate that the variable is a constant. It’s not a requirement, but it’s a good idea to follow conventions when you can.
You may notice that I didn’t use all uppercase for the endpoint variable. That’s because the anchor_text endpoint is not a constant. There are a number of different endpoints that can take its place depending on what sort of lookup we want to do. The choices are:
- anchor_text
- final_redirect
- global_top_pages
- global_top_root_domains
- index_metadata
- link_intersect
- link_status
- linking_root_domains
- links
- top_pages
- url_metrics
- usage_data
And that leads into the Jupyter Notebook that I prepared on this topic, located here on GitHub. With this Notebook you can extend the example I gave here to any of the 12 available endpoints to create a variety of useful deliverables, which will be the subject of articles to follow.
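If you want to start swapping endpoints right away, here’s one possible sketch. It assumes every endpoint lives under the same base URL as the anchor_text example above, which is my assumption from that single example, so check it against the Moz documentation. The endpoint_url helper is hypothetical, and each endpoint will expect its own request data.

```python
# Assumed shared prefix, taken from the anchor_text example in this article.
BASE_URL = "https://lsapi.seomoz.com/v2/"

# The 12 endpoint names listed above.
ENDPOINT_NAMES = [
    "anchor_text",
    "final_redirect",
    "global_top_pages",
    "global_top_root_domains",
    "index_metadata",
    "link_intersect",
    "link_status",
    "linking_root_domains",
    "links",
    "top_pages",
    "url_metrics",
    "usage_data",
]


def endpoint_url(name):
    """Hypothetical helper: build the full URL for a named endpoint."""
    if name not in ENDPOINT_NAMES:
        raise ValueError(f"Unknown endpoint: {name}")
    return BASE_URL + name


endpoint = endpoint_url("url_metrics")
print(endpoint)
```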