Then I put those in a dictionary and send them along with my request. But did you know that there's a source of structured data that virtually every website on the internet supports automatically, by default? If it goes offline or gets horribly mangled, no one really notices.
Just spend some time browsing the site until you find the data you need and figure out some basic access patterns, which we'll talk about next. Unlike APIs, however, there's really no documentation, so you have to be a little clever about it.

You'll need to start by finding your endpoints: the URL or URLs that return the data you need. Pay attention to the URLs and how they change as you click between sections and drill down into sub-sections. Try typing in a few different terms and, again, pay attention to the URL and how it changes depending on what you search for. You'll probably see a GET parameter like q that always changes based on your search term. Make sure that there's always a "?" at the beginning to start the query string and a "&" between each key/value pair.

Most regular APIs do this as well, to keep single requests from slamming the database. Try changing this to some really high number and see what response you get when you fall off the end of the data. Again, look for a new GET parameter to be appended to the URL which indicates how many items are on the page.

On the contrary, if a site is using AJAX to load the data, that probably makes it even easier to pull the information you need.

Zoom up and down through the DOM tree until you find the outermost element around the item you want. It probably has some class attribute which you can use to easily pull out all of the other wrapper elements on the page. You can then iterate over these just as you would iterate over the items returned by an API response. It's possible that the DOM you see in the inspector has been modified by JavaScript or sometimes even the browser, if it's in quirks mode. Spend some time doing research for a good HTML parsing library in your language of choice.

Depending on how sophisticated those protections are, you may run into additional challenges. Just fire off a request to your endpoint and parse the returned data.
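To make the steps above concrete, here is a minimal sketch in Python using only the standard library. Everything specific in it is an assumption for illustration: the base URL `https://example.com/search`, the `q` and `page` parameter names, the `result` class on the wrapper elements, and the `fake_fetch` function, which stands in for a real HTTP request so the example runs offline. A real site will use its own names, and a dedicated HTML parsing library will usually cope with messy markup better than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Elements that never have a closing tag, so they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def build_url(base, **params):
    """Join a base URL and GET parameters: '?' first, '&' between pairs."""
    return f"{base}?{urlencode(params)}"


class ItemParser(HTMLParser):
    """Collect the text inside every element carrying a given class."""

    def __init__(self, wrapper_class):
        super().__init__()
        self.wrapper_class = wrapper_class
        self.items = []
        self._depth = 0    # greater than 0 while inside a wrapper element
        self._chunks = []  # text fragments of the wrapper being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif self.wrapper_class in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.items.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def parse_items(html, wrapper_class):
    """Return the text of each wrapper element found in the HTML."""
    parser = ItemParser(wrapper_class)
    parser.feed(html)
    parser.close()
    return parser.items


def scrape_all(fetch, base, query, max_pages=50):
    """Walk the pages until one comes back empty (we fell off the end)."""
    results = []
    for page in range(1, max_pages + 1):
        html = fetch(build_url(base, q=query, page=page))
        items = parse_items(html, "result")  # "result" is a hypothetical class
        if not items:
            break
        results.extend(items)
    return results


def fake_fetch(url):
    """Offline stand-in for an HTTP request; returns canned HTML per page."""
    pages = {
        "1": '<div class="result"><b>First</b> item</div>'
             '<div class="result">Second</div>',
        "2": '<div class="result">Third</div>',
    }
    page = url.rsplit("page=", 1)[1]
    return "<html><body>" + pages.get(page, "") + "</body></html>"


print(build_url("https://example.com/search", q="web scraping", page=2))
print(scrape_all(fake_fetch, "https://example.com/search", "web scraping"))
```

Passing the fetch function in as an argument keeps the paging logic separate from the network code, so you can swap `fake_fetch` for whatever HTTP client you prefer once the parsing works.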
I just browse the site in my web browser and then grab all of the headers that my browser is automatically sending.
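As a rough sketch of that header trick, here is how it could look with Python's standard `urllib.request`. The header values and the URL are placeholders I made up; you would paste in whatever your own browser's network inspector shows. The request is only built here, not sent.

```python
from urllib.request import Request

# Placeholder values: replace them with the headers your browser really sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (placeholder; copy your browser's value)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Attach the browser-like headers to a request object."""
    return Request(url, headers=dict(headers or BROWSER_HEADERS))


req = build_request("https://example.com/search?q=test")  # hypothetical URL
# urllib stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))
# To actually send it: urllib.request.urlopen(req).read()
```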