Changeset 5:21ed311b3edc
Updated site and added post
author | unexist |
---|---|
date | Wed, 07 Oct 2020 13:31:09 +0200 |
parents | 9b5050a448f8 |
children | 8e0b56f91cec |
files | _posts/2020-10-07-lessons-learned.markdown _site/feed.xml _site/index.html _site/tag/chromedriver.html _site/tag/headless.html _site/tag/ruby.html _site/tag/testing.html _site/tag/tools.html _site/tag/watir.html |
diffstat | 9 files changed, 85 insertions(+), 9 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/_posts/2020-10-07-lessons-learned.markdown Wed Oct 07 13:31:09 2020 +0200
@@ -0,0 +1,76 @@
+---
+layout: post
+title: "Lessons learned"
+date: 2020-10-07 12:00:00 +0200
+author: Christoph Kappel
+tags: tools testing ruby watir chromedriver headless
+---
+A friend of mine approached me with a request to - let us say automate -
+a web request to a not-to-be-named raffle. I looked into it and was
+instantly hooked. Both the prize and the challenge are interesting.
+
+I have written quite a few scrapers in my time, so after a few lines of good ol'
+Ruby, with the help of [mechanize](https://github.com/sparklemotion/mechanize),
+the first shot was ready, and it failed miserably due to *CSRF* protection.
+
+After a quick check: yes, they really use a pseudo-random token, which is injected
+into the DOM and a hidden input field via JS.
+
+I had two options now:
+
+1. Understand the code that writes the CSRF token into the DOM
+2. Find a scraper with a JS engine
+
+In my day job, we always like to play with e2e testing, which mostly
+involves scripts that remote-control a web browser.
+
+So I had another glance at this stack, and after some more reading,
+I settled on [watir](http://watir.com/) and [chromedriver](https://chromedriver.chromium.org/).
+
+### Watir
+
+The API of [watir](http://watir.com) is really amazing and easy to use:
+
+```ruby
+require "watir"
+
+browser = Watir::Browser.new
+
+browser.goto "https://blog.unexist.dev"
+
+browser.close
+```
+
+I think the example is pretty self-explanatory: it opens a remote session and
+points the browser to the given URL. When you run it in e.g. irb, you can
+REPL your way to the desired outcome.
+
+```ruby
+browser.link(visible_text: /GitHub/).when_present.click
+```
+The above example looks for a link with *GitHub* in its visible text and clicks
+it once it is present. Easy as that.
+
+### Headless?
+
+One problem solved: this runs nicely on my *local* machine. Now it would be best
+if I could just deploy it on a server without installing the whole docker stack.
+
+Since we are targeting Linux, headless support is kind of built-in. After a
+quick search, I found [headless](https://github.com/leonid-shevtsov/headless).
+
+This *gem* wraps the handling of a virtual framebuffer for you and, as it turns
+out, works pretty well with my stack:
+
+```ruby
+require "watir"
+require "headless"
+
+Headless.ly do
+  browser = Watir::Browser.new
+
+  browser.goto "https://blog.unexist.dev"
+
+  browser.close
+end
+```
--- a/_site/feed.xml Wed Oct 07 13:28:29 2020 +0200
+++ b/_site/feed.xml Wed Oct 07 13:31:09 2020 +0200
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="http://localhost:9000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://localhost:9000/" rel="alternate" type="text/html" /><updated>2020-10-07T13:27:54+02:00</updated><id>http://localhost:9000/feed.xml</id><title type="html">unexist.dev</title><subtitle>Random bits and rants about tech, software design and me</subtitle><entry><title type="html">Lesson learned</title><link href="http://localhost:9000/2020/10/07/lesson-learned.html" rel="alternate" type="text/html" title="Lesson learned" /><published>2020-10-07T12:00:00+02:00</published><updated>2020-10-07T12:00:00+02:00</updated><id>http://localhost:9000/2020/10/07/lesson-learned</id><content type="html" xml:base="http://localhost:9000/2020/10/07/lesson-learned.html"><p>A friend of mine approached me with a request to - let us say automate -
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="http://localhost:9000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://localhost:9000/" rel="alternate" type="text/html" /><updated>2020-10-07T13:30:54+02:00</updated><id>http://localhost:9000/feed.xml</id><title type="html">unexist.dev</title><subtitle>Random bits and rants about tech, software design and me</subtitle><entry><title type="html">Lessons learned</title><link href="http://localhost:9000/2020/10/07/lessons-learned.html" rel="alternate" type="text/html" title="Lessons learned" /><published>2020-10-07T12:00:00+02:00</published><updated>2020-10-07T12:00:00+02:00</updated><id>http://localhost:9000/2020/10/07/lessons-learned</id><content type="html" xml:base="http://localhost:9000/2020/10/07/lessons-learned.html"><p>A friend of mine approached me with a request to - let us say automate -
 a web request to a not-to-be-named raffle. I looked into it and was
 instantly hooked. Both price and challenge are interesting.</p>
--- a/_site/index.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/index.html Wed Oct 07 13:31:09 2020 +0200 @@ -27,8 +27,8 @@ <h2 class="post-list-heading">Posts</h2> <ul class="post-list"><li><span class="post-meta">Oct 7, 2020</span> <h3> - <a class="post-link" href="/2020/10/07/lesson-learned.html"> - Lesson learned + <a class="post-link" href="/2020/10/07/lessons-learned.html"> + Lessons learned </a> </h3></li><li><span class="post-meta">Sep 25, 2020</span> <h3>
--- a/_site/tag/chromedriver.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/chromedriver.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> </ul>
--- a/_site/tag/headless.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/headless.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> </ul>
--- a/_site/tag/ruby.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/ruby.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> </ul>
--- a/_site/tag/testing.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/testing.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> <li>
--- a/_site/tag/tools.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/tools.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> <li>
--- a/_site/tag/watir.html Wed Oct 07 13:28:29 2020 +0200 +++ b/_site/tag/watir.html Wed Oct 07 13:31:09 2020 +0200 @@ -34,7 +34,7 @@ · <span class="tags"><a href="/tag/tools.html" rel="tag">tools</a>, <a href="/tag/testing.html" rel="tag">testing</a>, <a href="/tag/ruby.html" rel="tag">ruby</a>, <a href="/tag/watir.html" rel="tag">watir</a>, <a href="/tag/chromedriver.html" rel="tag">chromedriver</a>, <a href="/tag/headless.html" rel="tag">headless</a></span> </div> - <a href="/2020/10/07/lesson-learned.html">Lesson learned</a> + <a href="/2020/10/07/lessons-learned.html">Lessons learned</a> </li> </ul>