Scrapy – Call a function when the spider closes

Hello Wednesday,

Today at work I had the chance to play with Scrapy. It is quite fast and really easy to use.
I will write more posts about this crawling framework later, but for now here is a quick note on how to call a function when a spider closes.

If you want to do something else after the spider has crawled all the items, simply define a closed method on your spider.


# -*- coding: utf-8 -*-
import scrapy

class TcvStockSpider(scrapy.Spider):
    def closed(self, reason):
        # Scrapy calls this once the spider has closed;
        # reason is a string such as 'finished'.
        # do something
        pass

The Scrapy documentation describes another way to achieve this (connecting a handler to the spider_closed signal), but I think the closed method is the simplest and easiest one.
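To make the hook a bit more concrete, here is a minimal sketch of what a closed method might record. CloseReportMixin and its close_report attribute are hypothetical names of my own, not part of Scrapy; the idea is that you would list the mixin before scrapy.Spider in your spider's base classes. The sketch assumes the reason is 'finished' when the crawl ends normally, and it calls the hook by hand so it runs without Scrapy installed.

```python
class CloseReportMixin:
    """Hypothetical mixin that records why the spider closed."""

    def closed(self, reason):
        # In a real spider, Scrapy would call this after crawling stops.
        self.close_report = {
            "reason": reason,
            "finished_normally": reason == "finished",
        }


# Call the hook by hand, just to show what it records.
demo = CloseReportMixin()
demo.closed("finished")
print(demo.close_report)
```

In a real project the body of closed is where you would send a notification, write a summary file, or start the next step of your pipeline.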

Retry failed Kue jobs in Node.js

Hello there, today I’m gonna write about how to clear failed Kue jobs and how to retry them.

Kue is a priority job queue backed by redis, built for node.js. Sometimes a job fails; it is then marked as failed and stays that way until you intervene. In this case you will want to remove it or re-attempt it.

First of all you need to find all those failed jobs:

kue.Job.rangeByState( 'failed', 0, n, 'asc', function( err, jobs ) {
// you have an array of maximum n Job objects here
});

Next, if you want to remove those jobs from the queue, simply use this code:

async.eachSeries(jobs, function(job, cb) {
    job.remove(cb);
}, cb);

In case you want to retry them, change their state from failed to inactive:

async.eachSeries(jobs, function(job, cb) {
    job.state('inactive').save(cb);
}, cb);

To sum up, the whole function to retry failed jobs is:


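var retryFailedKue = function(cb) {
    var n = 2000; // maximum index of the failed jobs to fetch
    var async = require('async');
    var kue = require('kue');
    async.waterfall([
        function (cb) {
            // find the failed jobs
            kue.Job.rangeByState('failed', 0, n, 'asc', cb);
        },
        function (jobs, cb) {
            // mark each job as inactive so that it is attempted again
            async.eachSeries(jobs, function(job, cb) {
                job.state('inactive').save(cb);
            }, cb);
        }],
        function (err) {
            if (err) console.log(err);
            // note: the script exits here, so the outer cb is never called
            process.exit();
        });
};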

Git – Untrack pyc files from source control

Why do we need to do this? What is a pyc file?
Python automatically compiles a script to bytecode before executing it and caches the result in a pyc file, so the module loads faster the next time. Because these files are generated automatically, there is no point in committing them to your project’s source control. To stop tracking the pyc files that are already committed, while keeping them on disk, run:

$ find . -name '*.pyc' | xargs -n 1 git rm --cached
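If you would like to preview which files the find half of that command will match before untracking anything, here is a rough Python sketch. The find_pyc_files helper is a hypothetical name of my own, and the demo builds a throwaway directory so that it does not touch a real repository.

```python
import tempfile
from pathlib import Path


def find_pyc_files(root):
    """Return every *.pyc path under root, sorted (roughly `find root -name '*.pyc'`)."""
    return sorted(Path(root).rglob("*.pyc"))


# Demo on a temporary directory with one source file and one cached file.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "pkg" / "__pycache__"
    cache_dir.mkdir(parents=True)
    (Path(tmp) / "pkg" / "module.py").write_text("x = 1\n")
    (cache_dir / "module.cpython-311.pyc").write_bytes(b"")
    found = [path.name for path in find_pyc_files(tmp)]

print(found)
```

Only the cached file is listed; the .py source is left alone, which is what we want from the git command too.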

Beautify URL – Filter values before submitting Form

Case: Say you are using pjax and your site’s search page has a form with 3 inputs (A, B and C). When a user changes only the value of filter C and submits the form, your site’s URL turns into something like:

yoursite.com/search/?filterA=&filterB=&filterC=random

Problem: Ugly URL

Solution: Filter values before submitting form

Expected result:

yoursite.com/search/?filterC=random

Code:

$formSearch.submit(function(event) {
    var $form = $(this);
    var options = {};
    options.data = $form.find(":input").filter(function() {
        return $(this).val() !== '';
    }).serializeArray();
    $.pjax.submit(event, '#pjax-container', options);
});
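The filtering idea itself does not depend on jQuery. As a small illustration of the logic only (not a replacement for the pjax handler above), here is the same step sketched in Python with the standard library. The build_search_query helper is a hypothetical name; the field names are the ones from the example URL.

```python
from urllib.parse import urlencode


def build_search_query(fields):
    """Drop fields with an empty value, then encode the rest as a query string."""
    non_empty = [(name, value) for name, value in fields if value != ""]
    return urlencode(non_empty)


fields = [("filterA", ""), ("filterB", ""), ("filterC", "random")]

ugly = urlencode(fields)            # every field, empty or not
clean = build_search_query(fields)  # only the fields the user filled in

print(ugly)   # filterA=&filterB=&filterC=random
print(clean)  # filterC=random
```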

Error message: Thread 1: signal SIGABRT in class AppDelegate

You have a lingering connection to an outlet that no longer exists. Double check this in the main storyboard file: right click on each view to see whether it has any outlets that are no longer being used. If it does, simply remove those outlets.

Kill a process running on a particular port

Sometimes, when I vagrant up my dev environment, the warning below appears.

Vagrant cannot forward the specified ports on this VM, since they would collide with some other application that is already listening on these ports. The forwarded port to 9200 is already in use on the host machine.

So what we need to do is list any process listening on port 9200. Type in your command line:

lsof -i:9200

It will return something like this:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
644 Smith IPv6 0x106de0781c6c7bd3 0t0 TCP [fe80:1::1]:wap-wsp (LISTEN)
644 Smith IPv6 0x106de0781c6c7673 0t0 TCP localhost:wap-wsp (LISTEN)

Finally, kill the process using its PID (644 in the output above):

kill -9 644
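To confirm that the port is really free again before running vagrant up, a quick check from Python can help. This is only a sketch: port_in_use is a hypothetical helper of my own, it probes IPv4 localhost only (the lsof output above also shows IPv6 listeners), and the demo opens its own listener on a port picked by the OS instead of touching 9200.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


# Demo: listen on a free port chosen by the OS, then probe it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    busy_port = listener.getsockname()[1]
    result = port_in_use(busy_port)

print(result)
```

For the Vagrant case you would call port_in_use(9200) after the kill and expect False.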