Skip to content

Latest commit

 

History

History
145 lines (123 loc) · 4.19 KB

README.md

File metadata and controls

145 lines (123 loc) · 4.19 KB

Mastino

A Golang library for scraping web page. It can be used with any language ( php, java, c# ... ) since Mastino uses json as input/output interchange format.

Usage

run commands

git clone https://github.com/angeloLed/mastino.git mastino
cd mastino
go get -u

You can also run Mastino like this:

#compiled
./mastino "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

# or not compiled
go run main.go "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

Manifest configuration Example of manifest configuration:

{
	"url": "https://google.com",
	"tags": [{
			"type": "a",
			"attributes": [{
				"key": "class",
				"value": "bg1"
			}, {
				"type": "span"
			}]
		},
		{
			"type": "div",
			"attributes": [{
			    "key": "css"
	        }]
    }]
}
key type description
url string target url
tags array each tag that Mastino must try to bite. The key contains all the matching rules. Contains attributes type
type string DOM element
attributes array each attributes of tag to verify. Contais key value
key string the key of attribute
value string the value of attribute

Possibile matching result

  • tag is specified without attribute
  • attribute key is specified without value
  • attribute key is specified and value match with value of DOM

Json Output Mastino returns a JSON with the same structure of the provided input JSON. Each match found is returned us a matches entry inside tags element. matches is an array of strings ( the DOM ).

{
	"url": "https://blog.golang.org/error-handling-and-go",
	"tags": [{
		"type": "div",
		"attributes": [{
			"key": "class",
			"value": "code"
		}],
		"matches": ["<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">", "<div class=\"code\">"]
	}],
	"error": false,
	"message": ""
}

Using Mastino with Other languages PHP:

$jsonManifest = "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}";
$output = shell_exec("/path/to/mastino {$jsonManifest}");
var_dump(json_decode($output, true));

C# :

using System;
using System.Diagnostics;

class Runshell
{
  static void Main()
  {
    ProcessStartInfo psi = new ProcessStartInfo();
    psi.FileName = "/path/to/mastino";
    psi.UseShellExecute = false;
    psi.RedirectStandardOutput = true;

    psi.Arguments = "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}";
    Process p = Process.Start(psi);
    string strOutput = p.StandardOutput.ReadToEnd();
    p.WaitForExit();
    Console.WriteLine(strOutput);
  }
}

Nodejs:

var sys = require('sys')
var exec = require('child_process').exec;
function puts(error, stdout, stderr) { sys.puts(stdout) }
var manifest = "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}";
exec("/path/to/mastino " + manifest, puts);

Golang:

package main

import "os/exec"

func main() {
    app := "/path/to/mastino"

    manifest := "{\"url\":\"https://blog.golang.org/error-handling-and-go\", \"tags\": [{\"type\": \"div\",\"attributes\": [{\"key\": \"class\",\"value\": \"code\"}]}]}"

    cmd := exec.Command(app, manifest, nil, nil, nil)
    stdout, err := cmd.Output()

    if err != nil {
        println(err.Error())
        return
    }

    print(string(stdout))
}

License

MIT