(1)

WebOOB

(Web Outside Of Browser)

Use the Web from your shell!

François Revol

(2)

Who needs browsers anyway?

“The Web is about transmitting information to everyone regardless of the platform” (Tim Berners-Lee)

Browsers load 2 or 3 MB of images and JS even when you only need the data itself.

JS runs untrusted non-free code on your machine

You can't easily pipe the browser into grep, cut or sed

(3)

WebOOB,

a Web client for your shell

A Python framework for web scraping

Several capabilities (video, bank, message…)

CLI & GUI applications using capabilities

To search and collect data, submit forms…

Modules implementing some capabilities

Youtube, Europarl (video), PhpBB (message)…
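
The split between capabilities, modules and applications can be illustrated with a small self-contained sketch. Every name below (CapVideoSketch, ToyVideoModule, search_videos, search_everywhere) is invented for illustration and is not the WebOOB API; the toy module returns canned data instead of scraping a site.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Video:
    """A tiny stand-in for a scraped video record."""
    id: str
    title: str


class CapVideoSketch(ABC):
    """Hypothetical capability: what any video module must offer."""

    @abstractmethod
    def search_videos(self, pattern: str) -> list[Video]:
        ...


class ToyVideoModule(CapVideoSketch):
    """Hypothetical module: implements the capability for one site.

    A real module would fetch and parse pages; this one uses canned data.
    """

    CATALOG = [Video("1", "WebOOB talk"), Video("2", "Cooking show")]

    def search_videos(self, pattern: str) -> list[Video]:
        needle = pattern.lower()
        return [v for v in self.CATALOG if needle in v.title.lower()]


def search_everywhere(backends: list[CapVideoSketch], pattern: str) -> list[Video]:
    """An application talks to the capability only, never to a specific site."""
    results: list[Video] = []
    for backend in backends:
        results.extend(backend.search_videos(pattern))
    return results
```

Because the application depends only on the capability, supporting a new website means adding a module, not changing the application.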

(4)

WebOOB framework

A set of Python classes

Browser functions

HTTP[S] engine

HTML parser…

Settings for application and backends

Module discovery…

(5)

Applications

Command line

Interactive (FTP-like commands) or not

Formatters for CSV, JSON, HTML, plain text…

GUI (PyQt)

Simple GUI for a single task

“There's an OOB for that!”
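
The idea behind formatters, one result set rendered in several output formats, can be sketched with the standard library alone. The function name format_rows and the format labels are assumptions made for this example; they do not reproduce WebOOB's own formatter classes.

```python
import csv
import io
import json


def format_rows(rows, fmt):
    """Render a list of dicts as 'json', 'csv' or 'plain' text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        out = io.StringIO()
        # Column order follows the keys of the first row.
        fieldnames = list(rows[0]) if rows else []
        writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()
    if fmt == "plain":
        return "\n".join(
            " ".join(f"{key}={value}" for key, value in row.items())
            for row in rows
        )
    raise ValueError(f"unknown format: {fmt}")
```

Keeping the formatter separate from the scraping code is what lets the same command feed a spreadsheet, a script or a human reader.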

(6)

(Some) Applications

[Q]Boobmsg

[Q]Cineoob

[Q]Cookboob

[Q]HaveDate

[Q]Videoob

Boobank

Boobill

Boobtracker

Comparoob

Pastoob…

(7)

Modules

Support one or more capabilities for a website

Instantiated for a specific website = backend

[vimeo]

_enabled = 1

_module = vimeo

[redminedemo]

_module = redmine

url = http://demo.redmine.org/

username = import
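
The backend definitions above use an INI-like layout, so they can be read with Python's configparser. This is only a sketch: how WebOOB itself loads the file is not shown on the slide, and two points are assumptions here, namely that keys starting with an underscore are framework settings while the others are module parameters, and that a missing _enabled key means the backend is enabled.

```python
import configparser

BACKENDS_INI = """\
[vimeo]
_enabled = 1
_module = vimeo

[redminedemo]
_module = redmine
url = http://demo.redmine.org/
username = import
"""


def load_backends(text):
    """Return {backend_name: (module_name, enabled, params)}."""
    # Interpolation is disabled so that '%' in URLs or passwords is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    backends = {}
    for name in parser.sections():
        section = parser[name]
        module = section["_module"]
        # Assumption: a missing _enabled key means the backend is enabled.
        enabled = section.get("_enabled", "1") != "0"
        # Assumption: non-underscore keys are parameters passed to the module.
        params = {k: v for k, v in section.items() if not k.startswith("_")}
        backends[name] = (module, enabled, params)
    return backends
```

Calling load_backends(BACKENDS_INI) yields one entry per backend, which shows how the same redmine module could be instantiated several times with different URLs and accounts.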

(8)

(Some) Modules

Redmine

Github (tickets)

FreeMobile (bills)

Many (French) banks

Chronopost

Colissimo…

Youtube

Europarl (videos)

Vimeo

Dailymotion…

RMLL \o/ (videos)

(9)

Development status

Not all modules support all wanted capabilities

Some video modules lack search function…

Browser2 class makes writing modules easier

Some still need rewriting from the old Browser class

Used professionally for banking websites

(10)

*nix command composition

Now you can

Redirect stdout to the Web

Redirect stdin from it as well

Automate things with your shell of choice

Support new sites without changing the workflow

(11)

Creating 200 tickets from a CSV?

Configure a backend with the redmine or github account

Parse the CSV, generate one mbox-like file per line

Properties as headers

Description as body

for f in *.txt; do boobtracker -d post "$account" < "$f"; done

Profit!
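
The CSV-to-files step could look like the sketch below. The column names (title, priority, description), the output file naming and the exact header spelling are placeholders: the slide does not say which headers boobtracker expects, so check them against its documentation before posting anything.

```python
import csv
import io
from pathlib import Path


def csv_to_ticket_files(csv_text, out_dir, body_column="description"):
    """Write one header/body text file per CSV row and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        # The description column becomes the body of the message.
        body = row.pop(body_column, "") or ""
        # Remaining non-empty properties become "Name: value" headers.
        headers = [
            f"{key.capitalize()}: {value}"
            for key, value in row.items()
            if key and value
        ]
        path = out_dir / f"ticket-{number:03d}.txt"
        path.write_text("\n".join(headers) + "\n\n" + body + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Each generated file can then be fed to the shell loop above, one ticket per file.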

(12)

Converting forum posts to slides

boobmsg -q -b phpbb formatter json ';' export_thread 36.1681 > talks.json

Some Python to generate HTML slide templates:

python gen-desc.py talks.json

Convert them to PDF: lowriter talks.html
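
The author's gen-desc.py is not shown, so the following is only a guess at what such a script might do. It assumes the exported JSON is a list of objects with title and content fields, which may not match the real boobmsg output; adjust the field names to the actual export.

```python
import html
import json


def slides_from_json(json_text):
    """Turn a JSON list of posts into one HTML page, one heading per post."""
    posts = json.loads(json_text)
    parts = ["<html><body>"]
    for post in posts:
        # Assumed field names; escape them so forum text cannot break the markup.
        title = html.escape(str(post.get("title", "")))
        content = html.escape(str(post.get("content", "")))
        parts.append(f"<h1>{title}</h1>\n<p>{content}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string to talks.html gives a file that an office suite can open and export.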

(13)

Forum posts to slides

(14)

Forum posts to slides

(15)

References

http://weboob.org/

http://git.symlink.me/?p=weboob/devel.git

http://people.symlink.me/~rom1/blog/weboob/

(16)

Conclusion

There are other ways to browse the web

WebOOB puts it in a (nut)shell.

Scraping can be fragile (depends on HTML)

But sometimes it's the only solution

And it saves a lot of time!
