Overview
OK, here you go. This is a programmatic solution in the form of a script:
#!/bin/bash
# NAME: pdflinkextractor
# AUTHOR: Glutanimate (http://askubuntu.com/users/81372/), 2013
# LICENSE: GNU GPL v2
# DEPENDENCIES: wget lynx
# DESCRIPTION: extracts PDF links from websites and dumps them to the stdout and as a textfile
# only works for links pointing to files with the ".pdf" extension
#
# USAGE: pdflinkextractor "www.website.com"
WEBSITE="$1"
echo "Getting link list..."
lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt
# OPTIONAL
#
# DOWNLOAD PDF FILES
#
#echo "Downloading..."
#wget -P pdflinkextractor_files/ -i pdflinks.txt
Installation
You need to have wget and lynx installed:
sudo apt-get install wget lynx
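To confirm that both tools are available before running the script, a check along the following lines could be used. This is only a minimal sketch: the function name missing_commands is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not part of pdflinkextractor): prints the name of
# every command passed as an argument that cannot be found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check the two dependencies listed in the script header.
missing="$(missing_commands wget lynx)"
if [ -n "$missing" ]; then
    echo "Missing dependencies:" $missing
else
    echo "All dependencies found."
fi
```

If anything is reported as missing, the apt-get command above should install it on Ubuntu.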
Usage
The script gets a list of all the .pdf files linked on the website and dumps it to the command-line output and to a text file (pdflinks.txt) in the working directory. If you uncomment the "optional" wget command, the script will go on to download all the files into a new directory (pdflinkextractor_files/).
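As the script header notes, the filter only matches links that end in a lowercase ".pdf". If a site uses upper-case extensions, the filtering step could be adapted roughly as sketched below. The function name extract_pdf_links and the example.com URLs are made up for illustration; the sample input merely imitates the numbered list that lynx -dump -listonly prints.

```shell
#!/bin/bash
# Hypothetical variant of the script's filter step: reads lynx's numbered
# link list on stdin and prints only the URLs ending in .pdf, matching the
# extension case-insensitively (grep -i).
extract_pdf_links() {
    awk '{print $2}' | grep -i '\.pdf$'
}

# Sample input imitating the output of `lynx -dump -listonly` (made-up URLs).
printf '%s\n' \
    '   1. http://example.com/index.html' \
    '   2. http://example.com/docs/manual.pdf' \
    '   3. http://example.com/docs/REPORT.PDF' \
    | extract_pdf_links
```

In the real script, the lynx command would be piped into this function in place of the sample input, with tee pdflinks.txt kept at the end of the pipeline.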
Example
$ ./pdflinkextractor http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm
Getting link list...
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JSPopupCalendar.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ModifySubmit_Example.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/DynamicEmail_XFAForm_V2.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcquireMenuItemNames.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/BouncingButton.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JavaScriptClock.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/Matrix2DOperations.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/RobotArm_3Ddemo2.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/SimpleFormCalculations.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/TheFlyv3_EN4Rdr.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ImExportAttachSample.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcroForm_BasicToggle.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcroForm_ToggleButton_Sample.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcorXFA_BasicToggle.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ConditionalCalcScripts.pdf
Downloading...
--2013-12-24 13:31:25-- http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JSPopupCalendar.pdf
Resolving www.pdfscripting.com (www.pdfscripting.com)... 74.200.211.194
Connecting to www.pdfscripting.com (www.pdfscripting.com)|74.200.211.194|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 176008 (172K) [application/pdf]
Saving to: `/Downloads/pdflinkextractor_files/JSPopupCalendar.pdf'
100%[===========================================================================================================================================================================>] 176.008 120K/s in 1,4s
2013-12-24 13:31:29 (120 KB/s) - `/Downloads/pdflinkextractor_files/JSPopupCalendar.pdf' saved [176008/176008]
...