NextIT: Webscraper API


CPSC 3 Team: Colton Lammers, Andrew Purpura, Ryan Naccarato
Advisor: Dr. Christopher Smith
Liaison: Joe Dumoulin

At the request of Next IT Corp, we built a flexible web scraping application programming interface (API) that can handle a variety of situations and tasks. The API emphasizes accuracy and ease of use: it can render pages, which lets it cope with the dynamic content found on most websites. It visits every hyperlink the user chooses and gathers whatever information is needed during the scraping process.
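
The page does not describe the API's actual interface, so the following is only a minimal sketch of the crawl loop described above, written under stated assumptions. The names `crawl`, `fetch`, and `should_visit` are hypothetical. The sketch parses static HTML with Python's standard library and does not render JavaScript; the `fetch` hook is where a page-rendering step could plug in. The `should_visit` predicate stands in for the user's choice of which hyperlinks to follow.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    should_visit: Callable[[str], bool],
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl starting at start_url (hypothetical interface).

    fetch(url) returns the page's HTML; should_visit(url) is the user's
    link filter. Returns a mapping of visited URL -> HTML.
    """
    pages: Dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so the same
            # page is not queued twice.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen and should_visit(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with an in-memory stand-in for a website (no network access).
site = {
    "https://example.com/": '<a href="/docs/">Docs</a> <a href="https://other.org/">Out</a>',
    "https://example.com/docs/": '<a href="intro.html#top">Intro</a> <a href="/">Home</a>',
    "https://example.com/docs/intro.html": "<p>Hello</p>",
}
pages = crawl(
    "https://example.com/",
    fetch=site.__getitem__,
    should_visit=lambda u: u.startswith("https://example.com/"),
)
print(sorted(pages))
```

Passing the fetcher and the link filter in as functions keeps the crawl logic independent of how pages are retrieved or rendered, which matches the flexibility goal described here.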

The scraper saves the pages it visits as files in a directory tree that mirrors the structure of the site. We built it to be flexible because it could serve a wide variety of tasks, such as automating the manual content analysis involved in creating artificial intelligence agents, collecting large quantities of natural language text from websites or forums, and possibly supporting ongoing natural language research within Next IT.
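
One way to produce a directory tree that mirrors a site is to map each page URL to a local file path. The sketch below assumes its own conventions, which are not taken from the project: the host name becomes the top-level directory, a path ending in "/" is saved as `index.html`, an extensionless last segment gets `.html` appended, and query strings are ignored. The function name `url_to_path` and the default `output` root are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_path(url: str, root: str = "output") -> PurePosixPath:
    """Map a page URL to a file path under root that mirrors the site.

    Assumed (hypothetical) conventions: host name as top-level directory,
    "/"-terminated paths saved as index.html, ".html" added to extensionless
    names, query strings and fragments ignored.
    """
    parts = urlsplit(url)
    # Skip empty, "." and ".." segments so a URL cannot escape the root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    elif "." not in segments[-1]:
        segments[-1] += ".html"
    return PurePosixPath(root, parts.netloc, *segments)


for u in ["https://example.com/", "https://example.com/docs/intro.html",
          "https://example.com/about"]:
    print(u, "->", url_to_path(u))
```

To write a page to disk, the result can be wrapped in `pathlib.Path`, its parent directories created with `mkdir(parents=True, exist_ok=True)`, and the HTML saved with `write_text`.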

SCHOOL OF ENGINEERING & APPLIED SCIENCE
502 E. Boone Avenue
Spokane, WA 99258-0026
Phone: (509) 313-3523
Fax: (509) 313-5871
Email: seas@gonzaga.edu