Extract Text from PDF with Python

Chart Explorers
38.5K views - 3 years ago
In this video we learn how to extract text from a PDF file with Python using PyPDF2, and how to convert a PDF to a text file. We start with a simple example: extracting text from a single page. We then extract the text from every page in the PDF. Next, we pick out the pages that meet a certain condition (here, pages containing the word Waldo), which shows how to extract text from a specific set of pages. We then write those pages to a new PDF document. Finally, we extract only the sentences that contain Waldo, along with the pages those sentences were found on.
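
As a rough illustration of the steps described above, here is a minimal sketch. It assumes PyPDF2 3.x, where the classes are named PdfReader and PdfWriter; a 2021 video may well use the older names (PdfFileReader and friends), and the code in the linked repository may differ. The function names and file paths are hypothetical, chosen only for this example.

```python
def read_page_texts(pdf_path):
    """Return the extracted text of every page, in page order."""
    # Imported inside the function so the pure-Python helper below
    # can be used without PyPDF2 installed. Assumes PyPDF2 >= 3.0.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for scanned or image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def pages_containing(page_texts, word):
    """Return zero-based indexes of pages whose text contains `word`."""
    needle = word.lower()
    return [i for i, text in enumerate(page_texts) if needle in text.lower()]


def save_text(page_texts, txt_path):
    """Write all page texts to one plain-text file, one page after another."""
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(page_texts))


def write_pages(pdf_path, page_indexes, out_path):
    """Copy the selected pages of `pdf_path` into a new PDF at `out_path`."""
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 >= 3.0

    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for i in page_indexes:
        writer.add_page(reader.pages[i])
    with open(out_path, "wb") as f:
        writer.write(f)
```

A typical flow would be: call read_page_texts("book.pdf"), pass the result to pages_containing(texts, "Waldo"), then hand the returned indexes to write_pages("book.pdf", indexes, "waldo_pages.pdf"). The file names here are placeholders.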

This is based on a real project I did for work where I had to extract pertinent information about specific people from thousands of PDFs that contained many pages each.

►►GitHub: https://github.com/bvalgard/working-w...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
$15 off Annual Dataquest subscription
app.dataquest.io/referral-signup/qybqz3r8/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

My number one course recommendation for self-learners (affiliate link): bit.ly/GoogleAnalyticsProfessionalCertificate

Udemy recommendations that I have personally taken (affiliate links):
►►Learn Statistics http://bit.ly/Statistics4DSUdemyCE
►►Learn Python http://bit.ly/LearnPythonCE
►►Learn SQL http://bit.ly/LearnSQLCE
►►Learn Data Analysis (this goes into advanced concepts; learn up to and including logistic regression. You don't need this before you start applying for jobs, but it can help) http://bit.ly/PythonMLDS_CE
►►Learn Business Intelligence http://bit.ly/LearnBI_CE
►►Learn Time Series Analysis (this is an important skill in SOME jobs, but you don't need this before you start applying for jobs) http://bit.ly/TimeSeries_CE

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

More of my videos you may be interested in
►►Create PDF with Python | Part 1

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Merch: https://bit.ly/PythonAndDataMerch

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Consider subscribing for weekly tips, tricks, and tutorials. @chartexplorers

Join my Discord server: discord

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

References
https://realpython.com/creating-modif...

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

0:00 Intro - Where's Waldo
0:36 pip install
0:59 Extract Text
1:20 Step 1
2:09 Step 2
2:58 All Pages to txt
4:20 Where's Waldo Pages
5:51 Write to PDF
6:21 Get Text from Specific Pages
8:15 Waldo Sentences
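
The final "Waldo Sentences" step in the chapter list above could be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the approach taken in the video: it takes a list of already-extracted page texts (one string per page), splits sentences naively on ., ! or ? followed by whitespace, and the function name sentences_with_word is hypothetical.

```python
import re


def sentences_with_word(page_texts, word):
    """Return (page_number, sentence) pairs for sentences containing `word`.

    Page numbers are 1-based. Matching is case-insensitive and whole-word.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = []
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse the line breaks that PDF extraction leaves mid-sentence.
        flat = " ".join(text.split())
        # Naive sentence split: abbreviations such as "Dr." will be split too.
        for sentence in re.split(r"(?<=[.!?])\s+", flat):
            if pattern.search(sentence):
                matches.append((page_number, sentence))
    return matches
```

For real documents a proper sentence tokenizer would likely be more reliable than this regular expression, since extracted PDF text often contains abbreviations, headers, and broken lines.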
Published 3 years ago, on 1400/03/26 (Solar Hijri calendar).
38,580 views