Compare commits

2 Commits

2 changed files with 24 additions and 17 deletions

@@ -15,28 +15,31 @@ if not API_KEY:
openai.api_key = API_KEY
# Given prompt template
PROMPT_TEMPLATE = """
Please come up with a title for the deck and a set of 10 index cards for memorization,
including a title, front, and back for each card. The index cards should completely
capture the main points and themes of the text. In addition, they should contain any
numbers or data that humans might find difficult to remember. The goal of the index
card set is that one who memorizes it can provide a summary of the text to someone
else, conveying the main points and themes.
Please craft a title for the deck along with a set of index cards that succinctly capture the main ideas and themes of the provided text. Ensure the following:
You will provide the deck title, and the titles, questions, and answers for each card
in a structured format as follows:
1. Every card should have a title, a question on the front, and an answer on the back.
2. Each answer must contain at least one concrete fact not evident from its question.
3. Include any numbers, data, or details that may be challenging to remember.
4. The aim is for a person who learns this set to effectively convey the text's main themes and specifics to another individual.
5. You should craft at least one index card for every 3-5 sentences of content, depending on how much information is packed in the content.
6. Each index card should focus on answering a single question
7. Each index card answer shall be no longer than 3 sentences.
Structure your response as:
```
Deck Title: Title of the Deck
Deck Title: [Title of the Deck]
Cards:
- Title: Card Title 1
Front: What is the capital of New York?
Back: Albany
- Title: Card Title 2
Front: Where in the world is Carmen San Diego?
Back: Nobody knows
- Title: [Card Title 1]
Front: [Question 1]
Back: [Answer 1]
- Title: [Card Title 2]
Front: [Question 2]
Back: [Answer 2]
... and so on
```
Content for reference:
{content}
"""
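
The response format requested by the revised template is line-oriented, so it can be read back with a small parser. The following is a minimal sketch, assuming the model follows the format exactly and keeps every field on a single line. `Card` and `parse_deck` are hypothetical names used for illustration and are not part of this change set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Card:
    """One index card (hypothetical container, not from the diff)."""
    title: str
    front: str = ""
    back: str = ""


def parse_deck(response: str) -> Tuple[Optional[str], List[Card]]:
    """Parse 'Deck Title:' / 'Title:' / 'Front:' / 'Back:' lines.

    Lines without a colon (code fences, '... and so on') are skipped.
    Multi-line answers are NOT handled in this sketch.
    """
    deck_title: Optional[str] = None
    cards: List[Card] = []
    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            # Drop the list marker in front of 'Title:'.
            line = line[2:].strip()
        # Split on the first colon only, so colons inside a
        # question or answer are preserved.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "deck title":
            deck_title = value
        elif key == "title":
            cards.append(Card(title=value))
        elif key == "front" and cards:
            cards[-1].front = value
        elif key == "back" and cards:
            cards[-1].back = value
    return deck_title, cards
```

Because rule 7 of the template caps answers at three sentences but does not forbid line breaks, a production parser would likely need to handle continuation lines as well.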

@@ -44,7 +44,11 @@ def convert_image(image_path):
def ocr_image(image_path):
logging.info(f"OCR'ing {image_path}...")
text_filename = os.path.basename(image_path).replace(".jpg", ".txt")
base_name = os.path.basename(image_path)
root_name, _ = os.path.splitext(base_name)
text_filename = f"{root_name}.txt"
text_path = os.path.join(CONVERTED_DIR, text_filename)
cmd = ["tesseract", image_path, text_path.replace(".txt", "")]
try: