overhauled the project to get away from files (a little)

2023-09-11 19:09:18 +03:00 · 2023-09-11 19:09:18 +03:00 · 2bb66f037a
commit 2bb66f037a
parent a13f92548c
11 changed files with 231 additions and 169 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,4 @@ venv/
 *.pyc
 __pycache__/
 .env
--- a/.vscode/launch.json
+++ b/.vscode/launch.json
@@ -0,0 +1,27 @@
 {
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Flask",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "envFile": "${workspaceFolder}/.env",
            "env": {
                "FLASK_APP": "server.py",
                "FLASK_ENV": "development",
                "FLASK_DEBUG": "0"
            },
            "args": [
                "run",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "justMyCode": true
        }
    ]
 }
--- a/README.md
+++ b/README.md
@@ -1,26 +1,26 @@
-# csv2ankicards
+# json2ankicards
 A comprehensive toolkit that offers:
- Conversion of CSV files into Anki deck packages (.apkg files).
+- Conversion of JSON files into Anki deck packages (.apkg files).
 - Conversion of image files in a directory to a text file using Optical Character Recognition (OCR).
- Generation of CSV format question-answer pairs from textual content using OpenAI's GPT-3 model.
+- Generation of JSON-formatted question-answer pairs from text content using OpenAI's `gpt-3.5-turbo` model.
 - **RESTful API endpoint to upload and convert multiple images directly into an Anki deck package.**
 ## Features
- Converts a CSV file with questions and answers into an Anki deck package.
+- Converts a JSON file with questions and answers into an Anki deck package.
 - Converts image files from a specified directory to a single text file using OCR.
- Generates CSV formatted question-answer pairs based on a given text content, ideal for studying or summarization.
+- Generates JSON-formatted question-answer pairs from given text content, ideal for studying or summarization.
- For CSV: there are only two columns in the CSV file, separated by the first comma encountered.
+- For JSON: the file holds a single object with a `DeckTitle` string and a `Cards` array.
- CSV files should have a "Front" column for questions and a "Back" column for answers.
+- Each entry in `Cards` needs a `Title`, a `Question`, and an `Answer` key.
 - **API endpoint that accepts multiple image uploads, processes them through the pipeline, and returns an Anki deck package.**
 ## Installation
 1. Clone this repository:
   ```bash
-   git clone https://git.rudefox.io/bj/anki-csv2ankicards.git
+   git clone https://git.rudefox.io/bj/anki-json2ankicards.git
-   cd csv2ankicards
+   cd anki-json2ankicards
   ```
 2. Set up a virtual environment and activate it:
@@ -36,7 +36,7 @@ A comprehensive toolkit that offers:
 ## Configuration
-Before using the `text2csvdeck.py` script, ensure that you have set the `OPENAI_API_KEY` environment variable:
+Before using the `prompt4cards.py` or `ankiai.py` scripts, ensure that you have set the `OPENAI_API_KEY` environment variable:
 ```bash
 export OPENAI_API_KEY=your_openai_api_key_here
@@ -75,7 +75,7 @@ To convert a directory of images directly to an Anki deck package:
 python pipeline.py /path/to/your/image_directory/
 ```
-This will process the images, extract text, convert text to a set of questions and answers in CSV format, and then produce an `output.apkg` file ready for import into Anki.
+This will process the images, extract text, convert text to a set of questions and answers in JSON format, and then produce an `output.apkg` file ready for import into Anki.
 ### Image to Text Conversion
@@ -91,31 +91,31 @@ This will produce a `final.txt` file which contains the text extracted from the
 Currently supported formats for the images are: `.png`, `.jpg`, and `.jpeg`.
-### Text to CSV Deck Generation
+### Text to JSON Deck Generation
-To generate a CSV deck of question-answer pairs from a given text file:
+To generate a JSON deck of question-answer pairs from a given text file:
 ```bash
-python text2csvdeck.py /path/to/your/textfile.txt
+python prompt4cards.py /path/to/your/textfile.txt
 ```
-This will analyze the content of the given text file and generate a corresponding `_deck.csv` file with questions and answers that capture the main points and themes of the text.
+This will analyze the content of the given text file and write an `output_deck.json` file in the current directory containing a deck title and cards that capture the main points and themes of the text.
 **Note:** This script uses the OpenAI GPT-3 model. Ensure you have the necessary API key and OpenAI Python client installed.
-### CSV to Anki Conversion
+### JSON to Anki Conversion
-To convert a CSV file into an Anki deck package:
+To convert a JSON file into an Anki deck package:
 ```bash
-python csv2ankicards.py /path/to/your/csvfile.csv output.apkg
+python json2deck.py /path/to/your/jsonfile.json output.apkg
 ```
 This will produce an `output.apkg` file which can then be imported into Anki.
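For reference, a minimal sketch of an input file for `json2deck.py`, assuming the schema that `to_package` reads in this commit: a top-level `DeckTitle` string and a `Cards` list whose entries carry `Title`, `Question`, and `Answer`. The file name `sample_deck.json` and the card text are illustrative only.

```python
import json

# Hypothetical sample deck; the keys mirror what json2deck.to_package reads.
deck = {
    "DeckTitle": "Example Deck",
    "Cards": [
        {
            "Title": "Capitals",
            "Question": "What is the capital of New York?",
            "Answer": "Albany",
        },
        {
            "Title": "Mysteries",
            "Question": "Where in the world is Carmen San Diego?",
            "Answer": "Nobody knows",
        },
    ],
}

# Write the deck as UTF-8 JSON, the encoding json2deck.py opens it with.
with open("sample_deck.json", "w", encoding="utf-8") as f:
    json.dump(deck, f, indent=2)
```

The resulting file can then be converted with `python json2deck.py sample_deck.json output.apkg`.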
-#### CSV Format
+#### JSON Format
-The CSV file should follow this format:
+The JSON file should follow this format:
 ```
 Front,Back
--- a/ankiai.py
+++ b/ankiai.py
@@ -0,0 +1,21 @@
 import sys
 from images2text import main as ocr_images
 from prompt4cards import prompt_for_card_content, response_to_json
 from json2deck import to_package
 def images_to_package(directory_path, outfile):
    ocr_text = ocr_images(directory_path)
    response_text = prompt_for_card_content(ocr_text)
    deck_json = response_to_json(response_text)
    to_package(deck_json).write_to_file(outfile)
    print(f"Deck created at: {outfile}")
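 if __name__ == "__main__":
    if len(sys.argv) not in (2, 3):
        print("Usage: python ankiai.py <directory_path_containing_images> [output_apkg]")
        sys.exit(1)
    # images_to_package requires an output path; default to output.apkg as the README describes
    outfile = sys.argv[2] if len(sys.argv) == 3 else "output.apkg"
    images_to_package(sys.argv[1], outfile)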
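`images_to_package` needs both an image directory and an output path. As a sketch of one way to expose that on the command line, an `argparse` parser with an optional second positional argument defaulting to `output.apkg` could replace manual `sys.argv` checks; `build_parser` is a hypothetical helper, not part of this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI for ankiai.py: required image directory, optional output path.
    parser = argparse.ArgumentParser(
        description="OCR a directory of images and build an Anki deck package."
    )
    parser.add_argument("directory", help="directory containing .png/.jpg/.jpeg images")
    parser.add_argument(
        "outfile",
        nargs="?",  # optional positional argument
        default="output.apkg",
        help="path of the .apkg file to write (default: output.apkg)",
    )
    return parser
```

Inside `ankiai.py`, `args = build_parser().parse_args()` would then feed `images_to_package(args.directory, args.outfile)`.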
--- a/csv2ankicards.py
+++ b/csv2ankicards.py
@@ -1,49 +0,0 @@
 import csv
 import genanki
 import sys
 # Create a new model for our cards. This is necessary for genanki.
 MY_MODEL = genanki.Model(
    1607392319,
    "Simple Model",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])
 def csv_to_anki(csv_path, output_path):
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.reader(f)
        # Skipping the header row
        next(reader, None)
        my_deck = genanki.Deck(2059400110, "CSV Deck")
        for row in reader:
            # Use row directly without splitting
            question = row[0]
            answer = ",".join(row[1:])
            note = genanki.Note(
                model=MY_MODEL,
                fields=[question, answer]
            )
            my_deck.add_note(note)
    genanki.Package(my_deck).write_to_file(output_path)
 if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python convert.py <input_csv> <output_apkg>")
        sys.exit(1)
    input_csv = sys.argv[1]
    output_apkg = sys.argv[2]
    csv_to_anki(input_csv, output_apkg)
    print(f"Deck created at: {output_apkg}")
--- a/images2text.py
+++ b/images2text.py
@@ -80,7 +80,7 @@ def main(directory_path):
        f.write("\n".join(final_text))
    print(f"All images processed! Final output saved to {FINAL_OUTPUT}")
-    return FINAL_OUTPUT  # Add this line
+    return "\n".join(final_text)  # return the OCR text itself so callers need not reread FINAL_OUTPUT
 if __name__ == "__main__":
--- a/json2deck.py
+++ b/json2deck.py
@@ -0,0 +1,55 @@
 import json
 import genanki
 import sys
 # Create a new model for our cards. This is necessary for genanki.
 MY_MODEL = genanki.Model(
    1607372319,
    "Simple Model",
    fields=[
        {"name": "Title"},
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "{{Title}}",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])
 def json_file_to_package(json_path):
    with open(json_path, 'r', encoding='utf-8') as f:
        json_data = json.load(f)
        package = to_package(json_data)
    return package
 def to_package(deck_json):
    deck_title = deck_json["DeckTitle"]
    deck = genanki.Deck(1607372319, deck_title)
    for card_json in deck_json["Cards"]:
        title = card_json["Title"]
        question = card_json["Question"]
        answer = card_json["Answer"]
        note = genanki.Note(
                model=MY_MODEL,
                fields=[title, question, answer]
            )
        deck.add_note(note)
    return genanki.Package(deck)
 if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python json2deck.py <input_json> <output_apkg>")
        sys.exit(1)
    input_json = sys.argv[1]
    output_apkg = sys.argv[2]
    json_file_to_package(input_json).write_to_file(output_apkg)
    print(f"Deck created at: {output_apkg}")
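`to_package` uses `1607372319` as the deck ID for every deck it builds (the same number as the model ID), so separately generated decks all share one ID. A sketch, assuming a deck's identity should follow its title: hash the title into the `[2**30, 2**31)` range that genanki's documentation uses for randomly chosen IDs. `deck_id_for` is a hypothetical helper; `genanki.Deck(deck_id_for(deck_title), deck_title)` would be the intended call site.

```python
import hashlib

# Bounds borrowed from genanki's suggested random.randrange(1 << 30, 1 << 31).
ID_LOW = 1 << 30
ID_SPAN = 1 << 30


def deck_id_for(title: str) -> int:
    """Map a deck title to a stable integer ID in [2**30, 2**31)."""
    digest = hashlib.sha256(title.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then fold it into the range.
    return ID_LOW + int.from_bytes(digest[:8], "big") % ID_SPAN
```

Re-exporting a deck with the same title keeps its ID, while a different title gets a different one (barring an unlikely hash collision).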
--- a/pipeline.py
+++ b/pipeline.py
@@ -1,27 +0,0 @@
 import sys
 import os
 from images2text import main as images_to_text
 from text2csvdeck import text_file_to_csv_deck
 CSV_DECK_NAME = "output_deck.csv"
 APKG_NAME = "output.apkg"
 def pipeline(directory_path):
    # 1. Convert images in the directory to a text file
    text_file_name = images_to_text(directory_path)
    # 2. Convert the text file to a CSV deck using ChatGPT
    text_file_to_csv_deck(text_file_name)
    # 3. Convert the CSV deck to an Anki package
    os.system(f"python csv2ankicards.py {CSV_DECK_NAME} {APKG_NAME}")
 if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python pipeline.py <directory_path_containing_images>")
        sys.exit(1)
    pipeline(sys.argv[1])
--- a/prompt4cards.py
+++ b/prompt4cards.py
@@ -0,0 +1,104 @@
 import openai
 import sys
 import os
 import json
 CHAT_MODEL = "gpt-3.5-turbo"
 OUTPUT_FILENAME = "output_deck.json"
 API_KEY = os.environ.get("OPENAI_API_KEY")
 if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")
 openai.api_key = API_KEY
 # Given prompt template
 PROMPT_TEMPLATE = """
 Please come up with a title for the deck and a set of 10 index cards for memorization, 
 including a title, front, and back for each card. The index cards should completely 
 capture the main points and themes of the text. In addition, they should contain any 
 numbers or data that humans might find difficult to remember. The goal of the index 
 card set is that one who memorizes it can provide a summary of the text to someone 
 else, conveying the main points and themes.
 You will provide the deck title, and the titles, questions, and answers for each card 
 in a structured format as follows:
 ```
 Deck Title: Title of the Deck
 Cards:
 - Title: Card Title 1
  Front: What is the capital of New York?
  Back: Albany
 - Title: Card Title 2
  Front: Where in the world is Carmen San Diego?
  Back: Nobody knows
 ```
 {content}
 """
 def prompt_for_card_content(text_content):
    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)
    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
      model=CHAT_MODEL,
      messages=[
          {"role": "user", "content": prompt}
      ],
      temperature=0,
    )
    # Return the text of the first completion choice
    return response.choices[0]['message']['content']
 def response_to_json(response_text):
    lines = [line.strip() for line in response_text.split("\n") if line.strip()]
    deck_title = None
    cards = []
    current_card = {}
    for line in lines:
        if "Deck Title:" in line and not deck_title:
            deck_title = line.split("Deck Title:", 1)[1].strip()
        elif "Title:" in line:
            if current_card:  # If there's a card being processed, add it to cards
                cards.append(current_card)
                current_card = {}
            current_card["Title"] = line.split("Title:", 1)[1].strip()
        elif "Front:" in line:
            current_card["Question"] = line.split("Front:", 1)[1].strip()
        elif "Back:" in line:
            current_card["Answer"] = line.split("Back:", 1)[1].strip()
    if current_card:  # Add the last card if it exists
        cards.append(current_card)
    return {
        "DeckTitle": deck_title,
        "Cards": cards
    }
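`response_to_json` matches keys by substring, so a question or answer that happens to contain `Title:` (for example, `Front: Which novel has the chapter Title: Nick?`) starts a new card. A minimal sketch of a stricter parser, assuming the model follows the prompt's `- Title:` / `Front:` / `Back:` layout, anchors each key at the start of the line; `parse_cards` is a hypothetical name and returns the same `DeckTitle`/`Cards` shape.

```python
import re

# A key counts only at the start of a line, after an optional "- " list marker.
LINE_RE = re.compile(r"^\s*(?:-\s*)?(Deck Title|Title|Front|Back):\s*(.*)$")
FIELD_NAMES = {"Front": "Question", "Back": "Answer"}


def parse_cards(response_text: str) -> dict:
    deck_title = None
    cards = []
    current = {}
    for line in response_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip fences, "Cards:", blank lines, and stray prose
        key, value = match.group(1), match.group(2).strip()
        if key == "Deck Title":
            deck_title = deck_title or value  # keep the first deck title seen
        elif key == "Title":
            if current:  # a new title closes the previous card
                cards.append(current)
            current = {"Title": value}
        else:
            current[FIELD_NAMES[key]] = value
    if current:
        cards.append(current)
    return {"DeckTitle": deck_title, "Cards": cards}
```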
 if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python prompt4cards.py <text_file_path>")
        sys.exit(1)
    text_file_path = sys.argv[1]
    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()
    response_text = prompt_for_card_content(text_content)
    deck_json = response_to_json(response_text)
    with open(OUTPUT_FILENAME, 'w') as json_file:
        json.dump(deck_json, json_file)
    print(f"Saved generated deck to {OUTPUT_FILENAME}")
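`to_package` in `json2deck.py` indexes `Title`, `Question`, and `Answer` directly, so a card the parser only partly filled raises `KeyError` at packaging time. A hypothetical `validate_deck` check could report such gaps before `to_package` is called; treating empty strings as missing is an assumption of this sketch.

```python
# Keys that json2deck.to_package reads from every card.
REQUIRED_CARD_KEYS = ("Title", "Question", "Answer")


def validate_deck(deck_json: dict) -> list:
    """Return human-readable problems; an empty list means the deck looks usable."""
    problems = []
    if not deck_json.get("DeckTitle"):
        problems.append("missing DeckTitle")
    cards = deck_json.get("Cards") or []
    if not cards:
        problems.append("no cards parsed")
    for i, card in enumerate(cards):
        # Missing keys and empty values are both reported.
        missing = [key for key in REQUIRED_CARD_KEYS if not card.get(key)]
        if missing:
            problems.append(f"card {i}: missing {', '.join(missing)}")
    return problems
```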
--- a/server.py
+++ b/server.py
@@ -4,7 +4,7 @@ import os
 import tempfile  
 import shutil    
-from pipeline import pipeline
+from ankiai import images_to_package
 app = Flask(__name__)
@@ -35,7 +35,7 @@ def deck_from_images():
    save_uploaded_images(images, TEMP_DIR)
    try:
-        pipeline(TEMP_DIR)
+        images_to_package(TEMP_DIR, OUTPUT_FILE)
        return send_from_directory('.', OUTPUT_FILE, as_attachment=True)
    except Exception as e:  # Consider catching more specific exceptions
        return jsonify({'error': str(e)}), 500
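`deck_from_images` saves uploads into `TEMP_DIR` and serves `OUTPUT_FILE` from the working directory; if, as the upper-case names suggest, both are module-level constants, concurrent requests would share the same files. A sketch, assuming per-request isolation is wanted: create a fresh scratch directory for each request with `tempfile.mkdtemp`. `make_request_workspace` and the `deck.apkg` file name are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def make_request_workspace() -> Tuple[str, str, str]:
    """Create a unique scratch directory for one request.

    Returns (work_dir, image_dir, output_path); the .apkg is not created here.
    """
    work_dir = tempfile.mkdtemp(prefix="ankiai-")
    image_dir = os.path.join(work_dir, "images")
    os.makedirs(image_dir)
    output_path = os.path.join(work_dir, "deck.apkg")
    return work_dir, image_dir, output_path
```

The view would then call `images_to_package(image_dir, output_path)`, return the file with Flask's `send_file(output_path, as_attachment=True)`, and remove `work_dir` with `shutil.rmtree` once the response no longer needs it.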
--- a/text2csvdeck.py
+++ b/text2csvdeck.py
@@ -1,70 +0,0 @@
 import openai
 import sys
 import os
 CHAT_MODEL = "gpt-3.5-turbo"
 OUTPUT_FILENAME = "output_deck.csv"
 API_KEY = os.environ.get("OPENAI_API_KEY")
 if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")
 openai.api_key = API_KEY
 # Given prompt template
 PROMPT_TEMPLATE = """
 Please come up with a set of 10 index cards for memorization, including front and back. 
 The index cards should completely capture the main points and themes of the text. 
 In addition, they should contain any numbers or data that humans might find difficult to remember. 
 The goal of the index card set is that one who memorizes it can provide a summary of the text to someone else, conveying the main points and themes.
 You will provide the questions and answers to me in CSV format, as follows:
 ```
 Front,Back
 What is the capital of New York?,Albany
 Where in the world is Carmen San Diego?,Nobody knows
 ```
 The question/answer pairs shall not be numbered or contain any signs of being ordered.
 {content}
 """
 def text_file_to_csv_deck(text_file_path):
    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()
    content_to_csv(text_content)
 def content_to_csv(text_content):
    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)
    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
      model=CHAT_MODEL,
      messages=[
          {"role": "user", "content": prompt}
      ],
      temperature=0,
    )
    # Extract CSV content from response and save to a new file
    csv_content = response.choices[0]['message']['content']
    with open(OUTPUT_FILENAME, 'w') as csv_file:
        csv_file.write(csv_content)
    print(f"Saved generated deck to {OUTPUT_FILENAME}")
 if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python text2csvdeck.py <text_file_path>")
        sys.exit(1)
    text_file_to_csv_deck(sys.argv[1])