overhauled the project to get away from files (a little)

B.J. Dweck 2023-09-11 19:09:18 +03:00
parent a13f92548c
commit 2bb66f037a
11 changed files with 231 additions and 169 deletions

1
.gitignore vendored
View File

@ -2,3 +2,4 @@ venv/
*.pyc
__pycache__/
.env

27
.vscode/launch.json vendored Normal file
View File

@ -0,0 +1,27 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Flask",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "envFile": "${workspaceFolder}/.env",
            "env": {
                "FLASK_APP": "server.py",
                "FLASK_ENV": "development",
                "FLASK_DEBUG": "0"
            },
            "args": [
                "run",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "justMyCode": true
        }
    ]
}

View File

@ -1,26 +1,26 @@
# csv2ankicards
# json2ankicards
A comprehensive toolkit that offers:
- Conversion of CSV files into Anki deck packages (.apkg files).
- Conversion of JSON files into Anki deck packages (.apkg files).
- Conversion of image files in a directory to a text file using Optical Character Recognition (OCR).
- Generation of CSV format question-answer pairs from textual content using OpenAI's GPT-3 model.
- Generation of JSON format question-answer pairs from textual content using OpenAI's GPT-3 model.
- **RESTful API endpoint to upload and convert multiple images directly into an Anki deck package.**
## Features
- Converts a CSV file with questions and answers into an Anki deck package.
- Converts a JSON file with questions and answers into an Anki deck package.
- Converts image files from a specified directory to a single text file using OCR.
- Generates CSV formatted question-answer pairs based on a given text content, ideal for studying or summarization.
- For CSV: there are only two columns in the CSV file, separated by the first comma encountered.
- CSV files should have a "Front" column for questions and a "Back" column for answers.
- Generates JSON formatted question-answer pairs based on a given text content, ideal for studying or summarization.
- For JSON: the file holds a single object with a "DeckTitle" string and a "Cards" array.
- Each entry in "Cards" should have a "Title", a "Question", and an "Answer" field.
- **API endpoint that accepts multiple image uploads, processes them through the pipeline, and returns an Anki deck package.**
## Installation
1. Clone this repository:
```bash
git clone https://git.rudefox.io/bj/anki-csv2ankicards.git
cd csv2ankicards
git clone https://git.rudefox.io/bj/anki-json2ankicards.git
cd json2ankicards
```
2. Set up a virtual environment and activate it:
@ -36,7 +36,7 @@ A comprehensive toolkit that offers:
## Configuration
Before using the `text2csvdeck.py` script, ensure that you have set the `OPENAI_API_KEY` environment variable:
Before using the `prompt4cards.py` script, ensure that you have set the `OPENAI_API_KEY` environment variable:
```bash
export OPENAI_API_KEY=your_openai_api_key_here
@ -75,7 +75,7 @@ To convert a directory of images directly to an Anki deck package:
python pipeline.py /path/to/your/image_directory/
```
This will process the images, extract text, convert text to a set of questions and answers in CSV format, and then produce an `output.apkg` file ready for import into Anki.
This will process the images, extract text, convert text to a set of questions and answers in JSON format, and then produce an `output.apkg` file ready for import into Anki.
### Image to Text Conversion
@ -91,31 +91,31 @@ This will produce a `final.txt` file which contains the text extracted from the
Currently supported formats for the images are: `.png`, `.jpg`, and `.jpeg`.
### Text to CSV Deck Generation
### Text to JSON Deck Generation
To generate a CSV deck of question-answer pairs from a given text file:
To generate a JSON deck of question-answer pairs from a given text file:
```bash
python text2csvdeck.py /path/to/your/textfile.txt
python prompt4cards.py /path/to/your/textfile.txt
```
This will analyze the content of the given text file and generate a corresponding `_deck.csv` file with questions and answers that capture the main points and themes of the text.
This will analyze the content of the given text file and generate an `output_deck.json` file with a deck title and a set of titled question-answer cards that capture the main points and themes of the text.
**Note:** This script uses the OpenAI GPT-3 model. Ensure you have the necessary API key and OpenAI Python client installed.
### CSV to Anki Conversion
### JSON to Anki Conversion
To convert a CSV file into an Anki deck package:
To convert a JSON file into an Anki deck package:
```bash
python csv2ankicards.py /path/to/your/csvfile.csv output.apkg
python json2deck.py /path/to/your/jsonfile.json output.apkg
```
This will produce an `output.apkg` file which can then be imported into Anki.
#### CSV Format
#### JSON Format
The CSV file should follow this format:
The JSON file should follow this format:
```
Front,Back

21
ankiai.py Normal file
View File

@ -0,0 +1,21 @@
import sys

from images2text import main as ocr_images
from prompt4cards import prompt_for_card_content, response_to_json
from json2deck import to_package


def images_to_package(directory_path, outfile):
    # OCR the images, ask the model for card content, then build the package.
    ocr_text = ocr_images(directory_path)
    response_text = prompt_for_card_content(ocr_text)
    deck_json = response_to_json(response_text)
    to_package(deck_json).write_to_file(outfile)
    print(f"Deck created at: {outfile}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python ankiai.py <directory_path_containing_images> <output_apkg>")
        sys.exit(1)

    images_to_package(sys.argv[1], sys.argv[2])
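
The point of this file is that the stages now hand data to each other in memory instead of through intermediate files. A minimal sketch of that composition follows, with the stages passed in as parameters so it runs without OCR or the OpenAI API. `run_pipeline`, `fake_ocr`, `fake_prompt`, and `fake_parse` are hypothetical names, not part of the commit. It also assumes the OCR stage returns a list of strings (which the `"\n".join(final_text)` call in `images2text.py` suggests), so it joins them before prompting.

```python
from typing import Callable, Dict, List


def run_pipeline(
    directory_path: str,
    ocr: Callable[[str], List[str]],
    prompt: Callable[[str], str],
    parse: Callable[[str], Dict],
) -> Dict:
    """Chain the three stages in memory and return the deck dictionary."""
    ocr_lines = ocr(directory_path)
    # Assumption: the OCR stage yields a list of strings, so join it first.
    response_text = prompt("\n".join(ocr_lines))
    return parse(response_text)


# Hypothetical stand-ins for the real OCR, OpenAI, and parsing stages.
def fake_ocr(directory_path: str) -> List[str]:
    return ["Albany is the capital of New York."]


def fake_prompt(text: str) -> str:
    return "Deck Title: Demo\n- Title: Capitals\nFront: Capital of New York?\nBack: Albany"


def fake_parse(response_text: str) -> Dict:
    first_line = response_text.splitlines()[0]
    return {"DeckTitle": first_line.split(":", 1)[1].strip(), "Cards": []}


deck = run_pipeline("images/", fake_ocr, fake_prompt, fake_parse)
print(deck["DeckTitle"])
```

Passing the stages as arguments is only a convenience for the sketch; the commit itself imports them directly.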

View File

@ -1,49 +0,0 @@
import csv
import genanki
import sys

# Create a new model for our cards. This is necessary for genanki.
MY_MODEL = genanki.Model(
    1607392319,
    "Simple Model",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])


def csv_to_anki(csv_path, output_path):
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.reader(f)
        # Skipping the header row
        next(reader, None)

        my_deck = genanki.Deck(2059400110, "CSV Deck")

        for row in reader:
            # Use row directly without splitting
            question = row[0]
            answer = ",".join(row[1:])
            note = genanki.Note(
                model=MY_MODEL,
                fields=[question, answer]
            )
            my_deck.add_note(note)

    genanki.Package(my_deck).write_to_file(output_path)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python convert.py <input_csv> <output_apkg>")
        sys.exit(1)

    input_csv = sys.argv[1]
    output_apkg = sys.argv[2]
    csv_to_anki(input_csv, output_apkg)
    print(f"Deck created at: {output_apkg}")

View File

@ -80,7 +80,7 @@ def main(directory_path):
f.write("\n".join(final_text))
print(f"All images processed! Final output saved to {FINAL_OUTPUT}")
return FINAL_OUTPUT # Add this line
return final_text # Add this line
if __name__ == "__main__":

55
json2deck.py Normal file
View File

@ -0,0 +1,55 @@
import json
import genanki
import sys

# Create a new model for our cards. This is necessary for genanki.
MY_MODEL = genanki.Model(
    1607372319,
    "Simple Model",
    fields=[
        {"name": "Title"},
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "{{Title}}",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])


def json_file_to_package(json_path):
    # Load the deck description from disk and turn it into a package.
    with open(json_path, 'r', encoding='utf-8') as f:
        json_data = json.load(f)

    package = to_package(json_data)
    return package


def to_package(deck_json):
    # Expects {"DeckTitle": str, "Cards": [{"Title", "Question", "Answer"}, ...]}.
    deck_title = deck_json["DeckTitle"]
    deck = genanki.Deck(1607372319, deck_title)

    for card_json in deck_json["Cards"]:
        title = card_json["Title"]
        question = card_json["Question"]
        answer = card_json["Answer"]
        note = genanki.Note(
            model=MY_MODEL,
            fields=[title, question, answer]
        )
        deck.add_note(note)

    return genanki.Package(deck)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python json2deck.py <input_json> <output_apkg>")
        sys.exit(1)

    input_json = sys.argv[1]
    output_apkg = sys.argv[2]
    json_file_to_package(input_json).write_to_file(output_apkg)
    print(f"Deck created at: {output_apkg}")
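
`to_package` indexes `"DeckTitle"`, `"Cards"`, `"Title"`, `"Question"`, and `"Answer"` directly, so a deck with a missing key fails with a `KeyError` partway through. As a rough sketch, a check like the one below could run first and report every problem at once. `find_deck_problems` and `REQUIRED_CARD_KEYS` are hypothetical names, not part of the commit, and the example deck only illustrates the shape the code above reads.

```python
from typing import Any, Dict, List

# Keys that to_package reads from every card (taken from the code above).
REQUIRED_CARD_KEYS = ("Title", "Question", "Answer")


def find_deck_problems(deck_json: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means usable."""
    problems: List[str] = []

    title = deck_json.get("DeckTitle")
    if not isinstance(title, str) or not title.strip():
        problems.append("DeckTitle is missing or empty")

    cards = deck_json.get("Cards")
    if not isinstance(cards, list) or not cards:
        problems.append("Cards is missing or empty")
        return problems

    for index, card in enumerate(cards):
        for key in REQUIRED_CARD_KEYS:
            if not isinstance(card, dict) or not isinstance(card.get(key), str):
                problems.append(f"card {index}: missing {key}")

    return problems


example_deck = {
    "DeckTitle": "US State Capitals",
    "Cards": [
        {
            "Title": "New York",
            "Question": "What is the capital of New York?",
            "Answer": "Albany",
        }
    ],
}

print(find_deck_problems(example_deck))
```

A caller could refuse to build the package when the returned list is non-empty.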

View File

@ -1,27 +0,0 @@
import sys
import os

from images2text import main as images_to_text
from text2csvdeck import text_file_to_csv_deck

CSV_DECK_NAME = "output_deck.csv"
APKG_NAME = "output.apkg"


def pipeline(directory_path):
    # 1. Convert images in the directory to a text file
    text_file_name = images_to_text(directory_path)

    # 2. Convert the text file to a CSV deck using ChatGPT
    text_file_to_csv_deck(text_file_name)

    # 3. Convert the CSV deck to an Anki package
    os.system(f"python csv2ankicards.py {CSV_DECK_NAME} {APKG_NAME}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python pipeline.py <directory_path_containing_images>")
        sys.exit(1)

    pipeline(sys.argv[1])

104
prompt4cards.py Normal file
View File

@ -0,0 +1,104 @@
import openai
import sys
import os
import json

CHAT_MODEL = "gpt-3.5-turbo"
OUTPUT_FILENAME = "output_deck.json"
API_KEY = os.environ.get("OPENAI_API_KEY")

if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")

openai.api_key = API_KEY

# Given prompt template
PROMPT_TEMPLATE = """
Please come up with a title for the deck and a set of 10 index cards for memorization,
including a title, front, and back for each card. The index cards should completely
capture the main points and themes of the text. In addition, they should contain any
numbers or data that humans might find difficult to remember. The goal of the index
card set is that one who memorizes it can provide a summary of the text to someone
else, conveying the main points and themes.
You will provide the deck title, and the titles, questions, and answers for each card
in a structured format as follows:
```
Deck Title: Title of the Deck
Cards:
- Title: Card Title 1
Front: What is the capital of New York?
Back: Albany
- Title: Card Title 2
Front: Where in the world is Carmen San Diego?
Back: Nobody knows
```
{content}
"""
def prompt_for_card_content(text_content):
    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)

    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0,
    )

    # Extract the text content from the response and return it
    return response.choices[0]['message']['content']


def response_to_json(response_text):
    lines = [line.strip() for line in response_text.split("\n") if line.strip()]

    deck_title = None
    cards = []
    current_card = {}

    for line in lines:
        if "Deck Title:" in line and not deck_title:
            deck_title = line.split("Deck Title:", 1)[1].strip()
        elif "Title:" in line:
            if current_card:  # If there's a card being processed, add it to cards
                cards.append(current_card)
                current_card = {}
            current_card["Title"] = line.split("Title:", 1)[1].strip()
        elif "Front:" in line:
            current_card["Question"] = line.split("Front:", 1)[1].strip()
        elif "Back:" in line:
            current_card["Answer"] = line.split("Back:", 1)[1].strip()

    if current_card:  # Add the last card if it exists
        cards.append(current_card)

    return {
        "DeckTitle": deck_title,
        "Cards": cards
    }


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python prompt4cards.py <text_file_path>")
        sys.exit(1)

    text_file_path = sys.argv[1]

    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()

    response_text = prompt_for_card_content(text_content)
    deck_json = response_to_json(response_text)

    # Save the parsed deck as JSON
    with open(OUTPUT_FILENAME, 'w') as json_file:
        json.dump(deck_json, json_file)

    print(f"Saved generated deck to {OUTPUT_FILENAME}")
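
`response_to_json` looks for its keywords anywhere in a line, so a question that happens to contain the text `Title:` would be read as the start of a new card. A possible stricter variant, shown only as a sketch, matches the keywords at the start of each line after stripping the list dash. `parse_deck_response` is a hypothetical name, not part of the commit, and it assumes the model follows the layout requested in `PROMPT_TEMPLATE`.

```python
from typing import Dict, List, Optional


def parse_deck_response(response_text: str) -> Dict:
    """Parse the 'Deck Title / Title / Front / Back' layout by line prefix."""
    deck_title: Optional[str] = None
    cards: List[Dict[str, str]] = []

    for raw_line in response_text.splitlines():
        # Drop indentation and the leading "- " that marks a new card.
        line = raw_line.strip().lstrip("-").strip()

        if line.startswith("Deck Title:"):
            if deck_title is None:
                deck_title = line[len("Deck Title:"):].strip()
        elif line.startswith("Title:"):
            cards.append({"Title": line[len("Title:"):].strip()})
        elif line.startswith("Front:") and cards:
            cards[-1]["Question"] = line[len("Front:"):].strip()
        elif line.startswith("Back:") and cards:
            cards[-1]["Answer"] = line[len("Back:"):].strip()

    return {"DeckTitle": deck_title, "Cards": cards}


sample_response = (
    "Deck Title: Demo\n"
    "Cards:\n"
    "- Title: Fields\n"
    "  Front: What does the Title: field hold?\n"
    "  Back: A short label\n"
)

parsed = parse_deck_response(sample_response)
print(parsed)
```

The output keeps the same `DeckTitle` and `Cards` keys that `to_package` in `json2deck.py` reads.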

View File

@ -4,7 +4,7 @@ import os
import tempfile
import shutil
from pipeline import pipeline
from ankiai import images_to_package
app = Flask(__name__)
@ -35,7 +35,7 @@ def deck_from_images():
save_uploaded_images(images, TEMP_DIR)
try:
pipeline(TEMP_DIR)
images_to_package(TEMP_DIR, OUTPUT_FILE)
return send_from_directory('.', OUTPUT_FILE, as_attachment=True)
except Exception as e: # Consider catching more specific exceptions
return jsonify({'error': str(e)}), 500

View File

@ -1,70 +0,0 @@
import openai
import sys
import os

CHAT_MODEL = "gpt-3.5-turbo"
OUTPUT_FILENAME = "output_deck.csv"
API_KEY = os.environ.get("OPENAI_API_KEY")

if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")

openai.api_key = API_KEY

# Given prompt template
PROMPT_TEMPLATE = """
Please come up with a set of 10 index cards for memorization, including front and back.
The index cards should completely capture the main points and themes of the text.
In addition, they should contain any numbers or data that humans might find difficult to remember.
The goal of the index card set is that one who memorizes it can provide a summary of the text to someone else, conveying the main points and themes.
You will provide the questions and answers to me in CSV format, as follows:
```
Front,Back
What is the capital of New York?,Albany
Where in the world is Carmen San Diego?,Nobody knows
```
The question/answer pairs shall not be numbered or contain any signs of being ordered.
{content}
"""
def text_file_to_csv_deck(text_file_path):
    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()

    content_to_csv(text_content)


def content_to_csv(text_content):
    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)

    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0,
    )

    # Extract CSV content from response and save to a new file
    csv_content = response.choices[0]['message']['content']
    with open(OUTPUT_FILENAME, 'w') as csv_file:
        csv_file.write(csv_content)

    print(f"Saved generated deck to {OUTPUT_FILENAME}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python text2csvdeck.py <text_file_path>")
        sys.exit(1)

    text_file_to_csv_deck(sys.argv[1])