Unicode, Betacode, and the OGL Repos


From the Digital Classics mailing list:
"Just under the text of Herodotus you can access its XML by clicking on
"view as XML", and the Greek displays just fine:


Ἡροδότου Ἁλικαρνησσέος ἱστορίης ἀπόδεξις ἥδε, ὡς μήτε τὰ γενόμενα ἐξ ἀνθρώπων τῷ χρόνῳ ἐξίτηλα γένηται, μήτε ἔργα μεγάλα τε καὶ θωμαστά, τὰ μὲν Ἕλλησι τὰ δὲ βαρβάροισι ἀποδεχθέντα, ἀκλεᾶ γένηται, τά τε ἄλλα καὶ δι᾽ ἣν αἰτίην ἐπολέμησαν ἀλλήλοισι.

Then you can cut and paste it into Oxygen, and there is no problem with the display of Greek.

But if you need, as I do, to work on the whole text and not only on a section, you can download it under:

“An XML version of this text is available for download, with the additional restriction that you offer Perseus any modifications you make. Perseus provides credit for all accepted changes, storing new additions in a versioning system.”

My problem is that when I open the downloaded file in Oxygen, the text is in Latin characters:

<milestone unit="para" />*(hrodo/tou *(alikarnhsse/os i(stori/hs a)po/decis h(/de, w(s mh/te ta\ geno/mena e)c a)nqrw/pwn tw=| xro/nw| e)ci/thla ge/nhtai, mh/te e)/rga mega/la te kai\ qwmasta/, ta\ me\n *(/ellhsi ta\ de\ barba/roisi a)podexqe/nta, a)klea= ge/nhtai, ta/ te a)/lla kai\ di' h(\n ai)ti/hn e)pole/mhsan a)llh/loisi.


First answer:
"The link to the whole Herodotus still seems to point to an older Beta Code version of the text, while the shorter chunk has the Unicode version for some reason. By far the best way to retrieve the full XML of a text from Perseus or the Open Greek and Latin Project is to go to one of their GitHub repositories.

Perseus Greek: https://github.com/PerseusDL/canonical-greekLit
OGL First1kGreek: http://opengreekandlatin.github.io/First1KGreek/

The text you are looking for is here: https://raw.githubusercontent.com/PerseusDL/canonical-greekLit/master/data/tlg0016/tlg001/tlg0016.tlg001.perseus-grc2.xml "
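The raw link above follows a regular pattern (textgroup folder, work folder, then a file named textgroup.work.version.xml), so it can be assembled from the three identifiers. The following is only a sketch: the function names are hypothetical, and the "master" branch and the file-naming pattern are assumptions read off the single URL quoted above, so they may not hold for every text or for later states of the repository.

```python
from urllib.request import urlopen

# Assumption: branch and layout are inferred from the one raw URL quoted in the post.
RAW_BASE = "https://raw.githubusercontent.com/PerseusDL/canonical-greekLit/master/data"


def raw_xml_url(textgroup, work, version):
    """Build the raw GitHub URL for one edition, e.g. tlg0016 / tlg001 / perseus-grc2."""
    filename = f"{textgroup}.{work}.{version}.xml"
    return f"{RAW_BASE}/{textgroup}/{work}/{filename}"


def download_xml(textgroup, work, version):
    """Fetch the XML as a string (network access needed; not called on import)."""
    with urlopen(raw_xml_url(textgroup, work, version)) as response:
        return response.read().decode("utf-8")


# Example (uncomment to download Herodotus):
# xml_text = download_xml("tlg0016", "tlg001", "perseus-grc2")
```

The version label (here perseus-grc2) differs from text to text, so it is worth checking the work folder on GitHub before relying on a guessed file name.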


Question clarified:
“The thing is, Herodotus was just the example taken by Maurizio: I need Aristophanes’ comedies, and they don’t seem to be in the GitHub repository of OGL. On Perseids, it seems that there is only Frogs. So I guess I have to find a way to upgrade the Beta Code version of Aristophanes into Unicode?”
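For cases where only a Beta Code file is available, the conversion itself can be sketched in a few lines. This is a minimal illustration, not a complete converter: the function name is hypothetical, it covers only the plain letters, the breathings, accents, iota subscript, diaeresis, the * capital marker and word-final sigma, and it assumes it is given Greek text only (a parenthesis or slash in surrounding XML markup would be misread as a diacritic).

```python
import unicodedata

# Beta Code letter -> lowercase Greek base letter.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic -> Unicode combining character.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(text):
    """Convert a Beta Code string to precomposed (NFC) Unicode Greek."""
    out = []
    capital = False  # set by '*', cleared once the letter arrives
    pending = ""     # diacritics written between '*' and its letter
    for i, ch in enumerate(text):
        low = ch.lower()
        if ch == "*":
            capital = True
        elif ch in DIACRITICS:
            if capital:
                pending += DIACRITICS[ch]  # capitals: marks precede the letter
            else:
                out.append(DIACRITICS[ch])
        elif low in LETTERS:
            greek = LETTERS[low]
            following = text[i + 1:i + 2].lower()
            if low == "s" and not capital and following not in LETTERS:
                greek = "ς"  # sigma at the end of a word
            if capital:
                greek = greek.upper() + pending
                capital, pending = False, ""
            out.append(greek)
        else:
            out.append(ch)  # spaces, punctuation, anything unrecognised
    # NFC folds base letter + combining marks into single code points.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_unicode("*(hrodo/tou *(alikarnhsse/os"))  # Ἡροδότου Ἁλικαρνησσέος
```

Because the result is NFC-normalised, accented vowels come out in their normalised form, so compare against other texts only after normalising them the same way.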


Strategies for finding and retrieving workgroups in PerseusDL Canonical:
"Here the PerseusDL repository helps. To find an author, look up their identifier in the Perseus Catalog: when you type Aristophanes into the search field, you will see that the workgroup you are looking for is tlg0019: http://catalog.perseus.org/?utf8=✓&search_field=all_fields&q=aristophanes. You can find this workgroup by following the GitHub folder structure: start at https://github.com/PerseusDL/canonical-greekLit, click data (https://github.com/PerseusDL/canonical-greekLit/tree/master/data), then click tlg0019: https://github.com/PerseusDL/canonical-greekLit/tree/master/data/tlg0019

There are 11 subfolders containing the comedies of Aristophanes in Unicode."
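The same folder walk can be done without clicking through the site, using GitHub's REST "contents" endpoint, which returns a JSON list of entries with "name" and "type" fields. The sketch below is an illustration under assumptions: the function names are hypothetical, the branch is taken to be "master" as in the links above, and unauthenticated requests to the API are rate-limited.

```python
import json
from urllib.request import Request, urlopen

# GitHub REST API: list the contents of a path inside a repository.
API_BASE = "https://api.github.com/repos/PerseusDL/canonical-greekLit/contents/data"


def work_folders(entries):
    """Keep only directory entries (one per work) and return their names, sorted."""
    return sorted(e["name"] for e in entries if e["type"] == "dir")


def fetch_entries(textgroup, ref="master"):
    """Download the folder listing for one workgroup (network access needed)."""
    request = Request(
        f"{API_BASE}/{textgroup}?ref={ref}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


# Example (uncomment to list the Aristophanes work folders):
# for name in work_folders(fetch_entries("tlg0019")):
#     print(name)
```

Each name returned is a work folder inside the workgroup, and the XML files themselves sit one level further down.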