Python Forum
Partitioning when splitting data into train and test-dataset
[Image: mDDYhdn]
In this image you can see a simplified example of what my dataset looks like.

My goal is to create a text classifier that predicts whether a paragraph from a document has one or more labels (multi-label classification), but my very first step is to split the data into train and test data. The CSV file with the data contains many paragraphs from multiple documents.
The issue is that I need to make the split at the document level, so that paragraphs from one document never end up in the train set while other paragraphs from the same document end up in the test set.

I know how sklearn's train_test_split() works, but I can't figure out how to use it while also making sure that documents in the train set are not present in the test set. I've already done some research on this but still have no clue :/.

Could anyone help me by telling me how I can make this happen? I would really appreciate it.
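
To make the requirement concrete, here is a minimal sketch of what a document-level split could look like. It is not a definitive solution: the column names `document_id`, `paragraph` and `labels`, the helper `split_by_document` and the toy rows are all assumptions for illustration, since the real column names are only visible in the image. The idea is to shuffle the document ids rather than the paragraphs, and then assign every paragraph to the side its document landed on.

```python
import random


def split_by_document(rows, doc_key="document_id", test_fraction=0.2, seed=42):
    """Split rows so that all paragraphs of one document land on the same side.

    `rows` is a list of dicts; `doc_key` is the (assumed) name of the
    column that identifies the document a paragraph belongs to.
    """
    # Collect the distinct document ids in a stable order.
    doc_ids = sorted({row[doc_key] for row in rows})

    # Shuffle the documents, not the individual paragraphs.
    random.Random(seed).shuffle(doc_ids)

    # Reserve a fraction of the documents (at least one) for the test set.
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_docs = set(doc_ids[:n_test])

    # Every paragraph follows its document.
    train = [row for row in rows if row[doc_key] not in test_docs]
    test = [row for row in rows if row[doc_key] in test_docs]
    return train, test


# Hypothetical toy data: 10 documents with 3 paragraphs each.
rows = [
    {"document_id": d, "paragraph": f"paragraph {p} of document {d}", "labels": []}
    for d in range(10)
    for p in range(3)
]

train_rows, test_rows = split_by_document(rows)

# No document id appears on both sides of the split.
train_docs = {row["document_id"] for row in train_rows}
test_docs = {row["document_id"] for row in test_rows}
print(train_docs & test_docs)
```

Note that `test_fraction` here is a fraction of documents, not of paragraphs, so the paragraph counts can be uneven when documents differ in length. scikit-learn also provides `GroupShuffleSplit` in `sklearn.model_selection`, which produces this kind of split when the document id column is passed as `groups`.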